I am sure this discussion has happened in some other post, but I cannot find it in my searches.
I have fact tables coming through with values that should correspond to values in our lookup tables (dimensions).
as an example:
car_maker; date; total_sales
As you can see, 'ferrari' has appeared in the fact table but does not yet exist in our lookup.
Thus when loading and parsing my fact table, I would end up with:
car_maker_id; date; total_sales
Ideally, I would like to load 'ferrari' into dim_car_maker (with car_maker_id = 4) and have this id in the fact table as well.
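The situation can be sketched in plain Python (hypothetical data; I'm assuming dim_car_maker currently holds ids 1-3 for existing makers):

```python
# Hypothetical lookup table: car_maker -> car_maker_id
dim_car_maker = {"fiat": 1, "renault": 2, "toyota": 3}

# Incoming fact rows: (car_maker, date, total_sales)
fact_rows = [
    ("fiat", "2010-01-01", 100),
    ("ferrari", "2010-01-01", 50),  # not in the dimension yet
]

# Translating car_maker to car_maker_id fails for the unknown value.
translated = [(dim_car_maker.get(maker), date, sales)
              for maker, date, sales in fact_rows]
# 'ferrari' ends up with no id: (None, '2010-01-01', 50)
```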
I tried searching for it in the wiki, under use cases, but I could not find anything.
Your suggestions will be much appreciated.
I'm no Talend expert, yet this is how I would do it:
- use facts_loaded as input
- create a tmap with dim_car_maker as lookup.
-- Join the lookup as "inner join" with "unique match". Create an output flow and press the "purple arrow", i.e. indicate that the flow you created receives the rows that fail the inner join.
-- This output flow should have 2 columns, "car_maker" and "car_maker_id"; name it, say, "new car makers". "car_maker" is the value found in the file; the id is left empty.
- At this point, the "new car makers" flow might contain the same car maker more than once (e.g. your facts have 2 lines with ferrari), so use a component to remove duplicates (I think that's the tUniqRow component).
- Then finish with a SQL output into dim_car_maker (make sure car_maker_id is auto-increment in your database).
At this point, your dim_car_maker includes the new car makers.
Now you can read your input again, and your lookup will always succeed.
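The steps above can be sketched outside Talend as plain Python (all names hypothetical; the inner-join reject corresponds to the `rejects` list, the duplicate removal to `dict.fromkeys`, and the database auto-increment is simulated with `next_id`):

```python
dim_car_maker = {"fiat": 1, "renault": 2, "toyota": 3}

facts_loaded = [
    ("ferrari", "2010-01-01", 50),
    ("ferrari", "2010-01-02", 70),
    ("fiat", "2010-01-01", 100),
]

# 1. Inner join against the lookup; makers that fail go to the reject flow.
rejects = [maker for maker, _, _ in facts_loaded if maker not in dim_car_maker]

# 2. Remove duplicates, keeping first-seen order.
new_makers = list(dict.fromkeys(rejects))

# 3. Insert into the dimension; the auto-increment column assigns the ids.
next_id = max(dim_car_maker.values()) + 1
for maker in new_makers:
    dim_car_maker[maker] = next_id
    next_id += 1

# 4. Second pass over the input: every lookup now succeeds.
fact_table = [(dim_car_maker[maker], date, sales)
              for maker, date, sales in facts_loaded]
```

The two passes over the input mirror the two job runs: the first one only feeds the dimension, the second one produces the final fact rows.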
I have considered something similar to that, but it starts to become fiddly when dealing with many dimensions:
the first tMap that checks for available keys ends up with a reject output for each dimension used,
and then you have to update these dimensions and send all the data through another tMap.
I have also considered scanning each dimension at the beginning of the job for its MAX() warehouse key, storing it in a variable (var_max_car_maker_id), and creating sequences in tMap when the lookup value was null, like:
dim.car_maker_id != null ? dim.car_maker_id : Numeric.sequence("car_maker_id", var_max_car_maker_id, 1 )
and then, as you suggested, sending the records that are not in the lookup into a separate output, through a tUniqRow, and loading them into their respective dimension.
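That sequence idea can be sketched in Python (hypothetical names; `sequence` plays the role of Talend's Numeric.sequence, seeded from the dimension's MAX at job start):

```python
import itertools

dim_car_maker = {"fiat": 1, "renault": 2, "toyota": 3}

# Scan the dimension once at job start for its MAX() warehouse key.
var_max_car_maker_id = max(dim_car_maker.values())

# Counterpart of Numeric.sequence("car_maker_id", ...), starting past the max.
sequence = itertools.count(var_max_car_maker_id + 1)

new_rows = []    # goes to the dimension output (via duplicate removal)
fact_table = []  # the main output with resolved ids
for maker, date, sales in [("fiat", "2010-01-01", 100),
                           ("ferrari", "2010-01-01", 50)]:
    maker_id = dim_car_maker.get(maker)
    if maker_id is None:            # lookup failed: assign the next surrogate key
        maker_id = next(sequence)
        dim_car_maker[maker] = maker_id
        new_rows.append((maker_id, maker))
    fact_table.append((maker_id, date, sales))
```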
BUT it turns out that this does not work when other jobs try to insert into the same dimension, as the warehouse keys end up being duplicated.
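A minimal sketch of why the in-memory sequence breaks under concurrency (hypothetical values):

```python
dim_max = 3  # both jobs read MAX(car_maker_id) = 3 at startup

# Job A and job B each seed an in-memory sequence from that same value...
job_a_next = dim_max + 1
job_b_next = dim_max + 1

# ...so each assigns 4 to its first new car maker: a duplicate warehouse key.
# Only the database itself (auto-increment or a DB sequence) can hand out
# ids safely across concurrent jobs.
```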
Maybe we need components that do a similar job to these:
- http://wiki.pentaho.com/display/EAI/Dim … kup-Update
Many thanks for your suggestions,