You are not logged in.
Hello all talend community,
working with TALEND i am in trouble for some days now and i would be grateful to have help as soon as possible.
concretely, i have an issue when loading Excel files into MySql database, below iw ill describe my problem:
Excel files are loaded into Mysql DB , the structure of my BD is a star schema as described below:
-1 central table centrale containing numerical data ( like a fact table)
- many reference tables linked to the central table (dimensions) with foreign keys
In my loading process, i load monthly 2 excel files by a tMap component into a global table witch groups the data from the 2 excel files.
then, i load the BD with the global table previously loaded trough a tMap again,using following rules:
a) 'insert ignore' option and 'fileds options ' in each table with the following rule:
- if the data is already existing , the reference table is not loaded with this information
- if the data is not existing in the monthly loading then a new row is inserted in the target table
b) Loading all data systematically, in the central (fact) because the numerical dat ahas to be historized.
One of the reference table called "BU" regroups the business unit, but there a group information that is not defined in the input source excel files.
Semantically there are 2 modes of group:
- the group according to business unit (ex: from 1 month to anither 2 business units maybe regrouped into a new business unit)
- the group according geographical situation (north, south,...)
To implement this , i decided to create an excel files with 3 colomns:
1 column for the BU name
1 column for the group name (if the business unit still the same, then the group will be the business unit itself)
1 column for the group name according to geography (north, south,)
Question: Is my implemtantion correct according the sematical definition i gave above ? if not, how do you think i can implement it?
a rule says that a "business unit" can change group(can be regouped or attached to other geographical region) fom one year to another. My idea is to change the data by the user in the excel file containing the groups mapping.
To implement this change in the BD at the table level, i wanted to use the component tMySqlSCD.
Question: this choice is it adequate for this case?
ISSUE 3 : loading if the BU table
1) how can i load "bu" table from my 2 excel files and the new groups mapping excel file by having a historised mapping in "bu" table ?
Here is the actual flow:
a) tMysqlinput(1 & 2 )----->tMap--------tMysqloutput (global table)
b) tMysqlinPut(global table)-------->tMap------------>reference tables(including "bu" table)
c) tMysqlinPut(global table)-------->tMap------------>central table (fact table)
2) if tMysqlSCD choice is right where can i put this component ?
3) if i use tMysqlSCD, any change will put the old row as 'inactive', can i despite this make queries using olds mappings (not active ?)
For Example: the user wants to compare data for groupe A for 2012 with data for groupe A for 2011 (the group A contains differents business units in 2011 and 2012)
Sorry if i was long , but i thought it is necessary to explain in detail to avoid any misunderstanding.
than in advance for your help
Welcome to Talend Community!
Question 1: To make it clear, could you show me the source data and your expected data? Then I can create a job for you. Try to simplify your business logic into some examples.
Question 2: Yes. tMysqlSCD is a good choice for your requirement. Save all output data into tMysqlSCD, the end of this job.
Question 3: There are several methodologies for SCD. You might try Type 3. Download document [TalendOpenStudio_Components_RG_50b_EN] here and get instructions about tMysqlSCD and SCD methodologies.
thank you for your answer pedro,
Just for you to have a general overview , i will send u all the job i am using to load my DB ,a sample of one of the input excel files(followup) , and the DB schema.
As said in my previous post ,i have 2 excel input files. The 2nd input excel file is called KPI and it is almost the same with "followup" the only difference is just that it has additional columns.
The 2 files are loaded (using tMap and a common column) into a table called "unit_kpi_follwoup" using the job called "unit_kpi_followup_21mai".
Then the next 2 jobs are loading the DB:
-charg_DW_1_21mai: loading of the reference tables including the "Unit" table witch is giving me problems.
-charg_DW_2_21mai; loading of the "detailsprojet" table witch is the central table
Actually the table "unit" has only 2 columns :'UnitCOde' and 'UnitName'.
The new requirement is to append new information to the unit related to the group witch it is attached to. For this purpose i created a new excel file with the mapping (see attached mapping.xls) and i want the information in this file to be loaded in the "unit" table.
My problem is i don't know how to implement this to have in the "unit" table all the information: 'UnitCode','unitname','bu_group','bu_geo_grop', 'startdate', 'enddate'
I hope its easy understandeable
thank you in advance
Please point out whether I understand it right or not.
Table 'unit': two columns 'UnitCode' and 'UnitName'.
New excel file: three columns 'UnitCode', 'bu_group',,'bu_geo_grop'
Now you want to get a new table with columns 'UnitCode','unitname','bu_group','bu_geo_grop', 'startdate', 'enddate'
I think you can simply inner join Table 'unit' and this excel file with the tMap component.
Uhhh. I think I misunderstand you again.
But how can I get values of 'startdate', 'enddate'?
what i want is that any time the user changes an information in the new excel file(for example a group), automatically TALEND inserts a row in the new table with 'startdate' as the change date and 'enddate' ŗ limit date like '2099' for example.
TO summarize i want to simulate the tMsqoutputSCD component, but from what i see i cannot use this component because my DB is a relationnal DB not a star schema. I read that this component is used with dimensions (star schema).