You are not logged in.
Announcement
Unanswered posts
|
Pages: 1
I can't figure out how to design a simple job in an efficient way. Let say there are two (very large!) table
person
--------
ID
NAME
AGE
phones
--------
PERSON_ID
PHONE
I need to process part of the data (for example WHERE AGE = 20) and output it into another table like this:
personDenorm
-----------------
ID
NAME
PHONE_LIST
where the phone_list field will contain a delimited list of all phone numbers found for this person in the phones table
I was able to do this by using a tMap with the reload at each row lookup (phone table) and a globalMap key; and a tAggregateRow. But this is slow, as it executes a query for each person.
In a pure Java app I would write two queries:
SELECT * FROM PERSON WHERE AGE = 20
SELECT * FROM PHONES WHERE PERSON_ID IN (...)
The second query may be executed several times if number of loaded persons is larger than 1000 (I'm using Oracle)
How should I do this with Talend?
Offline
Hello
Your jobs looks like:
tOracleInput(select ID,NAME from person where AGE=20)--main---tMap(do a inner join based on ID and PERSON_ID, output ID,NAME,PHONE_LIST)--tOracleOutput
|
lookup
|
tOracleInput(select* from phones)--tDenormalize(denormalize PHONE column with a delimiter)
Best regards
shong
Offline
Hello
Does table person and phone exist in the same db?
Best regards
shong
Offline
Hello
But I don't want to load the entire phones table into memory, is there any other way?
You can modify the job like this:
tOracleInput(select ID,NAME from person where AGE=20)--main---tMap(do a inner join based on ID and PERSON_ID, output ID,NAME,PHONE_LIST)--tOracleOutput
|
lookup
|
tOracleInput(select* from phones where PERSON_ID IN (SELECT * FROM PERSON WHERE AGE = 20) )--tDenormalize(denormalize PHONE column with a delimiter)
So now, you will not load all the phone records into memory.
Best regards
shong
Offline
Pages: 1