#1 2009-09-01 03:29:12

yrb
New member
Registered: 2009-09-01
Posts: 4

Please help a beginner

I can't figure out how to design a simple job in an efficient way. Let say there are two (very large!) table

person
--------
ID
NAME
AGE

phones
--------
PERSON_ID
PHONE

I need to process part of the data (for example WHERE AGE = 20) and output it into another table like this:
personDenorm
-----------------
ID
NAME
PHONE_LIST

where the phone_list field will contain a delimited list of all phone numbers found for this person in the phones table

I was able to do this by using a tMap with the reload at each row lookup (phone table) and a globalMap key; and a tAggregateRow. But this is slow, as it executes a query for each person.

In a pure Java app I would write two queries:

SELECT * FROM PERSON WHERE AGE = 20
SELECT * FROM PHONES WHERE PERSON_ID IN (...)

The second query may be executed several times if number of loaded persons is larger than 1000 (I'm using Oracle)

How should I do this with Talend?

Offline

#2 2009-09-01 05:20:00

shong
Talend team
Registered: 2007-08-29
Posts: 10350
Website

Re: Please help a beginner

Hello
Your jobs looks like:
tOracleInput(select ID,NAME from person where AGE=20)--main---tMap(do a inner join based on ID and PERSON_ID, output ID,NAME,PHONE_LIST)--tOracleOutput
                                                                                                  |
                                                                                                 lookup
                                                                                                  |
                                    tOracleInput(select* from phones)--tDenormalize(denormalize PHONE column with a delimiter)

Best regards

          shong


Email:shong@talend.com
Choose Talend, Enjoy Talend!
New & Event: Talend Help Center
Talend-->the leader of open source data management and application integration solutions!

Offline

#3 2009-09-01 05:57:33

yrb
New member
Registered: 2009-09-01
Posts: 4

Re: Please help a beginner

Thanks!

But I don't want to load the entire phones table into memory, is there any other way?

Offline

#4 2009-09-01 06:50:21

shong
Talend team
Registered: 2007-08-29
Posts: 10350
Website

Re: Please help a beginner

Hello
Does table person and phone exist in the same db?

Best regards

          shong


Email:shong@talend.com
Choose Talend, Enjoy Talend!
New & Event: Talend Help Center
Talend-->the leader of open source data management and application integration solutions!

Offline

#5 2009-09-01 07:36:28

yrb
New member
Registered: 2009-09-01
Posts: 4

Re: Please help a beginner

yes

Offline

#6 2009-09-01 18:08:56

yrb
New member
Registered: 2009-09-01
Posts: 4

Re: Please help a beginner

shong, were you going to say something? smile

Offline

#7 2009-09-02 04:17:02

shong
Talend team
Registered: 2007-08-29
Posts: 10350
Website

Re: Please help a beginner

Hello

But I don't want to load the entire phones table into memory, is there any other way?

You can modify the job like this:
tOracleInput(select ID,NAME from person where AGE=20)--main---tMap(do a inner join based on ID and PERSON_ID, output ID,NAME,PHONE_LIST)--tOracleOutput
                                                                                                  |
                                                                                                 lookup
                                                                                                  |
                                    tOracleInput(select* from phones where PERSON_ID IN (SELECT * FROM PERSON WHERE AGE = 20) )--tDenormalize(denormalize PHONE column with a delimiter)

So now, you will not load all the phone records into memory.

Best regards

          shong


Email:shong@talend.com
Choose Talend, Enjoy Talend!
New & Event: Talend Help Center
Talend-->the leader of open source data management and application integration solutions!

Offline

Board footer

Powered by FluxBB