Topic review (newest first)

nicolasdiogo
2011-07-06 11:13:18

Using Talend is fine, but databases were designed to operate over large datasets, so use the best tool for the job at hand.

Try running your inner join in the database: make that query your DB input and use it to replace the previous tMap and its inputs.
It will almost certainly be much faster.
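
To illustrate the idea of letting the database do the join, here is a minimal sketch. It is not the original job: it uses Python's built-in sqlite3 as a stand-in for Oracle, the table names `src` and `dest` and the columns `id`, `c1` to `c5` are hypothetical, and it assumes the job updates existing destination rows whose key matches. The point is only that a single set-based statement replaces the row-by-row lookup done in tMap.

```python
import sqlite3

# Hypothetical names for the five copied columns.
COLS = ["c1", "c2", "c3", "c4", "c5"]


def make_demo_db():
    """Build a tiny in-memory database with a source and a destination table."""
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{c} TEXT" for c in COLS)
    conn.execute(f"CREATE TABLE src (id INTEGER PRIMARY KEY, {col_defs})")
    conn.execute(f"CREATE TABLE dest (id INTEGER PRIMARY KEY, {col_defs})")
    # Source rows 1..3 carry data; destination rows 2..4 start empty.
    conn.executemany(
        "INSERT INTO src VALUES (?, ?, ?, ?, ?, ?)",
        [(i, *(f"{c}-{i}" for c in COLS)) for i in (1, 2, 3)],
    )
    conn.executemany("INSERT INTO dest (id) VALUES (?)", [(2,), (3,), (4,)])
    conn.commit()
    return conn


def copy_in_database(conn):
    """Copy the five columns for matching keys with one UPDATE statement.

    No row ever leaves the database; the join on id is done by the engine.
    Returns the number of destination rows that were updated.
    """
    set_clause = ", ".join(
        f"{c} = (SELECT s.{c} FROM src s WHERE s.id = dest.id)" for c in COLS
    )
    cur = conn.execute(
        f"UPDATE dest SET {set_clause} "
        "WHERE EXISTS (SELECT 1 FROM src s WHERE s.id = dest.id)"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    print(copy_in_database(conn), "rows updated")
    for row in conn.execute("SELECT id, c1, c5 FROM dest ORDER BY id"):
        print(row)
```

Only destination rows with a matching key are touched; row 4 in the demo keeps its empty columns. On Oracle the same effect would typically be written as a MERGE or a correlated UPDATE, which is an assumption here since the thread does not show the actual schema or SQL.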

JohnGarrettMartin
2011-07-05 17:35:50

nicolasdiogo's suggestion is a good one. If you can post some screenshots of your job (during or after it runs so we see the stats) we may be able to provide more help.

Also, when it comes to performance, don't forget the physical component of what you are doing. If you can give us some detail on the network setup between your inputDB --> ETL server --> outputDB, it will help.

Lionel203
2011-07-05 17:09:00

nicolasdiogo wrote:

apologies for the question, if it seems basic

but are these two tables in the same DB - if so could you not have a SQL statement joining them?

if not, could you tell the row count in each table?

Hello,

Yes, both tables are in the same database. The operation performed by this job is one part of many others that are already done with Talend; that's why I want to build this job with Talend too, but I'm really a novice with this tool.

tMap seems to be the simplest component for this because it finds the relation between the two tables by making an inner join on the keys and, for each record, copies the data. But if there is another way or another component that makes it faster, I'm interested.

Thanks

nicolasdiogo
2011-07-05 16:42:04

apologies for the question, if it seems basic

but are these two tables in the same DB - if so could you not have a SQL statement joining them?

if not, could you tell the row count in each table?

Lionel203
2011-07-05 16:35:21

Hello,

I have a job that takes a lot of time to run and I'd like to optimize it but I don't know how.


The job is very simple: it reads 2 tables and copies the data of 5 columns from the first table to 5 columns of the other (where the keys are the same in both tables -> tMap).
The problem is that each column of the origin table has a size of 4000 characters (the fields are not actually filled with 4000 characters, and the columns of the destination table have a size of 2500, which is enough to avoid losing data).

The component used to copy the data from one table to the other is a tMap with the 'store of temp data' option activated (if it is not activated, the job refuses to launch).

I used all the optimizations that I know: using a cursor with the best values (I ran many tests with different values), trimming spaces in the columns in the tOracleInput (curiously, this doesn't change performance), etc.

The job runs and takes 35 minutes.
1,800,000 records are copied, and for each record, 5 fields of 4000 characters are copied.

I think the long run time is caused by the size of the columns, because the machine used is a Xeon with 24 GB of RAM, and Talend is configured to use all the RAM.
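
As a rough sanity check on these figures (a back-of-the-envelope sketch, not a measurement), the numbers above work out to under 900 rows per second. The volume estimate is only an upper bound: it assumes every field is as long as the 2500-character destination column, which is the most a value can hold since nothing is truncated. Real values may be much shorter.

```python
rows = 1_800_000
fields_per_row = 5
max_chars_per_field = 2500   # destination column width, used as an upper bound
elapsed_seconds = 35 * 60    # 35 minutes

rows_per_second = rows / elapsed_seconds
max_chars_total = rows * fields_per_row * max_chars_per_field
max_chars_per_second = max_chars_total / elapsed_seconds

print(f"{rows_per_second:.0f} rows/s")
print(f"at most {max_chars_total / 1e9:.1f} billion characters in total")
print(f"at most {max_chars_per_second / 1e6:.1f} million characters/s")
```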


Is there a better way, or are there other components that would do it faster?

Thanks
