• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » tAggregateRow performance issue - cannot process >800K rows

#1 2010-10-07 23:19:15

Peter
Member
Registered: 2010-07-29
Posts: 74

tAggregateRow performance issue - cannot process >800K rows

I am having trouble aggregating more than 800K rows. I am running TIS 4.0.2 on 32-bit Windows XP with 3 GB RAM.
I have a table with 1.5M records which tAggregateRow is going through. After extracting about 900K rows I get a Java heap error and execution aborts.
Running the same job on 64-bit Windows 7 with 4 GB RAM resolves the problem, but I have to make it work on a 32-bit system.

As a workaround I split the table into two halves of around 750K rows each. tAggregateRow works fine in this scenario.

Are there any performance settings I can tune to resolve the problem?

Thank you,
Peter.

Last edited by Peter (2010-10-07 23:19:40)

Offline

#2 2010-10-08 15:26:04

JohnGarrettMartin
Member
Registered: 2009-01-07
Posts: 762

Re: tAggregateRow performance issue - cannot process >800K rows

Have you tried increasing the max heap size on the Run tab, under 'JVM arguments'?

The default is 1024 MB, which may be too small to hold your recordset.
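
To confirm how much heap the JVM actually received, a small check along these lines may help. This is only a sketch: the class name `HeapCheck` is made up for illustration and is not part of Talend; it relies on the standard `Runtime.maxMemory()` call.

```java
// Hypothetical helper (not part of Talend): reports the maximum heap
// the current JVM will attempt to use, in megabytes.
class HeapCheck {
    static long maxHeapMb() {
        // maxMemory() returns bytes; convert to MB for readability.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Run it with the same arguments as the job, for example `java -Xmx1024M HeapCheck`. On a 32-bit JVM, asking for more than the process can address usually makes the JVM refuse to start, so the reported value shows what the setting really achieved.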

Offline

#3 2010-10-08 20:21:03

Peter
Member
Registered: 2010-07-29
Posts: 74

Re: tAggregateRow performance issue - cannot process >800K rows

Yes, I have. I set the JVM arguments to the max allowed, but it didn't help. However, on 64-bit, I set -Xmx to over 3 GB and it helped.

Offline

#4 2010-10-08 20:53:32

cantoine
Talend team
Registered: 2006-09-19
Posts: 715
Website

Re: tAggregateRow performance issue - cannot process >800K rows

There are some workarounds that get you past your 750K-row limit.

If the data is already in a database, you can use the tELTAggregate component; it generates the SQL, and the aggregation is performed by the database itself without this memory limitation. Otherwise, you first have to push the data into a database and then do your aggregation there. For the load, you can use the BULK_EXEC mechanism to speed it up.
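
As a rough illustration of the database-side approach, the aggregation comes down to a GROUP BY statement that the database executes itself, so the rows never have to fit in the job's heap. The sketch below only builds such a statement. The class name and the table and column names (`sales`, `customer_id`, `amount`) are invented for the example, and this is not necessarily the SQL that the ELT component generates.

```java
// Hypothetical sketch: builds a GROUP BY query so the database does the
// aggregation. Names are illustrative; identifiers are concatenated
// without escaping, so pass only trusted table/column names.
class AggregateSql {
    static String build(String table, String groupCol, String valueCol) {
        return "SELECT " + groupCol
             + ", SUM(" + valueCol + ") AS total, COUNT(*) AS row_count"
             + " FROM " + table
             + " GROUP BY " + groupCol;
    }

    public static void main(String[] args) {
        // Prints the statement that would be sent to the database (e.g. via JDBC).
        System.out.println(build("sales", "customer_id", "amount"));
    }
}
```

Only one result row per group comes back to the job, which is why this route is far less sensitive to the number of input rows than aggregating in the JVM.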

Another solution is in our Talend Integration Suite MPX edition, which includes a component called tFSAggregate to perform aggregation on large files.

Offline
