#1 2012-03-15 00:04:11


fpaganel said:

Parallel job in ETL

I'm new to Talend, but I have developed in DataStage for two years.
I would like to replace a DataStage parallel job with a similar job in Talend.
In particular, I couldn't find a parallel feature.
For example: if I have two tables, one the source and the other the destination, I would like several parallel reads from the source and several parallel writes to the destination:

             3 reads | 3 transformations | 3 writes
    E        ----->           T            ----->         L
  source     ----->     transformation     ----->    destination
             ----->                        ----->
This is possible in DataStage, but I don't know whether it is possible in Talend.
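
As an illustration of the layout in the diagram above, here is a minimal sketch in plain Python. It is not Talend or DataStage code. The in-memory `source_table` and `destination_table`, the `run_lane` helper and the upper-casing transformation are all hypothetical stand-ins, chosen only to show three lanes that each read, transform and write their own share of the rows.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

LANES = 3  # number of parallel read/transform/write lanes, as in the diagram

# Hypothetical in-memory stand-ins for the source and destination tables.
source_table = [{"id": i, "name": f"row{i}"} for i in range(9)]
destination_table = []
write_lock = threading.Lock()  # protects the shared destination


def run_lane(lane):
    """Read this lane's share of the source, transform it, write it out."""
    # Read: lane 0 takes rows 0, 3, 6; lane 1 takes rows 1, 4, 7; and so on.
    rows = source_table[lane::LANES]
    # Transform: a placeholder transformation (upper-case the name).
    transformed = [{**row, "name": row["name"].upper()} for row in rows]
    # Write: each lane appends its own rows to the destination.
    with write_lock:
        destination_table.extend(transformed)
    return len(transformed)


# Run the three lanes concurrently and collect the row count of each lane.
with ThreadPoolExecutor(max_workers=LANES) as pool:
    counts = list(pool.map(run_lane, range(LANES)))
```

The order of rows in `destination_table` depends on which lane finishes first, so anything that needs a fixed order would have to sort afterwards.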

Thank you

#2 2012-03-15 00:26:09

Talend Team

shong said:

Re: Parallel job in ETL

You can execute multiple jobs/subjobs in parallel in Talend. To do so, go to Job settings --> Extra panel and check the option 'Multi thread execution'.

Best regards


#3 2012-03-15 11:38:00


FPaganel said:

Re: Parallel job in ETL

I tried checking "Multi thread execution", but the performance is the same before and after.
Could you suggest something else, please?


Best regards

#4 2012-03-15 15:30:25


fpaganel said:

Re: Parallel job in ETL

I would like to replace this DataStage feature:

Partition parallelism
When large volumes of data are involved, you can use the power of parallel
processing to your best advantage by partitioning the data into a number of
separate sets, with each partition being handled by a separate instance of the
job stages. Partition parallelism is accomplished at runtime, instead of a
manual process that would be required by traditional systems.

The DataStage developer only needs to specify the algorithm to partition the
data, not the degree of parallelism or where the job will execute. Using
partition parallelism the same job would effectively be run simultaneously by
several processors, each handling a separate subset of the total data. At the
end of the job the data partitions can be collected back together again and
written to a single data source. This is shown in following figure.
taken from article => … ssing.html
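
The partition-and-collect idea in the quoted passage can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not DataStage's or Talend's implementation: the function names, the sample `orders` rows and the CRC32-based hash partitioner are all hypothetical. Unlike the DataStage description, the degree of parallelism is passed explicitly here as `n_partitions`.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_by_key(rows, key, n_partitions):
    """Hash-partition rows so that equal key values land in the same partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        index = crc32(str(row[key]).encode()) % n_partitions
        partitions[index].append(row)
    return partitions


def process_partition(rows):
    """The same stage logic, applied to one partition: total amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals


def run_partitioned(rows, n_partitions=4):
    """Partition the rows, process every partition concurrently, collect the results."""
    partitions = partition_by_key(rows, "customer", n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(process_partition, partitions))
    # Collect: hash partitioning keeps each customer in exactly one partition,
    # so the per-partition dictionaries never share a key.
    collected = {}
    for partial in results:
        collected.update(partial)
    return collected


# Hypothetical sample data.
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
    {"customer": "carol", "amount": 3},
    {"customer": "bob", "amount": 1},
]
totals = run_partitioned(orders)
```

Threads are used here only to keep the sketch short. In CPython they do not speed up CPU-bound work, so a process pool would be the closer analogue of several processors each handling its own subset.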

Best Regards
