#1 2012-03-15 00:04:11

fpaganel
Guest

Parallel job in ETL

Hi,
I'm new to Talend, but I have two years of development experience in DataStage.
I would like to replace a parallel job in DataStage with a similar job in Talend.
In particular, I couldn't find a parallel-processing feature.
For example: if I have two tables, one the source and the other the destination, I would like several parallel reads from the source and several parallel writes to the destination:

            3 reads  | 3 transformations |  3 writes
    E        ----->           T            ----->         L
  source     ----->     transformation     ----->    destination
             ----->                        ----->
In DataStage this is possible, but I don't know whether it is possible in Talend.


Thank you
FPaganel

#2 2012-03-15 00:26:09

shong
Talend team
Registered: 2007-08-29
Posts: 11170

Re: Parallel job in ETL

Hi
You can execute multiple jobs/subjobs in parallel in Talend: go to Job settings --> Extra panel and check the option 'Multi thread execution'.

Best regards
Shong
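
To illustrate what running several subjobs in parallel means in general, here is a minimal Python sketch. It is not Talend's generated code, and the two subjob functions (`load_customers`, `load_orders`) are hypothetical stand-ins. The assumption is that the option starts independent subjobs at the same time, each on its own thread.

```python
from concurrent.futures import ThreadPoolExecutor


def load_customers():
    """Hypothetical subjob 1: stands in for one source -> destination flow."""
    return "customers loaded"


def load_orders():
    """Hypothetical subjob 2: independent of subjob 1."""
    return "orders loaded"


def run_subjobs_in_parallel(subjobs):
    """Start every independent subjob on its own thread and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(subjobs)) as pool:
        futures = [pool.submit(job) for job in subjobs]
        # Results are returned in submission order.
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_subjobs_in_parallel([load_customers, load_orders]))
```

In this sketch each subjob still processes its own rows on a single thread, so a job that consists of one source-to-destination flow would not be expected to run faster with this kind of parallelism alone.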




#3 2012-03-15 11:38:00

FPaganel
Guest

Re: Parallel job in ETL

I tried checking "Multi thread execution", but the performance is the same before and after.
Could you suggest something else, please?

Thanks

Best regards

#4 2012-03-15 15:30:25

fpaganel
Guest

Re: Parallel job in ETL

I would like to reproduce this DataStage feature in Talend:

Partition parallelism
When large volumes of data are involved, you can use the power of parallel
processing to your best advantage by partitioning the data into a number of
separate sets, with each partition being handled by a separate instance of the
job stages. Partition parallelism is accomplished at runtime, instead of a
manual process that would be required by traditional systems.

The DataStage developer only needs to specify the algorithm to partition the
data, not the degree of parallelism or where the job will execute. Using
partition parallelism the same job would effectively be run simultaneously by
several processors, each handling a separate subset of the total data. At the
end of the job the data partitions can be collected back together again and
written to a single data source. This is shown in following figure.
[img]http://3.bp.blogspot.com/_0KPqtEryCp8/S6JR56t_29I/AAAAAAAAAJQ/DV4Cc1P6dgY/s1600-h/1.jpg[/img]
taken from article => http://datastage-tutorials.blogspot.com … ssing.html

Thanks
Best Regards
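
The partition parallelism described in the quoted article can be sketched in a few lines of Python. This is only an illustration of the general technique (partition, process each partition concurrently, collect), not DataStage or Talend code. The row layout, the `id` partition key, the upper-casing transformation and the degree of parallelism of 3 are all assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n_partitions):
    """Split rows into n_partitions lists by hashing the partition key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts


def transform(rows):
    """Hypothetical transformation stage: upper-case the 'name' column."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def run_partitioned(rows, key="id", n_partitions=3):
    """Transform each partition on its own thread, then collect the results."""
    parts = partition(rows, key, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(transform, parts))
    # Collector step: merge the partitions back into a single data set.
    return [row for part in results for row in part]


if __name__ == "__main__":
    source = [{"id": i, "name": f"row{i}"} for i in range(9)]
    for row in run_partitioned(source):
        print(row)
```

Threads fit this sketch when the stages mostly wait on database reads and writes. For CPU-heavy transformations in Python, `ProcessPoolExecutor` from the same module would be the usual substitute.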
