
#1 2017-03-01 15:21:58

KMRX812
Member
1 post

KMRX812 said:

Design to load millions of rows in minutes

Tags: [database, mysql]

Hi everyone,
We have around 75 million rows in a Redshift source table and plan to load them into a MySQL table.
Could someone please suggest a better design to load this in minutes through a Talend job? Currently it takes a long time, even though we split the source data into chunks (for example, processing 400,000 rows at a time and looping through the remaining chunks).
I need to update existing rows first (if there are any changes in the source), otherwise insert new rows.
Both the update and insert operations are performed with SQL statements in tMysqlRow components instead of tMap lookups etc. The job still takes a long time.
It would be very useful if anyone could suggest the best approach for update-or-insert logic over millions of rows.

We are using Talend Enterprise Data Integration edition 6.2.1.

Thanks,
kmrx


#2 2017-03-01 16:43:42

rhall_2.0
Member
1251 posts

rhall_2.0 said:

Re: Design to load millions of rows in minutes

You need to give us more info.
1) How fast is it at the moment?
2) How fast do you want it?
3) Are the job, Redshift and MySQL in the same environment, or does the data have to cross the internet?
4) Are indexes (in MySQL) used? Can they be switched off for the load?

You say you want 75,000,000 rows to be processed in minutes. You do realise that doing this in 1 hour (for example) would require about 20,833 rows per second? While that is certainly not unachievable, it would still be hard to do in an hour with a single MySQL instance that is remote from the Redshift box and the job.
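
To make the arithmetic concrete, here is a minimal Python sketch of the throughput needed for a given time window. The 75,000,000 row count and the 1 hour window come from this thread; the 30 and 10 minute windows and the function name are only illustrative assumptions.

```python
def required_rows_per_second(total_rows: int, seconds: float) -> float:
    """Sustained throughput needed to move total_rows within the given time."""
    return total_rows / seconds


TOTAL_ROWS = 75_000_000  # row count quoted in the question

# Only the 1 hour window is from the thread; the others are hypothetical targets.
windows = [("1 hour", 3600), ("30 minutes", 1800), ("10 minutes", 600)]

for label, seconds in windows:
    rate = required_rows_per_second(TOTAL_ROWS, seconds)
    print(f"{label}: {rate:,.0f} rows per second")
```

The shorter the window, the more the answers to questions 3 and 4 matter, since network latency and index maintenance both cap the sustained rate a single MySQL instance can absorb.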

