We have around 75 million rows in a source Redshift table that we plan to load into a MySQL table.
Could someone please suggest a better design to load this in minutes through a Talend job? Currently it takes a long time, even after splitting the source data into chunks (for example, processing 400,000 rows at a time and looping through the remaining chunks).
I need to first update existing rows (if anything has changed in the source), otherwise insert new rows.
Both the update and insert operations are performed with SQL statements in tMysqlRow rather than tMap lookups, yet the job still takes a long time.
Any suggestions on the best approach to this update-or-insert logic across millions of rows would be very helpful.
We are using Talend Enterprise Data Integration edition 6.2.1.
You need to give us more info.
1) How fast is it at the moment?
2) How fast do you want it?
3) Are the job, Redshift and MySQL in the same environment, or does the data have to cross the internet?
4) Are indexes (in MySQL) used? Can they be switched off for the load?
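One thing worth trying regardless of the answers: collapse the separate update and insert statements into a single upsert, so the database decides per row. In MySQL that is `INSERT ... ON DUPLICATE KEY UPDATE` (it requires a PRIMARY KEY or UNIQUE index on the match column). The sketch below is a minimal illustration, not your actual job: the table and column names are made up, and it runs against SQLite's equivalent `ON CONFLICT` clause purely so it is self-contained; the MySQL form is shown in the comment.

```python
import sqlite3

# In-memory stand-in for the MySQL target table. The real table needs a
# PRIMARY KEY or UNIQUE index on id for the upsert to detect duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
conn.execute("INSERT INTO target VALUES (1, 'old', 10.0)")

rows = [(1, "new", 99.0), (2, "fresh", 5.0)]  # one existing id, one new

# SQLite upsert syntax; the MySQL equivalent would be:
#   INSERT INTO target (id, name, amount) VALUES (%s, %s, %s)
#   ON DUPLICATE KEY UPDATE name = VALUES(name), amount = VALUES(amount)
conn.executemany(
    """INSERT INTO target (id, name, amount) VALUES (?, ?, ?)
       ON CONFLICT(id) DO UPDATE SET name = excluded.name,
                                     amount = excluded.amount""",
    rows,
)
conn.commit()
print(sorted(conn.execute("SELECT * FROM target").fetchall()))
# [(1, 'new', 99.0), (2, 'fresh', 5.0)]
```

One round trip per batch instead of an update-then-insert pair per row is usually a large win; batching with `executemany` (or Talend's batch/commit-every settings) on top of it helps further.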
You say you want 75,000,000 rows to be processed in minutes. Do you realise that doing this in 1 hour (for example) would require roughly 20,833 rows per second? While that is certainly not unachievable, it would still be hard to hit in an hour with a single MySQL instance located remotely from both the Redshift box and the job.
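The required rate is just the row count divided by the time window:

```python
rows = 75_000_000
seconds_per_hour = 3600
rate = rows / seconds_per_hour  # rows per second needed to finish in one hour
print(round(rate))  # 20833
```

Any target window you pick translates the same way, so state your actual deadline and we can work backwards from it.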