• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Efficiently load and compare data

#1 2012-05-13 05:38:54

williamwang
New member
Registered: 2012-05-10
Posts: 3

Efficiently load and compare data

Hi,
This is what I want to achieve.
I load in a set of csv files and try to find out if any row exists in one of the tables (LOOKUP TABLE) in MYSQL.
Presume the lookup table has millions of rows, if I load everything using tMySQLInput, I am worried about the performance. Is that possible for me to feed the csv file entries as part of the QUERY string and let MySQL do the process which might be more efficient? If yes, how do I do that?

It will also be much appreciated if any one can suggest different ways of doing this.
Please see my solution attached.

Thank you very much.

William


Uploaded Images

Offline

#2 2012-05-14 04:32:06

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: Efficiently load and compare data

Hi William

Without considering this job logic, for large tables, you might use 'Store on disk' feature of tMap and 'Enable stream' of tMysqlInput for better performance.

Besides, I know you want to feed the csv file entries to narrow down the rows from mysql.
But it might be a little difficult and cost extra memory.
If you can give the range of ids directly, it will be fine.

Regards,
Pedro


Only Paranoid Survive.

Offline

#3 2012-05-14 04:53:51

williamwang
New member
Registered: 2012-05-10
Posts: 3

Re: Efficiently load and compare data

Thank you very much for your reply, Pedro.
When you say give a range of IDs directly, do you mean I build a SELECT QUERY dynamically?
If yes, what components should I use? I believe tMySQLInput will be required with a WHERE clause.
How can I construct or build the WHERE clause in the query as a parameter? Do I have to use Context?
Thanks.

Regards,
William

pedro wrote:

Hi William

Without considering this job logic, for large tables, you might use 'Store on disk' feature of tMap and 'Enable stream' of tMysqlInput for better performance.

Besides, I know you want to feed the csv file entries to narrow down the rows from mysql.
But it might be a little difficult and cost extra memory.
If you can give the range of ids directly, it will be fine.

Regards,
Pedro

Offline

#4 2012-05-14 04:57:53

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: Efficiently load and compare data

Hi William

I mean maybe all entries from csv files are between 1 and 1000000.
So we can edit the query of tMysqlInput.

Code:

Select * from table where id between 1 and 1000000

Regards,
Pedro


Only Paranoid Survive.

Offline

#5 2012-05-14 08:50:36

saukema
Member
Company: XSed
Registered: 2011-09-14
Posts: 127
Website

Re: Efficiently load and compare data

I remember that we once did a compare by dumping the MySQL table data to disk an make a compare on OS level

Offline

  • Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Efficiently load and compare data

Board footer

Powered by FluxBB