#1 2008-06-10 09:10:26

suzchr
Member
Registered: 2008-04-30
Posts: 147

Performance on a job

Hi,
For me, Talend is a very good solution, but I am now running performance tests and my results are disastrous.
I have attached a screenshot of my job, which describes my business rules.

My project is to replace a loader built with Access.

With my current loader, the execution time of this job is 4 minutes and 30 seconds.
With Talend, the execution time of this job is 1 hour and 28 minutes.

How can I improve the performance?

Thanks.


Uploaded Images

Last edited by suzchr (2008-06-11 11:42:31)


#2 2008-06-10 09:22:59

maverick
Member
Company: 1913
Registered: 2008-03-27
Posts: 71

Re: Performance on a job

Hi,

Some questions:
- How many rows does your job process?
- Do you call routines in your tMap?
- Are your databases local or on the network?

And a tip from experience:
Before launching the job, check the "Statistics" box to see where the data flow is slow.


Matthieu Garde
Software development
1913 - L'expertise des PME
www.1913.fr


#3 2008-06-10 09:36:38

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

My job processes 421,000 rows.
Yes, I call some routines in my two tMap components, but they are not complex.

My Oracle database is on the network and the Access database is local.
It is exactly the same configuration as with my other loader.

One more detail: in my job, the tAccessInput and the tAccessOutput use the same Access database.
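
To put the figures reported so far in perspective, here is a small back-of-the-envelope sketch in Java. It assumes the 421,000 rows mentioned above and the two run times from the first post (4 min 30 s for the Access loader, 1 h 28 min for the Talend job). The class and method names are illustrative only.

```java
public class ThroughputSketch {
    // Average rows per second for a run that processed `rows` rows in `seconds` seconds.
    static double rowsPerSecond(long rows, long seconds) {
        return (double) rows / seconds;
    }

    public static void main(String[] args) {
        long rows = 421_000;                  // row count reported in this thread
        long accessLoaderSeconds = 4 * 60 + 30; // 4 min 30 s
        long talendJobSeconds = 88 * 60;        // 1 h 28 min

        System.out.printf("Access loader: %.1f rows/s%n",
                rowsPerSecond(rows, accessLoaderSeconds));
        System.out.printf("Talend job:    %.1f rows/s%n",
                rowsPerSecond(rows, talendJobSeconds));
        System.out.printf("Slowdown:      %.1fx%n",
                (double) talendJobSeconds / accessLoaderSeconds);
    }
}
```

With these inputs the Access loader averages roughly 1,560 rows/s and the Talend job roughly 80 rows/s, a slowdown of about 20 times.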


#4 2008-06-10 09:42:45

maverick
Member
Company: 1913
Registered: 2008-03-27
Posts: 71
Website

Re: Performance on a job

Before launching the job, check the "Statistics" box to see where the data flow is slow.
With this, you will be able to see where things are slow (is it on the Oracle side or the Access side?).

Once you have checked the statistics, try separating the Access input and output into two files.




#5 2008-06-10 09:51:45

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

Showing the statistics is not very informative because the rates are identical throughout the job: about 120 rows/second.


#6 2008-06-10 09:55:32

maverick
Member
Company: 1913
Registered: 2008-03-27
Posts: 71

Re: Performance on a job

It is interesting: it shows that the data flow is slow at the entry point, your Oracle database on the network.
Could you post a screenshot of your tOracleInput component's configuration?




#7 2008-06-11 11:42:10

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

Some details:
Oracle is connected through ODBC. I tried JDBC and the performance is about the same.
Access is local: the file is on my computer.
Oracle is remote, on my network.

I added a screenshot of my settings to my first post. See the top of the thread.


#8 2008-06-11 12:51:18

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Performance on a job

Talend Open Studio generates Java or Perl code.
Neither of these languages handles Access databases natively.
The Java Access DB components communicate through an ODBC bridge.
It will never be as fast as a native connection or a real JDBC connection.

Generally speaking, input is not the problem. Writing always takes longer...
That's why changing how your tOracleInput connects (ODBC or JDBC) doesn't change the results.

You can try tweaking the Advanced settings / Autocommit value in your tAccessOutput: increase this value.
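
As a rough illustration of what a "commit every N rows" setting does, here is a minimal Java sketch. It is not the code Talend generates: `Sink` is a hypothetical stand-in for a database connection with autocommit turned off (with plain JDBC, `insert` would correspond to executing a `PreparedStatement` and `commit` to `Connection.commit()`), and `load` is an illustrative name.

```java
import java.util.Collections;
import java.util.List;

public class CommitEverySketch {
    /** Hypothetical stand-in for a database connection with autocommit turned off. */
    interface Sink {
        void insert(String row);
        void commit();
    }

    /**
     * Writes every row, committing after each group of `commitEvery` inserts
     * and once more for any leftover rows. Returns the number of commits issued.
     */
    static int load(List<String> rows, Sink sink, int commitEvery) {
        if (commitEvery < 1) {
            throw new IllegalArgumentException("commitEvery must be >= 1");
        }
        int pending = 0;
        int commits = 0;
        for (String row : rows) {
            sink.insert(row);
            pending++;
            if (pending == commitEvery) {
                sink.commit();   // flush the current group of rows
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {       // rows left over after the last full group
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> rows = Collections.nCopies(25, "row");
        Sink noOp = new Sink() {
            public void insert(String row) { }
            public void commit() { }
        };
        System.out.println(load(rows, noOp, 10) + " commits for 25 rows, commit every 10");
    }
}
```

A larger value means fewer commits for the same number of rows, which is why increasing it is the first thing to try. Whether it actually helps on a given database still has to be measured.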

HTH,


#9 2008-06-11 13:35:34

maverick
Member
Company: 1913
Registered: 2008-03-27
Posts: 71

Re: Performance on a job

For me, it's much faster to read/write a SQL database than a flat file (I can reach 40,000 rows/sec versus 3,000 for a flat file).
All my jobs use Java.




#10 2008-06-11 22:57:44

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Performance on a job

I only said that reading is faster than writing.

As for writing to a file: on my laptop, with a fairly slow hard disk, I can easily reach 75,000 rows per second.
My input file has 1,000,000 rows and 11 columns with different data types (Integer, String, and Date).
I write to a simple tFileOutputDelimited without any CSV options...
My only "special" configuration is to temporarily disable my antivirus.

Can you give me more details about your own tests?


#11 2008-06-11 23:28:22

SMaz
Member
Registered: 2008-04-21
Posts: 186

Re: Performance on a job

Maybe I'm missing something - sorry to intrude. 

However, it was mentioned earlier that one of the databases is Oracle, yet I see no tOracle components in the job. How come?


#12 2008-06-12 00:21:50

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

You don't see a tOracleInput because I use the ODBC component to connect to the Oracle data warehouse. The input component is labelled DWH in my screenshot.

For mhirt: on what type of database can you load at 75,000 rows/s? Oracle? Do you display the statistics when measuring the performance or not?

Thanks


#13 2008-06-12 00:37:20

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Performance on a job

Sorry suzchr, I get 75,000 rows/s going file to file. My message was for maverick (he is limited to only 3,000 rows per second and I don't understand why).

For databases, the best performance is obtained with bulk components (not available for Access).
Otherwise, it mainly comes down to Autocommit tweaking.

In Java you can show statistics; it doesn't affect performance much.
In Perl, it has more impact.

HTH,


#14 2008-06-12 10:01:57

maverick
Member
Company: 1913
Registered: 2008-03-27
Posts: 71

Re: Performance on a job

I checked again.
I was exaggerating with 2,000 rows/sec :s
It's 7,000 rows/sec when reading from a delimited flat file with the following specs:
- Number of rows: 7,000,000
- Number of columns: 12
- My job writes to an Excel file; if I write to a delimited file, the rate rises to 15,000 rows/sec
- HDD speed: 7200 RPM
- I have an antivirus, but I can't disable it (SBS behind it :))

But never mind, I don't have any problem with this :)




#15 2008-06-12 10:48:10

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

mhirt, I have a question for you! I see that your status is Talend team. Does that mean that you work for the Talend company?


#16 2008-06-12 10:52:39

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

I have another question about my job. To improve performance, I need to change the commit setting on the tAccessOutput. However, I don't know whether a large commit interval (every 20,000 rows, for example) or a small one (every 10 rows, for example) is better.
I use a computer with 1 GB of RAM and my process writes 422,188 rows to my Access database.


#17 2008-06-12 15:28:11

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

Does anybody know how the commit is done if I enter 0 as the "commit every" value?


#18 2008-06-13 00:41:03

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Performance on a job

suzchr,

I have a question for you! I see that your status is Talend team. Does that mean that you work for the Talend company?

Yes, I work for Talend! :-)

However, I don't know whether a large commit interval (every 20,000 rows, for example) or a small one (every 10 rows, for example) is better.

In general, a large "commit every" value is better, but it is not as simple as that.
You may get better performance with a commit every 40,000 rows than with a commit every 50,000.
You have to run tests to find the best value.

Does anybody know how the commit is done if I enter 0 as the "commit every" value?

With 0 or empty, there won't be any commit at all.
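
Since the best "commit every" value has to be found by testing, a small harness can make the comparison systematic. The following Java sketch is only one way to organize such a test: `fastest` and `runAndTimeMillis` are hypothetical names, and the timing function in `main` is a made-up placeholder, not a measurement. In a real test it would launch the job with the given commit value and return the elapsed time.

```java
import java.util.function.IntToLongFunction;

public class CommitTuningSketch {
    /**
     * Runs the supplied timing function once per candidate "commit every" value
     * and returns the candidate with the shortest measured duration.
     */
    static int fastest(int[] candidates, IntToLongFunction runAndTimeMillis) {
        if (candidates.length == 0) {
            throw new IllegalArgumentException("no candidates given");
        }
        int best = candidates[0];
        long bestMillis = Long.MAX_VALUE;
        for (int candidate : candidates) {
            long millis = runAndTimeMillis.applyAsLong(candidate);
            if (millis < bestMillis) {
                bestMillis = millis;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] candidates = {10, 1_000, 20_000, 40_000, 50_000};
        // Placeholder cost function with invented numbers, for illustration only.
        // Replace it with code that runs the load and measures it with System.nanoTime().
        IntToLongFunction placeholder = commitEvery -> Math.abs(commitEvery - 40_000);
        System.out.println("Fastest candidate: " + fastest(candidates, placeholder));
    }
}
```

Running each candidate several times and on the real data set gives more trustworthy numbers than a single run.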

HTH,


#19 2008-06-13 10:44:08

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

Thank you for all your answers!
I am benchmarking my job and will post my results afterwards.
My first impression is that with Access the most efficient setting is to commit every 1 row. It's unusual, but that's how it is in my case.


#20 2008-06-16 14:37:38

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

So I have been running my benchmark and the results are not good...

In fact I ran two types of benchmark. The first is the complete job, and the best time is obtained with a commit value of 10 on the tAccessOutput. The best time is 22 minutes, versus 9 minutes with my loader in Access.

Then I created the same job without the write to Access (I deleted the output component). The time is 4 min 40 seconds. This is very good.

Then I created a job that only writes to Access. I write 500,000 rows generated by the tRowGenerator. The best time is obtained with a commit value of 125,000. This best time is 5 minutes 39 seconds. This is also efficient.

Finally, I created two jobs: one that only extracts the data and ends with the tBufferOutput component, and another that takes the data from the first job and writes it to Access. But the performance is bad: after two hours I had written only 125,000 rows.


How can I improve my performance? Does anyone have a good idea?
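
The timings in this post can be compared directly. The short Java sketch below adds the read-only and write-only times and sets them against the complete job. It uses only the durations quoted here; the class and method names are illustrative, and the comparison is approximate because the write-only test used 500,000 generated rows rather than the real data.

```java
public class PhaseBreakdownSketch {
    // Converts a duration given as minutes and seconds to seconds.
    static long seconds(long minutes, long secs) {
        return minutes * 60 + secs;
    }

    public static void main(String[] args) {
        long readOnly = seconds(4, 40);   // job without the Access output
        long writeOnly = seconds(5, 39);  // tRowGenerator to Access, 500,000 rows
        long completeJob = seconds(22, 0); // best run of the complete job

        long sumOfPhases = readOnly + writeOnly;
        System.out.println("Read + write measured separately: " + sumOfPhases + " s");
        System.out.println("Complete job:                     " + completeJob + " s");
        System.out.println("Difference:                       " + (completeJob - sumOfPhases) + " s");
    }
}
```

Measured separately, reading and writing add up to a little over 10 minutes, less than half of the 22 minutes of the complete job, so the extra time seems to come from running both in the same job rather than from either phase alone.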


#21 2008-06-16 15:03:31

catounz
New member
Registered: 2008-05-26
Posts: 7

Re: Performance on a job

Hi suzchr,

Is there a special reason for using a tBufferOutput? Can't you load directly into Access instead of loading into the buffer first?


#22 2008-06-16 15:39:08

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

Yes, I have done that, but the best time I get is 22 minutes. I am trying to use two jobs to improve on this time.
From my tests I can say that when you only write to Access, the best commit value is 125,000 rows, but if I write to Access in the same job that reads from my data warehouse, with a commit value of 125,000 rows the time is 6 h 40 minutes...


#23 2008-06-16 15:43:29

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

To help, I have added two screenshots that show the two jobs.


Uploaded Images


#24 2008-06-16 16:01:07

GéomatiLux
Member
Registered: 2007-12-24
Posts: 19

Re: Performance on a job

Hi suzchr,

Try testing with only one tMap; it accepts multiple inputs and outputs.
You could also use a better expression key to improve performance.
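
To illustrate why the choice of key can matter, here is a minimal Java sketch of a keyed in-memory lookup. It assumes the tMap joins the main flow to a lookup input, which is not confirmed in this thread, and it is not Talend's generated code. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedLookupSketch {
    /**
     * Joins each main row (key in column 0, value in column 1) against a lookup
     * table indexed by key. Rows without a match are dropped (inner join).
     */
    static List<String> join(List<String[]> mainRows, Map<String, String> lookupByKey) {
        List<String> joined = new ArrayList<>();
        for (String[] row : mainRows) {
            String label = lookupByKey.get(row[0]); // one hash probe per incoming row
            if (label != null) {
                joined.add(row[1] + ":" + label);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = new HashMap<>();
        lookup.put("1", "X");
        lookup.put("2", "Y");
        List<String[]> mainRows = Arrays.asList(
                new String[] {"1", "a"},
                new String[] {"2", "b"},
                new String[] {"9", "c"}); // no match in the lookup
        System.out.println(join(mainRows, lookup));
    }
}
```

With a key, each incoming row costs one hash probe instead of a scan over the whole lookup table. That only helps if the job actually contains such a lookup.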

Regards


“About eighty percent of all data stored in corporate databases has a spatial component” [Franklin 1992]


#25 2008-06-16 16:12:35

suzchr
Member
Registered: 2008-04-30
Posts: 147

Re: Performance on a job

I don't have a key defined in my schema. Do you think that defining a key would improve the performance?

