I've built a simple XML processing job in version 2.2.4:
It gets the values of 7 nodes from an XML file of about 1.4MB. In version 2.2.4 it runs in no time, processing about 200 rows/sec according to the stats.
I've also tried changing the memory settings to match those of the previous version: no success either.
I downloaded the latest version and redid the job element by element... improved the process a little bit and, surprise! It takes forever to run. I checked the stats again and the performance is now 8 rows/sec... I tried modifying the settings, deleted some elements, did some quadruple checks... still the same.
Finally, I imported the 2.2.4 workspace into version 2.3.1. Same performance: 6-8 rows/sec. Back on 2.2.4, still 200 rows/sec...
So... is version 2.3.1 a stable, production version? If so, is there any XML-related library you have changed between version 2.2.4 and 2.3.0/2.3.1 that could affect performance so badly? Is there any workaround for this?
By the way, I've done the same testing with version 2.3.0 too and still get 8 rows/sec.
Thank you in advance.
Last edited by Petrutz (2008-02-26 19:13:19)
UPDATE: it's actually 100+ times worse...
I've just downloaded the latest 2.2.4 version and loaded the job that was running at 8 rows per second. I now get 807 rows per second, completing 2271 rows in 2.81 sec.
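For reference, the arithmetic behind the "100+ times" figure can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the posts above; the variable names are made up for illustration, and the estimated 2.3.x run time assumes a constant 8 rows/sec, which is a simplification.

```python
# Figures quoted in the posts above.
rows = 2271          # rows processed by the job
seconds = 2.81       # run time reported on 2.2.4
slow_rate = 8        # rows/sec reported on 2.3.0 / 2.3.1

fast_rate = rows / seconds        # throughput on 2.2.4
slowdown = fast_rate / slow_rate  # how many times slower 2.3.x is
slow_runtime = rows / slow_rate   # run time if 8 rows/sec held throughout

print(f"2.2.4 throughput: {fast_rate:.0f} rows/sec")
print(f"slowdown factor: {slowdown:.0f}x")
print(f"estimated 2.3.x run time: {slow_runtime:.0f} sec")
```

The computed rate differs by about one row/sec from the 807 shown in the stats, presumably because of rounding in the displayed run time.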

Hello Petrutz,
are you sure that tFileInputXML is the problem? I had a performance problem too yesterday; after removing component after component I found tUnite to be the "problem-maker". If I remember right, this is a known problem (for large files).
Do you have any more information for us?
Bye
Volker
Hello Volker,
I'm a Talend noob, so I'm not really sure it's because of tFileInputXML, but it looks like it...
Basically what the process does is:
1. gets a list of values from a database
2. saves the current value to a context variable (I just need to pass that variable to the tFileFetch)
3. fetches a file and saves it under a filename that includes that variable
4. parses the XML file with the XPath expressions
5. does some transformation on the numbers
6. writes the records into the database
The thing is that even with all the components before tFileInputXML disabled, the performance is just the same: 8 rows/s
Running this in version 2.2.4, it goes at 800 rows/sec.
Is there any means of detecting a bottleneck apart from enabling statistics?
I attach a screenshot of the job.
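One generic way to look for a bottleneck outside the built-in statistics is to time each stage of the pipeline separately on the same input. The sketch below is not Talend code and uses no Talend API: `StageTimer`, the stage names and the sample XML are all hypothetical, and it only illustrates the idea with Python's standard library.

```python
import time
import xml.etree.ElementTree as ET
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time and call counts per named stage."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    def run(self, stage, func, *args):
        """Call func(*args), charging the elapsed time to `stage`."""
        start = self.clock()
        result = func(*args)
        self.seconds[stage] += self.clock() - start
        self.calls[stage] += 1
        return result

    def slowest(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.seconds, key=self.seconds.get)


# Hypothetical sample input: 1000 <item> nodes holding one number each.
SAMPLE = "<items>" + "".join(
    f"<item><v>{i}</v></item>" for i in range(1000)
) + "</items>"

timer = StageTimer()
root = timer.run("parse", ET.fromstring, SAMPLE)
values = timer.run(
    "extract", lambda r: [int(e.text) for e in r.iterfind("./item/v")], root
)
total = timer.run("transform", sum, values)

for stage, secs in timer.seconds.items():
    print(f"{stage}: {secs:.6f} sec over {timer.calls[stage]} call(s)")
print("slowest stage:", timer.slowest())
```

Running the same stages against two versions of a tool, on the same file, shows which stage accounts for the difference rather than only the overall rows/sec.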
Thank you very much!

Hello,
I agree with you: tFileInputXML is slower than it was.
I do not see the same performance gap as you (only x2), but I have opened [Bugtracker, bug 3205, fixed] Enhance tFileInputXML speed.
We are going to work on this problem as soon as possible.
Thanks for your support,
mhirt wrote:
Hello,
I agree with you: tFileInputXML is slower than it was.
I do not see the same performance gap as you (only x2), but I have opened [Bugtracker, bug 3205, fixed] Enhance tFileInputXML speed.
We are going to work on this problem as soon as possible.
Thanks for your support,
I have done some more testing in the last few hours and there are some interesting observations:
- Performance degradation seems to be proportional to the number of nodes processed. It starts processing at 200 rows/sec, drops abruptly to around 50 rows/sec, then slowly degrades to 8 rows/sec.
- The performance difference seems to depend on the structure of the file: for similarly sized XML files (around 10MB) with different numbers of levels/nodes it performs very differently, 50 rows/sec in one case and 8 rows/sec in another.
Exactly which XML libraries are you using in each of the releases? Could you also specify the version numbers, please?
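To put numbers on this kind of degradation curve, throughput can be sampled per window of rows instead of as a single average. The following is a minimal sketch, not part of Talend: `windowed_rates` is a hypothetical helper, and the `range` used in the demo merely stands in for whatever iterator actually produces the rows.

```python
import time


def windowed_rates(rows, window, clock=time.perf_counter):
    """Yield (rows_seen, rows_per_sec) once per `window` rows consumed.

    The time measured for each window includes the time the `rows`
    iterator itself needs to produce those rows.
    """
    start = clock()
    seen = 0
    for _ in rows:
        seen += 1
        if seen % window == 0:
            now = clock()
            elapsed = now - start
            rate = window / elapsed if elapsed > 0 else float("inf")
            yield seen, rate
            start = now


# Demo with a stand-in row source; replace with the real row iterator.
for seen, rate in windowed_rates(iter(range(10000)), 2500):
    print(f"after {seen} rows: {rate:.0f} rows/sec")
```

A rate that falls steadily from window to window, as described above, would point to per-row cost growing with the number of rows already processed, while a flat but low rate would suggest a constant per-row overhead.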
Thank you!

I see the same effect with another job and a CSV file: it starts with a high throughput and then slows down with every row. I haven't found the time to analyze the problem in detail yet. I'll give feedback when I have a result.

Hi everyone,
I'm using a tFileInputXML and notice that performance decreases too when there are many rows in my XML file.
...starting at 116 rows/s, it decreases to around 5 rows/s.
Any solution?