We are getting incredibly poor performance when using the append option on the tAdvancedOutputXML component. The first XML is created from a CSV source, has one loop element called Race, and looks like the first sample below. The second source is a CSV that has the ID field and another loop element. (We can't use MSXML because it appends all loop elements to the bottom, and we need Race in the middle.)

The problem is that the append runs at about .01 rows per second, and we really need it to run much faster. Any ideas? We have tried temp directories, increasing the buffer, etc. Right now we are running 32-bit XP with 4 GB of memory, but we will soon move to 64-bit with 8 GB.

We also need to open a bug on 4.2.3, where the append doesn't appear to work: in 4.0.3 and 4.2.3 it appends to the top of Student, under id, instead of the bottom, whereas in 4.2.2 it works correctly.
<Student id="SID_71346">
  <StudentUniqueStateId>71346</StudentUniqueStateId>
  <StudentId>764750431</StudentId>
  <LocalId>71346</LocalId>
  <Name>
    <FirstName>Marguerite</FirstName>
    <LastSurname>Lindline</LastSurname>
  </Name>
  <Race>
    <RacialCategory>White</RacialCategory>
    <RacialCategory>Hispanic</RacialCategory>
  </Race>
  <BirthData>
    <BirthDate>2004-03-15</BirthDate>
    <BirthCity>Trenton</BirthCity>
    <BirthCountry>United States</BirthCountry>
    <BirthState>TX</BirthState>
  </BirthData>
  <LimitedEnglishProficiency>Limited</LimitedEnglishProficiency>
The second XML to append looks like:
<Student id="SID_71346">
  <Languages>
    <Language>Spanish</Language>
    <Language>English</Language>
  </Languages>
</Student>
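To make the expected result concrete, here is a minimal sketch in plain Java (JDK DOM API only) of the behaviour we want from the append: the loop element from the second XML ends up at the bottom of the matching Student. It is not how tAdvancedOutputXML works internally. The class and method names are made up for illustration, and it assumes each input holds a single Student root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Talend.
final class StudentAppendSketch {
    private StudentAppendSketch() {}

    // Appends every child element of the second Student to the END of the first.
    // Assumption: both inputs are single-Student documents with the same id.
    static String appendChildren(String baseXml, String extraXml) {
        Document base = parse(baseXml);
        Document extra = parse(extraXml);
        Element target = base.getDocumentElement();
        Element source = extra.getDocumentElement();
        if (!target.getAttribute("id").equals(source.getAttribute("id"))) {
            throw new IllegalArgumentException("Student ids differ");
        }
        for (Node n = source.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                // importNode copies the node into the target document.
                target.appendChild(base.importNode(n, true));
            }
        }
        return serialize(base);
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }
}
```

With the two samples above as input, `StudentAppendSketch.appendChildren(first, second)` would put Languages after LimitedEnglishProficiency, which is what 4.2.2 does.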
Hi
Thanks for your first post on the forum!
A rate of .01 rows per second is unacceptable; I suspect there is a design problem in the job. Can you upload some screenshots of the job so that we can see more details?
Best regards
Shong
Hi
It is a big job, with many components and many columns in the schema, and you have added the following expression to each column, which apparently cuts down performance:
(Disability_CSV_In.DISABILITY == null || Disability_CSV_In.DISABILITY.equals("")) ? null : Disability_CSV_In.DISABILITY
Do you really need the expression on each column?
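If the check does have to stay, one option is to move it into a user routine so each tMap column calls a single method instead of repeating the ternary. This is only a sketch: the class and method names are hypothetical, and whether it changes the run time at all is untested. It mainly keeps the expressions shorter and easier to maintain.

```java
// Hypothetical user routine; the names are illustrative only.
final class EmptyToNull {
    private EmptyToNull() {}

    // Returns null for null or empty input, otherwise the value unchanged.
    // Same logic as the per-column ternary expression above.
    static String emptyToNull(String value) {
        return (value == null || value.isEmpty()) ? null : value;
    }
}
```

Under that assumption, the column expression would become something like EmptyToNull.emptyToNull(Disability_CSV_In.DISABILITY).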
If you have enough memory available, go to Window --> Preferences --> Talend --> Run/Debug and allocate more memory for executing the job.
Best regards
Shong
From this image you can see that we do not want to create an empty element. It appears that the way Talend (or Java) handles that check box is that if the field is null the element doesn't print, but if the field is an empty string the unwanted tag does print, so you have to account for that in the tMap. Also, the expression logic is actually in the first, non-append XML creation, which runs fast. It's only the append stream that runs slow, and it has only one loop element tied to the student group and no expressions in the tMap.

It sounds like memory is the issue, although I have tried changing that without success. The PC has 4 GB, but bumping the run-time memory causes Talend to crash. Our workaround for now will be to go back to using the merge (xmlms..), even though that throws all loop elements to the bottom, so we have developed an "xml re-arranger" routine to move the loop elements to the correct location. The other possible solution is moving to a 64-bit architecture with 8 GB of memory. Let me know if anyone has other possible solutions.
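For anyone curious what a re-arranger along those lines could look like, here is a rough sketch using only the JDK DOM API. It is not our actual routine: the class and method names are invented for this post, and it assumes one Student per document with the loop element and its target neighbour as direct children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of an "xml re-arranger"; not the routine from the job.
final class LoopRearrangerSketch {
    private LoopRearrangerSketch() {}

    // Moves the direct child named loopName so it sits just before anchorName.
    static String moveBefore(String xml, String loopName, String anchorName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            Element loop = directChild(root, loopName);
            Element anchor = directChild(root, anchorName);
            if (loop != null && anchor != null) {
                // insertBefore detaches a node that is already in the tree,
                // so this is a move rather than a copy.
                root.insertBefore(loop, anchor);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rearrange XML", e);
        }
    }

    // Finds the first direct child element with the given tag name, or null.
    static Element directChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

For the Student sample, LoopRearrangerSketch.moveBefore(xml, "Race", "BirthData") would pull Race up from the bottom to sit between Name and BirthData.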
Thanks,
Jay