Topic review (newest first)

cahsohtoa
2008-03-05 16:50:29

Are your data sorted?

pegaz
2008-03-05 16:45:05

Hi everyone,
I'm trying to compare two XML files that have the same structure. How can I show the differences between the two files? Any solutions?
Best regards,

Pegaz

javydreamercsw
2008-03-05 14:49:30

I don't know how Talend gets the metadata, but it looks like it runs a select * from table and then reads the metadata from the result, when a single record would be enough.

That's the only reason I can see for the hellish wait time, since this also happens with databases. For delimited files I suggest using a sample file instead, since I don't think it is possible to fetch just one row for this purpose.

bubbles
2008-03-05 14:36:42

The file contains 99,800 records, not fields, one of which is a header record. Each line contains around 100 fields.

It is a Java project.

So is it typical to use a subset of the file in order to get through the metadata setup?

Volker Brehm
2008-03-05 07:26:38

Hello bubbles,

I sometimes have problems with the execution time of "guess schema" in the Metadata section too (primarily with XML). As a workaround, I reduce the file to a limited number of lines myself and process those. In job execution I use the full file (without problems).
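For anyone who wants to script the "reduce the file" step rather than trimming it by hand, here is a minimal sketch in Java. It is not part of Talend: the class name SampleFileMaker and the method writeSample are made up for illustration, and it assumes the file is UTF-8 text with the header on the first line (so the header ends up in the sample).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Talend: copies the first maxLines lines
// of a text file into a smaller sample file for the metadata wizard.
class SampleFileMaker {

    // Streams the source line by line, so the full file is never held in memory.
    // Returns the number of lines actually written.
    // Assumption: the file is UTF-8; change the charset if yours differs.
    static int writeSample(Path source, Path sample, int maxLines) throws IOException {
        int written = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(sample, StandardCharsets.UTF_8)) {
            String line;
            while (written < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SampleFileMaker <source> <sample> [maxLines]");
            return;
        }
        // Default of 100 lines is only an example value.
        int maxLines = args.length > 2 ? Integer.parseInt(args[2]) : 100;
        int n = writeSample(Paths.get(args[0]), Paths.get(args[1]), maxLines);
        System.out.println("Wrote " + n + " lines to " + args[1]);
    }
}
```

Point the metadata wizard at the sample file, then switch the job back to the full file for execution.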

In fairness, I have to say that opening my (large) files in Notepad or UltraEdit also makes them crash in some cases.

Bye
Volker

plegall
2008-03-04 23:42:08

Can you confirm that you have 99,800 fields on each line of your tab-delimited file? That really sounds like a lot, and I don't remember any test with such a huge schema.

Reading your previous topics, I think you're using a Java project. It's a good idea to mention that when you ask a question. On this kind of operation, Perl and Java won't behave the same.

bubbles
2008-03-04 23:06:32

Hi,

I went through the process of creating a Metadata -> File Delimited schema.

The file I selected is a tab-delimited 52.9 MB file and contains just under 99,800 records/lines. When I click Next on step 3 of 4, Talend just sits there and I eventually have to kill the process. If, however, I limit the number of lines to 100, I do get to step 4 of 4.

Is there a limit on the size of the file I can select to create a schema from?

Thx
bubbles..
