• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Metadata -> File Delimited - Problem with large files?

#1 2008-03-04 23:06:32

bubbles
Member
Registered: 2008-01-22
Posts: 12

Metadata -> File Delimited - Problem with large files?

Hi,

I went through the process of creating a Metadata -> File Delimited schema.

The file I selected is a tab-delimited 52.9 MB file and contains just under 99,800 records/lines. When I click Next on Step 3 of 4, Talend just sits there and I eventually have to kill the process. If, however, I limit the number of lines to 100, I do get to step 4 of 4.

Is there a limit on the size of the file I can select to create a schema from?

Thx
bubbles..

Offline

#2 2008-03-04 23:42:08

plegall
Member
Registered: 2006-09-19
Posts: 1586
Website

Re: Metadata -> File Delimited - Problem with large files?

Can you confirm that you have 99,800 fields on each line of your tab-delimited file? That sounds like a lot, and I don't remember any test with such a huge schema.

Reading your previous topics, I think you're using a Java project. It's a good idea to mention that when you ask a question: on this kind of operation, Perl and Java won't behave the same.

Offline

#3 2008-03-05 07:26:38

Volker Brehm
Member
Registered: 2007-04-03
Posts: 1139
Website

Re: Metadata -> File Delimited - Problem with large files?

Hello bubbles,

I sometimes have problems with the execution time of a "guess schema" in the Metadata section too (primarily with XML). As a workaround, I reduce the file to a limited number of lines myself and process those. In job execution I use the full file (without problems).

In fairness, I have to say that opening my (large) files in Notepad or UltraEdit sometimes crashes those editors too.
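
For anyone who wants to script the "reduce the file yourself" workaround, here is a minimal sketch in plain Java. It is not part of Talend; the class name SampleFile, the method copyFirstLines and the file names in the usage line are made up for illustration, and it assumes the file is UTF-8 text with one record per line (header first).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a Talend API: writes the first N lines of a
// large delimited file to a small sample file for the metadata wizard.
public class SampleFile {

    /**
     * Copies at most maxLines lines from source to target.
     * Assumes UTF-8 text; returns the number of lines actually copied.
     */
    public static int copyFirstLines(Path source, Path target, int maxLines) throws IOException {
        int copied = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            // Stop as soon as enough lines are copied, so the rest of the
            // large file is never read.
            while (copied < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: java SampleFile <source> <target> <maxLines>");
            return;
        }
        int n = copyFirstLines(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Copied " + n + " lines");
    }
}
```

Something like `java SampleFile full.tsv sample.tsv 100` would give a 100-line file (header included) to point the wizard at, while the job itself still reads the full file.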

Bye
Volker

Offline

#4 2008-03-05 14:36:42

bubbles
Member
Registered: 2008-01-22
Posts: 12

Re: Metadata -> File Delimited - Problem with large files?

The file contains 99,800 records not fields, one of which is a header record. The individual lines contain around 100 fields.

It is a Java project.

So is it typical to use a subset of the file in order to get through the metadata setup?

Offline

#5 2008-03-05 14:49:30

javydreamercsw
Member
Registered: 2007-08-09
Posts: 83

Re: Metadata -> File Delimited - Problem with large files?

I don't know how Talend gets the metadata info, but it looks like it is running the equivalent of a select * from table and only then extracting the metadata, when it only needs one record.

That's the only reason I can see for the hellish wait time, as this also happens with databases. For delimited files I suggest using a sample file instead, since I don't think it's possible to get just one row for this purpose.

Offline

#6 2008-03-05 16:45:05

pegaz
Member
Registered: 2008-03-05
Posts: 34

Re: Metadata -> File Delimited - Problem with large files?

Hi everyone,
I'm trying to compare two XML files that have the same structure. My question is: how can I show the differences between the two XML files? Any solutions?
Best regards.

Pegaz

Offline

#7 2008-03-05 16:50:29

cahsohtoa
Member
Company: AEFE
Registered: 2008-02-19
Posts: 261
Website

Re: Metadata -> File Delimited - Problem with large files?

Are your data sorted?

Offline
