#1 2008-02-20 23:22:02

mzk
New member
Registered: 2008-02-20
Posts: 8

Filter delimited file

New to TOS (Talend Open Studio), but I love it already!

I'm having a little trouble with a delimited file - example data:

Org Id,f1,f2
ORG001,abc,3
ORG001,ghi,5
ORG001,mno,7
     ,(bla),(bla)
Org Id,f1,f2
ORG002,abc,8
ORG002,ghi,9
ORG002,mno,4
     ,(bla),(bla)

I only want to process rows starting with "ORG" (not "Org" or anything else).

The schema looks like OrgId:String, F1:String, F2:Int, ...

Obviously, applying the schema to rows not starting with "ORG" results in a type conversion error!
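To make the failure concrete, here is a small illustration outside TOS, in Python rather than Talend-generated code. `apply_schema` is a hypothetical helper that models only the three sample columns: it converts the third field to an integer and records which lines fail.

```python
# Illustration only (not Talend code): apply OrgId:String, F1:String, F2:Int
# to each sample line and record which lines fail to convert.
SAMPLE_LINES = [
    "Org Id,f1,f2",
    "ORG001,abc,3",
    "ORG001,ghi,5",
    "ORG001,mno,7",
    "     ,(bla),(bla)",
]

def apply_schema(line):
    """Hypothetical helper: split on commas and convert the third field to int."""
    org_id, f1, f2 = line.split(",")
    return org_id, f1, int(f2)  # int("f2") or int("(bla)") raises ValueError

failed = []
for line in SAMPLE_LINES:
    try:
        apply_schema(line)
    except ValueError as err:
        failed.append((line, str(err)))

for line, err in failed:
    print(f"{line!r}: {err}")
```

Only the header line and the "(bla)" line end up in `failed`. Every row that starts with "ORG" converts cleanly.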

The question is: can I use a component to filter the rows BEFORE TOS applies the schema, WITHOUT writing a temporary file first? (I can do that, but these are large files.)

Any help would be much appreciated.


#2 2008-02-21 03:46:32

shong
Talend team
Registered: 2007-08-29
Posts: 10294

Re: Filter delimited file

Hi

Use a tFilterRow component to filter the rows.
Source file:
Org Id,f1,f2
ORG001,abc,3
ORG001,ghi,5
ORG001,mno,7
     ,(bla),(bla)
Org Id,f1,f2
ORG002,abc,8
ORG002,ghi,9
ORG002,mno,4
     ,(bla),(bla)

Result:

Starting job test at 10:43 21/02/2008.
.------+---+--.
|  tLogRow_3  |
|=-----+---+-=|
|org_id|f1 |f2|
|=-----+---+-=|
|ORG001|abc|3 |
|ORG001|ghi|5 |
|ORG001|mno|7 |
|ORG002|abc|8 |
|ORG002|ghi|9 |
|ORG002|mno|4 |
'------+---+--'

Job test ended at 10:43 21/02/2008. [exit code=0]
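Here is a minimal Python sketch of the filter-then-convert idea, outside TOS. It assumes, which the post above does not state explicitly, that every column is first read as a plain string so the header and "(bla)" rows cannot fail. Only f2 is then converted explicitly. The function names are hypothetical.

```python
import csv
import io

SAMPLE = """Org Id,f1,f2
ORG001,abc,3
ORG001,ghi,5
ORG001,mno,7
     ,(bla),(bla)
Org Id,f1,f2
ORG002,abc,8
ORG002,ghi,9
ORG002,mno,4
     ,(bla),(bla)
"""

def read_as_strings(text):
    # Step 1: every field stays a string, so no row can fail a conversion yet.
    return csv.reader(io.StringIO(text))

def keep_org_rows(rows):
    # Step 2 (the filter): keep rows whose first field starts with "ORG" (case-sensitive).
    return (row for row in rows if row and row[0].startswith("ORG"))

def convert(rows):
    # Step 3: manual type conversion, now safe because only data rows remain.
    return [(org_id, f1, int(f2)) for org_id, f1, f2 in rows]

for row in convert(keep_org_rows(read_as_strings(SAMPLE))):
    print(row)
```

Because the comparison is case-sensitive, the "Org Id" header rows are dropped along with the "(bla)" rows.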

Best regards

          shong



#3 2008-02-21 07:42:42

Volker Brehm
Member
Registered: 2007-04-03
Posts: 1139

Re: Filter delimited file

Hello 'mzk',

Welcome to TOS.

Just to show you that there are several ways to reach your goal:

Alternatively, you can use tFileInputRegex. With this component (and a little regex know-how) you can check the format of each line of your file directly. Lines that do not match the regular expression are ignored.
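The same idea can be sketched with Python's `re` module. This is not the tFileInputRegex component itself. The pattern is a hypothetical one written for the three sample columns, and any line that does not match it is skipped.

```python
import re

# Hypothetical pattern for the three sample columns: "ORG" plus digits, a text
# field without commas, and an integer. The ^...$ anchors require the whole
# line to match, so header and "(bla)" lines are rejected.
LINE_RE = re.compile(r"^(ORG\d+),([^,]*),(\d+)$")

def parse_with_regex(lines):
    rows = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if match is None:
            continue  # skip non-matching lines, as described for tFileInputRegex
        org_id, f1, f2 = match.groups()
        rows.append((org_id, f1, int(f2)))  # \d+ guarantees int() succeeds
    return rows

sample = [
    "Org Id,f1,f2",
    "ORG001,abc,3",
    "     ,(bla),(bla)",
    "ORG002,ghi,9",
]
print(parse_with_regex(sample))
```

Each additional column needs one more group in the pattern, so the expression grows quickly with wide files.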

Bye
Volker



#4 2008-02-21 09:36:18

mzk
New member
Registered: 2008-02-20
Posts: 8

Re: Filter delimited file

Thank you both for your responses - both methods have their pros/cons.

In my original post I oversimplified the example data: there are actually about 50 columns after the "ORGnnn" start field, so either the manual type conversions or the regex would be a pain (and, I guess, not maintainable using metadata?).

Is there an equivalent of a fileDelimited component that works downstream of the start component, i.e. one that applies a schema to each row (string) passed to it?

I imagined something like:

[text line reader] -> [filter (startswith "ORG")] -> [rowDelimited] -> [tMap] -> etc

Is this possible?
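Outside TOS, the pipeline above can be sketched as a chain of Python generators. Only one line is held in memory at a time, and no temporary file is needed. The stage names follow the diagram. `SCHEMA` is a hypothetical stand-in for the repository schema, written as one (name, converter) pair per column.

```python
import csv
from typing import Iterable, Iterator

# Hypothetical stand-in for the repository schema: one (name, converter) pair
# per column. With ~50 columns only this list grows; the code below does not.
SCHEMA = [
    ("org_id", str),
    ("f1", str),
    ("f2", int),
]

def text_lines(source: Iterable[str]) -> Iterator[str]:
    """[text line reader]: yield raw lines one at a time, without line endings."""
    for line in source:
        yield line.rstrip("\r\n")

def starts_with_org(lines: Iterable[str]) -> Iterator[str]:
    """[filter (startswith "ORG")]: case-sensitive, so "Org Id" headers are dropped."""
    return (line for line in lines if line.startswith("ORG"))

def row_delimited(lines: Iterable[str], schema, delimiter: str = ",") -> Iterator[dict]:
    """[rowDelimited]: split each surviving line and apply the schema's types."""
    for fields in csv.reader(lines, delimiter=delimiter):
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}

sample = [
    "Org Id,f1,f2\n",
    "ORG001,abc,3\n",
    "     ,(bla),(bla)\n",
    "ORG002,mno,4\n",
]
for record in row_delimited(starts_with_org(text_lines(sample)), SCHEMA):
    print(record)  # the downstream step ([tMap] in the diagram) would go here
```

To process a real file, pass an open file object instead of `sample`, e.g. `open(path, newline="")`, where `path` is a placeholder. The file is still consumed line by line.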


#5 2008-02-21 11:49:18

Volker Brehm
Member
Registered: 2007-04-03
Posts: 1139

Re: Filter delimited file

If I understand you correctly, shong's version suits your needs best. Or do the rows have different numbers of attributes?


#6 2008-02-21 13:40:29

mzk
New member
Registered: 2008-02-20
Posts: 8

Re: Filter delimited file

The attributes are the same for each "ORGnnn" row - I have defined a schema in the repository for this and I wanted to use that rather than hard-code the type conversion functions.

Looks like the cleanest solution is to filter the lines first into a temporary file and then use the standard delimited reader with the repository schema.
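For comparison, the two-pass temporary-file approach can be sketched in Python as follows. The function names are hypothetical. The first pass streams the source file line by line, so memory use stays small even for large inputs. The cost is an extra write and read of the filtered data on disk.

```python
import csv
import os
import tempfile

def filter_to_temp(src_path, prefix="ORG"):
    """Pass 1: copy only lines starting with `prefix` into a new temporary file."""
    fd, tmp_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as out, open(src_path, newline="") as src:
        for line in src:  # streamed line by line; the whole file is never loaded
            if line.startswith(prefix):
                out.write(line)
    return tmp_path

def read_typed(path):
    """Pass 2: an ordinary delimited read of the clean file, converting f2 to int."""
    with open(path, newline="") as f:
        for org_id, f1, f2 in csv.reader(f):
            yield org_id, f1, int(f2)
```

The caller is responsible for deleting the temporary file (for example with `os.remove`) once the second pass has been fully consumed.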

Many thanks for your help - I've learned two useful TOS methods!
