New to TOS, but I love it already!
I'm having a little trouble with a delimited file - example data:
I only want to process rows starting "ORG" (not "Org" or anything else)
The schema is like OrgId:String,F1:String,F2:Int,...
Obviously, applying the schema to rows not starting with "ORG" results in a type conversion error!
The question is: can I use a component to filter the rows BEFORE TOS applies the schema, WITHOUT writing a temporary file first? (I can do that, but these are large files.)
Any help would be much appreciated
Use a tFilterRow component to filter the rows.
Starting job test at 10:43 21/02/2008.
.------+---+--.
|  tLogRow_3  |
|=-----+---+-=|
|org_id|f1 |f2|
|=-----+---+-=|
|ORG001|abc|3 |
|ORG001|ghi|5 |
|ORG001|mno|7 |
|ORG002|abc|8 |
|ORG002|ghi|9 |
|ORG002|mno|4 |
'------+---+--'
Job test ended at 10:43 21/02/2008. [exit code=0]
Welcome to TOS.
Just to show you that there are several ways to reach your goal:
Alternatively, you can use tFileInputRegex. With this component (and a little regex know-how) you can check the format of each line directly as the file is read. Lines that do not match the regular expression are ignored.
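To make the regex idea concrete, here is a minimal plain-Java sketch of the kind of expression you might use. It is not the tFileInputRegex component itself. The semicolon delimiter, the three-column layout (org id, text, integer) and the names OrgRegexFilter, ORG_ROW and matchingRows are all assumptions made for illustration, since the real example data is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrgRegexFilter {
    // Assumed layout: ORGnnn;text;integer (delimiter and column count are guesses).
    // The match is case-sensitive, so "Org001" does not match.
    static final Pattern ORG_ROW = Pattern.compile("^(ORG\\d+);([^;]*);(\\d+)$");

    // Returns the captured fields of every line that matches the whole pattern;
    // lines that do not match are silently skipped.
    static List<String[]> matchingRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ORG_ROW.matcher(line);
            if (m.matches()) {
                rows.add(new String[] { m.group(1), m.group(2), m.group(3) });
            }
        }
        return rows;
    }
}
```

With about 50 columns the pattern needs one capture group per column, which is where this approach may become hard to maintain.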
Thank you both for your responses - both methods have their pros/cons.
In my original post I over-simplified the example data. There are actually about 50 columns beyond the "ORGnnn" start field, so either the manual type conversions or a regex may be a pain (and, I guess, not maintainable using metadata?)
Is there an equivalent to a fileDelimited component that works downstream from the start component - i.e. applies a schema to each row (string) that's passed to it?
I imagined something like:
[text line reader] -> [filter (startswith "ORG")] -> [rowDelimited] -> [tMap] -> etc
Is this possible?
The attributes are the same for each "ORGnnn" row - I have defined a schema in the repository for this and I wanted to use that rather than hard-code the type conversion functions.
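For what it's worth, the flow imagined above (line reader, then a starts-with filter, then a delimited parse) can be sketched in plain Java as follows. This is only an illustration of the logic, not an existing TOS component. The semicolon delimiter, the three-column OrgRow type and the method names are hypothetical, and the type conversion is hard-coded here rather than driven by a repository schema.

```java
import java.io.BufferedReader;
import java.util.List;
import java.util.stream.Collectors;

public class OrgRowPipeline {
    /** Hypothetical typed row covering only the first three schema columns. */
    static final class OrgRow {
        final String orgId;
        final String f1;
        final int f2;

        OrgRow(String orgId, String f1, int f2) {
            this.orgId = orgId;
            this.f1 = f1;
            this.f2 = f2;
        }
    }

    // Splits one already-filtered line and converts the fields to their types.
    // Assumes ';' as the delimiter; -1 keeps trailing empty fields.
    static OrgRow parse(String line) {
        String[] parts = line.split(";", -1);
        return new OrgRow(parts[0], parts[1], Integer.parseInt(parts[2].trim()));
    }

    // Reads lines lazily, keeps only those starting with "ORG" (case-sensitive),
    // and applies the conversion afterwards, so no temporary file is needed.
    static List<OrgRow> readOrgRows(BufferedReader reader) {
        return reader.lines()
                .filter(line -> line.startsWith("ORG"))
                .map(OrgRowPipeline::parse)
                .collect(Collectors.toList());
    }
}
```

Because the filter runs before the conversion, rows such as "Org002;def;oops" never reach Integer.parseInt and so cannot cause a type conversion error.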
Looks like the cleanest solution is to filter the lines first into a temporary file and then use the standard delimited reader with the repository schema.
Many thanks for your help - I've learned two useful TOS methods!