• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Check dataquality of FilePositional with header/footer row

#1 2009-03-15 19:56:33

Snyder
New member
Registered: 2009-03-14
Posts: 7

Check dataquality of FilePositional with header/footer row

Hi,

I have a positional file with a header row and a footer row that describe the real data (row count, total amount, ...).
How can I validate the file data against the header/footer information before processing continues?

To enable validation I have filtered the rows by type (header, footer and data). I also run tAggregateRow on the data rows to determine the row count etc.

But I have no idea how to proceed from there. Can you help me?

BR,
Sebastian


#2 2009-03-15 23:17:23

Volker Brehm
Member
Registered: 2007-04-03
Posts: 1139
Website

Re: Check dataquality of FilePositional with header/footer row

Hi Sebastian,

first, I think you need to decide whether you want to parse the file, check it and then run the processing (parsing the file a second time), or do everything in one step.

The design of your job will mainly depend on your data and on which values you want to calculate. For example, if it is only the number of rows, you could count them in a context variable and check that count against your footer row at the end.
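
To make the row-count idea concrete, here is a minimal sketch in plain Java, outside Talend. It assumes, as in the sample file posted later in this thread, that data lines start with 'D'. The class and method names are hypothetical. In a job the counter would more likely be a context variable incremented in a tJavaRow.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talend.
final class RowCountCheck {

    // Counts the lines whose first character is 'D' (assumed data-record marker).
    static int countDataRows(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith("D")) {
                count++;
            }
        }
        return count;
    }

    // Compares the counted data rows with the count announced in the footer.
    static boolean matchesFooter(List<String> lines, int footerCount) {
        return countDataRows(lines) == footerCount;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hheader", "Dfirst", "Dsecond", "Ffooter");
        System.out.println("row count matches footer: " + matchesFooter(lines, 2));
    }
}
```

The footer count is passed in as an already parsed int here; extracting it from the footer line depends on the file layout.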

Bye
Volker


#3 2009-03-15 23:36:49

Snyder
New member
Registered: 2009-03-14
Posts: 7

Re: Check dataquality of FilePositional with header/footer row

Hi Volker,

And how do I check them? That is the biggest problem for me.

BR
Sebastian


#4 2009-03-16 03:02:01

shong
Talend team
Registered: 2007-08-29
Posts: 10308
Website

Re: Check dataquality of FilePositional with header/footer row

Hello Sebastian

Here is a way to get the header/footer line:

Header:
tFileInputPositional ---> tSampleRow (lets you choose a list of line numbers and/or ranges; set the range to "1" to get the first line) ---> tJavaRow (do your filtering/validation/processing to get the expected data, then store it in context variables)

Footer:
tFileRowCount (to get the total number of lines; assume there are 8 lines in your file)
|
OnSubjobOk
|
tJava (store the line count in a context variable, e.g. context.lineNumber (String))
|
OnSubjobOk
|
tFileInputPositional ---> tSampleRow (set the range to context.lineNumber to get the last line) ---> tJavaRow (do your filtering/validation/processing to get the expected data, then store it in context variables)

If you still have problems, please show us an example of your file and your expected result.
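
As a rough illustration of what the two tSampleRow branches select, the following plain-Java sketch picks the first and the last line from a list of lines. It is not Talend code, and the class name is hypothetical. In the job itself the line count would come from tFileRowCount (as far as I know it is exposed through globalMap, but please check the component documentation).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that mimics "range = 1" and "range = total line count".
final class HeaderFooterLines {

    // First line of the file, i.e. the header row.
    static String header(List<String> lines) {
        requireNonEmpty(lines);
        return lines.get(0);
    }

    // Last line of the file, i.e. the footer row.
    static String footer(List<String> lines) {
        requireNonEmpty(lines);
        return lines.get(lines.size() - 1);
    }

    private static void requireNonEmpty(List<String> lines) {
        if (lines == null || lines.isEmpty()) {
            throw new IllegalArgumentException("file has no lines");
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hheader", "Dfirst", "Dsecond", "Ffooter");
        System.out.println("header: " + header(lines));
        System.out.println("footer: " + footer(lines));
    }
}
```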

Best regards

         shong


Email:shong@talend.com
Choose Talend, Enjoy Talend!
New & Event: Talend Help Center
Talend-->the leader of open source data management and application integration solutions!


#5 2009-03-17 14:11:13

Snyder
New member
Registered: 2009-03-14
Posts: 7

Re: Check dataquality of FilePositional with header/footer row

Hi,

thank you for your detailed instructions.
But how do I compare the two context values and continue processing only if the check succeeds?

Regards
Sebastian


#6 2009-03-17 14:19:41

shong
Talend team
Registered: 2007-08-29
Posts: 10308
Website

Re: Check dataquality of FilePositional with header/footer row

Hello Sebastian

Can you show us an example of your file and your expected result?

Best regards

         shong



#7 2009-03-17 14:29:56

Snyder
New member
Registered: 2009-03-14
Posts: 7

Re: Check dataquality of FilePositional with header/footer row

Hi,

example file:

H20090317123000
Dabcd0000120.50
Dwxyz0000099.99
F000000200000000000220.49

Explanation of rows
H -> header line with timestamp 2009-03-17 12:30:00
D -> data record with description and amount; description: abcd, amount: 0000120.50 -> 120.50 €
F -> footer line with the number of data records and the total amount; 0000002 -> 2 rows (D), 00000000000220.49 -> total 220.49
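
For reference, the layout described above could be parsed like this in plain Java. This is only a sketch: the field positions (4-character description, 7-digit row count, remainder as amount) are inferred from the sample lines, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical parser for the H/D/F layout; offsets are inferred from the sample file.
final class PositionalRecords {

    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // "H20090317123000" -> timestamp after the type character
    static LocalDateTime headerTimestamp(String line) {
        return LocalDateTime.parse(line.substring(1), TIMESTAMP);
    }

    // "Dabcd0000120.50" -> "abcd" (assumed 4 characters)
    static String dataDescription(String line) {
        return line.substring(1, 5);
    }

    // "Dabcd0000120.50" -> 120.50 (BigDecimal accepts the leading zeros)
    static BigDecimal dataAmount(String line) {
        return new BigDecimal(line.substring(5));
    }

    // Footer: 7-digit count of data records right after the 'F'
    static int footerCount(String line) {
        return Integer.parseInt(line.substring(1, 8));
    }

    // Footer: everything after the count is the total amount
    static BigDecimal footerAmount(String line) {
        return new BigDecimal(line.substring(8));
    }

    public static void main(String[] args) {
        System.out.println(headerTimestamp("H20090317123000"));
        System.out.println(dataDescription("Dabcd0000120.50") + " " + dataAmount("Dabcd0000120.50"));
        System.out.println(footerCount("F000000200000000000220.49") + " rows");
    }
}
```

BigDecimal is used for the amounts so that 120.50 + 99.99 adds up to exactly 220.49, which is not guaranteed with double.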


I want to check the data records against the information in the footer line and do the real processing (database etc.) only if the data quality is OK.


Regards
Sebastian


#8 2009-03-17 21:34:03

Volker Brehm
Member
Registered: 2007-04-03
Posts: 1139
Website

Re: Check dataquality of FilePositional with header/footer row

Hi Sebastian,

I would propose the following job:

tFileInputRegex("(.)(.*)") --(row)--> tMap --(type = D)--> tAggregateRow(fixed key, sum amount) --(row)--> tJavaRow_1
                                             --(type = F)--> tJavaRow_2

Code in the tJavaRow components:

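// code in tJavaRow_1 (receives the single aggregated row)
context.sumOfAmount = input_row.sumOfAmount;
context.fileValid = false;

// code in tJavaRow_2 (receives the footer row)
// Assumes both sums are BigDecimal: == would compare object references, not values.
context.fileValid = context.sumOfAmount != null
        && context.sumOfAmount.compareTo(input_row.sumOfFooter) == 0;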

Now you could use a second job and trigger it based on context.fileValid. In the second job you have to do the parsing again, unless you are able to process the file in step one and do a rollback in case of errors.
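
To show the whole check in one place, here is a self-contained plain-Java sketch of the validation that would gate the second job. It is not the code Talend generates. The class name and the field offsets are assumptions taken from the sample file, and the amounts are treated as BigDecimal. Because I am not sure in which order the two branches of the job run, the sketch collects both sides first and compares only at the end.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the value that would end up in context.fileValid.
final class FooterValidation {

    static boolean isFileValid(List<String> lines) {
        int dataRows = 0;
        BigDecimal sum = BigDecimal.ZERO;
        Integer footerCount = null;
        BigDecimal footerSum = null;

        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            switch (line.charAt(0)) {
                case 'D': // data record: amount starts after the 4-character description
                    dataRows++;
                    sum = sum.add(new BigDecimal(line.substring(5)));
                    break;
                case 'F': // footer: 7-digit row count, then the total amount
                    footerCount = Integer.parseInt(line.substring(1, 8));
                    footerSum = new BigDecimal(line.substring(8));
                    break;
                default:  // header and unknown types are ignored by this check
                    break;
            }
        }
        // compareTo ignores scale, so 220.49 and 220.490 count as equal.
        return footerCount != null
                && footerCount == dataRows
                && footerSum.compareTo(sum) == 0;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "H20090317123000",
                "Dabcd0000120.50",
                "Dwxyz0000099.99",
                "F000000200000000000220.49");
        if (isFileValid(lines)) {
            System.out.println("file is valid, continue with processing");
        } else {
            System.out.println("file is invalid, stop");
        }
    }
}
```

In the job, the boolean result would correspond to context.fileValid, and the real processing would start only when it is true.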

@Shong: I'm not sure: does a "Run if ..." trigger work in this case (instead of OnSubjobOk and/or an exception thrown if the check in tJavaRow_2 fails)?

Bye
Volker

