Use Case 12: Consolidate n rows to one

Talend Open Studio (TOS) currently works mainly at row level. This means: one row in, one row through and one row out. Recent releases added several components that break with this limitation, for example tNormalize and tMap. Even these, however, are still driven by a single input row, so they cannot treat a group of rows as one unit.

The following example shows one way to work around this limitation and merge several rows into one.

This example is based on TOS 2.3.2 with Java code generation.

Limitations

Because there is currently no standard component for this task, we have to use tJavaRow to merge the rows. This means we read the data with the standard input components and have to find a common format that fits every input line (which means additional work). Because we cannot create additional rows, we also have to know when a data block ends.

Example

In the following example we use one file with multi-line data to create one file with one row per input object.

Input data

product.txt

:use_case:use_case_12-5.png

Expected output data

product.csv

:use_case:use_case_12-6.png

Creating the job

The whole job consists of five logical parts:

  1. read the input data in a generic way
  2. parse the input rows (because we cannot use the standard components for this)
  3. create the merged output row
  4. detect when an output row is complete
  5. write the output data

:use_case:use_case_12-3.png

Data input

Because each row in the input file has a different format, we parse the data in a function of our own. Depending on your needs you can use any of the standard input components. It may also be possible to read this file with a “multi line regex”, but that is worth a use case of its own…

:use_case:use_case_12-4.png

Decomposition and merging of the rows

:use_case:use_case_12-7.png
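The merging idea can be sketched in plain Java, outside of Talend. This is only a minimal illustration under assumptions: the real layout of product.txt is shown in the screenshot, so the key=value line format, the fields id and name, and the class RowMerger are hypothetical (only quantity as the last attribute comes from this use case). The point is that every input line still produces exactly one output row, and values from earlier lines are kept in variables that survive between calls.

```java
public class RowMerger {
    /** One output row; a field stays null until its input line has been seen. */
    public static class Product {
        public String id;
        public String name;
        public String quantity;
    }

    // State kept between calls, so values from earlier lines are still available.
    private String id;
    private String name;

    /** Called once per input line; always returns exactly one output row. */
    public Product onRow(String line) {
        Product out = new Product();          // stays empty unless the block ends here
        int sep = line.indexOf('=');
        if (sep < 0) {
            return out;                       // not a key=value line (assumed format)
        }
        String key = line.substring(0, sep).trim();
        String value = line.substring(sep + 1).trim();
        if (key.equals("id")) {
            id = value;                       // a new data block starts
            name = null;
        } else if (key.equals("name")) {
            name = value;
        } else if (key.equals("quantity")) {  // last attribute: the block is complete
            out.id = id;
            out.name = name;
            out.quantity = value;
            id = null;
            name = null;
        }
        return out;
    }

    public static void main(String[] args) {
        RowMerger merger = new RowMerger();
        String[] lines = {"id=1", "name=Widget", "quantity=3", "id=2", "name=Gadget", "quantity=7"};
        for (String line : lines) {
            Product p = merger.onRow(line);
            System.out.println(p.id + ";" + p.name + ";" + p.quantity);
        }
    }
}
```

Rows returned before the end of a block are deliberately left empty, which is what makes the later filter step simple.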

Helper function for decomposition

As you can see, you sometimes need to decompose the attributes yourself, depending on your input. For XML you can sometimes use tParseXmlRow, but in other cases it is not that easy. If you need to decompose an attribute, the best approach is usually to write a generic custom function (and share it with the community later). A search in the forum may also turn up one that fits your needs.
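A generic helper of this kind could look like the following sketch. It is not the function used in the screenshots: the class name LineDecomposer, the method valueOf and the "key, separator, value" line layout are assumptions chosen for illustration. The method returns the value when the line carries the requested key, and null otherwise, so the caller can test each line against several keys.

```java
public class LineDecomposer {
    /**
     * Returns the value of the given key if the line has the form
     * "key<separator>value", otherwise null.
     */
    public static String valueOf(String line, String key, String separator) {
        if (line == null) {
            return null;
        }
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return null;                                   // no separator in this line
        }
        if (!line.substring(0, pos).trim().equals(key)) {
            return null;                                   // line belongs to another key
        }
        return line.substring(pos + separator.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(valueOf("name = Widget", "name", "="));
        System.out.println(valueOf("quantity = 3", "name", "="));
    }
}
```

Keeping the key and the separator as parameters means the same function can be reused for other multi-line formats.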

Filter for completely merged rows

At this point we have one output row for each input row, but only some of them are completely filled with data. So we need a rule to filter out the unfinished rows. In this case that is very easy: because quantity is set as the last attribute, only rows with this value set are completely filled and may pass the tFilterRow.

:use_case:use_case_12-8.png
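Expressed as plain Java, the filter rule amounts to a check on the last attribute. The sketch below only mirrors the idea and is not generated Talend code; the class CompleteRowFilter and the representation of a row as a String array with quantity in the last position are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CompleteRowFilter {
    /** A row counts as complete once its last attribute (assumed: quantity) is set. */
    public static boolean isComplete(String[] row) {
        String last = row[row.length - 1];
        return last != null && !last.isEmpty();
    }

    /** Returns only the complete rows, in their original order. */
    public static List<String[]> keepComplete(List<String[]> rows) {
        List<String[]> result = new ArrayList<String[]>();
        for (String[] row : rows) {
            if (isComplete(row)) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] {null, null, null});      // emitted in the middle of a block
        rows.add(new String[] {"1", "Widget", "3"});    // emitted at the end of a block
        System.out.println(keepComplete(rows).size());
    }
}
```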

Data output

There is no “magic” here, just the normal business of a tFileOutputDelimited…

:use_case:use_case_12-9.png

Summary

Talend Open Studio is limited to a row-based flow triggered by the input, but this can easily be bypassed with a little bit of code. This is only a short example to give you a hint of the direction you can take. You can exchange every component for input, for output and inside the flow.

Additional Information

If you would like to discuss this solution or have an optimization (or even a better idea), join the following thread: Talend forum thread 2017

 
use_case/12.txt · Last modified: 2011/12/17 12:52 (external edit)
 
 