Currently, Talend Open Studio (TOS) works mainly at the row level: one row in, one row through, and one row out. Recent releases added several components that break with this limitation, for example tNormalize and tMap. Even so, all of them are still primarily based on a single row, which means they cannot handle rows in any other way.
The following example shows one way to break with this limitation and merge rows together.
This example is based on TOS 2.3.2 with Java code generation.
Because there is currently no standard component for this task, we have to use tJavaRow to merge the rows. This means we use the standard components for data input and have to find a common denominator for our data, which causes additional work. Because we can't create additional rows, we have to know when a data block ends.
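The idea can be sketched in plain Java, outside of Talend. This is only an illustration under assumptions: the class `RowMerger`, its method `process`, and the key/value shape of a row are hypothetical and not part of Talend. It assumes every input row carries one attribute and that one known attribute always closes a data block. Every call returns exactly one output row, because no additional rows can be created.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the per-row logic of a tJavaRow; not Talend API.
class RowMerger {
    // State carried over from one input row to the next.
    private final Map<String, String> current = new LinkedHashMap<String, String>();
    // Assumption: this attribute always comes last in a data block.
    private final String lastAttribute;

    RowMerger(String lastAttribute) {
        this.lastAttribute = lastAttribute;
    }

    // Called once per input row; returns everything collected so far.
    Map<String, String> process(String key, String value) {
        current.put(key, value);
        Map<String, String> out = new LinkedHashMap<String, String>(current);
        if (key.equals(lastAttribute)) {
            current.clear(); // the block has ended, start collecting the next one
        }
        return out;
    }
}
```

With `new RowMerger("quantity")`, only the row returned for the `quantity` attribute contains the whole object; all earlier rows of the block are partial. In a real job the collected state would have to be kept somewhere that survives from one row to the next.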
In the following example we use one file with multi-line data to create one file with one row per input object.
The whole job consists of four logical parts:
Because each row in the input file has a different format, we parse the data in our own function. Depending on your needs, you can use any of the standard components. It may also be possible to read this file with a “multi-line regex”, but that is worth a use case of its own…
As you can see, you sometimes need (depending on your input) to decompose the attribute on your own. For XML you can sometimes use tParseXmlRow, but in other cases it is not so easy. So if you need to decompose an attribute, the best way is probably to write a generic custom function (and share it with the community later). Also, if you search the forum, you may find one that fits your needs.
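A parsing function of this kind might look like the sketch below. The input format is not reproduced here, so the `key: value` line layout is purely an assumption, and `LineParser` and `parse` are hypothetical names.

```java
// Hypothetical helper; the "key: value" line format is an assumption.
class LineParser {
    // Splits one input line into {key, value} at the first colon.
    // Returns null when the line is null or has no separator.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        int sep = line.indexOf(':');
        if (sep < 0) {
            return null;
        }
        String key = line.substring(0, sep).trim();
        String value = line.substring(sep + 1).trim();
        return new String[] { key, value };
    }
}
```

Splitting at the first colon only keeps values that contain a colon themselves (a URL, for instance) intact.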
At the end we have one output row for each input row, but only some of them are completely filled with data. So we need a rule to filter out the unfinished rows. In this case that is very easy: because we set quantity as the last attribute, only rows with this value set are completely filled and can pass the tFilterRow.
Talend Open Studio is limited to a row-based flow triggered by the input, but this can easily be bypassed with a little bit of code. This is only a short example to give you a hint about the direction you can take. You can swap out any component for input, for output, and inside the flow.
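Expressed in plain Java, the filter rule comes down to a single check on the last attribute. The sketch below is not the tFilterRow configuration itself. `CompleteRowFilter` and the map-based row are hypothetical, and it assumes that an unset quantity shows up as null or as an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the filter rule; not Talend API.
class CompleteRowFilter {
    // A row counts as complete only if the last attribute, quantity, is set.
    static boolean isComplete(Map<String, String> row) {
        String quantity = row.get("quantity");
        return quantity != null && quantity.length() > 0;
    }

    // Keeps the complete rows and drops the unfinished ones, preserving order.
    static List<Map<String, String>> keepComplete(List<Map<String, String>> rows) {
        List<Map<String, String>> kept = new ArrayList<Map<String, String>>();
        for (Map<String, String> row : rows) {
            if (isComplete(row)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```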
If you would like to discuss this solution or have an optimization (or even a better idea), join the following thread: Talend forum thread 2017