Post a reply

Write your message and submit

Options

Click in the dark area of the image to send your post.

Go back

Topic review (newest first)

LouisLOurson
2011-07-08 15:08:15

Why yes, yes I could have...

Heh.My job just got 2 components simpler !

alevy
2011-07-06 01:17:40

Could you not just use tDenormalize to achieve your output?

LouisLOurson
2011-07-05 15:21:56

Sorry to bring back an old topic, but this is the only forum post I could find that described my problem.

I had a similar need : ie a "custom aggregate" function for the tAggregateRow. I found out a neat way of doing this, so I'm sharing it here, this seems like a pretty standard thing you'd want to do with Talend.

Basically I had a input row like this :

Code:

id;info
-------
1;"a"
1;"b"
1;"c"
2;"a"
2;"d"

And my desired output was this :

Code:

id;infos
--------
1;"a:b:c"
2;"a:d"

I needed to aggregate the "info" data grouped by the "id" data, separated by the ":" character.

The solution I found looks like this :

Code:

input ----> tJavaRow1 ----> tAggregateRow ----> tJavaRow2

tJavaRow1 :

In this tJavaRow, I input data in the globalMap to achieve what I want :

Code:

String mapIndex = input_row.id.toString();

if (globalMap.get(mapIndex) == null)
    globalMap.put(mapIndex, input_row.info.toString());
else
    globalMap.put(mapIndex, globalMap.get(mapIndex).toString() + ":" + input_row.info.toString());    

output_row.id= input_row.id;

After this code I have in my global map the correct data for each id (ie each info is concateneted and separated by ":").

The tAggregateRow serves a dual purpose : i only fill in the "group by" part of the component, (no operations), so the lines are effectively grouped. Also, the tAggregateRow "breaks" the normal data flow of Talend (begin / main / end), so I can re-use another tJavaRow after it, and it will only start once all the lines from my first input have been processed by the first tJavaRow.

tJavaRow2 :

In this tJavaRow, I fetch what's in the globalMap for my id, and I output it to the output_row :

Code:

String mapIndex = input_row.id.toString();

output_row.id= input_row.id;
output_row.infos= globalMap.get(mapIndex).toString();

Using this technique i was able to concatenate strings, but really you can define any user fonction you want. (You could store integer values as an array in the globalMap, and then in the final tJavaRow you could perform operations on this array to emulate sum, max, min, avg, std, ...)

Are there any big drawbacks to using this technique ?

emendez
2008-04-11 19:04:56

plegall wrote:

emendez wrote:

The tricky point here is that there may be a different function for each field.

This is not a tricky point, this is the way it already works. The tricky point is to enable specific aggregate function.

Sorry, I know nothing about the internal implementation of TOS.

Done with the feature request. See:

[Bugtracker] feature 3540

Regards,


Eduardo.

plegall
2008-04-11 15:02:08

emendez wrote:

The tricky point here is that there may be a different function for each field.

This is not a tricky point, this is the way it already works. The tricky point is to enable specific aggregate function.

emendez
2008-04-11 14:10:45

@Volker: thanks, I'll try with use case 12.

@plegall: the latter: the posibility of deciding may be very useful. For example: I may decide that a surname with more characters is "better" than another with less characters. The tricky point here is that there may be a different function for each field. I'll post a feature request.

Thanks,

Eduardo.

plegall
2008-04-11 11:42:53

What you ask is a new aggregate function in tAggregateRow (as far as I understand) that would be "first not null/empty". Please create a bugtracker for this :-)

Now I remind that when I coded tAggregateRow in late 2006, I had in project to let the user create a "personnal aggregate function" (and I wanted the same for tSortRow). I could be good to reactivate this project. Can you also create a feature request? (feature requests created by users have more weight)

Volker Brehm
2008-04-10 23:46:36

Hello Eduardo,

if you have different inputs you can use a tMap and "merge" them with the lookup functionality.

Alternatively if you have your data in a flow spread over multiple rows the following article in the wiki could help you (after sort your data based on id): http://www.talendforge.org/wiki/doku.php?id=use_case:12

Bye
Volker

emendez
2008-04-10 12:39:48

Author: emendez

Sorry for asking this question but I cannot go further than this:

I have managed to collect information about an employee from several databases and I have managed to "unite" them. Something like this:

id  name  surname  age
1   John
1   John                 23
1              Smith
2  Alice     Peters     31
2  Alice                   32

The only field I can rely on is "id". With the others I have to aggregate them "wisely": when there is no value in a row for a column but there is something for that column in another row I should take the latter; when there is conflicting data I just take the first one. So the desired result should be this:

id  name  surname  age
1   John    Smith      23
2  Alice     Peters     31

I think it is not a tMap or tUniqRow or tNormalize or tAggregateRow. What is it?.

Thanks in advance,


Eduardo.

Board footer

Powered by FluxBB