• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » How to use functions such as SUBSTR, UC, LENGTH etc.

#1 2008-03-24 22:16:56

psm2000
Member
Registered: 2008-02-25
Posts: 108

How to use functions such as SUBSTR, UC, LENGTH etc.

I am using Excel/flat files (no databases at this point), and I need to create additional columns derived from the original columns, such as substrings. I am using TOS 2.3.1 with a Perl project.

E.g.
Class, Subsection, Description
1000, 1-1, Saturn
1001, 1-2, Nissan
1002, 2-3, Honda

And I want output like this:
Class, Subsection, Description, Substring_Section, Upper_Case_Description, length_Description
1000, 1-1, Saturn, 1, SATURN, 6
1001, 1-2, Nissan, 1, NISSAN, 6
1002, 2-3, Honda, 2, HONDA, 5

So I am keeping my current columns and want to add more columns with functions like substring, uc, split etc. There are no joins or anything. I am not replacing original columns or filtering on any particular value (that will be later).

Q1. Will a component like tPERL or tMap handle this, i.e. a line-by-line conversion that creates the additional columns as the rows are processed?
Q2. If I can apply any Perl function to the flow and create a new column, then I have all Perl string functions available. Is there an example of this? This might be simpler for me.
Q3. Can I split the subsection in the example above based on the presence of "-"? So if I have input 23-1, I get 23. If I have 1-23, I get 1, etc. (I think I can figure this out in Perl using rindex or something.)


Thanks in advance for your help.
Sean


#2 2008-03-25 03:59:31

shong
Talend team
Registered: 2007-08-29
Posts: 10305

Re: How to use functions such as SUBSTR, UC, LENGTH etc.

Hello Sean

Your requirement is very easy to meet in the Java version. For a Perl project, plegall (Perl project developer) will give you an answer. If needed, I will show you the Java example.

Best regards

shong


Email:shong@talend.com


#3 2008-03-25 11:59:54

rbillerey
Talend team
Registered: 2006-09-22
Posts: 150

Re: How to use functions such as SUBSTR, UC, LENGTH etc.

Hi Sean,


The Perl version of Talend also "makes the simple things easy and the hard things possible":

Use the tPerlRow component to process your input columns and fill your additional columns. In tPerlRow, the output schema can differ from the input schema, and you define code that is run for each row.

Code:

# fill output columns with input columns
@output_row = @input_row;

# split returns a list; we pick the first element (the part before the first "-")
$output_row[substring_section] = ( split('-', $input_row[subsection]) )[0];

# upper-case copy of the description
$output_row[uc_description] = uc $input_row[description];

# number of characters in the description
$output_row[length_description] = length $input_row[description];
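
To try the same expressions outside Talend, here is a minimal standalone sketch in plain Perl. It is not tPerlRow code: the helper name extend_row and the use of array references for rows are assumptions made for illustration only. The first three rows are the sample data from the question, and the two extra subsections (23-1 and 1-23) are the cases mentioned in Q3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Talend): takes the three input columns
# and returns the row extended with the three derived columns.
sub extend_row {
    my ($class, $subsection, $description) = @_;

    # split returns a list; keep only the part before the first "-"
    my $substring_section = ( split /-/, $subsection )[0];

    return (
        $class, $subsection, $description,
        $substring_section,
        uc($description),        # upper-case copy
        length($description),    # number of characters
    );
}

# Sample rows from the question
my @rows = (
    [ 1000, '1-1', 'Saturn' ],
    [ 1001, '1-2', 'Nissan' ],
    [ 1002, '2-3', 'Honda'  ],
);

print join(', ', extend_row(@$_)), "\n" for @rows;
# prints:
# 1000, 1-1, Saturn, 1, SATURN, 6
# 1001, 1-2, Nissan, 1, NISSAN, 6
# 1002, 2-3, Honda, 2, HONDA, 5

# The multi-digit cases from Q3
for my $subsection ('23-1', '1-23') {
    my $first = ( split /-/, $subsection )[0];
    print "$subsection -> $first\n";
}
# prints:
# 23-1 -> 23
# 1-23 -> 1
```

If a subsection contains no "-", split returns the whole string, so the new column simply repeats the original value.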

Hope it helps.

Richard

For shong: could you please give us your Java version?



#4 2008-03-25 17:30:26

psm2000
Member
Registered: 2008-02-25
Posts: 108

Re: How to use functions such as SUBSTR, UC, LENGTH etc.

Richard,
This will work perfectly for me and is easy to implement.

I refer to users guide 2.3_a quite often and it has no reference to tPerlRow, tJoin etc. Is there a newer in-progress doc anywhere? I know it is always going to be a moving target with such a rapidly evolving code base.

Thanks for your answer. You guys are great!
Regards,
Sean


#5 2008-03-26 00:08:09

plegall
Member
Registered: 2006-09-19
Posts: 1586

Re: How to use functions such as SUBSTR, UC, LENGTH etc.

psm2000 wrote:

I refer to users guide 2.3_a quite often and it has no reference to tPerlRow, tJoin etc. Is there a newer in-progress doc anywhere?

Yes, 23b is available for download, but tPerlRow and tJoin are not in it either. I know esabot (in charge of documentation) has worked with rbillerey (Perl team developer) on tJoin, so it should come as soon as tJoin becomes a priority :-) About tPerlRow, I have no info. This component has been available since TOS 1.0.0 but is not documented yet.

I know it is always going to be a moving target with such a rapidly evolving code base.

You're so right... documentation is a never-ending piece of work. In addition to general features, the number of components has increased faster and faster since TOS 2.0.0.

