• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Collect Items and "group" selects

#1 2012-04-27 23:31:50

rbecker
New member
Registered: 2012-04-27
Posts: 3

Collect Items and "group" selects

Hello together,

I am new to talend but managed to create first scenarios. I like the tool.

Now I am at a point I do not know how to manage this with talend. I have to transfer mass data from a progress database (not postgress) to SAP. The systems are in different parts of the world so I have a runtime and network bandwidth and performance issue. I have a working scenario doing this:

tJDBCInput1 -> Call Subjob
                             |
                             |
                          tJDBCInput2-> tMap -> tSAPOutput

tJDBCInput1 just selects all keys from the database, every key is passed to the subjob, there tJDBCInput2 selects the complete line (around 80 fields) and sends them to SAP (SAP can be exchanged by a WebService, same problem)
This works but as you can think is not very efficient.

What I want to do now is group the result from tJDBCInput1 by for example 100 rows and give this to the SubJob. With tJDBCInput2 I can then select these 100 records at once, collect them and send with tSAPOutput.

The key of the database is material number. With the first tJDBCInput I get around 20.000 material numbers. They are sorted but a mixture of numeric and alphanumeric numbers. My idea is now to finded the first number and the number from row 100, pass these 2 values to the SubJob, there in tJDBCInput2 use "where matnr between 4711 and abc23" to get the complete data from these 100 records. Next use number from row 101 to 200 etc.

Collecting the data for the use in tSAPOutput can be done like described here:
http://www.talendforge.org/forum/viewtopic.php?id=10992

I hope I could explain my problem a way you are able to understand.

How can I archive the grouping of 100 records and work with them in a subjob?

Any idea or maybe better solutions are welcome.

Many thanks for help.

Best regards

Roman

Offline

#2 2012-05-02 05:09:12

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: Collect Items and "group" selects

Hi Roman

I think using two tJDBCInput won't make the job efficient.
Especially there isn't a way to increase the keys by adding a number which are sorted but a mixture of numeric and alphanumeric numbers.
In mysql, there is 'LIMIT' as below to get specified range of rows.

Code:

SELECT * FROM table   LIMIT [offset,] rows | rows OFFSET offset

So there may be the same syntax under your DB. It must be more efficient because only one tJDBCInput is enough.

Regards,
Pedro


Only Paranoid Survive.

Offline

#3 2012-05-02 12:28:40

rbecker
New member
Registered: 2012-04-27
Posts: 3

Re: Collect Items and "group" selects

Thanks for this answer. Progress SQL in the version I have to connct does not support LIMIT or TOP etc. SQL Statement.
So I thought where <field> between <xx> and <yy> would be best to use.

Any further idea?

Best regards

Roman

Offline

#4 2012-05-03 03:51:34

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: Collect Items and "group" selects

Hi Roman

As far as I am concerned, there isn't any specified component for progress DB now.
The normal JDBC connection won't support better performance. Even though you want to get a range between <xx> and <yy>, it doesn't help a lot.

Usually what you can do is using tMap with 'Store on disk', inner join Table1 and Table2.

Besides, the material numbers are not sequence. Is there a rule to add some value to get the next material number?

Regards,
Pedro


Only Paranoid Survive.

Offline

#5 2012-05-03 10:20:00

rbecker
New member
Registered: 2012-04-27
Posts: 3

Re: Collect Items and "group" selects

Hello Pedro,

the problem of the performance has 2 sites.

The way it is now:

1. I just select all material numbers select matnr from mat_table - This is ok
2. I iterate over every material number and select the complete line and send this to SAP by tSAPOutput - This is slow because of the RFC overhead to SAP is big. So I have to send a number of lines that is fast in select and fast in RFC (e.g. 100 or 500). For this i need the grouping of the lines from 1. to get the number of complete lines at once in 2. I am sure this will cut the tSAPOutput time a lot a s I can send many lines at once.

Any idea?

Best regards

Roman

Offline

  • Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Collect Items and "group" selects

Board footer

Powered by FluxBB