Hi, I just wanted to thank everybody at Talend for making such a great open source ETL tool!
I'm working on a set of requirements to convert an Excel spreadsheet into an automated, ETL-based update and alert process using Talend and MySQL. The spreadsheet currently uses Excel functions to compute averages and standard deviations, among other things.
I need to create a lookback array for the following requirements:
' create a 1000-period rolling average '
' create the standard deviation of x at row i over the last 1000 values up to row i in the dataset, using the mean ', and store it back in the database.
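Written out as formulas, and assuming a window of N = 1000 values ending at row i, the two requirements might look like this. I've assumed the sample form with N - 1 in the denominator, which is what Excel's STDEV uses; if the spreadsheet uses STDEVP, divide by N instead. Rows near the start of the dataset have fewer than 1000 predecessors, so their window is shorter.

```latex
\bar{x}_i = \frac{1}{N}\sum_{k=i-N+1}^{i} x_k,
\qquad
s_i = \sqrt{\frac{1}{N-1}\sum_{k=i-N+1}^{i}\left(x_k - \bar{x}_i\right)^2},
\qquad N = 1000
```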
What's the best way to do this by reading the source data once, then looping through it with a tArray or a global context and, at each row i, iterating back over the last 1000 values?
I have an example running that selects the data from MySQL and then, for each row, queries the database again to select the last 1000 records. This is grossly inefficient, since the required data has already been selected. What I can't figure out is how to write the data into memory (a tArray or buffer) and loop over that instead of selecting it from the database again.
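A minimal sketch of the read-once approach in plain Java, assuming the x column has already been read into an array (I've converted the BigDecimal values to double for brevity). The class and method names (InMemoryLookback, lookback) are hypothetical, not Talend API. For each row it scans back over at most 1000 values held in memory, so MySQL is queried only once. The array itself could probably be handed between components through Talend's globalMap, but how you wire that up depends on the job layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: x is loaded once into memory, then each row looks back over it.
class InMemoryLookback {

    /** Mean and sample standard deviation of the window ending at one row. */
    static final class WindowStats {
        final double mean;
        final double stdDev;

        WindowStats(double mean, double stdDev) {
            this.mean = mean;
            this.stdDev = stdDev;
        }
    }

    /**
     * For each row i, scans back over at most `window` values x[i - window + 1] .. x[i]
     * held in memory (no second database query). Early rows have shorter windows;
     * the standard deviation is NaN while a window holds fewer than 2 values.
     */
    static List<WindowStats> lookback(double[] xs, int window) {
        List<WindowStats> out = new ArrayList<>(xs.length);
        for (int i = 0; i < xs.length; i++) {
            int start = Math.max(0, i - window + 1);
            int n = i - start + 1;

            // First pass over the window: the mean
            double sum = 0.0;
            for (int k = start; k <= i; k++) {
                sum += xs[k];
            }
            double mean = sum / n;

            // Second pass over the same window: sum of squared deviations from the mean
            double sq = 0.0;
            for (int k = start; k <= i; k++) {
                double d = xs[k] - mean;
                sq += d * d;
            }
            double sd = n > 1 ? Math.sqrt(sq / (n - 1)) : Double.NaN;
            out.add(new WindowStats(mean, sd));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the rows read once from MySQL (hypothetical values, in day order)
        double[] xs = {2, 4, 4, 4, 5, 5, 7, 9};
        List<WindowStats> stats = lookback(xs, 1000);
        for (int i = 0; i < xs.length; i++) {
            System.out.println(i + ": mean=" + stats.get(i).mean + " sd=" + stats.get(i).stdDev);
        }
    }
}
```

This is still O(n × 1000) work, but it all happens in memory, which should be far cheaper than one extra database query per row.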
The rough outline would be something like this:
tInputMySQL: extract the data (day (date), x (BigDecimal))
tArray or buffer: hold the rows in memory
tJavaFlex or tJava: loop over the tArray
tLoop: nested loop, for i ... 1000
tJavaFlex or tAggregate: receive the last 1000 values of x up to a given date and compute the mean and standard deviation
tOutputMySQL: insert the fields (day, x, mean, s.d., y)
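As far as I can tell, the nested loop in this outline could be avoided entirely: if the rows arrive ordered by day, the window changes by only one value per row, so keeping running sums of x and x² gives the mean and standard deviation in constant time per row, using s² = (N Σx² - (Σx)²) / (N(N - 1)). This running-sum technique is my substitution, not something from the Talend documentation, and RollingStats is a hypothetical helper. It keeps the values as BigDecimal and does the sums exactly, which avoids the precision loss that formula suffers with doubles.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayDeque;

// Hypothetical helper: fixed-size rolling window with exact BigDecimal running sums.
class RollingStats {
    private final int window;
    private final ArrayDeque<BigDecimal> values = new ArrayDeque<>();
    private BigDecimal sum = BigDecimal.ZERO;   // sum of x over the window
    private BigDecimal sumSq = BigDecimal.ZERO; // sum of x^2 over the window

    RollingStats(int window) {
        if (window < 1) throw new IllegalArgumentException("window must be >= 1");
        this.window = window;
    }

    /** Adds the next x (rows must arrive in day order) and evicts the oldest value once full. */
    void add(BigDecimal x) {
        values.addLast(x);
        sum = sum.add(x);
        sumSq = sumSq.add(x.multiply(x));
        if (values.size() > window) {
            BigDecimal old = values.removeFirst();
            sum = sum.subtract(old);
            sumSq = sumSq.subtract(old.multiply(old));
        }
    }

    /** Number of values currently in the window (fewer than the window size for early rows). */
    int count() {
        return values.size();
    }

    /** Mean of the current window, or NaN if it is empty. */
    double mean() {
        if (values.isEmpty()) return Double.NaN;
        return sum.divide(BigDecimal.valueOf(values.size()), MathContext.DECIMAL64).doubleValue();
    }

    /** Sample standard deviation (n - 1 denominator), or NaN with fewer than 2 values. */
    double stdDev() {
        int n = values.size();
        if (n < 2) return Double.NaN;
        BigDecimal bn = BigDecimal.valueOf(n);
        // n * sum(x^2) - (sum x)^2 is computed exactly, so there is no cancellation error
        BigDecimal numerator = bn.multiply(sumSq).subtract(sum.multiply(sum));
        BigDecimal denominator = bn.multiply(BigDecimal.valueOf(n - 1L));
        return Math.sqrt(numerator.divide(denominator, MathContext.DECIMAL64).doubleValue());
    }

    public static void main(String[] args) {
        RollingStats stats = new RollingStats(1000);
        String[] xs = {"2", "4", "4", "4", "5", "5", "7", "9"}; // hypothetical x values, in day order
        for (String x : xs) {
            stats.add(new BigDecimal(x));
            System.out.println("n=" + stats.count() + " mean=" + stats.mean() + " sd=" + stats.stdDev());
        }
    }
}
```

In Talend I'd expect this to live in a user routine (or possibly be declared in the Start code of a tJavaFlex), with add(), mean() and stdDev() called from the Main code for each incoming row before the row goes on to tOutputMySQL. The row variable names depend on your job, so treat that wiring as untested.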
This post is useful: Topic 8219, "Group list row into Java as List".
This post is also useful regarding the use of global contexts: Topic 8356, "Outputing a data flow from a loop in a java component".
I'd like to know whether this is possible using tArray, as I'm having some trouble getting it to work.
I have also uploaded a screenshot of the Excel spreadsheet I was given as the requirements documentation:
The highlighted values show the current values of the 1000-period mean (x) and standard deviation (y) at the selected row.
Last edited by booobah (2009-10-18 14:31:06)