While trying to use a tGroovy component to process rows, I worked out how to do it. For those who don't know the answer, here is an explanation:
#1 use a tFlowToIterate component
#2 map each rowX.1 column from the component going into your tFlowToIterate into a variable in the tGroovy component
#3 write your Groovy script
If there is a more efficient way than this I would be happy to find out. The problem with this method is that the groovy script gets reloaded every time because the code generated binds variables each time with the tGroovy object that is created on the fly. Better would be to allow for keeping the instance of the tGroovy object in memory and then the user could just grab the variables out of globalMap since they're put there by the tFlowToIterate anyway.
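To make the last point concrete, here is a minimal sketch of reading the current row's values straight out of globalMap. It assumes that tFlowToIterate stores each column under a key of the form rowName.columnName; the row name row1, the columns id and name, and the class itself are hypothetical, and the HashMap is only a stand-in for the globalMap that a real job provides.

```java
import java.util.HashMap;
import java.util.Map;

final class IterateVariablesSketch {
    // Stand-in for the job's globalMap; in a real job Talend supplies it.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Simulates what tFlowToIterate is assumed to do for one column of the
    // current row: store the value under "<rowName>.<columnName>".
    static void publish(String rowName, String column, Object value) {
        globalMap.put(rowName + "." + column, value);
    }

    // Code running inside the iteration reads the current row's values back.
    static String describeCurrentRow() {
        Integer id = (Integer) globalMap.get("row1.id");
        String name = (String) globalMap.get("row1.name");
        return id + ":" + name;
    }

    public static void main(String[] args) {
        publish("row1", "id", 42);
        publish("row1", "name", "alice");
        System.out.println(describeCurrentRow());
    }
}
```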
Also, what tripped me up originally is that I could join a tMap component to a tGroovy component, but this did not create code inside the mapping loop to call the tGroovy component. It seems odd that an output row connected to a tGroovy component does not execute the component for that row, and that the only way to execute on a row is to turn the flow into an iterate. Perhaps then you should not be able to join the output row from the tMap (or main, etc.) onto the tGroovy component, no?
Build id: r22164-20090226-0025
Last edited by cbpusa (2009-03-21 03:23:14)
Well I don't have any problem getting data into the tGroovy component using your method, but getting data out seems to be a different story.
There is a "flow" output of tGroovy, but I can't figure out how to "push" anything to it. Even if I could, though, it doesn't seem like a good idea because only one row at a time is coming through due to the tFlowToIterate.
I added a tIterateToFlow component after my tGroovy via an "iterate" link and that fixed the iterator problem, but I still can't get any data.
Does anyone know if there is a way for a groovy script to write to the globalMap? It can read from it just fine via the "Variables" list in the component settings, but unfortunately the assignment doesn't appear to go both ways.
Any help or advice will be appreciated. Thank you.
There are ways. First off, remember that Groovy is compiled into Java bytecode and runs inside the JVM, so you can essentially do anything that you can do in Java.
Ergo, you should be able to bind globalMap and pass it into your code to do what you like and then process the results that you put in globalMap after the script completes in the flow.
On the other hand, as I mentioned, this is a bit expensive on a row-by-row basis. So, alternatively, you can do this:
The first time through, write either a Java or Groovy program to create a row list, wherein each row is itself a list, and put it in the globalMap.
The second time through (i.e. the next file/flow instance, etc.), retrieve the list and clear it.
Process rows in your script and put the results into the list in globalMap.
Connect your Groovy job to a tFixedFlowInput which uses the list size (e.g. ((List)globalMap.get(yourmap)).size()) to specify the number of rows, and then assign values for your columns in the tFixedFlowInput by doing gets on that list.
The downside to this approach is that it fills up memory with all the rows being processed, which can be handled by writing the output of your Groovy job to a delimited or xml file, for instance.
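The row-list approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: the key "rowBuffer", the class, and its method names are all hypothetical, and the HashMap stands in for the job's real globalMap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RowBufferSketch {
    static final String KEY = "rowBuffer"; // hypothetical globalMap key
    // Stand-in for the job's globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // First time through: create the list of rows and store it.
    // Later calls: return the list that is already stored.
    @SuppressWarnings("unchecked")
    static List<List<Object>> buffer() {
        List<List<Object>> rows = (List<List<Object>>) globalMap.get(KEY);
        if (rows == null) {
            rows = new ArrayList<>();
            globalMap.put(KEY, rows);
        }
        return rows;
    }

    // Next file or flow instance: keep the same list but empty it.
    static void reset() {
        buffer().clear();
    }

    // Called by the script for every result row; each row is itself a list.
    static void addRow(Object... columns) {
        buffer().add(new ArrayList<>(Arrays.asList(columns)));
    }

    // The kind of expression that would supply the number of rows.
    static int rowCount() {
        return buffer().size();
    }

    // The kind of get that would supply one column value of one row.
    static Object value(int row, int column) {
        return buffer().get(row).get(column);
    }
}
```

Because every row stays in the list until it is cleared, the memory cost mentioned above grows with the number of rows held between resets.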
Anyway, I used this approach because I have incoming files where I need to add a row in the middle of the flow based upon the contents of one other row such that the new row is looked up using the singleton row as a key. I had to split the flow so that this row could be created using a tMap while the rest of the rows flow along through normal processing. I used the approach described above to collect the completed set of rows together with the new row added into the correct spot for subsequent load into a dbms.
I mention all of that because I therefore know that this works - it's fast enough as well - 4000-5000 rows per second on my notebook and about that on a linux server where I'm developing (I have a zippy notebook).
Anyway, hope this helps.
Thank you. Binding the globalMap will work for my relatively small data set. I can't believe I didn't think of that.
I kept trying to bind immutable string variables that were stored in the globalMap and wondering why altering them in the Groovy code wasn't having any effect outside the script. It's amazing what a good night's sleep and some good advice can do.
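A plain-Java analogy of that mistake, as a rough sketch: handing the script only the String value lets it rebind its own variable, while handing it the map lets it change the entry itself. The class, the method names, and the "status" key are hypothetical, and the HashMap stands in for globalMap.

```java
import java.util.HashMap;
import java.util.Map;

final class BindingSketch {
    // The script only received the String value: assigning to the variable
    // rebinds the local name and leaves the map entry untouched.
    static void scriptWithValue(String status) {
        status = "processed";
    }

    // The script received the map itself: the change is visible to the caller.
    static void scriptWithMap(Map<String, Object> globalMap) {
        globalMap.put("status", "processed");
    }

    static String runWithValue() {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("status", "new");
        scriptWithValue((String) globalMap.get("status"));
        return (String) globalMap.get("status");
    }

    static String runWithMap() {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("status", "new");
        scriptWithMap(globalMap);
        return (String) globalMap.get("status");
    }

    public static void main(String[] args) {
        System.out.println(runWithValue()); // the entry is unchanged
        System.out.println(runWithMap());   // the entry was replaced
    }
}
```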
Ah, that reminds me - another option for you is to write a groovy wrapper class for your script code and then call it from a java job - this has the advantage of avoiding the constant script recompile that is performed by the groovy jobs and then you can pass arguments in and out as bean properties. You would have to load the class library in a pre-job, naturally enough.
Groovy is just Java on steroids without all the clutter after all
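A rough sketch of that wrapper idea, written in Java here so it stands alone; a compiled Groovy class exposing the same properties would be called the same way from a Java job. RowProcessor, its properties, and the "RowProcessor" key are hypothetical names, and the HashMap stands in for globalMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: arguments go in and out as bean properties.
final class RowProcessor {
    private String input;
    private String output;
    private int calls;

    public void setInput(String input) { this.input = input; }
    public String getOutput() { return output; }
    public int getCalls() { return calls; }

    // Placeholder for the real script logic.
    public void process() {
        output = (input == null) ? null : input.trim().toUpperCase();
        calls++;
    }
}

final class WrapperCallSketch {
    // Stand-in for the job's globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Pre-job: instantiate once and keep the instance for the whole run.
    static void preJob() {
        globalMap.put("RowProcessor", new RowProcessor());
    }

    // Per row: reuse the stored instance, so nothing is recompiled or rebound.
    static String perRow(String value) {
        RowProcessor processor = (RowProcessor) globalMap.get("RowProcessor");
        processor.setInput(value);
        processor.process();
        return processor.getOutput();
    }
}
```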
Fortunately (I think) I found a way to do what I need in tMap. That component is more powerful than I'd realized.
Building a custom class (in Java or Groovy) is a good idea though. I will use it in the future. Actually, I've been exploring that sort of job customization lately. That's why I was looking for a Groovy equivalent of the tJavaRow and found your post. I found that the feature request for the "tGroovyRow" was closed as undoable. I'm not sure I understand why it was, though. I understand that binding the input and output rows would be quite a bit more complex than in a tJavaRow, but it's not impossible. I mean, you've proven that with your workaround... though I realize the recompilation problem is an issue.
Anyway, thank you for your help and good suggestions.
Actually, it is far from undoable. I do not think that they have considered the fact that a script when compiled is turned into a class and that likewise they could provide the limitation that the top level Groovy class should be declared in the job. Thus, it is, in fact, nothing more than a Java class and can thus have properties set and retrieved like any Java class can. Ergo, given Groovy's dynamic nature, it would be simple enough to generate and compile a class instance automatically whose properties contained a row object along the lines of row6 that is set and then retrieved at the end of a standard call.
I think it became "undoable" because from what I see of the Groovy support it is rudimentary to the level of an example that was hacked to work and thus, I would tend to think this is a lack of expertise in this case. It is certainly not at all "undoable" - it would just appear that no one there knows how to do it.
If I ever get some time between writing web service calls and gwt screens and writing dbms schemas and so forth, I will look into the ecosystem thing to see what can be done. Implementing a dynamic call interface to a groovy class that is a wrapper for a class or script provided by a user is fairly straightforward to the point of being trivial; however, figuring out how to stuff that seamlessly into TOS is far less trivial as I just *use* it - I really haven't had the time to investigate it to see how to extend the functionality in this way. If a doc happened to be sitting around somewhere explaining the right way to create a new job type to be added to the studio seamlessly, I would probably take the time to do it because I use groovy quite extensively anyway and it would be more convenient for *me* - naturally, I have no problem giving the fruits of such labors to others to use for free.
Anyway, class or no class, your code is wrapped in a class in all cases and that class is callable - one need not use the binding mechanism at all, and I do not do so. The example that I gave you of writing a class that encapsulates all the groovy code and then calling it from a Java job is not only practical, it's what I'm doing since direct calls and direct assignment of properties with compiled groovy classes sitting in a jar combined with the -server flag results in free optimizations occurring as the jobs run and I'm all for free optimizations when I can get them.
For example, here is the trivial code from the Java job that makes the call to Groovy to perform duplicate detection of EDI 835 transactions in one of my jobs:
DedupEdiFile dedupEdiFile = (DedupEdiFile)globalMap.get("DedupEdiFile");
String infile = context.edi_spool_temp_dir.concat((String)globalMap.get("CurrentOboeXmlFile"));
String outfile = context.edi_spool_temp_dir.concat(((String)globalMap.get("tFileList_5_CURRENT_FILE"))).concat("dedup.xml");
String dupfile = context.edi_spool_temp_dir.concat(((String)globalMap.get("tFileList_5_CURRENT_FILE"))).concat("dup.xml");
dedupEdiFile.processFile((String)globalMap.get("TransactionType"), infile, outfile, dupfile, 1);
DedupEdiFile is a Groovy class. In the PreJob within the job, the Groovy class is instantiated, passed the connection for the job and stored in globalMap thusly:
DedupEdiFile dedupEdiFile = new DedupEdiFile(((java.sql.Connection)globalMap.get("conn_tJDBCConnection_1")));
Thus, all the Groovy goodness involving getting necessary fields for deduping via XmlSlurper'ing and a sql builder to generically generate the appropriate database call to do the check for each document element for duplication is all handled in Groovy. The equivalent code in Java would be several times larger than the Groovy solution and the current XML jobs are both inadequate and slow.
In any event, yes, globalMap is very powerful as is any "global" thing in making it easier to transfer things around without a direct call interface to provide the data. It is more than adequate for most things - the thing I don't like is that the way TOS handles Groovy is to recompile and rebind each time through and that is sloooooooooooooooooooooooow. Thus, I do not recommend you use that approach if you're processing lots of data, which I am, as it's just not going to keep up and the advantages of Groovy can become secondary to the implementation overhead introduced by the current TOS approach.
good luck in your endeavors,
Last edited by cbpusa (2009-04-15 21:26:04)
Hi: I am trying to join two MySQL (version 5.1) tables using tMap based on two columns. Joining on varchar columns does not seem to work well, whereas joining on int/long columns works fine. I am using Talend community version 4.1.1.
Because these are varchar columns, I have also tried using column.equals, and it just does not seem to work. I have read the Talend manuals and scoured the forum but did not find any proper help.
Thanks in advance.
Last edited by sabybose (2010-12-19 21:36:11)