Hi all,
I previously worked with DataStage and am just starting with Talend. I'm not quite sure how to do certain things I did with DataStage. For example, I created hash files for lookups. All these hash files were created in separate jobs that could run in parallel in the job sequence. In the main job I could then use them as lookups.
In Talend, I used the data source (an Informix database) directly for the lookup, since there are no real hash files (I think). Is this the best way to do it?
I hope I explained my problem well enough for someone to be able to help me.
Thanks,
Heidi

Of course, Talend and DataStage are different products with different philosophies.
Talend doesn't need you to manually create hash files for lookup operations.
You can still do it if you want to, but it is simply not required.
HTH,

Hi dsg78,
Here is a summary of Talend's current behavior regarding lookups:
- By default, Talend loads lookup data into memory once, before the current subjob starts. Hashing is implicit, so you don't need a specific component before tMap or tJoin lookups.
- Since TOS 2.4, the tMap component can store lookup data (and temporary join data) on disk if you check the 'Store on disk' option on the lookup.
- You can't reuse data already stored in memory for the moment; this is a planned feature.
- You can't reuse data already stored on disk for the moment, as you can in DataStage; this is a planned feature.
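To illustrate the first point, here is a minimal plain-Java sketch of the "load the lookup into a hash once, then probe it for every main row" pattern. It is not Talend's generated code; the class name, method names, and row layouts are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupJoinSketch {
    // Load the whole lookup flow into a hash map once, before the main flow starts.
    // Hypothetical lookup row layout: {id, name}.
    static Map<Integer, String> loadLookup(List<String[]> lookupRows) {
        Map<Integer, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(Integer.parseInt(row[0]), row[1]);
        }
        return lookup;
    }

    // Probe the in-memory hash for each main row (inner join: unmatched rows are dropped).
    // Hypothetical main row layout: {orderId, lookupId}.
    static List<String> join(List<int[]> mainRows, Map<Integer, String> lookup) {
        List<String> out = new ArrayList<>();
        for (int[] row : mainRows) {
            String name = lookup.get(row[1]);
            if (name != null) {
                out.add(row[0] + ";" + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> lookup = loadLookup(Arrays.asList(
                new String[] {"1", "Paris"},
                new String[] {"2", "Lyon"}));
        List<int[]> mainRows = Arrays.asList(
                new int[] {100, 1}, new int[] {101, 3}, new int[] {102, 2});
        for (String line : join(mainRows, lookup)) {
            System.out.println(line);
        }
    }
}
```

The lookup is built exactly once and then only read, which is why no separate hash-file job is needed as long as the lookup fits in memory.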
I hope this information helps.
amaumont

Hi,
I am currently using TOS 2.3.3. I am wondering if TOS 2.4 can help me solve my issue:
My hash lookup files are a few GB in size (more than the physically available memory). Would it be better to move to TOS 2.4 and use the "Store on disk" option?
Previously, I tried using a DB lookup table, but the performance was horribly slow (I can't remember exactly how slow, but it was a few orders of magnitude slower than using in-memory hashes). I resorted to splitting the hash by some predetermined categories, which helped reduce memory consumption.
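For readers wondering what splitting the hash by category looks like, here is a rough plain-Java sketch. The class name, method names, and row layouts are hypothetical, and for brevity both inputs are held in lists; in a real job each category would be read from its own file or query so that only one partition's hash is in memory at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionedLookupSketch {
    // Group rows by their first column, the (hypothetical) category.
    static Map<String, List<String[]>> splitByCategory(List<String[]> rows) {
        Map<String, List<String[]>> parts = new TreeMap<>();
        for (String[] row : rows) {
            parts.computeIfAbsent(row[0], c -> new ArrayList<>()).add(row);
        }
        return parts;
    }

    // Main rows: {category, key}. Lookup rows: {category, key, value}.
    // Only the hash for the category currently being processed is alive.
    static List<String> joinPerCategory(List<String[]> mainRows, List<String[]> lookupRows) {
        Map<String, List<String[]>> mainParts = splitByCategory(mainRows);
        Map<String, List<String[]>> lookupParts = splitByCategory(lookupRows);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> part : mainParts.entrySet()) {
            // Build a small hash for this category only.
            Map<String, String> hash = new HashMap<>();
            List<String[]> lookupPart =
                    lookupParts.getOrDefault(part.getKey(), Collections.<String[]>emptyList());
            for (String[] l : lookupPart) {
                hash.put(l[1], l[2]);
            }
            // Probe it with the main rows of the same category (inner join).
            for (String[] m : part.getValue()) {
                String value = hash.get(m[1]);
                if (value != null) {
                    out.add(m[0] + ";" + m[1] + ";" + value);
                }
            }
            // 'hash' goes out of scope here, so it can be garbage collected
            // before the next category is loaded.
        }
        return out;
    }
}
```

Peak memory then depends on the largest single category rather than on the whole lookup, at the cost of having to partition the main flow the same way.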
Please advise whether I should move to TOS 2.4.
Thank you

Thanks, I will try it once I have successfully ported from 2.1.4 to 2.3.3. I'm running into some problems with my custom components.
Writing custom hashing and mapping processing is a lot of trouble and tough to maintain...

amaumont wrote:
- You can't reuse data already stored in memory for the moment; this is a planned feature.
- You can't reuse data already stored on disk for the moment, as you can in DataStage; this is a planned feature.
Have those features been implemented, or are they still planned?
Thanks,
Nicolas

Hi.
DataStage hash files are used not only for lookups, but also to implement other techniques.
For example, writing to and reading from the same hash file lets you know which records you have inserted into the hash up to the previous record. This technique helps in many cases.
It's a sort of DB insert that commits after every record.
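In plain Java terms, the technique amounts to reading and then writing the same map inside the row loop, so each row sees everything inserted by the rows before it. The sketch below is only an illustration of that idea, not a Talend component API; the class name, method name, and tagging format are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RunningHashSketch {
    // Tag each incoming key with how often it has been seen in earlier rows.
    static List<String> tagRows(List<String> keys) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            // Read: the map reflects every row processed before this one.
            Integer previous = seen.get(key);
            out.add(previous == null ? key + ":new" : key + ":seen-" + previous);
            // Write: the update is visible to the very next row,
            // like an insert committed after every record.
            seen.put(key, previous == null ? 1 : previous + 1);
        }
        return out;
    }
}
```

For the input a, b, a, a this yields a:new, b:new, a:seen-1, a:seen-2.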
I need to understand how to implement this technique with Talend.
Many thanks,
Andrea

Hi
Use the tHashInput/tHashOutput components.

lijolawrance wrote:
Hi
Use the tHashInput/tHashOutput components.
I agree.
Sorry, I have also implemented this with a hash... that's why I posted here (I'm familiar with DataStage hash files).
I was confusing this with another post of mine, http://www.talendforge.org/forum/viewtopic.php?id=22948
where I'm asking about the possibility of updating the KEY of an "in memory" lookup.
Something like
update lookup set KEY = X where KEY = Y
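On a plain Java map, that kind of key update is a remove followed by a put. The sketch below assumes a lookup with unique keys; the class and method names are hypothetical, and it says nothing about whether Talend's own lookup structures allow this.

```java
import java.util.HashMap;
import java.util.Map;

class RekeySketch {
    // Equivalent of: update lookup set KEY = newKey where KEY = oldKey.
    // Returns false if oldKey is absent. An existing entry under newKey is overwritten.
    static <V> boolean rekey(Map<String, V> lookup, String oldKey, String newKey) {
        if (!lookup.containsKey(oldKey)) {
            return false;
        }
        V value = lookup.remove(oldKey);
        lookup.put(newKey, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> lookup = new HashMap<>();
        lookup.put("Y", 42);
        rekey(lookup, "Y", "X");
        System.out.println(lookup);
    }
}
```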
If you want, please reply there.
Many many thanks!
Andrea
Last edited by ponzio (2012-03-28 11:39:35)