#1 2008-06-05 14:13:33

dsg78
New member
Registered: 2008-06-05
Posts: 3

Hash-file as lookup

Hi all,

I previously worked with DataStage and am just starting with Talend. I'm not quite sure how to do certain things I did with DataStage. For example, I created hash-files for lookups. All these hash-files were created in separate jobs and could be run in parallel in the job sequence. In the main job I could use them again as lookups.
In Talend, I used the data-source (Informix database) directly for the lookup, since there are no real hash-files (I think). Is this the best way to do it?
I hope I explained my problem well enough for someone to be able to help me.

Thanks,
Heidi

#2 2008-06-10 00:46:52

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Hash-file as lookup

Of course, Talend and DataStage are different products with different philosophies.
Talend doesn't require you to manually create hash files for lookup operations.
You can still do it if you want to, but it is simply not required.

HTH,

#3 2008-06-10 10:21:51

amaumont
Talend team
Registered: 2006-09-20
Posts: 471

Re: Hash-file as lookup

Hi dsg78,

I can summarize Talend's current behavior regarding lookups:
- By default, Talend loads lookup data into memory once, before the current subjob starts. The hashing is implicit, so you don't need a specific component in front of tMap or tJoin lookups.

- Since TOS 2.4, the tMap component can store lookup data (and temporary join data) on disk: check the 'Store on disk' option on each lookup that needs it.

- For the moment you can't reuse lookup data that is already stored in memory; this is a planned feature.

- For the moment you can't reuse lookup data that is already stored on disk, as you can in DataStage; this is a planned feature.
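
For readers coming from DataStage, the default behavior in the first point can be pictured roughly as below. This is only a minimal plain-Java sketch of an in-memory hash lookup, not the code Talend generates; the class name, the methods `buildLookup` and `join`, and the row layouts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hand-written in-memory hash lookup, NOT Talend's generated code.
final class LookupSketch {

    // Load the lookup flow once into a hash map keyed by the join key.
    // Assumed row layout: {key, label}.
    static Map<Integer, String> buildLookup(List<String[]> lookupRows) {
        Map<Integer, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(Integer.parseInt(row[0]), row[1]);
        }
        return lookup;
    }

    // Probe the map once per main row. Assumed row layout: {id, lookupKey}.
    // Rows without a match keep a null label, like a left outer join.
    static List<String> join(List<int[]> mainRows, Map<Integer, String> lookup) {
        List<String> out = new ArrayList<>();
        for (int[] row : mainRows) {
            String label = lookup.get(row[1]);
            out.add(row[0] + ";" + label);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> lookup = buildLookup(Arrays.asList(
                new String[] {"10", "Sales"},
                new String[] {"20", "IT"}));
        System.out.println(join(Arrays.asList(new int[] {1, 10}, new int[] {2, 99}), lookup));
    }
}
```

The point of the sketch is that the map is built once and then probed per main row, which is why no separate "create hash file" step is needed.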

I hope this information helps.

amaumont

#4 2008-06-10 16:45:26

dsg78
New member
Registered: 2008-06-05
Posts: 3

Re: Hash-file as lookup

Thank you both for answering. It was very helpful :)

#5 2008-06-10 17:35:30

tlittle
Member
Registered: 2007-08-21
Posts: 21

Re: Hash-file as lookup

Hi,

I am currently using TOS 2.3.3. I am wondering if TOS 2.4 can help me solve my issue:

My hash lookup files are a few GB in size (more than the physically available memory). Would it be better to move to TOS 2.4 and use the "Store on disk" option?

Previously, I tried using a DB lookup table, but the performance was horribly slow (I can't remember exactly how slow, but it was a few orders of magnitude slower than in-memory hashes). I resorted to splitting the hash by some predetermined categories, which helped reduce memory consumption.
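
The splitting workaround can be sketched roughly like this in plain Java. It is only an assumption about what "splitting by category" could look like, not the actual custom component: `joinByCategory`, the `loadSlice` loader and the row layout {category, key} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative only: keep one lookup slice in memory at a time.
final class PartitionedLookupSketch {

    // mainRows: assumed layout {category, key}.
    // loadSlice: hypothetical loader that returns only the lookup entries of one category.
    static List<String> joinByCategory(List<String[]> mainRows,
                                       Function<String, Map<String, String>> loadSlice) {
        // Group main rows by category so that each slice is loaded exactly once.
        Map<String, List<String[]>> byCategory = new TreeMap<>();
        for (String[] row : mainRows) {
            byCategory.computeIfAbsent(row[0], c -> new ArrayList<>()).add(row);
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> entry : byCategory.entrySet()) {
            // Only this category's lookup data is held in memory during the inner loop.
            Map<String, String> slice = loadSlice.apply(entry.getKey());
            for (String[] row : entry.getValue()) {
                out.add(row[1] + "=" + slice.get(row[1]));
            }
            // The slice becomes unreachable here and can be garbage collected.
        }
        return out;
    }
}
```

For simplicity the sketch buffers the main rows in memory while grouping them; with a large main flow you would more likely split it into one file per category first.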

Please advise whether I should move to TOS 2.4.

Thank you :)

#6 2008-06-10 18:18:39

mhirt
Talend team
Registered: 2006-09-19
Posts: 1635

Re: Hash-file as lookup

tlittle,

Yes, you can try 2.4!
You can have both 2.3.3 and 2.4 on the same computer.
Use Import project, and install TOS 2.4 in another folder.

Regards,

Last edited by mhirt (2008-06-10 18:18:49)

#7 2008-06-11 04:54:32

tlittle
Member
Registered: 2007-08-21
Posts: 21

Re: Hash-file as lookup

Thanks, I will try it once I have successfully ported from 2.1.4 to 2.3.3. I'm running into some problems with my custom components :)

Writing custom hashing and mapping processing is a lot of trouble and tough to maintain...

#8 2010-10-07 18:03:31

NSaumande
Member
Company: Sopra Group
Registered: 2009-04-03
Posts: 13

Re: Hash-file as lookup

amaumont wrote:

- For the moment you can't reuse lookup data that is already stored in memory; this is a planned feature.
- For the moment you can't reuse lookup data that is already stored on disk, as you can in DataStage; this is a planned feature.

Have those features been implemented, or are they still planned?

Thanks,
Nicolas

#9 2012-03-24 12:26:57

ponzio
Member
Registered: 2008-10-01
Posts: 13

Re: Hash-file as lookup

Hi.
DataStage hash files are used not only for lookups but also to implement other techniques.
For example, writing to and reading from the same hash lets you know which records have been inserted into the hash up to the previous record. This technique helps in many cases.
It's a sort of DB insert that commits after every record.

I need to understand how to implement this technique with Talend.
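
In plain Java terms, the read-and-write-the-same-hash technique looks roughly like the sketch below. It is a minimal illustration with made-up names (`SeenSoFarSketch`, `flagAlreadySeen`), not a Talend component or generated code; how it would be wired into a job is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: one hash structure that is read and written record by record.
final class SeenSoFarSketch {

    // For each key, report whether an EARLIER record had already written it to the hash.
    static List<Boolean> flagAlreadySeen(List<String> keys) {
        Set<String> seen = new HashSet<>();
        List<Boolean> flags = new ArrayList<>();
        for (String key : keys) {
            // Set.add returns false when the key is already present,
            // so a single call both reads the hash and writes to it.
            boolean alreadySeen = !seen.add(key);
            flags.add(alreadySeen);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(flagAlreadySeen(Arrays.asList("a", "b", "a", "c", "b")));
    }
}
```

Because each write is visible to the very next record, this behaves like the "commit after every record" insert described above.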

Many thanks,
Andrea

#10 2012-03-26 10:37:32

janhess
Member
Company: Newcastle University
Registered: 2009-05-19
Posts: 1122

Re: Hash-file as lookup

use a database table?

#11 2012-03-28 10:28:14

ponzio
Member
Registered: 2008-10-01
Posts: 13

Re: Hash-file as lookup

janhess wrote:

use a database table?

Yes, I did it this way.
I'd like to know whether there is a way to do this "in memory"...

Could arrays help?

#12 2012-03-28 11:00:50

lijolawrance
Member
Registered: 2010-01-27
Posts: 364

Re: Hash-file as lookup

Hi

Use the tHashInput/tHashOutput components.


Regards
Lijo Lawrance

#13 2012-03-28 11:35:48

ponzio
Member
Registered: 2008-10-01
Posts: 13

Re: Hash-file as lookup

lijolawrance wrote:

Hi

Use the tHashInput/tHashOutput components.

I agree.
Sorry, I've also implemented this with hash... that's the reason I posted here (I'm familiar with DataStage hash).

I was confusing this with my other post, http://www.talendforge.org/forum/viewtopic.php?id=22948
where I'm asking about the possibility of updating the KEY of an "in memory" lookup.
Something like

update lookup set  KEY= X where KEY= Y
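
On a plain Java map, that kind of key update amounts to a remove followed by a put. The sketch below only illustrates the idea; `RekeySketch` and `rekey` are made-up names, and whether Talend's in-memory lookup can be modified this way is not something this thread confirms.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: "update lookup set KEY = X where KEY = Y" on a java.util.Map.
final class RekeySketch {

    // Moves the value stored under oldKey to newKey.
    // Returns false (and changes nothing) when oldKey is not in the map.
    static <V> boolean rekey(Map<String, V> lookup, String oldKey, String newKey) {
        if (!lookup.containsKey(oldKey)) {
            return false;
        }
        V value = lookup.remove(oldKey);
        lookup.put(newKey, value); // overwrites any entry already stored under newKey
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> lookup = new HashMap<>();
        lookup.put("Y", 1);
        System.out.println(rekey(lookup, "Y", "X") + " " + lookup);
    }
}
```

A hash map cannot change a key in place, because the entry's position depends on the key's hash; that is why the entry has to be removed and re-inserted.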

Please reply there if you want to.


Many many thanks!
Andrea

Last edited by ponzio (2012-03-28 11:39:35)
