Topic review (newest first)

ponzio
2012-03-28 11:35:48

lijolawrance wrote:

Hi

use tHashInput/output components

I agree.
Sorry, I've also implemented this with hash components... that's why I posted here (I'm familiar with DataStage hashes).

I was confusing this with my other post, http://www.talendforge.org/forum/viewtopic.php?id=22948
where I'm asking whether it is possible to update the KEY of an "in memory" lookup.
Something like:

update lookup set KEY = X where KEY = Y

If you'd like, please reply there.
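
For illustration only, here is what that kind of key update looks like on a plain in-memory map. This is a minimal Java sketch, not a Talend component or API; the class name `RekeyLookup` and the method `rekey` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: mimics "update lookup set KEY = X where KEY = Y"
// on an ordinary HashMap. Not part of Talend.
final class RekeyLookup {

    // Moves the value stored under oldKey to newKey.
    // Returns false (and changes nothing) if oldKey is not present.
    static <K, V> boolean rekey(Map<K, V> lookup, K oldKey, K newKey) {
        if (!lookup.containsKey(oldKey)) {
            return false;
        }
        V value = lookup.remove(oldKey);
        lookup.put(newKey, value); // overwrites any entry already stored under newKey
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = new HashMap<>();
        lookup.put("Y", "some row");
        System.out.println(rekey(lookup, "Y", "X")); // true
        System.out.println(lookup);                  // {X=some row}
    }
}
```

A hash map cannot change a key in place, so the sketch removes the entry and stores it again under the new key.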


Many many thanks!
Andrea

lijolawrance
2012-03-28 11:00:50

Hi

use tHashInput/output components

ponzio
2012-03-28 10:28:14

janhess wrote:

use a database table?

Yes, I did it that way.
I'd like to know whether there is a way to do this "in memory"...

Could arrays help?

janhess
2012-03-26 10:37:32

use a database table?

ponzio
2012-03-24 12:26:57

Hi.
A DataStage hash is used not only for lookups, but also to implement other techniques.
For example, writing to and reading from the same hash lets you know which records you have inserted into the hash up to the previous record. This technique helps in many cases.
It's like a DB insert that commits after every record.
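
To make the idea concrete, here is a minimal sketch in plain Java of a lookup that is read and written in the same pass. It is not DataStage or Talend code; `RunningLookup` and `tagRows` are hypothetical names, and each row is reduced to its key for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a hash that is both probed and updated row by row.
final class RunningLookup {

    // For each key, reports whether an earlier row already stored it,
    // then stores the current row so that later rows can see it.
    static List<String> tagRows(List<String> keys) {
        Map<String, Integer> seen = new HashMap<>(); // key -> index of the first row with that key
        List<String> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            Integer first = seen.get(key); // read: only rows before i are visible
            if (first == null) {
                out.add(key + ":new");
                seen.put(key, i);          // write: visible from row i + 1 onwards
            } else {
                out.add(key + ":seen-at-" + first);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [A:new, B:new, A:seen-at-0, C:new, B:seen-at-1]
        System.out.println(tagRows(List.of("A", "B", "A", "C", "B")));
    }
}
```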

I need to understand how to implement this technique with Talend.

Many thanks,
Andrea

NSaumande
2010-10-07 18:03:31

amaumont wrote:

- You can't yet reuse data already stored in memory; this is a planned feature.
- You can't yet reuse data already stored on disk, as DataStage does; this is a planned feature.

Have those features been implemented, or are they still planned?

Thanks,
Nicolas

tlittle
2008-06-11 04:54:32

Thanks. I will try it once I have successfully ported from 2.1.4 to 2.3.3. I'm running into some problems with my custom components :)

Writing custom hashing and mapping logic is a lot of trouble and tough to maintain...

mhirt
2008-06-10 18:18:39

tlittle,

Yes, you can try 2.4!
You can have both 2.3.3 and 2.4 on the same computer.
Install TOS 2.4 in another folder and use "Import project".

Regards,

tlittle
2008-06-10 17:35:30

Hi,

I am currently using TOS 2.3.3. I am wondering if TOS 2.4 can help me solve my issue:

My hash lookup files are a few GB in size (more than the physically available memory). Would it be better to move to TOS 2.4 and use the "Store on disk" option?

Previously, I tried using a DB lookup table, but the performance was horribly slow (I can't remember exactly how slow, but it was a few orders of magnitude slower than using in-memory hashes). I resorted to splitting the hash by some predetermined categories, which helped to reduce memory consumption.
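
Roughly, the splitting works like the sketch below. This is plain Java with made-up names (`PartitionedLookup`, `joinByCategory`, `loadSlice`), not my actual job or any Talend API; it assumes the main rows can be grouped by the same category as the lookup, so that only one category's slice of the hash is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: join category by category instead of loading one huge hash.
final class PartitionedLookup {

    // mainKeysByCategory: main-flow keys, grouped by category.
    // loadSlice: loads the lookup entries for a single category (assumed to fit in memory).
    static List<String> joinByCategory(Map<String, List<String>> mainKeysByCategory,
                                       Function<String, Map<String, String>> loadSlice) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : mainKeysByCategory.entrySet()) {
            Map<String, String> slice = loadSlice.apply(group.getKey()); // only this slice is held
            for (String key : group.getValue()) {
                String value = slice.get(key);
                if (value != null) {
                    out.add(key + "=" + value); // inner join: unmatched keys are dropped
                }
            }
            // The slice goes out of scope here and can be garbage-collected.
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> store = Map.of(
                "EU", Map.of("k1", "Paris"),
                "US", Map.of("k2", "Boston"));
        Map<String, List<String>> mainRows = new TreeMap<>(); // sorted for a stable order
        mainRows.put("EU", List.of("k1", "k9"));
        mainRows.put("US", List.of("k2"));
        // prints [k1=Paris, k2=Boston]
        System.out.println(joinByCategory(mainRows, c -> store.getOrDefault(c, Map.of())));
    }
}
```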

Please advise whether I should move to TOS 2.4.

Thank you :)

dsg78
2008-06-10 16:45:26

Thank you both for answering. It was very helpful :)

amaumont
2008-06-10 10:21:51

Hi dsg78,

I can summarize Talend's current behavior regarding lookups:
- By default, Talend loads lookup data into memory once, before the current subjob starts. Hashing is implicit, so you don't need a specific component before tMap or tJoin lookups.

- Since TOS 2.4, the tMap component can store lookup data (and temporary join data) on disk: check the 'Store on disk' option on each lookup where you want it.

- You can't yet reuse data already stored in memory; this is a planned feature.

- You can't yet reuse data already stored on disk, as DataStage does; this is a planned feature.
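
Conceptually, the default behavior in the first point (load the lookup once, then probe it for each main row) looks like the sketch below. It is plain Java written for illustration, not the code Talend generates; `InMemoryLookupJoin`, `buildLookup` and `innerJoin` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an implicit in-memory hash lookup.
final class InMemoryLookupJoin {

    // Step 1: read the whole lookup flow once and index it by key.
    static Map<String, String> buildLookup(String[][] lookupRows) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] row : lookupRows) {
            lookup.put(row[0], row[1]); // row[0] = key, row[1] = value
        }
        return lookup;
    }

    // Step 2: stream the main flow and probe the hash for each row.
    static List<String> innerJoin(List<String> mainKeys, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String key : mainKeys) {
            String value = lookup.get(key);
            if (value != null) {
                out.add(key + "=" + value); // unmatched main rows are dropped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = buildLookup(new String[][] {{"1", "Alice"}, {"2", "Bob"}});
        // prints [2=Bob, 1=Alice]
        System.out.println(innerJoin(List.of("2", "3", "1"), lookup));
    }
}
```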

I hope this information helps.

amaumont

mhirt
2008-06-10 00:46:52

Of course, Talend and DataStage are different products with different philosophies.
Talend doesn't require you to manually create hash files for lookup operations.
You can still do it if you want to, but it is simply not required.

HTH,

dsg78
2008-06-05 14:13:33

Hi all,

I previously worked with DataStage and am just starting with Talend. I'm not quite sure how to do certain things I did with DataStage. For example, I created hash files for lookups. All these hash files were created in separate jobs, which could run in parallel in the job sequence. In the main job I could then reuse them as lookups.
In Talend, I used the data source (an Informix database) directly for the lookup, since there are no real hash files (I think). Is this the best way to do it?
I hope I explained my problem well enough for someone to be able to help me.

Thanks,
Heidi
