Topic review (newest first)

timson1
2008-05-15 00:27:00

plegall wrote:

timson1 wrote:

I also wish we could have something like this: http://search.cpan.org/~creamyg/Sort-Ex … xternal.pm
wrapped in the component.

I have this CPAN module on my TODO list. Having read the description again, I notice its "See also" section mentions Sort::Merge, which looks a lot like what I implemented in my prototype. I'll have a look at that too.

A-ha! Great minds think alike :)
Thank you for the heads-up, I know you guys are on the right track.

plegall
2008-05-15 00:21:55

timson1 wrote:

I also wish we could have something like this: http://search.cpan.org/~creamyg/Sort-Ex … xternal.pm
wrapped in the component.

I have this CPAN module on my TODO list. Having read the description again, I notice its "See also" section mentions Sort::Merge, which looks a lot like what I implemented in my prototype. I'll have a look at that too.

timson1 wrote:

(unfortunately I cannot use tExternalSortRow, because I want to run my code on Windows as well as on Linux/Unix)

GNU sort is available for Windows as part of the GNU core utilities (coreutils).

timson1 wrote:

Well, I guess I need to learn how to submit feature requests now :)

Yes, that's an important skill for community members :-)

timson1
2008-05-15 00:09:20

plegall wrote:

tips to reduce memory usage:

1. try to reduce the number of rows in the lookup: load into memory only what you'll use later, nothing else
2. use the smaller file as the lookup if you have the choice

I've recently written a Perl prototype for a "sorted join", which takes two sorted files as arguments. Performance is really promising, but it has not been implemented in TOS yet because we have trouble making the prototype fit the code generation model if we want to use "lookup" links. A workaround would be a tSortedMerge component with the lookup file as a property and no lookup link. This could be done quite fast (less than 2 days of coding and testing), so please post a feature request if you're interested.

Another solution is to load the files into two database tables (with tMysqlBulkExec, for example) and then perform a join query with tMysqlInput (don't forget to put an index on the lookup key).

amaumont and slanglois have recently implemented a sorted join in 2.4.0RC1 for Java projects. If you give it a try, I'm sure they would welcome any feedback from users.

Thanks for the great reply. I really appreciate it.
I'll wait for the sorted join component.

I also wish we could have something like this: http://search.cpan.org/~creamyg/Sort-Ex … xternal.pm
wrapped in the component.
(unfortunately I cannot use tExternalSortRow, because I want to run my code on Windows as well as on Linux/Unix)
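The CPAN module linked above (Sort::External) performs an external merge sort: sort chunks that fit in memory, spill each sorted run to disk, then merge the runs. As a hedged, platform-neutral illustration of that idea (not that module's API, and not a Talend component), here is a minimal Python sketch; the function name `external_sort` and its `chunk_size` and `key` parameters are hypothetical.

```python
import heapq
import itertools
import os
import tempfile
from typing import Any, Callable, Iterable, Iterator, List, Optional


def external_sort(lines: Iterable[str], chunk_size: int = 100_000,
                  key: Optional[Callable[[str], Any]] = None) -> Iterator[str]:
    """Sort more lines than fit in memory (hypothetical helper, sketch only).

    1. Read at most chunk_size lines, sort them in memory, write them to a temp file.
    2. Lazily k-way merge all sorted temp files with heapq.merge.
    """
    run_paths: List[str] = []
    it = iter(lines)
    try:
        while True:
            # Normalise line endings so each run file has exactly one line per item
            chunk = [ln if ln.endswith("\n") else ln + "\n"
                     for ln in itertools.islice(it, chunk_size)]
            if not chunk:
                break
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.writelines(chunk)
            run_paths.append(path)
        # Assumption: the number of runs stays below the OS open-file limit
        files = [open(p, encoding="utf-8") for p in run_paths]
        try:
            # heapq.merge only keeps one pending line per run in memory
            yield from heapq.merge(*files, key=key)
        finally:
            for f in files:
                f.close()  # close before deleting (required on Windows)
    finally:
        for p in run_paths:
            os.remove(p)
```

For a CSV file, something like `external_sort(open("big.csv", encoding="utf-8"), key=lambda ln: ln.split(",", 1)[0])` would sort by the first column; the sorted output could then feed a sorted join.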

Well, I guess I need to learn how to submit feature requests now :)

plegall
2008-05-15 00:01:21

tips to reduce memory usage:

1. try to reduce the number of rows in the lookup: load into memory only what you'll use later, nothing else
2. use the smaller file as the lookup if you have the choice
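Both tips aim at keeping the in-memory side of a lookup join as small as possible. A minimal Python sketch of a hash-lookup join that applies them follows; it is only an illustration under stated assumptions (rows as lists of strings, unique lookup keys, inner-join semantics), and the helper names `choose_lookup`, `build_lookup` and `hash_join` are hypothetical, not TOS components.

```python
import os
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Row = List[str]


def choose_lookup(path_a: str, path_b: str) -> Tuple[str, str]:
    """Tip 2: return (streamed_path, lookup_path), using the smaller file as the lookup."""
    if os.path.getsize(path_a) <= os.path.getsize(path_b):
        return path_b, path_a
    return path_a, path_b


def build_lookup(rows: Iterable[Row], key_col: int, keep_cols: List[int],
                 wanted_keys: Optional[Set[str]] = None) -> Dict[str, Tuple[str, ...]]:
    """Tip 1: keep only the columns (and, optionally, the keys) the join will use.

    Assumes lookup keys are unique: a later row with the same key overwrites an earlier one.
    """
    lookup: Dict[str, Tuple[str, ...]] = {}
    for row in rows:
        key = row[key_col]
        if wanted_keys is not None and key not in wanted_keys:
            continue  # this row can never be used later, so don't hold it in memory
        lookup[key] = tuple(row[c] for c in keep_cols)
    return lookup


def hash_join(main_rows: Iterable[Row], lookup: Dict[str, Tuple[str, ...]],
              key_col: int) -> Iterator[Row]:
    """Inner join: stream the big input row by row; only the lookup dict is in memory."""
    for row in main_rows:
        match = lookup.get(row[key_col])
        if match is not None:
            yield row + list(match)
```

With real files, the rows would typically come from `csv.reader` over each opened file. Memory use is then bounded by the trimmed lookup, not by the streamed file.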

I've recently written a Perl prototype for a "sorted join", which takes two sorted files as arguments. Performance is really promising, but it has not been implemented in TOS yet because we have trouble making the prototype fit the code generation model if we want to use "lookup" links. A workaround would be a tSortedMerge component with the lookup file as a property and no lookup link. This could be done quite fast (less than 2 days of coding and testing), so please post a feature request if you're interested.
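The Perl prototype itself isn't shown in this thread. As a hedged illustration of the general sort-merge join idea (not that prototype and not a TOS component), here is a Python sketch. It assumes both inputs are already sorted ascending by the join key, in the same order Python's `<` uses on the key (for ASCII keys, that is the byte order GNU `sort` produces under `LC_ALL=C`). The function name `sorted_merge_join` is hypothetical.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple

Row = List[str]


def sorted_merge_join(left: Iterable[Row], right: Iterable[Row],
                      key: Callable[[Row], str]) -> Iterator[Tuple[Row, Row]]:
    """Inner join of two inputs already sorted ascending by the same key.

    Both inputs are read sequentially; only the current group of rows sharing
    one key on the right side is materialised in memory.
    """
    lgroups = groupby(left, key=key)
    rgroups = groupby(right, key=key)
    lk, lrows = next(lgroups, (None, None))
    rk, rrows = next(rgroups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(lgroups, (None, None))  # left key too small: advance left
        elif lk > rk:
            rk, rrows = next(rgroups, (None, None))  # right key too small: advance right
        else:
            right_group = list(rrows)  # keys match: hold one right-hand group only
            for lrow in lrows:
                for rrow in right_group:
                    yield lrow, rrow
            lk, lrows = next(lgroups, (None, None))
            rk, rrows = next(rgroups, (None, None))
```

Usage on two pre-sorted CSV files might look like `sorted_merge_join(csv.reader(fa), csv.reader(fb), key=lambda r: r[0])`. If either file is not actually sorted, matches are silently missed, so the sort order has to be guaranteed upstream.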

Another solution is to load the files into two database tables (with tMysqlBulkExec, for example) and then perform a join query with tMysqlInput (don't forget to put an index on the lookup key).
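The database approach can be sketched outside Talend too. The following minimal Python example uses the standard-library `sqlite3` module as a stand-in for MySQL (an assumption, chosen only so the sketch is self-contained). The function name `load_and_join`, the table names and the two-column (key, value) layout are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def load_and_join(conn: sqlite3.Connection,
                  main_rows: Iterable[Tuple[str, str]],
                  lookup_rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Bulk-load two (key, value) inputs into tables, index the lookup key, join in SQL.

    sqlite3 stands in for MySQL; table and column names are hypothetical.
    """
    conn.execute("CREATE TABLE main_data (k TEXT, a TEXT)")
    conn.execute("CREATE TABLE lookup_data (k TEXT, b TEXT)")
    # executemany accepts any iterable of parameter tuples, e.g. a csv.reader over a file
    conn.executemany("INSERT INTO main_data VALUES (?, ?)", main_rows)
    conn.executemany("INSERT INTO lookup_data VALUES (?, ?)", lookup_rows)
    # Index on the lookup key so the join can probe lookup_data by key instead of scanning it
    conn.execute("CREATE INDEX idx_lookup_k ON lookup_data (k)")
    conn.commit()
    return conn.execute(
        "SELECT m.k, m.a, l.b FROM main_data m JOIN lookup_data l ON l.k = m.k ORDER BY m.k"
    )
```

For files that don't fit in memory, the connection would point at an on-disk database (for example `sqlite3.connect("join.db")`) rather than `":memory:"`, so the tables themselves live on disk.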

amaumont and slanglois have recently implemented a sorted join in 2.4.0RC1 for Java projects. If you give it a try, I'm sure they would welcome any feedback from users.

timson1
2008-05-14 23:32:28

Hello,

I have two really big files that won't fit in memory, and I am trying to join them on their common key.

It is not possible for me to use ELT for some reason.

I was trying to find something like a "Map Sorted" component that would allow joining data without loading it into memory first, but could not find anything.
So what's the best way to do this kind of job?

Thank you!


Powered by FluxBB