Topic review (newest first)

Piero001
2012-06-12 16:41:16

Hey arno,

That seems to be a problem. I think the ID resets when the cash machine restarts... damn.

Hmm, maybe we can try another approach: using the Date as the identifier should be sufficient. I only want to calculate the daily earnings, so a serial number for each receipt is not really necessary, as long as Talend can show the sum of all Netto1 and Netto2 values for each day.
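A minimal sketch of the per-day sum, written in plain Python rather than as a Talend job. It assumes the receipts have already been extracted into rows with the target_out layout (semicolon-delimited, comma as decimal separator). In Talend the same grouping would presumably be a tAggregateRow grouped on Date; `parse_amount` and `daily_totals` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a comma-decimal amount such as '4,19' to Decimal('4.19')."""
    return Decimal(text.replace(",", "."))


def daily_totals(csv_text: str) -> dict[str, tuple[Decimal, Decimal]]:
    """Sum Netto1 and Netto2 per Date for rows in the target_out layout."""
    totals: dict[str, list[Decimal]] = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for row in reader:
        day = totals[row["Date"]]
        day[0] += parse_amount(row["Netto1"])
        day[1] += parse_amount(row["Netto2"])
    return {date: (n1, n2) for date, (n1, n2) in totals.items()}


# The two sample rows from the target schema, with the date left as a placeholder.
sample = """Person;Number;Date;Time;Netto1;VAD 7%;Netto2;VAD 19%;
Mr. Random;0001;DD/MM/YYYY;12:00;4,19;0,29;1,39;0,26;
Ms. Random;0002;DD/MM/YYYY;12:02;1,51;0,11;3,53;0,67;
"""
print(daily_totals(sample))
```

Decimal is used instead of float so that the cent amounts add up exactly.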

Do you think this is possible?

Best regards and many thanks,

Piero

avdbrink
2012-06-12 16:05:14

Hey Piero,

I analyzed your Schema_1.txt file and noticed that there are 2 "records" with an ID of #0001.
Is this a mistake, or can't we assume that this is a unique identifier for your records?
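One way to check the data before choosing a key is to count how often each receipt number occurs. A small Python sketch; `duplicate_numbers` is a hypothetical helper and assumes the numbers have already been pulled out of the records.

```python
from collections import Counter


def duplicate_numbers(numbers: list[str]) -> list[str]:
    """Return every receipt number that occurs more than once, in first-seen order."""
    counts = Counter(numbers)
    return [n for n in counts if counts[n] > 1]


print(duplicate_numbers(["#0001", "#0002", "#0001"]))  # ['#0001']
```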

Still puzzling over a solution, though... it sure ain't an easy job...

Regards,
Arno

Piero001
2012-06-12 14:40:57

Hey Arno!

No, those are all my file schemas. I have over 100,000 input files, each of which follows either Schema_1 or Schema_2. But I want one output file that looks like target_out.

Hope this is not too much work for you! But it is great that you can help me. I'm very thankful!

Thank you and regards,

Piero

avdbrink
2012-06-12 13:42:33

Hi Piero,

I got your email. Are there more files you'd like to send? I've already downloaded the two schema files and the target_out file.

I'll see if I can start a sample job.

Regards,
Arno

Piero001
2012-06-12 12:03:17

Hey Arno,

Did you get my email? Here are two examples of the input file and the target output:

Schema_1.txt
Schema_2.txt
target_out.csv

Regards,

Piero

avdbrink
2012-06-12 10:47:43

Hi Piero,

I'd be glad to help you out if you could supply an input file for your job.

I could try to make a working sample of the job if you'd like.

Regards,
Arno

Piero001
2012-06-12 10:20:25

Hey Arno,

Thank you again for your help :)

The unique identifier of the records is 'Number'; in the screenshot it is the value starting with '#' (top right corner).
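A sketch of pulling that number out of one receipt's text in Python. It assumes, based only on the screenshot description, that the number appears as '#' followed by digits; the sample line is made up and `receipt_number` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: the receipt number is written as '#' followed by digits, e.g. '#0001'.
NUMBER_PATTERN = re.compile(r"#(\d+)")


def receipt_number(record: str) -> Optional[str]:
    """Return the digits of the first '#nnnn' token in a receipt, or None if absent."""
    match = NUMBER_PATTERN.search(record)
    return match.group(1) if match else None


print(receipt_number("Mr. Random   #0001\n12:00"))  # 0001
```

The '#' is dropped so the result matches the Number column of the target schema (0001).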

I tested the job with "---------------" as the delimiter and it works, but I don't know how to use tExtractPositionalFields...
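As far as I understand, tExtractPositionalFields cuts a value into fields by fixed character positions. The same idea in plain Python, with a purely hypothetical layout, since the real column widths of the receipts are not known from the thread:

```python
# Hypothetical layout: (field name, start column, end column) for one receipt line.
LAYOUT: list[tuple[str, int, int]] = [("Person", 0, 12), ("Number", 13, 18)]


def extract_positional(line: str, layout: list[tuple[str, int, int]] = LAYOUT) -> dict[str, str]:
    """Slice a fixed-width line into named fields, trimming the padding spaces."""
    return {name: line[start:end].strip() for name, start, end in layout}


print(extract_positional("Mr. Random   #0001"))  # {'Person': 'Mr. Random', 'Number': '#0001'}
```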

It would be best to extract as much information from each receipt as possible, because with this amount of data I can't be sure what the schema of each receipt looks like...

This job is an advanced version of the job I did before (topic:24266). The goal is again to calculate the earnings, excluding the canceled receipts.
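A sketch of the filtering step in Python. How a canceled receipt is marked is not stated in the thread, so the marker `STORNO` below is only a hypothetical placeholder to be replaced with whatever the real receipts contain.

```python
# Hypothetical marker: the real way a canceled receipt is flagged is unknown.
CANCEL_MARKER = "STORNO"


def active_receipts(records: list[str]) -> list[str]:
    """Drop every receipt whose text contains the cancellation marker (case-insensitive)."""
    return [r for r in records if CANCEL_MARKER not in r.upper()]


print(active_receipts(["#0001 4,19", "#0002 storno 1,51"]))  # ['#0001 4,19']
```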

Do you think Talend can handle this case?

Best regards and thank you for all your help

Piero

avdbrink
2012-06-11 16:28:08

Hi Piero,

First of all: this is a challenge indeed ;)

Second comes the first question: what is the unique identifier of the records? Is it the Number column in your target schema? And if so, where can I find it in your screenshot?

Third: what if you read the "records" as a delimited file with "------------------------------" as the row separator and just one column, then use tExtractPositionalFields (or any of the other Extract components) to parse the "complete" record from that single column?
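The same two-step idea (split on the dashed separator line, then parse each chunk) sketched in Python. The regular expression assumes a separator is a line of three or more dashes, and the sample receipt text is invented; `split_receipts` is a hypothetical helper.

```python
import re

# Assumption: a separator is a line consisting only of three or more dashes.
SEPARATOR = re.compile(r"^-{3,}[ \t]*$", re.MULTILINE)


def split_receipts(text: str) -> list[str]:
    """Split a receipt file into one string per record, dropping empty chunks."""
    return [chunk.strip() for chunk in SEPARATOR.split(text) if chunk.strip()]


sample = """Mr. Random   #0001
4,19
------------------------------
Ms. Random   #0002
1,51
------------------------------
"""
print(split_receipts(sample))
```

Each resulting string then plays the role of the single column that an Extract component would parse further.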

I'm not sure this will work, however :)

Regards,
Arno

Piero001
2012-06-11 15:45:30

Hey Talend Community,

I want to show you my ultimate challenge in Talend.
I want to extract receipt data into a CSV file.

In the screenshot you can see the input file (TXT) and the output type.

I think I should work with positional file metadata, but I don't know how to create such a schema...

Any ideas on how to create a job that can do something like this?


Best regards,

Piero

Target Schema:

Code:

Person;Number;Date;Time;Netto1;VAD 7%;Netto2;VAD 19%;
Mr. Random;0001;DD/MM/YYYY;12:00;4,19;0,29;1,39;0,26;
Ms. Random;0002;DD/MM/YYYY;12:02;1,51;0,11;3,53;0,67;
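Judging from the two sample rows, the VAT columns appear to be 7% of Netto1 and 19% of Netto2, rounded to cents (4,19 × 0,07 = 0,2933 → 0,29; 3,53 × 0,19 = 0,6707 → 0,67). Assuming that holds, and assuming half-up rounding, a Python sketch that produces one line in this layout; `target_row` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def vat(net: Decimal, rate: str) -> Decimal:
    """Tax on a net amount, rounded half-up to whole cents (rounding mode assumed)."""
    return (net * Decimal(rate)).quantize(CENT, rounding=ROUND_HALF_UP)


def fmt(amount: Decimal) -> str:
    """Render an amount with a comma as decimal separator, e.g. 0,29."""
    return f"{amount:.2f}".replace(".", ",")


def target_row(person: str, number: str, date: str, time: str,
               netto1: str, netto2: str) -> str:
    """Build one target_out line; net amounts are given in comma-decimal form."""
    n1 = Decimal(netto1.replace(",", "."))
    n2 = Decimal(netto2.replace(",", "."))
    fields = [person, number, date, time,
              fmt(n1), fmt(vat(n1, "0.07")), fmt(n2), fmt(vat(n2, "0.19"))]
    return ";".join(fields) + ";"  # the target layout ends every line with ';'


print(target_row("Mr. Random", "0001", "DD/MM/YYYY", "12:00", "4,19", "1,39"))
```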
