
Hey Talend Community,
I want to show you my ultimate challenge in Talend.
I want to extract receipt data into a CSV file.
In the screenshot you can see the input file (TXT) and the output type.
I think I should work with File Positional metadata, but I don't know how to create such a schema.
Any ideas how to create a job that can do something like this?
Best regards,
Piero
Target schema:
Person;Number;Date;Time;Netto1;VAD 7%;Netto2;VAD 19%;
Mr. Random;0001;DD/MM/YYYY;12:00;4,19;0,29;1,39;0,26;
Ms. Random;0002;DD/MM/YYYY;12:02;1,51;0,11;3,53;0,67;
Last edited by Piero001 (2012-06-11 15:50:50)
Hi Piero,
First of all: this is a challenge indeed.
Second comes the first question: what is the unique identifier of the records? Is it the Number column in your target schema? And if so, where can it be found in your screenshot?
Third: what if you read the "records" as a delimited file with "------------------------------" as the row separator and just one column, then use a tExtractPositionalFields (or any of the other Extract components) to read the "complete" record from the single column?
I'm not sure this will work, however.
Regards,
Arno
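A minimal sketch of the idea Arno describes, written in plain Java (the language Talend jobs are generated in) rather than as a Talend job: split the file content on the dashed separator line so that each receipt becomes one record, then cut fields out of a line by position. The class name ReceiptSplitter, the helper names, and all column offsets are hypothetical, because the real receipt layout is only visible in the screenshot. This is not how tExtractPositionalFields is implemented; it only illustrates the approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ReceiptSplitter {
    // Assumption: receipts are separated by a line made up only of dashes (10 or more).
    private static final Pattern SEPARATOR = Pattern.compile("(?m)^-{10,}[ \\t]*$");

    // Returns one string per receipt; empty chunks (e.g. after the last separator) are dropped.
    static List<String> splitRecords(String fileContent) {
        List<String> records = new ArrayList<>();
        for (String chunk : SEPARATOR.split(fileContent)) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    // Positional extraction: 'length' characters starting at 'start', trimmed.
    // Lines shorter than expected yield a shorter or empty field instead of an exception.
    static String field(String line, int start, int length) {
        if (start >= line.length()) {
            return "";
        }
        int end = Math.min(line.length(), start + length);
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Hypothetical receipt text; the real layout is not known from the thread.
        String content = String.join("\n",
                String.format("%-16s#0001", "Mr. Random"),
                "------------------------------",
                String.format("%-16s#0002", "Ms. Random"),
                "------------------------------");
        for (String record : splitRecords(content)) {
            String header = record.split("\n")[0];
            // Offsets 0/16 and 17/4 are assumptions made for this sample only.
            System.out.println(field(header, 0, 16) + ";" + field(header, 17, 4));
        }
    }
}
```

Because the two receipt layouts (Schema_1 and Schema_2) may place fields at different positions, a real job would probably need one set of offsets per layout, or a pattern-based extraction instead of fixed positions.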

Hey Arno,
Thank you again for your help.
The unique identifier of the records is 'Number'. In the screenshot it starts with '#' (top right corner).
I tested the job with a delimiter of "---------------" and it is working, but I don't know how to work with tExtractPositionalFields.
It would be best to extract as much information from each receipt as possible. Because of the large amount of data, I can't be sure what the schema of each receipt looks like.
This job is an advanced version of the job I did before ([Forum, topic 24266] [resolved] tFilter Advanced Mode). The mission is again to calculate the earnings without the canceled receipts.
Do you think it is possible to do this with Talend?
Best regards and thank you for all your help
Piero

Hey Arno,
Did you get my email? Here are two examples of the input file and the target output:
Schema_1.txt
Schema_2.txt
target_out.csv
Regards,
Piero
Last edited by Piero001 (2012-06-12 12:04:35)
Hi Piero,
I got your email. Are there more files you'd like to send? I've already downloaded the two schema files and the target_out file.
I'll see if I can start a sample job.
Regards,
Arno

Hey Arno!
No, those are all my file schemas. I have over 100,000 input files, each of which follows either Schema_1 or Schema_2. But I want one output file that looks like target_out.
I hope this is not too much work for you! It is great that you can help me. I'm very thankful!
Thank you and regards,
Piero
Hey Piero,
I analyzed your Schema_1.txt file and noticed that there are 2 "records" with an ID of #0001.
Is this a mistake, or can't we assume that this is a unique identifier for your records?
Still puzzling over a solution though... it sure ain't an easy job.
Regards,
Arno

Hey Arno,
That seems to be a problem. I think the ID resets after the cash machine restarts... damn.
Hmm, maybe we can try another method: using the date as the unique identifier seems sufficient, because I want to calculate the daily earnings. A serial number for each receipt is not really necessary if Talend can show the sum of all Netto1 and Netto2 for each day.
Do you think that this is possible?
Best regards and many thanks,
Piero
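The per-day total Piero asks about can be sketched in plain Java as a group-by on the Date column. This is only an illustration under assumptions: the rows are taken to be data rows (no header line) in the target schema shown earlier, amounts use a decimal comma, and the class name DailyEarnings and its helpers are hypothetical. Inside a Talend job, a grouping component such as tAggregateRow would be the likely equivalent. Filtering out canceled receipts is not covered here.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DailyEarnings {
    // Column positions in the target schema:
    // Person;Number;Date;Time;Netto1;VAD 7%;Netto2;VAD 19%;
    private static final int DATE = 2;
    private static final int NETTO1 = 4;
    private static final int NETTO2 = 6;

    // Converts an amount with a decimal comma ("4,19") to a BigDecimal.
    static BigDecimal parseAmount(String text) {
        return new BigDecimal(text.trim().replace(',', '.'));
    }

    // Sums Netto1 + Netto2 per date, keeping dates in order of first appearance.
    static Map<String, BigDecimal> sumPerDay(List<String> dataRows) {
        Map<String, BigDecimal> totals = new LinkedHashMap<>();
        for (String row : dataRows) {
            String[] fields = row.split(";");
            if (fields.length <= NETTO2) {
                continue; // skip rows that are too short to hold both net amounts
            }
            BigDecimal net = parseAmount(fields[NETTO1]).add(parseAmount(fields[NETTO2]));
            totals.merge(fields[DATE].trim(), net, BigDecimal::add);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample rows: amounts are from the target schema, the date is made up for illustration.
        List<String> rows = Arrays.asList(
                "Mr. Random;0001;11/06/2012;12:00;4,19;0,29;1,39;0,26;",
                "Ms. Random;0002;11/06/2012;12:02;1,51;0,11;3,53;0,67;");
        sumPerDay(rows).forEach((date, total) -> System.out.println(date + ";" + total));
    }
}
```

Grouping by date sidesteps the duplicate #0001 problem, since the receipt number is never used as a key. BigDecimal is used instead of double so that cent amounts add up exactly.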