#1 2009-05-04 16:15:19

zrl@MWV
New member
Registered: 2009-05-04
Posts: 3

Parsing pretty text files

Tags: [file, parse, text]

I am working on a project to parse a text file. The problems I face are the following:

1. The file's format and content change slightly at the whim of the person generating the report (which is why I am ruling out writing custom code to do the parsing).
2. The TXT file is made to be human readable, in other words pretty: everything is lined up, so the delimiter (spaces) varies depending on the length of the data.
3. The file contains three main parts. The first two are in the following format:

"Name Data   Name Data     Name Data"

and the third is in this format:

Name    Name    Name
Data     Data      Data

I started looking into this software because, unlike me, the main users are not code monkeys, so I figured that with the graphical interface, making a small change to the parsing would be pretty simple. What I am looking for from this post is a direction and maybe some ideas: which components would be the best fit for parsing the two formats of data in a way that non-programmers can easily change? Usually this would be no problem, but since the format changes on a whim and non code monkeys have to keep up with the changes, it has become a bit more difficult. I look forward to hearing some input.

Thank You,

Zachary Long

Last edited by zrl@MWV (2009-05-04 16:21:34)


#2 2009-05-04 16:25:27

shong
Talend team
Registered: 2007-08-29
Posts: 10310
Website

Re: Parsing pretty text files

Hello friend

Can you show us an example of content of file?

Best regards
   
         shong



#3 2009-05-04 16:37:52

zrl@MWV
New member
Registered: 2009-05-04
Posts: 3

Re: Parsing pretty text files

Here is a piece of one of the many files I need to parse; it shows the three sections. Something I forgot to mention is that there are several of these reports per input file, separated by "END OF REPORT", which I figure will not be too hard to use for splitting the reports apart. The ultimate goal will be to combine all data using the data as a delimiter.




Zachary Long


Uploaded Images

Last edited by zrl@MWV (2009-05-04 16:41:59)


#4 2009-05-06 21:56:50

zrl@MWV
New member
Registered: 2009-05-04
Posts: 3

Re: Parsing pretty text files

Hello all, I am guessing by the lack of response that everyone is just as stumped as I am?

Thanks

Zachary Long


#5 2009-05-06 23:54:53

tnewman
Member
Company: Lunexa
Registered: 2008-11-15
Posts: 194
Website

Re: Parsing pretty text files

Hi Zachary,

Here's my 2 cents worth.

I am assuming you want each report to be a single output record (basically a many input -> one output scenario).

I would define the input as space (' ') delimited and output to a delimited file (';'). You will end up with a file with up to 7 columns (I think I counted correctly).

Since no two lines are the same, I would then use tJavaRow to build the output record. If you do a search on talend forum you can find examples of this.

You will need to check field 1 on some lines (i.e. 'for MACHINE') and field 4 on others (i.e. TRIM). You may also need to concatenate some fields back together to get the output you need.
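
To make the field checks a bit more concrete, here is a minimal plain-Java sketch of the kind of logic that could go into a tJavaRow. It is only an illustration: the class and method names are made up, the sample line is invented, and the exact position and spelling of the MACHINE and TRIM markers are assumptions, since the real file was only posted as an image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Talend API: plain Java that mimics the per-line checks.
final class PrettyLineSketch {

    // Split on runs of spaces so the padding used for alignment yields no empty fields.
    static List<String> fields(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(Arrays.asList(trimmed.split(" +")));
    }

    // Assumed dispatch: look at field 1 for one kind of line and field 4 for another.
    static String kindOf(List<String> f) {
        if (f.size() >= 1 && f.get(0).equals("MACHINE")) {
            return "machine";
        }
        if (f.size() >= 4 && f.get(3).equals("TRIM")) {
            return "trim";
        }
        return "other";
    }

    // Glue fields [from, to) back together with single spaces,
    // for values that originally contained spaces.
    static String rejoin(List<String> f, int from, int to) {
        return String.join(" ", f.subList(from, to));
    }

    public static void main(String[] args) {
        // Invented sample line; the real layout will differ.
        List<String> f = fields("MACHINE   A12    LINE  NORTH WING");
        System.out.println(kindOf(f) + " -> " + rejoin(f, 3, f.size()));
    }
}
```

Note that rejoining with a single space loses the original spacing inside a value, which is usually fine for report data but worth checking.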

Since there are several reports in one input file, you will need to generate a simple sequence number for each report, and also each line of each report.

Then you can sort by report/line number (descending) and then use tUniqRow on the report number (checking the 'Only once each duplicated key' option under the Advanced tab).
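
As a rough sketch of the numbering idea outside Talend, the plain-Java snippet below assigns a report number and a line number to every input line and then keeps the highest-numbered line of each report. The class and method names are hypothetical, the "END OF REPORT" separator is taken from the earlier post, and keeping the last line per report is only one reading of what the descending sort plus unique step is meant to leave behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Talend component.
final class ReportNumberingSketch {

    // One input line tagged with its report number and its position inside that report.
    static final class NumberedLine {
        final int report;
        final int line;
        final String text;

        NumberedLine(int report, int line, String text) {
            this.report = report;
            this.line = line;
            this.text = text;
        }
    }

    // Walk the file once; an "END OF REPORT" line closes the current report
    // and is not emitted itself.
    static List<NumberedLine> number(List<String> lines) {
        List<NumberedLine> out = new ArrayList<NumberedLine>();
        int report = 1;
        int line = 0;
        for (String text : lines) {
            if (text.trim().equals("END OF REPORT")) {
                report++;
                line = 0;
                continue;
            }
            line++;
            out.add(new NumberedLine(report, line, text));
        }
        return out;
    }

    // Keep only the last line seen for each report number.
    // Assumes the input is already in file order, as produced by number().
    static List<NumberedLine> lastPerReport(List<NumberedLine> numbered) {
        List<NumberedLine> out = new ArrayList<NumberedLine>();
        for (NumberedLine n : numbered) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).report == n.report) {
                out.set(last, n);
            } else {
                out.add(n);
            }
        }
        return out;
    }
}
```

This only pays off if the job has accumulated the whole report into the record carried by that final line, for example by building it up in the tJavaRow step.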

It's not pretty but neither is the input.

Give it a go. If you have problems, maybe you can copy/paste a file sample instead of an image. I might have time to see if I can get it to work.

Bye for now,


------------------
Talend Version - TIS 4.1.2
Generated Code - Java
OS - WinXP SP3 / Linux


#6 2009-05-07 01:01:38

JohnGarrettMartin
Member
Registered: 2009-01-07
Posts: 762

Re: Parsing pretty text files

Regex-based Perl parsing would be a great fit for this problem. You can use clever regexes to locate your position in the file, and then parse out the data you need.

If you're stuck with Java, check out this package:

http://java.sun.com/j2se/1.4.2/docs/api … mmary.html

http://java.sun.com/developer/technical … /1.4regex/
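
A minimal sketch of the regex approach in Java, assuming the package meant above is java.util.regex. The class name, method names and sample values are made up for illustration, and the patterns assume that neither names nor data contain spaces, which may not hold for the real reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper covering the two layouts described in the first post.
final class RegexPairsSketch {

    // A name token, a run of whitespace, then a data token.
    private static final Pattern PAIR = Pattern.compile("(\\S+)\\s+(\\S+)");

    // Layout 1: "Name Data   Name Data     Name Data" on a single line.
    static Map<String, String> inlinePairs(String line) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    // Layout 2: a row of names followed by a row of data in the same order.
    static Map<String, String> stackedPairs(String nameLine, String dataLine) {
        String[] names = nameLine.trim().split("\\s+");
        String[] values = dataLine.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException("name and data rows differ in length");
        }
        Map<String, String> out = new LinkedHashMap<String, String>();
        for (int i = 0; i < names.length; i++) {
            out.put(names[i], values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample lines.
        System.out.println(inlinePairs("Speed 40   Temp 71     Mode AUTO"));
        System.out.println(stackedPairs("Speed    Temp    Mode", "   40      71     AUTO"));
    }
}
```

If values can contain single spaces, splitting on two or more spaces (`\\s{2,}`) instead of one is a common workaround, at the cost of depending on how the report happens to be padded.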
