#1 2012-06-12 07:07:20

darian3118
Member
Registered: 2012-05-09
Posts: 17

What delimiter to use with hadoop files

I'm trying to parse a flat file that I pulled from Hadoop. It uses a special delimiter, the ^A character, but entering that as the delimiter didn't work.
I've already tried:
"\u0007"
These don't work in Talend either: \u0000, '\u0000', "\u0000", ^@, ^A.

Anyone got this working?

Offline

#2 2012-06-21 02:04:06

darian3118
Member
Registered: 2012-05-09
Posts: 17

Re: What delimiter to use with hadoop files

More attempts that don't work:
'\0'+"" and '\0'. Notepad++ shows this nonprintable character as SOH. Any ideas on how to specify it properly?

Offline

#3 2012-06-21 03:03:32

darian3118
Member
Registered: 2012-05-09
Posts: 17

Re: What delimiter to use with hadoop files

OK, finally got it, thanks to http://mindprod.com/jgloss/ascii.html.
For tFileInputDelimited, set the delimiter as: Character.toString('\1')
It chokes if you just put in '\1' (a char), so you have to turn it into a String first.
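
For anyone who wants to check this outside Talend, here is a minimal plain-Java sketch of the same idea. SOH (^A) is code point 1, so '\1' (octal escape) and "\u0001" name the same character. The class name, method name, and sample values are made up for illustration; this is not what Talend runs internally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Talend.
class SohDelimiterDemo {
    // SOH / Ctrl-A is code point 1; '\1' is a Java octal char escape.
    static final String SOH = Character.toString('\1');

    // Split one line on SOH. Pattern.quote keeps the delimiter literal,
    // and the -1 limit preserves trailing empty fields.
    static String[] splitFields(String line) {
        return line.split(Pattern.quote(SOH), -1);
    }

    public static void main(String[] args) {
        // Made-up sample record with three SOH-separated fields.
        String line = "alice" + SOH + "42" + SOH + "NY";
        for (String field : splitFields(line)) {
            System.out.println(field);
        }
    }
}
```

If that is right, "\u0001" as a String should behave the same as Character.toString('\1'), though I have only confirmed the latter in the tFileInputDelimited field.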

Last edited by darian3118 (2012-06-21 03:03:56)

Offline
