• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Encoding in perl file input (at least) components

#1 2008-02-18 18:39:15

btence
New member
Registered: 2008-02-18
Posts: 9

Encoding in perl file input (at least) components

Using TOS 2.3.0.

I have been trying to read a file using the Perl file input components. The file appears to be encoded in ISO-8859-1.

I can't manage to read a file correctly using a Perl tFileInputDelimited (or tFileInputPositional) component linked to a tLogRow: "é" characters are displayed as ? on a black background.
I tried changing the "encoding type" in the "Advanced settings" tab, but nothing changed.

I built the same job in Java and everything works fine.

I am not an expert on encoding. Is this a bug, or is there something I should have done and didn't?

Thanks for your interest.

Brice

#2 2008-02-19 02:59:02

shong
Talend team
Registered: 2007-08-29
Posts: 10310

Re: Encoding in perl file input (at least) components

Hi

Can you show us your file and the output of your job?

Best regards

          shong


Email:shong@talend.com

#3 2008-02-19 09:29:33

btence
New member
Registered: 2008-02-18
Posts: 9

Re: Encoding in perl file input (at least) components

As a test, I used a one-line file containing (encoded in ISO-8859-1):

Code:

 Station Piézométrique

Here is the output of the Perl job:

Code:

Starting job read_file_encoded_8859 at 09:05 19/02/2008.
 Station Pi�zom�trique
Job read_file_encoded_8859 ended at 09:05 19/02/2008. [exit code=0]

Here is the output of the Java job:

Code:

Starting job read_file_encoded_8859_java at 09:16 19/02/2008.
 Station Piézométrique
Job read_file_encoded_8859_java ended at 09:16 19/02/2008. [exit code=0]

BTW, I am running Linux (Ubuntu Gutsy).
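
For reference, a minimal Perl sketch of why the Perl output most likely looks like this (the sample line is hard-coded here instead of being read from the file). In ISO-8859-1, "é" is the single byte E9, whereas in UTF-8 it is the two bytes C3 A9. A console that decodes its input as UTF-8 treats a lone E9 followed by "z" as an invalid sequence and displays the replacement character �.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The sample line as raw ISO-8859-1 bytes, i.e. what the file contains.
my $latin1_bytes = "Station Pi\xE9zom\xE9trique";

# Decode the bytes into a Perl character string, then re-encode as UTF-8.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# Hex dumps: each "é" is E9 in Latin-1 but C3 A9 in UTF-8.
printf "latin1: %s\n", unpack('H*', $latin1_bytes);
printf "utf8:   %s\n", unpack('H*', $utf8_bytes);
```

The UTF-8 version is two bytes longer than the Latin-1 one, one extra byte per "é".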


Last edited by btence (2008-02-19 10:49:52)

#4 2008-02-19 13:49:43

plegall
Member
Registered: 2006-09-19
Posts: 1586

Re: Encoding in perl file input (at least) components

As explained in [Forum, topic 1262] "Not able to preview UTF-16LE (w/BOM) encoded CSV file" (see also the topics tagged perl + encoding), the encoding property is currently not used in the generated Perl code for the tFileInput* and tFileOutput* components. We need to work on it. I've just created [Bugtracker, feature 3130, fixed] "[tFileInputPositional, tFileOutputDelimited] manage encoding". I advise you to monitor this feature to know when it is resolved.
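
To illustrate what honouring the encoding property could look like on the Perl side, here is a minimal sketch (not the code Talend generates): the file is opened with a PerlIO `:encoding(...)` layer, so bytes are decoded into characters while reading. The helper name `read_rows`, the `;` delimiter and the temporary test file are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a delimited file, decoding it with the given
# encoding (e.g. the value of the component's "encoding type" setting).
sub read_rows {
    my ($path, $encoding, $sep) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "cannot open $path: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [ split /\Q$sep\E/, $line, -1 ];   # keep empty fields
    }
    close $fh;
    return \@rows;
}

# Create a one-line test file containing raw ISO-8859-1 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "1;Station Pi\xE9zom\xE9trique\n";
close $tmp;

my $rows = read_rows($path, 'ISO-8859-1', ';');
printf "%d fields, second field has %d characters\n",
    scalar @{ $rows->[0] }, length $rows->[0][1];
```

Because decoding happens in the layer, the same helper would handle a UTF-8 file by passing 'UTF-8' instead; read without a layer, such a file would yield two bytes per "é".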

Under GNU/Linux your JVM uses UTF-8, so if Perl sends Latin-1 encoded characters to the console you get garbage: you have to tell Perl to write UTF-8 characters to STDOUT. I suppose the bad characters on standard output are not your real problem, but here is a workaround: add a tPerl component before the tFileInput* (connected with an OnSubjobOk trigger link) containing the following code:

Code:

binmode(STDOUT, ':utf8');
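
A sketch of why this one-liner is enough for Latin-1 input: without a decoding layer, Perl treats each input byte as the character with the same code point, and code points 0x00 to 0xFF coincide with ISO-8859-1. A `:utf8` output layer therefore writes byte E9 as the UTF-8 pair C3 A9. In the example, an in-memory filehandle stands in for STDOUT so the written bytes can be inspected. This reasoning only holds for ISO-8859-1, not for encodings such as UTF-16LE.

```perl
use strict;
use warnings;

# Raw line as read from the ISO-8859-1 file without any decoding layer:
# each byte >= 0x80 is treated as the character with that code point.
my $line = "Station Pi\xE9zom\xE9trique";

# Stand-in for STDOUT: an in-memory filehandle whose bytes we can inspect.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
binmode($fh, ':utf8');    # same call as the tPerl workaround, applied to $fh
print {$fh} $line, "\n";
close $fh;

# Each "\xE9" has been written as the two bytes C3 A9.
print unpack('H*', $out), "\n";
```

`binmode(STDOUT, ':encoding(UTF-8)')` should give the same bytes here, with stricter handling of malformed data.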

#5 2008-02-19 14:23:15

btence
New member
Registered: 2008-02-18
Posts: 9

Re: Encoding in perl file input (at least) components

plegall wrote:

For your current problem, bad characters in standard output are not the real problem I suppose,

Yes, indeed. I wanted to use a tReplace component on that link. For the moment, I will use a different regexp, or Java.
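
If the regexp runs directly on the undecoded value, the pattern has to match Latin-1 bytes. A minimal sketch, assuming the component simply applies an ordinary Perl regexp to the field value (its generated code is not shown in this thread): a pattern holding the UTF-8 bytes of "é" misses, matching byte E9 works, and decoding first lets you work on real characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw ISO-8859-1 bytes, as the field arrives when no decoding is done.
my $raw = "Station Pi\xE9zom\xE9trique";

# A pattern holding the UTF-8 bytes of "é" (what you get from a UTF-8
# source file without "use utf8") does not match the Latin-1 byte E9.
my $utf8_e      = "\xC3\xA9";
my $matches_raw = ($raw =~ /\Q$utf8_e\E/) ? 1 : 0;

# Matching the Latin-1 byte itself does work on the raw value.
my $matches_latin1 = ($raw =~ /\xE9/) ? 1 : 0;

# Cleaner: decode first, then work on characters (U+00E9 is "é").
my $text = decode('ISO-8859-1', $raw);
(my $replaced = $text) =~ s/\x{E9}/e/g;

print "$matches_raw $matches_latin1 $replaced\n";
```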

Sorry for not finding the other topic about this issue. But now there is an entry in the bugtracker... Someday TOS's prince will come, someday he'll solve that bug.

Last edited by btence (2008-02-19 14:47:54)
