Using TOS 2.3.0.
I have been trying to read a file using Perl File Input components. The file encoding is (at least seems to be) ISO-8859-1.
I can't manage to read a file correctly using a tFileInputDelimited (or a tFileInputPositional) Perl component linked to a tLogRow: "é" characters are displayed as ? on a black background.
I tried to change the "encoding type" in the "Advanced settings" tab but nothing changed.
I did the same job in Java and everything is OK.
I am not an expert on encoding. So is this a bug, or is there something I should have done and didn't?
Thanks for your interest.
Can you show your file and what's the result of your job?
To try it out, I used a one-line file which contains (encoded in ISO-8859-1):
Here is the result of the perl job :
Starting job read_file_encoded_8859 at 09:05 19/02/2008.
Station Pi�zom�trique
Job read_file_encoded_8859 ended at 09:05 19/02/2008. [exit code=0]
Here is a result of the Java job :
Starting job read_file_encoded_8859_java at 09:16 19/02/2008.
Station Piézométrique
Job read_file_encoded_8859_java ended at 09:16 19/02/2008. [exit code=0]
BTW, I am running under Linux (Ubuntu Gutsy).
Last edited by btence (2008-02-19 10:49:52)
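The two outputs above are consistent with ISO-8859-1 bytes being written unchanged to a console that expects UTF-8. Here is a minimal sketch of that mechanism in Python, used purely for illustration: it is neither Talend-generated code nor Perl, and the function names are hypothetical. It assumes the file holds the sample line from the post, encoded in ISO-8859-1.

```python
# Illustration only: hypothetical helper names, not Talend or Perl code.

# The sample line from the post, as the bytes an ISO-8859-1 file would hold.
latin1_bytes = "Station Piézométrique".encode("iso-8859-1")


def show_as_utf8(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 console would, marking invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def transcode_latin1_to_utf8(raw: bytes) -> bytes:
    """Decode with the file's real encoding, then re-encode for the console."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Latin-1 bytes sent as-is: each 0xE9 byte is invalid UTF-8 on its own.
broken = show_as_utf8(latin1_bytes)

# Bytes transcoded first: the accented characters survive.
fixed = show_as_utf8(transcode_latin1_to_utf8(latin1_bytes))

# ascii() keeps the printed result independent of the local console encoding.
print(ascii(broken))
print(ascii(fixed))
```

In the first result each "é" becomes the replacement character U+FFFD, which matches the Perl job output; in the second the text is intact, as in the Java job output.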
As you can read in [Forum, topic 1262] Not able to preview UTF-16LE (w/BOM) encoded CSV file (see topics related to tags perl + encoding), the encoding property is not used in the generated code for tFileInput* and tFileOutput*. We have to work on it. I've just created [Bugtracker, feature 3130, fixed] [tFileInputPositional, tFileOutputDelimited] manage encoding. I advise you to monitor this feature to know when it will be resolved.
Your JVM under GNU/Linux uses UTF-8, so if you send Latin-1 encoded characters, you get mud; you have to tell Perl to send UTF-8 characters to STDOUT. For your current problem, bad characters in standard output are not the real problem I suppose, but here is a workaround: add a tPerl before tFileInput* (linked with an OnSubjobOk trigger) with the following code:
For your current problem, bad characters in standard output are not the real problem I suppose,
Yes indeed. I wanted to use a tReplace component on that link. I will use a different regexp or Java for the moment.
Sorry for not having found the other topic about that issue. But now there is an entry in the bugtracker ... Some day TOS's prince will come, some day he'll solve that bug.
Last edited by btence (2008-02-19 14:47:54)