[resolved] tFileInputDelimited not reading international characters (UTF-8)

Topic review (newest first)

dadumas
2012-01-25 23:34:22

So, here is what I did to resolve:

I opened the source file in Firefox (File > Open), then right-clicked and chose View Page Info, which shows the character encoding. I saw that the file was actually encoded as UTF-16, not UTF-8 as I had thought. I then changed the encoding on the tFileInputDelimited to custom "UTF-16" in Talend, and it worked fine.

Dave
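
For anyone who would rather check the encoding without a browser, here is a minimal sketch in plain Java (not a Talend component) that looks at the first bytes of a file for a byte-order mark. The class name `BomSniffer` and the method `detect` are made up for illustration. It only recognises the UTF-8 and UTF-16 marks, it does not tell UTF-32LE apart from UTF-16LE, and a file with no mark at all returns null, so treat the result as a hint rather than proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomSniffer {

    /** Returns a charset name guessed from a leading byte-order mark, or null if none is found. */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null; // no mark: the encoding cannot be decided this way
    }

    public static void main(String[] args) throws IOException {
        byte[] head = new byte[3];
        int n;
        // Only the first few bytes are needed, so the file is not loaded in full.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);
        }
        String guess = detect(Arrays.copyOf(head, Math.max(n, 0)));
        System.out.println(guess == null ? "no byte-order mark found" : guess);
    }
}
```

Run it with the path to the CSV file as the only argument. If it reports one of the UTF-16 variants, setting the component's encoding to custom "UTF-16" as described above is the thing to try.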

dadumas
2012-01-24 16:28:36

I can. However, I think the issue is that Talend is not resolving UTF-8 encoded data. In the screenshots, there are characters that Talend cannot resolve. I am struggling with this, as I cannot find any other posts that describe the same problem.

janhess
2012-01-24 10:29:09

Have you tried giving your ACD_No a size?

dadumas
2012-01-24 00:50:57

Shong,

Well, the first column is a short (Integer).

I changed the first column to a string in the tFileInputDelimited and added a tConvertType after it. In the tConvertType, I convert the first column from string to short.

I now get a new error (new "Convert" screenshots attached).

Dave

shong
2012-01-21 02:22:29

Hi Dave

From the error message, we can see that a NumberFormatException is thrown by tFileInputDelimited_2: one of the columns is read using the Integer/int data type. Try changing it to the string data type.

Best regards
Shong
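
One plausible way an encoding mismatch can surface as a NumberFormatException on a numeric column is sketched below in plain Java. This is an illustration under assumptions, not the code Talend generates: the class `EncodingMismatchDemo`, the method `tryParse`, and the sample value "42" are all made up. The same bytes are decoded with two charsets and then parsed as an integer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Decodes raw bytes with the given charset and parses them as an int; returns null on failure. */
    static Integer tryParse(byte[] raw, Charset charset) {
        String text = new String(raw, charset);
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return null; // the decoded text is not a clean number
        }
    }

    public static void main(String[] args) {
        // Java's UTF_16 encoder writes a byte-order mark followed by two bytes per character.
        byte[] raw = "42".getBytes(StandardCharsets.UTF_16);

        // Decoded with the matching charset, the mark is consumed and the digits parse.
        System.out.println("as UTF-16: " + tryParse(raw, StandardCharsets.UTF_16));

        // Decoded as UTF-8, the mark and the zero bytes stay in the string, so parsing fails.
        System.out.println("as UTF-8:  " + tryParse(raw, StandardCharsets.UTF_8));
    }
}
```

Switching the column to a string type avoids the exception, but if the declared encoding is wrong the stray characters are still in the value, which would explain a later conversion step failing in the same way.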

dadumas
2012-01-20 22:58:47

"GBK" did not work.  I have escalated this to Talend support.
thanks,
Dave

dadumas
2012-01-19 20:31:29

Thanks. I am investigating another issue with this first. If that does not work, I will definitely try your suggestion. In any case, I will keep this post updated.

Thanks!

Dave

pedro
2012-01-19 06:17:56

Hi

Try this. Set Encoding "Custom"->"GBK".

Regards,
Pedro

dadumas
2012-01-18 16:41:56

I am reading UTF-8 encoded CSV text files, but I am getting errors when reading the file with tFileInputDelimited. Once this is working, the data will be saved (via a tMap -> tOracleOutput) to Oracle 11g. I am not sure whether I then need to set advanced options on the tOracleOutput. The Oracle database has been configured to store multi-byte characters.

Probably something simple I am missing.

I have attached screenshots.

Dave
