• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » How to handle special char �a�� as db not supporting it

#1 2011-06-13 06:04:51

Vicky
New member
Registered: 2011-04-12
Posts: 3

How to handle special char �a�� as db not supporting it

We are using a Sybase database, and it does not support special characters like �a��.
Inserting them raises an SQL exception.

It seems that these special characters are Unicode, and our database does not support Unicode.

How can we skip records containing such special characters? Is there any setting in Talend that can help, or any workaround?

I am totally stuck: the source of the file is not in our control, and they keep sending junk characters that make our process fail. Please assist.


#2 2011-06-13 10:50:04

alevy
Member
Registered: 2009-11-20
Posts: 1478

Re: How to handle special char �a�� as db not supporting it

You could use a tFilterRow with the advanced condition:
  input_row.columnName.matches("[A-Za-z0-9]*")

This will allow through only those rows where the field columnName comprises only the characters A to Z, a to z or 0 to 9 in any combination. (It's a regular expression check.)  Just add any additional characters you want to allow between the square brackets.
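As a rough illustration outside Talend, the same check can be sketched in plain Java. The class name AllowedCharsFilter, the helper names and the extra allowed characters (space, comma, period, exclamation mark) are assumptions made for this example, not part of tFilterRow. The null check is also an addition: calling matches() on a null column value would throw a NullPointerException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AllowedCharsFilter {
    // Same idea as the tFilterRow condition, with space, comma, period and '!' added to the allowed set.
    static final String ALLOWED = "[A-Za-z0-9 ,.!]*";

    // Null is treated as a rejected value here (an assumption for this sketch).
    static boolean isAllowed(String value) {
        return value != null && value.matches(ALLOWED);
    }

    // Keep only the values made up entirely of allowed characters.
    static List<String> keepAllowed(List<String> values) {
        List<String> kept = new ArrayList<String>();
        for (String v : values) {
            if (isAllowed(v)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The second value contains an accented e, so it is filtered out.
        List<String> rows = Arrays.asList("Hello, World!", "caf\u00E9", "ABC123", "");
        System.out.println(keepAllowed(rows));
    }
}
```

Note that the empty string passes, because `*` also matches zero characters; use `+` instead if empty fields should be rejected as well.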

But first make sure that you've set the encoding for the file and database components correctly.
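To see why the encoding setting matters, here is a small plain-Java sketch (not Talend-specific; the class and method names are made up for the example). It decodes the same bytes with two different charsets. When the reader assumes the wrong one, a single accented letter turns into two junk characters, which is one common way "special characters" end up in a feed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Decode bytes as text using the charset the reader believes the data is in.
    static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // U+00F1 (n with tilde) occupies two bytes in UTF-8: 0xC3 0xB1.
        byte[] fileBytes = "\u00F1".getBytes(StandardCharsets.UTF_8);

        String right = decode(fileBytes, StandardCharsets.UTF_8);      // one character
        String wrong = decode(fileBytes, StandardCharsets.ISO_8859_1); // two characters

        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```

If the wrongly decoded form is what reaches the database, fixing the encoding on the input component may remove the problem without filtering any rows.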


#3 2011-06-13 12:52:55

walkerca
Member
Company: Bekwam, Inc.
Registered: 2011-01-12
Posts: 253

Re: How to handle special char �a�� as db not supporting it

Hi,

I wrote a routine called BRules.toCharset() that will convert the characters.  It's on the Talend Exchange.  toCharset() can convert Unicode "down" to US-ASCII, ISO-8859-1, and Windows Latin-1.  The function's primary benefit is its ability to replace an unmappable character with a space or another character of your choice, making it more presentable.

http://www.talendforge.org/exchange/tos … hp?eid=354

If you don't want to bother with the function, you can call the following on your String.  However, you'll have a bunch of question marks for the unmappable chars.

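String s = "Hello, World!ñ\u0100\u2122";
// Encode to US-ASCII, then decode with the same charset rather than the platform default.
// The three characters after "!" have no US-ASCII mapping, so each one is encoded as '?'.
System.out.println( new String(s.getBytes("US-ASCII"), "US-ASCII") );
// result=Hello, World!???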
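If question marks are not acceptable, the standard java.nio.charset.CharsetEncoder lets you pick the replacement yourself. The following is only a sketch of that idea using JDK classes; it is not the source of BRules.toCharset(), and the class name DowngradeCharset and the method downgrade() are invented for this example. It assumes a single-byte target charset such as US-ASCII or ISO-8859-1 and an ASCII replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DowngradeCharset {
    // Encode input into the target charset, substituting 'replacement' for
    // every character the target charset cannot represent.
    static String downgrade(String input, Charset target, char replacement)
            throws CharacterCodingException {
        if (replacement > 0x7F) {
            throw new IllegalArgumentException("replacement must be an ASCII character");
        }
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { (byte) replacement });
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
        // Decode with the same charset to get back a String that is safe for the target.
        return target.decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String s = "Hello, World!\u00F1\u0100\u2122";
        System.out.println(downgrade(s, StandardCharsets.US_ASCII, ' '));
        System.out.println(downgrade(s, StandardCharsets.ISO_8859_1, '_'));
    }
}
```

With US-ASCII all three trailing characters are replaced; with ISO-8859-1 the ñ survives and only the last two are replaced.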

-Carl

Last edited by walkerca (2011-06-13 12:53:26)


Visit bekwam.blogspot.com for Talend topics and tutorials.  Twitter @bekwaminc for updates.

