• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » [resolved] Read HTML tags From XML

#1 2012-05-15 06:55:35

umeshrakhe
Member
Registered: 2011-03-14
Posts: 155

[resolved] Read HTML tags From XML

Tags: [encoding, html, schema, xml]

Hi,
I have an XML file that contains HTML tags in one of its columns, but when I create the XML schema, all the HTML is removed from that column. Please suggest how to solve this issue.


Skype: macwintuch


#2 2012-05-15 07:19:18

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: [resolved] Read HTML tags From XML

Hi

Could you show us more data for reproducing this issue?

Regards,
Pedro


Only Paranoid Survive.


#3 2012-05-15 08:55:50

umeshrakhe
Member
Registered: 2011-03-14
Posts: 155

Re: [resolved] Read HTML tags From XML

I modified some of the text before pasting it here, but all the XML tags are the same as in the original XML, except for the HTML tags. See <GoodsBODY> for the HTML tags.

<GOODS>
<Good>
<GoodsTITLE>
<![CDATA[på Force Measurement]]>
</GoodsTITLE>
<NUMBER>
<![CDATA[51728536]]>
</NUMBER>
<GoodsCOMPANY><![CDATA[Tvärleden 2]]></GoodsCOMPANY>
<GoodsLOCATION><![CDATA[Västerås/018/Sverige]]></GoodsLOCATION>
<GoodsACTION><![CDATA[add]]></GoodsACTION>
<GoodsBODY><![CDATA[&lt;p&gt;ABB <table border="0" cellspacing="0" cellpadding="0" bgcolor="#FFFFFF">
  <tr>
    <td valign="bottom" rowspan="4" align="right"><img src="images/homepage/top24.gif" width="121" height="125" border="0"></td>
    <td valign="bottom" rowspan="4" align="right" height="125"><img src="images/homepage/top23new1.gif" width="286" height="125"></td>
  </tr>
  <tr>
    <td valign="bottom" height="41" align="right" bgcolor="#FFFFFF"> </td>

    <td valign="bottom" height="41" align="right" bgcolor="#FFFFFF"></td>
  </tr>
  <tr>
    <td valign="bottom" align="left" height="65" colspan="2" bgcolor="#000066"><img src="images/homepage/top23new2.gif" width="406" height="65" usemap="#MapMap" border="0"></td>
  </tr>
  <tr>
    <td valign="bottom" align="left" bgcolor="#000066" height="19" colspan="2">&nbsp;
    </td>
  </tr>
</table>
<map name="MapMap">
  <area shape="rect" coords="277,9,349,25" href="Contactus.htm" alt="Contact Us" title="Contact Us">

  <area shape="rect" coords="9,37,49,51" href="index.asp">
</map> &lt;/p&gt;&lt;br&gt;]]></GoodsBODY>
<AREA><![CDATA[Projektledning]]></AREA>
<CONTRACT><![CDATA[Regelbundet/Permanent]]>
</CONTRACT>
<EMP_FRACTION><![CDATA[Skiftarbete]]></EMP_FRACTION>
<PUB_START_DATE><![CDATA[20120420]]></PUB_START_DATE>
<DE_FOR_APP><![CDATA[20120513]]></DE_FOR_APP>
<LAN><![CDATA[SV]]></LAN>
<COUNTRY><![CDATA[Sverige]]></COUNTRY>
<REGION_STATE><![CDATA[018]]></REGION_STATE>
<CITY><![CDATA[Västerås]]></CITY>

Last edited by umeshrakhe (2012-05-15 08:57:43)



#4 2012-05-15 09:14:36

pedro
Member
Registered: 2011-11-17
Posts: 3682

Re: [resolved] Read HTML tags From XML

Hi

Got it. Because this XML file is incomplete, I can't reproduce the issue.
In my opinion, though, Talend handles XML files with embedded HTML well, as long as the HTML is wrapped in <![CDATA[]]> correctly.
So I suggest you recheck your XML file. Can you open it in a browser?
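To check Pedro's point outside Talend, here is a minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers`). It is not Talend's own XML input component, which may behave differently, and the class name `CdataCheck` and the sample strings are illustrative. It shows that markup inside a CDATA section comes back as literal text rather than being parsed or stripped, and that a malformed file fails to parse at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
final class CdataCheck {
    // Parses an XML string and returns the text of the first element named tagName.
    static String readElement(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() joins text and CDATA children; markup inside CDATA
            // is returned verbatim instead of being parsed as XML elements.
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            // A parse failure here usually means the XML itself is malformed.
            throw new IllegalStateException("Could not read <" + tagName + ">: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<WORK><BODY><![CDATA[<p>Some Text.</p><br>]]></BODY></WORK>";
        System.out.println(readElement(xml, "BODY")); // prints <p>Some Text.</p><br>
    }
}
```

Note that entity references are not expanded inside CDATA either: if a field already contains `&lt;p&gt;` inside its CDATA section, the parser returns exactly `&lt;p&gt;`, not `<p>`.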

Regards,
Pedro



#5 2012-05-17 06:42:14

umeshrakhe
Member
Registered: 2011-03-14
Posts: 155

Re: [resolved] Read HTML tags From XML

Hi Pedro,
I pasted the wrong XML; this is the correct one. Here the HTML tags inside <BODY> are encoded as entities (&lt;, &gt;), so we need to decode them back to their original HTML form.

<WORKS>
<WORK>
<WORKTITLE>
<![CDATA[Ledande El-konstruktör för]]>
</WORKTITLE>
<WORKNUMBER>
<![CDATA[51853370]]>
</WORKNUMBER>
<WORKCOMPANY><![CDATA[Tvärleden 2]]></WORKCOMPANY>
<WORKLOCATION><![CDATA[Västerås/018/Sverige]]></WORKLOCATION>
<ACTION><![CDATA[add]]></ACTION>
<BODY><![CDATA[&lt;p&gt;Some Text.&lt;/p&gt;&lt;br&gt;&lt;p&gt;Some Text!&lt;/p&gt;&lt;br&gt;]]></BODY>
<F_AREA><![CDATA[Design och teknik]]></F_AREA>
<EMP_CONTRACT_TYPE><![CDATA[Regelbundet/Permanent]]></EMP_CONTRACT_TYPE>
<EMP_FRACTION><![CDATA[Skiftarbete]]></EMP_FRACTION>
<PUB_START_DATE><![CDATA[20120514]]></PUB_START_DATE>
<DE_FOR_APPLICATIONS><![CDATA[20120605]]></DE_FOR_APPLICATIONS>
<LAN><![CDATA[SV]]></LAN>
<COUNTRY><![CDATA[Sverige]]></COUNTRY>
<RE_STATE><![CDATA[018]]></RE_STATE>
<CITY><![CDATA[Västerås]]></CITY>
<NAME><![CDATA[Ingela Olsson]]></NAME>
<PHONE><![CDATA[+46 21 32 50 00]]></PHONE>
<TIMESTAMP><![CDATA[2012-05-14 14:38:52]]></TIMESTAMP>
</WORK>
</WORKS>

Please suggest how to decode the text below into its original HTML form.

<BODY><![CDATA[&lt;p&gt;Some Text.&lt;/p&gt;&lt;br&gt;&lt;p&gt;Some Text!&lt;/p&gt;&lt;br&gt;]]></BODY>
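The BODY value above is not raw HTML inside CDATA: its `<` and `>` were escaped as `&lt;` and `&gt;` before being wrapped in CDATA, so any XML parser returns those entities literally. One way to restore the HTML is to unescape the string after reading it. Below is a minimal, self-contained Java sketch; the class `HtmlEntityDecoder` is hypothetical and handles only a few named entities plus decimal and hex numeric references. If Apache Commons Lang is available on the job's classpath, `StringEscapeUtils.unescapeHtml` (2.x) or `StringEscapeUtils.unescapeHtml4` (3.x) covers the full HTML entity set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class HtmlEntityDecoder {
    // Deliberately small table of named entities; HTML defines many more.
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("amp", "&");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    // Matches &name; as well as decimal (&#60;) and hex (&#x3C;) references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);");

    // Replaces every entity in one left-to-right pass, so "&amp;lt;" becomes "&lt;", not "<".
    static String decode(String s) {
        if (s == null) {
            return null;
        }
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            String replacement = m.group(); // default: leave unknown references as they are
            try {
                if (ref.startsWith("#x") || ref.startsWith("#X")) {
                    replacement = new String(Character.toChars(Integer.parseInt(ref.substring(2), 16)));
                } else if (ref.startsWith("#")) {
                    replacement = new String(Character.toChars(Integer.parseInt(ref.substring(1))));
                } else if (NAMED.containsKey(ref)) {
                    replacement = NAMED.get(ref);
                }
            } catch (IllegalArgumentException e) {
                // Out-of-range numbers or invalid code points: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "&lt;p&gt;Some Text.&lt;/p&gt;&lt;br&gt;&lt;p&gt;Some Text!&lt;/p&gt;&lt;br&gt;";
        System.out.println(decode(body)); // prints <p>Some Text.</p><br><p>Some Text!</p><br>
    }
}
```

In a job, this could be called per row, for example from a tJavaRow as `output_row.BODY = HtmlEntityDecoder.decode(input_row.BODY);`, assuming the column is named BODY as in the sample and the class has been made available to the job (for instance as a user routine). Decoding in a single pass matters: doing `&amp;` first and `&lt;` afterwards would wrongly turn `&amp;lt;` into `<`.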

Last edited by umeshrakhe (2012-05-17 06:43:50)


