• Index
  •  » Talend Open Studio for Data Integration » Usage, Operation
  •  » Change the format of an XML file using Talend

#1 2012-05-25 17:50:35

sdel
Member
Registered: 2012-05-25
Posts: 86

Change the format of an XML file using Talend

Tags: [development, xml]

I am trying to convert an XML catalogue from one XML format to another.
I receive the catalogue in the format below.
It is in two parts: a list of items, and a list of product questions (ProdQu) with their answers (ProdAn).
There are a maximum of two questions per item.

<Catalogue>
<Item id="101">
<ProdQu id="1002"/>
</Item>
<Item id="102">
<ProdQu id="1002"/>
<ProdQu id="1003"/>
</Item>

<Question ProdQu="1002" Descriptor="Wall colour">
<Option ProdAn="20003" Value="Green"/>
<Option ProdAn="20004" Value="White"/>
</Question>
<Question ProdQu="1003" Descriptor="Number of doors">
<Option ProdAn="20005" Value="two doors"/>
<Option ProdAn="20006" Value="four doors"/>
<Option ProdAn="20007" Value="six doors"/>
</Question>
</Catalogue>

I want to end up with the XML format below.
Each item should contain a list of subcategories (SubCat). Each SubCat is made up of the item ID followed by one possible question/answer combination, and every possible combination should be listed. A SubCat has one of the following formats:
itemID_ProdQu_ProdAn
or
itemID_ProdQu_ProdAn_ProdQu_ProdAn
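
For reference, the expansion described above amounts to a Cartesian product of the answer lists of each item's questions. Below is a minimal Python sketch of that logic outside Talend. It assumes a well-formed version of the input in which each ProdQu reference carries an `id` attribute (a hypothetical name, chosen only for this sketch) and every Option uses `ProdAn`. The function name `build_subcats` is likewise made up for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# Assumed, well-formed version of the input catalogue. The `id` attribute
# on <ProdQu> is a hypothetical name chosen for this sketch.
SOURCE = """
<Catalogue>
  <Item id="101">
    <ProdQu id="1002"/>
  </Item>
  <Item id="102">
    <ProdQu id="1002"/>
    <ProdQu id="1003"/>
  </Item>
  <Question ProdQu="1002" Descriptor="Wall colour">
    <Option ProdAn="20003" Value="Green"/>
    <Option ProdAn="20004" Value="White"/>
  </Question>
  <Question ProdQu="1003" Descriptor="Number of doors">
    <Option ProdAn="20005" Value="two doors"/>
    <Option ProdAn="20006" Value="four doors"/>
    <Option ProdAn="20007" Value="six doors"/>
  </Question>
</Catalogue>
"""


def build_subcats(xml_text):
    """Return {item id: [SubCat strings]}, one string per answer combination."""
    root = ET.fromstring(xml_text)

    # Map each question id to the list of its answer ids.
    answers = {
        q.get("ProdQu"): [o.get("ProdAn") for o in q.findall("Option")]
        for q in root.findall("Question")
    }

    result = {}
    for item in root.findall("Item"):
        item_id = item.get("id")
        question_ids = [p.get("id") for p in item.findall("ProdQu")]
        # One list of "ProdQu_ProdAn" fragments per question on this item.
        fragments = [[f"{q}_{a}" for a in answers[q]] for q in question_ids]
        # Cartesian product: pick one fragment per question, in question order.
        result[item_id] = [
            "_".join((item_id, *combo))
            for combo in itertools.product(*fragments)
        ]
    return result


subcats = build_subcats(SOURCE)
```

With this sample data, `subcats["101"]` holds the two strings for item 101, and item 102 gets 2 × 3 = 6 combinations.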


<Catalogue>
<item ID="101">
<SubCat>101_1002_20003</SubCat>
<SubCat>101_1002_20004</SubCat>
</item>
<item ID="102">
<SubCat>102_1002_20003_1003_20005</SubCat>
<SubCat>102_1002_20003_1003_20006</SubCat>
<SubCat>102_1002_20003_1003_20007</SubCat>
<SubCat>102_1002_20004_1003_20005</SubCat>
<SubCat>102_1002_20004_1003_20006</SubCat>
<SubCat>102_1002_20004_1003_20007</SubCat>
</item>
</Catalogue>
I have tried doing this with tFileInputXML and tMap components but cannot get it to work.
Thanks for any help.


#2 2012-05-25 19:32:26

phobucket
Member
Company: Knoetry
Registered: 2010-07-27
Posts: 146

Re: Change the format of an XML file using Talend

Hi sdel,

Welcome to the forum.

The component your job is missing is called tDenormalize. I think this flow would work:

tFileInputXML --> tMap (create SubCat) --> tDenormalize (denormalize SubCat) --> tMap (add item_ID to SubCat) --> tFileOutputXML (convert back to XML)

In the first tMap, have two fields on the output: item_ID and SubCat.
For SubCat, set the expression to: row1.ProdQu + "_" + row1.ProdAn

Then send this to a tDenormalize and denormalize SubCat using "_" as the delimiter.

Then send the result to a second tMap to add the item_ID to the beginning of SubCat.
For SubCat, set the expression to: row2.item_ID + "_" + row2.SubCat

Then send the output of the second tMap to a tFileOutputXML.

