How to use a Multi Schema component: the Multi Schema Editor

This tutorial explains how to use the Multi Schema Editor. In the exercise below, you will create a Job that reads complex multi-structured data via a tFileInputMSDelimited component.

You will set the tFileInputMSDelimited component properties so that it opens a complex multi-structured file, reads its data structures (schemas) and sends the data specified in each schema to the next Job components via row links.

In this tutorial, the multi schema delimited file is read row by row and the fields extracted are displayed in the Run Job console as specified in the Multi Schema Editor.

Prerequisites:
To follow this tutorial, you need to extract and import the myExample.csv file from the exampleFile.zip file available for download in the Download it! section of this tutorial.

Download it!

Want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.



Create your Job


In the Repository on the left:

Right-click Job Designs.

In the menu, click Create Job to open the New Job wizard.

In the New Job wizard:

In the Name field, enter the name of your Job: HowtotFileInputMSDelimited.

Click Finish to close the wizard and create your Job.

The Job Designer opens an empty Job.

Set the properties for the Multi Schema input component


In the Palette on the right:

To add the input component, click the File family and then the Input sub-family.

Click tFileInputMSDelimited and drop it on the Job Designer.

Double-click the component to open the schema editor, or click the component and then Multi Schema Editor.


In the schema editor, set the file properties:
- File name: path and name of your file.
- Encoding: select your file's encoding type.
- Use multiple separators: check this box if the separators are different from one schema to another.
- Multiple Separators: enter the separator characters used in your schemas. (In this field, separate the entries with a comma.)
- Key Values: enter all of the values that identify your file schemas. (In this field, separate the entries with a comma.)
- Key Index: enter the position of the key value columns.

Once you have provided all of the file information, you can preview it by clicking on Preview.
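To make the roles of the separator, Key Values and Key Index settings more concrete, here is a minimal Python sketch of the general idea: each row is split on a separator, and the value found at the key position decides which schema the row belongs to. This is not how the component is implemented. The sample data, the `ORD` and `ITM` key values and the `classify_rows` helper are all hypothetical (the contents of myExample.csv are not shown here), and the sketch assumes a single separator for every schema.

```python
# Hypothetical sample data: the real myExample.csv is not reproduced in this tutorial.
SAMPLE = """ORD;1001;2009-03-12
ITM;1001;keyboard;2
ITM;1001;mouse;1
ORD;1002;2009-03-13
"""


def classify_rows(text, separator=";", key_index=0, key_values=("ORD", "ITM")):
    """Yield (key, fields) for each row whose key column holds a known key value."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(separator)
        # The key is the field found at the configured key position, if present.
        key = fields[key_index] if key_index < len(fields) else None
        if key in key_values:
            yield key, fields


rows = list(classify_rows(SAMPLE))
for key, fields in rows:
    print(key, fields)
```

In this sketch, rows whose key is not in `key_values` are silently skipped; that is a choice made for the example, not a statement about the component's behaviour.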

If the data preview matches your file schemas, click Fetch Codes to the right.

Every schema has an associated code.

Select a schema to display its properties:
- Name: the column name.
- TagLevel: indicates the schema level if there is no parent-child relationship between the schemas.
- Type: data type of the corresponding column.
- Length: length of the column.
- Pattern: data pattern of the column, if required.
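As a rough illustration of what these column properties describe, the sketch below models a schema as a list of column definitions and uses it to convert the raw text fields of one row into typed values. Everything in it is an assumption made for the example: the `Column` class, the `ORDER_SCHEMA` columns and the `apply_schema` helper are hypothetical, the pattern uses Python's `strptime` syntax rather than the editor's own pattern syntax, and the length check is simply one possible way to interpret a column length.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Column:
    name: str                      # column name
    py_type: type                  # Python type standing in for the column's data type
    length: Optional[int] = None   # maximum number of characters, if any
    pattern: Optional[str] = None  # strptime pattern, used for date columns only


# Hypothetical schema for an "order" record.
ORDER_SCHEMA = [
    Column("record_type", str, length=3),
    Column("order_id", int),
    Column("order_date", datetime, pattern="%Y-%m-%d"),
]


def apply_schema(fields, schema):
    """Convert raw text fields into a dict of typed values, column by column."""
    row = {}
    for raw, col in zip(fields, schema):
        if col.length is not None and len(raw) > col.length:
            raise ValueError(f"{col.name}: value longer than {col.length} characters")
        if col.py_type is datetime:
            row[col.name] = datetime.strptime(raw, col.pattern)
        else:
            row[col.name] = col.py_type(raw)
    return row


order = apply_schema(["ORD", "1001", "2009-03-12"], ORDER_SCHEMA)
print(order)
```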



When you define the data structure for each of the output schemas in the Multi Schema Editor, the column names of the different schemas are automatically passed on to the next components.

However, you can still define data structures directly in the Basic settings view of each of these output components.

Configure the output components


To add the output components, click Logs & Errors in the Palette.

Click tLogRow and drop it on the Job Designer.

Repeat this operation as many times as required until you have a tLogRow component for every schema in your file (in this example two tLogRow components are required).

You can also use tFileOutputDelimited components for the output data flow instead of tLogRow components.
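The idea of sending each schema to its own delimited output can be sketched outside the Studio as follows. This is only an illustration under stated assumptions: the rows, the `ORD` and `ITM` key values and the `write_flows` helper are hypothetical, and in-memory text buffers stand in for the output files so that the example has no side effects.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, already split into fields; the first field is the key value.
rows = [
    ["ORD", "1001", "2009-03-12"],
    ["ITM", "1001", "keyboard", "2"],
    ["ITM", "1001", "mouse", "1"],
]


def write_flows(rows, key_index=0, delimiter=";"):
    """Group rows by key value and write each group to its own delimited text buffer."""
    grouped = defaultdict(list)
    for fields in rows:
        grouped[fields[key_index]].append(fields)

    outputs = {}
    for key, group in grouped.items():
        buffer = io.StringIO()  # stands in for one output file per schema
        writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
        writer.writerows(group)
        outputs[key] = buffer.getvalue()
    return outputs


outputs = write_flows(rows)
for key, text in outputs.items():
    print(f"[{key}]")
    print(text, end="")
```

To write real files instead, each buffer could be replaced with a file opened in text mode with `newline=""`, as the `csv` module documentation recommends.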

In the Job Designer:

Right-click tFileInputMSDelimited and select Row from the menu. You will see a row link for every input schema defined.


In the Job Designer:

To link the components together, right-click tFileInputMSDelimited_1, select one of the rows and click the target tLogRow component.

Click each tLogRow component and synchronize the input and output schemas by clicking the Sync columns button.

Run your Job


In the Job Designer:

Press Ctrl+S to save the Job.

Press F6 to run it (or click Run in the Run view).

The Run view appears at the bottom of Talend Open Studio and the console follows the execution of the Job.


The file data is correctly split into two flows, one for each schema.

You now know how to use this component to read multi-structured delimited files and how to separate the fields in these files using a specified separator.



In this screenshot, the Statistics feature is enabled. You can also select the Traces feature to follow the separation of the two schemas in real time, in the Job Designer.

Next Step: Take time to practise and repeat these exercises!

 
