How to sort a file

Create a simple Job Design sorting data.

This tutorial explains how to create a Job that reads data from a delimited file, sorts it, writes the sorted data into a temporary file, and then replaces the source file with that temporary file. The tutorial uses only "Built-In" schemas, which are defined within this Job design rather than stored in the Repository.

Note that you can create the same Job using Metadata files stored in the Repository.

Prerequisites:
To follow this tutorial, you need to extract and import the customer.csv file from the exampleFile.zip file available for download in the Download it! section of this tutorial.

Next Step: Refer to the [How to set up a Join link on a Job Design] tutorial to learn more about the Job Designer.

 


Create a Job Design


In the Repository on the left of Talend Open Studio main screen:

Right-click on Job Designs.

In the menu, click on Create Job to open the New Job wizard.


In the New Job wizard:

In the Name field, fill in the name of the Job: howToSortFile.

Click Finish to close the wizard and create the Job.

The Job Designer opens an empty Job.



In the Name field, accents, special characters, and spaces are not allowed, and the name must not start with a digit.
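
The naming rule in this note can be approximated by a small check. This is only a sketch in Python, not how Talend Open Studio validates names: the function name is_valid_job_name is hypothetical, and the pattern assumes that only unaccented letters, digits, and underscores are accepted, which may be stricter than what the Studio actually applies.

```python
import re

# Assumption: a valid Job name starts with an ASCII letter or underscore and
# continues with ASCII letters, digits, or underscores only.
_JOB_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_job_name(name: str) -> bool:
    """Return True if the name satisfies the assumed naming rule."""
    return _JOB_NAME_PATTERN.fullmatch(name) is not None
```

Under these assumptions, howToSortFile passes the check, while a name containing a space, an accent, or a leading digit does not.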

Set the parameters of the connector reading the delimited file


In the Palette on the right:

To add an input component, click the File family and the Input sub-family.

Click the tFileInputDelimited component and drop it on the Job Designer.


In the Job Designer:

Double-click the tFileInputDelimited to show the corresponding Component view to define its Basic settings.

In the Component view:

To specify the path to the customer.csv file, click [...] next to the File Name field and select the file from the wizard.

To describe the structure of the file, click [...] next to the Edit schema field to open the "Schema of tFileInputDelimited_1" wizard.


Set the structure of the data flow schema


In the Schema of tFileInputDelimited_1 wizard:

To describe the columns of the customer file, click (+) nine times. Nine lines are added to the schema; set them according to your file as described in the next step.



For a schema with many columns, you should use Metadata.

In the Schema of tFileInputDelimited_1 wizard:

In the Column column, rename each field according to the file columns.

In the Type column, set the data type of each column.

In the Length column, fill in the length of each field of your schema.

Click OK to close the wizard.
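
To make the idea of a Built-In schema more concrete, the sketch below models a schema as an ordered list of columns, each with a name, a type, and a length, and uses it to turn one delimited line into typed values. Everything specific in it is hypothetical: the tutorial does not list the nine columns of customer.csv, so the three columns shown (id, name, city), the semicolon separator, and the names Column, SCHEMA, and parse_line are placeholders to be replaced with the real ones.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Column:
    name: str                          # the "Column" field in the schema editor
    convert: Callable[[str], object]   # stands in for the "Type" field
    length: int                        # the "Length" field (kept as metadata only)


# Hypothetical schema: replace with the real columns of your file.
SCHEMA: List[Column] = [
    Column("id", int, 6),
    Column("name", str, 30),
    Column("city", str, 20),
]


def parse_line(line: str, schema: List[Column], separator: str = ";") -> dict:
    """Split one delimited line and convert each field using the schema."""
    fields = line.rstrip("\r\n").split(separator)
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {column.name: column.convert(raw) for column, raw in zip(schema, fields)}
```

The point of the sketch is that the schema, not the file, decides how many fields a row has and how each one is interpreted, which is why the columns must be declared before the data can flow to the next component.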


Set the parameters of the connector writing the delimited file


In the Palette on the right:

To add the output component, click on the Output sub-family.

Click on the tFileOutputDelimited component and drop it on the Job Designer.


In the Job Designer:

Double-click tFileOutputDelimited to show the corresponding Component view to define its Basic settings.

In the Component view:

To specify the path of the file you are creating, click [...] next to the File Name field.

In the wizard, define the same path as for the customer.csv file but name it temp.csv.

Check the Include Header box so that the column names are written to the output file.

Define the processing component and link the components

In the Palette on the right:

To add the component sorting the data, click on the Processing family.

Click the tSortRow component and drop it on the Job Designer.


In the Job Designer:

To link the components, right-click on tFileInputDelimited, hold and drag to the tSortRow.

Do the same to link the tSortRow to the tFileOutputDelimited.



You can also right-click on the component and select Row > Main on the right-click menu to link the components.

In the Job Designer:

Double-click on the tSortRow to show the corresponding Component view to define its Basic settings.

In the Component view:

Define the sorting criteria by clicking (+) to add a line to the Criteria table.

Select the column by which you want to sort, as shown in the screenshot.

  

At this point, the Job will create a new file named temp.csv containing the sorted data.

As the purpose of the Job is to sort the source file, not to create a new one, the source file has to be replaced with the new one.
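
For readers who want to see what the first Subjob amounts to outside the Studio, here is a rough Python equivalent: read a delimited file, sort its rows on one column, and write the result with a header line to a new file. It is an illustration, not the code Talend generates. The function name sort_delimited_file, the semicolon separator, the presence of a header row in the source file, and the ascending text comparison are all assumptions.

```python
import csv


def sort_delimited_file(source_path, target_path, sort_column, separator=";"):
    """Read source_path, sort its rows by sort_column, write them to target_path.

    Returns the number of data rows written.
    """
    # Reading step (tFileInputDelimited in the Job).
    with open(source_path, newline="", encoding="utf-8") as source:
        reader = csv.DictReader(source, delimiter=separator)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Sorting step (tSortRow in the Job); values are compared as text here.
    rows.sort(key=lambda row: row[sort_column])

    # Writing step (tFileOutputDelimited with a header line).
    with open(target_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames, delimiter=separator)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

As in the Job, the source file is left untouched at this stage; only the new file contains the sorted rows.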

  
Define the file managing component and link it to the first subjob

In the Palette on the right:

To replace the source file with the new one, click the File family and the Management sub-family.

Click the tFileCopy component and drop it on the Job Designer, under the tFileInputDelimited component.


In the Job Designer:

To link the first Subjob to the tFileCopy component, right-click on tFileInputDelimited and select Trigger > OnSubjobOk from the menu.

Click on tFileCopy to draw the OnSubjobOk link.
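
The OnSubjobOk trigger can be read as "run the next Subjob only if the previous one finished without error". The sketch below expresses that idea in Python with plain functions. The name run_on_subjob_ok is hypothetical, and the sketch covers only this one kind of trigger, not the other trigger types available in the Studio.

```python
from typing import Callable, List


def run_on_subjob_ok(subjobs: List[Callable[[], None]]) -> int:
    """Run subjobs in order and stop at the first one that raises.

    Returns the number of subjobs that completed successfully.
    """
    completed = 0
    for subjob in subjobs:
        try:
            subjob()
        except Exception as error:
            # A failed subjob does not fire the "OK" trigger, so the chain stops.
            label = getattr(subjob, "__name__", repr(subjob))
            print(f"Subjob {label} failed: {error}")
            break
        completed += 1
    return completed
```

In this Job the chain has two entries: the sorting Subjob first, then the file replacement, so the source file is never overwritten when sorting fails.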

In the Job Designer:

Double-click on tFileCopy to show the corresponding Component view to define the Basic settings.


In the Component view:

To copy the temp.csv file containing the sorted data, click [...] next to the File Name field and specify the file path.

To specify the folder in which you want to copy the file, click [...] next to the Destination directory field and select the folder that contains the customer.csv source file.

To replace the source file with the sorted file, check the Rename box and enter customer.csv between double quotes ("customer.csv").

To delete the temporary file, check the Remove source file box.
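
Taken together, these settings copy temp.csv into the destination folder under the name customer.csv and then delete temp.csv. A minimal Python sketch of the same effect follows. The function name replace_source_with_sorted is hypothetical, and the sketch does not describe how tFileCopy is implemented.

```python
import os
import shutil


def replace_source_with_sorted(temp_path, destination_dir, new_name):
    """Copy temp_path to destination_dir/new_name, then delete temp_path."""
    destination_path = os.path.join(destination_dir, new_name)
    # Copy under the new name; an existing file with that name is overwritten.
    shutil.copyfile(temp_path, destination_path)
    # Counterpart of the "Remove source file" option.
    os.remove(temp_path)
    return destination_path
```

When both files live on the same filesystem, os.replace(temp_path, destination_path) gives the same end result in a single step; the two-step version is shown here because it mirrors the separate copy, rename, and remove options of the component.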


Run the Job


In the Job Designer:

Press Ctrl+S to save the Job.

Press F6 to run it.

The Run view opens at the bottom of Talend Open Studio, and its console shows the Job execution.



Check the Statistic box in the Run view and run this Job again: this option will show you how the Subjobs are orchestrated.

  

The howToSortFile Job is working!

It comprises two Subjobs:
- sorting data in a temporary file,
- replacing the source file with that temporary file.

Now you have to document it!

  

Document the Job


In the Job Designer:

To document your Job, add a title to each Subjob.

To do so, click in the blue area around the first Subjob.

Click the Component view.

Check the Show subjob title box and, in the Title field, fill in the corresponding title: Sorting data in a new file.

Title the second Subjob: Replacing the source file.

Save the Job again.

  

This tutorial is finished.

The Job is working and it's documented.

It's your turn now!

  

 


You want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.
