Talend Exchange is the place where the Talend community can share items related to Talend open source products, such as Data Integration, Data Quality and Master Data Management. Contribution is open to any user; no specific validation is needed. As soon as you have a forum account, you automatically get a Talend Exchange account.


Version Author Released on Rating Downloads
Java

tHTTPTableInput

4 dthonon 2009-05-29
253

This Component extracts HTML-Tables from a given URL.

The 'Syntax for Table' means:
T=3 / Third(fourth) table inside the page
C=0,0 / The cell at position 0,0 in the third(fourth) table
T=1 / The first(second) table inside cell 0,0 of the third(fourth) table [optional]

Example 1: "T=3;C=0,0;T=1"
Example 2: "T=2;C=0,0;" (with the change in revision 3)
Example 3: "T=2;" (with the change in revision 3)
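
The path syntax above can be read as a list of steps separated by semicolons. The following is a minimal sketch of how such a string might be split into steps; it is not the component's own code, and the class name `TablePathParser` and the step labels are hypothetical. It only parses the string and takes no position on whether indexes are zero-based or one-based.

```java
import java.util.ArrayList;
import java.util.List;

public final class TablePathParser {
    private TablePathParser() {}

    /** Splits a path such as "T=3;C=0,0;T=1" into readable steps. */
    public static List<String> parse(String path) {
        List<String> steps = new ArrayList<>();
        for (String raw : path.split(";")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue; // tolerate a trailing ';' as in Example 2
            }
            if (token.matches("T=\\d+")) {
                // Table step: T=<index>
                steps.add("table " + Integer.parseInt(token.substring(2)));
            } else if (token.matches("C=\\d+,\\d+")) {
                // Cell step: C=<first index>,<second index>
                String[] pos = token.substring(2).split(",");
                steps.add("cell " + pos[0] + "," + pos[1]);
            } else {
                throw new IllegalArgumentException("Unrecognised step: " + token);
            }
        }
        return steps;
    }
}
```

Calling `TablePathParser.parse("T=2;C=0,0;")` yields two steps, one table step and one cell step.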

Routine

Securities Validation

1 hugo 2009-05-25
206

Validates fixed income and equity securities.
This routine calculates and compares the check digits for the following security identifiers:

ISIN
SEDOL
CUSIP
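
For readers unfamiliar with these check digits, here is a minimal sketch of the three standard checksum schemes. It is not the routine's own source, which is not shown on this page; the class and method names are hypothetical. It assumes upper-case input without separators.

```java
public final class SecurityIdCheck {
    private static final int[] SEDOL_WEIGHTS = {1, 3, 1, 7, 3, 9, 1};

    private SecurityIdCheck() {}

    /** ISIN: letters expand to two digits (A=10 ... Z=35), then a Luhn check. */
    public static boolean isValidIsin(String isin) {
        if (isin == null || !isin.matches("[A-Z]{2}[A-Z0-9]{9}[0-9]")) {
            return false;
        }
        StringBuilder digits = new StringBuilder();
        for (char c : isin.toCharArray()) {
            digits.append(Character.getNumericValue(c)); // '7' -> 7, 'A' -> 10
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** SEDOL: 7 characters, no vowels, weighted sum divisible by 10. */
    public static boolean isValidSedol(String sedol) {
        if (sedol == null || !sedol.matches("[0-9B-DF-HJ-NP-TV-Z]{6}[0-9]")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            sum += SEDOL_WEIGHTS[i] * Character.getNumericValue(sedol.charAt(i));
        }
        return sum % 10 == 0;
    }

    /** CUSIP: 8 characters plus a check digit; every second value is doubled. */
    public static boolean isValidCusip(String cusip) {
        if (cusip == null || !cusip.matches("[A-Z0-9*@#]{8}[0-9]")) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            char c = cusip.charAt(i);
            int v;
            switch (c) {
                case '*': v = 36; break;
                case '@': v = 37; break;
                case '#': v = 38; break;
                default:  v = Character.getNumericValue(c);
            }
            if (i % 2 == 1) {
                v *= 2; // second, fourth, sixth and eighth characters
            }
            sum += v / 10 + v % 10;
        }
        return (10 - sum % 10) % 10 == cusip.charAt(8) - '0';
    }
}
```

Each method returns `false` for malformed input rather than throwing, so it can be used directly as a row filter.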


Java

tmOracleOutput

1.2 madamovic 2009-05-14
117

This component adds "Delete obsolete" functionality to the existing tOracleOutput component.

"Delete Obsolete" option is enabled for all variants of "Insert and Update" data actions. If Delete Obsolete is turned on, all records from the target table that were not inserted or updated from the input row set, will be deleted. In other words, records from the target table that do not exist in the source set (matched on predefined IDs) will be deleted. This is like full synchronization between the source and the target record set.

Parameters (in addition to the existing tOracleOutput parameters):
- Delete obsolete records – enable deleting the obsolete records.
- Where clause condition for delete obsolete - optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set will be deleted.

The component uses HashSet to store the IDs of the processed records. The limitation is that this is not efficient for large data sets. The component can be further enhanced to use disk implementation of HashSet, or to store the IDs of processed records in a database table.
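
The idea can be sketched in memory as follows. This is an illustration only, not the component's generated code: the class `DeleteObsoleteSketch` and the method `findObsolete` are hypothetical, and the optional where clause is modelled as a Java predicate instead of SQL.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

public final class DeleteObsoleteSketch {
    private DeleteObsoleteSketch() {}

    /**
     * Returns the IDs present in the target but absent from the input,
     * restricted to those that satisfy the optional where condition.
     */
    public static <K> Set<K> findObsolete(Collection<K> targetIds,
                                          Iterable<K> inputIds,
                                          Predicate<K> whereClause) {
        // Remember every ID that was inserted or updated from the input flow.
        Set<K> processed = new HashSet<>();
        for (K id : inputIds) {
            processed.add(id);
        }
        // Anything in the target that was never processed is obsolete.
        Set<K> obsolete = new LinkedHashSet<>();
        for (K id : targetIds) {
            if (!processed.contains(id) && whereClause.test(id)) {
                obsolete.add(id);
            }
        }
        return obsolete;
    }
}
```

Because `processed` holds one entry per input row, memory use grows with the input size, which is the limitation described above.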

Java

tmMysqlOutput

1.2 madamovic 2009-05-13
170

This component adds "Delete obsolete" functionality to the existing tMysqlOutput component.

"Delete Obsolete" option is enabled for all variants of "Insert and Update" data actions. If Delete Obsolete is turned on, all records from the target table that were not inserted or updated from the input row set, will be deleted. In other words, records from the target table that do not exist in the source set (matched on predefined IDs) will be deleted. This is like full synchronization between the source and the target record set.

Parameters (in addition to the existing tMysqlOutput parameters):
- Delete obsolete records – enable deleting the obsolete records.
- Where clause condition for delete obsolete - optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set will be deleted.

The component uses HashSet to store the IDs of the processed records. The limitation is that this is not efficient for large data sets. The component can be further enhanced to use disk implementation of HashSet, or to store the IDs of processed records in a database table.

Java

tSharepointFile

1.0 jjolley 2009-05-12
321

This component allows you to retrieve any file from a SharePoint server over HTTP.
It performs the necessary NTLM authentication. The component downloads the SharePoint file to a temporary copy. The temporary file name is stored in tSharepointFile.FILE and can be used with the rest of Talend's components. The temporary file is deleted once the job has completed. (knowledgerelay.com)

Perl

tFileInputXbase

0.3 plegall 2009-05-12
352

Read DBase and FoxPro files with the XBase Perl module.

Perl

tOneToMany

3 plegall 2009-05-12
397

This component is a proof of concept: a row component that takes one data flow as input and creates several distinct data flows as output. Each output has its own schema, which you can set dynamically at design time.

This component needs at least trunk r20522 (it will be available in 3.1.0M1).

Job

CofigurableJobUsingSingleton

0.1 pravu 2009-05-11
898

Problem Definition
The ETL job must meet the following requirements.
1. Configuration values such as database credentials and the log file name and location must be kept in an XML file.
a. The name and location of the log file can be changed without modifying the ETL job.
b. By editing the configuration file, the user can change the database credentials for the source and target databases.
2. The ETL job must support both Windows and Unix-family operating systems.
3. The configuration file must be validated.
a. The job must report in the log file whether the database credentials in the configuration file are correct. If the credentials are correct but the connection still fails, for example because the database is down, the job must log that as well.
b. The job must report on the console whether the log file path in the configuration file is correct.
4. The configuration file must be passed on the command line, because several instances of the job are expected to run at the same time. Multiple target databases share the same structure, so multiple instances of the same job, each with its own configuration file, can migrate data from the source database when the targets need to be brought in sync at the same time. Values such as the target database name and IP address must differ between configuration files.
5. The job must check the configuration file name and location given on the command line. If they are wrong, it must inform the user and exit; it can use the console for this message.
6. The configuration file must not be reloaded from disk every time a sub job needs one of its values. It must be loaded only once, and its values kept in memory for use by all sub jobs.
7. A log file must record the execution of the main job and sub jobs.
a. The start and end of each sub job and of the main job must be logged with status and time information.
b. Any record rejected during insertion must be logged with the date, time and an error message.
c. The number of records fetched from the source database and the number of records processed and inserted into the target database must be logged.
d. Each instance of the job must use a different log file, and the user must be advised of this. If instances running at the same time share a log file, its contents will be garbled.
e. The job must create log files by date: it appends the date to the log file name given in the configuration file.
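
Requirements 5 and 6 describe a load-once holder, which the job name suggests is implemented as a singleton. The following is a minimal sketch under stated assumptions, not the job's actual code: the class name `JobConfig` is hypothetical, and the file is assumed to use the XML layout that `java.util.Properties.loadFromXML` reads, which may differ from the job's real configuration format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public final class JobConfig {
    private static JobConfig instance;

    private final Properties values = new Properties();

    private JobConfig(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            values.loadFromXML(in); // assumed file format, see lead-in
        }
    }

    /** Loads the file on the first call only; later calls reuse the cached instance. */
    public static synchronized JobConfig load(Path file) throws IOException {
        if (instance == null) {
            if (!Files.isReadable(file)) {
                // Requirement 5: report a wrong command-line path and stop.
                throw new IOException("Configuration file not readable: " + file);
            }
            instance = new JobConfig(file);
        }
        return instance;
    }

    /** Used by sub jobs after the main job has called load(). */
    public static synchronized JobConfig get() {
        if (instance == null) {
            throw new IllegalStateException("Configuration has not been loaded");
        }
        return instance;
    }

    public String value(String key) {
        return values.getProperty(key);
    }
}
```

The main job would call `JobConfig.load(...)` once with the path taken from the command line, and each sub job would then read values through `JobConfig.get().value(...)` without touching the disk again.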

Java

tFileOutputPDF

1.2 cahsohtoa 2009-05-11
3378

This is the first version of the component that allows you to export your data to a PDF file.
Please have a look at the advanced settings, where you will find many parameters to customize the result.
I hope it will be helpful.

Java

tRunJobRow

0.1 bcourtine 2009-05-08
429

This component runs another job, sending data rows to the subjob and getting result rows back:

- input and output schemas of the subjob can be different (technically, the tRunJobRow component has only an output schema)
- the number of input rows and output rows can be different

To work properly, this component NEEDS the tBufferCopyInput component.

User manual and explanations:
1) In the main job, data rows are sent to a tBufferOutput
2) In the subjob, data rows are read with a tBufferCopyInput. This component also cleans the global buffer for the next tBufferOutput
3) In the subjob, output data rows are sent to a tBufferOutput

See the screenshot for a real example.

Version Author Released on Rating Downloads
Regex

Only alphabetical characters not empty

1.0 dcortinovis 2013-06-19
1

Matches values made up of alphabetical characters only.
At least one character is required (empty values are rejected).
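
The pattern itself is not shown on this page. A plausible equivalent, assuming plain ASCII letters, is `[A-Za-z]+`; the sketch below uses that assumption and a hypothetical class name `AlphaOnly`.

```java
import java.util.regex.Pattern;

public final class AlphaOnly {
    // Assumed pattern: one or more ASCII letters and nothing else.
    private static final Pattern ALPHA = Pattern.compile("[A-Za-z]+");

    private AlphaOnly() {}

    public static boolean matches(String value) {
        // matches() requires the whole value to fit the pattern.
        return value != null && ALPHA.matcher(value).matches();
    }
}
```

To accept accented letters as well, `\p{L}+` could be used in place of `[A-Za-z]+`.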

Indicator

EMail validation via mail server

5.3.0 mzhao 2013-06-03
193

This Java UDI checks email addresses by sending an SMTP request to the mail server. The code sample can be found at: http://www.rgagnon.com/javadetails/java-0452.html

Indicator

Frequency table of hours

2.0 scorreia 2013-04-25
277

This indicator helps to analyze the most frequent hours of the day that appear in date-time columns.

Indicator

Sample Standard Deviation

1.1 scorreia 2013-04-25
195

This indicator computes the sample standard deviation of any numerical column.
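
For reference, the sample standard deviation divides the sum of squared deviations by n - 1, not n. The sketch below shows the calculation on an in-memory array; it is not the indicator's definition, which is not shown on this page, and the class name `SampleStats` is hypothetical.

```java
public final class SampleStats {
    private SampleStats() {}

    /** Sample variance: sum of squared deviations from the mean, divided by n - 1. */
    public static double sampleVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("At least two values are required");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sumOfSquares = 0.0;
        for (double v : values) {
            double deviation = v - mean;
            sumOfSquares += deviation * deviation;
        }
        return sumOfSquares / (values.length - 1);
    }

    /** Sample standard deviation: square root of the sample variance. */
    public static double sampleStdDev(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }
}
```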

Indicator

Variance

1.1 scorreia 2013-04-25
183

This indicator computes the variance of numeric columns.

Indicator

Trimmed

1.0 scorreia 2013-04-25
17

Evaluates the number of values that are correctly trimmed.

Indicator

Week Frequency

2.0 scorreia 2013-04-25
173

Aggregates date fields into weeks.

Indicator

Duplicate Rows

2.0 scorreia 2013-04-25
599

This indicator counts the number of duplicate rows.
It differs from the system indicator called "duplicate count" because it counts the number of duplicate rows, not the number of duplicate values.

Indicator

Length Range Frequency

1.1 scorreia 2013-04-25
48

Gets the length ranges of data.

Groups data according to their length range.
Ranges are the following:
data of length < 10
data of length < 20
data of length < 30
data of length >= 30
null data
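
The grouping can be sketched as follows. This assumes the ranges are successive bins (a value of length 12 falls under "< 20" only), which the listing does not state explicitly; the class name `LengthRange` and the bucket labels are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class LengthRange {
    private LengthRange() {}

    /** Returns the label of the first range the value's length falls into. */
    public static String bucket(String value) {
        if (value == null) {
            return "null";
        }
        int length = value.length();
        if (length < 10) {
            return "< 10";
        }
        if (length < 20) {
            return "< 20";
        }
        if (length < 30) {
            return "< 30";
        }
        return ">= 30";
    }

    /** Counts how many values of a column fall into each range. */
    public static Map<String, Integer> frequencies(List<String> column) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : column) {
            counts.merge(bucket(value), 1, Integer::sum);
        }
        return counts;
    }
}
```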

Indicator

Order of Magnitude

1.1 scorreia 2013-04-25
58

Measures the order of magnitude of numerical data.

Product Demo


  • Author: ctoum
  • Categories: Export
  • First revision date: 2011-05-18
  • Latest revision date: 2012-05-31
  • Compatible with: Master Data Management releases 4.2.0, 4.2.0M1, 4.2.1, 4.2.2, 5.0.0, 5.0.0M1, 5.0.0M2
  • Downloads: 433

About: Product & families, with Cafepress pictures.

Revision list

Revision 3.0 128 Downloads, Released on 2012-05-31

Compatible with: 5.0.0

The MDM Product Demo project can help you start with Talend Master Data Management.
Discover how to setup and configure the features of Talend Master Data Management via meaningful samples.

Revision 2.0 217 Downloads, Released on 2011-07-29

Compatible with: 5.0.0, 5.0.0M2, 5.0.0M1, 4.2.2, 4.2.1, 4.2.0

New revision for 4.2 compatibility

Revision 1.0 88 Downloads, Released on 2011-05-18

Compatible with: 4.2.0M1

Initial revision.

Reviews (3)

 get starting By zaoweiruan on March 25, 2013
thanks for share learning
 Great start By mbalkenende on August 25, 2011
This is a great start with Data Quality and MDM processes. Make sure you have the recent patches.

MDM 5.0 By  on March 25, 2013
Hi mbalkenende, Will u please guide me for MDM A to Z. bcoz im new to this field. i dowloaded talend MDM 5.0. Thanks in advance.
 sa By jamikorn on July 20, 2011
excellent