Talend Exchange is the place where the Talend community can share items
related to Talend open source products, such as Data Integration, Data Quality and Master Data Management.
Contribution is open to any user; no specific validation is needed.
As soon as you have a forum account, you automatically get a Talend Exchange account.
Name | Version | Author | Released on | Rating | Downloads
tHTTPTableInput | 4 | dthonon | 2009-05-29 | 163 tos (0 votes, average 0 out of 5) | 253
This component extracts HTML tables from a given URL.
The 'Syntax for Table' means:
T=3 / the third (fourth) table inside the page
C=0,0 / the cell at position 0,0 in the third (fourth) table
T=1 / the first (second) table inside cell 0,0 of the third (fourth) table [optional]
Example 1: "T=3;C=0,0;T=1"
Example 2: "T=2;C=0,0;" with the change in revision 3
Example 3: "T=2;" with the change in revision 3
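The selector syntax above can be read as an ordered path of table and cell steps. The following is only a minimal sketch of how such a string might be split into steps; the function name `parse_table_path` and the step labels are hypothetical, it is not the component's own parser, and it does not settle whether the indexes are zero-based or one-based.

```python
def parse_table_path(spec):
    """Split a spec such as "T=3;C=0,0;T=1" into ordered (kind, value) steps."""
    steps = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            # Tolerate the trailing ';' seen in "T=2;" and "T=2;C=0,0;".
            continue
        key, _, value = part.partition("=")
        key = key.strip().upper()
        if key == "T":
            steps.append(("table", int(value)))
        elif key == "C":
            # The meaning of the two numbers (row/column order) is not stated
            # in the description, so they are kept as an opaque pair.
            first, second = (int(n) for n in value.split(","))
            steps.append(("cell", (first, second)))
        else:
            raise ValueError(f"Unknown selector: {part!r}")
    return steps


example_1 = parse_table_path("T=3;C=0,0;T=1")
example_3 = parse_table_path("T=2;")
```

Keeping the steps as a flat list leaves it to the caller to walk the HTML document one step at a time.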
|
Securities Validation | 1 | hugo | 2009-05-25 | 161 tos (0 votes, average 0 out of 5) | 206
Validates fixed income and equity securities.
This routine calculates and compares the check digits for the following security identifiers:
- ISIN
- SEDOL
- CUSIP
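As an illustration of what check-digit validation involves, here is a minimal sketch of the publicly documented check-digit rules for the three identifier types. It is not the code of this routine; the function names are made up for the example, and format rules beyond length and character set (for instance, SEDOL's exclusion of vowels) are deliberately left out.

```python
import string

# '0'..'9' map to 0..9 and 'A'..'Z' map to 10..35.
ALPHANUM = string.digits + string.ascii_uppercase
# CUSIP additionally allows '*', '@' and '#', valued 36, 37 and 38.
CUSIP_CHARS = ALPHANUM + "*@#"


def isin_is_valid(isin):
    """Expand letters to numbers, then apply the Luhn check to all digits."""
    isin = isin.upper()
    if len(isin) != 12 or not (isin.isascii() and isin.isalnum()):
        return False
    digits = "".join(str(ALPHANUM.index(ch)) for ch in isin)
    total = 0
    for pos, ch in enumerate(reversed(digits)):
        d = int(ch)
        if pos % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def sedol_is_valid(sedol):
    """Weighted sum of all seven characters must be a multiple of 10."""
    sedol = sedol.upper()
    if len(sedol) != 7 or not (sedol.isascii() and sedol.isalnum()):
        return False
    weights = (1, 3, 1, 7, 3, 9, 1)
    return sum(w * ALPHANUM.index(ch) for w, ch in zip(weights, sedol)) % 10 == 0


def cusip_is_valid(cusip):
    """Recompute the ninth character from the first eight and compare."""
    cusip = cusip.upper()
    if len(cusip) != 9 or cusip[8] not in string.digits:
        return False
    if any(ch not in CUSIP_CHARS for ch in cusip[:8]):
        return False
    total = 0
    for i, ch in enumerate(cusip[:8]):
        v = CUSIP_CHARS.index(ch)
        if i % 2 == 1:  # every second character is doubled
            v *= 2
        total += v // 10 + v % 10
    return (10 - total % 10) % 10 == int(cusip[8])
```

For example, `isin_is_valid("US0378331005")` and `cusip_is_valid("037833100")` both accept the identifier, while changing the last digit makes them reject it.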
|
tmOracleOutput | 1.2 | madamovic | 2009-05-14 | 151 tos (0 votes, average 0 out of 5) | 117
This component adds "Delete obsolete" functionality to the existing tOracleOutput component.
The "Delete obsolete" option is available for all variants of the "Insert and Update" data actions. If it is turned on, all records in the target table that were not inserted or updated from the input row set are deleted. In other words, records in the target table that do not exist in the source set (matched on predefined IDs) are deleted. The effect is a full synchronization between the source and target record sets.
Parameters (in addition to the existing tOracleOutput parameters):
- Delete obsolete records: enables deletion of obsolete records.
- Where clause condition for delete obsolete: optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set are deleted.
The component uses a HashSet to store the IDs of the processed records, which is not efficient for large data sets. The component could be enhanced to use a disk-based implementation of HashSet, or to store the IDs of processed records in a database table.
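The behaviour described above can be sketched as follows. This is an illustration only, not the component's generated code: it uses an in-memory SQLite database instead of Oracle, and the table `target`, its columns and the function `sync_with_delete_obsolete` are hypothetical names chosen for the example.

```python
import sqlite3


def sync_with_delete_obsolete(conn, rows, where=None):
    """Update-or-insert each (id, name) row, then delete target rows not seen.

    `where` is an optional SQL condition that limits which rows may be
    deleted; like the component parameter, it is trusted configuration.
    """
    processed_ids = set()  # plays the role of the HashSet of processed IDs
    for row_id, name in rows:
        cur = conn.execute(
            "UPDATE target SET name = ? WHERE id = ?", (name, row_id)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO target (id, name) VALUES (?, ?)", (row_id, name)
            )
        processed_ids.add(row_id)

    query = "SELECT id FROM target" + (f" WHERE {where}" if where else "")
    obsolete = [(i,) for (i,) in conn.execute(query) if i not in processed_ids]
    conn.executemany("DELETE FROM target WHERE id = ?", obsolete)
    return len(obsolete)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
deleted = sync_with_delete_obsolete(conn, [(2, "B"), (4, "d")])
remaining = sorted(conn.execute("SELECT id, name FROM target"))
```

Because every processed ID is held in memory, the sketch shares the limitation noted above for large data sets.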
|
tmMysqlOutput | 1.2 | madamovic | 2009-05-13 | 150 tos (0 votes, average 0 out of 5) | 170
This component adds "Delete obsolete" functionality to the existing tMysqlOutput component.
The "Delete obsolete" option is available for all variants of the "Insert and Update" data actions. If it is turned on, all records in the target table that were not inserted or updated from the input row set are deleted. In other words, records in the target table that do not exist in the source set (matched on predefined IDs) are deleted. The effect is a full synchronization between the source and target record sets.
Parameters (in addition to the existing tMysqlOutput parameters):
- Delete obsolete records: enables deletion of obsolete records.
- Where clause condition for delete obsolete: optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set are deleted.
The component uses a HashSet to store the IDs of the processed records, which is not efficient for large data sets. The component could be enhanced to use a disk-based implementation of HashSet, or to store the IDs of processed records in a database table.
|
tSharepointFile | 1.0 | jjolley | 2009-05-12 | 158 tos (0 votes, average 0 out of 5) | 321
This component allows you to fetch any file from a SharePoint server over HTTP.
It performs the necessary NTLM authentication. The component takes the SharePoint file and creates a temporary copy of it. The temporary file name is stored in tSharepointFile.FILE and can be used with the rest of Talend's components. The temporary file is deleted once the job has completed. (knowledgerelay.com)
|
tFileInputXbase | 0.3 | plegall | 2009-05-12 | 104 tos (0 votes, average 0 out of 5) | 352
Read DBase and FoxPro files with the XBase Perl module.
|
tOneToMany | 3 | plegall | 2009-05-12 | 96 tos (0 votes, average 0 out of 5) | 397
This component is a proof of concept: a row component that takes one data flow as input and creates several distinct data flows as output. Each output has a distinct schema that you can set dynamically at design time.
This component needs at least trunk r20522 (it will be available in 3.1.0M1).
|
CofigurableJobUsingSingleton | 0.1 | pravu | 2009-05-11 | 156 tos (0 votes, average 0 out of 5) | 898
Problem Definition
The ETL job must meet the following requirements.
1. Configuration values such as the database credentials and the log file name and location must be kept in an XML file.
a. The name and location of the log file can be changed without modifying the ETL job.
b. By editing the configuration file, the user can change the database credentials for the source and target databases.
2. The ETL job must support both Windows and Unix-family operating systems.
3. The configuration file must be validated.
a. The log file must report whether the database credentials in the configuration file are correct. If the credentials are correct but the connection still fails, for example because the database is down, the ETL job must log that as well.
b. The console must report whether the log file path given in the configuration file is correct.
4. The configuration file must be passed on the command line, because several instances of the job are expected to run at the same time. Multiple target database instances share the same structure, so multiple instances of the same job, each with its own configuration file, can migrate data from the source database when the target databases need to be brought in sync at the same time. Values such as the target database name and IP address must differ between configuration files.
5. The ETL job must check the configuration file name and location given on the command line. If they are wrong, it must inform the user on the console and exit.
6. The configuration file must not be reloaded from disk every time a sub job needs one of its values. It must be loaded only once, and its values kept in memory for use by all sub jobs.
7. A log file must record the execution of the main job and the sub jobs.
a. The start and end of each sub job and of the main job must be logged with status and time information.
b. Any record rejected during insertion must be logged with the date, time and an error message.
c. The number of records fetched from the source database and the number of records processed and inserted into the target database must be logged.
d. Each job instance must use a different log file, and the user must be advised of this. If instances running at the same time share a log file, the log will contain garbage.
e. The ETL job must create log files by date: it appends the date to the log file name given in the configuration file.
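Requirements 5, 6 and 7e can be illustrated with a small sketch of the load-once (singleton) idea. It is written in Python for brevity and is not the code of the published job; the class `JobConfig`, the helper `dated_log_name`, the XML element names and the date format in the log file name are all assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import date


class JobConfig:
    """Holds the configuration values; the file is parsed at most once."""

    _instance = None
    load_count = 0  # only here to show that the file is read a single time

    def __init__(self, values):
        self.values = values

    @classmethod
    def load(cls, path):
        if cls._instance is None:
            if not os.path.isfile(path):
                # Requirement 5: report a wrong path on the console and exit.
                raise SystemExit(f"Configuration file not found: {path}")
            root = ET.parse(path).getroot()
            cls._instance = cls(
                {child.tag: (child.text or "").strip() for child in root}
            )
            cls.load_count += 1
        return cls._instance


def dated_log_name(name, day):
    """Requirement 7e: append the date to the configured log file name."""
    stem, ext = os.path.splitext(name)
    return f"{stem}_{day.isoformat()}{ext}"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job_config.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(
            "<config><logFile>etl.log</logFile><targetDb>dw1</targetDb></config>"
        )
    first = JobConfig.load(path)   # parses the file
    second = JobConfig.load(path)  # returns the cached instance

log_name = dated_log_name(first.values["logFile"], date(2009, 5, 11))
```

Every sub job that calls `JobConfig.load` after the first call gets the in-memory values without touching the disk.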
|
tFileOutputPDF | 1.2 | cahsohtoa | 2009-05-11 | 33 tos (0 votes, average 0 out of 5) | 3378
This is the first version of a component that allows you to export your data to a PDF file.
Please have a look at the advanced settings, where you will find many parameters to customize the result.
I hope it will be helpful.
|
tRunJobRow | 0.1 | bcourtine | 2009-05-08 | 155 tos (0 votes, average 0 out of 5) | 429
This component runs another job, sending data rows to the subjob and getting result rows back:
- the input and output schemas of the subjob can be different (technically, the tRunJobRow component has only an output schema)
- the numbers of input and output rows can be different
To work properly, this component NEEDS the tBufferCopyInput component.
User manual and explanations:
1) In the main job, data rows are sent to a tBufferOutput.
2) In the subjob, data rows are read with a tBufferCopyInput. This component also clears the global buffer for the next tBufferOutput.
3) In the subjob, output data rows are sent to a tBufferOutput.
See the screenshot for a real example.
|
Name | Version | Author | Released on | Rating | Downloads
Only alphabetical characters not empty | 1.0 | dcortinovis | 2013-06-19 | 174 top (0 votes, average 0 out of 5) | 1
Only alphabetical characters, and at least one of them (empty values are forbidden).
|
EMail validation via mail server | 5.3.0 | mzhao | 2013-06-03 | 141 top (0 votes, average 0 out of 5) | 194
This Java UDI checks email addresses by sending an SMTP request to the mail server. The code sample can be found at: http://www.rgagnon.com/javadetails/java-0452.html
|
Frequency table of hours | 2.0 | scorreia | 2013-04-25 | 76 top (0 votes, average 0 out of 5) | 277
This indicator helps to analyze which hours of the day appear most frequently in date-time columns.
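As a rough illustration of the idea, the sketch below counts how often each hour of the day occurs in a list of timestamps. It is not the indicator's own definition; the function name `hour_frequencies` and the choice to skip null values are assumptions.

```python
from collections import Counter
from datetime import datetime


def hour_frequencies(timestamps):
    """Count occurrences of each hour (0-23), ignoring missing values."""
    return Counter(ts.hour for ts in timestamps if ts is not None)


sample = [
    datetime(2013, 4, 25, 9, 15),
    datetime(2013, 4, 25, 9, 40),
    datetime(2013, 4, 26, 14, 5),
    None,
]
freq = hour_frequencies(sample)
busiest_hour, busiest_count = freq.most_common(1)[0]
```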
|
Sample Standard Deviation | 1.1 | scorreia | 2013-04-25 | 78 top (0 votes, average 0 out of 5) | 195
This indicator computes the sample standard deviation of any numerical column
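For reference, the sample standard deviation divides the sum of squared deviations by n - 1 rather than n. The sketch below shows the computation in plain Python and checks it against the standard library; it illustrates the statistic, not the indicator's own implementation, and the function name `sample_std` is made up for the example.

```python
import math
import statistics


def sample_std(values):
    """Square root of the sum of squared deviations divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
result = sample_std(data)
reference = statistics.stdev(data)  # standard-library sample standard deviation
```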
|
Variance | 1.1 | scorreia | 2013-04-25 | 79 top (0 votes, average 0 out of 5) | 183
This indicator computes the variance of numeric columns
|
Trimmed | 1.0 | scorreia | 2013-04-25 | 170 top (0 votes, average 0 out of 5) | 17
Evaluates the number of values that are correctly trimmed.
|
Week Frequency | 2.0 | scorreia | 2013-04-25 | 97 top (0 votes, average 0 out of 5) | 173
Aggregates date fields into weeks.
|
Duplicate Rows | 2.0 | scorreia | 2013-04-25 | 86 top (0 votes, average 0 out of 5) | 600
This indicator counts the number of duplicate rows.
It differs from the system indicator called "duplicate count" because it counts duplicate rows, not duplicate values.
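The difference between duplicate rows and duplicate values can be shown on a tiny data set. This sketch assumes that "number of duplicates" means the number of distinct rows (or values) that occur more than once; the actual indicators may count differently, and both function names are hypothetical.

```python
from collections import Counter


def duplicate_row_count(rows):
    """Number of distinct whole rows that occur more than once."""
    counts = Counter(tuple(row) for row in rows)
    return sum(1 for n in counts.values() if n > 1)


def duplicate_value_count(rows, column):
    """Number of distinct values in one column that occur more than once."""
    counts = Counter(row[column] for row in rows)
    return sum(1 for n in counts.values() if n > 1)


rows = [
    ("Ann", "Paris"),
    ("Bob", "Paris"),
    ("Ann", "Paris"),
    ("Cid", "Rome"),
    ("Dee", "Rome"),
]
dup_rows = duplicate_row_count(rows)        # only ("Ann", "Paris") repeats
dup_cities = duplicate_value_count(rows, 1)  # both "Paris" and "Rome" repeat
```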
|
Length Range Frequency | 1.1 | scorreia | 2013-04-25 | 147 top (0 votes, average 0 out of 5) | 48
Gets the length ranges of the data and groups the data by length range.
Ranges are the following:
- data of length < 10
- data of length < 20
- data of length < 30
- data of length >= 30
- null data
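The grouping above might be sketched as follows, assuming the ranges are tested in order so that each value falls into the first range it satisfies (for example, a value of length 15 counts under "< 20" only). That ordering and the function name `length_range` are assumptions, not something the description states.

```python
from collections import Counter


def length_range(value):
    """Return the label of the first length range that the value satisfies."""
    if value is None:
        return "null"
    n = len(value)
    if n < 10:
        return "< 10"
    if n < 20:
        return "< 20"
    if n < 30:
        return "< 30"
    return ">= 30"


values = ["abc", "x" * 15, "y" * 29, "z" * 30, None, ""]
frequency = Counter(length_range(v) for v in values)
```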
|
Order of Magnitude | 1.1 | scorreia | 2013-04-25 | 144 top (0 votes, average 0 out of 5) | 58
Measures the order of magnitude of numerical data.
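One common definition of the order of magnitude of a non-zero number is the floor of the base-10 logarithm of its absolute value. The sketch below uses that definition as an assumption; the indicator itself may define it differently, and the function name is hypothetical.

```python
import math


def order_of_magnitude(x):
    """floor(log10(|x|)); returns None for zero, where it is undefined."""
    if x == 0:
        return None
    return math.floor(math.log10(abs(x)))


magnitudes = [order_of_magnitude(v) for v in (1234, 0.05, 7, -250, 0)]
```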
|
Product Demo
- Author: ctoum
- Categories: Export
- First revision date: 2011-05-18
- Latest revision date: 2012-05-31
- Compatible with: Master Data Management releases 4.2.0, 4.2.0M1, 4.2.1, 4.2.2, 5.0.0, 5.0.0M1, 5.0.0M2
- Downloads: 433
About: Product & families, with Cafepress pictures.
Revision list
Compatible with: 5.0.0
The MDM Product Demo project can help you start with Talend Master Data Management.
Discover how to set up and configure the features of Talend Master Data Management via meaningful samples.
Compatible with: 5.0.0, 5.0.0M2, 5.0.0M1, 4.2.2, 4.2.1, 4.2.0
New revision for 4.2 compatibility
Compatible with: 4.2.0M1
Initial revision.
Reviews (3)
Thanks for sharing.
This is a great start with Data Quality and MDM processes. Make sure you have the recent patches.
Hi mbalkenende, will you please guide me through MDM from A to Z? I'm new to this field. I downloaded Talend MDM 5.0.
Thanks in advance.
|