Talend Exchange is where the Talend community can share items related to Talend open source products, such as Data Integration, Data Quality and Master Data Management. Contribution is open to any user; no specific validation is needed. As soon as you have a forum account, you automatically get a Talend Exchange account.



Statistics

  • 500 extensions
  • 820 revisions
  • 223 contributors
  • 109729 downloads
 

Version Author Released on Rating Downloads
Java

tHTTPTableInput

4 dthonon 2009-05-29
253

This Component extracts HTML-Tables from a given URL.

The 'Syntax for Table' setting is a path of steps separated by semicolons:
T=3 / Third (fourth) table inside the page
C=0,0 / The cell at position 0,0 in the third (fourth) table
T=1 / The first (second) table inside cell 0,0 of the third (fourth) table [optional]

Example 1: "T=3;C=0,0;T=1"
Example 2: "T=2;C=0,0;" with the change in revision 3
Example 3: "T=2;" with the change in revision 3
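
As an illustration of how such a path string can be read, here is a minimal Java sketch that splits it into steps. It is not the component's own code: the class name `TablePathParser`, the `Step` type and the error handling are hypothetical, and the sketch deliberately does not decide whether indexes are zero-based or which of the two `C` values is the row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a 'Syntax for Table' string into steps. */
class TablePathParser {

    /** One step of the path: kind is 'T' (table) or 'C' (cell). */
    static final class Step {
        final char kind;
        final int[] indexes; // one index for T, two for C (row/column order not assumed)

        Step(char kind, int[] indexes) {
            this.kind = kind;
            this.indexes = indexes;
        }
    }

    static List<Step> parse(String path) {
        List<Step> steps = new ArrayList<>();
        for (String token : path.split(";")) {
            token = token.trim();
            if (token.isEmpty()) {
                continue; // skip empty steps such as a doubled ';'
            }
            String[] keyValue = token.split("=", 2);
            if (keyValue.length != 2 || keyValue[0].length() != 1) {
                throw new IllegalArgumentException("Malformed step: " + token);
            }
            char kind = Character.toUpperCase(keyValue[0].charAt(0));
            String[] parts = keyValue[1].split(",");
            // T takes one index, C takes two; anything else is rejected.
            int expected = (kind == 'T') ? 1 : (kind == 'C') ? 2 : -1;
            if (parts.length != expected) {
                throw new IllegalArgumentException("Malformed step: " + token);
            }
            int[] indexes = new int[parts.length];
            for (int i = 0; i < parts.length; i++) {
                indexes[i] = Integer.parseInt(parts[i].trim());
            }
            steps.add(new Step(kind, indexes));
        }
        return steps;
    }
}
```

Calling `TablePathParser.parse("T=3;C=0,0;T=1")` yields three steps (table, cell, table) that a caller could then walk through the parsed HTML.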

Routine

Securities Validation

1 hugo 2009-05-25
204

Validates fixed income and equity securities.
This routine calculates and compares the check digits for the following security identifiers:

ISIN
SEDOL
CUSIP
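
The routine's source is not shown here, so the following is only a sketch of the publicly documented check-digit rules for the three identifier types. The class and method names are hypothetical, input is assumed to be upper case, and stricter rules (country prefix for ISIN, no vowels in SEDOL) are left out.

```java
/** Hypothetical sketch of ISIN, CUSIP and SEDOL check-digit validation. */
class SecurityIdCheck {

    private static final int[] SEDOL_WEIGHTS = {1, 3, 1, 7, 3, 9};

    /** Maps 0-9 to 0-9 and A-Z to 10-35; returns -1 for anything else. */
    private static int charValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        return -1;
    }

    /** ISIN: expand letters to two digits, then apply the Luhn check. */
    static boolean isValidIsin(String isin) {
        if (isin == null || isin.length() != 12) return false;
        char last = isin.charAt(11);
        if (last < '0' || last > '9') return false; // check digit must be numeric
        StringBuilder digits = new StringBuilder();
        for (char c : isin.toCharArray()) {
            int v = charValue(c);
            if (v < 0) return false;
            digits.append(v);
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** CUSIP: double every second character value and sum the digits. */
    static boolean isValidCusip(String cusip) {
        if (cusip == null || cusip.length() != 9) return false;
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            char c = cusip.charAt(i);
            int v = charValue(c);
            if (v < 0) {
                if (c == '*') v = 36;
                else if (c == '@') v = 37;
                else if (c == '#') v = 38;
                else return false;
            }
            if (i % 2 == 1) v *= 2; // second, fourth, ... characters
            sum += v / 10 + v % 10;
        }
        int check = (10 - sum % 10) % 10;
        return cusip.charAt(8) - '0' == check;
    }

    /** SEDOL: weighted sum of the first six characters. */
    static boolean isValidSedol(String sedol) {
        if (sedol == null || sedol.length() != 7) return false;
        int sum = 0;
        for (int i = 0; i < 6; i++) {
            int v = charValue(sedol.charAt(i));
            if (v < 0) return false;
            sum += v * SEDOL_WEIGHTS[i];
        }
        int check = (10 - sum % 10) % 10;
        return sedol.charAt(6) - '0' == check;
    }
}
```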


Java

tmOracleOutput

1.2 madamovic 2009-05-14
117

This component adds "Delete obsolete" functionality to the existing tOracleOutput component.

The "Delete Obsolete" option is enabled for all variants of the "Insert and Update" data actions. When Delete Obsolete is turned on, all records in the target table that were not inserted or updated from the input row set are deleted. In other words, records in the target table that do not exist in the source set (matched on predefined IDs) are deleted. The result is a full synchronization between the source and the target record sets.

Parameters (in addition to the existing tOracleOutput parameters):
- Delete obsolete records – enable deleting the obsolete records.
- Where clause condition for delete obsolete - optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set will be deleted.

The component uses a HashSet to store the IDs of the processed records, which is not efficient for large data sets. The component could be enhanced to use a disk-based HashSet implementation, or to store the IDs of processed records in a database table.
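
The delete-obsolete idea can be sketched in memory as follows. This is an illustration only, not the component's code: the target table is modelled as a `Map` from ID to value, the optional where clause as a `Predicate`, and all names (`DeleteObsoleteSketch`, `synchronize`) are hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical in-memory model of "Insert and Update" plus "Delete obsolete". */
class DeleteObsoleteSketch {

    /**
     * Upserts every input row into the target, then deletes target rows whose
     * ID was not processed and that satisfy the optional filter.
     * Returns the number of deleted rows.
     */
    static int synchronize(Map<Integer, String> target,
                           Map<Integer, String> input,
                           Predicate<Map.Entry<Integer, String>> deleteFilter) {
        Set<Integer> processedIds = new HashSet<>();
        for (Map.Entry<Integer, String> row : input.entrySet()) {
            target.put(row.getKey(), row.getValue()); // insert or update
            processedIds.add(row.getKey());
        }
        int deleted = 0;
        Iterator<Map.Entry<Integer, String>> it = target.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, String> row = it.next();
            boolean obsolete = !processedIds.contains(row.getKey());
            // A null filter stands for "no where clause": delete every obsolete row.
            if (obsolete && (deleteFilter == null || deleteFilter.test(row))) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }
}
```

The set of processed IDs grows with the input, which is the memory limitation described above.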

Java

tmMysqlOutput

1.2 madamovic 2009-05-13
170

This component adds "Delete obsolete" functionality to the existing tMysqlOutput component.

The "Delete Obsolete" option is enabled for all variants of the "Insert and Update" data actions. When Delete Obsolete is turned on, all records in the target table that were not inserted or updated from the input row set are deleted. In other words, records in the target table that do not exist in the source set (matched on predefined IDs) are deleted. The result is a full synchronization between the source and the target record sets.

Parameters (in addition to the existing tMysqlOutput parameters):
- Delete obsolete records – enable deleting the obsolete records.
- Where clause condition for delete obsolete - optional where clause for deleted records. Only records that satisfy the where clause and are not in the input set will be deleted.

The component uses a HashSet to store the IDs of the processed records, which is not efficient for large data sets. The component could be enhanced to use a disk-based HashSet implementation, or to store the IDs of processed records in a database table.

Java

tSharepointFile

1.0 jjolley 2009-05-12
319

This component lets you fetch any file from a SharePoint server over HTTP.
It performs the necessary NTLM authentication. The component creates a temporary copy of the SharePoint file. The temporary file name is stored in tSharepointFile.FILE and can be used with the rest of Talend's components. The temporary file is deleted once the job has completed. (knowledgerelay.com)

Perl

tFileInputXbase

0.3 plegall 2009-05-12
352

Read DBase and FoxPro files with the XBase Perl module.

Perl

tOneToMany

3 plegall 2009-05-12
391

This component is a proof of concept: a row component (taking a data flow as input) that creates several distinct data flows as output. Each output has a distinct schema, which you can set dynamically at design time.

This component needs at least trunk r20522 (it will be available in 3.1.0M1).

Job

CofigurableJobUsingSingleton

0.1 pravu 2009-05-11
888

Problem Definition
The ETL job must meet the following requirements.
1. Configuration values such as database credentials and the log file location and name must be kept in an XML file.
a. The name and location of the log file can be changed without modifying the ETL job.
b. By editing the configuration file, the user can change the database credentials for the source and target databases.
2. The ETL job must support both Windows and Unix-family operating systems.
3. The configuration file must be validated.
a. The job must report in the log file whether the database credentials in the configuration file are correct. If the credentials are correct but the connection still fails, for example because the database is down, the job must log that as well.
b. The job must report on the console whether the log file path in the configuration file is correct.
4. The configuration file must be passed on the command line, because several instances of the job are expected to run at the same time. Multiple target databases share the same structure, so multiple instances of the same job, each with its own configuration file, can migrate data from the source database when the targets need to be brought to the same values at the same time. Values such as the target database name and IP address must differ between configuration files.
5. The ETL job must check the configuration file name and location given on the command line. If they are wrong, the job must inform the user and exit. The job can use the console to report the wrong configuration file name.
6. The configuration file must not be read from disk every time an ETL subjob needs one of its values. It must be loaded only once, and the values must be kept in memory for use by all subjobs.
7. A log file must record the execution of the main job and its subjobs.
a. The start and end of each subjob and of the main job must be logged with status and time information.
b. Any record rejected during insertion must be logged with the date, time and an error message.
c. The number of records fetched from the source database and the number of records processed and inserted into the target database must be logged.
d. Each instance of the job must have its own log file, and the user must be advised of this. If instances running at the same time share a log file, the log file will contain garbage.
e. The ETL job must create log files by date: it appends the date to the log file name given in the configuration file.
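
Requirements 5, 6 and 7e can be sketched in plain Java as below. This is not the job's implementation: the class name `JobConfig` is hypothetical, the XML file is assumed to use the `java.util.Properties` XML format (the job's real schema is not given), and the underscore separator and ISO date format in the log file name are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

/** Hypothetical singleton holding configuration values in memory. */
final class JobConfig {

    private static JobConfig instance; // loaded once, shared by all subjobs

    private final Properties values = new Properties();

    private JobConfig(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            values.loadFromXML(in); // assumes the Properties XML format
        }
    }

    /** Reads the file on the first call only; later calls return the cached instance. */
    static synchronized JobConfig load(Path file) throws IOException {
        if (instance == null) {
            if (!Files.isReadable(file)) {
                throw new IOException("Configuration file not found or unreadable: " + file);
            }
            instance = new JobConfig(file);
        }
        return instance;
    }

    String get(String key) {
        return values.getProperty(key);
    }

    /** Inserts the date before the extension of a log file name (not a full path). */
    static String datedLogName(String fileName, LocalDate date) {
        String stamp = date.format(DateTimeFormatter.ISO_LOCAL_DATE);
        int dot = fileName.lastIndexOf('.');
        return dot < 0
                ? fileName + "_" + stamp
                : fileName.substring(0, dot) + "_" + stamp + fileName.substring(dot);
    }
}
```

A caller would pass the command-line argument to `JobConfig.load(...)`, print the exception message to the console and exit if it fails, and otherwise let every subjob read values through the same instance.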

Java

tFileOutputPDF

1.2 cahsohtoa 2009-05-11
3355

This is the first version of the component that allows you to export your data to a PDF file.
Please have a look at the advanced settings, where you will find many parameters to customize the result.
I hope it will be helpful.

Java

tRunJobRow

0.1 bcourtine 2009-05-08
426

This component was created to run another job, sending data rows to the subjob and getting result rows back:

- input and output schemas of the subjob can be different (technically, the tRunJobRow component has only an output schema)
- the number of input and output rows can be different

To work properly, this component requires the tBufferCopyInput component.

User manual and explanations:
1) In the main job, data rows are sent to a tBufferOutput
2) In the subjob, data rows are read with a tBufferCopyInput. This component also cleans the global buffer for the next tBufferOutput
3) In the subjob, output data rows are sent to a tBufferOutput

See the screenshot for a real example.


Statistics

  • 139 extensions
  • 172 revisions
  • 23 contributors
  • 12597 downloads
 

Indicator

Frequency table of hours

2.0 scorreia 2013-04-25
270

This indicator helps analyze the most frequent hours of the day that appear in date-time columns.

Indicator

Sample Standard Deviation

1.1 scorreia 2013-04-25
185

This indicator computes the sample standard deviation of any numerical column.
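
For reference, the sample standard deviation is the square root of the variance computed with an n - 1 denominator. The sketch below uses Welford's single-pass update; it only illustrates the formula and is not the indicator's definition, and the class name `SampleStats` is hypothetical.

```java
/** Hypothetical helper illustrating the sample (n - 1) variance and standard deviation. */
class SampleStats {

    /** Sample variance via Welford's single-pass update. */
    static double sampleVariance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        int n = 0;
        for (double x : values) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return m2 / (n - 1);
    }

    static double sampleStdDev(double[] values) {
        return Math.sqrt(sampleVariance(values));
    }
}
```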

Indicator

Variance

1.1 scorreia 2013-04-25
176

This indicator computes the variance of numeric columns.

Indicator

Trimmed

1.0 scorreia 2013-04-25
7

Evaluates the number of values that are correctly trimmed.

Indicator

Week Frequency

2.0 scorreia 2013-04-25
168

Aggregates date fields into weeks.

Indicator

Duplicate Rows

2.0 scorreia 2013-04-25
569

This indicator counts the number of duplicate rows.
It differs from the system indicator called "duplicate count" because it counts duplicate rows, not duplicate values.
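
The distinction between duplicate rows and duplicate values can be illustrated with a small sketch. The counting convention used here (distinct rows or values that occur more than once) is an assumption, since the indicator's exact definition is not given, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical comparison of row-level and value-level duplicate counts. */
class DuplicateRowsSketch {

    /** Counts distinct rows (all columns compared) that occur more than once. */
    static long duplicateRows(List<String[]> rows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.merge(Arrays.asList(row), 1, Integer::sum);
        }
        return counts.values().stream().filter(c -> c > 1).count();
    }

    /** Counts distinct values of a single column that occur more than once. */
    static long duplicateValues(List<String[]> rows, int column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.merge(row[column], 1, Integer::sum);
        }
        return counts.values().stream().filter(c -> c > 1).count();
    }
}
```

With the rows `("a", "1")`, `("a", "2")` and `("b", "3")`, the first column contains a duplicate value, yet no complete row is repeated.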

Indicator

Length Range Frequency

1.1 scorreia 2013-04-25
38

Gets the length ranges of the data.

Groups data according to length range.
The ranges are the following:
data of length < 10
data of length < 20
data of length < 30
data of length >= 30
null data
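
A minimal sketch of this grouping is shown below. It assumes the ranges are applied in order, so "length < 20" effectively means 10 to 19; the bucket labels and the class name `LengthRangeSketch` are hypothetical and not taken from the indicator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grouping of values into the length ranges listed above. */
class LengthRangeSketch {

    /** Returns the label of the first range the value falls into. */
    static String bucket(String value) {
        if (value == null) return "null";
        int length = value.length();
        if (length < 10) return "< 10";
        if (length < 20) return "< 20";
        if (length < 30) return "< 30";
        return ">= 30";
    }

    /** Counts how many values fall into each range, in order of first appearance. */
    static Map<String, Integer> frequencies(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(bucket(value), 1, Integer::sum);
        }
        return counts;
    }
}
```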

Indicator

Order of Magnitude

1.1 scorreia 2013-04-25
49

Measures the order of magnitude of numerical data.

Indicator

phone_area_code_freq

1.0 scorreia 2013-04-24
4

Area codes of American phone numbers

Indicator

udi_average_yearly_income

1.0 scorreia 2013-04-24
3

Parses an income range such as "$50K - $70K" and returns the average value.
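
One way to read this description is as the midpoint of the two bounds. The sketch below follows that reading; the regular expression, the choice to return dollars rather than thousands, and the class name `IncomeRange` are all assumptions, not the indicator's definition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for income ranges written like "$50K - $70K". */
class IncomeRange {

    private static final Pattern RANGE =
            Pattern.compile("\\$(\\d+)K\\s*-\\s*\\$(\\d+)K");

    /** Returns the midpoint in dollars, or null when the text does not match. */
    static Double average(String text) {
        if (text == null) return null;
        Matcher m = RANGE.matcher(text.trim());
        if (!m.matches()) return null;
        double low = Double.parseDouble(m.group(1)) * 1000;
        double high = Double.parseDouble(m.group(2)) * 1000;
        return (low + high) / 2;
    }
}
```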


Statistics

  • 5 extensions
  • 7 revisions
  • 4 contributors
  • 3382 downloads
 

Export

DStar

1.0 ctoum 2010-08-04
1096

D* Industries Demo Model

