First, check the documentation section: http://www.talend.com/resources/documentation.php
For the 1.0.0 version, we only had a "Getting Started" guide. Now, with the 1.1.0 version, we have a complete user guide.
Official tutorials are still missing, but Dylan Jones at Data Quality Pro has done a great job of showing how to use Talend Open Profiler on a real data sample. Check it out at http://www.dataqualitypro.com/data-qual … al-in.html
Dylan's tutorial comes with a data sample and explains how to profile it. It also gives useful tips on how to interpret the results. TOP easily answers questions such as: Do you need to know whether a column could serve as a primary key? Do you need to identify duplicate data? Do you want to see unexpected data ("outliers")? Do you need to plan your data quality tasks?
Dylan's tutorial does not cover every feature of Talend Open Profiler, but it's a very good start.
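To make the primary key and duplicate questions concrete, here is a minimal sketch in plain Python of the kind of check involved. It is not TOP's code or API: the function name `profile_column` and the sample values are made up for illustration, and it assumes missing values are represented as `None`.

```python
from collections import Counter

def profile_column(values):
    """Return basic uniqueness statistics for one column of values.

    Hypothetical helper for illustration only; not part of Talend Open Profiler.
    """
    counts = Counter(values)
    nulls = counts.pop(None, 0)  # treat None as a missing value
    duplicates = {v: n for v, n in counts.items() if n > 1}
    return {
        "row_count": len(values),
        "null_count": nulls,
        "distinct_count": len(counts),
        "duplicates": duplicates,
        # A key candidate has no missing values and no repeated values.
        "key_candidate": len(values) > 0 and nulls == 0 and not duplicates,
    }

# Made-up sample data
customer_ids = [101, 102, 103, 104]
emails = ["a@example.com", "b@example.com", "a@example.com", None]

id_profile = profile_column(customer_ids)
email_profile = profile_column(emails)
print(id_profile["key_candidate"])    # every id is present and unique
print(email_profile["duplicates"])    # values that occur more than once
```

A column where the distinct count equals the row count and nothing is missing could serve as a key; anything listed under `duplicates` is a starting point for a cleansing task.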
Other valuable information and tools on data quality in general are available on the Data Quality Pro website. Give it a look.
Among the features not covered by the tutorial, here are a few worth mentioning if you need to go further:
- You can set thresholds on indicators, and TOP will highlight the results that fall outside your expected ranges.
- You can study slices of numeric data with the frequency indicator (for example, how are your customers' ages distributed across given slices: 10-20, 20-40, 40-65 years?)
- You can evaluate how many of your email addresses (or any other kind of data) are well formed, and see which ones are invalid, by using the "Pattern" indicators.
- You can add your own patterns (regular expressions) to the list of the existing ones.
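For readers who want to see what these indicators amount to, the features above can be sketched in a few lines of plain Python. This only illustrates the ideas and is not how TOP implements them: the function names, the slice boundaries (upper bound treated as exclusive), the simplified email regular expression, and the 90% threshold are all assumptions chosen for the example.

```python
import re

# Hypothetical age slices; the upper bound of each slice is exclusive.
AGE_SLICES = [(10, 20), (20, 40), (40, 65)]

def count_by_slice(ages, slices=AGE_SLICES):
    """Count how many values fall into each slice (a simple frequency table)."""
    counts = {f"{lo}-{hi}": 0 for lo, hi in slices}
    for age in ages:
        for lo, hi in slices:
            if lo <= age < hi:
                counts[f"{lo}-{hi}"] += 1
                break  # a value belongs to at most one slice
    return counts

# Deliberately simplified email pattern, an assumption for this sketch.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def match_pattern(values, pattern=EMAIL_PATTERN):
    """Split values into those that fully match the pattern and those that do not."""
    valid = [v for v in values if pattern.fullmatch(v)]
    invalid = [v for v in values if not pattern.fullmatch(v)]
    return valid, invalid

def within_threshold(value, low, high):
    """Return True when an indicator value lies inside the expected range."""
    return low <= value <= high

# Made-up sample data
ages = [15, 20, 39, 40, 64, 70]
emails = ["jane@example.com", "not-an-email", "bob@mail.example.org"]

slice_counts = count_by_slice(ages)
valid, invalid = match_pattern(emails)
valid_ratio = len(valid) / len(emails)

print(slice_counts)
print(invalid)
# Flag the result if fewer than 90% of the addresses are well formed.
print(within_threshold(valid_ratio, 0.9, 1.0))
```

Adding your own pattern corresponds to passing a different compiled regular expression to `match_pattern`.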
Thank you for your support,
Dylan Jones said:
Thanks for sharing the tutorial information on Talend Forge. I hope others here find it useful too.
We've got more Talend Open Profiler special tutorials coming in the next few weeks and it's really positive to see both communities interacting like this.
I think you've done a great job with the product so far. It has some really neat features, not just for analysing and detecting DQ problems but also for generating a DQ workflow in the organisation, which is what it's all about. Well done, and I wish you continued success.
We're open to suggestions for the focus of future DQ tutorials so I would ask others in this community to send me their ideas and we'll put them into the pipeline.
Just drop me a line here: http://www.dataqualitypro.com/data-quality-dylan-jones/ if you want to see anything specific on DQ in a tutorial.
Thank you again,