You are not logged in.

Unanswered posts



Important! This site has been replaced. All content here is read-only. Please visit our brand-new community at https://community.talend.com/. We look forward to hearing from you there!



#1 2008-10-31 17:22:44

EdS
Guest

EdS said:

Join analysis in Open Profiler?

Tags: [features]

We are looking into data profiling tools and one feature we specifically want is join analysis. 
For example, for a join of TableA to TableB on Criteria, we want information like:

TableA - 4002 rows
TableB - 3000 rows
Join Criteria: match by SSN and SchoolYear
Join Yield - 2793 rows
Number of entries in TableA that related to at lease one entry in TableB: 2504
Number of rows in TableA lost by join: 1498 (37%)
Number of rows in TableA that relate to MORE than one entry in TableB: 302
Multi-linked record count: 830  (e.g. a row in tableA joins with multiple rows in tableB)

This information allows us to better explore the relationships between tables as well as the data quality.  Does OpenProfiler allow anything like this?  The SQL is pretty straight-forward.  I can help out with that if necessary.

-Ed S.-
-------------------------

#2 2008-10-31 17:54:45

scorreia
Talend Team


scorreia said:

Re: Join analysis in Open Profiler?

This feature is under developement. See [Bug] 5322
If you want to help, you could start by reading the document attached to this issue and tell us whether we missed something.
Thanks.


Thank you for your support,
Sebastiao Correia.

Offline

#3 2008-11-07 23:33:49

Ed S.
Guest

Ed S. said:

Re: Join analysis in Open Profiler?

I think this is an excellent start.  It will be able to report how many items in table A match an item in table B.  However, one important piece of information not provided is how many items in table A match more than one item in table B.  This help us determine if we are using proper join criteria.  Can this feature be expanded to include the number of multi-matches?

#4 2008-11-10 13:08:53

scorreia
Talend Team


scorreia said:

Re: Join analysis in Open Profiler?

Thank you for your feedback. I agree with you: this is a "must have" feature.

What would be the SQL to get all these informations in a minimal number of queries? I can write a SQL statement for each line but I am not sure it would be the most efficient.  And what is the difference between the last two lines?

EdS wrote:

Join Yield - 2793 rows
Number of entries in TableA that related to at lease one entry in TableB: 2504
Number of rows in TableA lost by join: 1498 (37%)
Number of rows in TableA that relate to MORE than one entry in TableB: 302
Multi-linked record count: 830  (e.g. a row in tableA joins with multiple rows in tableB)


Thank you for your support,
Sebastiao Correia.

Offline

#5 2008-11-21 15:34:16

scorreia
Talend Team


scorreia said:

Re: Join analysis in Open Profiler?

A first implementation is provided in the 1.2.0M1 version, as described in [Bug] 5322
Go and try it at Talend.com
I have create the feature request for the other interesting indicators. Follow its evolution here: [Bug] 5731


Thank you for your support,
Sebastiao Correia.

Offline

Board footer

Talend Contributor Agreement - Talend Website Privacy Policy