Multi-Column Analyses (Exercise 5) Define technical and business validation rules that imply multiple tables and columns

In this exercise, you will perform several multi-column and multi-table analyses on both crm and cif catalogs. The exercise comprises four steps:
- Columns dependency verification inside a table,
- Foreign key discovery and validation,
- Cross-table data comparison with filter: on "prospects" and on "customers",
- Cross-table data quality rule.

Prerequisites:
To follow this tutorial, you need to extract and install the cif and crm databases zipped in the exampleFile.zip file available for download in the Download it! section of this tutorial.

Download it!

You want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.

You can also:
Send it!

Share it!
Next Step: Creating "Reports" that gather information coming from different analyses

 


Checking that a Zip code always corresponds to the same city



In the DQ Repository view:
Expand the Data Profiling node.
Right-click on Analyses and select New Analysis.

Next
In the Create New Analysis wizard:
Expand the Table Analysis node and select Functional Dependency.
Click Next.
Name it zip_city_validation.

Click Finish.

Next
The new analysis displays on the workspace.
Click on Determinant columns: Select columns for A set.

Next
In the Column Selection wizard:
Expand the DB Connections, Staging_DB, cif and Tables nodes.
Select the customer table.
In the right window, select the Column3 (Zip code) box.

Click OK.

Next
The column displays in the Left Columns panel.

Click on Dependent columns: Select columns from B set.

Next
Repeat the previous steps but select Column4 (City).
Click OK.

Next
Save and Run the analysis.

Click on the Analysis Results tab at the bottom of the window to view the results.

Right-click on the chart or the table to view valid or invalid rows and/or valid or invalid values.

Next
By selecting to view the detailed invalid value, you can see that Cranford and Solsville have the same Zip code.

Next
Now, modify the analysis by adding the columns 3 and 4 to invert the analysis.

You can either click on the Reverse columns button or,

Repeat the steps as previously explained so as to have:
  • column3 in A in front of column4 in B
  • column4 in A in front of column3 in B

To move the columns up and down, use the Move Up and Move Down buttons.

Next
Save and Run the analysis again.
Right-click on the second row of the result table.
Select View detailed invalid values.

Now you can see another error that did not appear previously:
Colebrook city appears twice with the Zip code 03567 and only once with the Zip code 62897.

Next
Checking for country code Foreign Key in Customer table in cif


Reminder: in the Customer table, column9 represents the country code.

Create a New Analysis.
In the Redundancy Analysis folder, select Column Content Comparison analysis.
This analysis is to check which field in the Country table matches the Customer country code.

Name this analysis country_fk.


Next
The new analysis displays on the workspace.

Next
In the Column Selection wizard for the A set:
In the cif database and customer table, select column9.
Click OK.


In the Column Selection wizard for the B set:
In the cif database and country table, select column0.
Click OK.

Next
Save and Run the analysis.

View the results:
There are 4 invalid values in the customer table that do not appear in the country table.
You can right-click and select View non match rows to display these invalid values.

Next
Discover and compare foreign keys for the crm catalog tables


Create Column Content Comparison analyses for each line in the screenshot, meaning, each crm catalog tables.

Run and review the analyses results.

What are you conclusions?

For the last analysis (contract_beneficiary), how could you improve the analysis?

Next
According to your analyses, are the schema foreign keys here correct and the integrity constraints respected?

Next
A Prospect should not have any Contract


Rule: If a person is tagged as prospect, he/she should not have any contract.

The tables used are cust and contract.

In the following steps, you are going to check that a prospect is not an owner of a contract. All prospect codes (primary keys of the cust table) should never appear in the owner column of the contract table.

Next
In the DQ Repository view:
Right-click on Analyses and select New Analysis.
Expand the Redundancy Analysis folder and select Column Content Comparison.
Click Next.

Name it prospect_check.

Click Finish.

Next
The new analysis displays on the workspace.
Click on Select Columns for A set.

Next
Select the cust table of the crm database.
Select Code.
Click OK.

Next
For the selection of columns of the B set, repeat the previous steps and select owner of the contract table.
Click OK.

Next
Set the filter for the customer type (prospect): Cust_type="prospect"

Next
Save and run the analysis.

View the results:
There are three matching records: this means three prospects have a contract.

Next
A customer should have at least one contract


Rule: A person tagged as a customer should have at least one contract.

1) Right-click on the propect_check analysis and select Duplicate.

Open the copy of prospect_check analysis.

2) Change the name to customer_check.


Next
In the A set, select the Code column from the cust table.

In the B set, select owner from the contract table.


Set the data filter for customer code: Cust_type="customer"

Next
View the results:
  • 29 customers are not linked to a contract.
  • 3 contracts are not linked to a customer.

Next
The date of a claim must be between its related contract begin and end dates


Expand the Libraries and Rules nodes.
Right-click on SQL and select New DQ Rule.
Name it claim_date_check.

To be valid, the claim date (Date column) must be between the contract begin (Begin_dt) and end dates (End_dt).

In the DQ rule Where Clause, enter the following:
Date >= Begin_dt and Date <= End_dt

Next
Since the DQ rule concerns two tables (claim and contract), you have to set the inner join between these tables.

Press the green plus button to add a new join condition.

Drag and drop the claim cont_num and contract number columns on the line.

Save the DQ rule.

Next
Create a new DQ Rule analysis.

Name it claim_date_check.

Click on Select tables to analyze and select the claim table.

Click the icon under DQ Rule and apply the rule created in the previous step.

Next
Save and run the analysis.

Display and look at the results: 4% of the claims are not valid.

  Next Step: Creating "Reports" that gather information coming from different analyses

 

    Download it!     Send it!     Share it!

You want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.

Friends / colleagues may be interested in this tutorial? Send it to them!

You liked this tutorial ? Support it!

[ top ]