Single Table Analyses (Exercise 4) After performing single column analyses, we will now define business validation rules on the tables. Data Quality rules allow to execute technical or business analyses on a table.

In this exercise, we will perform several table analyses with data quality rules on the CRM catalog in order to validate the customer and contract records. The exercise comprises three steps:
- Data validation,
- Customer age,
- Phone numbers.

Prerequisites:
To follow this tutorial, you need to extract and install the cif and crm databases zipped in the exampleFile.zip file available for download in the Download it! section of this tutorial.

Download it!

You want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.

You can also:
Send it!

Share it!
Next Step: [Multi-Column Analyses (Exercise 5)]: Define technical and business validation rules that imply multiple tables and columns.

 


Simple date validation on Contract table


Our first data quality rule analysis consists in creating a simple Data Quality (DQ) rule to compare the contract begin and end dates.

Rule - contract end date should be after the begin date.

To do so, we will use the Begin_dt and End_dt date columns of the contract table in the crm catalog.

Next
To create the DQ rule:

Expand the Libraries and Rules nodes.
Right-click the SQL folder and select New DQ Rule in the menu.

Name it contract_dates.

Click Next.

Set the Where clause with the following value:
Begin_dt < End_dt

Click Finish.

Next
Tips:

You can directly drag and drop the columns in the Where clause field.

Next
Now we have to create a new Table Analysis.

In the DQ Repository view:

Expand the Data Profiling node.
Right-clik Analyses.
In the menu, click New Analysis to open the Create New Analysis wizard.

Next
In the Create New Analysis wizard:

Expand Table Analysis and select DQ Rule Analysis.

Click Next.

Name it contract_dates and click Next.

Browse to the crm database and select the contract table, on which you want  to perform the analysis on.

Click Next.

Next
Expand the Rules and SQL nodes.

Select the contract_dates DQ rule that we just created.

Click Finish.

Next
Save and run the analysis.

Display the Analysis Results tab.

Next
You can now right-click on the contract_dates line or on the chart to see invalid rows by selecting View invalid rows in the menu.

Next
The invalid rows are displayed in the SQL Editor.

You can export them into a file to correct the data. To do so:

Right-click the data, select Export... in the menu.

Next
Verify that Customers are 18 years old or more


Rule: Legal constraints do not allow a person under 18 to have a contract, which means no customer can be less than 18 years old (but a minor person can be a contract beneficiary).

To do so, we will work with the Cust_type and DOB columns of the cust table.

Next
Expand the columns of the cust table of the crm database.

Right-click the Cust_type column and select Preview in the menu.

A preview of the Cust_type data displays on the SQL Editor.

We can see that Cust_type can have 3 different values: beneficiary, customer and prospect.

Next
Create a new DQ Rule named cust_age with the following Where clause:
18<=((TO_DAYS(NOW())-TO_DAYS(DOB))/365)

Create a new DQ Rule Analysis named cust_table.

Apply on the cust table and select the cust_age DQ Rule.

Click Finish.

Next
In this analysis, we want to check that customers are not under 18.

We need to exclude other person types: beneficiary and prospect.

To do this, we will add a constraint on the cust_table analysis. Add a data filter by typing:
Cust_type="customer"

Save and run this analysis.

Next
Click the Analysis Result tab to display the results of the analysis.

We can see that 16.56% of the customers are under aged.

Next
Phone number should not be empty for Customers


Rule: If a person is a customer, his phone number must be known.

1) We need to define the Where clause on Cust_type and Phone_num.

2) Create a new DQ Rule named cust_phone with the following Where Clause:
Phone_num!=""

Next
1) Drag and drop the cust_phone DQ Rule on the cust table in the cust_table analysis.

2) Save and 3) run the analysis.

Next
Click the Analysis Results tab to display the results of the analysis.

We can see that we don't have the phone number of 7.50% of our customers.

  Next Step: [Multi-Column Analyses (Exercise 5)]: Define technical and business validation rules that imply multiple tables and columns.

 

    Download it!     Send it!     Share it!

You want to practice?

Download exampleFile.zip to get the files used for this tutorial.

You can also download tutorialProject.zip containing all the jobs needed to carry out this tutorial.

Friends / colleagues may be interested in this tutorial? Send it to them!

You liked this tutorial ? Support it!

[ top ]