  |
 | Checking that a Zip code always corresponds to the same city
|
In the DQ Repository view:Expand the Data Profiling node. Right-click on Analyses and select New Analysis.
Next
|
  |
In the Create New Analysis wizard:Expand the Table Analysis node and select Functional Dependency. Click Next. Name it zip_city_validation. Click Finish.
Next
|
  |
The new analysis displays on the workspace. Click on Determinant columns: Select columns for A set.
Next
|
  |
In the Column Selection wizard:Expand the DB Connections, Staging_DB, cif and Tables nodes. Select the customer table. In the right window, select the Column3 (Zip code) box. Click OK.
Next
|
  |
The column displays in the Left Columns panel. Click on Dependent columns: Select columns from B set.
Next
|
  |
Repeat the previous steps but select Column4 (City). Click OK.
Next
|
  |
Save and Run the analysis. Click on the Analysis Results tab at the bottom of the window to view the results. Right-click on the chart or the table to view valid or invalid rows and/or valid or invalid values.
Next
|
  |
By selecting to view the detailed invalid value, you can see that Cranford and Solsville have the same Zip code.
Next
|
  |
Now, modify the analysis by adding the columns 3 and 4 to invert the analysis. You can either click on the Reverse columns button or, Repeat the steps as previously explained so as to have: - column3 in A in front of column4 in B
- column4 in A in front of column3 in B
To move the columns up and down, use the Move Up and Move Down buttons.
Next
|
  |
Save and Run the analysis again. Right-click on the second row of the result table. Select View detailed invalid values. Now you can see another error that did not appear previously: Colebrook city appears twice with the Zip code 03567 and only once with the Zip code 62897.
Next
|
  |
 | Checking for country code Foreign Key in Customer table in cif
|
Reminder: in the Customer table, column9 represents the country code. Create a New Analysis. In the Redundancy Analysis folder, select Column Content Comparison analysis. This analysis is to check which field in the Country table matches the Customer country code. Name this analysis country_fk.
Next
|
  |
The new analysis displays on the workspace.
Next
|
  |
In the Column Selection wizard for the A set:In the cif database and customer table, select column9. Click OK. In the Column Selection wizard for the B set:In the cif database and country table, select column0. Click OK.
Next
|
  |
Save and Run the analysis. View the results:There are 4 invalid values in the customer table that do not appear in the country table. You can right-click and select View non match rows to display these invalid values.
Next
|
  |
 | Discover and compare foreign keys for the crm catalog tables
|
Create Column Content Comparison analyses for each line in the screenshot, meaning, each crm catalog tables. Run and review the analyses results. What are you conclusions? For the last analysis ( contract_beneficiary), how could you improve the analysis?
Next
|
  |
According to your analyses, are the schema foreign keys here correct and the integrity constraints respected?
Next
|
  |
 | A Prospect should not have any Contract
|
Rule: If a person is tagged as prospect, he/she should not have any contract. The tables used are cust and contract. In the following steps, you are going to check that a prospect is not an owner of a contract. All prospect codes (primary keys of the cust table) should never appear in the owner column of the contract table.
Next
|
  |
In the DQ Repository view:Right-click on Analyses and select New Analysis. Expand the Redundancy Analysis folder and select Column Content Comparison. Click Next. Name it prospect_check. Click Finish.
Next
|
  |
The new analysis displays on the workspace. Click on Select Columns for A set.
Next
|
  |
Select the cust table of the crm database. Select Code. Click OK.
Next
|
  |
For the selection of columns of the B set, repeat the previous steps and select owner of the contract table. Click OK.
Next
|
  |
Set the filter for the customer type ( prospect): Cust_type="prospect"
Next
|
  |
Save and run the analysis. View the results:There are three matching records: this means three prospects have a contract.
Next
|
  |
 | A customer should have at least one contract
|
Rule: A person tagged as a customer should have at least one contract. 1) Right-click on the propect_check analysis and select Duplicate. Open the copy of prospect_check analysis. 2) Change the name to customer_check.
Next
|
  |
In the A set, select the Code column from the cust table. In the B set, select owner from the contract table. Set the data filter for customer code: Cust_type="customer"
Next
|
  |
View the results:
- 29 customers are not linked to a contract.
- 3 contracts are not linked to a customer.
Next
|
  |
 | The date of a claim must be between its related contract begin and end dates
|
Expand the Libraries and Rules nodes. Right-click on SQL and select New DQ Rule. Name it claim_date_check. To be valid, the claim date ( Date column) must be between the contract begin ( Begin_dt) and end dates ( End_dt). In the DQ rule Where Clause, enter the following: Date >= Begin_dt and Date <= End_dt
Next
|
  |
Since the DQ rule concerns two tables ( claim and contract), you have to set the inner join between these tables. Press the green plus button to add a new join condition. Drag and drop the claim cont_num and contract number columns on the line. Save the DQ rule.
Next
|
  |
Create a new DQ Rule analysis. Name it claim_date_check. Click on Select tables to analyze and select the claim table. Click the icon under DQ Rule and apply the rule created in the previous step.
Next
|
  |
Save and run the analysis. Display and look at the results: 4% of the claims are not valid.
|