Duplicate Check
The Duplicate Check processor provides a simple way of checking for duplicate values across either one or many attributes.
Use the Duplicate Check to identify any duplicate values that may cause a problem for a data migration (for example, in key attributes), or as an initial check for duplicate records in the data.
Specify as inputs all attributes that you wish to consider in the duplicate check. Records are identified as duplicates if they have the same values in all input attributes.
Option | Type | Purpose | Default Value
Consider all no data as duplicates? | Yes/No | Determines whether records that have no data in all input attributes are considered duplicates. | Yes
Ignore case? | Yes/No | Determines whether the duplicate check ignores differences in case when comparing values. | No
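The effect of the two options can be sketched in Python. This is an illustrative model only, not EDQ's implementation: the function name `flag_duplicates`, its parameters, and the treatment of `None` and blank strings as "no data" are all assumptions made for the example.

```python
from collections import Counter


def _is_no_data(value):
    # Assumption: None and empty or whitespace-only strings count as "no data".
    return value is None or (isinstance(value, str) and value.strip() == "")


def flag_duplicates(records, attributes, ignore_case=False, no_data_as_duplicates=True):
    """Return one boolean per record: True if its values in all the
    given attributes also occur in at least one other record."""

    def key(record):
        values = []
        for name in attributes:
            value = record.get(name)
            if _is_no_data(value):
                value = None  # normalise every form of "no data"
            elif ignore_case and isinstance(value, str):
                value = value.casefold()
            values.append(value)
        return tuple(values)

    keys = [key(record) for record in records]
    counts = Counter(keys)
    flags = []
    for k in keys:
        all_no_data = all(v is None for v in k)
        if all_no_data and not no_data_as_duplicates:
            flags.append(False)
        else:
            flags.append(counts[k] > 1)
    return flags


records = [
    {"BUSINESS": "Acme Ltd"},
    {"BUSINESS": "ACME LTD"},
    {"BUSINESS": ""},
    {"BUSINESS": None},
    {"BUSINESS": "Widget Co"},
]

# Default option values: case sensitive, no-data records treated as duplicates.
print(flag_duplicates(records, ["BUSINESS"]))
# → [False, False, True, True, False]

# Ignore case, and do not treat no-data records as duplicates.
print(flag_duplicates(records, ["BUSINESS"], ignore_case=True, no_data_as_duplicates=False))
# → [True, True, False, False, False]
```

In this sketch a record is only flagged when at least one other record shares its key, so a single record with no data is not flagged under either setting.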
None
Flag attribute | Purpose | Possible Values
DuplicateFlag | Indicates which data passes the Duplicate Check | Y/N
A Duplicate Check's results may be published to the Dashboard.
The following interpretation of results is used by default:
Result | Interpretation
Not duplicated | Pass
Duplicate | Alert
Execution Mode | Supported
Batch | Yes
Real time Monitoring | Yes
Real time Response | No
The Duplicate Check assesses duplication across a batch of records. It must therefore run to completion before its results are available, and is not suitable for a process that requires a real time response.
When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned will indicate the number of duplicates in the batch of transactions only.
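As a rough illustration of this behaviour, the Python sketch below splits an incoming stream into batches and counts duplicates within each batch independently. It models only a transaction-count commit point. The names `batches` and `duplicate_counts_per_batch` and the `commit_size` parameter are hypothetical and do not correspond to EDQ settings.

```python
from collections import Counter
from itertools import islice


def batches(stream, commit_size):
    """Yield lists of up to commit_size items from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, commit_size))
        if not batch:
            return
        yield batch


def duplicate_counts_per_batch(stream, commit_size):
    """For each batch, count the records whose value occurs more than
    once within that same batch. Earlier batches are not consulted."""
    results = []
    for batch in batches(stream, commit_size):
        counts = Counter(batch)
        duplicated = sum(n for n in counts.values() if n > 1)
        results.append(
            {"duplicated": duplicated, "not_duplicated": len(batch) - duplicated}
        )
    return results


stream = ["A", "B", "A", "C", "A", "D"]
print(duplicate_counts_per_batch(stream, commit_size=3))
# → [{'duplicated': 2, 'not_duplicated': 1}, {'duplicated': 0, 'not_duplicated': 3}]
```

The third "A" falls in the second batch and is not counted as a duplicate there, because each batch is assessed on its own.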
The Duplicate Check produces a summary view of its results, showing the following statistics:
Statistic | Meaning
Duplicated | The records that were duplicated in the input attributes. Drill down to see each distinct value, and the number of times it occurred. Drill down again to see the records.
Not duplicated | The records that were not duplicated in the input attributes.
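The relationship between the two statistics and the drilldown levels can be sketched in Python as follows. This is a simplified model for illustration, not the product's logic; the `summarize` function, the structure of its return value, and the `ID` attribute in the sample data are assumptions.

```python
from collections import defaultdict


def summarize(records, attributes):
    """Group records by their values in the given attributes and
    report duplicated and non-duplicated record counts."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(name) for name in attributes)
        groups[key].append(record)

    # Drilldown: each distinct duplicated value mapped to its records.
    drilldown = {k: v for k, v in groups.items() if len(v) > 1}
    duplicated = sum(len(v) for v in drilldown.values())
    return {
        "Duplicated": duplicated,
        "Not duplicated": len(records) - duplicated,
        "drilldown": drilldown,
    }


records = [
    {"ID": 1, "BUSINESS": "Acme Ltd"},
    {"ID": 2, "BUSINESS": "Widget Co"},
    {"ID": 3, "BUSINESS": "Acme Ltd"},
]

summary = summarize(records, ["BUSINESS"])
print(summary["Duplicated"], summary["Not duplicated"])  # → 2 1

# First drilldown level: each distinct value and how often it occurred.
for value, matching in summary["drilldown"].items():
    print(value, len(matching))  # → ('Acme Ltd',) 2
    # Second drilldown level: the records themselves.
    for record in matching:
        print("  ", record)
```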
The following output filters are available from a Duplicate Check:
In this example, the Duplicate Check processor is used to look for duplicate company names in a BUSINESS attribute:
Summary View
Drilldown on Duplicated values
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.