
Duplicate Check

The Duplicate Check processor provides a simple way of checking for duplicate values across one or more attributes.

Use

Use the Duplicate Check to identify any duplicate values that may cause a problem for a data migration (for example, in key attributes), or as an initial check for duplicate records in the data.

Configuration

Inputs

Specify all attributes that you wish to consider in the duplicate check. Records are identified as duplicates if they have the same values in all input attributes.

Options

Option | Type | Purpose | Default Value
Consider all no data as duplicates? | Yes/No | Determines whether records that have no data in all input attributes are considered duplicates of one another. | Yes
Ignore case? | Yes/No | Determines whether the duplicate check ignores case when comparing values. If set to No, the check is case sensitive. | No
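The matching rule and the two options above can be illustrated with a minimal Python sketch. This is not the processor's implementation: the function name find_duplicates, its parameter names, and the choice to treat None and the empty string as "no data" are assumptions made for illustration only.

```python
from collections import Counter

def find_duplicates(records, attributes, no_data_as_duplicates=True, ignore_case=False):
    """Return one boolean per record: True if the record is duplicated."""
    def key_of(record):
        values = []
        for name in attributes:
            value = record.get(name)
            # Assumption: None and "" both count as "no data".
            if value is None or value == "":
                value = None
            elif ignore_case and isinstance(value, str):
                value = value.casefold()
            values.append(value)
        return tuple(values)

    keys = [key_of(record) for record in records]
    counts = Counter(keys)
    flags = []
    for key in keys:
        all_no_data = all(value is None for value in key)
        if all_no_data and not no_data_as_duplicates:
            # Records with no data in every input attribute are never duplicates.
            flags.append(False)
        else:
            # A record is duplicated if its key occurs more than once.
            flags.append(counts[key] > 1)
    return flags

records = [
    {"BUSINESS": "Acme Ltd"},
    {"BUSINESS": "ACME LTD"},
    {"BUSINESS": ""},
    {"BUSINESS": None},
    {"BUSINESS": "Widget Co"},
]

# Default options: case sensitive, no-data records duplicate each other.
print(find_duplicates(records, ["BUSINESS"]))
# [False, False, True, True, False]

# Ignoring case, "Acme Ltd" and "ACME LTD" also match.
print(find_duplicates(records, ["BUSINESS"], ignore_case=True))
# [True, True, True, True, False]
```

With the hypothetical no_data_as_duplicates parameter set to False, the two empty records in this sample would no longer be reported as duplicates.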

 

Outputs

Data attributes

None

Flags

Flag attribute | Purpose | Possible Values
DuplicateFlag | Indicates which data passes the Duplicate Check | Y/N

Publication to Dashboard

A Duplicate Check's results may be published to the Dashboard.

The following interpretation of results is used by default:

Result | Dashboard Interpretation
Not duplicated | Pass
Duplicate | Alert

Execution

Execution Mode | Supported
Batch | Yes
Real time Monitoring | Yes
Real time Response | No

The Duplicate Check assesses duplication across a batch of records. It must therefore run to completion before its results are available, and is not suitable for a process that requires a real time response.

When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned will indicate the number of duplicates in the batch of transactions only.
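The effect of batch-scoped statistics can be seen in a small Python sketch. It models the commit point as a fixed number of transactions per batch, which is a simplification; the function name count_duplicates_per_batch and the sample values are hypothetical and do not come from the product.

```python
from collections import Counter

def count_duplicates_per_batch(values, batch_size):
    """Count duplicated records separately within each batch of values."""
    results = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        counts = Counter(batch)
        # Every occurrence of a value seen more than once in this batch counts.
        results.append(sum(n for n in counts.values() if n > 1))
    return results

transactions = ["A", "B", "A", "C", "A", "C"]

# Two batches of three: ["A", "B", "A"] and ["C", "A", "C"].
print(count_duplicates_per_batch(transactions, 3))  # [2, 2]

# The same data checked as one batch.
print(count_duplicates_per_batch(transactions, 6))  # [5]
```

In this sample, the "A" in the second batch is not reported because it is only compared with the other records in its own batch, so the per-batch counts add up to 4 rather than 5.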

Results Browsing

The Duplicate Check produces a summary view of its results, showing the following statistics:

Statistic | Meaning
Duplicated | The records that were duplicated in the input attributes. Drill down to see each distinct value, and the number of times it occurred. Drill down again to see the records.
Not duplicated | The records that were not duplicated in the input attributes.
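As a rough illustration of how these statistics relate to the data, the following Python sketch computes the two counts and a first-level drilldown for a single attribute. It assumes that "Duplicated" counts every occurrence of a repeated value; the function name summarize and the sample company names are made up for this example.

```python
from collections import Counter

def summarize(values):
    """Return summary counts plus a drilldown of each duplicated value."""
    counts = Counter(values)
    # First-level drilldown: each distinct duplicated value and its occurrences.
    drilldown = {value: n for value, n in counts.items() if n > 1}
    duplicated = sum(drilldown.values())
    return {
        "Duplicated": duplicated,
        "Not duplicated": len(values) - duplicated,
        "Drilldown": drilldown,
    }

business = ["Acme Ltd", "Widget Co", "Acme Ltd", "Initech", "Acme Ltd", "Widget Co"]
summary = summarize(business)
print(summary["Duplicated"])      # 5
print(summary["Not duplicated"])  # 1
print(summary["Drilldown"])       # {'Acme Ltd': 3, 'Widget Co': 2}
```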

Output Filters

The following output filters are available from a Duplicate Check:

Example

In this example, the Duplicate Check processor is used to look for duplicate company names in a BUSINESS attribute:

Summary View

Drilldown on Duplicated values

 

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.