
Matching Concept Guide

Why people need matching

The need to match and reconcile information from one or more business applications can arise in numerous ways. For example:

Why matching can be complex

Deciding whether two records should match is not always simple. Consider the following two records:

They are different in every database field, but on inspection there are clearly similarities between the records. For example:

Making a decision as to whether we should treat these as "the same" depends on factors such as:

Effective matching requires tools that are much more sophisticated than traditional data analysis techniques, which assume a high degree of completeness and correctness in the source data. It also requires that the decision-making process take into account the business context in which the information will be used. For example, should related individuals at the same address be considered as one customer, or two?

How Oracle solves the problem

Oracle provides a set of matching processors that are suited to the most common business problems that require matching. The matching processors use a number of logical stages and simple concepts that correspond with the way users think about matching:

Identifiers

Rather than forcing users to express matching rules at the field-by-field level, OEDQ's matching processors exploit the powerful concept of Identifiers.

Identifiers allow the user to map related fields into real world entities and deliver a range of key benefits:
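To illustrate the idea of mapping related fields to identifiers, here is a minimal Python sketch. It is not OEDQ configuration: the source names (`crm`, `billing`), the field names, and the `identifier_values` helper are all hypothetical, chosen only to show how differently structured records can be reduced to the same real-world entities before any comparison takes place.

```python
# Hypothetical mapping: identifier name -> fields that make it up in each source.
# None of these names come from OEDQ; they are assumptions for illustration.
IDENTIFIER_MAP = {
    "Name": {"crm": ["first_name", "last_name"], "billing": ["full_name"]},
    "Address": {"crm": ["addr1", "city", "zip"], "billing": ["address"]},
}


def identifier_values(record, source):
    """Return {identifier: combined value} for a record from the given source."""
    values = {}
    for identifier, fields_by_source in IDENTIFIER_MAP.items():
        # Collect the mapped fields, skipping any that are missing or blank.
        parts = [str(record.get(field, "")).strip() for field in fields_by_source[source]]
        values[identifier] = " ".join(part for part in parts if part)
    return values
```

With a mapping like this, matching rules can be written once against `Name` and `Address`, regardless of how many fields each source uses to store them.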

Clustering

Clustering is a necessary part of matching, used to divide up data sets into clusters, so that a match processor does not attempt to compare every record with every other record.

In OEDQ, you can configure many clusters, using many identifiers, in the same match processor, so that you are not reliant on the data having pre-formed cluster keys.
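The effect of clustering can be sketched in a few lines of Python. This is an illustration of the general technique, not of how OEDQ implements it: the two cluster key functions (a surname prefix and a postcode prefix) and the `candidate_pairs` helper are assumptions made for the example. Records are only paired for comparison if they share a key in at least one of the configured clusters.

```python
from collections import defaultdict
from itertools import combinations


def cluster_key_surname(record):
    # Hypothetical cluster key: first three letters of the surname.
    return record["surname"][:3].upper()


def cluster_key_postcode(record):
    # Hypothetical cluster key: first four characters of the postcode, spaces removed.
    return record["postcode"].replace(" ", "").upper()[:4]


def candidate_pairs(records, key_functions):
    """Return index pairs of records that share a key in at least one cluster."""
    pairs = set()
    for key_fn in key_functions:
        clusters = defaultdict(list)
        for index, record in enumerate(records):
            key = key_fn(record)
            if key:  # records with an empty key are not clustered
                clusters[key].append(index)
        for members in clusters.values():
            pairs.update(combinations(members, 2))
    return pairs
```

Using more than one key function means that a record with a mistyped surname can still be paired with its duplicate through the postcode cluster, and vice versa, while most unrelated pairs are never compared.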

To read more about clustering, see the Clustering Concept Guide.

Comparisons

Comparisons are replaceable algorithms that compare identifier values with each other and deliver comparison results. The nature of the result delivered depends on the comparison. For example, a comparison result could be simply True (match) or False (no match), or it may be a percentage value indicating match strength:
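As an illustration of these two kinds of result, the following minimal Python sketch defines one comparison that returns True or False and one that returns a percentage. Neither is an OEDQ comparison: the function names are hypothetical, and the percentage is computed with the similarity ratio from Python's standard `difflib` module, used here only as a stand-in for a match-strength algorithm.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Boolean comparison: True if the values are equal, ignoring case and outer spaces."""
    return a.strip().lower() == b.strip().lower()


def similarity_percent(a, b):
    """Percentage comparison: 0 (nothing in common) to 100 (identical after normalization)."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return round(ratio * 100)
```

Because both functions take two identifier values and return a single result, either could be swapped for the other without changing the code that calls them, which is the sense in which comparisons are replaceable.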

Match rules

Match rules offer a way to interpret comparison results according to their business significance. You can configure any number of ordered rules to interpret the comparison results. Each rule can result in one of three decisions:

Together, the match rules form a rule table across all comparisons that determines the decision of the match operation, for example:
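A rule table of this kind can be sketched in Python as an ordered list of rules, each of which tests the results of several comparisons. Everything in the sketch is an assumption made for illustration: the decision labels (Match, Review, No match), the rule names, the thresholds, and the `decide` helper are hypothetical and are not taken from OEDQ.

```python
def decide(results, rules, default="No match"):
    """Return (rule name, decision) for the first rule whose conditions all hold."""
    for rule in rules:
        conditions = rule["conditions"]
        if all(check(results[name]) for name, check in conditions.items()):
            return rule["name"], rule["decision"]
    # No rule fired, so fall back to the default decision.
    return None, default


# Hypothetical rule table, evaluated top to bottom. Each condition tests the
# percentage result of one comparison (for example, on the Name identifier).
RULES = [
    {"name": "Exact name and address",
     "conditions": {"name": lambda s: s == 100, "address": lambda s: s == 100},
     "decision": "Match"},
    {"name": "Close name, exact address",
     "conditions": {"name": lambda s: s >= 80, "address": lambda s: s == 100},
     "decision": "Match"},
    {"name": "Close name, close address",
     "conditions": {"name": lambda s: s >= 80, "address": lambda s: s >= 70},
     "decision": "Review"},
]
```

Calling `decide({"name": 85, "address": 75}, RULES)` walks the table in order and stops at the first rule that is satisfied, so the ordering of the rules is part of the matching logic.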

Using pre-constructed match processes

OEDQ is designed to allow you to construct new matching processes quickly and easily, rather than depending on pre-configured matching processes. Pre-configured processes are not optimized for your data and specific matching requirements, and are more difficult to modify.

However, in some cases a matching template can be used to learn how matching in OEDQ works, and to provide very fast initial results that indicate the level of duplication in your data.

Configurability and Extensibility

OEDQ comes with a highly configurable and tuneable library of matching algorithms, allowing users to refine their matching process to achieve the best possible results with their data.

In addition, OEDQ provides the ability to define new matching algorithms and approaches. The “best” matching functions depend entirely on the problem being addressed and the nature of the data being matched. All key elements of the matching process can use extended components. For example:

The combination of configurability and extensibility ensures that the optimal solution can be deployed in the shortest possible time.

See the topic Extending OEDQ for more information about adding matching extensions into the application.

Key Features

Key features of matching in OEDQ include:

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.