Patterns Profiler

The Patterns Profiler analyzes data values in any number of String attributes and assigns them patterns according to the sequence of character types. For example, the value "10 Lowestoft Lane" is assigned a pattern of NN_aaaaaaaaa_aaaa, using the default Pattern Map reference list.

Note: The default *Character Pattern Map is designed for use with Latin-1 encoded data, but you can create new pattern maps suited to the character encoding of your data, including maps that use multi-byte Unicode (hexadecimal) character references.

The Profiler then counts the number of times each pattern occurs in each attribute and presents the results.

Use

Use the Patterns Profiler to uncover the patterns in your data, and to create reference lists of valid and invalid patterns that can be used to validate the data on an ongoing basis, using a Check Pattern processor.

Configuration

Inputs

Any String attributes that you wish to analyze for data patterns.

Options

| Option | Type | Purpose | Default Value |
|---|---|---|---|
| Character Pattern Map | Reference Data (Pattern Generation Category) | Maps each character to a pattern character | *Character Pattern Map |

The default Standard Pattern Map maps characters as follows:

| Character type | Representation in pattern |
|---|---|
| Alpha characters (a-z, or A-Z) | a |
| Number characters (0-9) | N |
| Punctuation characters, such as semi-colons, commas | Represented as they are |
| Control characters (for example, carriage returns) | C |
| Space | _ |

Characters that are not recognized by the Character Pattern Map are represented with a question mark (?) in each pattern.
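The mapping described above can be sketched in a few lines of Python. This is an illustrative approximation only, not the product's implementation: the function name `to_pattern` is hypothetical, and the sketch uses Python's ASCII character classes as a stand-in for the reference data in the pattern map, so Latin-1 letters outside a-z and A-Z fall through to the question mark here.

```python
import string

def to_pattern(value: str) -> str:
    """Approximate the default character-to-pattern mapping (ASCII only)."""
    out = []
    for ch in value:
        if ch in string.ascii_letters:
            out.append("a")   # alpha characters
        elif ch in string.digits:
            out.append("N")   # number characters
        elif ch == " ":
            out.append("_")   # space
        elif ch in string.punctuation:
            out.append(ch)    # punctuation is represented as it is
        elif ord(ch) < 32 or ord(ch) == 127:
            out.append("C")   # control characters, such as carriage returns
        else:
            out.append("?")   # assumption: anything else is "not recognized"
    return "".join(out)

print(to_pattern("10 Lowestoft Lane"))  # NN_aaaaaaaaa_aaaa
```

The printed pattern matches the "10 Lowestoft Lane" example given at the start of this topic.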

You can use a different Character Pattern Map to map characters as you wish, for example to represent unusual letters such as x and z differently from more common letters.
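As a rough illustration of what a custom map changes, the sketch below models a pattern map as a Python dictionary and overrides the entries for x and z. The helper names `build_map` and `to_pattern`, and the choice of `u` as the symbol for unusual letters, are assumptions made for this example; a real map is defined as Reference Data, not in code. Control characters are omitted for brevity.

```python
import string

def build_map(overrides=None):
    """Build a character-to-pattern dictionary resembling the default mapping."""
    char_map = {ch: "a" for ch in string.ascii_letters}     # alpha -> a
    char_map.update({ch: "N" for ch in string.digits})      # digits -> N
    char_map.update({ch: ch for ch in string.punctuation})  # punctuation as is
    char_map[" "] = "_"                                     # space -> _
    char_map.update(overrides or {})                        # custom entries win
    return char_map

def to_pattern(value, char_map):
    # Characters missing from the map fall back to "?"
    return "".join(char_map.get(ch, "?") for ch in value)

standard = build_map()
# Hypothetical custom map: represent x and z with "u" (an arbitrary symbol)
custom = build_map({ch: "u" for ch in "xzXZ"})

print(to_pattern("Zanzibar 6", standard))  # aaaaaaaa_N
print(to_pattern("Zanzibar 6", custom))    # uaauaaaa_N
```

With the custom map, values containing the unusual letters produce their own patterns, so they are counted separately in the results.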

Outputs

Data attributes

None

Flags

| Flag attribute | Purpose | Possible Values |
|---|---|---|
| [Attribute name].Pattern | Indicates the pattern of the attribute | Patterns defined by the Pattern Map Reference data |

Execution

| Execution Mode | Supported |
|---|---|
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | Yes |

Results Browsing

The Patterns Profiler presents the following statistics for each attribute it analyzes.

Note that each attribute is shown in a separate tab in the Results Browser.

| Statistic | Meaning |
|---|---|
| Pattern | The generated pattern for each value |
| Length | The length of each generated pattern; that is, the number of characters in each value |
| Count | The number of records with values in the attribute that matched the pattern |
| % | The percentage of records with values in the attribute that matched the pattern |
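The statistics above amount to a frequency count over the generated patterns. The following is a minimal sketch, assuming the patterns for one attribute have already been generated and that the percentage is taken over all analyzed records; the function name `summarize` is hypothetical.

```python
from collections import Counter

def summarize(patterns):
    """Return (pattern, length, count, percent) rows, most common first."""
    counts = Counter(patterns)
    total = len(patterns)
    rows = [(p, len(p), n, 100.0 * n / total) for p, n in counts.items()]
    rows.sort(key=lambda row: row[2], reverse=True)  # sort by Count, descending
    return rows

# Patterns for four values of a single attribute (made-up sample data)
for row in summarize(["aaN_Naa", "aaN_Naa", "aaNN_Naa", "aaN_Naa"]):
    print(row)
# ('aaN_Naa', 7, 3, 75.0)
# ('aaNN_Naa', 8, 1, 25.0)
```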

Example

In this example, the Patterns Profiler is used to analyze patterns in all attributes of a table of Customer records. For each attribute, a results view is generated that lists each pattern found, with its Length, Count and % statistics.

By sorting the view by the Count column, you can quickly find the most common and least common patterns in the data, enabling you to construct lists of valid and invalid patterns for use in a Check Pattern processor.
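One way to draft such lists from the counts is to split the patterns at a frequency threshold and then review the result by hand. The sketch below is only a starting point under that assumption: `split_patterns`, the `min_count` threshold, and the sample counts are all hypothetical, and frequency alone does not establish that a pattern is valid.

```python
def split_patterns(pattern_counts, min_count):
    """Split patterns into candidate valid and invalid lists by frequency."""
    ranked = sorted(pattern_counts.items(), key=lambda kv: kv[1], reverse=True)
    valid = [p for p, n in ranked if n >= min_count]    # common patterns
    invalid = [p for p, n in ranked if n < min_count]   # rare patterns to review
    return valid, invalid

# Made-up pattern counts for a postcode-like attribute
counts = {"aaN_Naa": 950, "aaNN_Naa": 45, "aaN-Naa": 3, "NNNNN": 2}
valid, invalid = split_patterns(counts, min_count=10)

print(valid)    # ['aaN_Naa', 'aaNN_Naa']
print(invalid)  # ['aaN-Naa', 'NNNNN']
```

The candidate lists would normally be inspected and corrected before being saved as reference lists for ongoing validation.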

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.