Patterns Profiler
The Patterns Profiler analyzes data values in any number of String attributes and assigns them patterns according to the sequence of character types. For example, the value "10 Lowestoft Lane" is assigned a pattern of NN_aaaaaaaaa_aaaa, using the default Pattern Map reference list.
Note: The default *Pattern Map is designed for use with Latin-1 encoded data, but you can create new pattern maps that are suited to the character encoding of your data, including multi-byte Unicode (hexadecimal) character references.
The Profiler then counts how many times each pattern occurs in each attribute and presents the results.
Use the Patterns Profiler to uncover the patterns in your data, and to create reference lists of valid and invalid patterns that can be used to validate the data on an ongoing basis, using a Check Pattern processor.
Any String attributes that you wish to analyze for data patterns.
Option | Type | Purpose | Default Value
Character Pattern Map | Reference Data (Pattern Generation Category) | Maps each character to a pattern character | *Character Pattern Map
The default *Character Pattern Map maps characters as follows:
Character type | Representation in pattern
Alpha characters (a-z or A-Z) | a
Number characters (0-9) | N
Punctuation characters, such as semicolons and commas | Represented as they are
Control characters (for example, carriage returns) | C
Space | _
Characters that are not recognized by the Character Pattern Map are represented with a question mark (?) in each pattern.
You can use a different Character Pattern Map to map characters as you wish, for example to represent unusual letters such as x and z differently from more common letters.
None
Flag attribute | Purpose | Possible Values
[Attribute name].Pattern | Indicates the pattern of the attribute | Patterns defined by the Pattern Map Reference data
Execution Mode | Supported
Batch | Yes
Real time Monitoring | Yes
Real time Response | Yes
The Patterns Profiler presents the following statistics for each attribute it analyzes.
Note that each attribute is shown in a separate tab in the Results Browser.
Statistic | Meaning
Pattern | The generated pattern for each value
Length | The length of each generated pattern; that is, the number of characters in each value
Count | The number of records with values in the attribute that matched the pattern
% | The percentage of records with values in the attribute that matched the pattern
In this example, the Patterns Profiler is used to analyze patterns in all attributes of a table of Customer records. For each attribute, a view is generated that lists each pattern found, with its length, count, and percentage.
By sorting the view by the Count column, you can quickly find the most common and least common patterns in the data, enabling you to construct lists of valid and invalid patterns for use in a Check Pattern processor.
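The idea of validating values against a list of known patterns can be sketched as follows. This is not how the Check Pattern processor is implemented; it is a minimal Python illustration in which the helper names, the sample patterns in VALID_PATTERNS, and the simplified mapping are all assumptions.

```python
import string

# Simplified stand-in for a pattern map: letters -> a, digits -> N, space -> _.
_TABLE = str.maketrans(
    string.ascii_letters + string.digits + " ",
    "a" * 52 + "N" * 10 + "_",
)

# Hypothetical list of valid patterns, for example taken from the most
# common patterns found when profiling a postcode attribute.
VALID_PATTERNS = {"aaN_Naa", "aaNa_Naa"}


def has_valid_pattern(value, valid_patterns=VALID_PATTERNS):
    """Return True if the value's generated pattern is in the valid list."""
    return value.translate(_TABLE) in valid_patterns


values = ["CB4 0WS", "SW1A 1AA", "12345", "CB4-0WS"]
invalid = [v for v in values if not has_valid_pattern(v)]
print(invalid)  # ['12345', 'CB4-0WS']
```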
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.