|
Pattern Transform |
The Pattern Transform processor provides a simple and user-friendly alternative to the use of regular expressions for transforming the format of data in a text attribute.
Pattern Transform uses a Reference Data set (Patterns Map) with two columns to reformat data matching a number of patterns (in the first ‘lookup’ column) to a new format or formats (in the second ‘map’ column).
Use Pattern Transform to standardize the format of data in attributes where data ought to, but does not, conform to a small number of standard formats, for example in Postcode, Account Number, or Product Serial Number attributes.
It is useful to use the Patterns Profiler (with the same Character Pattern Map, that is, the same way of generating character patterns) to find the invalid patterns that may exist in your data. These can then be added to the Patterns Map used in this processor, with each invalid pattern mapped to a valid pattern.
Note that Pattern Transform aims to provide as much flexibility as possible within the confines of using a simple map from one character pattern to another. There are some types of text transformation that will require the extra complexity of regular expressions (see Regex Replace).
The full logic of the processor is as follows:
Step |
Action |
1 |
Map all characters in the input value to pattern characters using the configured Character Pattern Map, to generate a pattern for the value (such as ‘AB1243-ZX’ -> ‘aaNNNN-aa’) |
2 |
Match the generated pattern against the Lookup column of the Patterns Map.
|
3 |
Where a pattern character appears in the Lookup column but not in the Map column, the underlying character is stripped from the value (for example, using a map of NN-a to NNa, the value 12-A is transformed to 12A) Where a pattern character does not appear in the Lookup column of the map, but does appear in the Map column, it is added as a literal character in the output value (for example, using a map of NNaa to NN-aa, the value 12AB is transformed to 12-AB) Where a pattern character appears in both the Lookup and Map columns of the Patterns Map:
|
Note that all characters mapping to the same pattern character must always appear in the same order in the final pattern (for example, it is possible to transform AB123CD to 123ABCD or 1ABC23, but NOT to transform AB123CD to BA123CD or AB213CD, using the default Character Pattern Map).
A single String or String Array attribute from which you wish to replace values using a patterns map.
Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.
Option |
Type |
Purpose |
Default Value |
Character Pattern Map |
Reference Data (Pattern Generation Category) |
To map characters in the input value to pattern characters, in order to generate a pattern to match against the Patterns list. |
*Character Pattern Map |
Patterns Map |
Reference Data (Patterns Category) |
The map of character patterns used to transform data. |
None |
Data attribute |
Type |
Purpose |
Value |
[Attribute Name].PatternTransformed |
Derived |
A new attribute with values replaced according to the Patterns Map. |
The original attribute value, transformed according to the Patterns Map. |
Flag |
Purpose |
Possible Values |
[Attribute Name].PatternTransformedFlag |
Stores the result of the PatternTransform operation on each record - that is, whether or not data was transformed |
Y - transformed N - not transformed |
Execution Mode |
Supported |
Batch |
Yes |
Real time Monitoring |
Yes |
Real time Response |
Yes |
Statistic |
Meaning |
Transformed |
The number of records that were transformed using the Patterns Map. |
Untransformed |
The number of records that were not transformed using the Patterns Map. |
The following output filters are available from the Pattern Transform processor:
In this example, the default Character Pattern Map is used to generate patterns, and the following Patterns Map is used to fix common format problems with UK Postcodes:
This transforms values as illustrated below:
Oracle ® Enterprise Data Quality Help version 9.0
Copyright ©
2006,2011 Oracle and/or its affiliates. All rights reserved.