RegEx Check
The RegEx Check processor checks the data in an attribute against reference lists of valid and invalid regular expressions for the attribute.
The case-sensitivity and matching technique (Whole Value / Contains / Starts With / Ends With) of the check can be controlled.
Note on Regular Expressions
Regular expressions are a standard technique for expressing patterns and manipulating strings, and they are very powerful once mastered.
Tutorials and reference material about regular expressions are available on the Internet and in books.
There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.
The RegEx Check processor is a powerful tool, allowing you to validate data according to its exact content, using the position of data, partial and exact values, and wild cards.
The RegEx Check is useful for checking any data that should follow a consistent structure, for example, UK National Insurance Numbers.
The following are some examples of regular expressions that might be used to check data:
Regular Expression | Pattern meaning
^\d{5}$ | 5-digit US ZIP code
([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2} | Valid UK Postcode
^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$ | Valid UK National Insurance number
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$ | Valid email address
Many more examples are available on the Internet and from other sources.
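As an illustration only, the example patterns above can be tried outside the product. The following is a minimal sketch in Python, assuming the standard `re` module treats these particular patterns the same way the processor does; the dictionary keys and the sample values are hypothetical.

```python
import re

# The four example patterns from the table above, keyed by hypothetical names.
EXAMPLE_PATTERNS = {
    "us_zip": r"^\d{5}$",
    "uk_postcode": r"([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2}",
    "uk_ni_number": r"^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$",
    "email": r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$",
}


def matches_example(name: str, value: str) -> bool:
    """Return True if the whole value matches the named example pattern."""
    return re.fullmatch(EXAMPLE_PATTERNS[name], value) is not None


if __name__ == "__main__":
    # Hypothetical sample values, one passing and one failing per pattern.
    samples = [
        ("us_zip", "12345"), ("us_zip", "1234"),
        ("uk_postcode", "SW1A 1AA"), ("uk_postcode", "SW1A1AA"),
        ("uk_ni_number", "AB123456C"), ("uk_ni_number", "DA123456C"),
        ("email", "jane.doe@example.com"), ("email", "jane.doe@example"),
    ]
    for name, value in samples:
        print(name, repr(value), matches_example(name, value))
```

Note that the postcode pattern requires a space or hyphen between the two parts, so a value written without a separator is rejected.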
A single attribute that you wish to check against a list of valid regular expressions, a list of invalid regular expressions, or both.
Valid patterns
Option | Type | Purpose | Default Value
Reference Data | Reference Data (Regular Expressions Category) | List of valid regular expressions for the attribute | None
Regular expression | Free text | Allows a single regular expression for valid patterns to be specified without using Reference Data. Note that if this option is used as well as Reference Data, all regular expressions (in both options) are used in the check. | None
Categorize unmatched as | Selection (Unknown/Invalid) | How to categorize values that do not match the list of valid regular expressions | Unknown
Invalid patterns
Option | Type | Purpose | Default Value
Reference Data | Reference Data (Regular Expressions Category) | List of invalid regular expressions for the attribute | None
Regular expression | Free text | Allows a single regular expression for invalid patterns to be specified without using Reference Data. Note that if this option is used as well as Reference Data, all regular expressions (in both options) are used in the check. | None
Categorize unmatched as | Selection (Unknown/Valid) | How to categorize values that do not match the list of invalid regular expressions | Unknown
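The way the valid and invalid pattern options might combine can be sketched as follows. This is not the processor's implementation: the order of precedence (invalid patterns checked first, then valid patterns, then the two "Categorize unmatched as" settings) is an assumption made for illustration, and the function and parameter names are hypothetical. Match options such as case sensitivity are left out of this sketch.

```python
import re


def categorize(value, valid_patterns=(), invalid_patterns=(),
               valid_unmatched="Unknown", invalid_unmatched="Unknown"):
    """Categorize a value as 'Valid', 'Invalid' or 'Unknown'.

    Each pattern list stands for the Reference Data list plus the optional
    single free-text regular expression, since both are used in the check.
    ASSUMPTION: invalid patterns take precedence over valid patterns.
    """
    if any(re.fullmatch(p, value) for p in invalid_patterns):
        return "Invalid"
    if any(re.fullmatch(p, value) for p in valid_patterns):
        return "Valid"
    # Nothing matched: fall back to the 'Categorize unmatched as' settings.
    if valid_patterns and valid_unmatched == "Invalid":
        return "Invalid"
    if invalid_patterns and invalid_unmatched == "Valid":
        return "Valid"
    return "Unknown"


if __name__ == "__main__":
    reference_list = [r"\d{5}"]        # hypothetical Reference Data entries
    single_regex = [r"\d{5}-\d{4}"]    # hypothetical free-text expression
    all_valid = reference_list + single_regex
    for value in ["12345", "12345-6789", "abcde"]:
        print(value, categorize(value, valid_patterns=all_valid))
```

With the default settings, a value that matches neither list is categorized as Unknown, which corresponds to the default value shown for both "Categorize unmatched as" options.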
Match options
Option | Type | Purpose | Default Value
Ignore case? | Yes/No | Drives whether or not to ignore case when matching the list(s) | Yes
Match list by | Selection (Whole Value/Contains/Starts With/Ends With) | Drives how to match against the list(s) | Whole Value
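The effect of the two match options can be approximated with a small sketch. How the processor applies these options internally is not documented here, so the mapping below onto Python's `re` functions is an assumption, and the function name and sample values are hypothetical.

```python
import re


def matches(pattern, value, ignore_case=True, match_by="Whole Value"):
    """Apply one regular expression using the given match options.

    ASSUMPTION: 'Whole Value' means the entire value must match, 'Contains'
    means a match anywhere, and 'Starts With' / 'Ends With' anchor the match
    to the beginning or end of the value.
    """
    flags = re.IGNORECASE if ignore_case else 0
    if match_by == "Whole Value":
        return re.fullmatch(pattern, value, flags) is not None
    if match_by == "Contains":
        return re.search(pattern, value, flags) is not None
    if match_by == "Starts With":
        return re.match(pattern, value, flags) is not None
    if match_by == "Ends With":
        # Wrap the pattern in a group so alternations stay intact, then
        # require the match to finish at the very end of the value.
        return re.search(r"(?:" + pattern + r")\Z", value, flags) is not None
    raise ValueError("Unknown match option: " + match_by)


if __name__ == "__main__":
    for option in ["Whole Value", "Contains", "Starts With", "Ends With"]:
        print(option, matches(r"\d{2}", "ab-12", match_by=option))
```

With the defaults (ignore case, Whole Value), a pattern such as `[a-z]{2}` accepts `AB`; turning off case-insensitivity makes the same value fail.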
None
Flag attribute | Purpose | Possible Values
RegExValid | Indicates which data passes the RegEx Check: Valid RegEx, Invalid RegEx and Unknown. | Y/N/-
A RegEx Check's results may be published to the Dashboard.
The following interpretation of results is used by default:
Result | Interpretation
Valid | Pass
Unknown | Warning
Invalid | Alert
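For readers who process the results downstream, the default interpretation can be written as a simple lookup. This is only a sketch: the dictionary and function names are hypothetical, and the correspondence between the Y/N/- flag values and the Valid/Invalid/Unknown categories is assumed from the order in which they are listed for the RegExValid flag.

```python
# Default Dashboard interpretation, as listed in the table above.
DASHBOARD_BY_RESULT = {"Valid": "Pass", "Unknown": "Warning", "Invalid": "Alert"}

# ASSUMPTION: flag values map to categories in the order they are listed.
RESULT_BY_FLAG = {"Y": "Valid", "N": "Invalid", "-": "Unknown"}


def dashboard_status(flag: str) -> str:
    """Translate a RegExValid flag value into its default Dashboard status."""
    return DASHBOARD_BY_RESULT[RESULT_BY_FLAG[flag]]


if __name__ == "__main__":
    for flag in ["Y", "N", "-"]:
        print(flag, dashboard_status(flag))
```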
Execution Mode | Supported
Batch | Yes
Real time Monitoring | Yes
Real time Response | Yes
The RegEx Check produces a summary view of its results, showing the following statistics:
Statistic | Meaning
Valid records | The records that were categorized as Valid by the RegEx Check.
Unknown records | The records that were categorized as Unknown by the RegEx Check.
Invalid records | The records that were categorized as Invalid by the RegEx Check.
Drilling down on any of the above statistics reveals a count of the distinct values that were found to be Valid, Unknown or Invalid. You can then drill down again to see the records themselves.
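The relationship between the summary statistics and the first drilldown level can be sketched as follows, assuming each record has already been given a category. The function name and the sample data are hypothetical; this is not how the Results Browser is implemented.

```python
from collections import Counter, defaultdict


def summarize(categorized_values):
    """Build summary and drilldown counts from (value, category) pairs.

    Returns a tuple of:
      - record counts per category (the summary view), and
      - per category, a count of records for each distinct value
        (the first drilldown level).
    """
    record_counts = Counter()
    distinct_values = defaultdict(Counter)
    for value, category in categorized_values:
        record_counts[category] += 1
        distinct_values[category][value] += 1
    return record_counts, distinct_values


if __name__ == "__main__":
    # Hypothetical categorized records.
    rows = [("12345", "Valid"), ("12345", "Valid"), ("54321", "Valid"),
            ("1234", "Invalid"), ("", "Unknown")]
    counts, distinct = summarize(rows)
    print(dict(counts))
    print({category: dict(values) for category, values in distinct.items()})
```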
The following output filters are available from a RegEx Check:
In this example, a RegEx Check is used to check the format of an Account Number attribute (CU_ACCOUNT), using a Whole Value match against the following regular expression:
^([0-9]{2})(-)([0-9]{4,5})(-)([a-zA-Z]{2})
This regular expression dictates that values must start with exactly 2 digits, followed by a hyphen, followed by either 4 or 5 digits, followed by another hyphen, followed by two letters. Because a Whole Value match is used, nothing may follow the two letters.
Summary View
Drilldown on Invalid Records
Note that in this case a few records were found with OO (two capital letter Os) instead of 00 (two zeros).
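The behavior in this example can be reproduced with a short sketch. The account numbers below are made-up sample values rather than data from the example, and Python's `re.fullmatch` is used here as a stand-in for the Whole Value match.

```python
import re

# The regular expression used in the example above.
ACCOUNT_PATTERN = r"^([0-9]{2})(-)([0-9]{4,5})(-)([a-zA-Z]{2})"


def is_valid_account(value: str) -> bool:
    """Whole-value match of an account number against the example pattern."""
    return re.fullmatch(ACCOUNT_PATTERN, value) is not None


if __name__ == "__main__":
    # Hypothetical values; the second uses capital letter Os instead of zeros.
    for account in ["00-1234-AB", "OO-1234-AB", "00-12345-ab", "00-123456-AB"]:
        print(account, is_valid_account(account))
```

A value that begins with the letters OO fails at the very first element of the pattern, which expects two digits.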
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.