RegEx Check

The RegEx Check processor checks the data in an attribute against reference lists of valid and invalid regular expressions for the attribute.

You can control the case sensitivity and the matching technique (Whole Value / Contains / Starts With / Ends With) of the check.

Note on Regular Expressions

Regular expressions are a standard technique for expressing patterns and manipulating Strings. They are very powerful once mastered.

Tutorials and reference material about regular expressions are available on the Internet and in books.

There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.

Use

The RegEx Check processor is a powerful tool that allows you to validate data according to its exact content, using the position of data, partial and exact values, and wildcards.

The RegEx Check is useful for checking any data that should have a consistent structure, for example, UK National Insurance Numbers.

The following are some examples of regular expressions that might be used to check data:

Regular Expression                                                        Pattern meaning

^\d{5}$                                                                   5-digit US ZIP code
([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2}    Valid UK postcode
^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$           Valid UK National Insurance number
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$                Valid email address

Many more examples are available on the Internet and from other sources.
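As a rough illustration, the example expressions above can be tried outside the product with a short Python sketch. The sample values and the helper name matches_whole_value are made up for this illustration, and Python's re module is assumed to behave closely enough to the product's regular expression engine for these particular patterns.

```python
import re

# Patterns copied from the examples above.
# The sample values (one expected to match, one not) are invented for illustration.
EXAMPLES = {
    "US ZIP code": (r"^\d{5}$", "90210", "9021"),
    "UK postcode": (
        r"([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2}",
        "SW1A 1AA",
        "SW1A1AA",
    ),
    "UK National Insurance number": (
        r"^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$",
        "AB123456C",
        "DA123456C",
    ),
    "Email address": (
        r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$",
        "jane.doe@example.com",
        "jane.doe@example",
    ),
}


def matches_whole_value(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


# For each example, record whether the "good" and "bad" samples match.
results = {
    name: (matches_whole_value(p, good), matches_whole_value(p, bad))
    for name, (p, good, bad) in EXAMPLES.items()
}

for name, (good_ok, bad_ok) in results.items():
    print(f"{name}: good sample matches={good_ok}, bad sample matches={bad_ok}")
```

Trying each expression against one value that should match and one that should not is a quick way to catch mistakes before adding the expression to Reference Data.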

Configuration

Inputs

A single attribute that you wish to check against a list of valid regular expressions, a list of invalid regular expressions, or both.

Options

Valid patterns

| Option | Type | Purpose | Default Value |
|---|---|---|---|
| Reference Data | Reference Data (Regular Expressions Category) | List of valid regular expressions for the attribute | None |
| Regular expression | Free text | Allows a single regular expression for valid patterns to be specified without using Reference Data. Note that if this option is used as well as Reference Data, all regular expressions (in both options) are used in the check. | None |
| Categorize unmatched as | Selection (Unknown/Invalid) | How to categorize values that do not match the list of valid regular expressions | Unknown |
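The way the valid-pattern options combine can be pictured with a minimal Python sketch. This is not the processor's implementation: the function name, parameter names, and the use of a case-sensitive whole-value match are assumptions made for illustration only.

```python
import re


def check_against_valid_patterns(value, reference_patterns=(),
                                 single_pattern=None, unmatched_as="Unknown"):
    """Hypothetical sketch of the valid-pattern options.

    reference_patterns stands in for the Reference Data list,
    single_pattern for the free-text Regular expression option,
    unmatched_as for the 'Categorize unmatched as' option.
    """
    if unmatched_as not in ("Unknown", "Invalid"):
        raise ValueError("unmatched_as must be 'Unknown' or 'Invalid'")

    # If both options are set, all regular expressions are used in the check.
    patterns = list(reference_patterns)
    if single_pattern is not None:
        patterns.append(single_pattern)

    # Assumption: a value is Valid if it matches any one of the patterns.
    if any(re.fullmatch(p, value) for p in patterns):
        return "Valid"
    return unmatched_as


print(check_against_valid_patterns("12345", [r"\d{5}"]))
print(check_against_valid_patterns("1234", [r"\d{5}"]))
print(check_against_valid_patterns("1234", [r"\d{5}"], single_pattern=r"\d{4}"))
print(check_against_valid_patterns("abc", [r"\d{5}"], unmatched_as="Invalid"))
```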

Invalid patterns

| Option | Type | Purpose | Default Value |
|---|---|---|---|
| Reference Data | Reference Data (Regular Expressions Category) | List of invalid regular expressions for the attribute | None |
| Regular expression | Free text | Allows a single regular expression for invalid patterns to be specified without using Reference Data. Note that if this option is used as well as Reference Data, all regular expressions (in both options) are used in the check. | None |
| Categorize unmatched as | Selection (Unknown/Valid) | How to categorize values that do not match the list of invalid regular expressions | Unknown |

Match options

| Option | Type | Purpose | Default Value |
|---|---|---|---|
| Ignore case? | Yes/No | Determines whether to ignore case when matching against the list(s) | Yes |
| Match list by | Selection (Whole Value/Contains/Starts With/Ends With) | Determines how values are matched against the list(s) | Whole Value |
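The match options can be approximated in Python as shown below. The mapping of each option to a Python call is an assumption made for illustration, not a description of how the processor is implemented, and the function name matches is hypothetical.

```python
import re


def matches(pattern, value, match_by="Whole Value", ignore_case=True):
    """Hypothetical approximation of the 'Match list by' and 'Ignore case?' options."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)

    if match_by == "Whole Value":
        # The pattern must account for the entire value.
        return compiled.fullmatch(value) is not None
    if match_by == "Contains":
        # The pattern may match anywhere inside the value.
        return compiled.search(value) is not None
    if match_by == "Starts With":
        # The pattern must match at the beginning of the value.
        return compiled.match(value) is not None
    if match_by == "Ends With":
        # Some suffix of the value must match the pattern completely.
        return any(compiled.fullmatch(value[i:]) for i in range(len(value) + 1))
    raise ValueError(f"Unknown match option: {match_by}")


print(matches(r"[a-z]{2}\d", "AB1"))                     # case ignored by default
print(matches(r"[a-z]{2}\d", "AB1", ignore_case=False))  # case-sensitive
print(matches(r"\d{3}", "abc123def", "Contains"))
print(matches(r"\d{3}", "123def", "Starts With"))
print(matches(r"\d{3}", "abc123", "Ends With"))
```

In this sketch the same expression gives different results depending on the match option, which is why an expression written for a Whole Value match may need adjusting before it is used with Contains, Starts With, or Ends With.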

Outputs

Data attributes

None

Flags

| Flag attribute | Purpose | Possible Values |
|---|---|---|
| RegExValid | Indicates the result of the RegEx Check for each record: Y (Valid), N (Invalid) or - (Unknown). | Y/N/- |

Publication to Dashboard

A RegEx Check's results may be published to the Dashboard.

The following interpretation of results is used by default:

| Result | Dashboard Interpretation |
|---|---|
| Valid | Pass |
| Unknown | Warning |
| Invalid | Alert |

Execution

| Execution Mode | Supported |
|---|---|
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | Yes |

Results Browsing

The RegEx Check produces a summary view of its results, showing the following statistics:

| Statistic | Meaning |
|---|---|
| Valid records | The records that were categorized as Valid by the RegEx Check. |
| Unknown records | The records that were categorized as Unknown by the RegEx Check. |
| Invalid records | The records that were categorized as Invalid by the RegEx Check. |

Drilling down on any of the above statistics reveals a count of the distinct values that were found to be Valid, Unknown or Invalid. You can then drill down again to see the records themselves.
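The relationship between the summary statistics and the first drilldown level can be sketched in Python. The records below are made up, and the function name summarize is hypothetical; the sketch only illustrates counting records per category and then counting distinct values within each category.

```python
from collections import Counter, defaultdict

# Made-up (value, category) pairs standing in for checked records.
checked_records = [
    ("12-1234-AB", "Valid"),
    ("12-1234-AB", "Valid"),
    ("34-56789-cd", "Valid"),
    ("OO-1234-AB", "Invalid"),
    ("", "Unknown"),
]


def summarize(records):
    """Return record counts per category and, per category, counts of each distinct value."""
    record_counts = Counter()
    distinct_values = defaultdict(Counter)
    for value, category in records:
        record_counts[category] += 1
        distinct_values[category][value] += 1
    return record_counts, distinct_values


record_counts, distinct_values = summarize(checked_records)
for category in ("Valid", "Unknown", "Invalid"):
    print(category, "records:", record_counts[category],
          "distinct values:", len(distinct_values[category]))
```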

Output Filters

The following output filters are available from a RegEx Check: Valid records, Unknown records, and Invalid records.

Example

In this example, a RegEx Check is used to check the format of an Account Number attribute (CU_ACCOUNT), using a Whole Value match against the following regular expression:

^([0-9]{2})(-)([0-9]{4,5})(-)([a-zA-Z]{2})

This regular expression dictates that values must start with exactly 2 digits, followed by a hyphen, followed by either 4 or 5 digits, followed by another hyphen, followed by two letters.

Summary View

Drilldown on Invalid Records

Note that in this case a few records were found with OO (two capital letter Os) instead of 00 (two zeros).
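The behavior described in this example can be reproduced with a small Python sketch. The account numbers below are invented for illustration (the actual CU_ACCOUNT data is not shown here), and re.fullmatch is used as an assumed stand-in for a Whole Value match.

```python
import re

# Regular expression from the example above.
ACCOUNT_PATTERN = re.compile(r"^([0-9]{2})(-)([0-9]{4,5})(-)([a-zA-Z]{2})")

# Invented sample values, including one with capital letter Os instead of zeros.
sample_values = ["00-1234-AB", "OO-1234-AB", "12-56789-xy", "12-123456-AB"]


def is_valid_account(value):
    """Hypothetical helper: True if the whole value matches the account pattern."""
    return ACCOUNT_PATTERN.fullmatch(value) is not None


results = {value: is_valid_account(value) for value in sample_values}
for value, ok in results.items():
    print(value, "->", "Valid" if ok else "Invalid")
```

The value with OO fails because [0-9]{2} accepts only digits, which is the kind of keying error a format check of this type is intended to surface.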

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.