About Audit Processors
Audit processors, also called checks, test input data against business rules to assess whether it is fit for its business purpose.
The audit processors used to check data, and the rules used by them, are normally determined from the results of Profiling.
Audit processors categorize each input record as to whether it was valid or invalid according to the check. Invalid records may be handled separately from valid records in downstream processing, using an Output Filter - for example so that you only attempt to clean records that did not pass your check. Some audit processors, such as List Check, have three output filters - valid (the record passed the check), invalid (the record was positively identified as a failure), and unknown (the record was not recognized as either definitely valid, or definitely invalid).
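The three-way split used by processors such as List Check can be pictured with a short sketch. This is an illustration only, not how OEDQ implements the processor: the names `Outcome`, `list_check` and `route_records`, the exact-match comparison, and the sample lists of titles are all assumptions made for the example.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the three output filters described above."""
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN = "unknown"


def list_check(value, valid_values, invalid_values):
    """Classify one value against a list of valid and a list of invalid values.

    Assumption: values are compared by exact match.
    """
    if value in valid_values:
        return Outcome.VALID
    if value in invalid_values:
        return Outcome.INVALID
    # Neither list recognizes the value, so it is not definitely valid or invalid.
    return Outcome.UNKNOWN


def route_records(records, attribute, valid_values, invalid_values):
    """Split records into one bucket per outcome, like a set of output filters."""
    buckets = {outcome: [] for outcome in Outcome}
    for record in records:
        outcome = list_check(record.get(attribute), valid_values, invalid_values)
        buckets[outcome].append(record)
    return buckets


records = [{"TITLE": "Mr"}, {"TITLE": "Xx"}, {"TITLE": "Capt"}]
buckets = route_records(records, "TITLE", {"Mr", "Mrs", "Dr"}, {"Xx"})
```

Downstream steps can then read only the bucket they need, for example attempting to clean just the records in `buckets[Outcome.INVALID]`.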
Audit processors implicitly use the business rules that you apply to a given data attribute when profiling. For each type of business rule that you can apply, there is an audit processor - see the table below:
Type of Rule | Example business rule | Audit processor
Whether or not the attribute is allowed to contain null values | The CU_NO attribute must not be null |
The allowed or expected length of the data in the attribute | The CU_ACCOUNT attribute must be between 10-11 characters in length, and must not contain spaces |
The data type consistency in an attribute | There must be no numeric values in the NAME attribute |
The validity of values in an attribute | Values in the TITLE attribute must match a list of valid titles |
The conformity to a standard character pattern | Values in the TEL_NO attribute must conform to a standard pattern |
The conformity to a standard pattern, by regular expression | UK National Insurance Numbers must match a standard regular expression |
The validity of specific characters in an attribute | The values in a NAME attribute must not contain characters such as #~@;:/?.>,<%$£!^* |
Duplication of values in an attribute | There must be no duplicate CU_NO values |
Whether or not the attribute contains any common user entry workarounds for mandatory fields | There must be no values such as 'aaa' in the FORENAME attribute |
Check one attribute's value against another | The DATE_OF_BIRTH attribute must be before the DATE_OF_DEATH attribute |
Check for related data in a reference table | There must be at least one active Contact record for a Customer |
Check for data which passes a Logic expression | There is a valid DATE_OF_BIRTH attribute and a valid Postcode and a valid email address |
Check that data has a specific value, or value range | All male Customers must have a Gender value of 'M' |
Check that data conforms to a set of business rules, defined independently of OEDQ. | If the customer is based in England, the post code must be present and must be in a valid format. |
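To make the rule types more concrete, a few of the example business rules can be written as simple predicates. This is a minimal sketch that assumes records are plain dictionaries keyed by attribute name. The function names are hypothetical, and the handling of blank and missing values is an assumption rather than a description of OEDQ behavior.

```python
from collections import Counter
from datetime import date


def is_not_null(value):
    """The CU_NO rule. Assumption: whitespace-only text counts as null."""
    return value is not None and str(value).strip() != ""


def has_valid_length(value, min_len=10, max_len=11):
    """The CU_ACCOUNT rule: 10-11 characters and no spaces."""
    return value is not None and min_len <= len(value) <= max_len and " " not in value


def dates_in_order(record):
    """The DATE_OF_BIRTH rule: birth must be before death.

    Assumption: if either date is missing there is nothing to compare,
    so the record is not flagged by this rule.
    """
    born = record.get("DATE_OF_BIRTH")
    died = record.get("DATE_OF_DEATH")
    if born is None or died is None:
        return True
    return born < died


def duplicated_values(records, attribute):
    """The duplicate CU_NO rule: return the values that occur more than once."""
    counts = Counter(record.get(attribute) for record in records)
    return {value for value, n in counts.items() if n > 1 and value is not None}


customers = [{"CU_NO": "1"}, {"CU_NO": "2"}, {"CU_NO": "1"}]
duplicates = duplicated_values(customers, "CU_NO")
```

Each predicate covers one type of rule, which mirrors the table above: one kind of check for each kind of business rule.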
Often, a number of such checks need to be applied to a given attribute. OEDQ allows you to group sequences of processors and reuse them - for example, to check data for the same attribute, or type of attribute, wherever it appears, whether in the same project or in other projects. To do this, group together a number of audit processors on the canvas, right-click on the group, and select the option to add the group to the palette. Provided the Reference Data used by the processors is available, you can now add the grouped processor to another process and map the input attribute or attributes that you want to check, so that you can reapply your business rules again and again. See Grouping processors.
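The idea of grouping checks and reusing the group can be sketched outside the canvas as function composition. This is only an analogy under stated assumptions: `make_check_group` and `account_checks` are hypothetical names, and the rules shown are taken from the CU_ACCOUNT example rather than from any shipped processor.

```python
def make_check_group(*named_checks):
    """Bundle (name, predicate) pairs into one reusable check.

    The returned function lists the names of the rules that a value fails,
    so an empty list means that the value passed every check in the group.
    """
    def run_group(value):
        return [name for name, check in named_checks if not check(value)]
    return run_group


# A hypothetical group for account numbers, defined once and reused anywhere.
account_checks = make_check_group(
    ("not null", lambda v: v is not None and v.strip() != ""),
    ("length 10-11", lambda v: v is not None and 10 <= len(v) <= 11),
    ("no spaces", lambda v: v is not None and " " not in v),
)

passed = account_checks("1234567890")
failed = account_checks("123 456")
```

The same `account_checks` function can be applied to any attribute that holds an account number, much as a grouped processor is mapped to a new input attribute in another process.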
In addition to the general-purpose audit processors above, OEDQ comes with a number of checks for specific attributes - for example, a GBR Postcode Format check, and an Email Address Check.
Note: If you cannot create your own check using the general-purpose processors provided, you may either write your own check using JavaScript or extend OEDQ to add a new processor.
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.