About Transformation Processors
Transformation processors take one or more input attributes, transform them, and output the transformed values in new attributes.
It is important to understand that transformers in OEDQ never change the input data directly. OEDQ allows you to see the effects of any transformations you apply before deciding how to use the transformed data. For example, you may choose to use the transformed data in preference to the original data when writing out the results of a data cleansing process.
The most common use of transformation processors is to transform data before it is migrated to a new system, or before further data quality analysis such as auditing or matching. Transformation processors may therefore be used at any point in the process flow. For example, you may decide to convert all text data to upper or lower case before performing any analysis, so that the analysis is always case-insensitive.
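The effect of upper-casing before analysis can be sketched in Python. OEDQ processors are configured in its user interface, not called from Python, so this is only a conceptual model: the `upper_case` function is a hypothetical stand-in for the Upper Case processor, and a frequency count stands in for profiling.

```python
from collections import Counter


def upper_case(values):
    """Hypothetical stand-in for an Upper Case transformation.

    Returns a new list; the input list is left untouched, mirroring the
    way OEDQ transformers never change the input data directly.
    """
    return [v.upper() for v in values]


names = ["Smith", "SMITH", "smith", "Jones"]

# Profiling the raw data counts each casing of "Smith" as a separate value.
raw_counts = Counter(names)

# Profiling the upper-cased copy makes the analysis case-insensitive.
upper_counts = Counter(upper_case(names))

print(len(raw_counts))  # 4 distinct values
print(upper_counts)     # Counter({'SMITH': 3, 'JONES': 1})
```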
Often, the transformations that you need to apply to the data are discovered during profiling and auditing. OEDQ therefore allows you to build transformation rules directly from the data itself. For example, you might find a set of records with an invalid value for an attribute. You can create a Reference Data map directly from the data, mapping each bad value to its corrected version, and then configure a Replace processor to use the new Reference Data map and create a new attribute with the bad values replaced.
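The Reference Data map and Replace workflow can be modelled as a dictionary lookup. This is a minimal sketch, not OEDQ's implementation: `build_reference_map` and `replace` are hypothetical names, and the `Country` attribute and its values are invented for illustration. The output attribute follows the `[Attribute Name].Replaced` default naming described in this section.

```python
def build_reference_map(pairs):
    """Hypothetical helper: build a map from (bad value, corrected value) pairs."""
    return {bad: good for bad, good in pairs}


def replace(records, attribute, ref_map):
    """Conceptual model of a Replace processor.

    Adds '<attribute>.Replaced' to a copy of each record. Values found in
    the map are replaced; all other values pass through unchanged. The
    original records are not modified.
    """
    result = []
    for record in records:
        new_record = dict(record)  # copy, so the input record is untouched
        value = record[attribute]
        new_record[f"{attribute}.Replaced"] = ref_map.get(value, value)
        result.append(new_record)
    return result


records = [
    {"Country": "UK"},
    {"Country": "Untied Kingdom"},
    {"Country": "France"},
]

# Bad values found during profiling, mapped to their corrected versions.
ref_map = build_reference_map([
    ("Untied Kingdom", "United Kingdom"),
    ("UK", "United Kingdom"),
])

for row in replace(records, "Country", ref_map):
    print(row)
```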
The attributes that a transformation processor creates may be Derived or Added, depending on the processor. It is important to understand the difference, as it affects the way your data flows work.
Derived Attributes are created by transformers that process each input attribute separately and produce a new, transformed version of each one. Each Derived Attribute contains a transformed version of the data in its input attribute. By default, Derived Attributes are named in the format [Input Attribute Name].Transformation, for example, Forename.Upper. Examples of processors that add Derived Attributes include:
Processor | Creates Derived Attribute with default name
Upper Case | [Attribute Name].Upper
Trim Whitespace | [Attribute Name].Trimmed
Denoise | [Attribute Name].Denoise
Substring | [Attribute Name].Substring
Replace | [Attribute Name].Replaced
Proper Case | [Attribute Name].Proper
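The naming rule for Derived Attributes can be sketched as follows. This is a conceptual model only, assuming a record is a Python dictionary of attribute names to values; `derive` is a hypothetical helper, not part of OEDQ. It shows each input attribute being processed separately, each output named `[Input Attribute Name].Transformation`, and the input values left unchanged.

```python
def derive(record, inputs, transformation, fn):
    """Conceptual model of a processor that adds Derived Attributes.

    Each input attribute is processed separately and gets its own output,
    named '<input>.<transformation>' by default. The returned record is a
    copy; the input record and its values are not changed.
    """
    out = dict(record)
    for name in inputs:
        out[f"{name}.{transformation}"] = fn(record[name])
    return out


record = {"Forename": "John", "Surname": "Smith"}

# One Upper Case configuration with two inputs yields two Derived Attributes.
upper = derive(record, ["Forename", "Surname"], "Upper", str.upper)
print(upper)
```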
When an attribute is transformed by a processor that adds a Derived attribute, the output attribute is named to reflect the transformation.
By default, a downstream processor uses the latest version of an attribute as its input. For instance, if you insert a Denoise processor between the Reader and the Upper Case processor, the NAME attribute used as the input for the Upper Case processor will be the NAME.Denoise version of the attribute, rather than the original NAME attribute.
The blue arrow icon indicates that the latest version of the attribute will be used, including all the transformations that the attribute has undergone.
This means that you do not have to get the order of processing right the first time. You can often insert an interim transformation before another one without affecting any other processors.
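The "latest version" behaviour can be modelled as a chain of steps that each read the newest version of an attribute. This is a minimal sketch under stated assumptions: `run_flow` is a hypothetical function, naming every version `NAME.<suffix>` is a simplification, and the `denoise` function (keep only letters and spaces) is an invented stand-in, not OEDQ's Denoise processor, which is driven by its own configuration. It shows that inserting a Denoise step before Upper changes Upper's input without reconfiguring the Upper step.

```python
def run_flow(value, steps):
    """Conceptual model of a process flow acting on one attribute, NAME.

    `steps` is an ordered list of (suffix, function) pairs. Each step reads
    the latest version of NAME (the default input) and records a new
    version. Earlier versions are kept, because transformers never
    overwrite their input. Returns all versions and the latest name.
    """
    versions = {"NAME": value}
    latest = "NAME"
    for suffix, fn in steps:
        new_name = f"NAME.{suffix}"  # simplified naming for this sketch
        versions[new_name] = fn(versions[latest])
        latest = new_name
    return versions, latest


def denoise(text):
    """Hypothetical stand-in for Denoise: keep only letters and spaces."""
    return "".join(ch for ch in text if ch.isalpha() or ch == " ")


upper_only = [("Upper", str.upper)]
# Insert Denoise before Upper; the Upper step itself is not changed.
with_denoise = [("Denoise", denoise)] + upper_only

print(run_flow("o'brien-smith", upper_only))
print(run_flow("o'brien-smith", with_denoise))
```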
Derived Attributes are displayed in the Results Browser next to the attributes they were derived from, even if they have been renamed from the default name format (for example, if NAME.Upper is renamed to New_name).
Defined attributes are indicated by a filled green circle. These refer to specific versions of an attribute, such as NAME.Denoise, rather than the latest version.
Note: You can select a defined attribute, rather than the latest version, as the input for a downstream processor. In the processor configuration, expand any attribute under the blue arrow icon to view the defined attributes that are available. In the example above, NAME (the original source attribute) and NAME.Denoise are available. Any of the listed attributes may be selected as an input for the processor.
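The choice between the latest version and a specific defined attribute can be sketched as an input resolver. This is a conceptual model only; `resolve_input` and the `LATEST` sentinel are hypothetical names, and the list of available versions is taken from the NAME / NAME.Denoise example in this section.

```python
LATEST = None  # sentinel: follow the latest version (the blue arrow default)


def resolve_input(versions, selection=LATEST):
    """Conceptual model of choosing a processor's input attribute.

    `versions` lists the defined attributes for one source attribute,
    oldest first, e.g. ["NAME", "NAME.Denoise"]. With no explicit
    selection the latest version is used; otherwise the named defined
    attribute is used, even if newer versions are added later.
    """
    if selection is LATEST:
        return versions[-1]
    if selection not in versions:
        raise ValueError(f"{selection!r} is not an available defined attribute")
    return selection


available = ["NAME", "NAME.Denoise"]
print(resolve_input(available))          # NAME.Denoise
print(resolve_input(available, "NAME"))  # NAME
```

In this model, a processor that follows the latest version picks up any transformation inserted upstream, while one pinned to a defined attribute keeps reading that exact version.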
Added Attributes are created by transformers in the following cases: where the new attribute is not directly related to a single input attribute, or where there is a change of data type.
Added Attributes are assigned a default name according to the transformation operation. For example, Concat is used for a concatenation. Examples of processors that add Added Attributes include:
Processor | Creates Added Attribute with default name
Concatenate | Concat
| Array
Multiply | MultipliedValue
Add | AddedValue
Make Array from String | ArrayFromString
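Why these outputs are Added rather than Derived can be seen in a small model: each output combines several inputs, so it is not a version of any single input attribute, and it takes its default name from the operation. This is a sketch only; `concatenate` and `add` are hypothetical Python functions, not OEDQ APIs, and the `separator` parameter is an assumption of this sketch.

```python
def concatenate(record, inputs, separator=" ", output="Concat"):
    """Conceptual model of a processor that adds an Added Attribute.

    The output combines several inputs, so it is not a version of any
    single input attribute. It gets a default name from the operation.
    The separator parameter is an assumption of this sketch.
    """
    out = dict(record)
    out[output] = separator.join(str(record[name]) for name in inputs)
    return out


def add(record, inputs, output="AddedValue"):
    """Conceptual model of an Add processor: sums numeric inputs."""
    out = dict(record)
    out[output] = sum(record[name] for name in inputs)
    return out


record = {"Forename": "John", "Surname": "Smith", "Q1": 10, "Q2": 15}
print(concatenate(record, ["Forename", "Surname"])["Concat"])  # John Smith
print(add(record, ["Q1", "Q2"])["AddedValue"])                 # 25
```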
If you configure a processor that adds an attribute (either Derived or Added) named in the format [Input Attribute].[Output], the output attribute(s) created by the processor will be renamed if the input attributes are changed. This applies to all processors that add Derived Attributes, and also to some processors that add Added Attributes, where the outputs are related to the input attribute(s) but there is a reason not to add a Derived Attribute. The reason is normally a change of data type: if the output were a Derived Attribute, downstream processors that use the latest version of the input could receive data of a different type, which could invalidate their inputs. An Added Attribute is therefore created instead.
This applies to the following processors:
Processor | Creates Added Attribute with default name
Convert Number to String | [Input Attribute].NumberToString
Convert Date to String | [Input Attribute].DateToString
Convert String to Date | [Input Attribute].StringToDate
Convert String to Number | [Input Attribute].StringToNumber
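The renaming behaviour and the change of data type can be sketched together. This is a conceptual model, not OEDQ's API: `string_to_date` is a hypothetical function, and the `%d/%m/%Y` date format and the `DOB` and `StartDate` attributes are assumptions made for illustration. The output name is built from the input name, so choosing a different input renames the output, and the output is a date rather than a string.

```python
from datetime import datetime


def string_to_date(input_name, fmt="%d/%m/%Y"):
    """Conceptual model of configuring Convert String to Date.

    Returns the output attribute name and a converter function. The name
    is built from the input as '<input>.StringToDate', so a different
    input renames the output. The converted value is a date, not a
    string, which is why it is modelled as an Added Attribute.
    The date format is an assumption of this sketch.
    """
    output_name = f"{input_name}.StringToDate"

    def convert(record):
        return datetime.strptime(record[input_name], fmt).date()

    return output_name, convert


record = {"DOB": "25/12/1980", "StartDate": "01/03/2010"}

name, convert = string_to_date("DOB")
print(name, convert(record))  # DOB.StringToDate 1980-12-25

# Reconfiguring the processor with a different input renames the output.
name, convert = string_to_date("StartDate")
print(name, convert(record))  # StartDate.StringToDate 2010-03-01
```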
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011, Oracle and/or its affiliates. All rights reserved.