Group and Merge
The Group and Merge processor provides a simple way to deduplicate records, by grouping records using an attribute or attributes, and merging these records together, outputting records that are distinct across the selected grouping attributes. Unlike other matching processors, it does not offer the ability to configure complex matching. Records are simply grouped by an exact match on the selected grouping attributes.
Use Group and Merge as a simple and efficient way to output the distinct values for an attribute or attributes.
For example, when using OEDQ on a data extract, the extract may in fact have been generated as a join across a number of database tables. This is evident when a key column has many duplicate values. In this case, it may be useful to 'unjoin' the data by creating a set of data with a distinct key value.
Group and Merge is also very useful when generating Reference Data in an OEDQ process. For example, it might be useful to create a set of data with all the distinct Forename values that have passed a number of checks. The records that pass the checks can be fed into Group and Merge, with the Forename attribute used to group records. The output distinct Forename values can then be written to staged data and converted to Reference Data, or used directly in lookups. Note that the output MatchGroupSize attribute will act as a count of how many times each value occurred.
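The effect of grouping on a single attribute can be sketched outside OEDQ as follows. This is only an illustration of the idea in Python, not the processor's implementation; the function name `distinct_with_counts` and the sample records are hypothetical, and the `MatchGroupSize` key simply mirrors the output attribute named above.

```python
from collections import Counter

def distinct_with_counts(records, attribute):
    """Group records by exact match on one attribute and count each group."""
    counts = Counter(record[attribute] for record in records)
    # One output row per distinct value, with a MatchGroupSize-style count.
    return [{attribute: value, "MatchGroupSize": size}
            for value, size in counts.items()]

# Hypothetical records that have already passed earlier checks.
records = [{"Forename": "John"}, {"Forename": "Mary"}, {"Forename": "John"}]
merged = distinct_with_counts(records, "Forename")
```

Each distinct Forename appears once in `merged`, together with the number of input records that carried it.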
There are sometimes other reasons to group records, for example to sum all records with the same attribute value. Group and Merge can be used to do this, in conjunction with the ability to create custom output selectors.
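As a rough illustration of what such a summing merge achieves, the Python sketch below totals a numeric attribute per group. It does not show how a custom output selector is written in OEDQ; the function `sum_by_group` and the attribute names `CustomerId` and `Amount` are assumptions made for the example.

```python
from collections import defaultdict

def sum_by_group(records, group_attribute, value_attribute):
    """Total value_attribute across records sharing the same group_attribute."""
    totals = defaultdict(int)
    for record in records:
        totals[record[group_attribute]] += record[value_attribute]
    return dict(totals)

# Hypothetical input: several order records per customer.
orders = [
    {"CustomerId": "C1", "Amount": 10},
    {"CustomerId": "C2", "Amount": 5},
    {"CustomerId": "C1", "Amount": 7},
]
totals = sum_by_group(orders, "CustomerId", "Amount")
```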
Group and Merge is a type of matching processor. Matching processors consist of several sub-processors, where each sub-processor requires separate configuration. The following sub-processors make up the Group and Merge processor.
Click on the Sub-processor for detailed information about each step, and for configuration instructions:
Sub-processor | Description |
Input | Select the attributes from the data stream to be grouped. |
Group | Select the attributes to group records by. |
Merge | Use rules to merge grouped records. |
The Group and Merge processor accepts input attributes of any type, except Arrays. As with other matching processors, only attributes that are input will be output.
The inputs are configurable in the Input sub-processor.
All options are configured within the sub-processors above.
Note that Group and Merge groups records using a simple concatenation of the selected grouping attributes, separating each attribute with a single space character. This means that records such as the two examples below, which hold the same data across the grouping attributes but in a different structure, will be grouped together:
Grouping Attribute 1 | Grouping Attribute 2 |
John Smith | |
John | Smith |
If you want Group and Merge only to group records with exactly the same data values in all the attributes you are using to group by, it is best to use the Concatenate processor to create a grouping key attribute, separating data attributes with a delimiter character such as pipe which does not occur in the data values. You can then use this key attribute to group records in Group and Merge.
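The difference between the two keys can be sketched in Python. This is a model of the behavior described above, not OEDQ code: both function names are hypothetical, and the space-joined version assumes that an empty attribute contributes nothing to the key, which is what the John Smith example implies.

```python
def space_joined_key(values):
    """Join non-empty values with a single space (assumed default behavior)."""
    return " ".join(v for v in values if v)

def delimited_key(values, delimiter="|"):
    """Join every value, empty ones included, with a delimiter absent from the data."""
    return delimiter.join(v or "" for v in values)

row_a = ("John Smith", "")   # whole name held in the first attribute
row_b = ("John", "Smith")    # name split across both attributes

same_with_space = space_joined_key(row_a) == space_joined_key(row_b)
same_with_pipe = delimited_key(row_a) == delimited_key(row_b)
```

With the space-joined key both rows produce `John Smith` and would fall into one group. With the pipe-delimited key they produce `John Smith|` and `John|Smith`, so they stay apart.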
The merged output data stream is configured in the Merge sub-processor above.
It is possible to use a Group and Merge processor in a Real time Response process, provided the process contains only one match processor. However, it will only group and merge records within the same input message.
Execution Mode | Supported |
Batch | Yes |
Real time Monitoring | Yes |
Real time Response | Yes |
The Group and Merge processor produces a number of views of results as follows.
Groups View
The Groups view summarises the groups by size.
Statistic | Meaning |
Group size | Group size (number of records) |
Count | The number of groups of the listed size. Drill down on the Count to see the merged records for each group. |
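The two statistics can be reproduced with a small Python sketch, shown here only to make the summary concrete. The function name `groups_view` and the sample data are hypothetical, and grouping is done on a tuple of attribute values rather than on OEDQ's own key.

```python
from collections import Counter

def groups_view(records, key_attributes):
    """Return (group size, count of groups of that size) pairs, smallest size first."""
    # Size of each group, keyed by the exact values of the grouping attributes.
    group_sizes = Counter(tuple(r[a] for a in key_attributes) for r in records)
    # How many groups exist at each size.
    size_counts = Counter(group_sizes.values())
    return sorted(size_counts.items())

# Hypothetical data: two names occur twice, one occurs once.
people = [{"Name": n} for n in ["Ann", "Ann", "Bob", "Bob", "Cy"]]
summary = groups_view(people, ["Name"])
```

Here `summary` lists one group of size 1 and two groups of size 2.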
Merged Output View
The Merged Output is a Data View of the merged output from the Group and Merge processor; that is, the record set after grouped records have been merged together. The records that are output, and their attributes, will vary depending on the options set in the Merge sub-processor.
The Group and Merge processor has a single output filter, Merged, which corresponds to the Merged Output described above.
In this example, Group and Merge is used to group and merge all records with the same Name, Date of Birth and Email address.
Groups View
In this case, 3 Groups of 2 records are created and merged:
Drilling down on the 3 groups of 2 records shows the merged records for each group:
Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.