Custom comparisons can be added to the match library. They are added to widgets.xml in the same way as processors (widgets). The only limitation is that a comparison must have exactly two inputs and one output. The output must be either a string (for Boolean comparisons) or a number (for comparisons that use Result Bands). Boolean comparisons return “T” for True or “F” for False.
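A minimal sketch of this two-inputs, one-output contract is shown below: a hypothetical case-sensitive exact-match Boolean comparison written in the same style as the Character Transposition Match example on this page. As in that example, it assumes the JavaScriptGadget supplies the two values as input1 and input2 and reads the result from output1. The variables are declared here only so the sketch runs on its own, and the name exactMatch is illustrative (in a real widget, the function to call is named by the widget's "function" parameter).

```javascript
// Hypothetical Boolean comparison: case-sensitive exact match.
// input1/input2/output1 are assumed to be provided by the gadget at runtime;
// they are declared here only so the sketch can run standalone.
var input1, input2, output1;

function exactMatch()
{
  // In this sketch, a missing value never matches
  if (input1 == null || input2 == null)
  {
    output1 = "F";
    return;
  }
  // Boolean comparisons return the strings "T"/"F", not true/false
  output1 = (String(input1) == String(input2)) ? "T" : "F";
}
```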
Each custom comparison must be associated with an identifier type: either an existing type (String, Number or Date) or a custom type (see Example custom identifier type).
To associate a new comparison with one of the existing system identifiers, use the following identifier names:
dnm:string for Strings
dnm:number for Numbers
dnm:date for Dates
The following example XML shows a comparison association added to matchlibrary.xml:
<identifierComparison>
<ident>dnm:string</ident>
<gadget>dnm:exactstringmatch</gadget>
</identifierComparison>
This associates the identifier “dnm:string” with the comparison “dnm:exactstringmatch”.
The following XML defines the default result bands for the ‘String Edit Distance’ comparison in matchlibrary.xml. Each resultBand element maps a numeric comparison result to a named, labeled band:
<comparisonReturn>
<widgetId>dnm:stringeditdistance</widgetId>
<resultBand name="exact" label="Exact Match">0</resultBand>
<resultBand name="onetypo" label="One Typo">1</resultBand>
<resultBand name="twotypos" label="Two Typos">2</resultBand>
<resultBand name="threetypos" label="Three Typos">3</resultBand>
</comparisonReturn>
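The sketch below illustrates how a numeric comparison result lands in one of these default bands. It assumes that ‘String Edit Distance’ behaves like the classic Levenshtein distance; EDQ's own implementation is not described here and may differ. The functions editDistance and resultBand are hypothetical, and the band table simply restates the comparisonReturn element above.

```javascript
// Sketch only: assumes 'String Edit Distance' is the classic Levenshtein distance.
function editDistance(a, b)
{
  var i, j;
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  var prev = [];
  for (j = 0; j <= b.length; j++)
    prev.push(j);
  for (i = 1; i <= a.length; i++)
  {
    var curr = [i];
    for (j = 1; j <= b.length; j++)
    {
      var cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1,          // delete from a
                         curr[j - 1] + 1,      // insert into a
                         prev[j - 1] + cost)); // substitute (or keep)
    }
    prev = curr;
  }
  return prev[b.length];
}

// The default bands from the comparisonReturn element above
var bands = [
  { name: "exact",      label: "Exact Match", value: 0 },
  { name: "onetypo",    label: "One Typo",    value: 1 },
  { name: "twotypos",   label: "Two Typos",   value: 2 },
  { name: "threetypos", label: "Three Typos", value: 3 }
];

// Return the band name for a numeric result, or null if no default band matches
function resultBand(result)
{
  for (var k = 0; k < bands.length; k++)
    if (bands[k].value == result)
      return bands[k].name;
  return null;
}
```

For example, ‘Smith’ and ‘Smyth’ are one substitution apart and fall in the onetypo band. ‘Michael’ and ‘Micheal’ score 2 under plain Levenshtein distance, because swapping two adjacent characters counts as two edits; this is one reason a dedicated transposition comparison, like the example that follows, can be useful.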
The following example files may be packaged in a JAR file and used to add a custom 'Character Transposition Match' comparison to the match library. The Character Transposition Match comparison matches strings in which pairs of adjacent characters have been transposed (swapped). For example, when comparing the values 'Michael' and 'Micheal', a single transposition is counted, so the two values match if the Maximum allowed transpositions option is set to 1 or higher.
matchlibrary.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
Custom Match Library Extension
Copyright 2008 Oracle Ltd. All rights reserved.
-->
<matchLibrary>
<identifierComparison>
<ident>dnm:string</ident>
<gadget>dn:characterTranspositionMatch</gadget>
</identifierComparison>
</matchLibrary>
widgets.xml
<?xml version="1.0" encoding="UTF-8"?>
<widgets>
<comment>Oracle Match example script widgets</comment>
<copyright>Copyright 2008 Oracle Ltd. All rights reserved.</copyright>
<widget id="dn:characterTranspositionMatch" class="com.datanomic.director.match.library.util.JavaScriptGadget">
<guidata>
<label>%characterTranspositionMatch.gadget</label>
<group>compare</group>
<icon>script</icon>
</guidata>
<!-- inputs -->
<inputs>
<input id="1" type="string" maxattributes="1">
<guidata><label>label1</label></guidata>
</input>
<input id="2" type="string" maxattributes="1">
<guidata><label>label2</label></guidata>
</input>
</inputs>
<!-- outputs -->
<outputs cardinality="1:1">
<output id="1" type="string" name="result">
<guidata><label>resultlabel</label></guidata>
</output>
</outputs>
<properties>
<property name="matchNoDataPairs" type="boolean" required="true">
<guidata>
<label>%characterTranspositionMatch.property.matchNoDataPairs.label</label>
</guidata>
<default>false</default>
</property>
<property name="ignoreCase" type="boolean" required="true">
<guidata>
<label>%characterTranspositionMatch.property.ignoreCase.label</label>
</guidata>
<default>true</default>
</property>
<property name="startsWith" type="boolean" required="true">
<guidata>
<label>%characterTranspositionMatch.property.startsWith.label</label>
</guidata>
<default>false</default>
</property>
<property name="maxAllowedTranspositions" type="number" required="true">
<guidata>
<label>%characterTranspositionMatch.property.maxAllowedTranspositions.label</label>
</guidata>
<default>1</default>
</property>
</properties>
<parameters>
<parameter name="script">
<![CDATA[
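// Treat a null input as an empty string
function S(s)
{
return (s == null) ? "" : s;
}
// Entry point, named by the "function" parameter below. The values being compared
// arrive as input1 and input2, each widget property is available as a variable of
// the same name, and the result ("T" or "F") is written to output1.
function doit()
{
// no data pairs
if (S(input1) == "" || S(input2) == "")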
{
if (matchNoDataPairs)
output1 = "T";
else
output1 = "F";
return;
}
if (!startsWith)
{
if (input1.length != input2.length)
{
output1 = "F";
return;
}
}
var transpositions = 0;
var longword = input1.length > input2.length ? input1 : input2;
var shortword = input1.length > input2.length ? input2 : input1;
if (ignoreCase)
{
// convert to uppercase
longword = longword.toUpperCase();
shortword = shortword.toUpperCase();
}
for (var i = 0; i < shortword.length; i++)
{
if (shortword[i] != longword[i])
{
// are we at the end of the string?
if (i == shortword.length - 1)
{
output1 = "F";
return;
}
// not a transposition match?
if (shortword[i] != longword[i + 1])
{
output1 = "F";
return;
}
// compare the next character
if (shortword[i + 1] != longword[i])
{
output1 = "F";
return;
}
transpositions++;
// too many transpositions?
if (transpositions > maxAllowedTranspositions)
{
output1 = "F";
return;
}
// skip over the characters
i++;
}
}
output1 = "T";
}
]]>
</parameter>
<parameter name="function">doit</parameter>
</parameters>
</widget>
</widgets>
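To try the script's logic outside EDQ, the body of doit() can be restated as a plain function. In the hypothetical sketch below, transpositionMatch takes the two values plus an options object standing in for the four widget properties (matchNoDataPairs, ignoreCase, startsWith, maxAllowedTranspositions) and returns the same “T”/“F” strings. It uses charAt rather than bracket indexing but otherwise follows the same steps as doit().

```javascript
// Hypothetical harness: the doit() logic as a pure function for testing outside EDQ.
// opts stands in for the widget properties that the gadget exposes as variables.
function transpositionMatch(input1, input2, opts)
{
  var s1 = (input1 == null) ? "" : input1;
  var s2 = (input2 == null) ? "" : input2;
  // no data pairs
  if (s1 == "" || s2 == "")
    return opts.matchNoDataPairs ? "T" : "F";
  // unless "Starts with?" is set, the lengths must be equal
  if (!opts.startsWith && s1.length != s2.length)
    return "F";
  var longword = s1.length > s2.length ? s1 : s2;
  var shortword = s1.length > s2.length ? s2 : s1;
  if (opts.ignoreCase)
  {
    longword = longword.toUpperCase();
    shortword = shortword.toUpperCase();
  }
  var transpositions = 0;
  for (var i = 0; i < shortword.length; i++)
  {
    if (shortword.charAt(i) != longword.charAt(i))
    {
      // a mismatch on the last character cannot be a transposition
      if (i == shortword.length - 1)
        return "F";
      // the two characters must appear swapped in the other string
      if (shortword.charAt(i) != longword.charAt(i + 1) ||
          shortword.charAt(i + 1) != longword.charAt(i))
        return "F";
      transpositions++;
      if (transpositions > opts.maxAllowedTranspositions)
        return "F";
      i++; // skip the second character of the swapped pair
    }
  }
  return "T";
}
```

For example, comparing ‘Michael’ with ‘Micheal’ returns “T” when maxAllowedTranspositions is 1 and “F” when it is 0.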
[This file was not required in this case as the comparison does not support result bands, and does not require new identifiers.]
characterTranspositionMatch.gadget = Character Transposition Match
characterTranspositionMatch.property.matchNoDataPairs.label = Match No Data pairs?
characterTranspositionMatch.property.ignoreCase.label = Ignore case?
characterTranspositionMatch.property.startsWith.label = Starts with?
characterTranspositionMatch.property.maxAllowedTranspositions.label = Maximum allowed transpositions
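The label elements in widgets.xml that begin with % correspond to keys in the labels file above (for example, %characterTranspositionMatch.gadget and Character Transposition Match), while labels such as label1 carry no prefix. The hypothetical sketch below models that lookup, assuming exactly this convention and only simple key = value lines (no escapes or continuation lines); EDQ's own resolution mechanism may differ.

```javascript
// Hypothetical sketch: parse simple "key = value" lines and resolve %-prefixed labels.
// Full .properties syntax (escapes, continuation lines) is not handled.
function parseProperties(text)
{
  var props = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++)
  {
    var line = lines[i].trim();
    // skip blank lines and comments
    if (line == "" || line.charAt(0) == "#" || line.charAt(0) == "!")
      continue;
    var eq = line.indexOf("=");
    if (eq < 0)
      continue;
    props[line.substring(0, eq).trim()] = line.substring(eq + 1).trim();
  }
  return props;
}

// Labels starting with % are treated as keys; anything else is returned unchanged
function resolveLabel(label, props)
{
  if (label.charAt(0) != "%")
    return label;
  var key = label.substring(1);
  return props.hasOwnProperty(key) ? props[key] : label;
}
```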
name=Character Transposition Match
version=v8.1.3.(175)
title=Character Transposition Match
type=GADGET
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2012, Oracle and/or its affiliates. All rights reserved.