Manage duplicate records using SQL
Semarchy xDM supports duplicate management using SQL to programmatically perform actions that are typically available to users in an application’s duplicate manager.
This page provides the methods to perform various duplicate management operations using SQL with the UM
table.
Each method involves using a load ID (stored in a vLoadID
variable), which is initialized and submitted through a process similar to the one used for data publishing.
Make sure to use a job that processes the entities for which the match and merge will run, along with their related (child) entities. |
Consider leveraging continuous loads for executing duplicate management operations as well. |
Confirm matches
This method automatically and programmatically confirms certain matches. It mirrors the behavior of the user interface.
Below are the columns to load in the UM
table:
-
B_LOADID
: load this column with either the load ID provided by theGET_NEW_LOADID
function call or the continuous load ID. Use theGET_CONTINUOUS_LOADID
function to retrieve a continuous load’s ID from its name. -
B_SOURCEID
andB_PUBID
: load these columns with the ID and publisher of the master records that require confirmation. -
B_CLASSNAME
: load this column with the entity name for the records. -
<GoldenID>
: load this column with the ID of the golden record associated with the master records. -
B_CONFIRMEDSDPK
: load this column with the same value as<GoldenID>
.
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME,
CUSTOMER_ID,
B_CONFIRMEDSDPK
)
SELECT
vLoad_id,
B_SOURCEID,
B_PUBID,
'Customer',
CUSTOMER_ID,
CUSTOMER_ID FROM MD_CUSTOMER
WHERE
B_CONFIRMATIONSTATUS = 'NOT_CONFIRMED';
Force a new record to merge with a golden record
When loading a source record, use this method to simultaneously load a user decision that forces this record to either merge with a specific golden record or create a new golden record with the specified ID.
To execute this operation, combine data loading with the technique previously mentioned.
-- Load source record
INSERT into SD_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME,
-- Data fields are below
CUSTOMER_NAME,
...
) values (
vLoad_id,
123456,
'CRM',
'Customer',
-- Values are below
'GADGETRON',
...
)
-- Force merging this source record with a known golden record ID
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME,
CUSTOMER_ID,
B_CONFIRMEDSDPK
)
SELECT
vLoad_id,
123456,
'CRM',
'Customer',
2, -- Target golden record ID
2 -- Confirm the golden record ID from MD_CUSTOMER
COMMIT;
Transition master and golden records to golden records
This method automates the transition of master records to golden records. It mirrors the behavior of the user interface for transitioning master records to golden records or merging golden records into other golden records.
Below are the columns to load in the UM
table:
-
B_LOADID
: load this column with either the load ID provided by theGET_NEW_LOADID
function call or the continuous load ID. -
B_SOURCEID
andB_PUBID
: load these columns with the ID and publisher of the master records intended for transitioning. -
B_CLASSNAME
: load this column with the entity name for the records. -
<GoldenID>
: load this column with the ID of the target golden record. -
B_CONFIRMEDSDPK
: set this column to the<GoldenID>
value to mark the match as confirmed. -
B_XGRP
: load this column with a UUID value representing the exclusion group. The value should be consistent for all records within the same exclusion group (i.e., records that previously matched and should no longer match) and depends on the exclusion groups of the master and golden records:-
If neither the master record being transitioned nor the target golden record belongs to an exclusion group, this value should be null.
-
If only the target golden record has an exclusion group, this value should be the UUID of this exclusion group.
-
If only the master record being transitioned has an exclusion group, this value should be the UUID of this exclusion group.
-
If the master and golden have different exclusion groups, then both these exclusion groups need to be merged into a new one. The value to load should be a new UUID, potentially generated using the
SYS_GUID()
Oracle function. Additionnally, all the records from the two original exclusion groups should be loaded into theUM
table with this new exclusion group.
-
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME,
CUSTOMER_ID,
B_CONFIRMEDSDPK,
B_XGRP
)
SELECT
vLoad_id,
B_SOURCEID,
B_PUBID,
'Customer',
2, -- Target golden record ID
2, -- Confirm the golden record ID
NULL -- Neither master nor golden record is in an exclusion group
FROM MD_CUSTOMER
WHERE
CUSTOMER_ID = 1;
Similarly, to split a golden record by transferring its master records to new golden records, you do not specify an existing golden record ID. Instead, you seed a new golden record ID using the golden record ID generation method for the entity.
-- Store a new SYS_GUID value in a raw (16) variable that is used to create a new exclusion group
select SYS_GUID() into vSysguidXGrp from dual;
-- Split the master records using this new exclusion group
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME,
CUSTOMER_ID,
B_CONFIRMEDSDPK,
B_XGRP
)
SELECT
vLoad_id,
B_SOURCEID,
B_PUBID,
'Customer',
SEQ_CUSTOMER.NEXTVAL, -- New golden record ID generated by a sequence
SEQ_CUSTOMER.CURRVAL, -- The same golden record ID is used to confirm
vSysguidXGrp -- New exclusion group ID
FROM MD_CUSTOMER WHERE
CUSTOMER_ID = 5;
Reset user decisions
This method nullifies previous user decisions made with a duplicate manager and forces master records to merge according to the matcher’s decisions.
Below are the columns to load into the UM
table:
-
B_LOADID
: load this column with either the load ID provided by theGET_NEW_LOADID
function call or the continuous load ID. -
B_SOURCEID
andB_PUBID
: load these columns with the ID and publisher of the master records to reconsider for matching. -
B_CLASSNAME
: load this column with the entity name for the records.
|
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME
)
SELECT
vLoad_id,
B_SOURCEID,
B_PUBID,
'Customer'
FROM MD_CUSTOMER;
Replay matching
The previous method solely removes user decisions without triggering the match rules to run again.
To force the entire matching process to replay, follow these steps:
-
Reset user decisions, as described in the previous method.
-
Reload the data to re-match into the
SD
table.
-- Remove user decisions
INSERT INTO UM_CUSTOMER (
B_LOADID,
B_SOURCEID,
B_PUBID,
B_CLASSNAME
)
SELECT
vLoad_id,
B_SOURCEID,
B_PUBID,
'Customer'
FROM MI_CUSTOMER
-- Add a clause to filter the records to re-process below
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;
-- Reload data from the latest master record values
INSERT into SD_CUSTOMER (
B_LOADID,
B_PUBID,
B_SOURCEID,
B_CLASSNAME,
-- Data fields are below
CUSTOMER_NAME,
TOTAL_REVENUE
...
)
SELECT
vLoad_id,
B_PUBID,
B_SOURCEID
-- Data fields are below
CUSTOMER_NAME,
TOTAL_REVENUE
...
FROM MI_CUSTOMER
-- Add a clause to filter the records to re-process below
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;
COMMIT;