Manage duplicate records using SQL

Semarchy xDM supports duplicate management using SQL to programmatically perform actions that are typically available to users in an application’s duplicate manager.

This page provides the methods to perform various duplicate management operations using SQL with the UM table.

Each method involves using a load ID (stored in a vLoadID variable), which is initialized and submitted through a process similar to the one used for data publishing.

Make sure to use a job that processes the entities for which the match and merge will run, along with their related (child) entities.
Consider leveraging continuous loads for executing duplicate management operations as well.

Confirm matches

This method automatically and programmatically confirms certain matches. It mirrors the behavior of the user interface.

Below are the columns to load in the UM table:

  • B_LOADID: load this column with either the load ID provided by the GET_NEW_LOADID function call or the continuous load ID. Use the GET_CONTINUOUS_LOADID function to retrieve a continuous load’s ID from its name.

  • B_SOURCEID and B_PUBID: load these columns with the ID and publisher of the master records that require confirmation.

  • B_CLASSNAME: load this column with the entity name for the records.

  • <GoldenID>: load this column with the ID of the golden record associated with the master records.

  • B_CONFIRMEDSDPK: load this column with the same value as <GoldenID>.

Example 1. Confirm all unconfirmed master records
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	CUSTOMER_ID,
	CUSTOMER_ID FROM MD_CUSTOMER
WHERE
  B_CONFIRMATIONSTATUS = 'NOT_CONFIRMED';

Force a new record to merge with a golden record

When loading a source record, use this method to simultaneously load a user decision that forces this record to either merge with a specific golden record or create a new golden record with the specified ID.

To execute this operation, combine data loading with the technique previously mentioned.

Example 2. Load a new customer record and force it to merge with existing golden record with CUSTOMER_ID 2
-- Load source record
INSERT into SD_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  -- Data fields are below
  CUSTOMER_NAME,
  ...

) values (
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  -- Values are below
  'GADGETRON',
  ...

)

-- Force merging this source record with a known golden record ID
INSERT INTO UM_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  CUSTOMER_ID,
  B_CONFIRMEDSDPK
)
SELECT
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  2,   -- Target golden record ID
  2    -- Confirm the golden record ID from MD_CUSTOMER

COMMIT;

Transition master and golden records to golden records

This method automates the transition of master records to golden records. It mirrors the behavior of the user interface for transitioning master records to golden records or merging golden records into other golden records.

Below are the columns to load in the UM table:

  • B_LOADID: load this column with either the load ID provided by the GET_NEW_LOADID function call or the continuous load ID.

  • B_SOURCEID and B_PUBID: load these columns with the ID and publisher of the master records intended for transitioning.

  • B_CLASSNAME: load this column with the entity name for the records.

  • <GoldenID>: load this column with the ID of the target golden record.

  • B_CONFIRMEDSDPK: set this column to the <GoldenID> value to mark the match as confirmed.

  • B_XGRP: load this column with a UUID value representing the exclusion group. The value should be consistent for all records within the same exclusion group (i.e., records that previously matched and should no longer match) and depends on the exclusion groups of the master and golden records:

    • If neither the master record being transitioned nor the target golden record belongs to an exclusion group, this value should be null.

    • If only the target golden record has an exclusion group, this value should be the UUID of this exclusion group.

    • If only the master record being transitioned has an exclusion group, this value should be the UUID of this exclusion group.

    • If the master and golden have different exclusion groups, then both these exclusion groups need to be merged into a new one. The value to load should be a new UUID, potentially generated using the SYS_GUID() Oracle function. Additionnally, all the records from the two original exclusion groups should be loaded into the UM table with this new exclusion group.

Example 3. Combining golden record with CUSTOMER_ID 1 and golden record with CUSTOMER_ID 2 by transferring all its master records
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	2,          -- Target golden record ID
	2,          -- Confirm the golden record ID
	NULL        -- Neither master nor golden record is in an exclusion group
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 1;

Similarly, to split a golden record by transferring its master records to new golden records, you do not specify an existing golden record ID. Instead, you seed a new golden record ID using the golden record ID generation method for the entity.

Example 4. Splitting all master records from golden record with CUSTOMER_ID 5 into different golden records
-- Store a new SYS_GUID value in a raw (16) variable that is used to create a new exclusion group
select SYS_GUID() into vSysguidXGrp from dual;

-- Split the master records using this new exclusion group
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	SEQ_CUSTOMER.NEXTVAL, -- New golden record ID generated by a sequence
	SEQ_CUSTOMER.CURRVAL, -- The same golden record ID is used to confirm
	vSysguidXGrp          -- New exclusion group ID
FROM MD_CUSTOMER WHERE
  CUSTOMER_ID = 5;

Reset user decisions

This method nullifies previous user decisions made with a duplicate manager and forces master records to merge according to the matcher’s decisions.

Below are the columns to load into the UM table:

  • B_LOADID: load this column with either the load ID provided by the GET_NEW_LOADID function call or the continuous load ID.

  • B_SOURCEID and B_PUBID: load these columns with the ID and publisher of the master records to reconsider for matching.

  • B_CLASSNAME: load this column with the entity name for the records.

  • This operation cannot be performed in the data hub application user interface.

  • All prior user decisions are lost. Exclusion groups and match groups are recomputed for these records.

  • All golden record IDs are regenerated in the process.

Example 5. Resetting all user decisions
INSERT INTO UM_CUSTOMER (
    B_LOADID,
    B_SOURCEID,
    B_PUBID,
    B_CLASSNAME
)
SELECT
    vLoad_id,
    B_SOURCEID,
    B_PUBID,
    'Customer'
FROM MD_CUSTOMER;

Replay matching

The previous method solely removes user decisions without triggering the match rules to run again.

To force the entire matching process to replay, follow these steps:

  1. Reset user decisions, as described in the previous method.

  2. Reload the data to re-match into the SD table.

Example 6. Replaying the matching process
-- Remove user decisions
INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer'
FROM MI_CUSTOMER
-- Add a clause to filter the records to re-process below
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

-- Reload data from the latest master record values
INSERT into SD_CUSTOMER (
    B_LOADID,
    B_PUBID,
    B_SOURCEID,
    B_CLASSNAME,
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
)
SELECT
	vLoad_id,
	B_PUBID,
	B_SOURCEID
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
FROM MI_CUSTOMER
-- Add a clause to filter the records to re-process below
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

COMMIT;