Semarchy xDM Integration Guide

Welcome to Semarchy xDM.
This guide contains information about publishing and consuming data in an MDM hub generated by Semarchy xDM.

Preface

Overview

Using this guide, you will:

Understand the Integration Component and Jobs of Semarchy xDM.
Learn how to publish data into an MDM hub through an external load.
Learn how to consume data from an MDM hub.
Learn how to interact with applications and workflows programmatically.

Audience

This document is intended for integration architects and developers setting up an MDM hub as part of their enterprise integration architecture.

To discover Semarchy xDM, you can watch our tutorials.

The Semarchy xDM Documentation Library, including the development, administration and installation guides is available online.

Document Conventions

This document uses the following formatting conventions:

Convention Meaning

Convention	Meaning
boldface	Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept.
italic	Italic type indicates special emphasis or placeholder variable that you need to provide.
`monospace`	Monospace type indicates code example, text or commands that you enter.

boldface

Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept.

italic

Italic type indicates special emphasis or placeholder variable that you need to provide.

monospace

Monospace type indicates code example, text or commands that you enter.

Other Semarchy Resources

In addition to the product manuals, Semarchy provides other resources available on its web site: https://www.semarchy.com.

Obtaining Help

There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see https://www.semarchy.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail support@semarchy.com and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Introduction to Semarchy xDM

Semarchy xDM is the Intelligent Data Hub platform for Master Data Management (MDM), Reference Data Management (RDM), Application Data Management (ADM), Data Quality, and Data Governance.
It provides all the features for data quality, data validation, data matching, de-duplication, data authoring, workflows, and more.

Semarchy xDM brings extreme agility for defining and implementing data management applications and releasing them to production. The platform can be used as the target deployment point for all the data in the enterprise or in conjunction with existing data hubs to contribute to data transparency and quality.
Its powerful and intuitive environment covers all use cases for setting up a successful data governance strategy.

Introduction to Integration

Integration Capabilities

Semarchy xDM certifies golden data from source applications' data and allows applications to consume this golden data.

Integration with Semarchy xDM is performed in several ways:

Publishing source data in an MDM hub deployed by Semarchy xDM, and have the hub certify golden records from this data.
Consuming golden or master data from the MDM hub.
Interacting with Workflows Instances to manage data.

Integration Methods

Semarchy xDM provides several integration services, including:

REST API, accessible from the Semarchy xDM application.
SQL Interfaces, that is a set of functions stored in the repository schema, plus a set of tables stored in the Data Location Schema.

Using these components, applications and middleware products can consume and publish data in the hub.

Integration Concepts

Publishers and Consumers

In the context of integration, we refer to applications publishing source data into the hub as Publishers.
We also refer to application consuming golden data from the hub as Consumers.

These roles are not exclusive. An application can publish data into the MDM hub and consume certified golden data from the hub to update its records.

For example, a business intelligence application is typically a consumer-only as it consumes only golden records for reporting purposes. An operational application may publish its contact information to the MDM hub for certification and update its contact information with the certified golden records.

Publishers

This term refers to the original application from which the data originates, and not necessarily to the middleware tool or user-designed program actually doing the data movement. The publisher is typically an operational application such as a Customer Relationship Management (CRM), a Product Lifecycle Management (PLM) or an Enterprise Resource Planner (ERP).

A middleware tool may be an Extract-Transform-Load (ETL), Enterprise Service Bus (ESB) or any other data integration product. It may also be a user-designed program or script implemented in SQL, Java, etc. The middleware tool or user-designed program communicates with the publisher to extract data and communicates with the MDM Hub using the publishing methods described in this guide to load this data.

Consumers

Similarly to the publisher, this term refers to the applications consuming data from the hub. This consumption usually takes place via a middleware tool or a user-designed program, and uses the consumption methods described in this guide.

For the rest of this guide, we will refer to the middleware tool or user-defined product used for publishing or consuming data as the middleware.

Data Consumption

Data consumption is available via the REST API or SQL through the tables stored in the Data Location Schema.

Various views on the data can be used for consumption, including:

Golden Data enriched, standardized, validated, de-duplicated and certified in the hub.
Master Data pushed in the hub by source systems.
Errors raised by records pushed in the hub when violating the data quality rules defined in the model.

Data Consumption is always done on a specific Data Location.

Data Publishing

Publishing Concepts

Publishing source data for certification into golden data is performed as a transaction. Such publisher transaction is called an External Load. It is a Semarchy xDM transaction identified by a sequential Load ID.

An External Load represents a source data load transaction.

When an External Load is submitted with an Integration Job Name, a Batch - identified by a Batch ID - is created, and the Job starts processing the data published in this load.

A Batch represents a transaction certifying loaded data and writing in the hub the resulting golden data.

Both Loads and Batches can be reviewed under the data locations in the Management perspective in the Semarchy Application Builder.

External Load Lifecycle

An external load lifecycle is described below:

Initialize the External Load
- The middleware uses the SQL Interface or the REST API to initialize an external load.
- It receives from the platform a Load ID identifying the external load.
- At that stage, an external load transaction is open with the platform.
Load Data
- The middleware inserts data into the landing tables in the data location schema. This done using the SQL Interface or the REST API.
- When loading data, the middleware provides both the Load ID and a Publisher Code corresponding to the publisher application.
Submit the External Load
- The middleware uses the SQL Interface or the REST API to submit the external load.
- It provides the Load ID as well as the name of the Integration Job to trigger with this submission.
- The platform creates a Batch to process the data published in this external load.
- It receives from the platform a Batch ID identifying the batch that is processed by the platform for this external load.
- At that stage, the external load transaction is closed.

The middleware can also Cancel the External Load to abort the external load instead of submitting it.

Data locations may be moved to a Maintenance status by their administrator. When a data location is in that state, it is not possible to initialize external loads.

Continuous Loads

Continuous loads enable integration developers to push data into the MDM hub in a continuous way without having to take care of Load Initialization or Load Submission.

With continuous loads:

Integration developers do not need to initialize and submit individual external loads. They directly load data into the hub using the Load ID or Name of the continuous load.
At regular intervals, Semarchy xDM automatically creates then submits an external load with the data loaded in the continuous load. This external load is submitted with a program name, a job, and a submitter name.
The continuous load remains, with the same Load ID and Name. Subsequent data loads made with this continuous load are processed at the next interval.

Configuring Continuous Loads

Continuous loads are configured and managed by the administrator in a data location. Unlike external loads, they cannot be created, submitted or canceled via integration points.

Using Continuous Loads

The SQL Interface or the REST API can use continuous loads via their Load ID or Name to push data into the hub without having to perform the load ID management operation.

The REST API calls support both the Load ID or the Name of the continuous load in their URL.
The SQL interface exposes tables with a B_LOADID column to load the Load ID of the continuous load.
It provides a utility function called GET_CONTINUOUS_LOAD, stored in the repository (and in the INTEGRATION_LOAD package for Oracle), to returns the Load ID for a given continuous load name.

Retrieving a Load ID from a Continuous Load Name (Oracle)

vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_CONTINUOUS_LOADID(
	'<continuous_load_name>'           /* Name of the continuous load. */
	);

Retrieving a Load ID from a Continuous Load Name (PostgreSQL)

vLoad_id := <repository_schema>.get_continuous_loadid(
	'<continuous_load_name>'           /* Name of the continuous load. */
	);

Retrieving a Load ID from a Continuous Load Name (SQL Server)

EXEC <repository_schema>.dbo.GET_CONTINUOUS_LOADID
    @LOAD_ID OUTPUT,
    @P_CONTINUOUS_LOAD_NAME  = '<continuous_load_name>' /* Name of the continuous load. */

Using the Name instead of the Load ID in your integration processes and flows gives the flexibility to use the same integration process or flow definition regardless of the data location. When deploying a model to another data location (for example, to move from development to production), you just need to create another continuous load with the same Name.

When using a continuous load, make sure that it is configured with a job that processes the entities that are loaded.

Batch Lifecycle

When an external load is submitted, the following operations take place:

The platform creates a batch and returns to the submitter the Batch ID
The integration batch poller picks up the batch on its schedule:
1. It creates a Job instance using the Job Definition which name is provided in the submit action.
2. It moves the job into the Queue specified in the job definition
The Execution engine processes this job in the queue.
When the job completes, the batch is considered finished.

Even when multiple loads take place simultaneously, the sequence into which the external loads are submitted defines the order into which the data is processed by the integration jobs and golden data certified from this source data.

Interfaces for Integration

This section describes the interfaces available for integration.

REST API

This interface is available for application to consume or publish data from the hub in a programmatic way.

SQL Interface

This interface is available for data integration and ETL products to publish or consume data. It is composed of the Integration Load Functions and the Data Location Database Schema.

The Integration Load Functions are stored in the repository schema, contains function to manage the External Load Lifecycle, for the purpose of publishing data in batch mode.

The Data Location Database Schema stores a set of tables that contain the hub data.
This schema contains the landing (or staging) tables used to publish data into the hub. It contains also the golden records tables and the intermediate tables handled by the integration job that create golden records from the source records. There is a single data structure for the entire hub, and a single set of tables for each entity regardless of the model edition.

The data location schema is accessed for integration purposes to:

Publish data in batch mode in the landing tables.
Consume data from the golden data and master data tables.

The structure of the tables stored in the data location schema is detailed in the Table Structures section.

Data Certification

This section explains the artifacts involved when certify data published into the hub.

Integration Job

The integration job processes the data submitted in an external load and runs this data through the Certification Process, which is a series of steps to create and certify golden data out of this source data.

This job is generated from the certification rules defined at design time, and it uses the data structures automatically created in the MDM hub when deploying the model edition.

Although understanding the details of the process is not needed for publishing source data or consuming golden data, it is necessary to have complete understanding of this process to drill down into the various structures between the source and the golden data. For example, to review the rejects or the duplicates detected by the integration job for a given golden record.

An integration job is a sequence of tasks used to certify golden data for a group of entities. The model edition deployed in the data location brings several integration jobs definitions with it. Each of these job definitions is designed to certify data for a group of entities.

Integration jobs definitions as well as integration job logs are stored in the repository

For example, a multi-domain hub contains entities for the PARTY domain and for the PRODUCTS domain, and has two integration jobs definition:

INTEGRATE_CUSTOMERS certifies data for the Party, Location, etc… entities.
INTEGRATE_PRODUCTS certifies data for the Brand, Product, Part, etc… entities.

Integration jobs are started when source data has been loaded in the landing tables and is submitted for golden data certification.

Each integration job is the implementation of the overall certification process template. It may contain all or some of the steps of this process. The following section details the structure of the certification process.

Certification Process

Introduction to the Certification Process

The Certification Process creates consolidated and certified Golden Records from various sources:

Source Records, pushed into the hub by middleware systems on behalf of upstream applications (known as the Publishers).
Depending on the type of the entity, these records are either converted to golden records directly (basic entities) or matched and consolidated into golden records (ID and fuzzy matched entities). When matched and consolidated, these records are referred to as Master Records. The golden records they have contributed to create are referred to as Master Based golden records.
Source Authoring Record, authored by users in the MDM applications.
When a user authors data in an MDM application, depending on the entity type and the application design, he performs one of the following operations:
- He creates new golden records or updates existing golden records that exist only for the hub, and do not exist in any of the publishers. These records are referred to as Data Entry Based golden records This pattern is allowed for all entities, but basic entities support only this pattern.
- He creates or updates master records on behalf of publishers, submitting these records to matching and consolidation. This pattern is allowed only for ID and fuzzy matched entities.
- He overrides golden values resulting from the consolidation of records pushed by publishers. This pattern is allowed only for ID and fuzzy matched entities.
Delete Operations, made by users on golden and master records from entities with delete enabled.
Matching Decisions, taken by data stewards for fuzzy matched entities, using duplicates managers. Such decisions include confirming, merging or splitting groups of matching records as well as accepting/rejecting suggestions.

The certification process takes these various sources, applies the rules and constraints defined in the model in order to create, update or delete the golden data that business users browse using the MDM applications and that downstream application consume from the hub.

This process is automated and involves several phases, automatically generated from the rules and constraints, which are defined in the model based on the functional knowledge of the entities and the publishers involved.

The following sections describe the details of the certification process for ID, fuzzy matched and basic entities, and the delete process for all entities.

Certification Process for ID and Fuzzy Matched Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Matched Entities

The certification process involves the following steps:

Enrich and Standardize Source Data: Source Authoring Records (in the SA tables) created or updated on behalf of publishers and Source Records (in the SD tables) are enriched and standardized using the SemQL and API (Java Plug-in or REST Client) Enrichers, executed Pre-Consolidation.
Validate Source Data: The enriched and standardized records are checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged into the SE - source errors - and AE - authoring errors - tables.
Note that source authoring records are enriched and validated only for basic entities. For ID and fuzzy matched, source authoring records are not enriched and validated.
Match and Find Duplicates: For fuzzy matched entities, this step matches pairs of records using a Matcher and creates groups of matching records (match groups). For ID matched entities, matching is simply made on the ID value.
The matcher works as follows:
- It runs a set of Match Rules. Each rule has two phases: first, a binning phase creates small bins of records. Then a matching phase compares each pair of records within these small bins to detects duplicates.
- Each match rule has a Match Score that expresses how strongly the pair of records matches. A pair of records that match according to one or more rules is given the highest Match Score of all these rules. Match pairs with scores and rules are stored in the DU table.
- When a match group is created, an overall Confidence Score is computed for that group. According to this score, the group is marked as a suggestion or immediately merged, and possibly confirmed. These automated actions are configured in the Merge Policy and Auto-Confirm Policy of the matcher.
- Matching Decisions taken by users on match groups (stored in the UM table) are applied at that point, superseding the matcher’s choices.
Consolidate Data: This step consolidates match group duplicates into single consolidated records. The Consolidation Rules created in the Survivorship Rules defines how the attributes consolidate. Integration master records and integration golden (consolidated) records are stored at that stage in the GI and MI tables
Enrich Consolidated Data: The SemQL and API (Java Plug-in or REST Client) Enrichers executed Post-Consolidation run to standardize or add data to the consolidated records.
Publish Certified Golden Data: This step finally publishes the Golden Records for consumption. The final master and golden records are stored in the GD and MD tables
- This step applies possible overrides from Source Authoring Record (Involving the SA, SF, GA and GF tables) , according to the Override Rules defined in the Survivorship Rules.
- This step also creates or updates Data Entry Based golden records (that exist only in the MDM), from Source Authoring Records.
Validate Golden Data: The quality of the golden records is checked against the various Constraints executed on golden records (Post-Consolidation). Note that unlike the pre-consolidation validation, it does not remove erroneous golden records from the flow but flags them as erroneous. The errors are also logged (in the GE tables) .
Historize Data: Golden and master data changes are added to their history (stored in the GH and MH tables) if historization is enabled.

Source Authoring Records are not enriched or validated for ID and fuzzy matched entities as part of the certification process. These records should be enriched and validated as part of the steppers into which users author the data.

Certification Process for Basic Entities

The following figure describes the certification process and the various Table Structures involved in this process.

Certification Process for Matched Entities

The certification process involves the following steps:

Enrich and Standardize Source Data: During this step, the Source Records and Source Authoring Records (both stored in the SA tables) are enriched and standardized using SemQL and API (Java Plug-in and REST Client) Enrichers executed Pre-Consolidation.
Validate Source Data: The quality of the enriched source data is checked against the various Constraints executed Pre-Consolidation. Erroneous records are ignored for the rest of the processing and the errors are logged (in the AE tables) .
Publish Certified Golden Data: This step finally publishes the Golden Records for consumption (in the GD tables) .
Historize Data: Golden data changes are added to their history (stored in the GH table) if historization is enabled.

Note that:

Basic entities do not separate Source Records from Source Authoring Records. Both follow the same process.
Source data for basic entities does not pass through enrichers or validations executed post-consolidation.

Deletion Process

A Delete Operation (for basic, ID or fuzzy matched entities) on a golden record involves the following steps:

Propagate through Cascade: Extends the deletion to the child records directly or indirectly related to the deleted ones with a Cascade configuration for Delete Propagation.
Propagate through Nullify: Nullifies child records related to the deleted ones with a Nullify configuration for the Delete Propagation.
Compute Restrictions: Removes from deletion the records having related child records and a Restrict configuration for the Delete Propagation. If restrictions are found, the entire delete is canceled as a whole.
Propagate Delete to owned Master Records to propagate deletion to the master records attached to deleted golden records. This step only applies to ID and fuzzy matched entities.
Publish Deletion: Tracks record deletion (in the GX and MX tables), with the record values for soft deletes only, and then removes the records from the golden and master data (in the GD and MD tables). When doing a hard delete, this step deletes any trace of the records in every table (SA, SD, UM, MH, GH, etc.). The only trace of a hard delete is the ID (without data) of the deleted master and golden records, in the GX and MX tables. Deletes are tracked in the history for golden and master records (in the MH and GH tables), if historization is configured.

Is is not necessary to configure in the job all the entities deletion should cascade to. The job generation automatically detects the entities that must be included for deletion based on the entities managed by the job.

For more information about these various phases and how the certification process is designed, refer to the Certification Process Design chapter in the Semarchy xDM Developer’s Guide.

Table Structures

This section describes the general structure of the tables involved in the certification process.

Tables

For each entity, a pre-defined set of tables is created. These tables are named with the following convention: <Prefix>_<Physical Table Name>

The prefix is automatically generated depending on the table nature (Source Data, Source Error, etc.), and the Physical Table Name is set in the entity definition.

The following list describes the tables created for a given entity.

Table Name Applicable Entity Type Access Description

Table	Name	Applicable Entity Type	Access	Description
`SD_<Physical Table Name>`	Source Data	Fuzzy and ID Matched	Read/Write	This table contains source data loaded by publishers, using the middleware.
`SA_<Physical Table Name>`	Source Authoring	All	Read (Write for Basic Entities)	For matching entities, this table contains both the records created in the hub by the user as well as possible overrides performed on consolidated records. In that case, it is associated to a `SF_<Physical Table Name>`, storing flags that indicate which field was overridden. For basic entities, it contains the records created by the user, and can be used to load data into the hub instead of the SD table. This table also used to trace deletion operations.
`SE_<Physical Table Name>` `AE_<Physical Table Name>`	Source Errors Source Authoring Error	All	Read	This table contains the errors detected during the pre-consolidation validation phase. It is named `AE_<Physical Table Name>` for basic entities.
`DU_<Physical Table Name>`	Duplicates	Fuzzy Matched	Read	This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score.
`UM_<Physical Table Name>`	User Matches	Fuzzy Matched	Read/Write	This table contains the master records that have been manually re-grouped by users using duplicate managers. You can use this table to perform similar operations programmatically. See the Managing Duplicates Using SQL section for examples of duplicate management operations you can perform using this table.
`MI_<Physical Table Name>`	Master Integration	Fuzzy and ID Matched	Internal	This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase.
`MD_<Physical Table Name>`	Master Data	Fuzzy and ID Matched	Read	This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data.
`MX_<Physical Table Name>`	Deleted Master	Fuzzy and ID Matched	Read	This table contains the deleted master records logs, plus the deleted data for soft deletes.
`MH_<Physical Table Name>`	Master History	Fuzzy and ID Matched	Read	This table contains the master records history.
`GI_<Physical Table Name>`	Golden Integration	Fuzzy and ID Matched	Internal	This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase.
`GA_<Physical Table Name>`	Golden Authoring	Fuzzy and ID Matched	Internal	This table contains data overrides performed on golden records . It is associated to a `GF_<Physical Table Name>`, storing flags that indicate which field was overridden, and a `GP_<Physical Table Name>` storing values consolidated before an override.
`GX_<Physical Table Name>`	Deleted Golden	All	Read	This table contains the deleted golden records logs, plus the deleted data for soft deletes.
`GH_<Physical Table Name>`	Golden History	All	Read	This table contains the golden records history.
`GD_<Physical Table Name>`	Golden Data	All	Read	This table contains the golden records, including those flagged as erroneous by post-consolidation validations.
`GE_<Physical Table Name>`	Golden Errors	Fuzzy and ID Matched	Read	This table contains the errors detected during the post-consolidation validation phase on golden records.

SD_<Physical Table Name>

Source Data

Fuzzy and ID Matched

Read/Write

This table contains source data loaded by publishers, using the middleware.

SA_<Physical Table Name>

Source Authoring

All

Read (Write for Basic Entities)

For matching entities, this table contains both the records created in the hub by the user as well as possible overrides performed on consolidated records. In that case, it is associated to a SF_<Physical Table Name>, storing flags that indicate which field was overridden.
For basic entities, it contains the records created by the user, and can be used to load data into the hub instead of the SD table. This table also used to trace deletion operations.

SE_<Physical Table Name>
AE_<Physical Table Name>

Source Errors
Source Authoring Error

All

Read

This table contains the errors detected during the pre-consolidation validation phase. It is named AE_<Physical Table Name> for basic entities.

DU_<Physical Table Name>

Duplicates

Fuzzy Matched

Read

This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score.

UM_<Physical Table Name>

User Matches

Fuzzy Matched

Read/Write

This table contains the master records that have been manually re-grouped by users using duplicate managers. You can use this table to perform similar operations programmatically. See the Managing Duplicates Using SQL section for examples of duplicate management operations you can perform using this table.

MI_<Physical Table Name>

Master Integration

Fuzzy and ID Matched

Internal

This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase.

MD_<Physical Table Name>

Master Data

Fuzzy and ID Matched

Read

This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data.

MX_<Physical Table Name>

Deleted Master

Fuzzy and ID Matched

Read

This table contains the deleted master records logs, plus the deleted data for soft deletes.

MH_<Physical Table Name>

Master History

Fuzzy and ID Matched

Read

This table contains the master records history.

GI_<Physical Table Name>

Golden Integration

Fuzzy and ID Matched

Internal

This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase.

GA_<Physical Table Name>

Golden Authoring

Fuzzy and ID Matched

Internal

This table contains data overrides performed on golden records . It is associated to a GF_<Physical Table Name>, storing flags that indicate which field was overridden, and a GP_<Physical Table Name> storing values consolidated before an override.

GX_<Physical Table Name>

Deleted Golden

All

Read

This table contains the deleted golden records logs, plus the deleted data for soft deletes.

GH_<Physical Table Name>

Golden History

All

Read

This table contains the golden records history.

GD_<Physical Table Name>

Golden Data

All

Read

This table contains the golden records, including those flagged as erroneous by post-consolidation validations.

GE_<Physical Table Name>

Golden Errors

Fuzzy and ID Matched

Read

This table contains the errors detected during the post-consolidation validation phase on golden records.

For publishing and consuming data, internal tables are not needed. The structure of these tables is not explained in this document.

Columns

Attribute Columns

Attributes appear in the tables' structure as follows:

Attributes using list of values, built-in and user-defined types are mapped to single columns with a database type and length corresponding to the attribute type. The column name for such a simple attribute is the Physical Column Name value specified in the simple attribute definition.
Complex attributes are mapped on columns named with the following convention: <Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>

For example: The Country (COUNTRY) and City (CITY) definition attributes of the Geocoded Address complex attribute (Prefix: GEO) will be mapped to columns named GEOCOUNTRY and GEOCITY.

Built-in Columns

In addition to the attribute columns, built-in columns are added to the tables' structure, and are used in the certification process.

They track for example:

Batch information: B_BATCHID
Cause of a rejected record: B_CONSTRAINTNAME, B_CONSTRAINTTYPE
Class information: B_CLASSNAME
Match/Merge information: B_MATCHGRP, B_CONFSCORE, B_HASSUGGMERGE, B_SUGGMERGEID, B_SUGGMERGECONFSCORE, B_SUGGMERGEMASTERSCOUNT, B_CONFIRMATIONSTATUS, B_MASTERSCOUNT, B_ISCONFIRMED
Delete operation information: B_DELETETYPE, B_DELETEDATE, B_DELETEAUTHOR, B_DELETEOPERATION,

The following list describes these columns.

Column Name Exists in Tables Datatype Description

Column Name	Exists in Tables	Datatype	Description
`B_AUTHORINGTYPE`	GA, SA	`VARCHAR2(30 CHAR)`	For matched entities, type of authoring operation: `OVERRIDE` for override `DATA_ENTRY` for data creation
`B_BATCHID`	AE, GA, GD, GE, GF, GI, GP, GX, MD, MI, MX, SE	`NUMBER(38, 0)`	ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected.
`B_CHECKOUTINCLUSION`	UM	`VARCHAR2(30 CHAR)`	Cause that made the record part of the duplicate management transaction. Possible causes are: User (`USER`): the record was checked out by the user. Same Golden (`SAME_GOLDEN`): the record is part of the same match group than another master that was checked out. Same Suggestion (`SAME_SUGG`): the record is part of the same suggestion than another master that was checked out. Same Exclusion Group (`SAME_XGRP`): the record is part of the same exclusion group than another master that was checked out. Other (`OTHER`): Another reason.
`B_CLASSNAME`	AE, GA, GD, GE, GF, GH, GI, GP, GX, MD, MH, MI, MX, SA, SD, SE, SF, UM	`VARCHAR2(128 CHAR)`	Unique name of class / entity to which this record belongs. See Class Name for more information.
`B_CONFIRMATIONSTATUS`	GD, GH, GI, GX, MD, MH, MI, MX, UM	`VARCHAR2(30 CHAR)`	Confirmation status for duplicate management: Confirmed (`CONFIRMED`): Indicates that a master is confirmed in a golden or that a golden has all its masters confirmed. Not-confirmed (`NOT_CONFIRMED`): Indicates that a master is not confirmed or that a golden is entirely made of unconfirmed masters. Historically confirmed (`WAS_CONFIRMED`, for master records): If a master was confirmed into to a golden but this golden was fused into another golden. Partially confirmed (`PARTIALLY_CONFIRMED` for golden records only): Indicates that a golden has part of his masters confirmed.
`B_CONFIRMEDSDPK`	MD, MI, UM	Varies	ID of the confirmed golden record this master record is attached to.
`B_CONFSCORE`	GD, GH, GI, GX, UM	`NUMBER(38, 0)`	Confidence Score of the golden record. It is the average of the match scores in the match group.
`B_CONSTRAINTNAME`	AE, GE, SE	`VARCHAR2(128 CHAR)`	For error records, name of the constraint causing this error.
`B_CONSTRAINTTYPE`	AE, GE, SE	`VARCHAR2(30 CHAR)`	For error records, type of the constraint causing this error.
`B_COPIEDFROM`	SA	`NUMBER(38, 0)`	ID of the original record copied into this record.
`B_CREATOR`	DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM	`VARCHAR2(128 CHAR)`	For golden data (GD, GI, GX, GH), the submitter of the batch into which for record was created. For source and master data, the user who has created the record in a workflow, stepper, duplicate manager or a user name loaded by the data integration process. This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.
`B_CREDATE`	DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM	`TIMESTAMP(6)`	For golden and master data, the submit timestamp of the batch into which the record was created. For source data (SD, SA, UM), the timestamp at which the source record was created. Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.
`B_DELETEAUTHOR`	GX, MX, SA, SD	`VARCHAR2(128 CHAR)`	Author of the delete operation.
`B_DELETEDATE`	GX, MX, SA, SD	`TIMESTAMP(6)`	Deletion timestamp of a record
`B_DELETEOPERATION`	GX, MX, SA, SD	`VARCHAR2(128 CHAR)`	Delete operation ID.
`B_DELETEPATH`	GX, MX, SA, SD	`VARCHAR2(4000 CHAR)`	Cascade path through which the record was reached during a delete. Null for record directly selected for deletion.
`B_DELETETYPE`	GX, MX, SA, SD	`VARCHAR2(30 CHAR)`	Delete Type (`SOFT_DELETE` or `HARD_DELETE` for golden and master deletion). The type of delete can be `LEGLESS_DELETE` for golden records deleted when they loose all their master records.
`B_ERROR_STATUS`	GD, GH, GX, SA, SD	`VARCHAR2(30 CHAR)`	Error Status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are: `VALID` if the record has no error. `ERROR` if the record has errors `RECYCLED` if the record was recycled and considered valid `OBSOLETE_ERROR` if the record had errors but a newer version of the record fixes them. a `<NULL>` value also indicates a record with no error.
`B_FROMBATCHID`	GH, MH	`NUMBER(38, 0)`	Batch at which the history record was created.
`B_GOLDENTYPE`	GD, GH, GX	`VARCHAR2(30 CHAR)`	For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM (`DE_BASED`) or was consolidated from publishers'' data and possibly overridden (`MASTER_BASED`)
`B_HASOVERRIDE`	GD, GH, GX	`CHAR(1 CHAR)`	For fuzzy matching and ID matching entities, this flag (`0` or `1`) indicates whether the golden record has override values.
`B_HASSUGGMERGE`	GD, GH, GI, GX, MD, MH, MI, MX, UM	`CHAR(1 CHAR)`	Flag (`0` or `1`) indicating that match and merge suggestions are available for this record.
`B_ISCONFIRMED`	GD, GH, GI, GX	`CHAR(1 CHAR)`	Flag (`0` or `1`) indicated whether this golden record has been confirmed (Fuzzy Matched entities only).
`B_LOADID`	AE, SA, SD, SE, SF, UM	`NUMBER(38, 0)`	Load Identifier used as the unique transaction ID for external application pushing data to the platform
`B_MASTERSCOUNT`	GD, GH, GI, GX, UM	`NUMBER(38, 0)`	Number of master records contributing to the golden record.
`B_MATCHGRP`	MD, MH, MI, MX	`NUMBER(38, 0)`	ID of the match group for the master record. This column is set when matching takes place.
`B_MATCHRULE`	DU	`VARCHAR2(128 CHAR)`	Match rule having cause the pair of records to match.
`B_MATCHSCORE`	DU	`NUMBER(38, 0)`	Score of the matched pair.
`B_OLDMATCHGRP`	MI	`NUMBER(38, 0)`	Previous identifier of the match group for the master record.
`B_OLDSDPK`	MD, MH, MI, MX	Varies	ID of the previous golden record the master record was attached to.
`B_ORIGINALBATCHID`	SA, SD, UM	`NUMBER(38, 0)`	Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager.
`B_ORIGINALCONFIRMATIONSTATUS`	UM	`VARCHAR2(30 CHAR)`	Original Confirmation Status in a duplicate management operation.
`B_ORIGINALCONFIRMEDSDPK`	UM	Varies	ID of the original confirmed golden record to which the master record was attached to.
`B_ORIGINALCONFSCORE`	UM	`NUMBER(38, 0)`	Confidence Score of the original golden in a duplicate management operation.
`B_ORIGINALMASTERSCOUNT`	UM	`NUMBER(38, 0)`	Number of master records in the original golden in a duplicate management operation.
`B_ORIGINALSDPK`	UM	Varies	ID of the original golden record to which the master record was attached to.
`B_ORIGINALXGRP`	UM	`RAW(16)`	Original exclusion group to which the master record belonged to.
`B_PUBID`	DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM	`VARCHAR2(30 CHAR)`	For matching entities, code of the publisher that published the record.
`B_PUBID2`	DU	`VARCHAR2(30 CHAR)`	For duplicate pairs, code of the publisher that published the second record.
`B_SOURCEID`	DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM	`VARCHAR2(128 CHAR)`	ID of the source record in the source publisher system (Fuzzy Matched entities only). See below for a detailed explanation of the primary key options.
`B_SOURCEID2`	DU	`VARCHAR2(128 CHAR)`	For duplicate pairs, ID of the second source record in the source publisher system (Fuzzy Matched entities only).
`B_SUGGMERGECONFSCORE`	GD, GH, GI, GX, MD, MH, MI, MX, UM	`NUMBER(38, 0)`	Confidence Score for the suggested match group.
`B_SUGGMERGEID`	GD, GH, GI, GX, MD, MH, MI, MX, UM	`NUMBER(38, 0)`	ID of the merge suggested by the automated matching.
`B_SUGGMERGEMASTERSCOUNT`	GD, GH, GI, GX, MD, MH, MI, MX, UM	`NUMBER(38, 0)`	Number of master records in the suggested merge.
`B_TOBATCHID`	GH, MH	`NUMBER(38, 0)`	Batch at which history record stopped being current or null if the records is still current.
`B_UPDATOR`	DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM	`VARCHAR2(128 CHAR)`	For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was updated. For source and master data, the latest user who has updated the record in a workflow, stepper, duplicate manager or the user name loaded by the data integration process. This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.
`B_UPDDATE`	DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM	`TIMESTAMP(6)`	For golden and master data, the submit timestamp of the batch into which the record was updated. For source data (SD, SA, UM), the timestamp at which the source record was created or updated. Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.
`B_XGRP`	MD, MH, MI, MX, UM	`RAW(16)`	Exclusion group ID. An exclusion group represents a group of records for which a user has taken split decisions.

B_AUTHORINGTYPE

GA, SA

VARCHAR2(30 CHAR)

For matched entities, type of authoring operation:

OVERRIDE for override
DATA_ENTRY for data creation

B_BATCHID

AE, GA, GD, GE, GF, GI, GP, GX, MD, MI, MX, SE

NUMBER(38, 0)

ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected.

B_CHECKOUTINCLUSION

VARCHAR2(30 CHAR)

Cause that made the record part of the duplicate management transaction. Possible causes are:

User (USER): the record was checked out by the user.
Same Golden (SAME_GOLDEN): the record is part of the same match group than another master that was checked out.
Same Suggestion (SAME_SUGG): the record is part of the same suggestion than another master that was checked out.
Same Exclusion Group (SAME_XGRP): the record is part of the same exclusion group than another master that was checked out.
Other (OTHER): Another reason.

B_CLASSNAME

AE, GA, GD, GE, GF, GH, GI, GP, GX, MD, MH, MI, MX, SA, SD, SE, SF, UM

VARCHAR2(128 CHAR)

Unique name of class / entity to which this record belongs. See Class Name for more information.

B_CONFIRMATIONSTATUS

GD, GH, GI, GX, MD, MH, MI, MX, UM

VARCHAR2(30 CHAR)

Confirmation status for duplicate management:

Confirmed (CONFIRMED): Indicates that a master is confirmed in a golden or that a golden has all its masters confirmed.
Not-confirmed (NOT_CONFIRMED): Indicates that a master is not confirmed or that a golden is entirely made of unconfirmed masters.
Historically confirmed (WAS_CONFIRMED, for master records): If a master was confirmed into to a golden but this golden was fused into another golden.
Partially confirmed (PARTIALLY_CONFIRMED for golden records only): Indicates that a golden has part of his masters confirmed.

B_CONFIRMEDSDPK

MD, MI, UM

Varies

ID of the confirmed golden record this master record is attached to.

B_CONFSCORE

GD, GH, GI, GX, UM

NUMBER(38, 0)

Confidence Score of the golden record. It is the average of the match scores in the match group.

B_CONSTRAINTNAME

AE, GE, SE

VARCHAR2(128 CHAR)

For error records, name of the constraint causing this error.

B_CONSTRAINTTYPE

AE, GE, SE

VARCHAR2(30 CHAR)

For error records, type of the constraint causing this error.

B_COPIEDFROM

NUMBER(38, 0)

ID of the original record copied into this record.

B_CREATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which for record was created. For source and master data, the user who has created the record in a workflow, stepper, duplicate manager or a user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_CREDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit timestamp of the batch into which the record was created. For source data (SD, SA, UM), the timestamp at which the source record was created.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.

B_DELETEAUTHOR

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Author of the delete operation.

B_DELETEDATE

GX, MX, SA, SD

TIMESTAMP(6)

Deletion timestamp of a record

B_DELETEOPERATION

GX, MX, SA, SD

VARCHAR2(128 CHAR)

Delete operation ID.

B_DELETEPATH

GX, MX, SA, SD

VARCHAR2(4000 CHAR)

Cascade path through which the record was reached during a delete. Null for record directly selected for deletion.

B_DELETETYPE

GX, MX, SA, SD

VARCHAR2(30 CHAR)

Delete Type (SOFT_DELETE or HARD_DELETE for golden and master deletion). The type of delete can be LEGLESS_DELETE for golden records deleted when they loose all their master records.

B_ERROR_STATUS

GD, GH, GX, SA, SD

VARCHAR2(30 CHAR)

Error Status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:

VALID if the record has no error.
ERROR if the record has errors
RECYCLED if the record was recycled and considered valid
OBSOLETE_ERROR if the record had errors but a newer version of the record fixes them.
a <NULL> value also indicates a record with no error.

B_FROMBATCHID

GH, MH

NUMBER(38, 0)

Batch at which the history record was created.

B_GOLDENTYPE

GD, GH, GX

VARCHAR2(30 CHAR)

For fuzzy matching and ID matching entities, indicates whether the golden record was created and authored only in the MDM (DE_BASED) or was consolidated from publishers'' data and possibly overridden (MASTER_BASED)

B_HASOVERRIDE

GD, GH, GX

CHAR(1 CHAR)

For fuzzy matching and ID matching entities, this flag (0 or 1) indicates whether the golden record has override values.

B_HASSUGGMERGE

GD, GH, GI, GX, MD, MH, MI, MX, UM

CHAR(1 CHAR)

Flag (0 or 1) indicating that match and merge suggestions are available for this record.

B_ISCONFIRMED

GD, GH, GI, GX

CHAR(1 CHAR)

Flag (0 or 1) indicated whether this golden record has been confirmed (Fuzzy Matched entities only).

B_LOADID

AE, SA, SD, SE, SF, UM

NUMBER(38, 0)

Load Identifier used as the unique transaction ID for external application pushing data to the platform

B_MASTERSCOUNT

GD, GH, GI, GX, UM

NUMBER(38, 0)

Number of master records contributing to the golden record.

B_MATCHGRP

MD, MH, MI, MX

NUMBER(38, 0)

ID of the match group for the master record. This column is set when matching takes place.

B_MATCHRULE

VARCHAR2(128 CHAR)

Match rule having cause the pair of records to match.

B_MATCHSCORE

NUMBER(38, 0)

Score of the matched pair.

B_OLDMATCHGRP

NUMBER(38, 0)

Previous identifier of the match group for the master record.

B_OLDSDPK

MD, MH, MI, MX

Varies

ID of the previous golden record the master record was attached to.

B_ORIGINALBATCHID

SA, SD, UM

NUMBER(38, 0)

Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager.

B_ORIGINALCONFIRMATIONSTATUS

VARCHAR2(30 CHAR)

Original Confirmation Status in a duplicate management operation.

B_ORIGINALCONFIRMEDSDPK

Varies

ID of the original confirmed golden record to which the master record was attached to.

B_ORIGINALCONFSCORE

NUMBER(38, 0)

Confidence Score of the original golden in a duplicate management operation.

B_ORIGINALMASTERSCOUNT

NUMBER(38, 0)

Number of master records in the original golden in a duplicate management operation.

B_ORIGINALSDPK

Varies

ID of the original golden record to which the master record was attached to.

B_ORIGINALXGRP

RAW(16)

Original exclusion group to which the master record belonged to.

B_PUBID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(30 CHAR)

For matching entities, code of the publisher that published the record.

B_PUBID2

VARCHAR2(30 CHAR)

For duplicate pairs, code of the publisher that published the second record.

B_SOURCEID

DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM

VARCHAR2(128 CHAR)

ID of the source record in the source publisher system (Fuzzy Matched entities only). See below for a detailed explanation of the primary key options.

B_SOURCEID2

VARCHAR2(128 CHAR)

For duplicate pairs, ID of the second source record in the source publisher system (Fuzzy Matched entities only).

B_SUGGMERGECONFSCORE

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Confidence Score for the suggested match group.

B_SUGGMERGEID

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

ID of the merge suggested by the automated matching.

B_SUGGMERGEMASTERSCOUNT

GD, GH, GI, GX, MD, MH, MI, MX, UM

NUMBER(38, 0)

Number of master records in the suggested merge.

B_TOBATCHID

GH, MH

NUMBER(38, 0)

Batch at which history record stopped being current or null if the records is still current.

B_UPDATOR

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

VARCHAR2(128 CHAR)

For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was updated. For source and master data, the latest user who has updated the record in a workflow, stepper, duplicate manager or the user name loaded by the data integration process.
This field may be loaded in the SD table. If left empty, it is automatically set to the submitter of the batch.

B_UPDDATE

DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM

TIMESTAMP(6)

For golden and master data, the submit timestamp of the batch into which the record was updated. For source data (SD, SA, UM), the timestamp at which the source record was created or updated.
Note that this field may be loaded in the SD table, but this value is not propagated beyond the SD table. If this column is left empty in the SD table, it is automatically set to the submit timestamp of the batch.

B_XGRP

MD, MH, MI, MX, UM

RAW(16)

Exclusion group ID. An exclusion group represents a group of records for which a user has taken split decisions.

Primary Keys Columns

The primary key to load depends on the Entity Type:

Basic

For basic entities, Semarchy xDM uses a single identifier, stored in a column named after the Physical Column Name specified in the primary key attribute definition. This column will exist in all tables. This identifier is simply propagated into the hub from the source records.

ID Matched

When using ID Matching, Semarchy xDM assumes a common identifier across all systems. In this case, this common identifier is stored in a column named after the Physical Column Name specified in the primary key attribute definition.

This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.

Fuzzy Matched

When using Fuzzy Matching, Semarchy xDM assumes no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.

In this case, the source identifier is stored in the column named B_SOURCEID, along with the publisher code stored in the B_PUBID. This is the case for the SD, SE and MD tables.

When publishing data into the hub, the middleware loads this B_SOURCEID column with a primary key value from the publishing system. If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID.

When the records are consolidated in a golden record (GD and GE tables), a System Defined Primary Key is generated and stored in a column named after the Physical Column Name specified in the primary key attribute definition. This key is referred to as the Golden Record ID.

The MD table makes the bridge between the Source ID and the Golden ID as it contains both these values.

Reference Columns

When a reference exists in the source publisher and need to be expressed in the landing table, this reference mapping in the table structure depends on the Entity Type of the referenced entity.

Reference to a Basic Entity

For a reference to a Basic entity, the referenced key stored is simply the primary key of the basic entity. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_COUNTRIES.

Reference to an ID Matched Entity

For a reference to an ID Matched entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the Physical Name provided in the reference definition and is prefixed with F_. For example, F_EMPLOYEE.

For example: if Customer references Employee and this entity uses ID Matching, SD_CUSTOMER will contain the following information.

CUSTOMER_NAME	F_EMPLOYEE
Gadgetron	11
Roxxon	56

CUSTOMER_NAME

F_EMPLOYEE

Gadgetron

Roxxon

This information means that the customer Gadgetron references the employee number 11, and Roxxon references employee number 56. This employee number is the same in all systems.

Reference to a Fuzzy Matched Entity

For a reference to a Fuzzy Matched entity, the referenced may point to a master record from a publisher, or directly to a golden record in the hub:

For a reference to a master record:
- The referenced value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with FS_ (for Foreign Source ID). For example, FS_CUSTOMER.
- The referenced publisher is also stored, in a column named after the Physical Name provided in the definition of the reference and prefixed with FP_ (for Foreign Publisher). For example, FP_CUSTOMER. For a reference to a golden record:
The reference value is stored in a column named after the Physical Name provided in the definition of the reference and prefixed with F_ (for Foreign ID).

Example 1. Reference to a master record

Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME	LAST_NAME	FP_CUSTOMER	FS_CUSTOMER
John	Doe	CRM	1235
Jane	Smith	MKT	A3251

FIRST_NAME

LAST_NAME

FP_CUSTOMER

FS_CUSTOMER

John

Doe

CRM

1235

Jane

Smith

MKT

A3251

This information means that the contact John Doe references the customer with the primary key 1235 in the CRM publisher, and that Jane Smith references the customer with the primary key A3251 in the MKT publisher.

Example 2. Reference to a golden record

Contact references Customer and this entity uses Fuzzy Matching, SD_CONTACT contains the following information.

FIRST_NAME	LAST_NAME	F_CUSTOMER
John	Doe	6598
Jane	Smith	6556

FIRST_NAME

LAST_NAME

F_CUSTOMER

John

Doe

6598

Jane

Smith

6556

This information means that the contact John Doe references the golden customer record with the golden ID 6598, and that Jane Smith references the golden customer with the golden ID 6556.

If both the reference to the master and to the golden are loaded, only the reference to the golden is taken into account.

To clear a reference, you must push a null value in the three FS_, FP_ and F_ columns.

Class Name

Several entities involved in an inheritance relation have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The B_CLASSNAME column is used to identify the class (entity) of a record in a table.

For example, when Person and Company inherit from the Party entity, the resulting table is named after the parent entity (Party), and will contain all the attributes of Person and Company as well. In the GD_PARTY table for example, records representing persons will have B_CLASSNAME='Person'. When publishing person or company information in the SD_PARTY table, the middleware must set B_CLASSNAME='Person' or B_CLASSNAME='Company' accordingly.

Constraints

The only constraints physically enforced in the tables are the primary keys and the not null columns. These constraints apply to system columns only.

For example, on an SD_ table, the following constraints are enforced:

primary key on B_LOADID, B_PUBID, B_SOURCEID (or the primary key column for ID Matched entities)
B_LOADID, B_CLASSNAME, B_PUBID and B_SOURCEID (or the primary key column for ID Matched entities) are not null

Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model but checked during the validation phase.

Publishing Data Using SQL

Publishing data is done using function calls to initialize/submit the external load and DML statements to load the data into the landing (SD for matching entities, SA for basic entities) tables.

Overview

In this approach, external loads are handled using the and SQL interface. It works as follows:

The external load is initialized using a function call to the GET_NEW_LOADID function on the repository schema. This function call returns a Load ID.
SD (SA for Basic Entities) tables are loaded using SQL inserts issued on the data location schema.
The external load is submitted or cancelled using function calls on the repository schema:
- SUBMIT_LOAD to submit a load identified by its Load ID. This function call returns a Batch ID.
- CANCEL_LOAD to cancel a load identified by its Load ID.

This approach is recommended when the middleware is an ETL or data integration product.

Alternately, you can use Continuous Loads to load data into the SD tables with the Load ID of the Continuous Load, without having to explicitly initialize and submit individual external loads.

Initializing a Load

Initializing an external load uses the following parameters:

data_location_name: name of the data location. This data location is the one hosting the MDM hub into which the external load is performed.
program_name: This variable is for information only. It is used to identify the middleware performing the external load. For example 'ETL Custom Script'.
load_description: This variable is for information only. It is used to describe the nature of this load. For example: 'Daily updates for Customers and Contacts'.
user_name: name of the user initializing the external load. This user may or may not be a user defined in the security realm of the application server.

This function call to GET_NEW_LOADID initializes an external load and returns a Load ID. It is performed on the repository database schema.

In Oracle, the integration load functions are part of a INTEGRATION_LOAD package. You must prefix the function name with the name of this package.

In SQL Server, the integration load functions are procedures in the repository database that return values such as the Load ID.

Initializing and External Load (Oracle)

vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
	'<data_location_name>'  /* Data Location Name (as in the UI) */
	'<program_name>'        /* Informational. Identifies the Middleware*/
	'<load_description>'    /* Informational. Describes the load. */
	'<user_name>'           /* User initializing the load. */
	  );

Initializing and External Load (PostgreSQL)

vLoad_id := <repository_schema>.get_new_loadid(
	'<data_location_name>'  /* Data Location Name (as in the UI) */
	'<program_name>'        /* Informational. Identifies the Middleware*/
	'<load_description>'    /* Informational. Describes the load. */
	'<user_name>'           /* User initializing the load. */
	  );

Initializing and External Load (SQL Server)

EXEC <repository_database>.dbo.GET_NEW_LOADID
    @LOAD_ID OUTPUT,
    @P_DATA_LOCATION_NAME = '<data_location_name>', /* Data Location Name (as in the UI) */
    @P_PROGRAM_NAME = '<program_name>',		/*  Informational. Identifies the Middleware */
    @P_LOAD_DESCRIPTION = N'<load_description>',/* Informational. Describes the load. */
    						/* Note the N to allow accented characters */
    @P_USERNAME = '<user_name>' 		/* User initializing the load. */

The following example performs a function call initializing an external load on the CustomerMDMHub data location. The repository schema is REPO. The returned Load ID is stored in the vLoad_id variable.

Example: Initializing an external load (Oracle).

vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
	'CustomerMDMHub',
	'Custom ETL',
	'Initial Load for the Hub',
	'John Doe' );

Loading Data

Loading data consists in inserting new source records in the SD (and SA for Basic Entities) tables for the set of entities taken into account in the integration job.

When loading data in the SD or SA tables:

You use SQL insert commands or your regular ETL/Data Integration Platform.
The insert commands are issued on the SD or SA tables stored in the data location schema.

Tables to Load

Make sure to load all the SD and SA tables for the entities that will be taken into account in the integration job. For example, if the integration job processes the Customer and Contact entities, then you should load the SD_CUSTOMER and SD_CONTACT tables.
If you insert data into the SD table of an entity that is not taken into account by the integration job, this data will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded the SD_EMPLOYEE table will not be taken into account by the integration job to certify golden records for the Employee entity.

Referential Integrity and Load Order

There is no required order to load the SD and SA tables, as no foreign keys are implemented on these tables.
Reference validation is performed by the integration job, as a consequence references between entities must be loaded as indicated in the Columns to Load section.

Using Transactions

It is recommended to use a database transaction when writing to the SD and SA tables.

For external loads: Canceling an external load cancels the load but does not delete records from the tables. Writing in an auto-commit transaction in the tables then canceling the load leaves useless information in the tables.
Using a transaction gives you the capability to rollback all changes on the tables when canceling a load.
For continuous loads, it is strongly recommended to use a transaction and commit only when all tables are loaded: as the continuous load consumes data committed in the table on its own schedule, loading tables without a transaction may cause the continuous load to start processing a data batch before it is completed.

Columns to Load

This section provides some guidance for loading the columns of the SD and SA tables.

System Columns (Mandatory)

The following system columns must be loaded as indicated:

B_LOADID: Load this column with the Load ID provided by GET_NEW_LOADID function call, of with the ID of the Continuous Load. Note that you can use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID from the continuous load name.
B_CLASSNAME: Name of the entity (or class) being loaded. When inheritance is used, the same table stores data for all parent and child classes. Set explicitly B_CLASSNAME to the name of the entity for which data is being published. For example: Person, Party, Company. See the Class Name section for more details.
B_PUBID: Load this column with a Publisher Code for ID Matched and Fuzzy Matched entities. For example: CRM, MKT, etc. This publisher code identifies the publisher (application that publishes the data) and should be declared in the model edition. The list of publisher codes is available by double-clicking the Publishers node in the Model Design view in the Model Design perspective. If the publisher code is unknown to the model, data from this publisher is processed, but this publisher will have the lowest ranking in a Preferred Publisher consolidation strategy.

Basic entities assume a single source, and do not use the B_PUBID column.

Publisher codes are case sensitive. Make sure to load B_PUBID with the publisher code as defined in the model. Publisher codes may contain uppercase letters, digits and underscores.

Within a single load, you can load the records from various publishers, using the B_PUBID column to identify each publisher

Primary Keys (Mandatory)

For matching entities, the primary key that you load into the SD table allows identifying the source record from the publisher (identified by Publisher Code B_PUDIB). This primary key will allow taking into account the creation or update of a source record and report through the consolidation process the corresponding data changes to the golden record.

For basic entities, the primary key that you provide in the SA table is the one of the golden record and will allow updating this record.

The primary key column to load depends on the Entity Type of the entity.

Basic Entity

If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

ID Matched Entity

If the entity uses ID Matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

Fuzzy Matched Entity

If the entity uses Fuzzy Matching, then you must load into the B_SOURCEID column the value of the primary key from the source system. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them in the B_SOURCEID column.

The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.

References (Mandatory)

When loading data for entities that are related by a reference relationship, you must load the referencing entity with the value of the referenced primary key. The columns to load differ depending on the Entity Type of the referenced entity.

Reference to a Basic Entity

If the referenced entity is an Basic entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

Reference to an ID Matched Entity

If the referenced entity is an ID Matched entity, then you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

For example, if Customer references Employee and this entity uses ID Matching, then you must load into SD_CUSTOMER the F_EMPLOYEE column with the primary key of the source employee record referenced by each customer record.

Reference to a Fuzzy Matched Entity

If the referenced entity is a Fuzzy Matched entity, then you need to load two columns:

FS_<Physical Name of the Referenced To Role Name>: Load this column with the Source ID of the referenced record.
FP_<Physical Name of the Referenced To Role Name>: Code of the publisher of the referenced record.

Note that these columns should be considered together. You should not load the FP_ column with a publisher code and leave FS_ to a null value, and vice versa.

For example, if Contact references Customer and this entity use Fuzzy Matching, you must load the following columns into the SD_CONTACT table:

FP_CUSTOMER: Code of the publisher providing the customer referenced by the given contact, e.g., MKT.
FS_CUSTOMER: Source ID of the customer referenced by the given contact, e.g., 81239.

Attribute Columns

You should load the attribute columns relevant for the entity you are loading.

Make sure to load:

The attribute columns that make sense for the entity class (B_CLASSNAME) you are loading.
The mandatory attribute columns. If not, pre-consolidation validation may reject source records with null values.
The columns for attributes using a list of values type. If these are loaded with values out of the LOV range of values, pre-consolidation validation may reject source records.

Attributes may be loaded with null or incorrect values if the values are set or modified by the enrichers. Enrichers are executed before any validation.

Loading Date and Timestamp Columns

Date attributes convert to Date columns in the database. The behavior of these date columns differ depending on the database:

Dates in PostgreSQL and SQL Server only contain no time or timestamp portion and do not require specific transformation.
Oracle’s Dates include a time portion. This portion is automatically truncated by the certification job using the TRUNC(date) function. Integration specialists should be aware of this automated transformation when loading a date and may consider performing this truncation in their integration flows.

Timestamp attributes convert to Timestamp columns that contain a timezone portion. Semarchy xDM stores and expects in the database timestamps in the timezone of the application server hosting the Semarchy xDM application. When loading timestamps, integration specialists should take into account the timezone differences and possible timezone conversion between their integration component and the data location’s database server.

Other Columns (Optional)

The following columns do not need to be loaded or can be optionally loaded:

B_ORIGINALBATCHID: This column is not used for external loads and should not be loaded.
B_CREATOR, B_UPDATOR: These columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch.
B_CREDATE, B_UPDDATE: These columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. If left null, these columns are set to the submit date (timestamp) of the batch. Note that values loaded here are not propagated beyond the SD table.

Submitting a Load

Submitting an external load uses the following parameters:

load_id: Load ID returned by the load initialization.
integration_job: Name of the integration job to process this load.
user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The SUBMIT_LOAD function call submits an external load identified by its Load ID and returns a Batch ID. It is performed on the repository database schema.

Submitting an External Load (Oracle)

vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
	<load_id>		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<integration_job>'	/* Name of the Integration Job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );

Submitting an External Load (PostgreSQL)

vBatch_id := <repository_schema>.submit_load(
	<load_id>		/* Load ID returned by get_new_loadid */
	'<integration_job>'	/* Name of the Integration Job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );

Submitting an External Load (SQL Server)

EXEC <repository_database>.dbo.SUBMIT_LOAD
	@BATCH_ID OUTPUT,
	@P_LOAD_ID = @LOAD_ID,
	@P_INTEGRATION_JOB_NAME = '<integration_job>',/* Name of the Integration Job to trigger. */
	@P_USERNAME '<user_name>'			/* User who has initialized the load. */

The following example performs a function call to submit an external load identified by the Load ID 22. It submits it with the job name INTEGRATE_DATA. The repository schema is REPO. The returned Batch ID is stored in the vBatch_id variable.

Submitting an external load identified by the Load ID 22. (Oracle)

vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
	22,
	'INTEGRATE_DATA',
	'John Doe' );

Canceling a Load

Canceling a load is performed using the CANCEL_LOAD function with the following parameters:

load_id: Load ID returned by the load initialization.
user_name: name of the user who has initialized the external load. This user may or may not be a user defined in the security realm of the application server.

The CANCEL_LOAD procedure cancels an external load identified by its Load ID. It is performed on the repository database schema.

This procedure does not flush the content of the SD_% tables loaded during the external load. This must be taken care of separately.

Canceling an External Load (Oracle)

<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
	<load_id>		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<user_name>'		/* User who has initialized the load. */
	  );

Canceling an External Load (PostgreSQL)

<repository_schema>.cancel_load(
	<load_id>		/* Load ID returned by get_new_loadid */
	'<user_name>'		/* User who has initialized the load. */
	  );

Canceling an External Load (SQL Server)

EXEC <repository_database>.dbo.CANCEL_LOAD(
	@P_LOAD_ID = @LOAD_ID,
	@P_USERNAME = '<user_name>'	/* User who has initialized the load. */
);

Call to INTEGRATION_LOAD.CANCEL_LOAD to cancel an external load identified by the Load ID 22. (Oracle)

REPO.INTEGRATION_LOAD.CANCEL_LOAD(
	22,
	'John Doe' );

Publishing Record Deletions

Semarchy xDM supports records deletion using SQL.

Publishing Golden Records Deletion

Golden record deletion applies to all entities and is similar to deleting a golden record from an application.

To publish a golden record deletion, you must load the following columns into the SA table (even for ID and fuzzy matched entities):

B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, or with the ID of the Continuous Load. Use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID from the continuous load name.
B_DELETETYPE: Load this column with the SOFT_DELETE or HARD_DELETE value depending on the type of delete that you want to trigger.
B_DELETEOPERATION: Delete operation ID. This ID must be a unique value for each deleted record. For example, use a UUID that you convert to a string (e.g.: Using the functions RAWTOHEX(SYS_GUID())).
B_CLASSNAME: Load this column with the name of the entity of the records to delete.
<GoldenID>: Load this column with the ID of the golden record to delete.

You can optionally load the B_DELETEAUTHOR and B_DELETEDATE columns to provide an author and a date for the delete operation.

With golden record deletion, the delete type (and the Delete Operation ID) is automatically propagated to the child records according to the delete propagation configured on the references.

If a delete fails (for example because a child record prevents the delete), the deletion error is traced in the AE - authoring error - table for the entity.

For more information about record deletion, see the Record Deletion section in the Semarchy xDM Developer’s Guide.

Publishing Master Records Deletion

For matching entities, you can publish master records deletion.
To publish such deletion, use the process described for deleting golden records, with the following differences:

Load the SD table instead of the SA table.
Instead of the <GoldenID> column, load the B_PUBID column, plus:
- the B_SOURCEID column for a fuzzy matched entity.
- the column representing the attribute defined as the primary key attribute for an ID matched entity.

Load the B_LOADID, B_DELETETYPE, B_DELETEOPERATION, B_CLASSNAME, B_DELETEAUTHOR and B_DELETEDATE columns as explained in Publishing Golden Records Deletion.

When deleting one of the master records contributing to a golden record, the values consolidated in the golden record are updated to take into account this master record deletion.

Master record deletion does not support the delete propagation defined in the reference relationships. Master deletes are not blocked by a restrict, propagated by a cascade, and references to deleted records are not automatically nullified. Each master deletion is considered individually. Users should perform in the same load all the changes that need to take place with the master deletion.

For more information about record deletion, see the Record Deletion section in the Semarchy xDM Developer’s Guide.

Managing Duplicates Using SQL

It is possible to manage duplicates using SQL, and perform programmatically actions typically available to users in applications' duplicate managers. The following sections provide the methods to perform various duplicates management operations using SQL with the UM table.

All these methods use a Load ID (in a vLoadID variable), which is initialized and submitted using a process similar to the one used when Publishing Data Using SQL.

Make sure to use a job that processes the entities for which the match and merge will run, as well as the related (child) entities.

You can also use Continuous Loads to perform the duplicate management operations.

Confirming Matches

This method automatically confirms programmatically certain matches. It reproduces the behavior of the user interface.

Columns to load into the UM table:

B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, or with the ID of the Continuous Load. Use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID from the continuous load name.
B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to confirm.
B_CLASSNAME: Load this column with the name of the entity of these records.
<GoldenID>: Load this column with the ID of the golden record to which the master record is attached to.
B_CONFIRMEDSDPK: Load this column with the same value as <GoldenID>.

Confirm all unconfirmed master records

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	CUSTOMER_ID,
	CUSTOMER_ID
FROM MD_CUSTOMER
WHERE
  B_CONFIRMATIONSTATUS = 'NOT_CONFIRMED';

Forcing a New Record to Join a Golden Record

When loading a source record, use this method to load at the same time a user decision forcing this record to:

Either join a golden record with a known ID,
Or create a new golden record with this ID.

To perform such an operation, combine data loading with the previous technique.

Load a new customer record and force it to merge with the existing golden record with CUSTOMER_ID 2.

/* Load source record */

INSERT into SD_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  -- Data fields are below
  CUSTOMER_NAME,
  ...

) values (
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  -- Values are below
  'GADGETRON',
  ...

)

/* Force a merge of this source record with a known golden ID */

INSERT INTO UM_CUSTOMER (
  B_LOADID,
  B_SOURCEID,
  B_PUBID,
  B_CLASSNAME,
  CUSTOMER_ID,
  B_CONFIRMEDSDPK
)
SELECT
  vLoad_id,
  123456,
  'CRM',
  'Customer',
  2,   -- Target Golden ID
  2    -- Confirm the golden ID
FROM MD_CUSTOMER;

COMMIT;

Moving Master and Golden Records to Golden Records

This method automatically moves master records to golden records. It reproduces the behavior of the user interface for moving masters into golden records, or for merging golden records into golden records

Columns to load into the UM table:

B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, of with the ID of the Continuous Load.
B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to move.
B_CLASSNAME: Load this column with the name of the entity of these records.
<GoldenID>: Load this column with the ID of the golden record to which the master record should be moved to.
B_CONFIRMEDSDPK: Set this column to the <GoldenID> value if you want to flag the match as confirmed
B_XGRP: Load this column with a UUID value, representing the exclusion group. This UUID value should be same for all the records that belong to the same exclusion group, that is records that previously matched and that should no longer match. The value to load in this column depends on the exclusion groups of the master and golden records:
- If neither the master record being moved nor the target golden record is part of an exclusion group, this value should be null.
- If only the target golden record has an exclusion group, this value should be the UUID of this exclusion group.
- If only the master record being moved has an exclusion group, this value should be the UUID of this exclusion group.
- If the master and golden have different exclusion groups, then both these exclusion groups need to be merged in a new exclusion group. This value should this new UUID, generated for example using the SYS_GUID() Oracle function. In addition, all the records from the two original exclusion groups should be loaded into the UM table with this new exclusion group.

Merging the golden record with CUSTOMER_ID 1 into the golden record with CUSTOMER_ID 2 by moving all its master records.

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	2,          -- Target golden ID
	2,          -- Confirm the golden ID
	NULL        -- Neither master nor golden are in an exclusion group
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 1;

Similarly, to split a golden record by moving its master records to new golden records, you do not provide an existing golden ID but seed a new golden ID using the golden ID generation method for the entity.

Splitting all the master records from golden record with CUSTOMER_ID 5 into different golden records.

-- Store a new SYS_GUID value in a raw (16) variable which is used to create a new exclusion group.
select SYS_GUID() into vSysguidXGrp from dual;

-- Split the master records using this new exclusion group.

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME,
	CUSTOMER_ID,
	B_CONFIRMEDSDPK,
	B_XGRP
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer',
	SEQ_CUSTOMER.NEXTVAL, -- New golden ID generated by a sequence.
	SEQ_CUSTOMER.CURRVAL, -- The same golden ID is used to confirm.
	vSysguidXGrp            -- New exclusion group ID.
FROM MD_CUSTOMER
WHERE
  CUSTOMER_ID = 5;

Reset User Decisions

This methods gets rid of user decisions previously taken with duplicate managers, forcing the master records to merge according to the matcher’s decisions.

Columns to load into the UM table:

B_LOADID: Load this column with the Load ID provided by the GET_NEW_LOADID function call, of with the ID of the Continuous Load.
B_SOURCEID, B_PUBID: Load these columns with the ID and the publisher of the master records to reconsider for the matching.
B_CLASSNAME: Load this column with the name of the entity of these records.

This operation is not possible in the MDM user interface.
All user decisions are lost. Exclusion groups and match groups are recomputed for these records.
All golden IDs are regenerated in the process.

Resetting all user decisions

INSERT INTO UM_CUSTOMER (
    B_LOADID,
    B_SOURCEID,
    B_PUBID,
    B_CLASSNAME
)
SELECT
    vLoad_id,
    B_SOURCEID,
    B_PUBID,
    'Customer'
FROM MD_CUSTOMER;

Replaying the Matching

The previous method does not force the match rules to run again. It only removes user decisions. To force the entire matching process to replay, you must reset of the user decisions, as described in the previous method, and reload the data to re-match into the SD table.

Replaying the matching process

-- Remove User Decisions

INSERT INTO UM_CUSTOMER (
	B_LOADID,
	B_SOURCEID,
	B_PUBID,
	B_CLASSNAME
)
SELECT
	vLoad_id,
	B_SOURCEID,
	B_PUBID,
	'Customer'
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

-- Re-load the data from the latest master values.
INSERT into SD_CUSTOMER (
    B_LOADID,
    B_PUBID,
    B_SOURCEID,
    B_CLASSNAME,
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
)
SELECT
	vLoad_id,
	B_PUBID,
	B_SOURCEID
	-- Data fields are below
    CUSTOMER_NAME,
    TOTAL_REVENUE
	...
FROM MI_CUSTOMER
-- add below a clause to filter the records to re-process.
-- WHERE MI_CUSTOMER.CUSTOMER_ID = ??
;

COMMIT;

Consuming Data Using SQL

Consuming Data from Semarchy xDM is done in SQL using the tables of the hubs. This chapter covers this type of consumption.

Overview

Consuming the data mainly involves:

The Golden Data. Contains the enriched, consolidated, validated and certified golden records
The Master Data. Contains the master records linked to the golden records. Master records contain the references to the source records.

Using the golden data in conjunction with the master data allows cross referencing to source data and re-integrating golden data into source systems.

Consuming Golden Data

Golden Data Table Structure

The complete list of system columns available is provided in the Table Structures section of this guide. The following table lists the system columns used when consuming information from the hub:

Column Name Datatype Description

Column Name	Datatype	Description
`B_BATCHID`	`NUMBER(38,0)`	ID of the batch that created this record.
`B_CLASSNAME`	`VARCHAR2(128 CHAR)`	Class name of this record.
`B_CREDATE`	`TIMESTAMP(6)`	Submit date (timestamp) of the batch into which the record was created.
`B_UPDDATE`	`TIMESTAMP(6)`	Submit date (timestamp) of the batch into which the record was updated.
`B_CREATOR`	`VARCHAR2(128 CHAR)`	Submitter of the batch into which the record was created.
`B_UPDATOR`	`VARCHAR2(128 CHAR)`	Submitter of the batch into which the record was updated.

B_BATCHID

NUMBER(38,0)

ID of the batch that created this record.

B_CLASSNAME

VARCHAR2(128 CHAR)

Class name of this record.

B_CREDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was created.

B_UPDDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was updated.

B_CREATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was created.

B_UPDATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was updated.

B_CREDATE, B_UPDDATE: These timestamps do not necessarily correspond to the moment when the records were created or updated in a source system or in the hub using a stepper, duplicate manager, etc. They give the moment when the record was submitted to the hub. Note that a record, even unmodified, it considered updated as soon as it is submitted to the hub.
B_CREDATOR, B_UPDATOR: For records created or modified using direct authoring or duplicate managers, this corresponds to the user who has performed the changes. For records created or modified in workflows, this corresponds to the user who has submitted the workflow.

Accessing Golden Data Using SQL

To access golden data using SQL, you query on the GD table. The queries are filtered to access data for one or more entity classes,

The following code sample gives a query to access golden records in the GD table.

Selecting the golden data.

select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <classname_list> ) (3)

Explanation of the query:

1	We select all columns from the golden record. You can select specific columns from the GD record.
2	The GD table accessed is named after the entity. Replace `<Physical Table Name>` with the physical table name defined for the entity.
3	Provide the list of entity classes stored in this table you want to access. For example, `Person`,`Company`. Note that if you are no using inheritance, only one class is stored in the table and this clause is not required.

Example: Select the CONTACT_ID, FIRST_NAME and LAST_NAME golden data for the Contact entity.

select CONTACT_ID, FIRST_NAME, LAST_NAME
from GD_CONTACT G
where G.B_CLASSNAME = 'Contact'

Golden Data Primary Key

The primary key for a golden data table depends on the Entity Type and the ID Generation for the entity. A column named after the Physical Column Name of the primary key attribute stores the golden record primary key for the GD table and has the following datatype:

For ID Generation - Sequence: NUMBER(38,0)
For ID Generation - UUID_: RAW
For ID Generation - Manual, the datatype is the one defined for the primary key attribute.

Consuming Master Data

Accessing master data from fuzzy and ID matched entities uses queries similar to those used to access the golden data.
Starting from the golden data, we can refer to the master data using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from the golden data via the master data.

There is no master data for basic entities. Only source data, source errors and golden data are stored for this type of entities.

Depending on the Entity Type (ID Matched, Fuzzy Matched), access to the master data differs:

Accessing Master Data Using SQL (ID Matched Entity)

With ID Matched Entity, the master data table has a structure similar to the golden data table, and contains in addition the publisher code. The same primary key is stored in the golden and master data in a column named after the physical column name of the primary key attribute (<primary_key_column> in the sample below)

Select the golden data corresponding to a given source record ID (ID Matched Entity)

select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M
inner join GD_<Physical_Table_Name> G on (
 	G.<primary_key_column> = M.<primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.<primary_key_column> = '<searched_source_id>'

In this code, access is filtered with the class name for both the golden data and the master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, the master data is filtered by source publisher (line #9) and ID of the source record (line #10).

Example: Select the golden data for the Employee source record having the EMPLOYEE_NUMBER=100 in the HR system (ID Matched Entity).

select 	M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME
from 	MD_EMPLOYEE M
inner join GD_EMPLOYEE G on (
 	G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and	G.B_CLASSNAME = 'Employee'
)
where
      	M.B_CLASSNAME = 'Employee'
  and 	M.B_PUBID = 'HR'
  and 	M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Accessing Master Data Using SQL (Fuzzy Matched Entity)

With Fuzzy Matched Entity, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:

B_PUBID (VARCHAR2(30 CHAR)) contains the code of the publisher that published this record.
B_SOURCEID (VARCHAR2(128 CHAR)) contains the ID of the source record in that publisher

As a consequence, the link between golden and master records is done using the primary key column as in an ID Matched Entity, but the link to the source is done using the B_SOURCEID column.

Select the golden data for a given source record ID (Fuzzy Matched Entity)

select M.B_SOURCEID, G.*
from MD_<entity> M
inner join GD_<entity> G on (
 	G. <primary_key_column> = M. <primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4), but the master data is restricted by source publisher (line #9) and ID of the source record (line #10), using the B_SOURCEID column.

Example: Select the golden values for the source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity).

select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME
from 	MD_CONTACT M
inner join GD_CONTACT G on (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
)
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM'
  and 	M.B_SOURCEID = '27030'

Example: Select side by side the duplicates detected for a given source Contact record with the ID 27030 in the CRM system (Fuzzy Matched Entity). In this example, the master data table is used twice (aliased as M and MM) to retrieve the two sides of a duplicate pair.

Example: Side-by-side duplicates

select
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME
from 	MD_CONTACT M
inner join GD_CONTACT G on
 (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
 )
inner join MD_CONTACT MM on
 (
 	MM.CONTACT_ID = M.CONTACT_ID
  and	MM.B_CLASSNAME = 'Contact'
 )
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM'       /* Publisher ID */
  and 	M.B_SOURCEID = '27030'  /* Source ID */
  /* and M.B_PUBID = MM.B_PUBID */
  /* Uncomment the previous line to restrict the duplicates
     to those within the CRM application */

Consuming Errors

Accessing Pre-Consolidation Errors Using SQL

Pre-consolidation errors can be accessed via the Source Errors (SE, or AE for basic entities) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.

Note that the B_ERROR_STATUS column on the SD table provides a simple way to detect the records with errors.

Example: Select the source errors for the Contact entity.
In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

Select the contact records in error

  select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';

Select the errors as well as the data for the source contacts in error

select SE.B_BATCHID, SD.B_LOADID,
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
  SE.B_PUBID, SE.B_SOURCEID,
  SD.FIRST_NAME, SD.LAST_NAME,
  SD.FP_CUSTOMER, SD.FS_CUSTOMER
from SE_CONTACT SE, SD_CONTACT SD
where
  SE.B_CLASSNAME = 'Contact'
  and SD.B_CLASSNAME = 'Contact'
  and SD.B_PUBID = SE.B_PUBID
  and SD.B_SOURCEID = SE.B_SOURCEID
  and SD.B_LOADID = SE.B_LOADID
  and SD.B_ERROR_STATUS = 'ERROR';

Accessing Post-Consolidation Errors Using SQL

Post-consolidation errors can also be accessed via the Golden Errors (GE) tables.
These tables store the error information, that is the information about the constraints that caused the records to fail the validation. The latter is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of the constraint) columns.

GE tables do not store the erroneous data itself, but they contain the identifier to the golden (GD) records in error.

Note that the B_ERROR_STATUS column on the GD table provides a simple way to detect the records with errors.

Example: Select the errors and data for the golden contacts in error.

Select the golden contacts in error

select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';

Select the errors as well as the data for the golden contacts in error

select GE.B_BATCHID,
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
  GE.CONTACT_ID,
  GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER
from GE_CONTACT GE, GD_CONTACT GD
where
  GE.B_CLASSNAME = 'Contact'
  and GD.B_CLASSNAME = 'Contact'
  and GD.CONTACT_ID = GE.CONTACT_ID
  and GD.B_BATCHID = GE.B_BATCHID
  and GD.B_ERROR_STATUS = 'ERROR';

REST API Overview

The REST API provides programmatic access to read data in Semarchy xDM.

Overview

The REST API is available at the following base URL (referred to as [base_url] in this section):

http://<host>:<port>/{wars-base-name}/api/rest/

Each operation in the REST API is available with a parameterized URL under this base URL. The URL may also take additional parameters.

URL parameters names are case sensitive. Make sure to use them with the case described in this documentation. For example $baseExprs is a valid parameter, but $baseexprs or $BASEXPRS are invalid and will be ignored.

REST API Documentation

The REST API exposes its built-in documentation as a link in the Welcome page. In addition, this documentation is available for tools as an OpenAPI specification.

To Access the REST API Documentation:

Log in to Semarchy xDM as a user with one of the Application Design, Application Management and Platform Administration platform privileges.
Click the REST API link on the Welcome page.

The documentation exposes the endpoint description and provides request and response samples.

The main operations available in the REST API are also described in the following sections.

Authentication & Security

This section describes the authentication mechanisms for the REST API and provides simple recommendations to secure the use of this API.

Authentication & Privileges

Applications accessing Semarchy xDM using the REST API must be authenticated.

The REST API supports two authentication methods:

Basic authentication, where you provide an Authorization HTTP header containing username:password encoded in base64. With this authentication method, REST API calls use the identity, roles, and privileges of the user specified with the username.
API Key authentication, where you provide an API-Key HTTP header containing an API key created in Semarchy Configuration. With this authentication method, REST API calls use the identity (Default User Name) and roles defined for the API Key.

Operations in the REST API need privileges similar to those required to perform the same operations via a user interface:

With Basic Authentication, the authenticated user must have roles with sufficient platform and data-level privileges. He must also have a role with a model privilege grant with the Grant access to the Integration API option selected to interact with data in a given model.
With API Key Authentication, the key must be defined with roles that have sufficient platform and data-level privileges. It must also have a role with a model privilege grant with the Grant access to the Integration API option selected to interact with data in a given model.

See the Semarchy xDM Administration Guide for more information about defining roles and API keys, and the Semarchy xDM Developer’s Guide for more information about securing data, models and privilege grants with those roles.

Security Recommendation

The following simple recommendations will help you secure your code and data when using the REST API.

Use HTTPS
Both API key and Basic authentication send credentials in plain text in HTTP headers to authenticate: They can be stolen by sniffing the network, unless if using https (a.k.a SSL).
We highly recommend using the REST API on a secured HTTP (SSL/TLS) and not on plain HTTP, to keep both the credentials and data transferred secured.
Secure credentials in your code
The client code needs to have those credential and this code should not be publicly visible with the credentials. For instance, it is a bad practice to have the API Key embedded in the javascript code available to all users in their browser.
Choose the right authentication method
API Keys authentication is good for technical or application user authentication. You should generally use API Keys in such case rather than using basic authentication for a technical user. API Keys is better as you can easily create and revoke API Keys from xDM platform instead of having to create the technical user/password in your authentication system.

Using the API

Data Formats

Receiving data from the API

Unless specified otherwise, all Semarchy xDM APIs will return the information that you request in the JSON format.

Sending data to the API

Certain API calls require you to send data in a particular format as part of the API call. By default, all API calls expect input in JSON format

Every API call that requires data to sent by POST or PUT has a different data structure that it expects in the payload. Refer to the documentation for the specific call you are making to see the exact payload formats to complete your request.

Errors

The API endpoints report in the response payload possible errors with an appropriate HTTP code. Refer to the documentation for each endpoint for the possible responses.

When calling an endpoint, note that:

Invalid query parameters are ignored and do not raise errors.
Invalid payload content or unexpected properties in the payload raise errors with suggested fixes.

Consuming Data Using the REST API

Querying Records

Query Multiple Records

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/query/[data_location_name]/[entity_name]/[view_type]` `[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])` `[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])`
URL Structure	The URL is configured using the following values: `[data_location_name]`: Name of the data location to query `[entity_name]`: Name of the entity to query `[view_type]`: Type of data view to query. See View Types for more information. `[load_id]`: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads. `[batch_id or as_of_date]`: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (`YYYY-MM-DDTHH:mm:ss.sssZ`).
Supported Parameters	Pagination, Query Expressions, Filtering, Sorting
Response Format	The response contains the list of records with the query expressions. Multiple records query: sample response. `{ "records": [ { "CustomerName": "Gadgetron", "City": "Honolulu" }, { "CustomerName": "StayPaft", "City": "Honolulu" } ] }`

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

[data_location_name]: Name of the data location to query
[entity_name]: Name of the entity to query
[view_type]: Type of data view to query. See View Types for more information.
[load_id]: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads.
[batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

Pagination, Query Expressions, Filtering, Sorting

Response Format

The response contains the list of records with the query expressions.

Multiple records query: sample response.

{
  "records": [
	{
  	"CustomerName": "Gadgetron",
  	"City": "Honolulu"
	},
	{
  	"CustomerName": "StayPaft",
  	"City": "Honolulu"
	}
  ]
}

Query a Single Record

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/query/[data_location_name]/[entity_name]/[view_type]/[record_id]` `[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])/[record_id]` `[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])/[record_id]`
URL Structure	The URL is configured using the following values: `[data_location_name]`: Name of the data location to query `[entity_name]`: Name of the entity to query `[view_type]`: Type of data view to query. See View Types for more information. `[load_id]`: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads. `[batch_id or as_of_date]`: ID or date of the batch into which the record was created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (`YYYY-MM-DDTHH:mm:ss.sssZ`). `[record_id]`: ID of the record to query. Record ID Structure The `record_id` value depends on the `view_type` used. The IDs for the various view types are listed below: GD, GH4B, GI, GDWE, SA4L and SA4LK: `<id_attribute>` MD, MH4B, MI and UM: Fuzzy matched entity: `PublisherID`.`SourceID` ID matched entity: `PublisherID`.`<id_attribute>` GH: `FromBatchID`.`<id_attribute>` MH: Fuzzy matched entity: `FromBatchID`.`PublisherID`.`SourceID` ID matched entity: `FromBatchID`.`PublisherID`.`<id_attribute>` GX: `BatchID`.`<id_attribute>` MX: Fuzzy matched entity: `BatchID`.`PublisherID`.`SourceID` ID matched entity: `BatchID`.`PublisherID`.`<id_attribute>` SD and SDWE: Fuzzy: `LoadID`.`PublisherID`.`SourceID` ID matched entity: `LoadID`.`PublisherID`.`<id_attribute>` SA and SAWE: `LoadID`.`<id_attribute>` GE: `<id_attribute>`$`ContrainstName`.`ConstraintType` SE: Fuzzy: `LoadID`.`PublisherID`.`SourceID`$`ContrainstName`.`ConstraintType` ID matched entity: `LoadID`.`PublisherID`.`<id_attribute>`$`ContrainstName`.`ConstraintType` AE: `LoadID`.`<id_attribute>`$`ContrainstName`.`ConstraintType` SA: `LoadID`.`<id_attribute>` For example, to query a source data record (`SD` view), for matching entity you must provide the concatenation of `LoadID`.`PublisherID`.`SourceID` for the `record_id`
Supported Parameters	Query Expressions
Response Format	The response contains the list of query expressions. Single record query: sample response. `{ "CustomerName": "Gadgetron", "City": "Honolulu", "UPPER(CustomerName)": "GADGETRON" }`

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/query/[data_location_name]/[entity_name]/[view_type]/[record_id]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([load_id])/[record_id]
[base_url]/query/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])/[record_id]

URL Structure

The URL is configured using the following values:

[data_location_name]: Name of the data location to query
[entity_name]: Name of the entity to query
[view_type]: Type of data view to query. See View Types for more information.
[load_id]: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads.
[batch_id or as_of_date]: ID or date of the batch into which the record was created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).
[record_id]: ID of the record to query.

Record ID Structure

The record_id value depends on the view_type used. The IDs for the various view types are listed below:

GD, GH4B, GI, GDWE, SA4L and SA4LK: <id_attribute>
MD, MH4B, MI and UM:
- Fuzzy matched entity: PublisherID.SourceID
- ID matched entity: PublisherID.<id_attribute>
GH: FromBatchID.<id_attribute>
MH:
- Fuzzy matched entity: FromBatchID.PublisherID.SourceID
- ID matched entity: FromBatchID.PublisherID.<id_attribute>
GX: BatchID.<id_attribute>
MX:
- Fuzzy matched entity: BatchID.PublisherID.SourceID
- ID matched entity: BatchID.PublisherID.<id_attribute>
SD and SDWE:
- Fuzzy: LoadID.PublisherID.SourceID
- ID matched entity: LoadID.PublisherID.<id_attribute>
SA and SAWE: LoadID.<id_attribute>
GE: <id_attribute>$ContrainstName.ConstraintType
SE:
- Fuzzy: LoadID.PublisherID.SourceID$ContrainstName.ConstraintType
- ID matched entity: LoadID.PublisherID.<id_attribute>$ContrainstName.ConstraintType
AE: LoadID.<id_attribute>$ContrainstName.ConstraintType
SA: LoadID.<id_attribute>

For example, to query a source data record (SD view), for matching entity you must provide the concatenation of LoadID.PublisherID.SourceID for the record_id

Supported Parameters

Query Expressions

Response Format

The response contains the list of query expressions.

Single record query: sample response.

{
    "CustomerName": "Gadgetron",
    "City": "Honolulu",
    "UPPER(CustomerName)": "GADGETRON"
}

Counting Records

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/count/[data_location_name]/[entity_name]/[view_type]` `[base_url]/count/[data_location_name]/[entity_name]/[view_type]([load_id])` `[base_url]/count/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])`
URL Structure	The URL is configured using the following values: `[data_location_name]`: Name of the data location to count `[entity_name]`: Name of the entity to count `[view_type]`: Type of data view to query. See View Types for more information. `[load_id]`: ID of the load containing the source records. This value works with certain view types only. Note that you can only count records in external loads, but not in continuous loads. `[batch_id or as_of_date]`: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (`YYYY-MM-DDTHH:mm:ss.sssZ`).
Supported Parameters	Filtering
Response Format	The response contains the count of records. Record count query: sample response. `{ "count": "42", }`

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/count/[data_location_name]/[entity_name]/[view_type]
[base_url]/count/[data_location_name]/[entity_name]/[view_type]([load_id])
[base_url]/count/[data_location_name]/[entity_name]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

[data_location_name]: Name of the data location to count
[entity_name]: Name of the entity to count
[view_type]: Type of data view to query. See View Types for more information.
[load_id]: ID of the load containing the source records. This value works with certain view types only. Note that you can only count records in external loads, but not in continuous loads.
[batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

Filtering

Response Format

The response contains the count of records.

Record count query: sample response.

{
  "count": "42",
}

Named Queries

Named queries provide customized REST endpoints to consume data with a predefined structure and query parameters.

For more information about Named Queries, see the Creating Named Queries section in the Semarchy xDM Developer’s Guide.

Query Multiple Records with a Named Query

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/named-query/[data_location_name]/[named_query]/[view_type]` `[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([load_id])` `[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([batch_id or as_of_date])`
URL Structure	The URL is configured using the following values: `[data_location_name]`: Name of the data location to query `[named_query]`: Name of the named query `[view_type]`: Type of data view to query. See View Types for more information. `[load_id]`: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads. `[batch_id or as_of_date]`: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (`YYYY-MM-DDTHH:mm:ss.sssZ`).
Supported Parameters	Pagination, Filtering, Sorting. In addition, query parameters are passed to the query using their name. For example: `?param1=value1&param2=value2` Sorting applies before the sort expression defined in the name query itself. Pagination and Filtering only apply to the root object of the named query.
Response Format	The response contains a hierarchy of objects corresponding to the named query definition.

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/named-query/[data_location_name]/[named_query]/[view_type]
[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([load_id])
[base_url]/named-query/[data_location_name]/[named_query]/[view_type]([batch_id or as_of_date])

URL Structure

The URL is configured using the following values:

[data_location_name]: Name of the data location to query
[named_query]: Name of the named query
[view_type]: Type of data view to query. See View Types for more information.
[load_id]: ID of the load containing the source records. This value works with certain view types only. Note that you can only query data in external loads, but not in continuous loads.
[batch_id or as_of_date]: ID or date of the batch into which the records were created, modified or deleted. This value works with certain view types only. Note that the date must be provided in ECMAScript standard format (YYYY-MM-DDTHH:mm:ss.sssZ).

Supported Parameters

Pagination, Filtering, Sorting.

In addition, query parameters are passed to the query using their name. For example: ?param1=value1&param2=value2

Sorting applies before the sort expression defined in the name query itself. Pagination and Filtering only apply to the root object of the named query.

Response Format

The response contains a hierarchy of objects corresponding to the named query definition.

Counting Records in a Named Query

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/named-query-count/[data_location_name]/[named_query]/[view_type]`
URL Structure	The URL is configured using the following values: `[data_location_name]`: Name of the data location to query `[named_query]`: Name of the named query `[view_type]`: Type of data view to query. See View Types for more information.
Supported Parameters	Filtering. In addition, query parameters are passed to the query using their name. For example: `?param1=value1&param2=value2`
Response Format	The response contains the count of records returned by the named query.

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/named-query-count/[data_location_name]/[named_query]/[view_type]

URL Structure

The URL is configured using the following values:

[data_location_name]: Name of the data location to query
[named_query]: Name of the named query
[view_type]: Type of data view to query. See View Types for more information.

Supported Parameters

Filtering.

In addition, query parameters are passed to the query using their name. For example: ?param1=value1&param2=value2

Response Format

The response contains the count of records returned by the named query.

Query Parameters

View Types

Certain REST endpoints use a view type as part of their URL. This view type, identified by an alias (GD, MD, etc) defines the type of data (golden, master, errors, etc) queried for the entity, using the endpoint.

Example: Query the employees golden records using the GD view.

/api/rest/query/HR/Employee/GD

Example: Query the employees master records using the MD view.

/api/rest/query/HR/Employee/MD

Some of these view types must be contextualized to list records for a given Load ID or for a given point in history (identified by a Batch ID or an As of Date).

Example: Query the employees golden history, using the GH view, as of the batch 22565

/api/rest/query/HR/Employee/GH(22565)

Note that the other view types ignore this contextualization syntax. You must use Filtering to query a subset of the records.

Example: Query employees golden data, filtering only the records changed in the batch 22565

/api/rest/query/HR/Employee/GD?$f=BatchID=22565

The following table lists the most common view types.

View Type URL Contextualization Description

View Type	URL Contextualization	Description
`GD`		Golden consolidated and certified records.
`GE`		Errors detected after master data consolidation (post-consolidation), for matching entities.
`GDWE`		Golden consolidated and certified records, filtered to only show those with errors (matching entities only)
`GH`		Golden History.
`GH4B`	`batch_id` or `as_of_date`	Golden data history as of batch or golden data if not historized. This view type requires a `batch_id` or `as_of_date` value in the URL.
`GX`		Deleted golden records logs (and data for soft delete).
`MD`		Enriched, validated and cleansed master records.
`MH`		Master records history.
`MH4B`	`batch_id` or `as_of_date`	Master data history as of batch or master data if not historized. This view type requires a `batch_id` or `as_of_date` value in the URL.
`MX`		Deleted master records logs (and data for soft delete).
`SD`		Source records from finished loads,
`SE`		Errors on source records, returned with the data from the erroneous record,
`SDWE`		Source records from finished loads, filtered to only show those with errors,
`SD4L`	`load_id`	Source data loaded by the publishers for a given load. This view type requires a `load_id` value in the URL.
`SA`		Source data authored by users.
`AE`		Errors detected on source authoring data.
`SAWE`		Source data authored by users with errors.
`SA4L`	`load_id`	Source data authored by users for a given load. This view type requires a `load_id` value in the URL. This view automatically resolves references to source/golden records (for basic entities) or source/master records (for matching entities), as performed when authoring the records in the applications.

GD

Golden consolidated and certified records.

GE

Errors detected after master data consolidation (post-consolidation), for matching entities.

GDWE

Golden consolidated and certified records, filtered to only show those with errors (matching entities only)

GH

Golden History.

GH4B

batch_id or as_of_date

Golden data history as of batch or golden data if not historized. This view type requires a batch_id or as_of_date value in the URL.

GX

Deleted golden records logs (and data for soft delete).

MD

Enriched, validated and cleansed master records.

MH

Master records history.

MH4B

batch_id or as_of_date

Master data history as of batch or master data if not historized. This view type requires a batch_id or as_of_date value in the URL.

MX

Deleted master records logs (and data for soft delete).

SD

Source records from finished loads,

SE

Errors on source records, returned with the data from the erroneous record,

SDWE

Source records from finished loads, filtered to only show those with errors,

SD4L

load_id

Source data loaded by the publishers for a given load. This view type requires a load_id value in the URL.

SA

Source data authored by users.

AE

Errors detected on source authoring data.

SAWE

Source data authored by users with errors.

SA4L

load_id

Source data authored by users for a given load. This view type requires a load_id value in the URL. This view automatically resolves references to source/golden records (for basic entities) or source/master records (for matching entities), as performed when authoring the records in the applications.

All the views and their aliases are detailed in the Semarchy xDM SemQL Reference Guide.

Pagination

When calls return a large number of records, it is recommended to page the record set. By requesting smaller subsets of data, you will get a response much faster than when requesting the entire, potentially large, record set.

Parameter Default Value Description

Parameter	Default Value	Description
`$offset`	0	Defines the record offset for pagination.
`$limit`	100	Defines the maximum number of returned records.

$offset

Defines the record offset for pagination.

$limit

100

Defines the maximum number of returned records.

Example: Request the Customers between position 20 and 29

/query/CustomerAndFinancialMDM/Customer/GD?$offset=20&$limit=10

Query Expressions

Query expressions define which values are returned by the query for each record.

Parameter Default Value Description

Parameter	Default Value	Description
`$baseExprs`	USER_ATTRS	Defines the set of base attributes to include in the records, in addition to those specified using the `expr` parameter. Possible values: NONE: No attributes. USER_ATTRS: All entity attributes, except the built-in attributes and references. VIEW_ATTRS: All entity attributes, except references. Include the built-in attributes.
`$expr`	N/A	Expression to include to the record in addition to, or to remove from, the base attributes (`baseExprs`). You can put as many expressions as you need as query parameters. These expressions may be one of the following: `[semql_expression]`: A SemQL expression to add to the base attributes. The expression string is used as the alias for the value. `[alias]:[semql_expression]`: A SemQL expression to add to the base attributes, preceded by its alias. `$exclude:[semql_expression]`: A SemQL expression to remove from the base attributes.

$baseExprs

USER_ATTRS

Defines the set of base attributes to include in the records, in addition to those specified using the expr parameter. Possible values:

NONE: No attributes.
USER_ATTRS: All entity attributes, except the built-in attributes and references.
VIEW_ATTRS: All entity attributes, except references. Include the built-in attributes.

$expr

N/A

Expression to include to the record in addition to, or to remove from, the base attributes (baseExprs). You can put as many expressions as you need as query parameters. These expressions may be one of the following:

[semql_expression]: A SemQL expression to add to the base attributes. The expression string is used as the alias for the value.
[alias]:[semql_expression]: A SemQL expression to add to the base attributes, preceded by its alias.
$exclude:[semql_expression]: A SemQL expression to remove from the base attributes.

Example: Query all Customer attributes, except the built-in ones. Add the Creator and CreationDate built-in attributes.

/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=USER_ATTRS&$expr=Creator&$expr=CreationDate

Example: Query only the CustomerName (in uppercase, aliased as 'Name') and TotalRevenue (aliased as 'Revenue').

/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=NONE&$expr=Name:Upper(CustomerName)&$expr=Revenue:TotalRevenue

Example: Query all Customer attributes, except the built-in ones and the TotalRevenue.

/query/CustomerAndFinancialMDM/Customer/GD?$baseExprs=USER_ATTRS&$expr=$exclude:TotalRevenue

For more information about SemQL expressions and attributes, refer to the Semarchy xDM SemQL Reference Guide.

Sorting

Sort expressions define the order of the records in the response.

Parameter Default Value Description

Parameter	Default Value	Description
`$orderBy`	None	SemQL Order By Clause defining the order of the records in the response.

$orderBy

None

SemQL Order By Clause defining the order of the records in the response.

Example: Query Customers sorted by TotalRevenue and CustomerName

/query/CustomerAndFinancialMDM/Customer/GD?$orderBy=TotalRevenue%20DESC,CustomerName%20ASC

For more information about sorting with SemQL, refer to the Semarchy xDM SemQL Reference Guide.

Filtering

Use filters to request only specific records, that match the criteria you choose.

Parameter Default Value Description

Parameter	Default Value	Description
`$f`		SemQL filter. It is a SemQL condition applied to filter records. Only records matching this condition are returned. You can combine several several SemQL filters in the same query.

$f

SemQL filter. It is a SemQL condition applied to filter records. Only records matching this condition are returned. You can combine several several SemQL filters in the same query.

Example: Query Contacts whose FirstName starts with 'Joe'

/query/CustomerAndFinancialMDM/Contact/GD?$f=FirstName%20LIKE%20%27Joe%25%27

For more information about SemQL conditions and expressions, refer to the Semarchy xDM SemQL Reference Guide.

Publishing Data Using the REST API

The REST API provides also programmatic management of data loads in Semarchy xDM, using requests as well as responses in JSON format.

Overview

The REST API provides the capabilities to manage loads, which includes querying, creating, submitting and canceling loads. It also supports persisting records in existing loads. Finally, it provides a shortcut to Load and Submit data in a single request.

Querying Existing Loads using REST

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]` to list the loads `[base_url]/load-count/[data_location_name]` to count the loads
Supported Parameters	`$offset`: Defines the results offset for pagination for the list of loads. `$limit` Defines the maximum number of returned results for the list of loads. `$batchId`: Limit results to the loads with given Batch ID. `$loadStatus`: Limit results to the loads in a given status. `$jobNamePattern`: Limit results to the loads with a job name matching the pattern. Use the `_` and `%` wildcards to represent one or any number of characters. `$programNamePattern`: Limit results to the loads created with program name matching the pattern. Use the `_` and `%` wildcards to represent one or any number of characters. `$loadCreator`: Limit results to the loads with this creator. `$batchSubmitter`: Limit results to the loads with this submitter. `$loadDescriptionPattern`: Limit results to the load with a description matching the pattern. Use the `_` and `%` wildcards to represent one or any number of characters.
Response Format	The response contains a count of loads, or the list of loads meeting the criteria. The information returned for each load depends on the status of the load. Query Existing Loads: sample response. { "loads": [ { "loadId": 21, "loadStatus": "DONE", "loadCreator": "semadmin", "loadCreationDate": "2019-04-01T15:47:51.920Z", "programName": "curl", "loadDescription": "Customer Load", "loadSubmitDate": "2019-04-01T15:47:52.137Z", "batchSubmitter": "semadmin", "batchId": 21, "integrationJobName": "INTEGRATE_DATA", "integrationJobQueueName": "Default", "integrationJob": { "startDate": "2019-04-01T15:47:54.537Z", "completionDate": "2019-04-01T15:48:39.483Z", "duration": 44946, "currentTask": { "name": "Insert new MH for modified MD records", "startDate": "2019-04-01T15:48:39.244Z", "duration": 239 }, "notificationStatus": "DONE" }, "numberOfJobExecutions": 1, "submitInterval": -1, "submittable": true, "loadType": "EXTERNAL_LOAD" } ] }

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name] to list the loads [base_url]/load-count/[data_location_name] to count the loads

Supported Parameters

$offset: Defines the results offset for pagination for the list of loads.
$limit Defines the maximum number of returned results for the list of loads.
$batchId: Limit results to the loads with given Batch ID.
$loadStatus: Limit results to the loads in a given status.
$jobNamePattern: Limit results to the loads with a job name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.
$programNamePattern: Limit results to the loads created with program name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.
$loadCreator: Limit results to the loads with this creator.
$batchSubmitter: Limit results to the loads with this submitter.
$loadDescriptionPattern: Limit results to the load with a description matching the pattern. Use the _ and % wildcards to represent one or any number of characters.

Response Format

The response contains a count of loads, or the list of loads meeting the criteria. The information returned for each load depends on the status of the load.

Query Existing Loads: sample response.

{
  "loads": [
    {
      "loadId": 21,
      "loadStatus": "DONE",
      "loadCreator": "semadmin",
      "loadCreationDate": "2019-04-01T15:47:51.920Z",
      "programName": "curl",
      "loadDescription": "Customer Load",
      "loadSubmitDate": "2019-04-01T15:47:52.137Z",
      "batchSubmitter": "semadmin",
      "batchId": 21,
      "integrationJobName": "INTEGRATE_DATA",
      "integrationJobQueueName": "Default",
      "integrationJob": {
        "startDate": "2019-04-01T15:47:54.537Z",
        "completionDate": "2019-04-01T15:48:39.483Z",
        "duration": 44946,
        "currentTask": {
          "name": "Insert new MH for modified MD records",
          "startDate": "2019-04-01T15:48:39.244Z",
          "duration": 239
        },
        "notificationStatus": "DONE"
      },
      "numberOfJobExecutions": 1,
      "submitInterval": -1,
      "submittable": true,
      "loadType": "EXTERNAL_LOAD"
    }
  ]
}

Load Type

The loadType indicates the nature of the load:

An external load (EXTERNAL_LOAD) which can be submitted (submittable=true).
A continuous load (CONTINUOUS_LOAD) which cannot be manually submitted but is automatically submitted every submitInterval seconds.
A load attached to application activities, which cannot be submitted: WORKFLOW, DIRECT_AUTHORING, DIRECT_DELETE, DIRECT_DUPS_CONFIRM, DIRECT_DUPS_MANAGEMENT, ACTIVITY.

Load Status

The possible values for the Load Status are listed in the table below.

Load Status	Description
RUNNING	The load is currently running.
CANCELED	The load was canceled (CancelLoad)
PENDING	The load was submitted. A batch was created and is waiting for the batch poller to pick it.
SCHEDULED	The batch was taken into account by the batch poller. The job is queued by the engine.
PROCESSING	The batch’s job is currently being processed by the engine.
SUSPENDED	The job is suspended, either by an administrator or due to an error. It awaits for administrator intervention.
WARNING	The job completed successfully, but some records have caused validation errors.
DONE	The job completed successfully with no validation errors.
ERROR	The job did not complete successfully, it was canceled by an administrator.

Load Status

Description

RUNNING

The load is currently running.

CANCELED

The load was canceled (CancelLoad)

PENDING

The load was submitted. A batch was created and is waiting for the batch poller to pick it.

SCHEDULED

The batch was taken into account by the batch poller. The job is queued by the engine.

PROCESSING

The batch’s job is currently being processed by the engine.

SUSPENDED

The job is suspended, either by an administrator or due to an error. It awaits for administrator intervention.

WARNING

The job completed successfully, but some records have caused validation errors.

DONE

The job completed successfully with no validation errors.

ERROR

The job did not complete successfully, it was canceled by an administrator.

Integration Job

If a job is attached to the load (the load was submitted), then the integrationJob object provides details about this job, including its start date, current task, duration and any error that may occur in this job.
It also includes the notificationStatus, which indicates whether the notifications were successfully sent.

The currentTask correspond to the running task for RUNNING jobs and the last executed task for KILLED or SUSPENDED jobs

Use the API to monitor jobs which are SUSPENDED or in ERROR to report possible issues with the integration. Combine this endpoint with the capability to manage loads to automate job restart.

Querying a Load using REST

Method	GET
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]/[load_id or load_name]`
Response Format	The response contains the load identified by `[load_id or load_name]`. The information returned for the load depends on its status. Query one Load: sample response. `{ "loadId": 26, "loadStatus": "CANCELED", "loadCreator": "semadmin", "loadCreationDate": "2019-04-08T09:35:46.897Z", "numberOfJobExecutions": 0, "submitInterval": -1, "submittable": true, "loadType": "DIRECT_AUTHORING" }`

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Response Format

The response contains the load identified by [load_id or load_name]. The information returned for the load depends on its status.

Query one Load: sample response.

{
  "loadId": 26,
  "loadStatus": "CANCELED",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-04-08T09:35:46.897Z",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "DIRECT_AUTHORING"
}

Initializing a Load using REST

Method	POST
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]`
Request Payload	The request contains the `CREATE_LOAD` `action`, as well as the information required to create a new load. Load creation: sample request. `{ "action":"CREATE_LOAD", "programName": "curl", "loadDescription": "Load Customers" }`
Response Format	The response contains the load information, including the load ID, load type, and an indication of the status. Load creation: sample response. `{ "loadId": 27, "loadStatus": "RUNNING", "loadCreator": "semadmin", "loadCreationDate": "2019-05-06T13:35:44.259Z", "programName": "curl", "loadDescription": "Load Customers", "numberOfJobExecutions": 0, "submitInterval": -1, "submittable": true, "loadType": "EXTERNAL_LOAD" }`

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD action, as well as the information required to create a new load.

Load creation: sample request.

{
  "action":"CREATE_LOAD",
  "programName": "curl",
  "loadDescription": "Load Customers"
}

Response Format

The response contains the load information, including the load ID, load type, and an indication of the status.

Load creation: sample response.

{
  "loadId": 27,
  "loadStatus": "RUNNING",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:35:44.259Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}

Loading Data using REST

To load data in a given load, the URL must contain the Load ID that was returned at load creation time, or the Load ID/Name of a continuous load.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Request Payload

The request contains the PERSIST_DATA action, as well as the information required to load data. This information includes:

The persistOptions, which define how records are published into the hub. See Configuring Data Loading in REST for more information.
The persistRecords, which contain the records to be loaded into the hub.

Load data: sample request.

{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Customer": {
        "enrichers":[],
        "validations":[],
        "queryPotentialMatches": true
        }
      },
    "missingIdBehavior": "GENERATE",
    "persistMode": "IF_NO_ERROR_OR_MATCH"
    },
  "persistRecords": {
    "Customer": [
      {
        "CustomerName": "Gadgetron"
      }
    ]
  }
}

Response Format

The response contains, in the records element, the enriched records, the failedValidations as well as the potentialMatches (if any).
The response also contains the records' status as well as the load information..

Load submission: sample response.

{
    "status": "PERSISTED",   (1)
    "load": {                (2)
        "loadId": 27,
        "loadStatus": "RUNNING",
        "loadCreator": "semadmin",
        "loadCreationDate": "2019-05-06T13:35:44.259Z",
        "programName": "curl",
        "loadDescription": "Load Customers",
        "numberOfJobExecutions": 0,
        "submitInterval": -1,
        "submittable": true
    },
    "records": {
        "Customer": [
            {
                "entityName": "Customer",
                "recordValues": {
                    "PublisherID": "CRM",
                    "SourceID": "5",
                    ...
                    "CustomerName": "Gadgetron",
                    "TotalRevenue": null,
                    "InputAddress.Address": null,
                    ...
                    "GeocodedAddress.Quality": null,
                    "FID_AccountManager": null
                },
                "failedValidations": [],
                "potentialMatches": []
            }
        ]
    }
}

1	The status is `PERSISTED` if the data was persisted, or or `PERSIST_CANCELLED`, if the data has not been persisted, for example if validations failed, or if a match was found.
2	Load information, similar to the information returned when Querying a Load using REST

Configuring Data Loading in REST

Persist Options

When loading one or more records you can configure the following elements in the persistOptions element:

For each entity:
- enrichers: List of enrichers that should be executed before persisting the records.
- validations: List of validations that should be executed after the enrichers.
- queryPotentialMatches: Detect and report potential master records matching the incoming record, using the matcher defined for the entity.
For the entire load:
- missingIdBehavior: Option to define whether to generate IDs when they are not provided in the payload. Possible values are GENERATE to generate the ID or FAIL to fail loading if the ID is missing.
- persistMode: Flag to define whether the records should be persisted or not.

Updating Existing Records

You update an existing record by providing its ID when loading the data:

For basic entities, the ID attribute of the record to update.
For ID-matched entities, the Source ID and Publisher ID of the master record to update.

If you persist a record in a load with an existing record ID, a copy of the record is checked out and changes are applied only the field provided in the request JSON. Other attributes still keep their value.

Setting the Auditing Fields

By default, the auditing fields, Creator, Updator, CreateDate, UpdateDate are automatically set to the current user name and the current date when the user publishes the data. These values cannot be set by the user.

However, a user may be configured to push values in these auditing fields to publish data, for example, in the past, or on behalf of other users.

A user with a role having the Allow Publishing as user in API option selected in a model privilege grant is able to set the audit fields (Creator, Updator, CreateDate, UpdateDate) while publishing data via the REST API.

Enrich and Validate

When loading data, you can choose to execute enrichers, validations, and matchers for each entity.

You may define, for each entity you want to configure, under the optionsPerEntity element, one element named after the entity. For each entity, you can:

give a list of enrichers to run, with their enricher names,
give a list of validations to execute, with their validationType and validationName. Possible validationType values are CHECK, PLUGIN, MANDATORY, LOV, FOREIGN or UNIQUE,

Detect Matches

When loading data, indicate with the queryPotentialMatches boolean element whether the platform should check for duplicates according to the matching rules defined for the entity.

When queryPotentialMatches is set to true, master records potentially matching an incoming record are returned in the response, with all their used-defined attributes, to help identify the reason for the matches.

You can define the attributes returned to only include those required for a specific use case, with two additional properties:

queryPotentialMatchesBaseExpressions defines the set of base attributes to include in the master records detected as potential matches:
NONE: no attributes.
USER_ATTRS: all attributes, except built-in attributes and references.
VIEW_ATTRS: all attributes, except references. This includes built-in attributes.
queryPotentialMatchesExpressions defines a list of expressions to return in addition to the set of base attributes. These expressions are in the following format: <alias>:<semql_expression>.

For example, the following request looks for potential matches.

Request to test potential matches for the Person entity.

{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Person": {
        "queryPotentialMatches": true,  (1)
        "enrichers":["CleanseEmail"],
        "validations":[],
        "queryPotentialMatchesBaseExpressions": "NONE", (2)
        "queryPotentialMatchesExpressions": { (2)
          "Name": "Concat(FirstName, ' ', LastName)",
          "Email": "CleansedEmail",
          "Golden ID": "Gold_ID",
          "Master ID": "ID"
        }
      }
    },
    "missingIdBehavior": "FAIL",
    "persistMode": "NEVER"   (3)
  },
  "persistRecords": {
    "Person": [
    {
      "SourceID": "99998",
      "FirstName": "John",
      "LastName": "Doe",
      "DateOfBirth": "1974-01-25",
      "SourceEmail": "jass@ellerbusch.com"
    }
    ]
  }
}

1	Trigger potential matches detection for the records. Note that since the matcher uses enriched values, the CleanseEmail enricher is also triggered.
2	Select the information returned for the potential matches. Since `queryPotentialMatchesBaseExpressions` is set to `NONE`, only the expressions defined in the `queryPotentialMatchesExpressions` are returned.
3	This value for `PersistMode` never persists the records. This call only finds potential matches.

The response to this request is as follows:

Response with one potential match found

{
  "status": "PERSIST_CANCELLED",
  "load": {
    ...
  },
  "records": {
    "Person": [
      {
        "entityName": "Person",
        "recordValues": {
          "CleansedEmail": "jass@ellerbusch.com",
          "SourceEmail": "jass@ellerbusch.com",
          "DateOfBirth": "1974-01-25",
          "FirstName": "John",
          "SourceID": "99998",
          "PublisherID": "CRM",
          "LastName": "Doe",
          ...
        },
        "failedValidations": [],
        "potentialMatches": [
          {
            "matchRuleName": "ExactEmailMatch",
            "matchScore": 74,
            "matchedRecordLocation": "MD",
            "matchedRecordId": {
              "SourceID": "1320830",
              "PublisherID": "CRM"
            },
            "matchedRecordData": {
              "Master ID": "CRM.1320830",
              "Email": "jass@ellerbusch.com",
              "Golden ID": 10002,
              "Name": "Jass Ellerbusch"
            }
          }
        ]
      }
    ]
  }
}

Data Loading Behavior

When invoked with a payload, the REST operation will run the enrichers, validations and matchers for each record, depending on the entity configuration.

It will then return:

The enriched data
a list of validation errors if any
a list of potential matches detected by the matching rules.

The record may be persisted or not at that stage, depending on the persistMode option:

If set to ALWAYS, then the records are persisted even if they have errors and potential matches.
If set to NEVER, then the records are not persisted. Use this option to perform a dry-run to test your records.
If set to IF_NO_ERROR_OR_MATCH (default value), then the records are persisted if no validation error was raised and no potential match was found.

Publishing Deletions using REST

You can use the endpoint to load data, with the DELETE_DATA action, in order to publish golden record deletions.

For more information about record deletion, see the Record Deletion section in the Semarchy xDM Developer’s Guide.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Request Payload

The request contains the DELETE_DATA action, as well as the information required to delete data. This information includes:

The deleteOptions, which define how records are deleted:
- The deleteType property defines whether to use HARD_DELETE or SOFT_DELETE.
- The recordType property defines whether to delete GOLDEN or MASTER records.
The deleteRecords, which contain the IDs to identify the records to delete:
- To delete a fuzzy/ID matched golden record or a basic entity record: the golden ID.
- To delete a master record for a fuzzy matched entity: the PublisherID, SourceID pair.
- To delete a master record for an ID matched entity: the PublisherID, <id_attribute> pair.

Submitting other information than these IDs in this property is considered an error.

Delete data: sample request to delete a golden record.

{
  "action":"DELETE_DATA",
  "deleteOptions": {
    "deleteType": "SOFT_DELETE";
    "recordType": "GOLDEN"
    },
  "deleteRecords": {
    "Customer": [
      {
        "CustomerID": "123456"
      }
   ],
  }
}

Response Format

The response contains the status of the request, the load information as well as a list of all the records deleted as part of this request (including child records deleted by a cascade).

Delete data: sample response.

{
  "status": "PERSISTED",   (1)
  "load": {                (2)
  ...
  },
  "records": {
    "Customer": [
    {
      "entityName": "Customer",
      "CustomerID": "123456",
      "status": "DELETABLE" (3)
    },
    {
      "entityName": "Contact",
      "CustomerID": "778899",
      "status": "DELETABLE"
    },
    ...
    ]
  }
}

1	The status is `PERSISTED` if the delete request was persisted, or `PERSIST_CANCELLED`, if the request was not persisted.
2	Load information, similar to the information returned when Querying a Load using REST
3	Record deletion status.

The record deletion status indicates whether deletion is possible or not for the record. It is one of the following:

DELETABLE: Record is deletable and will be if the load is submitted.
PERMISSION_DENIED: Record cannot be deleted due to a privilege issue.
CHILD_RESTRICTION: Record cannot be deleted due to a restriction on a child record.
RECORD_NOT_FOUND: Record is not found and cannot be deleted.
OTHER_ERROR: Another error occurred during deletion.

Submitting a Load using REST

To submit a load, the URL must contain the Load ID that was returned at load creation time.

Method	POST
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]/[load_id]`
Request Payload	The request contains the `SUBMIT` `action`, as well as the job to use when submitting the data. Load submission: sample request. `{ "action":"SUBMIT", "jobName": "INTEGRATE_DATA" }`
Response Format	The response contains the load information, including the load ID, batch ID, and an indication of the status. Load submission: sample response. `{ "loadId": 27, "loadStatus": "PENDING", "loadCreator": "semadmin", "loadCreationDate": "2019-05-06T13:35:44.259Z", "programName": "curl", "loadDescription": "Load Customers", "loadSubmitDate": "2019-05-06T13:39:02.807Z", "batchSubmitter": "semadmin", "batchId": 24, "integrationJobName": "INTEGRATE_DATA", "integrationJobQueueName": "Default", "numberOfJobExecutions": 0, "submitInterval": -1, "submittable": true, "loadType": "EXTERNAL_LOAD" }`

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the SUBMIT action, as well as the job to use when submitting the data.

Load submission: sample request.

{
  "action":"SUBMIT",
	"jobName": "INTEGRATE_DATA"
}

Response Format

The response contains the load information, including the load ID, batch ID, and an indication of the status.

Load submission: sample response.

{
  "loadId": 27,
  "loadStatus": "PENDING",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:35:44.259Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "loadSubmitDate": "2019-05-06T13:39:02.807Z",
  "batchSubmitter": "semadmin",
  "batchId": 24,
  "integrationJobName": "INTEGRATE_DATA",
  "integrationJobQueueName": "Default",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}

Canceling a Load using REST

To cancel a load, the URL must contain the Load ID that was returned at load creation time.

Method	POST
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]/[load_id]`
Request Payload	The request contains only the `CANCEL` `action`. Load cancellation: sample request. `{ "action":"CANCEL" }`
Response Format	The response contains load ID as well as an indication of the status. Load cancellation: sample response. `{ "loadId": 28, "loadStatus": "CANCELED", "loadCreator": "semadmin", "loadCreationDate": "2019-05-06T13:41:52.865Z", "programName": "curl", "loadDescription": "Load Customers", "numberOfJobExecutions": 0, "submitInterval": -1, "submittable": true, "loadType": "EXTERNAL_LOAD" }`

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains only the CANCEL action.

Load cancellation: sample request.

{
  "action":"CANCEL"
}

Response Format

The response contains load ID as well as an indication of the status.

Load cancellation: sample response.

{
  "loadId": 28,
  "loadStatus": "CANCELED",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:41:52.865Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}

Load and Submit Data using REST

Using the REST API, it is possible to create a load, load data (or request deletions) and submit the load in a single request.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]

Request Payload

The request contains the CREATE_LOAD_AND_SUBMIT action, as well as the information required to create a new load, load data and submit the load. This includes the persistOptions and persistRecords elements.

Load and Submit: sample request.

{
  "action":"CREATE_LOAD_AND_SUBMIT",
	"programName": "curl",
	"loadDescription": "Customer Load",
	"jobName": "INTEGRATE_DATA",
	"persistOptions": (1)
	"persistRecords": (1)
	"deleteOptions" : (2)
	"deleteRecords" : (2)
}

1	See Configuring Data Loading in REST for more information about the `persistOptions` and `persistRecords`.
2	See [Requesting Deletions using REST] for more information about the `deleteOptions` and `deleteRecords`.

Response Format

The response contains load ID as well as an indication of the status.

Load and Submit: sample response.

{
    "status":  (1)
    "records": (2)
    "load":    (3)
}

1	`PERSISTED` or `PERSIST_CANCELLED`, if the data has not been persisted, for example if a match was found.
2	See Loading Data using REST for more information about this payload.
3	Load information, similar to the information returned when Querying a Load using REST.

Managing a Load using REST

When a load was submitted, it is still possible to manage it using the REST API.

Method	POST
Base URL	`http://<host>:<port>/semarchy/api/rest/`
URL	`[base_url]/loads/[data_location_name]/[load_id]`
Request Payload	The request contains the `SUSPEND`, `KILL` OR `RESUME` `actions` to perform the management operation on the load. Load submission: sample request. `{ "action":"RESUME" }`
Response Format	The response contains the load information and its new state. If the requested operation is not possible, an error is returned.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request Payload

The request contains the SUSPEND, KILL OR RESUME actions to perform the management operation on the load.

Load submission: sample request.

{
  	"action":"RESUME"
}

Response Format

The response contains the load information and its new state. If the requested operation is not possible, an error is returned.

Combine this endpoint with the capability to query loads for automating production monitoring, for example to automatically resume jobs that are suspended.