This tutorial will guide you through the process of querying data in Semarchy xDM using the SQL API. You will learn how to run basic and advanced queries, as well as important integration concepts.
This tutorial is based on PostgreSQL and pgAdmin 4. You can run the same queries on SQL Developer for Oracle or SQL Server Management Studio for SQL Server, or use a universal client such as DBeaver.
This tutorial is the second SQL-based unit within the Data Publishing & Consumption track. Before beginning this unit, you must:
If you have not completed these prerequisites, return to the Tutorials menu.
Otherwise, enjoy this tutorial!
Knowing how to query data using SQL is a critical step to building a successful MDM program.
While you ordinarily would rely on a middleware tool or a user-designed program to consume data from xDM, it is still important for you, as the integration developer, to understand the consumption methods described in this tutorial unit, starting with simple queries.
The most basic query selects Person records from the golden data (GD) table. You will also select specific columns which is a more useful query for analyzing data.
GD_PERSON
) table:select gd.* from gd_person gd;
These are all the customer golden records, with all available columns.
ID
, first_name
, and last_name
columns:select gd.id, gd.first_name, gd.last_name from gd_person gd;
These are all the customer golden records with only the golden record ID
, first_name
, and last_name
columns.
When looking at matched customer records, it is useful to analyze groups of master records alongside the resulting golden record, in order to compare how the master records were deduplicated. To get this view, you must join the master data (MD) table with the golden data (GD) table:
select md.b_pubid publisherid, md.b_sourceid sourceid, md.first_name masterfirstname, md.last_name masterlastname, gd.id goldenid, gd.first_name goldenfirstname, gd.last_name goldenlastname from md_person md inner join gd_person gd on gd.id = md.id order by gd.id asc;
The join is on the golden ID, which appears on both the master and golden tables.
Next, you will get a golden record given the source ID.
At times, you might need to look up a golden record, but all you have is an ID from the source system. To query the golden records when you know their source ID, proceed as follows:
select md.b_pubid publisherid, md.b_sourceid sourceid, md.first_name masterfirstname, md.last_name masterlastname, gd.id goldenid, gd.first_name goldenfirstname, gd.last_name goldenlastname from md_person md inner join gd_person gd on ( gd.id = md.id ) where md.b_pubid = 'CRM' and md.b_sourceid = '1483060'; /* SourceID */
sourceId
column contains the source ID which allows us to retrieve the corresponding golden record.goldenId
, goldenFirstName
, and goldenLastName
are from the related golden record.You have successfully run basic queries via SQL to get information in xDM.
In this section, you created:
Next, you will execute advanced queries to retrieve information, such as customer records and the products they purchased.
Now that you are familiar with basic SQL queries, this section will delve into advanced queries to expand the scope of analyses you can perform.
You can query customers and see the products they have purchased. This is useful for analytics and BI tools to answer questions like "Which customers have purchased a certain product?" or "Who are our top-paying customers?"
select pe.id, pe.first_name, pe.last_name, pr.product_name from gd_person pe inner join gd_person_product pp on pe.id = pp.f_person inner join gd_product pr on pr.id = pp.f_product order by pe.first_name desc;
Next, you will query master records to see the corresponding duplicate masters.
In this section, you will learn to query a specific master record and see the corresponding master records that are considered duplicates side-by-side. The query will also show the corresponding golden record that the master records consolidate into.
select md.b_pubid dup1_publisher, md.b_sourceid dup1_id, md.first_name dup1_first_name, md.last_name dup1_last_name, md2.b_pubid dup2_publisher, md2.b_sourceid dup2_id, md2.first_name dup2_first_name, md2.last_name dup2_last_name, gd.id gold_id, gd.first_name golden_first_name, gd.last_name golden_last_name from md_person md inner join gd_person gd on ( gd.id = md.id ) inner join md_person md2 on ( md.id = md2.id ) where md.b_pubid = 'CRM' /* Publisher ID */ and md.b_sourceid = '1419728' /* Source ID */ /* and md.b_pubid = md2.b_pubid */ /* uncomment this line to restrict the duplicates to those within the CRM application */;
dup1_
columns come from the master record that you filtered on.dup2_
columns come from the corresponding master records identified as duplicates to the record that you filtered on.golden_
columns come from the golden record that the master records roll up to.dup2_publisher
and dup2_id
columns show the master records that matched with the record of Antonia Mattos with ID 1419728 from CRM.You have successfully run advanced queries via SQL to get information in xDM.
To summarize, you have created:
Next, you will learn how integration works to understand the concepts to allow you to write your own queries.
Now that you have explored a data location with several SQL queries, it is time to learn more about some integration concepts.
When you load data into xDM, we refer to the source applications where the data came from as publishers.
An application that publishes data into xDM is assigned a publisher code in the Application Builder. This code is known as the publisher ID and is loaded into the column b_pubid
.
The publisher ID is necessary for loading data. It is also useful when you query data to trace which applications the master data came from. xDM displays the publisher ID in the master record's ID in the Customer B2C Demo application.
Semarchy xDM deals with several types of records that correspond to different steps of the integration process:
The transformation process from source records to golden data within Semarchy xDM is called the data certification process or integration process.
Matching is a key step within the data certification process. Its role is to detect duplicates to consolidate them into a golden record. There are two different types of matching:
When records are matched together, survivorship rules select for a given attribute the most appropriate value from the various source records.
Each entity you design in the Application Builder has a given entity type that defines the entity's capabilities for matching, merging, and authoring. Entity types are:
You have gained a better understanding of fundamental integration concepts in xDM.
Next, we will have a more detailed look at each type of entity.
If you have two records with the same ID in different systems, duplicate data within the same system (which is common with customer data), or no common identifier across systems, you will need to match records and consolidate them to become a "best-of-breed" golden record.
A fuzzy-matching entity is designed to handle the scenario where there is no common identifier across systems. A golden record ID will then be generated when records are consolidated. As such, you will define match and survivorship rules with a fuzzy-matching entity.
We will now look at how primary keys work in fuzzy-matching entities.
The primary key is useful for filtering data in your queries, as you learned when querying golden records based on a given source ID.
b_sourceid
column) and publisher code (b_pubid
column).ID
column.Here is an example of a fuzzy-matching entity:
GD_PERSON
table:select gd.* from gd_person gd;
The golden data customer table (GD_PERSON
) contains the golden record ID (ID
column).b_pubid
and b_sourceid
are the publisher ID and source ID which won the survivorship process.
b_mastercount
is the number of master records underlying the golden record.b_confscore
is the confidence score of the matching and b_confirmationstatus
indicates whether master records have been confirmed within a golden record or not.
These are the golden data fields of the values that have won the survivorship process.
MD_PERSON
:select md.* from md_person md;
Observe that the master data table stores the publisher and source IDs as primary keys.
ID
column: this column provides the golden record ID associated with the master record.Also note that the MD table stores information about the matching process, such as match group (b_matchgrp
), confirmation status (b_confirmationstatus
), confidence score (b_confscore
), and suggested merge ID (b_suggmergeid
).
/* golden ID for Callie Aldaba */
with the ID you retrieved in the Customer B2C Demo application (10039
in our example):select gd.id golden_id, gd.first_name golden_first_name, gd.last_name golden_last_name, md.b_pubid md_pub_id, md.b_sourceid md_sourceid, md.first_name md_first_name, md.last_name md_last_name from md_person md inner join gd_person gd on ( gd.id = md.id ) where gd.id=10039 /* Golden ID */
This query returns the same master records as the ones observed using the user interface:
Because fuzzy-matching entities separate the publisher and source IDs in their primary keys, the references to fuzzy-matching entities, likewise, have foreign keys that separate publisher and source when it comes to source data.
Consider the following example, featuring the SD_COMM_CHAN_PREF
table: the foreign key to Person source data in the SD_COMM_CHAN_PREF
table is composed of two columns:
FP_PERSON
: references the customer source/master publisher ID.FS_PERSON
: stores the source ID.For golden data, the foreign key references the xDM system-generated ID and therefore uses a single column for the foreign key. For example in SD_COMM_CHAN_PREF
, F_PERSON
references the Customer golden record ID.
To illustrate that, we will start by reviewing references among entities.
GD_COMM_CHAN_PREF
in pgAdmin.GD_COMM_CHAN_PREF
table, scroll to the far right.The F_PERSON
foreign key is the parent Person golden record ID. In the same way, FP_PERSON
is the parent publisher ID and FS_PERSON
is the parent source ID.
GD_PERSON_PRODUCT
. Remember that PersonProduct is a child of the fuzzy-matching entity Person and basic entity Product.You can see that it contains both Product and Person foreign keys.
F_PRODUCT
, FP_PERSON
, and FS_PERSON
.ID-matching entities assume that data comes from several applications that share a common ID.
Records in entities using ID matching are matched if they have the same ID and then merged into golden records. For such entities, the golden record ID is the common source ID (no ID is generated).
This entity type is well suited when there is a truly unique identifier for all the applications communicating with the MDM hub.
You have gained a better understanding of ID- and fuzzy-matching entities, along with insights into how they function in xDM.
Next, you will learn about basic entities.
In contrast to fuzzy-matching entities, basic entities are designed to handle data coming from a unique data source and thus do not support match and merge. This is suitable for simple reference data entities or when data is authored exclusively in the hub. In the latter case, xDM serves as a source system for authoring data.
Querying data in a basic entity is simpler than in a fuzzy-matching entity because the ID is the same across the source and golden (SD and GD) tables. Also, basic entities do not have the concept of a publisher or MD/MI tables. Therefore, you can use the same identifier in the source system to query basic entity records in the SD and GD tables and bypass joining with the master data (MD) table.
SA_NICKNAME
.SD
and SA
for basic entities) have a load ID to identify the load job.sa_product
, another basic entity.SA_PRODUCT
uses the default ID physical name for its primary key. The product ID in this table remains the same as in the golden data GD_PRODUCT
table.
We will now see how the basic entity primary key is configured in Semarchy xDM Application Builder.
With the Physical Column Name property, you can control the name of the primary key in the database. The Manual option relates to the ID generation method and means that you must load the ID and that you will not rely on xDM to generate it.
Notice that the entity type (basic, fuzzy-matching, or ID-matching) can be selected on this page and that Product type is a basic entity.
The column name of the primary key for the Product entity is ID (default name).
Also observe that the ID can be entered manually or generated using a sequence, a universal unique identifier (UUID), or a custom rule via a SemQL expression. The ID generation method is selected when the entity is created.
select gd.id golden_id, gd.product_name golden_product_name, gd.description golden_desc, sa.id source_id, sa.product_name sa_product_name from sa_product sa inner join gd_product gd on ( gd.id = sa.id ) where gd.id='CARBON4861OK' /* Golden ID */
golden_id
and source_id
values are the same.The demo data assumes that product IDs are consistent across all systems and, therefore, are suitable for a basic entity. This same ID flows from the source system to the Source Authoring
(SA) table and eventually to the Golden Data
(GD) table.
In contrast to fuzzy-matching entities, basic entities do not separate the publisher and source IDs, neither at the source nor at the golden record level. Therefore, foreign keys that reference a basic entity use that parent entity's original foreign key.
For example, the SA_PERSON_PRODUCT
table references both a fuzzy-matching entity (customer) and a basic entity (product):
FP_PERSON
FS_PERSON
F_PRODUCT
: references the original product ID because the product ID is a universal identifier.You have successfully learned the basics about entity types in Semarchy xDM.
Here is a cheat sheet to remind you of the differences between fuzzy-matching, ID-matching, and basic entities:
This cheat sheet may come in handy to understand which entity type to use during development time, when you are building your model. It will also help you remember which tables are available and how to use primary keys when you query data from Oracle or PostgreSQL databases.
Next, you will learn to query errors that were raised during the integration process.
Two types of errors can be raised during the integration process:
The SE, AE, and GE tables serve as error queues and inform you why records failed integration. xDM saves the name of the constraint in b_constraintname
and the type of the constraint in the b_constrainttype
columns.
You can view pre-consolidation errors via the source error (SE) table.
select * from se_person;
b_constraintname
indicates the cause of the error: the records are in the error queue because emails are missing.SE_PERSON
view with the SD_PERSON
.select * from sd_person where b_error_status = 'ERROR';
b_error_status
column, which provides a simple way to detect records with errors.The SD_PERSON
table flags records as errors and provides customer information to guide troubleshooting. However, there is no information explaining why the records are in the error queue.
In this section, you will learn how to join the SE and SD tables to view data, alongside constraint violation.
SD_PERSON
and SE_PERSON
tables to view the reason for errors and the customer information required for troubleshooting.select se.b_batchid, sd.b_loadid, se.b_constraintname, se.b_constrainttype, se.b_pubid, se.b_sourceid, sd.first_name, sd.last_name, sd.source_email from se_person se inner join sd_person sd on ( sd.b_pubid = se.b_pubid and sd.b_sourceid = se.b_sourceid and sd.b_loadid = se.b_loadid ) where sd.b_error_status = 'ERROR';
These records are in the error queue due to missing email in the source_email
column. A validation rule in the CustomerB2CDemo model requires all customers to have an email address, leading to these records being identified as errors.
You also obtain business-relevant fields to aid troubleshooting.
You successfully queried xDM to check for errors and their causes using the SQL API.
Great job! You are now familiar with different types of queries that will prove useful in your tasks involving the SQL API.
In the next unit of the Data Publishing & Consumption track, Load data via SQL, you will learn how to load data in xDM via the SQL API. You will learn basic and advanced loading queries as well as more integration concepts, such as how the integration process works.
To explore other resources, return to the Tutorials menu.
Thank you for completing this tutorial.