Data hub table structures
This page describes the general structure of the tables involved in the certification process.
Tables
For each entity, a predefined set of tables is created. These tables are named with the following convention: <Prefix>_<Physical Table Name>
The prefix
is automatically generated depending on the table nature (source data, source error, etc.), and the Physical Table Name
is set in the entity definition.
The following list describes the tables created for a given entity.
Table | Name | Applicable entity type | Access | Description |
---|---|---|---|---|
|
Source Data |
Fuzzy- and ID-matched |
Read/Write |
This table contains source data loaded by publishers, using the middleware. |
|
Source Authoring |
All |
Read (Write for basic entities) |
For matching entities, this table contains both the records created in the hub by the user as well as possible overrides performed on consolidated records. In that case, it is associated with an |
|
Source Errors |
All |
Read |
This table contains the errors detected during the pre-consolidation validation phase. It is named |
|
Duplicates |
Fuzzy-matched |
Read |
This table stores references to pairs of records detected as duplicates in the matching phase, with the match rule and match score. |
|
User Matches |
Fuzzy-matched |
Read/Write |
This table contains the master records that have been manually re-grouped by users using duplicate managers. You can use this table to perform similar operations programmatically. For examples of duplicate management operations you can perform using this table, see Manage duplicate records using SQL. |
|
Master Integration |
Fuzzy- and ID-matched |
Internal |
This table contains the records that have passed the pre-consolidation validation phase and that have been grouped by the matching phase. |
|
Master Data |
Fuzzy- and ID-matched |
Read |
This table contains the master records, which are the enriched and validated source records that have participated in the creation of golden records. This table can be accessed to review groups/duplicates or to refer to the source data. |
|
Deleted Master |
Fuzzy- and ID-matched |
Read |
This table contains the deleted master records logs, plus the deleted data for soft deletes. |
|
Master History |
Fuzzy- and ID-matched |
Read |
This table contains the master records history. |
|
Golden Integration |
Fuzzy- and ID-matched |
Internal |
This table contains the records that have been consolidated but not yet passed through the post-consolidation validation phase. |
|
Golden Authoring |
Fuzzy- and ID-matched |
Internal |
This table contains data overrides performed on golden records. It is associated with a |
|
Deleted Golden |
All |
Read |
This table contains the deleted golden records logs, plus the deleted data for soft deletes. |
|
Golden History |
All |
Read |
This table contains the golden records history. |
|
Golden Data |
All |
Read |
This table contains the golden records, including those flagged as erroneous by post-consolidation validations. |
|
Golden Errors |
Fuzzy- and ID-matched |
Read |
This table contains the errors detected during the post-consolidation validation phase on golden records. |
For publishing and consuming data, internal tables are not needed. The structure of these tables is not explained on this page.
Columns
Attribute columns
Attributes appear in the tables' structure as follows:
-
Attributes using lists of values, built-in and user-defined types are mapped to single columns with a database type and length corresponding to the attribute type. The column name for such a simple attribute is the Physical Column Name value specified in the simple attribute definition.
-
Complex attributes are mapped on columns named with the following convention:
<Complex Attribute Physical Prefix><Definition Attribute Physical Column Name>
For example, the Country (COUNTRY
) and City (CITY
) definition attributes of the Geocoded Address complex attribute (prefix: GEO
) will be mapped to columns named GEOCOUNTRY
and GEOCITY
.
Built-in columns
In addition to the attribute columns, built-in columns are added to the table’s structure and are used in the certification process.
They track for example:
-
Batch information:
B_BATCHID
-
Cause of a rejected record:
B_CONSTRAINTNAME
,B_CONSTRAINTTYPE
-
Class information:
B_CLASSNAME
-
Match/Merge information:
B_MATCHGRP
,B_CONFSCORE
,B_HASSUGGMERGE
,B_SUGGMERGEID
,B_SUGGMERGECONFSCORE
,B_SUGGMERGEMASTERSCOUNT
,B_CONFIRMATIONSTATUS
,B_MASTERSCOUNT
,B_ISCONFIRMED
-
Delete operation information:
B_DELETETYPE
,B_DELETEDATE
,B_DELETEAUTHOR
,B_DELETEOPERATION
,
The following list describes these columns.
Column name | Exists in tables | Datatype | Description |
---|---|---|---|
|
GA, SA |
|
For matched entities, type of authoring operation:
|
|
AE, GA, GD, GE, GF, GI, GP, GX, MD, MI, MX, SE |
|
ID of the batch into which the new data, data changes, overrides or duplicate decisions were applied, or during which errors were detected. |
|
UM |
|
Cause that made the record part of the duplicate management transaction. Possible causes are:
|
|
AE, GA, GD, GE, GF, GH, GI, GP, GX, MD, MH, MI, MX, SA, SD, SE, SF, UM |
|
Unique name of class/entity to which this record belongs. For more information, see Class name. |
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
Confirmation status for duplicate management:
|
|
MD, MI, UM |
Varies |
ID of the confirmed golden record to which this master record is attached. |
|
GD, GH, GI, GX, UM |
|
Confidence score of the golden record. It is the average of the match scores in the match group. |
|
GD, GH, GI, GX, UM |
|
Algorithm used to compute the confidence score:
|
|
AE, GE, SE |
|
For error records, name of the constraint causing the error. |
|
AE, GE, SE |
|
For error records, type of the constraint causing the error. |
|
SA |
|
ID of the original record copied into this record. |
|
DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM |
|
For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was created. For source and master data, the user who has created the record in a workflow, stepper, duplicate manager, or a user name loaded by the data integration process. |
|
DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM |
|
For golden and master data, the submit timestamp of the batch into which the record was created. For source data (SD, SA, UM), the timestamp at which the source record was created. |
|
GX, MX, SA, SD |
|
Author of the delete operation. |
|
GX, MX, SA, SD |
|
Deletion timestamp of a record |
|
GX, MX, SA, SD |
|
Delete operation ID. |
|
GX, MX, SA, SD |
|
Cascade path through which the record was reached during a delete. Null for record directly selected for deletion. |
|
GX, MX, SA, SD |
|
Delete type ( |
|
GD, GH, GX, SA, SD |
|
Error status of a record. This value indicates whether the source or golden record has passed successfully or not validations. Possible values are:
|
|
GH, MH |
|
Batch at which the history record was created. |
|
GD, GH, GX |
|
For fuzzy-matching and ID-matching entities, indicates the type of golden record—i.e., whether the golden record was created and authored only in the data hub ( |
|
GD, GH, GX |
|
For fuzzy-matching and ID-matching entities, this flag ( |
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
Flag ( |
|
GD, GH, GI, GX |
|
Flag ( |
|
AE, SA, SD, SE, SF, UM |
|
Load identifier used as the unique transaction ID for external application pushing data to the platform |
|
GD, GH, GI, GX, UM |
|
Number of master records contributing to the golden record. |
|
MD, MH, MI, MX |
|
ID of the match group for the master record. This column is set when matching takes place. |
|
DU |
|
Match rule having caused the pair of records to match. |
|
DU |
|
Score of the matched pair. |
|
MI |
|
Previous identifier of the match group for the master record. This column is deprecated. |
|
MD, MH, MI, MX |
Varies |
ID of the previous golden record the master record was attached to. |
|
SA, SD, UM |
|
Batch identifier of the record when it was originally edited out in a stepper or a duplicate manager. |
|
UM |
|
Original confirmation status in a duplicate management operation. |
|
UM |
Varies |
ID of the original confirmed golden record to which the master record was attached. |
|
UM |
|
Confidence score of the original golden in a duplicate management operation. |
|
UM |
|
Algorithm used to compute the original confidence Ssore:
|
|
UM |
|
Number of master records in the original golden in a duplicate management operation. |
|
UM |
Varies |
ID of the original golden record to which the master record was attached. |
|
UM |
|
Original exclusion group to which the master record belonged. |
|
DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM |
|
For matching entities, code of the publisher that published the record. |
|
DU |
|
For duplicate pairs, code of the publisher that published the second record. |
|
DU, GD, GH, GI, GX, MD, MH, MI, MX, SD, SE, UM |
|
ID of the source record in the source publisher system (fuzzy-matched entities only). See below for a detailed explanation of the primary key options. |
|
DU |
|
For duplicate pairs, ID of the second source record in the source publisher system (fuzzy-matched entities only). |
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
Confidence Score for the suggested match group. |
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
Algorithm used to compute Confidence Score for the suggested match group:
|
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
ID of the merge suggested by the automated matching. |
|
GD, GH, GI, GX, MD, MH, MI, MX, UM |
|
Number of master records in the suggested merge. |
|
GH, MH |
|
Batch at which history record stopped being current or null if the records are still current. |
|
DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM |
|
For golden data (GD, GI, GX, GH), the submitter of the batch into which the record was updated. For source and master data, the latest user who has updated the record in a workflow, stepper, duplicate manager or the user name loaded by the data integration process. |
|
DU, GA, GD, GH, GI, GX, MD, MH, MI, MX, SA, SD, UM |
|
For golden and master data, the submit timestamp of the batch into which the record was updated. For source data (SD, SA, UM), the timestamp at which the source record was created or updated. |
|
MD, MH, MI, MX, UM |
|
Exclusion group ID. An exclusion group represents a group of records for which a user has made split decisions. |
Primary key columns
The primary key to load depends on the entity type.
Basic entities
For basic entities, Semarchy xDM uses a single identifier, stored in a column named after the physical column name specified in the primary key attribute definition. This column will exist in all tables. The identifier is simply propagated into the hub from the source records.
ID-matched entities
When using ID matching, Semarchy xDM assumes a common identifier across all systems. In this case, this common identifier is stored in a column named after the physical column name specified in the primary key attribute definition.
This column will exist in all tables. When publishing data into the hub, the middleware loads this column with the primary key from the publishing system. This identifier is simply propagated into the hub, and matching is done using this primary key.
Fuzzy-matched entities
When using fuzzy matching, Semarchy xDM assumes no common identifier across publishers. There may be two different records with the same ID in different systems. There is a need to match the records and consolidate them under a golden record having a primary key generated by the system.
In this case, the source identifier is stored in the column named B_SOURCEID
, along with the publisher code stored in the B_PUBID
.
This is the case for the SD, SE and MD tables.
When publishing data into the hub, the middleware loads this B_SOURCEID
column with a primary key value from the publishing system.
If this primary key is a composite key in the source system, all the columns of this composite key must be concatenated into B_SOURCEID
.
When the records are consolidated in a golden record (GD and GE tables), a system-defined primary key is generated and stored in a column named after the physical column name specified in the primary key attribute definition. This key is referred to as the golden record ID.
The MD table makes the bridge between the source ID and the golden record ID, as it contains both these values.
Reference columns
When a reference exists in the source publisher and needs to be expressed in the landing table, this reference mapping in the table structure depends on the entity type of the referenced entity.
Reference to a basic entity
For a reference to a basic entity, the referenced key stored is simply the primary key of the basic entity.
As a consequence, the referenced value is stored in a single column. This column is named after the physical name provided in the reference definition and is prefixed with F_
. For example, F_COUNTRIES
.
Reference to an ID-matched entity
For a reference to an ID-matched entity, the referenced key is the same for all systems. As a consequence, the referenced value is stored in a single column. This column is named after the physical name provided in the reference definition and is prefixed with F_
(e.g., F_EMPLOYEE
).
For example, if the Customer entity references Employee and this entity uses ID matching, SD_CUSTOMER
will contain the following information.
CUSTOMER_NAME | F_EMPLOYEE |
---|---|
Gadgetron |
11 |
Roxxon |
56 |
This information means that the customer Gadgetron
references employee number 11, and Roxxon
references employee number 56. This employee number is the same in all systems.
Reference to a fuzzy-matched entity
For a reference to a fuzzy-matched entity, the referenced may point to a master record from a publisher, or directly to a golden record in the hub:
-
For a reference to a master record:
-
The referenced value is stored in a column named after the physical name provided in the definition of the reference and prefixed with
FS_
(for foreign source ID). For example,FS_CUSTOMER
. -
The referenced publisher is also stored, in a column named after the physical name provided in the definition of the reference and prefixed with
FP_
(for foreign publisher). For example,FP_CUSTOMER
.
-
-
For a reference to a golden record:
-
The reference value is stored in a column named after the physical name provided in the definition of the reference and prefixed with
F_
(for foreign ID).
-
The Contact entity references Customer and this entity uses fuzzy matching, SD_CONTACT
contains the following information.
FIRST_NAME | LAST_NAME | FP_CUSTOMER | FS_CUSTOMER |
---|---|---|---|
John |
Doe |
CRM |
1235 |
Jane |
Smith |
MKT |
A3251 |
This information means that the contact John Doe
references the customer with the primary key 1235
in the CRM
publisher, and that Jane Smith
references the customer with the primary key A3251
in the MKT
publisher.
Contact references Customer and this entity uses fuzzy matching, SD_CONTACT
contains the following information.
FIRST_NAME | LAST_NAME | F_CUSTOMER |
---|---|---|
John |
Doe |
6598 |
Jane |
Smith |
6556 |
This information means that the contact John Doe
references the customer golden record with the golden record ID 6598
, and that Jane Smith
references the customer golden record with the golden record ID 6556
.
If both the reference to the master and golden records are loaded, only the reference to the golden record is taken into account. |
To clear a reference, you must push a null value in the three FS_ , FP_ and F_ columns.
|
Class name
Several entities involved in an inheritance relationship have their data stored in the same set of tables. These tables store the superset of the attributes of the parent and all its child entities. The B_CLASSNAME
column is used to identify the class (entity) of a record in a table.
For example, when Person and Company inherit from the Party entity, the resulting table is named after the parent entity (i.e., Party), and will contain all the attributes of Person and Company as well. In the GD_PARTY
table, records representing persons will have B_CLASSNAME='Person'
. When publishing person or company information in the SD_PARTY
table, the middleware must set B_CLASSNAME='Person'
or B_CLASSNAME='Company'
accordingly.
Constraints
The only constraints physically enforced in the tables are the primary keys and the not null columns. These constraints apply to system columns only.
For example, on an SD_
table, the following constraints are enforced:
-
primary key on
B_LOADID
,B_PUBID
,B_SOURCEID
(or the primary key column for ID-matched entities) -
B_LOADID
,B_CLASSNAME
,B_PUBID
andB_SOURCEID
(or the primary key column for ID-matched entities) are not null
Other constraints defined in the model (mandatory attributes, references, etc.) are not enforced in the physical model, but verified during the validation phase.