Consume data using SQL
Using SQL to consume data from the data hub is the most effective approach for outbound batch data integration. This method involves accessing tables of the hub. This page explains the various data consumption patterns.
Overview
Consuming data mainly involves:
-
Golden data, which are enriched, consolidated, validated, and certified golden records.
-
Master data, which are linked to the golden records and contain references to the source records.
Using golden data in conjunction with master data allows cross-referencing with source data and reintegrating golden data into source systems.
Consume golden data
Golden data table structure
The table below lists the system columns used when consuming information from the hub. For the complete list of system columns available, see Data hub table structures.
Column name | Data type | Description |
---|---|---|
|
|
ID of the batch that created this record. |
|
|
Class name of this record. |
|
|
Error status of this record. The |
|
|
Submit date (timestamp) of the batch into which the record was created. |
|
|
Submit date (timestamp) of the batch into which the record was updated. |
|
|
Submitter of the batch into which the record was created. |
|
|
Submitter of the batch into which the record was updated. |
|
Access golden data
To access golden data using SQL, you query the GD table. The queries are filtered to access data for one or more entity classes.
The following code sample gives a query to access golden records in the GD table.
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <Classname_List> ) (3)
and (G.B_ERROR_STATUS is null or G.B_ERROR_STATUS <> 'ERROR') (4)
Explanation of the query:
1 | We select all columns from the golden record. You can select specific columns from the GD record. |
2 | The GD table accessed is named after the entity. Replace <Physical Table Name> with the actual physical table name defined for the entity. |
3 | List of entity classes stored in this table that you want to access (e.g., Person , Company ). If you are not using inheritance, only one class is stored in the table, and this clause is not required. |
4 | This filter excludes records with errors. To query errors, see Consume errors. |
CONTACT_ID
, FIRST_NAME
, and LAST_NAME
golden data for the Contact entityselect CONTACT_ID, FIRST_NAME, LAST_NAME from GD_CONTACT G where G.B_CLASSNAME = 'Contact'
Golden data primary key
The primary key for a golden data table depends on the entity type and ID generation method for the entity. A column named after the physical column name of the primary key attribute stores the golden record primary key for the GD table, with the following datatypes:
-
For ID generation using a sequence:
NUMBER(38,0)
. -
For ID generation using UUID:
RAW
. -
For manually generated IDs, the datatype matches the one defined for the primary key attribute.
Consume master data
To access master data from fuzzy- and ID-matched entities, we use queries similar to those used to access the golden data.
Starting from golden data, master data can be referenced using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from golden data via master data.
Basic entities do not have master data. Only source data, source errors, and golden data are stored for this type of entity. |
Access to master data varies depending on the entity type (i.e., ID- or fuzzy-matched).
ID-matched entity
With ID-matched entities, the master data table has a structure similar to the golden data table, with the publisher code in addition. The same primary key is stored in the golden and master data within a column named after the physical column name of the primary key attribute (<primary_key_column>
in the sample below).
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M inner join GD_<Physical_Table_Name> G on (
G.<primary_key_column> = M.<primary_key_column>
and G.B_CLASSNAME in ( <classname_list> )
)
where
M.B_CLASSNAME in ( <classname_list> )
and M.B_PUBID = '<publisher_code>'
and M.<primary_key_column> = '<searched_source_id>'
In this code sample, access is filtered with the class name for both the golden and master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, master data is filtered by a source publisher (line #9) and the ID of the source record (line #10).
EMPLOYEE_NUMBER=100
in the HR
system (ID-matched entity)select M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME from MD_EMPLOYEE M inner join GD_EMPLOYEE G on (
G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
and G.B_CLASSNAME = 'Employee'
)
where
M.B_CLASSNAME = 'Employee'
and M.B_PUBID = 'HR'
and M.EMPLOYEE_NUMBER = '100'
The resulting information can be used to update the source record with golden data.
Fuzzy-matched entity
With fuzzy-matched entities, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:
-
B_PUBID
(VARCHAR2(30 CHAR)
) contains the code of the publisher that published this record. -
B_SOURCEID
(VARCHAR2(128 CHAR)
) contains the ID of the source record in that publisher
As a consequence, the link between golden and master records is established via the primary key column as in an ID-matched entity. However, the link to the source is accomplished using the B_SOURCEID
column.
select M.B_SOURCEID, G.*
from MD_<entity> M inner join GD_<entity> G on (
G. <primary_key_column> = M. <primary_key_column>
and G.B_CLASSNAME in ( <classname_list> )
)
where
M.B_CLASSNAME in ( <classname_list> )
and M.B_PUBID = '<publisher_code>'
and M.B_SOURCEID = '<searched_source_id>'
In this code, the golden and master data tables are joined on their golden record primary key (line #4). However, the master data is restricted by the source publisher (line #9) and the ID of the source record (line #10), using the B_SOURCEID
column.
27030
in the CRM
system (fuzzy-matched entity)select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME from MD_CONTACT M inner join GD_CONTACT G on (
G.CONTACT_ID = M.CONTACT_ID
and G.B_CLASSNAME = 'Contact'
)
where
M.B_CLASSNAME = 'Contact'
and M.B_PUBID = 'CRM'
and M.B_SOURCEID = '27030'
Select the duplicates identified for a specific Contact source record with the ID 27030
in the CRM
system (fuzzy-matched entity). In this example, the master data table is used twice (aliased as M
and MM
) to retrieve both sides of a duplicate pair.
select
M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME from MD_CONTACT M inner join GD_CONTACT G on
(
G.CONTACT_ID = M.CONTACT_ID
and G.B_CLASSNAME = 'Contact'
)
inner join MD_CONTACT MM on
(
MM.CONTACT_ID = M.CONTACT_ID
and MM.B_CLASSNAME = 'Contact'
)
where
M.B_CLASSNAME = 'Contact'
and M.B_PUBID = 'CRM' /* Publisher ID */
and M.B_SOURCEID = '27030' /* Source ID */
/* and M.B_PUBID = MM.B_PUBID */
/* Uncomment the previous line to restrict the duplicates
to those within the CRM application */
Consume errors
Pre-consolidation errors
Pre-consolidation errors can be accessed via the source error tables (SE, or AE for basic entities).
These tables store error details—that is, information about the constraints that caused the records to fail the validation. This information is stored in the B_CONSTRAINTNAME
(name of the constraint) and B_CONSTRAINTTYPE
(type of constraint) columns.
SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.
The B_ERROR_STATUS column on the SD table offers a straightforward means to identify records with errors.
|
The queries below illustrate how to select the source errors for the Contact entity. In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER
and FS_CUSTOMER
.
select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';
select SE.B_BATCHID, SD.B_LOADID,
SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
SE.B_PUBID, SE.B_SOURCEID,
SD.FIRST_NAME, SD.LAST_NAME,
SD.FP_CUSTOMER, SD.FS_CUSTOMER from SE_CONTACT SE, SD_CONTACT SD
where
SE.B_CLASSNAME = 'Contact'
and SD.B_CLASSNAME = 'Contact'
and SD.B_PUBID = SE.B_PUBID
and SD.B_SOURCEID = SE.B_SOURCEID
and SD.B_LOADID = SE.B_LOADID
and SD.B_ERROR_STATUS = 'ERROR';
Post-consolidation errors
Post-consolidation errors can also be accessed via the golden error (GE) tables.
These tables store error details—that is, information about the constraints that caused the records to fail validation. This information is stored in the B_CONSTRAINTNAME
(name of the constraint) and B_CONSTRAINTTYPE
(type of constraint) columns.
GE tables do not store the erroneous data directly but contain identifiers to the golden (GD) records in error.
The B_ERROR_STATUS column on the GD table offers a straightforward means to identify records with errors.
|
The queries below illustrate how to select errors and data for Contact golden records in error.
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
select GE.B_BATCHID,
GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
GE.CONTACT_ID,
GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER from GE_CONTACT GE, GD_CONTACT GD
where
GE.B_CLASSNAME = 'Contact'
and GD.B_CLASSNAME = 'Contact'
and GD.CONTACT_ID = GE.CONTACT_ID
and GD.B_BATCHID = GE.B_BATCHID
and GD.B_ERROR_STATUS = 'ERROR';