Consume data using SQL

Using SQL to consume data from the data hub is the most effective approach for outbound batch data integration. This method involves accessing tables of the hub. This page explains the various data consumption patterns.

Overview

Consuming data mainly involves:

  • Golden data, which are enriched, consolidated, validated, and certified golden records.

  • Master data, which are linked to the golden records and contain references to the source records.

Using golden data in conjunction with master data allows cross-referencing with source data and reintegrating golden data into source systems.

Consume golden data

Golden data table structure

The table below lists the system columns used when consuming information from the hub. For the complete list of system columns available, see Data hub table structures.

Column name Data type Description

B_BATCHID

NUMBER(38,0)

ID of the batch that created this record.

B_CLASSNAME

VARCHAR2(128 CHAR)

Class name of this record.

B_ERROR_STATUS

VARCHAR2(30 CHAR)

Error status of this record. The ERROR value indicates a golden record that has failed post-consolidation validation.

B_CREDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was created.

B_UPDDATE

TIMESTAMP(6)

Submit date (timestamp) of the batch into which the record was updated.

B_CREATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was created.

B_UPDATOR

VARCHAR2(128 CHAR)

Submitter of the batch into which the record was updated.

  • B_ERROR_STATUS: this value indicates whether the golden record has successfully passed post-consolidation validations. Adding a filter on this column allows to include or exclude error records.

  • B_CREDATE, B_UPDDATE: these timestamps reflect when records were submitted to the hub, not necessarily when they were created or updated in source systems or in the hub using a stepper or duplicate manager, for example. Even unmodified records are considered updated upon submission to the hub.

  • B_CREDATOR, B_UPDATOR: for records created or modified through direct authoring or via a duplicate manager, these indicate the user who has performed changes. For records created or modified in a workflow, they indicate the user who submitted the workflow.

Access golden data

To access golden data using SQL, you query the GD table. The queries are filtered to access data for one or more entity classes.

The following code sample gives a query to access golden records in the GD table.

Select golden data, excluding records with post-consolidation errors
select G.* (1)
from GD_<Physical_Table_Name> G (2)
where G.B_CLASSNAME in ( <Classname_List> ) (3)
and (G.B_ERROR_STATUS is null or G.B_ERROR_STATUS <> 'ERROR') (4)

Explanation of the query:

1 We select all columns from the golden record. You can select specific columns from the GD record.
2 The GD table accessed is named after the entity. Replace <Physical Table Name> with the actual physical table name defined for the entity.
3 List of entity classes stored in this table that you want to access (e.g., Person, Company). If you are not using inheritance, only one class is stored in the table, and this clause is not required.
4 This filter excludes records with errors. To query errors, see Consume errors.
Select CONTACT_ID, FIRST_NAME, and LAST_NAME golden data for the Contact entity
select CONTACT_ID, FIRST_NAME, LAST_NAME from GD_CONTACT G where G.B_CLASSNAME = 'Contact'

Golden data primary key

The primary key for a golden data table depends on the entity type and ID generation method for the entity. A column named after the physical column name of the primary key attribute stores the golden record primary key for the GD table, with the following datatypes:

  • For ID generation using a sequence: NUMBER(38,0).

  • For ID generation using UUID: RAW.

  • For manually generated IDs, the datatype matches the one defined for the primary key attribute.

Consume master data

To access master data from fuzzy- and ID-matched entities, we use queries similar to those used to access the golden data.

Starting from golden data, master data can be referenced using the golden record primary key. The master data table (MD) includes a column that references this primary key. As the master data also stores the primary key of the publisher, it is possible to refer back to the source data from golden data via master data.

Basic entities do not have master data. Only source data, source errors, and golden data are stored for this type of entity.

Access to master data varies depending on the entity type (i.e., ID- or fuzzy-matched).

ID-matched entity

With ID-matched entities, the master data table has a structure similar to the golden data table, with the publisher code in addition. The same primary key is stored in the golden and master data within a column named after the physical column name of the primary key attribute (<primary_key_column> in the sample below).

Select golden data corresponding to a given source record ID (ID-matched entity)
select M.B_SOURCEID, G.*
from MD_<Physical_Table_Name> M inner join GD_<Physical_Table_Name> G on (
 	G.<primary_key_column> = M.<primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.<primary_key_column> = '<searched_source_id>'

In this code sample, access is filtered with the class name for both the golden and master data (lines #5 and #8). The two tables are joined on their common primary key (line #4). In addition, master data is filtered by a source publisher (line #9) and the ID of the source record (line #10).

Select golden data for the Employee source record with EMPLOYEE_NUMBER=100 in the HR system (ID-matched entity)
select 	M.EMPLOYEE_NUMBER, G.FIRST_NAME, G.LAST_NAME from 	MD_EMPLOYEE M inner join GD_EMPLOYEE G on (
 	G.EMPLOYEE_NUMBER = M.EMPLOYEE_NUMBER
  and	G.B_CLASSNAME = 'Employee'
)
where
      	M.B_CLASSNAME = 'Employee'
  and 	M.B_PUBID = 'HR'
  and 	M.EMPLOYEE_NUMBER = '100'

The resulting information can be used to update the source record with golden data.

Fuzzy-matched entity

With fuzzy-matched entities, the master data table has a structure similar to the golden data table. It contains a reference to the golden data primary key, but the master data primary key consists of two columns:

  • B_PUBID (VARCHAR2(30 CHAR)) contains the code of the publisher that published this record.

  • B_SOURCEID (VARCHAR2(128 CHAR)) contains the ID of the source record in that publisher

As a consequence, the link between golden and master records is established via the primary key column as in an ID-matched entity. However, the link to the source is accomplished using the B_SOURCEID column.

Select golden data for a given source record ID (fuzzy-matched entity)
select M.B_SOURCEID, G.*
from MD_<entity> M inner join GD_<entity> G on (
 	G. <primary_key_column> = M. <primary_key_column>
  and	G.B_CLASSNAME in ( <classname_list> )
)
where
      	M.B_CLASSNAME in ( <classname_list> )
  and 	M.B_PUBID = '<publisher_code>'
  and 	M.B_SOURCEID = '<searched_source_id>'

In this code, the golden and master data tables are joined on their golden record primary key (line #4). However, the master data is restricted by the source publisher (line #9) and the ID of the source record (line #10), using the B_SOURCEID column.

Select golden record values for the Contact source record with ID 27030 in the CRM system (fuzzy-matched entity)
select M.B_SOURCEID, G.FIRST_NAME, G.LAST_NAME from	MD_CONTACT M inner join GD_CONTACT G on (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
)
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM'
  and 	M.B_SOURCEID = '27030'
Example

Select the duplicates identified for a specific Contact source record with the ID 27030 in the CRM system (fuzzy-matched entity). In this example, the master data table is used twice (aliased as M and MM) to retrieve both sides of a duplicate pair.

Side-by-side duplicates
select
  M.B_PUBID DUP1_PUBLISHER, M.B_SOURCEID DUP1_ID, M.FIRST_NAME DUP1_FIRST_NAME, M.LAST_NAME DUP1_LAST_NAME,
  MM.B_PUBID DUP2_PUBLISHER, MM.B_SOURCEID DUP2_ID, MM.FIRST_NAME DUP2_FIRST_NAME, MM.LAST_NAME DUP2_LAST_NAME,
  G.CONTACT_ID GOLD_ID, G.FIRST_NAME GOLD_FIST_NAME, G.LAST_NAME GOLD_LAST_NAME from 	MD_CONTACT M inner join GD_CONTACT G on
 (
 	G.CONTACT_ID = M.CONTACT_ID
  and	G.B_CLASSNAME = 'Contact'
 )
inner join MD_CONTACT MM on
 (
 	MM.CONTACT_ID = M.CONTACT_ID
  and	MM.B_CLASSNAME = 'Contact'
 )
where
      	M.B_CLASSNAME = 'Contact'
  and 	M.B_PUBID = 'CRM' /* Publisher ID */
  and 	M.B_SOURCEID = '27030' /* Source ID */
/* and M.B_PUBID = MM.B_PUBID */
/* Uncomment the previous line to restrict the duplicates
   to those within the CRM application */

Consume errors

Pre-consolidation errors

Pre-consolidation errors can be accessed via the source error tables (SE, or AE for basic entities).
These tables store error details—​that is, information about the constraints that caused the records to fail the validation. This information is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of constraint) columns.

SE tables do not store the erroneous data itself, but they contain the identifiers to the source (SD or SA for basic entities) records in error.

The B_ERROR_STATUS column on the SD table offers a straightforward means to identify records with errors.
Example

The queries below illustrate how to select the source errors for the Contact entity. In this example, incorrect foreign references would appear. To identify them, we retrieve the incorrect referenced IDs from FP_CUSTOMER and FS_CUSTOMER.

Select Contact source records in error
  select * from SD_CONTACT where B_ERROR_STATUS = 'ERROR';
Select errors and data for the Contact source records in error
select SE.B_BATCHID, SD.B_LOADID,
  SE.B_CONSTRAINTNAME, SE.B_CONSTRAINTTYPE,
  SE.B_PUBID, SE.B_SOURCEID,
  SD.FIRST_NAME, SD.LAST_NAME,
  SD.FP_CUSTOMER, SD.FS_CUSTOMER from SE_CONTACT SE, SD_CONTACT SD
where
  SE.B_CLASSNAME = 'Contact'
  and SD.B_CLASSNAME = 'Contact'
  and SD.B_PUBID = SE.B_PUBID
  and SD.B_SOURCEID = SE.B_SOURCEID
  and SD.B_LOADID = SE.B_LOADID
  and SD.B_ERROR_STATUS = 'ERROR';

Post-consolidation errors

Post-consolidation errors can also be accessed via the golden error (GE) tables.
These tables store error details—​that is, information about the constraints that caused the records to fail validation. This information is stored in the B_CONSTRAINTNAME (name of the constraint) and B_CONSTRAINTTYPE (type of constraint) columns.

GE tables do not store the erroneous data directly but contain identifiers to the golden (GD) records in error.

The B_ERROR_STATUS column on the GD table offers a straightforward means to identify records with errors.
Example

The queries below illustrate how to select errors and data for Contact golden records in error.

Select Contact golden records in error
select * from GD_CONTACT where B_ERROR_STATUS = 'ERROR';
Select errors and data for the Contact golden records in error
select GE.B_BATCHID,
  GE.B_CONSTRAINTNAME, GE.B_CONSTRAINTTYPE,
  GE.CONTACT_ID,
  GD.FIRST_NAME, GD.LAST_NAME, GD.F_CUSTOMER from GE_CONTACT GE, GD_CONTACT GD
where
  GE.B_CLASSNAME = 'Contact'
  and GD.B_CLASSNAME = 'Contact'
  and GD.CONTACT_ID = GE.CONTACT_ID
  and GD.B_BATCHID = GE.B_BATCHID
  and GD.B_ERROR_STATUS = 'ERROR';