Publish data using SQL
Publishing data in the data hub using SQL is the optimal method for inbound batch data integration.
Data publishing is accomplished through function calls that initialize and submit the external load. DML statements are then used to load the data into the landing tables (SD
for matching entities and SA
for basic entities).
Overview
This approach uses the SQL interface for managing external loads, as detailed below:
-
The external load is initialized using a function call to the
GET_NEW_LOADID
function on the repository schema. This function call returns a load ID. -
SD
(orSA
for basic entities) tables are loaded using SQL inserts issued on the data location schema. -
The external load is submitted or canceled using function calls on the repository schema:
-
SUBMIT_LOAD
to submit a load identified by its load ID. This function call returns a batch ID. -
CANCEL_LOAD
to cancel a load identified by its load ID.
-
This approach is recommended when the middleware is an ETL or data integration product. |
Alternatively, you can use continuous loads to load data into the SD
tables with the load ID of the continuous load, without having to explicitly initialize and submit individual external loads.
Initialize a load
Initializing an external load requires the following parameters:
-
data_location_name
: the name of the data location hosting the data hub into which the external load is performed. -
program_name
: an informational variable identifying the middleware performing the external load (e.g.,ETL Custom Script
). -
load_description
: an informational variable describing the nature of the load (e.g.,Daily updates for Customers and Contacts
). -
user_name
: the name of the user initializing the external load. This user may or may not be defined in the application server’s security realm.
The GET_NEW_LOADID
function call, performed on the repository database schema, initializes an external load and returns a load ID.
In Oracle, the integration load functions are part of an INTEGRATION_LOAD package. You must prefix the function name with the name of this package.
|
In SQL Server, the integration load functions are procedures in the repository database that return values such as the load ID. |
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
'<data_location_name>', /* Data Location Name (as in the UI) */
'<program_name>', /* Informational. Identifies the Middleware*/
'<load_description>', /* Informational. Describes the load. */
'<user_name>' /* User initializing the load. */
);
vLoad_id := <repository_schema>.get_new_loadid(
'<data_location_name>', /* Data Location Name (as in the UI) */
'<program_name>', /* Informational. Identifies the Middleware*/
'<load_description>', /* Informational. Describes the load. */
'<user_name>' /* User initializing the load. */
);
DECLARE
@v_load_id int
BEGIN
EXEC dbo.GET_NEW_LOADID
@return_value_argument=@v_load_id OUTPUT,
@p_data_location_name = 'SEMARCHY_53DL', /* Data Location Name (as in the UI) */
@p_program_name = 'manual_etl_script', /* Informational. Identifies the Middleware */
@p_load_description = N'load_data_update',/* Informational. Describes the load. */
/* Note the N to allow accented characters */
@p_username = 'semadmin' /* User initializing the load. */
PRINT @v_load_id
END
The following example demonstrates a function call that initializes an external load on the CustomerMDMHub data location. The repository schema is REPO
, and the returned load ID is stored in the vLoad_id
variable.
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
'CustomerMDMHub',
'Custom ETL',
'Initial load for the hub',
'John Doe' );
Load data
Loading data involves inserting new source records in the SD
(and SA
for basic entities) tables for the set of entities processed by the integration job.
When loading data in the SD
or SA
tables:
-
You use SQL insert commands or your standard ETL or data integration platform.
-
The insert commands are issued on the
SD
orSA
tables within the data location schema.
Tables to load
Make sure to load all the SD
and SA
tables for the entities considered in the integration job. For example, if the integration job processes the Customer and Contact entities, you should load the SD_CUSTOMER
and SD_CONTACT
tables.
Data inserted into an SD
table for an entity not included in the integration job will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded in the SD_EMPLOYEE
table will not be taken into account by the integration job to certify golden records for the Employee entity.
Referential integrity and load order
There is no specific order required to load the SD
and SA
tables, as no foreign keys are implemented on these tables.
Reference validation is handled by the integration job. Therefore, references between entities must be loaded as indicated in Columns to load.
Using transactions
It is recommended to use a database transaction when writing to the SD
and SA
tables.
-
For external loads: canceling an external load cancels the load but does not delete records from the tables. Writing in an auto-commit transaction in the tables and then canceling the load leaves useless information in the tables. Using a transaction allows you to roll back all changes on the tables when canceling a load.
-
For continuous loads: it is strongly recommended to use a transaction and commit only after all tables are loaded. As the continuous load consumes data committed in the table on its own schedule, loading tables without a transaction may cause the continuous load to start processing a data batch before it is fully completed.
Columns to load
This section offers guidance on loading the columns of the SD
and SA
tables.
System columns (mandatory)
The following system columns must be populated as described:
-
B_LOADID
: populate this column with the load ID provided by theGET_NEW_LOADID
function call, with the ID of the continuous load. You can also use theGET_CONTINUOUS_LOADID
function to retrieve a continuous load ID by its name. -
B_CLASSNAME
: enter the name of the entity (or class) being loaded. In cases of inheritance, the table stores data for all parent and child classes. SetB_CLASSNAME
explicitely to the name of the entity for which data is being published (e.g.,Person
,Party
,Company
). For more information, see Class name. -
B_PUBID
: load this column with a publisher code for ID-matched and fuzzy-matched entities (e.g.,CRM
,MKT
, etc). This code identifies the publisher (i.e., the application providing the data) and should be declared in the model edition. To view the list of publisher codes, double-click the Publishers node in the Model Design perspective. If the publisher code is not known to the model, the data will still be processed, but this publisher will be ranked lowest in a Preferred Publisher consolidation strategy.
Basic entities have a single source and do not use the B_PUBID column.
|
Publisher codes are case-sensitive. Ensure that you load the B_PUBID column with the publisher code exactly as defined in the model. Publisher codes may include uppercase letters, digits, and underscores.
|
Within a single load, you can include records from various publishers by using the B_PUBID column to identify each publisher.
|
Primary keys (mandatory)
For matching entities, the primary key you load into the SD
table identifies the source record from the publisher (specified by the publisher code B_PUDIB
). This key enables the consolidation process to account for the creation or update of a source record and report corresponding data changes to the golden record.
For basic entities, the primary key provided in the SA
table is the one of the golden records. It facilitates updates to these records.
The primary key column to load varies based on the entity type.
Basic entity
If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.
ID-matched entity
If the entity uses ID matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.
Fuzzy-matched entity
If the entity uses fuzzy matching, then you must load the value of the primary key from the source system into the B_SOURCEID
column.
If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them into the B_SOURCEID
column.
The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
|
References (mandatory)
When loading data for entities connected by a reference relationship, you must load the referencing entity with the value of the referenced entity’s primary key. The specific columns to load vary based on the type of the referenced entity.
Reference to a basic entity
If the referenced entity is basic, you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>
.
Reference to an ID-matched entity
If the referenced entity is ID-matched, you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>
.
For example, if Customer references Employee and this entity uses ID matching, then you must load into SD_CUSTOMER
the F_EMPLOYEE
column with the primary key of the source employee record referenced by each customer record.
Reference to a fuzzy-matched entity
If the referenced entity is fuzzy-matched, then you need to load two columns:
-
FS_<Physical Name of the Referenced To Role Name>
: load this column with the source ID of the referenced record. -
FP_<Physical Name of the Referenced To Role Name>
: code of the publisher of the referenced record.
These columns should be considered together. You should not load the FP_ column with a publisher code and leave FS_ to a null value, and vice versa.
|
For example, if Contact references Customer and this entity uses fuzzy matching, you must load the following columns into the SD_CONTACT
table:
-
FP_CUSTOMER
: code of the publisher providing the customer referenced by the given contact (e.g.,MKT
). -
FS_CUSTOMER
: source ID of the customer referenced by the given contact (e.g.,81239
).
Attribute columns
Ensure that you populate the attribute columns relevant to the entity you are loading—that is:
-
The attribute columns that are appropriate for the entity class (
B_CLASSNAME
) being loaded. -
The mandatory attribute columns. Failing to do so may result in pre-consolidation validation rejecting source records with null values.
-
The columns for attributes that use a list-of-values (LOV) type. Loading values outside the LOV range may cause pre-consolidation validation to reject the source records.
Attributes may be loaded with null or incorrect values if the values are set or modified by enrichers. Enrichers are executed before validations. |
Loading
Date and Timestamp columns
|
Other columns (optional)
The following columns do not need to be loaded, or can be optionally loaded:
-
B_ORIGINALBATCHID
: this column is not used for external loads and should not be loaded. -
B_CREATOR
,B_UPDATOR
: these columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch. -
B_CREDATE
,B_UPDDATE
: these columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. It is recommended to populate these columns with the current date. Values loaded in these columns are not propagated beyond theSD
table.
Submit a load
Submitting an external load uses the following parameters:
-
load_id
: the load ID returned by the load initialization. -
integration_job
: the name of the integration job to process this load. -
user_name
: the name of the user who has initialized the external load. This user may or may not be defined in the security realm of the application server.
The SUBMIT_LOAD
function call submits an external load identified by its load ID and returns a batch ID. It is performed on the repository database schema.
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
<load_id> /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
'<integration_job>' /* Name of the integration job to trigger. */
'<user_name>' /* User who has initialized the load. */
);
vBatch_id := <repository_schema>.submit_load(
<load_id> /* Load ID returned by get_new_loadid */
'<integration_job>' /* Name of the integration job to trigger. */
'<user_name>' /* User who has initialized the load. */
);
DECLARE
@return_value int,
@return_value_argument numeric (18,0)
EXEC @return_value = dbo.SUBMIT_LOAD
@return_value_argument=@return_value_argument OUTPUT,
@P_LOAD_ID=<LoadID from GetNewLoad>,
@P_INTEGRATION_JOB_NAME ='JobName',
@P_USERNAME ='semadmin'
SELECT @return_value_argument
The following example demonstrates a function call for submitting an external load identified by load ID 22. The external load is submitted with the job name INTEGRATE_DATA
and the repository schema is REPO
. The returned batch ID is stored in the vBatch_id
variable.
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
22,
'INTEGRATE_DATA',
'John Doe' );
Cancel a load
Canceling a load is performed using the CANCEL_LOAD
function with the following parameters:
-
load_id
: the load ID returned by the load initialization. -
user_name
: the name of the user who has initialized the external load. This user may or may not be defined in the application server’s security realm.
The CANCEL_LOAD
procedure cancels an external load identified by its load ID. It is performed on the repository database schema.
This procedure does not flush the content of the SD_% tables loaded during the external load. This must be handled separately.
|
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
<load_id> /* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
'<user_name>' /* User who has initialized the load. */
);
<repository_schema>.cancel_load(
<load_id> /* Load ID returned by get_new_loadid */
'<user_name>' /* User who has initialized the load. */
);
EXEC <repository_database>.dbo.CANCEL_LOAD
@P_LOAD_ID = @LOAD_ID,
@P_USERNAME = '<user_name>' /* User who has initialized the load. */
INTEGRATION_LOAD.CANCEL_LOAD
to cancel an external load identified by the load ID 22 (Oracle)REPO.INTEGRATION_LOAD.CANCEL_LOAD(
22,
'John Doe' );