Publish data using SQL

Publishing data in the data hub using SQL is the optimal method for inbound batch data integration.

Data publishing is accomplished through function calls that initialize and submit the external load. DML statements are then used to load the data into the landing tables (SD for matching entities and SA for basic entities).

Overview

This approach uses the SQL interface for managing external loads, as detailed below:

  1. The external load is initialized using a function call to the GET_NEW_LOADID function on the repository schema. This function call returns a load ID.

  2. SD (or SA for basic entities) tables are loaded using SQL inserts issued on the data location schema.

  3. The external load is submitted or canceled using function calls on the repository schema:

    • SUBMIT_LOAD to submit a load identified by its load ID. This function call returns a batch ID.

    • CANCEL_LOAD to cancel a load identified by its load ID.

This approach is recommended when the middleware is an ETL or data integration product.

Alternatively, you can use continuous loads to load data into the SD tables with the load ID of the continuous load, without having to explicitly initialize and submit individual external loads.

Initialize a load

Initializing an external load requires the following parameters:

  • data_location_name: the name of the data location hosting the data hub into which the external load is performed.

  • program_name: an informational variable identifying the middleware performing the external load (e.g., ETL Custom Script).

  • load_description: an informational variable describing the nature of the load (e.g., Daily updates for Customers and Contacts).

  • user_name: the name of the user initializing the external load. This user may or may not be defined in the application server’s security realm.

The GET_NEW_LOADID function call, performed on the repository database schema, initializes an external load and returns a load ID.

In Oracle, the integration load functions are part of an INTEGRATION_LOAD package. You must prefix the function name with the name of this package.
In SQL Server, the integration load functions are procedures in the repository database that return values such as the load ID.
Example 1. Syntax to initialize an external load for Oracle
vLoad_id := <repository_schema>.INTEGRATION_LOAD.GET_NEW_LOADID(
	'<data_location_name>',  /* Data Location Name (as in the UI) */
	'<program_name>',        /* Informational. Identifies the Middleware */
	'<load_description>',    /* Informational. Describes the load. */
	'<user_name>'            /* User initializing the load. */
	  );
Example 2. Syntax to initialize an external load for PostgreSQL
vLoad_id := <repository_schema>.get_new_loadid(
	'<data_location_name>',  /* Data Location Name (as in the UI) */
	'<program_name>',        /* Informational. Identifies the Middleware */
	'<load_description>',    /* Informational. Describes the load. */
	'<user_name>'            /* User initializing the load. */
	  );
Example 3. Syntax to initialize an external load for SQL Server
DECLARE @v_load_id int
BEGIN
  EXEC dbo.GET_NEW_LOADID
    @return_value_argument = @v_load_id OUTPUT,
    @p_data_location_name = 'SEMARCHY_53DL',   /* Data Location Name (as in the UI) */
    @p_program_name = 'manual_etl_script',     /* Informational. Identifies the Middleware */
    @p_load_description = N'load_data_update', /* Informational. Describes the load. */
                                               /* Note the N to allow accented characters */
    @p_username = 'semadmin'                   /* User initializing the load. */
  PRINT @v_load_id
END

The following example demonstrates a function call that initializes an external load on the CustomerMDMHub data location. The repository schema is REPO, and the returned load ID is stored in the vLoad_id variable.

Example 4. Initialize an external load (Oracle)
vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
	'CustomerMDMHub',
	'Custom ETL',
	'Initial load for the hub',
	'John Doe' );

Load data

Loading data involves inserting new source records in the SD (and SA for basic entities) tables for the set of entities processed by the integration job.

When loading data in the SD or SA tables:

  • You use SQL insert commands or your standard ETL or data integration platform.

  • The insert commands are issued on the SD or SA tables within the data location schema.

Tables to load

Make sure to load all the SD and SA tables for the entities considered in the integration job. For example, if the integration job processes the Customer and Contact entities, you should load the SD_CUSTOMER and SD_CONTACT tables.

Data inserted into an SD table for an entity not included in the integration job will be ignored. For example, if the integration job processes the Customer and Contact entities, data loaded in the SD_EMPLOYEE table will not be taken into account by the integration job to certify golden records for the Employee entity.

Referential integrity and load order

There is no specific order required to load the SD and SA tables, as no foreign keys are implemented on these tables.

Reference validation is handled by the integration job. Therefore, references between entities must be loaded as indicated in Columns to load.

Using transactions

It is recommended to use a database transaction when writing to the SD and SA tables.

  • For external loads: canceling an external load does not delete the records already written to the tables. Writing with auto-commit and then canceling the load leaves useless data in the tables. Using a transaction allows you to roll back all changes on the tables when canceling a load.

  • For continuous loads: it is strongly recommended to use a transaction and commit only after all tables are loaded. Because the continuous load consumes committed data on its own schedule, loading tables without a transaction may cause the continuous load to start processing a batch before it is completely loaded.
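As a sketch, a load wrapped in a single transaction might look like the following on PostgreSQL (the schema name, table columns, and staging tables are illustrative, not part of the product):

```sql
BEGIN;

-- Load every SD table for the entities processed by the integration job.
INSERT INTO datahub.SD_CUSTOMER (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
SELECT :load_id, 'Customer', 'CRM', customer_id, customer_name FROM stg_customers;

INSERT INTO datahub.SD_CONTACT (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CONTACT_NAME)
SELECT :load_id, 'Contact', 'CRM', contact_id, contact_name FROM stg_contacts;

-- Commit once, after all tables are fully loaded, so a continuous
-- load cannot start processing a partially written batch.
COMMIT;
```

If the load is later canceled before the commit, a rollback removes all the inserted rows at once.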

Columns to load

This section offers guidance on loading the columns of the SD and SA tables.

System columns (mandatory)

The following system columns must be populated as described:

  • B_LOADID: populate this column with the load ID returned by the GET_NEW_LOADID function call, or with the ID of a continuous load. You can use the GET_CONTINUOUS_LOADID function to retrieve a continuous load ID by its name.

  • B_CLASSNAME: enter the name of the entity (or class) being loaded. In cases of inheritance, the table stores data for all parent and child classes. Set B_CLASSNAME explicitly to the name of the entity for which data is being published (e.g., Person, Party, Company). For more information, see Class name.

  • B_PUBID: load this column with a publisher code for ID-matched and fuzzy-matched entities (e.g., CRM, MKT). This code identifies the publisher (i.e., the application providing the data) and should be declared in the model edition. To view the list of publisher codes, double-click the Publishers node in the Model Design perspective. If the publisher code is not known to the model, the data will still be processed, but this publisher will be ranked lowest in a Preferred Publisher consolidation strategy.

Basic entities have a single source and do not use the B_PUBID column.
Publisher codes are case-sensitive. Ensure that you load the B_PUBID column with the publisher code exactly as defined in the model. Publisher codes may include uppercase letters, digits, and underscores.
Within a single load, you can include records from various publishers by using the B_PUBID column to identify each publisher.
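For illustration, an insert populating the mandatory system columns for a hypothetical fuzzy-matched SD_CUSTOMER table could look like this (the table and the CUSTOMER_NAME column are assumptions for the example):

```sql
INSERT INTO <data_location_schema>.SD_CUSTOMER
  (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
VALUES
  (:load_id,    -- load ID from GET_NEW_LOADID or GET_CONTINUOUS_LOADID
   'Customer',  -- entity (class) name being published
   'CRM',       -- publisher code, exactly as declared in the model (case-sensitive)
   '81239',     -- source record ID in the publishing system
   'Acme Corp');
```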

Primary keys (mandatory)

For matching entities, the primary key you load into the SD table identifies the source record from the publisher (specified by the B_PUBID publisher code). This key enables the consolidation process to account for the creation or update of a source record and report corresponding data changes to the golden record.

For basic entities, the primary key provided in the SA table is that of the golden record. It enables updates to these records.

The primary key column to load varies based on the entity type.

Basic entity

If the entity is basic, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

ID-matched entity

If the entity uses ID matching, then this ID must be loaded in the column representing the attribute defined as the primary key attribute for the entity.

Fuzzy-matched entity

If the entity uses fuzzy matching, then you must load the value of the primary key from the source system into the B_SOURCEID column. If this primary key is a composite key, then you must concatenate the values of the composite primary key and load them into the B_SOURCEID column.

The B_SOURCEID column is a VARCHAR(128) column. Make sure to perform the appropriate conversions for loading this column.
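For instance, a composite source key could be concatenated into B_SOURCEID as follows (the SD_ORDER_LINE table, staging table, and key columns are hypothetical):

```sql
-- Concatenate the composite key (ORDER_NO, LINE_NO) with a separator,
-- casting each part to text so the result fits VARCHAR(128).
INSERT INTO <data_location_schema>.SD_ORDER_LINE
  (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, QUANTITY)
SELECT
  :load_id,
  'OrderLine',
  'ERP',
  CAST(ORDER_NO AS VARCHAR(50)) || '|' || CAST(LINE_NO AS VARCHAR(50)),
  QUANTITY
FROM STG_ORDER_LINES;
```

Choose a separator that cannot appear in the key values, so that distinct composite keys always produce distinct concatenated IDs.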

References (mandatory)

When loading data for entities connected by a reference relationship, you must load the referencing entity with the value of the referenced entity’s primary key. The specific columns to load vary based on the type of the referenced entity.

Reference to a basic entity

If the referenced entity is basic, you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

Reference to an ID-matched entity

If the referenced entity is ID-matched, you need to load the column representing the referencing attribute. This column is F_<Physical Name of the Reference To Role Name>.

For example, if Customer references Employee and this entity uses ID matching, then you must load into SD_CUSTOMER the F_EMPLOYEE column with the primary key of the source employee record referenced by each customer record.

Reference to a fuzzy-matched entity

If the referenced entity is fuzzy-matched, then you need to load two columns:

  • FS_<Physical Name of the Reference To Role Name>: load this column with the source ID of the referenced record.

  • FP_<Physical Name of the Reference To Role Name>: load this column with the code of the publisher of the referenced record.

These columns should be considered together. You should not load the FP_ column with a publisher code while leaving FS_ null, or vice versa.

For example, if Contact references Customer and this entity uses fuzzy matching, you must load the following columns into the SD_CONTACT table:

  • FP_CUSTOMER: code of the publisher providing the customer referenced by the given contact (e.g., MKT).

  • FS_CUSTOMER: source ID of the customer referenced by the given contact (e.g., 81239).
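Building on this Contact/Customer example, the insert could be sketched as follows (the CONTACT_NAME column and the literal values are illustrative):

```sql
INSERT INTO <data_location_schema>.SD_CONTACT
  (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID,
   FP_CUSTOMER, FS_CUSTOMER, CONTACT_NAME)
VALUES
  (:load_id, 'Contact', 'MKT', 'CT-0042',
   'MKT',    -- publisher of the referenced customer record
   '81239',  -- source ID of the referenced customer record
   'Jane Smith');
```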

Attribute columns

Ensure that you populate the attribute columns relevant to the entity you are loading, that is:

  • The attribute columns that are appropriate for the entity class (B_CLASSNAME) being loaded.

  • The mandatory attribute columns. Failing to do so may result in pre-consolidation validation rejecting source records with null values.

  • The columns for attributes that use a list-of-values (LOV) type. Loading values outside the LOV range may cause pre-consolidation validation to reject the source records.

Attributes may be loaded with null or incorrect values if the values are set or modified by enrichers. Enrichers are executed before validations.
Loading Date and Timestamp columns
  • Date attributes convert to Date columns in the database, with different behaviors depending on the database system:

    • In PostgreSQL and SQL Server, dates do not include a time portion and do not require specific transformation.

    • In Oracle, dates include a time portion. This portion is automatically truncated by the certification job using the TRUNC(date) function. Integration specialists should be aware of this automatic truncation and consider performing this truncation in their integration processes if needed.

    • In Snowflake, dates do not require specific transformations, similar to PostgreSQL and SQL Server. However, Snowflake defaults to the America/Los_Angeles timezone, which can affect how date and timestamp values are interpreted. Integration specialists should account for potential timezone differences and align the timezone settings between Snowflake and the xDM application.

  • Timestamp attributes convert to Timestamp columns that include a timezone portion. xDM stores and expects timestamps in the timezone of the application server hosting the xDM application. When loading timestamps, integration specialists should account for timezone differences and potential conversions between their integration component and the data location’s database server.

Other columns (optional)

The following columns do not need to be loaded, or can be optionally loaded:

  • B_ORIGINALBATCHID: this column is not used for external loads and should not be loaded.

  • B_CREATOR, B_UPDATOR: these columns can be optionally loaded to store the users who have created or updated the records in the source systems. If left null, these columns are automatically set to the name of the user who submits the batch.

  • B_CREDATE, B_UPDDATE: these columns can be optionally loaded to store the creation and update date (timestamp) of the records in the source systems. It is recommended to populate these columns with the current date. Values loaded in these columns are not propagated beyond the SD table.

Submit a load

Submitting an external load uses the following parameters:

  • load_id: the load ID returned by the load initialization.

  • integration_job: the name of the integration job to process this load.

  • user_name: the name of the user who has initialized the external load. This user may or may not be defined in the security realm of the application server.

The SUBMIT_LOAD function call submits an external load identified by its load ID and returns a batch ID. It is performed on the repository database schema.

Example 5. Submitting a load (Oracle)
vBatch_id := <repository_schema>.INTEGRATION_LOAD.SUBMIT_LOAD(
	<load_id>,		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<integration_job>',	/* Name of the integration job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 6. Submitting a load (PostgreSQL)
vBatch_id := <repository_schema>.submit_load(
	<load_id>,		/* Load ID returned by get_new_loadid */
	'<integration_job>',	/* Name of the integration job to trigger. */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 7. Submitting a load (SQL Server)
DECLARE
	@return_value int,
	@return_value_argument numeric(18,0)
EXEC @return_value = dbo.SUBMIT_LOAD
	@return_value_argument = @return_value_argument OUTPUT,
	@P_LOAD_ID = <LoadID from GetNewLoad>,
	@P_INTEGRATION_JOB_NAME = 'JobName',
	@P_USERNAME = 'semadmin'

SELECT @return_value_argument

The following example demonstrates a function call for submitting an external load identified by load ID 22. The external load is submitted with the job name INTEGRATE_DATA and the repository schema is REPO. The returned batch ID is stored in the vBatch_id variable.

Example 8. Submitting an external load identified by the load ID 22 (Oracle)
vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
	22,
	'INTEGRATE_DATA',
	'John Doe' );
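Putting the steps together, a complete Oracle load might look like the following sketch (the DATA_HUB schema, SD_CUSTOMER columns, and literal values are illustrative assumptions):

```sql
DECLARE
  vLoad_id  NUMBER;
  vBatch_id NUMBER;
BEGIN
  -- 1. Initialize the external load and get a load ID.
  vLoad_id := REPO.INTEGRATION_LOAD.GET_NEW_LOADID(
      'CustomerMDMHub', 'Custom ETL', 'Daily customer load', 'John Doe');

  -- 2. Load the SD tables on the data location schema.
  INSERT INTO DATA_HUB.SD_CUSTOMER
    (B_LOADID, B_CLASSNAME, B_PUBID, B_SOURCEID, CUSTOMER_NAME)
  VALUES (vLoad_id, 'Customer', 'CRM', '81239', 'Acme Corp');
  COMMIT;

  -- 3. Submit the load to the integration job; a batch ID is returned.
  vBatch_id := REPO.INTEGRATION_LOAD.SUBMIT_LOAD(
      vLoad_id, 'INTEGRATE_DATA', 'John Doe');
END;
/
```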

Cancel a load

Canceling a load is performed using the CANCEL_LOAD function with the following parameters:

  • load_id: the load ID returned by the load initialization.

  • user_name: the name of the user who has initialized the external load. This user may or may not be defined in the application server’s security realm.

The CANCEL_LOAD procedure cancels an external load identified by its load ID. It is performed on the repository database schema.

This procedure does not flush the content of the SD_% tables loaded during the external load. This must be handled separately.
Example 9. Cancelling a load (Oracle)
<repository_schema>.INTEGRATION_LOAD.CANCEL_LOAD(
	<load_id>,		/* Load ID returned by INTEGRATION_LOAD.GET_NEW_LOADID */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 10. Cancelling a load (PostgreSQL)
<repository_schema>.cancel_load(
	<load_id>,		/* Load ID returned by get_new_loadid */
	'<user_name>'		/* User who has initialized the load. */
	  );
Example 11. Cancelling a load (SQL Server)
EXEC <repository_database>.dbo.CANCEL_LOAD
	@P_LOAD_ID = @LOAD_ID,
	@P_USERNAME = '<user_name>'	/* User who has initialized the load. */
Example 12. Calling INTEGRATION_LOAD.CANCEL_LOAD to cancel an external load identified by the load ID 22 (Oracle)
REPO.INTEGRATION_LOAD.CANCEL_LOAD(
	22,
	'John Doe' );