Publish data using the REST API

The REST API facilitates the publication of data changes and deletions to a data hub.

Overview

The REST API provides the capabilities to manage loads, which includes querying, creating, submitting, and canceling loads. It also supports the persistence of records within existing loads. Additionally, it offers a streamlined option to load and submit data in a single request.

Query existing loads

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name] to list loads. [base_url]/load-count/[data_location_name] to count loads.

Supported parameters

$offset: defines the results offset for pagination for the list of loads.
$limit: defines the maximum number of returned results for the list of loads.
$batchId: limits results to loads with a given batch ID.
$loadStatus: limits results to loads in a given status.
$jobNamePattern: limits results to loads whose job name matches the pattern. Use the _ and % wildcards to represent one or any number of characters.
$queueNamePattern: limits results to loads running in a queue whose name matches the pattern. Use the _ and % wildcards to represent one or any number of characters.
$programNamePattern: limits results to loads created with a program name matching the pattern. Use the _ and % wildcards to represent one or any number of characters.
$loadCreator: limits results to loads with a given creator.
$batchSubmitter: limits results to loads with a given submitter.
$loadDescriptionPattern: limits results to loads whose description matches the pattern. Use the _ and % wildcards to represent one or any number of characters.
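These filters are ordinary query-string parameters. The following minimal Python sketch builds the list and count URLs using only the standard library. The host, port, data location name (CustomerMDM), and the helper name loads_url are hypothetical, and authentication is not shown.

```python
from urllib.parse import quote, urlencode


def loads_url(base_url: str, data_location: str, count: bool = False, **filters) -> str:
    """Build the URL that lists (or, with count=True, counts) loads.

    Filters are passed without the leading "$", for example loadStatus="DONE";
    filters set to None are dropped.
    """
    resource = "load-count" if count else "loads"
    url = f"{base_url.rstrip('/')}/{resource}/{quote(data_location)}"
    query = {f"${name}": value for name, value in filters.items() if value is not None}
    if query:
        # safe="$" keeps the "$" prefix readable; a "%" wildcard is sent as "%25".
        url += "?" + urlencode(query, safe="$")
    return url


base = "http://localhost:8088/semarchy/api/rest/"  # hypothetical host and port
print(loads_url(base, "CustomerMDM", limit=10, loadStatus="DONE"))
# http://localhost:8088/semarchy/api/rest/loads/CustomerMDM?$limit=10&$loadStatus=DONE
print(loads_url(base, "CustomerMDM", count=True, jobNamePattern="INTEGRATE%"))
# http://localhost:8088/semarchy/api/rest/load-count/CustomerMDM?$jobNamePattern=INTEGRATE%25
```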

Response format

The response includes either a load count or a list of loads that meet the specified criteria. The details returned for each load vary based on its status.

Query existing loads: sample response

{
  "loads": [
    {
      "loadId": 21,
      "loadStatus": "DONE",
      "loadCreator": "semadmin",
      "loadCreationDate": "2019-04-01T15:47:51.920Z",
      "programName": "curl",
      "loadDescription": "Customer Load",
      "loadSubmitDate": "2019-04-01T15:47:52.137Z",
      "batchSubmitter": "semadmin",
      "batchId": 21,
      "integrationJobName": "INTEGRATE_DATA",
      "integrationJobQueueName": "Default",
      "integrationJob": {
        "startDate": "2019-04-01T15:47:54.537Z",
        "completionDate": "2019-04-01T15:48:39.483Z",
        "duration": 44946,
        "currentTask": {
          "name": "Insert new MH for modified MD records",
          "startDate": "2019-04-01T15:48:39.244Z",
          "duration": 239
        },
        "notificationStatus": "DONE"
      },
      "numberOfJobExecutions": 1,
      "submitInterval": -1,
      "submittable": true,
      "loadType": "EXTERNAL_LOAD"
    }
  ]
}

Load type

loadType indicates the nature of a load:

An external load (EXTERNAL_LOAD), which can be submitted (submittable=true).
A continuous load (CONTINUOUS_LOAD), which cannot be manually submitted but is automatically submitted every submitInterval seconds.
A load attached to application activities, which cannot be submitted:
- WORKFLOW_SUBMIT corresponds to a submit action performed by a workflow.
- LEGACY_WORKFLOW corresponds to a submit action performed by a legacy workflow. It replaces the WORKFLOW load type, which is deprecated.
- DIRECT_AUTHORING corresponds to a submit action performed by a direct authoring action.
- DIRECT_DELETE corresponds to a submit action performed by a direct delete action.
- DIRECT_DUPS_CONFIRM corresponds to a submit action performed by a direct duplicate confirmation action.
- DIRECT_DUPS_MANAGEMENT corresponds to a submit action performed by a direct duplicate manager action.

Load status

The table below enumerates the potential load statuses.

Load status	Description
CANCELED	The load has been canceled (`CancelLoad`).
DONE	The job completed successfully with no validation errors.
ERROR	The job did not complete successfully; it was canceled by an administrator.
PENDING	The load has been submitted. A batch was created and is waiting for the batch poller to pick it up.
PROCESSING	The batch’s job is currently being processed by the engine.
RUNNING	The load is currently running.
SCHEDULED	The batch has been taken into account by the batch poller. The job is queued by the engine.
STOPPED	The job has been canceled.
SUSPENDED	The job is suspended—either by an administrator or due to an error. Administrator intervention is required.
WARNING	The job completed successfully, but some records have caused validation errors.

Integration job

If a job is attached to a load that was submitted, then the integrationJob object provides details about this job, including its start date, current task, duration, and any error that may occur during its execution.
It also includes notificationStatus, which indicates whether notifications were successfully sent.

currentTask corresponds to the ongoing task for RUNNING jobs, and the last executed task for KILLED or SUSPENDED jobs.

Use the API to monitor jobs that are SUSPENDED or in ERROR status to report possible integration issues. Combine this endpoint with the capability to manage loads to automate job restarts.
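As a starting point for that kind of monitoring, the minimal Python sketch below scans an already-parsed list-loads response for loads in SUSPENDED or ERROR status and reports the task each one stopped on. Fetching the response and restarting jobs are out of scope. The helper name is hypothetical, and load 22 in the sample data is invented. The same filtering could also be done on the server side with the $loadStatus parameter.

```python
# Load statuses that, per the table above, point to a job needing investigation.
ATTENTION_STATUSES = {"SUSPENDED", "ERROR"}


def loads_needing_attention(list_response: dict) -> list:
    """Return a short report line for each SUSPENDED or ERROR load in a list-loads response."""
    reports = []
    for load in list_response.get("loads", []):
        status = load.get("loadStatus")
        if status not in ATTENTION_STATUSES:
            continue
        # For a suspended job, currentTask is the last executed task.
        task = load.get("integrationJob", {}).get("currentTask", {}).get("name", "unknown task")
        reports.append(f"load {load['loadId']} ({status}): last task '{task}'")
    return reports


sample_response = {
    "loads": [
        {"loadId": 21, "loadStatus": "DONE"},
        {
            "loadId": 22,
            "loadStatus": "SUSPENDED",
            "integrationJob": {"currentTask": {"name": "Insert new MH for modified MD records"}},
        },
    ]
}
for line in loads_needing_attention(sample_response):
    print(line)
# load 22 (SUSPENDED): last task 'Insert new MH for modified MD records'
```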

Query a load

Method

GET

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Response format

The response includes the load identified by [load_id or load_name]. The information returned for the load depends on its status.

Query one load: sample response

{
  "loadId": 26,
  "loadStatus": "CANCELED",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-04-08T09:35:46.897Z",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "DIRECT_AUTHORING"
}

Initialize a load

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]

Request payload

The request contains the CREATE_LOAD action, as well as the information required to create a new load.

Create a load: sample request

{
  "action":"CREATE_LOAD",
  "programName": "curl",
  "loadDescription": "Load Customers"
}

Response format

The response contains the load information, including the load ID, load type, and an indication of the status.

Create a load: sample response

{
  "loadId": 27,
  "loadStatus": "RUNNING",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:35:44.259Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}
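Creating a load and the actions that follow it (PERSIST_DATA, MASS_UPDATE_DATA, DELETE_DATA, SUBMIT) are all POST requests with a JSON body. Only the URL and payload change between them. The minimal Python sketch below builds such a request without sending it. The host, port, data location name (CustomerMDM), and helper name are hypothetical. Authentication is not covered on this page and is left out here.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def load_action_request(base_url: str, data_location: str, payload: dict, load_id=None) -> Request:
    """Build (without sending) a POST request carrying a load action payload.

    CREATE_LOAD targets [base_url]/loads/[data_location_name]; actions on an
    existing load append its ID (or, where allowed, its name).
    """
    url = f"{base_url.rstrip('/')}/loads/{quote(data_location)}"
    if load_id is not None:
        url += f"/{quote(str(load_id))}"
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = load_action_request(
    "http://localhost:8088/semarchy/api/rest/",
    "CustomerMDM",
    {"action": "CREATE_LOAD", "programName": "curl", "loadDescription": "Load Customers"},
)
print(req.get_method(), req.full_url)
# POST http://localhost:8088/semarchy/api/rest/loads/CustomerMDM
```

Sending the request with urllib.request.urlopen(req) and parsing the body would return the load information shown above. Its loadId then goes into the URL of later calls on that load.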

Load data

To load data into a specific load, the URL must include the load ID that was returned during the creation of the load, or the ID or name of a continuous load.

Data is loaded into GD tables for basic entities, and into MD tables for ID- and fuzzy-matched entities.

Using the REST API for bulk data loads is not recommended due to its inherent limitations in handling large volumes of data. The REST API is designed for fast processing of a few records, making it ideal for web services and other tasks requiring quick, responsive data handling.
In some cases, splitting the dataset into several smaller REST API calls can be a first approach to achieve faster processing, rather than processing everything in a single call. However, for optimal performance when dealing with bulk data, it is recommended to use SQL integration, which is better suited for managing large datasets.
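One way to split a dataset is to chunk the records and build one PERSIST_DATA payload per chunk. Each payload is then sent to the same load URL. In the minimal Python sketch below, the chunk size of 100, the helper name, and the generated customer records are arbitrary illustrations, not recommended values.

```python
def chunked_persist_payloads(entity: str, records: list, chunk_size: int, persist_options: dict) -> list:
    """Split records into PERSIST_DATA payloads holding at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        {
            "action": "PERSIST_DATA",
            "persistOptions": persist_options,
            "persistRecords": {entity: records[start:start + chunk_size]},
        }
        for start in range(0, len(records), chunk_size)
    ]


customers = [{"CustomerName": f"Customer {i}"} for i in range(250)]
payloads = chunked_persist_payloads("Customer", customers, 100, {"defaultPublisherId": "CRM"})
print([len(p["persistRecords"]["Customer"]) for p in payloads])  # [100, 100, 50]
```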

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Request payload

The request includes the PERSIST_DATA action, as well as the information required to load data. This information includes:

persistOptions, which defines how records are published into the hub. For more information, see Configure data loads.
persistRecords, which includes the records to be loaded into the hub.

Load data: sample request

{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Customer": {
        "enrichers": "JOB_PRE_CONSO",
        "validations": "ALL",
        "queryPotentialMatchesRules": "ALL",
        "queryPotentialMatchesBaseExpressions": "ID",
        "responsePayloadRecordsBaseExpressions": "VIEW_ATTRS"
      }
    },
    "missingIdBehavior": "GENERATE",
    "persistMode": "IF_NO_ERROR_OR_MATCH",
    "responsePayload": "RECORDS"
  },
  "persistRecords": {
    "Customer": [
      {
        "CustomerName": "Gadgetron"
      }
    ]
  }
}

Response format

The response includes, in the records element, the enriched records, failed validations, and potential matches (if any).
The response also includes the records' status and load details.

Load data: sample response

{
    "status": "PERSISTED",   (1)
    "load": {                (2)
        "loadId": 27,
        "loadStatus": "RUNNING",
        "loadCreator": "semadmin",
        "loadCreationDate": "2019-05-06T13:35:44.259Z",
        "programName": "curl",
        "loadDescription": "Load Customers",
        "numberOfJobExecutions": 0,
        "submitInterval": -1,
        "submittable": true
    },
    "records": {
        "Customer": [
            {
                "entityName": "Customer",
                "recordValues": {
                    "PublisherID": "CRM",
                    "SourceID": "5",
                    ...
                    "CustomerName": "Gadgetron",
                    "TotalRevenue": null,
                    "InputAddress.Address": null,
                    ...
                    "GeocodedAddress.Quality": null,
                    "FID_AccountManager": null
                },
                "failedValidations": [],
                "potentialMatches": []
            }
        ]
    }
}

1	The status is `PERSISTED` if the data has been persisted, or `PERSIST_CANCELLED` if the data has not (e.g., if validations failed, or if a match was found).
2	Load information, similar to the information returned when querying a load.
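A client typically checks status first and then looks at the records that failed validation. The minimal Python sketch below does both on a parsed response. This page does not describe the structure of a failedValidations entry, so the entries are returned as-is. The helper name is hypothetical, and the sample is a trimmed copy of the response above.

```python
def check_persist_response(response: dict) -> tuple:
    """Return (persisted, problems) for a PERSIST_DATA response.

    problems holds (entity, recordValues, failedValidations) for each returned
    record whose failedValidations array is not empty.
    """
    persisted = response.get("status") == "PERSISTED"
    problems = []
    # With responsePayload set to SUMMARY, there may be no "records" element.
    for entity, records in response.get("records", {}).items():
        for record in records:
            if record.get("failedValidations"):
                problems.append((entity, record.get("recordValues", {}), record["failedValidations"]))
    return persisted, problems


sample = {
    "status": "PERSISTED",
    "records": {"Customer": [{
        "entityName": "Customer",
        "recordValues": {"PublisherID": "CRM", "SourceID": "5", "CustomerName": "Gadgetron"},
        "failedValidations": [],
        "potentialMatches": [],
    }]},
}
print(check_persist_response(sample))  # (True, [])
```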

Configure data loads

Persist options

When loading one or more records, the following can be configured in the persistOptions element:

For each entity:

enrichers: defines the enrichers that should be executed before persisting the records. By default, no enricher is executed. Possible values are:

JOB_PRE_CONSO: runs all the enrichers configured with a pre-consolidation only or pre- and post-consolidation scope.
ALL: runs all enrichers defined for the entity (even those whose enrichment scope is set to None).

A list of enricher names in the following format:

[
  "<enricher_name>",
  ...
]

Similar to how step enrichers function in steppers, the enrichers specified in this section of the request payload are executed prior to the certification process. Consequently, enrichers with a pre- or post-consolidation scope will run twice. We recommend configuring them to produce consistent results after each execution.

validations: defines the validations that should be executed after the enrichers. By default, no validation is executed. Possible values are:
- JOB_PRE_CONSO: runs all validations configured with a pre-consolidation only or pre- and post-consolidation scope.
- ALL: runs all the validations defined for the entity (even those whose validation scope is set to None).
- A list of validations with their name and type in the following format:
  [ { "validationType": "<validation_type>", "validationName": "<Validation_name>" }, ... ]
  Possible validationType values are CHECK, PLUGIN, MANDATORY, LOV, FOREIGN, or UNIQUE.
queryPotentialMatchesRules: defines the match rules to use to detect potential matches. By default, no match detection is performed. Possible values are:
- ALL: runs all match rules defined for the entity.
- A list of match rule names, in the following format:
  [ "<match_rule_name>", ... ]
queryPotentialMatchesHighestScoreOnly: when set to true, only the match found with the highest match score is returned in the response. Otherwise, all matches found are returned.

The queryPotentialMatches parameter is deprecated and is replaced with queryPotentialMatchesRules and queryPotentialMatchesHighestScoreOnly.
queryPotentialMatchesBaseExpressions: defines the set of base attributes to include in the response for the potential matches found. Possible values are:
- NONE: no attributes.
- USER_ATTRS (default): all entity attributes, except the built-in attributes and references.
- VIEW_ATTRS: all entity attributes, except references, but including the built-in attributes.
- ID: only the identifier attributes. Depending on the entity type and the matched record’s location, returns the publisher ID, source ID, golden-record ID, and/or primary key.
queryPotentialMatchesExpressions: expressions to include in the response for the potential matches found, in addition to the base attributes (queryPotentialMatchesBaseExpressions). These expressions are in the following format: [alias]:[semql_expression].
responsePayloadRecordsBaseExpressions: defines the set of base attributes to include in the response for the persisted records. Possible values are:
- NONE: no attributes.
- USER_ATTRS (default): all entity attributes, except the built-in attributes and references.
- VIEW_ATTRS: all entity attributes, except references, but including the built-in attributes.
- ID: only the identifier attributes. Depending on the entity type, PublisherID, SourceID, and/or the primary key are returned.

responsePayloadRecordsExpressions: expressions to include in the response for the persisted records, in addition to the base attributes (responsePayloadRecordsBaseExpressions). These expressions are in the following format: [alias]:[semql_expression].

The SemQL view within which queryPotentialMatchesExpressions and responsePayloadRecordsExpressions are executed determines the scope of data these expressions can interact with and retrieve.

When loading data through the REST API, queryPotentialMatchesExpressions searches both the MD and SD tables, while responsePayloadRecordsExpressions executes either within the SD or SA views, depending on the entity type.
When using the REST API to certify a single record, queryPotentialMatchesExpressions exclusively looks for potential matches within the MD table.

For the entire load:
- missingIdBehavior: option to define whether to generate IDs when they are not provided in the payload. Possible values are GENERATE to generate the ID or FAIL to halt loading if the ID is missing.
- persistMode: defines whether the records should be persisted or not. Possible values are:
  - IF_NO_ERROR_OR_MATCH (default): persists a record if no validation error was raised and no potential match was found.
  - ALWAYS: always persists a record.
  - NEVER: never persists a record.
- responsePayload: specifies the content of the response payload. Possible values are:
  - RECORDS (default): details of the persisted records.
  - SUMMARY: count of persisted records.
  - SUMMARY_AND_RECORDS: count and details of the persisted records.
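Because most of these options take a fixed set of values, a client can reject typos before calling the API. The following minimal Python sketch assembles a persistOptions element and checks the enumerated values listed above. The helper and constant names are hypothetical, and so is the validation name ValidCustomerName. Use the names defined in your own model.

```python
# Allowed values for the load-level options, as listed above.
LOAD_OPTION_VALUES = {
    "missingIdBehavior": {"GENERATE", "FAIL"},
    "persistMode": {"IF_NO_ERROR_OR_MATCH", "ALWAYS", "NEVER"},
    "responsePayload": {"RECORDS", "SUMMARY", "SUMMARY_AND_RECORDS"},
}
BASE_EXPRESSION_VALUES = {"NONE", "USER_ATTRS", "VIEW_ATTRS", "ID"}
VALIDATION_TYPES = {"CHECK", "PLUGIN", "MANDATORY", "LOV", "FOREIGN", "UNIQUE"}


def build_persist_options(options_per_entity: dict, **load_options) -> dict:
    """Assemble a persistOptions element, rejecting enumerated values not listed above."""
    for name, value in load_options.items():
        allowed = LOAD_OPTION_VALUES.get(name)  # e.g. defaultPublisherId is not enumerated
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    for entity, options in options_per_entity.items():
        for key in ("queryPotentialMatchesBaseExpressions", "responsePayloadRecordsBaseExpressions"):
            if key in options and options[key] not in BASE_EXPRESSION_VALUES:
                raise ValueError(f"{entity}: {key}={options[key]!r} is not allowed")
        validations = options.get("validations")
        if isinstance(validations, list):  # a list of {validationType, validationName}
            for validation in validations:
                if validation.get("validationType") not in VALIDATION_TYPES:
                    raise ValueError(f"{entity}: unknown validationType in {validation!r}")
    return {"optionsPerEntity": options_per_entity, **load_options}


persist_options = build_persist_options(
    {
        "Customer": {
            "enrichers": "JOB_PRE_CONSO",
            # Hypothetical validation name.
            "validations": [{"validationType": "CHECK", "validationName": "ValidCustomerName"}],
            "queryPotentialMatchesRules": "ALL",
            "responsePayloadRecordsBaseExpressions": "ID",
        }
    },
    defaultPublisherId="CRM",
    missingIdBehavior="GENERATE",
    persistMode="NEVER",  # dry run: records are checked but not persisted
)
```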

Update existing records

The REST API allows updating golden records for basic entities and master records for ID- and fuzzy-matched entities.

Single update

An existing record is updated by providing its ID during the data loading process:

For basic entities, provide the ID attribute of the record to be updated.
For ID- and fuzzy-matched entities, provide the source ID and publisher ID of the master record to be updated.

If a record is persisted in a load with an existing record ID, a copy of the record is checked out. Changes are then applied only to the fields specified in the request body, while other attributes retain their current values.
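Because unspecified attributes keep their current values, an update payload only needs the record's identifier and the attributes that change. The minimal Python sketch below builds such a payload. It assumes Customer is a matched entity, because the sample response above identifies Customer records by PublisherID and SourceID. The TotalRevenue value and the helper name are illustrative. The check that refuses to overwrite ID attributes is a choice made for this sketch, not an API rule.

```python
from typing import Optional


def partial_update_payload(entity: str, record_id: dict, changes: dict,
                           persist_options: Optional[dict] = None) -> dict:
    """Build a PERSIST_DATA payload that updates only the attributes in `changes`.

    record_id identifies the existing record: its ID attribute for a basic entity,
    or PublisherID and SourceID for an ID- or fuzzy-matched entity.
    """
    overlap = record_id.keys() & changes.keys()
    if overlap:
        raise ValueError(f"changes must not overwrite ID attributes: {sorted(overlap)}")
    payload = {"action": "PERSIST_DATA", "persistRecords": {entity: [{**record_id, **changes}]}}
    if persist_options is not None:
        payload["persistOptions"] = persist_options
    return payload


payload = partial_update_payload(
    "Customer",
    {"PublisherID": "CRM", "SourceID": "5"},  # master record to update
    {"TotalRevenue": 125000},                 # only this attribute changes
)
print(payload["persistRecords"])
# {'Customer': [{'PublisherID': 'CRM', 'SourceID': '5', 'TotalRevenue': 125000}]}
```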

Mass-update

With the MASS_UPDATE_DATA action, the endpoint can be used to update multiple records within the same payload.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Request payload

The request includes the MASS_UPDATE_DATA action, as well as a massUpdateOptions element with the information required to update data. This element includes:

responsePayload specifies the content of the response payload:
- RECORDS (default): details of persisted records.
- SUMMARY: count of persisted records.
- SUMMARY_AND_RECORDS: count and details of persisted records.
persistMode defines whether the records should be persisted or not.

For each entity (an element named after the entity under optionsPerEntity), the following options can be set:

updateCondition: defines the SemQL expression used to qualify the set of records to update.
updateValues: sets literal values for the attributes to update for the entity. At least one of the properties updateValues or updateExpressions is required, and an attribute must not appear in both.
updateExpressions: sets SemQL expressions that compute the new values of the attributes to update for the entity.
enrichers: defines the enrichers that should be executed after the mass-update. By default, no enricher is executed.
Possible values are:
- JOB_PRE_CONSO: runs all the enrichers configured with a pre-consolidation only or pre- and post-consolidation scope.
- ALL: runs all enrichers defined for the entity.
- A list of enricher names.
validations: defines the validations that should be executed after the enrichers. By default, no validation is executed.
Possible values are:
- JOB_PRE_CONSO: runs all validations configured with a pre-consolidation only or pre- and post-consolidation scope.
- ALL: runs all the validations defined for the entity.
- A list of validations with their name and type.
queryPotentialMatchesRules: defines the match rules to use to query potential matches. By default, no match detection is performed. Possible values are:
- ALL: runs all the match rules defined for the entity.
- A list of match rule names.
queryPotentialMatchesHighestScoreOnly: when set to true, only the match found with the highest match score is returned in the response. Otherwise, all matches found are returned.

The queryPotentialMatches parameter is deprecated and is replaced with queryPotentialMatchesRules and queryPotentialMatchesHighestScoreOnly.
queryPotentialMatchesBaseExpressions: defines the set of base attributes to include in the response for the potential matches found. Possible values are:
- NONE: no attributes.
- USER_ATTRS (default): all entity attributes, except the built-in attributes and references.
- VIEW_ATTRS: all entity attributes, except references, but including the built-in attributes.
- ID: only the identifier attributes. Depending on the entity type and the matched record’s location, returns the publisher ID, source ID, golden-record ID, and/or primary key.
queryPotentialMatchesExpressions: expressions to include in the response for the potential matches found, in addition to the base attributes (queryPotentialMatchesBaseExpressions). These expressions are in the following format: [alias]:[semql_expression].
responsePayloadRecordsBaseExpressions: defines the set of base attributes to include in the response for the mass-updated records. Possible values are:
- NONE: no attributes.
- USER_ATTRS (default): all entity attributes, except the built-in attributes and references.
- VIEW_ATTRS: all entity attributes, except references, but including the built-in attributes.
- ID: only the identifier attributes. Depending on the entity type, PublisherID, SourceID, and/or the primary key are returned.
responsePayloadRecordsExpressions: expressions to include in the response for the mass-updated records, in addition to the base attributes (responsePayloadRecordsBaseExpressions). These expressions are in the following format: [alias]:[semql_expression].

Mass-update data: sample request

{
  "action": "MASS_UPDATE_DATA",
  "massUpdateOptions": {
    "responsePayload": "SUMMARY",
    "persistMode": "IF_NO_ERROR_OR_MATCH",
    "optionsPerEntity": {
      "Customer": {
        "updateCondition": "Email is not null",
        "updateValues": {
          "Email": "NA"
        },
        "enrichers": [
          "CleanseEmail"
        ],
        "validations": [],
        "queryPotentialMatchesRules": "ALL"
      }
    }
  }
}
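The rules for updateValues and updateExpressions can be enforced on the client side before sending. The minimal Python sketch below builds a MASS_UPDATE_DATA payload, requires at least one of the two properties, and rejects any attribute present in both. It uses the plural spelling updateExpressions from the constraint above. The helper name is hypothetical. Other entity options, such as enrichers, are passed through unchanged.

```python
from typing import Optional


def mass_update_payload(entity: str, update_condition: str,
                        update_values: Optional[dict] = None,
                        update_expressions: Optional[dict] = None,
                        response_payload: str = "RECORDS",
                        persist_mode: str = "IF_NO_ERROR_OR_MATCH",
                        **entity_options) -> dict:
    """Build a MASS_UPDATE_DATA payload for one entity."""
    update_values = update_values or {}
    update_expressions = update_expressions or {}
    if not update_values and not update_expressions:
        raise ValueError("set updateValues, updateExpressions, or both")
    in_both = update_values.keys() & update_expressions.keys()
    if in_both:
        raise ValueError(f"attributes in both updateValues and updateExpressions: {sorted(in_both)}")
    options = {"updateCondition": update_condition, **entity_options}
    if update_values:
        options["updateValues"] = update_values
    if update_expressions:
        options["updateExpressions"] = update_expressions
    return {
        "action": "MASS_UPDATE_DATA",
        "massUpdateOptions": {
            "responsePayload": response_payload,
            "persistMode": persist_mode,
            "optionsPerEntity": {entity: options},
        },
    }


# Rebuilds the sample request above.
payload = mass_update_payload(
    "Customer", "Email is not null",
    update_values={"Email": "NA"},
    response_payload="SUMMARY",
    enrichers=["CleanseEmail"], validations=[], queryPotentialMatchesRules="ALL",
)
```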

Response format

The response includes the status of the request, the load information, and a summary or the list of all the records updated as part of this request.

Mass-update data: sample response

{
  "status": "PERSISTED",
  "load": {                (1)
  ...
  },
  "massUpdateSummary": {
    "Customer": {
      "recordsPersisted": 25,
      "recordsWithFailedValidations": 4,
      "recordsWithPotentialMatches": 0
    }
  }
}

1	Load information, similar to the information returned when querying a load.

Use the restApiMassUpdateFetchBatchSize system property to change the fetch batch size when mass updating records. The default value is 1,000.

Set the auditing fields

By default, the auditing fields (i.e., Creator, Updator, CreateDate, and UpdateDate) are automatically set to the current username and date when a user publishes data.

However, users with specific privileges may manually set these values, enabling actions like backdating or publishing data on behalf of other users.

Users whose role has been configured with the Allow publishing as user in API option enabled in a model privilege grant can set the auditing fields when publishing data via the REST API.

Enrich, validate, and detect matches

When loading or updating data, enrichers, validations, and matchers can be executed for each entity.

Enrich and validate

For each entity to configure, one element—named after the entity—can be defined under the optionsPerEntity element. For each entity, it is possible to:

Specify a list of enrichers to run, with their enricher names.
Specify a list of validations to execute, with their validationType and validationName.

Detect matches

When loading data, queryPotentialMatchesRules can be used to specify whether the platform should check for duplicates according to the matching rules defined for the entity.

When checking for duplicates, the response includes master records that potentially match an incoming record. This helps identify the reasons for the matches.

The attributes returned can be defined to only include those required for a specific use case, with two additional properties:

queryPotentialMatchesBaseExpressions defines the set of base attributes to include in the master records detected as potential matches:
- NONE: no attributes.
- USER_ATTRS: all attributes, except built-in attributes and references.
- VIEW_ATTRS: all attributes, except references, but including built-in attributes.
- ID: only the identifier attributes. Depending on the entity type and the matched record’s location, returns the publisher ID, source ID, golden-record ID, and/or primary key.
queryPotentialMatchesExpressions defines a list of expressions to return in addition to the set of base attributes. These expressions are in the following format: <alias>:<semql_expression>.

For example, the following request looks for potential matches.

Request for testing potential matches for the Person entity

{
  "action":"PERSIST_DATA",
  "persistOptions": {
    "defaultPublisherId": "CRM",
    "optionsPerEntity": {
      "Person": {
        "queryPotentialMatchesRules": ["SameExactEmailMatchName"],  (1)
        "enrichers":["CleanseEmail"],
        "validations":[],
        "queryPotentialMatchesBaseExpressions": "NONE", (2)
        "queryPotentialMatchesExpressions": { (2)
          "Name": "Concat(FirstName, ' ', LastName)",
          "Email": "CleansedEmail",
          "Golden ID": "Gold_ID",
          "Master ID": "ID"
        }
      }
    },
    "missingIdBehavior": "FAIL",
    "persistMode": "NEVER"   (3)
  },
  "persistRecords": {
    "Person": [
      {
        "SourceID": "99998",
        "FirstName": "John",
        "LastName": "Doe",
        "DateOfBirth": "1974-01-25",
        "SourceEmail": "jass@ellerbusch.com"
      }
    ]
  }
}

1	Trigger potential match detection for records using one match rule. Since the matcher uses enriched values, the CleanseEmail enricher is also specified so that it runs before matching.
2	Select the information returned for the potential matches. Since `queryPotentialMatchesBaseExpressions` is set to `NONE`, only the expressions defined in the `queryPotentialMatchesExpressions` are returned.
3	This value for `persistMode` never persists records. This call only finds potential matches.

The response to this request is as follows:

Response with one potential match found

{
  "status": "PERSIST_CANCELLED",
  "load": {
    ...
  },
  "records": {
    "Person": [
      {
        "entityName": "Person",
        "recordValues": {
          "CleansedEmail": "jass@ellerbusch.com",
          "SourceEmail": "jass@ellerbusch.com",
          "DateOfBirth": "1974-01-25",
          "FirstName": "John",
          "SourceID": "99998",
          "PublisherID": "CRM",
          "LastName": "Doe",
          ...
        },
        "failedValidations": [],
        "potentialMatches": [
          {
            "matchRuleName": "ExactEmailMatch",
            "matchScore": 74,
            "matchedRecordLocation": "MD",
            "matchedRecordId": {
              "SourceID": "1320830",
              "PublisherID": "CRM"
            },
            "matchedRecordData": {
              "Name": "Jass Ellerbusch",
              "Email": "jass@ellerbusch.com",
              "Golden ID": 10002,
              "Master ID": "CRM.1320830"
            }
          }
        ]
      }
    ]
  }
}
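To review such a response, a client can flatten the potential matches into rows. The minimal Python sketch below returns one row per match, sorted by descending match score. The helper name is hypothetical, and the sample is a trimmed copy of the response above. For entities without a SourceID, the second column is None.

```python
def potential_matches(response: dict) -> list:
    """List (entity, SourceID, matchRuleName, matchScore, matchedRecordId), highest score first."""
    rows = []
    for entity, records in response.get("records", {}).items():
        for record in records:
            source_id = record.get("recordValues", {}).get("SourceID")
            for match in record.get("potentialMatches", []):
                rows.append((entity, source_id, match["matchRuleName"],
                             match["matchScore"], match["matchedRecordId"]))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


response = {
    "status": "PERSIST_CANCELLED",
    "records": {"Person": [{
        "entityName": "Person",
        "recordValues": {"SourceID": "99998", "PublisherID": "CRM"},
        "failedValidations": [],
        "potentialMatches": [{
            "matchRuleName": "ExactEmailMatch",
            "matchScore": 74,
            "matchedRecordLocation": "MD",
            "matchedRecordId": {"SourceID": "1320830", "PublisherID": "CRM"},
        }],
    }]},
}
for row in potential_matches(response):
    print(row)
# ('Person', '99998', 'ExactEmailMatch', 74, {'SourceID': '1320830', 'PublisherID': 'CRM'})
```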

Data loading behavior

When invoked with a payload, the REST operation runs enrichers, validations, and matchers for each record, depending on the entity configuration.

It then returns:

The enriched data.
A list of validation errors (if any).
A list of potential matches detected by the matching rules.

Records may or may not be persisted at this stage, depending on the persistMode option:

If set to ALWAYS, records are persisted even with errors or potential matches.
If set to NEVER, records are not persisted. Use this option to perform a dry run to test your records.
If set to IF_NO_ERROR_OR_MATCH (default), records are persisted only if no validation error or potential match occurs.
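The three rules can be written as a small decision function. The Python sketch below only illustrates the per-record rules described above. It is not the server's implementation, and the function name is hypothetical.

```python
def would_persist(persist_mode: str, has_failed_validations: bool, has_potential_matches: bool) -> bool:
    """Mirror the persistMode rules above for a single record (illustration only)."""
    if persist_mode == "ALWAYS":
        return True
    if persist_mode == "NEVER":
        return False
    if persist_mode == "IF_NO_ERROR_OR_MATCH":
        return not (has_failed_validations or has_potential_matches)
    raise ValueError(f"unknown persistMode: {persist_mode!r}")


# Outcomes for (no error, no match), (no error, match), (error, no match), (error, match).
for mode in ("IF_NO_ERROR_OR_MATCH", "ALWAYS", "NEVER"):
    print(mode, [would_persist(mode, err, match) for err in (False, True) for match in (False, True)])
# IF_NO_ERROR_OR_MATCH [True, False, False, False]
# ALWAYS [True, True, True, True]
# NEVER [False, False, False, False]
```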

Publish deletions

The endpoint can be used to load data with the DELETE_DATA action and publish record deletions.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id or load_name]

Request payload

The request includes the DELETE_DATA action, as well as the information required to delete data. This information includes:

deleteOptions, which defines how records are deleted:
- The deleteType property defines whether to use HARD_DELETE or SOFT_DELETE.
- The recordType property defines whether to delete GOLDEN or MASTER records.
deleteRecords, which includes the IDs to identify the records to delete:
- To delete a fuzzy- or ID-matched golden record or a basic entity record: the golden-record ID.
- To delete a master record for a fuzzy-matched entity: the PublisherID, SourceID pair.
- To delete a master record for an ID-matched entity: the PublisherID, <id_attribute> pair.

Submitting any information other than these IDs in this property is considered an error.

Delete data: sample request to delete a golden record

{
  "action":"DELETE_DATA",
  "deleteOptions": {
    "deleteType": "SOFT_DELETE",
    "recordType": "GOLDEN"
  },
  "deleteRecords": {
    "Customer": [
      {
        "CustomerID": "123456"
      }
    ]
  }
}

Response format

The response includes the request status, load details, and a list of all the records deleted in this request (including child records deleted through cascading actions).

Delete data: sample response

{
  "status": "PERSISTED",   (1)
  "load": {                (2)
    ...
  },
  "records": {
    "Customer": [
      {
        "entityName": "Customer",
        "CustomerID": "123456",
        "status": "DELETABLE"   (3)
      },
      {
        "entityName": "Contact",
        "CustomerID": "778899",
        "status": "DELETABLE"
      },
      ...
    ]
  }
}

1	The status is `PERSISTED` if the deletion request has been persisted, or `PERSIST_CANCELLED` if it has not.
2	Load information, similar to the information returned when querying a load.
3	Record deletion status.

The record deletion status indicates whether the record can be deleted or not. Possible statuses are:

DELETABLE: the record is deletable and will be deleted if the load is submitted.
PERMISSION_DENIED: the record cannot be deleted due to a privilege issue.
CHILD_RESTRICTION: the record cannot be deleted due to a restriction on a child record.
RECORD_NOT_FOUND: the record is not found and cannot be deleted.
OTHER_ERROR: another error occurred during deletion.
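Before submitting a deletion load, a client can check which records would actually be deleted. This Python sketch counts the statuses in a DELETE_DATA response and lists the records that are not DELETABLE. The function names are hypothetical, and the code assumes the response shape of the sample above, where each entity maps to a list of records carrying a `status` field.

```python
from collections import Counter
from typing import Any, Dict, List


def deletion_summary(response: Dict[str, Any]) -> Counter:
    """Count record deletion statuses across all entities in the response."""
    counts: Counter = Counter()
    for entity_records in response.get("records", {}).values():
        for record in entity_records:
            counts[record["status"]] += 1
    return counts


def blocked_records(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the records that will NOT be deleted if the load is submitted."""
    return [
        record
        for entity_records in response.get("records", {}).values()
        for record in entity_records
        if record["status"] != "DELETABLE"
    ]
```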

Submit a load

To submit a load, the URL must include the load ID that was returned when the load was created.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request payload

The request includes the SUBMIT action, as well as the job to use when submitting the data.

Submit a load: sample request

{
  "action": "SUBMIT",
  "jobName": "INTEGRATE_DATA"
}

Response format

The response includes the load’s details, including the load ID, batch ID, and an indication of its status.

Submit a load: sample response

{
  "loadId": 27,
  "loadStatus": "PENDING",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:35:44.259Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "loadSubmitDate": "2019-05-06T13:39:02.807Z",
  "batchSubmitter": "semadmin",
  "batchId": 24,
  "integrationJobName": "INTEGRATE_DATA",
  "integrationJobQueueName": "Default",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}
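A minimal Python sketch of the submit call follows. `build_submit_request` and `submitted_batch` are hypothetical helpers; the default job name is the INTEGRATE_DATA job from the sample, and the job names available depend on your model. Authentication is not shown.

```python
import json
import urllib.request
from typing import Union


def build_submit_request(base_url: str, data_location: str,
                         load_id: Union[int, str],
                         job_name: str = "INTEGRATE_DATA") -> urllib.request.Request:
    """Build (but do not send) a SUBMIT request for an existing load."""
    payload = {"action": "SUBMIT", "jobName": job_name}
    url = f"{base_url.rstrip('/')}/loads/{data_location}/{load_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submitted_batch(response_body: bytes) -> int:
    """Return the batch ID from a submit response, to track the integration job."""
    return json.loads(response_body)["batchId"]
```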

Cancel a load

To cancel a load, the URL must include the load ID that was returned when the load was created.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request payload

The request includes only the CANCEL action.

Cancel a load: sample request

{
  "action":"CANCEL"
}

Response format

The response includes the load ID as well as an indication of the status.

Cancel a load: sample response

{
  "loadId": 28,
  "loadStatus": "CANCELED",
  "loadCreator": "semadmin",
  "loadCreationDate": "2019-05-06T13:41:52.865Z",
  "programName": "curl",
  "loadDescription": "Load Customers",
  "numberOfJobExecutions": 0,
  "submitInterval": -1,
  "submittable": true,
  "loadType": "EXTERNAL_LOAD"
}
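Submit and cancel are often used together after a DELETE_DATA request: submit if every record can be deleted, cancel otherwise. The following Python sketch implements one such policy (an illustrative choice, not a product rule) and returns the payload to POST to `[base_url]/loads/[data_location_name]/[load_id]`. The function name is hypothetical.

```python
from typing import Any, Dict


def submit_or_cancel(delete_response: Dict[str, Any],
                     job_name: str = "INTEGRATE_DATA") -> Dict[str, str]:
    """Choose the next action payload for a load after a DELETE_DATA request.

    Illustrative policy: cancel the load unless the request was persisted and
    every record, including cascaded child records, is DELETABLE.
    """
    records = [r for recs in delete_response.get("records", {}).values() for r in recs]
    all_deletable = all(r.get("status") == "DELETABLE" for r in records)
    if delete_response.get("status") == "PERSISTED" and records and all_deletable:
        return {"action": "SUBMIT", "jobName": job_name}
    return {"action": "CANCEL"}
```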

Load and submit data

The REST API can create a load, load data (or request deletions), and submit the load in a single request.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]

Request payload

The request includes the CREATE_LOAD_AND_SUBMIT action, as well as the information required to create a new load, load data, and submit the load. This includes the persistOptions and persistRecords elements for loading data, and the deleteOptions and deleteRecords elements for requesting deletions.

Load and submit data: sample request

{
  "action": "CREATE_LOAD_AND_SUBMIT",
  "programName": "curl",
  "loadDescription": "Customer Load",
  "jobName": "INTEGRATE_DATA",
  "persistOptions": ...,   (1)
  "persistRecords": ...,   (1)
  "deleteOptions": ...,    (2)
  "deleteRecords": ...     (2)
}

1	For more information about `persistOptions` and `persistRecords`, see Configure data loads.
2	For more information about `deleteOptions` and `deleteRecords`, see Publish deletions.
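The following Python sketch assembles a CREATE_LOAD_AND_SUBMIT payload. It assumes that sections the caller does not need are simply omitted rather than sent as null; the helper names are hypothetical. This request targets `[base_url]/loads/[data_location_name]`, without a load ID, since the load is created by the request itself.

```python
import json
from typing import Any, Dict, Optional


def load_and_submit_url(base_url: str, data_location: str) -> str:
    """The load is created by the request, so the URL carries no load ID."""
    return f"{base_url.rstrip('/')}/loads/{data_location}"


def build_load_and_submit(
    program_name: str,
    description: str,
    job_name: str,
    persist_options: Optional[Dict[str, Any]] = None,
    persist_records: Optional[Dict[str, Any]] = None,
    delete_options: Optional[Dict[str, Any]] = None,
    delete_records: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the JSON body of a CREATE_LOAD_AND_SUBMIT request."""
    payload: Dict[str, Any] = {
        "action": "CREATE_LOAD_AND_SUBMIT",
        "programName": program_name,
        "loadDescription": description,
        "jobName": job_name,
    }
    sections = {
        "persistOptions": persist_options,
        "persistRecords": persist_records,
        "deleteOptions": delete_options,
        "deleteRecords": delete_records,
    }
    # Omit sections the caller did not supply instead of sending nulls.
    payload.update({k: v for k, v in sections.items() if v is not None})
    return json.dumps(payload)
```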

Response format

The response includes the request status, the records, and the load details (including the load ID).

Load and submit data: sample response

{
  "status": ...,    (1)
  "records": ...,   (2)
  "load": ...       (3)
}

1	`PERSISTED` if the data has been persisted, or `PERSIST_CANCELLED` if it has not (for example, if a match was found).
2	For more information about this payload, see Load data.
3	Load information, similar to the information returned when querying a load.
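Because the load details are only meaningful when the data was persisted, a client typically checks the `status` field first. This Python sketch returns the load details on `PERSISTED` and raises an error otherwise; the function name and the choice of exception are assumptions.

```python
import json
from typing import Any, Dict


def check_load_and_submit(response_body: str) -> Dict[str, Any]:
    """Return the load details if the data was persisted, else raise."""
    response = json.loads(response_body)
    status = response.get("status")
    if status != "PERSISTED":
        # PERSIST_CANCELLED: nothing was persisted, e.g. because a match was found.
        raise RuntimeError(f"data not persisted (status: {status})")
    return response["load"]
```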

Manage a load

After a load has been submitted, you can still manage it using the REST API.

Method

POST

Base URL

http://<host>:<port>/semarchy/api/rest/

URL

[base_url]/loads/[data_location_name]/[load_id]

Request payload

The request includes one of the SUSPEND, KILL, or RESUME actions to perform a management operation on the load.

Manage a load: sample request

{
  "action": "RESUME"
}

Response format

The response includes the load’s details and its new state. If the requested operation is not possible, an error is returned.

Combine this endpoint with load queries to automate production monitoring (for example, to automatically resume suspended jobs).
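As a sketch of such automation, the following Python code builds a query URL for loads in a given status (using the `$loadStatus` and `$limit` parameters of the load query endpoint) and one RESUME request per load returned. The helper names are hypothetical, and the status value you pass for suspended loads is an assumption: check the load statuses your server actually reports. Authentication is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Union

# Management actions accepted by the manage-a-load endpoint.
MANAGE_ACTIONS = {"SUSPEND", "KILL", "RESUME"}


def loads_query_url(base_url: str, data_location: str, load_status: str,
                    limit: int = 100) -> str:
    """Build a GET URL that lists loads in the given status."""
    # Keep '$' unescaped so the parameter names match the documented form.
    query = urllib.parse.urlencode(
        {"$loadStatus": load_status, "$limit": limit}, safe="$")
    return f"{base_url.rstrip('/')}/loads/{data_location}?{query}"


def manage_request(base_url: str, data_location: str,
                   load_id: Union[int, str], action: str) -> urllib.request.Request:
    """Build (but do not send) a SUSPEND, KILL, or RESUME request for a load."""
    if action not in MANAGE_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url.rstrip('/')}/loads/{data_location}/{load_id}"
    return urllib.request.Request(
        url,
        data=json.dumps({"action": action}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def resume_requests(base_url: str, data_location: str,
                    loads_response: Dict[str, Any]) -> List[urllib.request.Request]:
    """Build one RESUME request per load in a load query response."""
    return [
        manage_request(base_url, data_location, load["loadId"], "RESUME")
        for load in loads_response.get("loads", [])
    ]
```

For example, `loads_query_url("http://localhost:8088/semarchy/api/rest/", "CustomerMDM", "SUSPENDED")` would list loads reported with a hypothetical `SUSPENDED` status; the `{"loads": [...]}` body returned by that query can then be passed to `resume_requests`.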