Enrichers
Enrichers normalize, standardize, and enrich data loaded or authored in the hub. Additional post-consolidation enrichers can apply to consolidated data resulting from the match-and-merge process.
Overview
Enrichers have the following characteristics:
-
Several enrichers can be defined for a single entity and are executed sequentially according to their order in the model.
-
Enrichers can be enabled or disabled for integration jobs. When disabled for integration jobs, enrichers remain available for data authoring purposes.
-
Enrichers can be configured to run either on source data (i.e., pre-consolidation) or consolidated data (i.e., post-consolidation).
-
Enrichers running pre-consolidation apply to all incoming data, with the option to define filters so that only specific records are modified by the enricher.
Basic entities only use pre-consolidation enrichers. Post-consolidation enrichers will not run on basic entities.
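The characteristics above (sequential execution, a pre- or post-consolidation scope, and optional filters) can be pictured with a small Python sketch. This is an illustration only, not how the xDM engine is implemented: the `Enricher` class, the `run_enrichers` function, the `"pre"`/`"post"` phase names, and the sample attributes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Enricher:
    name: str
    scope: set          # subset of {"pre", "post"}; empty = not executed in jobs
    expressions: dict   # target attribute -> function(record) returning the new value
    record_filter: Optional[Callable[[dict], bool]] = None

    def apply(self, record: dict) -> dict:
        # Records rejected by the filter are returned unchanged.
        if self.record_filter is not None and not self.record_filter(record):
            return record
        # Every expression reads the record as it was before this enricher ran.
        updates = {attr: expr(record) for attr, expr in self.expressions.items()}
        return {**record, **updates}


def run_enrichers(enrichers: list, record: dict, phase: str) -> dict:
    """Apply, in list order, each enricher whose scope includes `phase`."""
    for enricher in enrichers:
        if phase in enricher.scope:
            record = enricher.apply(record)
    return record


# Hypothetical enrichers, listed in their (assumed) model order.
enrichers = [
    Enricher("StandardizeName", {"pre"},
             {"last_name": lambda r: r["last_name"].strip().upper()}),
    Enricher("BuildFullName", {"pre", "post"},
             {"full_name": lambda r: f"{r['first_name']} {r['last_name']}"}),
    Enricher("PrefixFrenchPhone", {"pre"},
             {"phone": lambda r: "+33" + r["phone"][1:]},
             record_filter=lambda r: r["country"] == "FR"),
]

source = {"first_name": "Ada", "last_name": " lovelace ",
          "phone": "0123456789", "country": "GB"}
enriched = run_enrichers(enrichers, source, "pre")
```

In this sketch, `BuildFullName` reads the last name already standardized by `StandardizeName` because it comes later in the list, and `PrefixFrenchPhone` leaves the sample record untouched because its filter only accepts records whose country is `FR`.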
There are two types of enrichers:
-
SemQL enrichers express the enrichment rule in the SemQL language. The hub’s database engine executes these enrichers.
-
API enrichers use Java plugins or REST clients. The Semarchy xDM engine runs such enrichers. API enrichers let you perform data transformations that cannot be carried out within the database, for example when they require an online API or an external library.
Create SemQL enrichers
A SemQL enricher enriches several attributes of an entity using attributes from this entity, transformed using SemQL expressions and functions.
To create a SemQL enricher:
-
Expand the entity node, right-click the Enrichers node, and select Add SemQL Enricher.
The Create New SemQL Enricher wizard opens.
-
In the Create New SemQL Enricher wizard, check the Auto Fill option, and then enter the following values:
-
Name: internal name of the object.
-
Label: user-friendly label for this object. As the Auto Fill option is selected, the Label field is automatically populated. Modifying this label is optional.
-
Click Next.
-
In the Enricher Expressions page, select the Available Attributes you want to enrich, and click the Add >> button to add them to the Used Attributes.
-
Click Next.
-
(Optional) Click the Edit Expression button to open the expression editor and define a filter targeting only a specific subset of records for enrichment. Skip this task if you want to enrich all records.
-
Click Finish to close the wizard.
The SemQL Enricher editor opens.
-
(Optional) In the Description field, enter a description for the SemQL enricher.
-
Select the Enrichment Scope for this enricher. Possible values are:
-
Pre-Consolidation Only
-
Post-Consolidation Only
-
Pre- and Post-Consolidation
-
None (i.e., not executed in jobs)
-
Define the enricher expressions:
-
In the Enricher Expressions table, select the Expression column for the attribute you want to enrich and then click the Edit Expression button. The SemQL editor opens.
-
Create a SemQL expression to load the attribute to enrich, and then click OK to close the SemQL editor.
-
Repeat the previous steps to set an expression for each attribute to enrich.
-
Press Control+S (or Command+S on macOS) to save the editor.
-
Close the editor.
When running multiple SemQL enrichers on the same entity, you can configure the Enricher aggregation in the integration jobs running these enrichers for faster processing.
Create API enrichers
An API enricher enriches and standardizes data in an entity, using values from this entity, which are transformed using a Java plugin or a REST client.
An API enricher has:
-
a list of inputs, which are mapped to source attributes or SemQL expressions.
-
a list of parameter values, for plugins only, which configure the plugin behavior.
-
a list of outputs, which are mapped to target attributes.
An API enricher receives the inputs and parameters, processes them and issues outputs which are then loaded into the target attributes.
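The flow of inputs, parameters, and outputs can be sketched in Python as follows. This is a conceptual model under stated assumptions, not the Semarchy plugin API: `phone_plugin`, `run_api_enricher`, the input name `PHONE`, the parameter `minDigits`, the outputs `DIGITS` and `VALID`, and the attribute names are all hypothetical.

```python
import re


def phone_plugin(inputs: dict, params: dict) -> dict:
    """Hypothetical plugin: keeps the digits of a phone number and checks its length."""
    digits = re.sub(r"\D", "", inputs["PHONE"] or "")
    return {"DIGITS": digits, "VALID": len(digits) >= params["minDigits"]}


# Inputs: plugin input name -> expression computed from the record's attributes.
input_bindings = {"PHONE": lambda record: record["contact_phone"]}

# Parameters: fixed values that configure the plugin (plugins only).
params = {"minDigits": 7}

# Outputs: target attribute -> name of the plugin output loaded into it.
output_bindings = {"std_phone": "DIGITS", "phone_ok": "VALID"}


def run_api_enricher(record, plugin, input_bindings, params, output_bindings):
    """Evaluate the inputs, call the plugin, and load its outputs into attributes."""
    inputs = {name: expr(record) for name, expr in input_bindings.items()}
    outputs = plugin(inputs, params)
    enriched = dict(record)
    for attribute, output_name in output_bindings.items():
        enriched[attribute] = outputs[output_name]
    return enriched


result = run_api_enricher(
    {"contact_phone": "(555) 010-9999"},
    phone_plugin, input_bindings, params, output_bindings,
)
```

Here `result` keeps the original `contact_phone` value and gains the two enriched attributes, `std_phone` and `phone_ok`.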
To create an API enricher:
-
Expand the entity node, right-click the Enrichers node, and select Add API Enricher.
The Create New API Enricher wizard opens.
-
In the Create New API Enricher wizard, select REST Client or Java Plugin.
-
Select the REST client or Java plugin in the drop-down list.
This list shows the built-in plugins and those installed in the platform, or the list of REST clients available in the platform.
-
Select the Auto Fill checkbox and then enter the following values:
-
Name: internal name of the object.
-
Label: user-friendly label for this object. As the Auto Fill option is selected, the Label field is automatically populated. Modifying this label is optional.
-
Click Next.
-
(Optional) Click the Edit Expression button to open the expression editor and define a filter targeting only a specific subset of records for enrichment. Skip this task if you want to enrich all records.
-
Click Finish to close the wizard.
The API Enricher editor opens. The Plugin Params, Inputs, and Outputs tables show the parameters (for a Java plugin only) and inputs/outputs for the selected Java plugin or REST client.
-
Select the Enrichment Scope for this enricher. Possible values are:
-
Pre-Consolidation Only
-
Post-Consolidation Only
-
Pre- and Post-Consolidation
-
None (i.e., not executed in jobs)
-
(Optional) For a Java plugin, the mandatory parameters are already listed in the Plugin Params table. Add any other parameters that you need to set:
-
In the Plugin Params table, click the Define Parameters button.
-
In the Parameters dialog, select the Available Parameters you want to add and click the Add >> button to add them to the Used Parameters.
-
Click Finish to close the dialog.
-
Set the values for the Java plugin parameters:
-
Click the Value column in the Plugin Params table in front of a parameter.
The cell becomes editable.
-
Enter the value of the parameter in the cell, and then press Enter.
-
Repeat the previous steps to set the value for the other parameters.
-
Define the Inputs of the enricher. For a Java plugin, the mandatory inputs are automatically listed in the Inputs table.
Add the inputs that you need to set for the enricher:
-
In the Inputs table, click the Define Inputs button.
-
In the Define Input Bindings dialog, select the Available Inputs you want to add and click the Add >> button to add them to the Used Inputs.
-
Click Finish to close the dialog.
-
Set the values for the inputs:
-
Click the Expression column in the Inputs table for an input and then click the Edit Expression button. The SemQL editor opens.
-
Edit the SemQL expression using the attributes to feed the plugin or REST client input and then click OK to close the SemQL Editor.
-
Repeat the previous steps to set an expression for other inputs.
-
Define the attributes to enrich in the Outputs table:
-
In the Outputs table, click the Define Outputs button.
-
In the Output Bindings dialog, select the Available Attributes you want to enrich and then click the Add >> button to add them to the Attributes Used.
-
Click Finish to close the dialog.
-
For each attribute in the Outputs table, use the Output Name column to select the plugin or REST client output that enriches that attribute.
-
(Optional) Define advanced configuration properties to optimize and configure the API enricher execution.
-
Press Control+S (or Command+S on macOS) to save the editor.
-
Close the editor.
Advanced enricher configuration
Plugins and REST clients
The enrichers using plugins and REST clients provide options for optimizing and configuring their execution.
The following properties appear in the Advanced Configuration section of the editor:
-
Max Retries: if the execution of the REST client or plugin fails, it is repeated this number of times.
-
Behavior on Error: if the execution still fails after the Max Retries have been attempted, the plugin or REST client either skips the current record, skips the entire task, or stops the whole job, depending on this property.
-
Thread Pool Size: this property defines the number of parallel threads used when running the plugin or REST client. For plugins, this option is taken into account only if the plugin used is thread-safe and declared as such.
-
Batch Update Size: this property defines the batch update size used by an enricher to write records to the database.
If Batch Update Size is left empty, the batch update size defaults to 1,000 to optimize performance.
-
Processing Batch Size: this property defines the size of the record batches processed by each thread of a Java plugin enricher. When configuring this option, bear in mind that records in a batch are processed together. If one record in a batch fails, the entire batch fails and all the records in this batch are processed according to the Max Retries and Behavior on Error properties. This property is not available for REST clients.
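The interplay between Max Retries, Behavior on Error, Thread Pool Size, and Processing Batch Size can be sketched in Python as follows. This is an illustrative model, not the engine's actual code: the function names, the `"skip_record"`, `"skip_task"`, and `"stop_job"` values, and the choice to leave skipped records unchanged are assumptions made for this example. Batch Update Size is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskSkipped(Exception):
    """Signals that the whole enricher task must be skipped."""


class JobStopped(Exception):
    """Signals that the whole job must stop."""


def chunks(records, size):
    """Split records into consecutive batches of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_batch(batch, plugin, max_retries, behavior_on_error):
    """Run the plugin on one batch; the batch succeeds or fails as a unit."""
    for _ in range(max_retries + 1):  # first attempt, then max_retries retries
        try:
            return [plugin(record) for record in batch]
        except Exception:
            continue  # one failing record fails the whole batch: try it again
    # Still failing after all retries: apply Behavior on Error.
    if behavior_on_error == "skip_record":
        return list(batch)  # assumption: skipped records are left unchanged
    if behavior_on_error == "skip_task":
        raise TaskSkipped()
    raise JobStopped()


def run_enricher(records, plugin, *, max_retries=0, behavior_on_error="stop_job",
                 thread_pool_size=1, processing_batch_size=1):
    """Process batches in parallel threads and return the records in input order."""
    batches = chunks(records, processing_batch_size)
    try:
        with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
            results = pool.map(
                lambda b: process_batch(b, plugin, max_retries, behavior_on_error),
                batches,
            )
            return [record for batch in results for record in batch]
    except TaskSkipped:
        return list(records)  # assumption: a skipped task changes no record


def to_upper(record):
    """Hypothetical thread-safe plugin; raises if the name is missing."""
    return {**record, "name": record["name"].upper()}


records = [{"name": "a"}, {"name": None}, {"name": "c"}, {"name": "d"}]
skipped = run_enricher(records, to_upper, behavior_on_error="skip_record",
                       thread_pool_size=2, processing_batch_size=2)
```

With a batch size of 2, the first batch fails because of the record without a name, so in this sketch the valid record `{"name": "a"}` is skipped along with it, while the second batch is enriched normally. A smaller Processing Batch Size limits how many valid records share the fate of a failing one.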
In addition, xDM comes with features to optimize the execution of the enrichers, including:
-
Enricher caching, which applies to API enrichers.
-
Enricher aggregation, which applies to both SemQL and API enrichers.