Enricher caching
Overview
API enrichers use libraries or external services to enrich and standardize entity data. Their performance depends on the processing speed and latency of the libraries and services they invoke. Caching speeds up these enrichers by storing, in the data location database, the outputs returned for a given set of inputs and parameters. Subsequent executions of the API enricher look for outputs in the cache before calling the library or service.
For example, when cleansing and geocoding addresses using a cloud-based service via a REST client, you may want to cache the cleansed and geocoded addresses to avoid requesting them multiple times for the same input addresses. This avoids the latency of the remote service call and reduces the cost of repeated calls for addresses that have already been geocoded.
You can create an enricher cache as a standalone object, or create one based on an existing API enricher.
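The lookup described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the in-memory dictionary stands in for the cache table in the data location, and the names `make_key`, `enrich_with_cache`, and `call_service` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key type: the real cache is a database table, not a dict.
CacheKey = Tuple[Tuple[Tuple[str, str], ...], Tuple[Tuple[str, str], ...]]


def make_key(inputs: Dict[str, str], params: Dict[str, str]) -> CacheKey:
    """Build a hashable key from the inputs and parameters of one call."""
    return (tuple(sorted(inputs.items())), tuple(sorted(params.items())))


def enrich_with_cache(
    inputs: Dict[str, str],
    params: Dict[str, str],
    cache: Dict[CacheKey, Dict[str, str]],
    call_service: Callable[[Dict[str, str], Dict[str, str]], Dict[str, str]],
) -> Dict[str, str]:
    """Return cached outputs when present; otherwise call the service and store the result."""
    key = make_key(inputs, params)
    if key in cache:
        # Cache hit: the remote service is not called.
        return cache[key]
    # Cache miss: call the service once and remember its outputs.
    outputs = call_service(inputs, params)
    cache[key] = outputs
    return outputs
```

With this sketch, two calls with the same address and parameters reach the service only once; a call with a different address is a cache miss and triggers a new service call.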
Create an enricher cache
To create an enricher cache from an API enricher:
- Open the API Enricher editor.
- In the editor toolbar, click the Create a cache for this enricher configuration button. The Create New Enricher Cache wizard opens. The REST client/Java plugin, the parameters, and the cached outputs are automatically configured.
- Enter the following values:
  - Name: Internal name of the object.
  - Cache Table Name: String used to name the physical table that stores the cache data in the data location. This name is prefixed with CT_ to create the actual table name.
  - Validity Duration: Number of hours for which the cached data is considered valid. After this duration, the data is considered stale. The value 0 means that cached items are valid forever.
  - Batch Update Size: Batch update groups cached items and writes them together to the cache in one batch, rather than writing them one by one.
- Press Control+S (or Command+S on macOS) to save the editor.
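The Validity Duration rule can be illustrated with a short sketch. The function name `is_stale` and its arguments are hypothetical, and treating an item whose age exactly equals the duration as still valid is an assumption; the product may handle that boundary differently.

```python
from datetime import datetime, timedelta


def is_stale(cached_at: datetime, validity_hours: int, now: datetime) -> bool:
    """Return True when a cached item is older than the validity duration.

    A validity of 0 means the item never expires.
    """
    if validity_hours == 0:
        return False
    # Assumption: an item is stale only once its age exceeds the duration.
    return now - cached_at > timedelta(hours=validity_hours)
```

For example, with a validity of 24 hours, an item cached 25 hours ago is stale, while one cached 23 hours ago is still valid.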
To create a standalone enricher cache:
- Right-click the Enricher Caches node and select Add Enricher Cache…. The Create New Enricher Cache wizard opens.
- In the Create New Enricher Cache wizard, select REST Client or Java Plugin.
- Select the REST client or Java plugin that you want to cache.
- Enter the following values:
  - Name: Internal name of the object.
  - Cache Table Name: String used to name the physical table that stores the cache data in the data location. This name is prefixed with CT_ to create the actual table name.
  - Validity Duration: Number of hours for which the cached data is considered valid. After this duration, the data is considered stale. The value 0 means that cached items are valid forever.
- Click the Edit button to select the Cached Outputs. These are the outputs of the REST client/Java plugin that you want to keep in the cache.
- Define a Batch Update Size. Batch update groups cached items and writes them together to the cache in one batch, rather than writing them one by one.
- Click Finish to close the wizard. The Enricher Cache editor opens.
- For a Java plugin, define the parameters of the cache:
  - In the Cached Params table, click the Define Parameters button.
  - In the Parameters dialog, select the Available Parameters you want to add and click the Add >> button to add them to the Used Parameters.
  - Click Finish to close the dialog.
- For a Java plugin, set the values of the parameters:
  - In the Cached Params table, click the Value column next to a parameter. The cell becomes editable.
  - Enter the value of the parameter in the cell, and then press Enter.
  - Repeat the previous steps to set the values of the other parameters.
- Press Control+S (or Command+S on macOS) to save the editor.
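To illustrate what a batch update does, the sketch below groups items and writes each group in a single operation. It is a generic illustration only: `write_in_batches` and `write_batch` are hypothetical names, and it assumes that the Batch Update Size is the maximum number of items per batch.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def write_in_batches(
    items: Iterable[T],
    batch_size: int,
    write_batch: Callable[[List[T]], None],
) -> int:
    """Group items into batches of at most batch_size and write each batch once.

    Returns the number of write operations performed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    writes = 0
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            write_batch(batch)
            writes += 1
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        write_batch(batch)
        writes += 1
    return writes
```

Under these assumptions, seven items with a batch size of 3 result in three writes instead of seven.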
Use an enricher cache
To use an enricher cache in an API enricher:
- Open the API Enricher editor.
- Click the … Select a value button for the Enricher Cache property. A dialog appears with the list of caches configured for this plugin or REST client.
- Select a cache.
- Press Control+S (or Command+S on macOS) to save the editor.
Best practices for enricher caches
Use a cache for multiple enrichers
You do not need to configure one cache per enricher. If multiple enrichers use the same plugin with common inputs, you can create a single cache that serves all of them. In that case, make sure that:
- All enrichers use the same values for their Parameters, since these will be defined as Cached Parameters in the cache configuration.
- The Cached Outputs defined in the cache include all the outputs consumed by all the enrichers.
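The two conditions above can be expressed as a small check. This is only a sketch for reasoning about a shared cache design: the dictionary layout of each enricher and the function `check_shared_cache` are hypothetical and do not correspond to any product API.

```python
from typing import Dict, List, Set


def check_shared_cache(
    enrichers: List[Dict],
    cached_params: Dict[str, str],
    cached_outputs: Set[str],
) -> List[str]:
    """Return a list of problems that prevent the enrichers from sharing one cache."""
    problems = []
    for enricher in enrichers:
        # Condition 1: every enricher uses the parameter values defined in the cache.
        if enricher["params"] != cached_params:
            problems.append(
                f"{enricher['name']}: parameters differ from the cached parameters"
            )
        # Condition 2: the cached outputs cover every output the enricher consumes.
        missing = set(enricher["outputs"]) - cached_outputs
        if missing:
            problems.append(f"{enricher['name']}: outputs not cached: {sorted(missing)}")
    return problems
```

An empty result means that both conditions hold for all the enrichers in the list.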
Cache life cycle
The cache tables are created when you deploy a model to a data location. When deploying, the Clear API Enricher Caches option allows you to delete the content of all the caches in the data location.
Limits of enricher caching
When using caching with API enrichers:
- Certain properties can no longer be configured for the enricher. For example, the parameters can no longer be modified, since they are part of the cache configuration.
- Multithreading is disabled. This is equivalent to having Thread Pool Size set to 1.
- Record batch processing is disabled for plugin enrichers. This is equivalent to having Processing Batch Size set to 1.