Semarchy Fuzzy lookup classification enricher
The Semarchy Fuzzy lookup classification enricher automatically classifies records using SemQL-based fuzzy lookup rules.
Plugin ID
Semarchy Fuzzy Lookup Classification Enricher - com.semarchy.engine.plugins.fuzzy.lookup.classification
Description
The Fuzzy lookup classification enricher automatically classifies records by applying SemQL-based fuzzy lookup rules. It compares child records with parent records, evaluates attribute similarity, and generates a lookup score to identify the most suitable reference.
This enricher serves as an alternative to AI-based classification methods, offering a rule-based approach for categorizing data.
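The comparison described above can be sketched in Python. This is only an illustration of the general idea, not the plugin's implementation: the real scoring is defined by a SemQL fuzzy lookup rule, whereas this sketch substitutes `difflib.SequenceMatcher` from the Python standard library. The record layout, the `name` attribute, and the `similarity` and `best_match` helpers are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(child: dict, parents: list[dict], attribute: str = "name"):
    """Score the child against every parent and return (parent_id, score).

    Returns (None, 0.0) when no parent scores above zero.
    """
    best_id, best_score = None, 0.0
    for parent in parents:
        score = similarity(child[attribute], parent[attribute])
        if score > best_score:
            best_id, best_score = parent["id"], score
    return best_id, best_score


# Hypothetical data: product categories (parents) and one product (child).
categories = [
    {"id": 1, "name": "Laptops"},
    {"id": 2, "name": "Laptop accessories"},
    {"id": 3, "name": "Office chairs"},
]
product = {"name": "laptops"}

match_id, score = best_match(product, categories)
print(match_id, round(score))  # prints: 1 100
```

In this sketch the highest-scoring parent plays the role of the "most suitable reference", and its score corresponds loosely to the lookup score the enricher returns.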
Plugin parameters
The following table lists the plugin parameters.
Parameter name | Mandatory | Type | Description |
---|---|---|---|
Creation Threshold | Yes | Integer | The score threshold above which the reference ID is automatically assigned after a lookup (range: 0-100). |
Entity Name | Yes | String | The name of the entity to which the fuzzy lookup rule applies. |
Fuzzy Lookup Rule | Yes | String | The fuzzy lookup rule to apply. |
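To illustrate how the Creation Threshold parameter might gate assignment, here is a minimal Python sketch. It assumes, based on the "above which" wording, that the reference ID is assigned only when the lookup score is strictly greater than the threshold; how a score exactly equal to the threshold is handled is not stated here. The `assign_reference` function and the sample ID are hypothetical and not part of the plugin.

```python
from typing import Optional


def assign_reference(best_match_id: Optional[str], lookup_score: float,
                     creation_threshold: int) -> Optional[str]:
    """Return the reference ID to assign, or None to leave the record unassigned."""
    if not 0 <= creation_threshold <= 100:
        raise ValueError("Creation Threshold must be in the range 0-100")
    # Assumption: "above" means strictly greater than the threshold.
    if best_match_id is not None and lookup_score > creation_threshold:
        return best_match_id
    return None


print(assign_reference("CAT-42", 87.5, 80))  # prints: CAT-42
print(assign_reference("CAT-42", 72.0, 80))  # prints: None
```

A higher threshold trades coverage for precision: fewer records are classified automatically, but those that are classified matched their reference more closely.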
Plugin inputs
The following table lists the plugin inputs.
Input name | Mandatory | Type | Description |
---|---|---|---|
Data Location (System-Managed)* | Yes | String | The name of the data location containing the child and parent records to compare. For internal use only; no configuration required. |
Load ID (System-Managed)* | Yes | String | The identifier of a specific load of records to compare. For internal use only; no configuration required. |
Username (System-Managed)* | Yes | String | The name of the connected user. For internal use only; no configuration required. |
User roles (System-Managed)* | Yes | String | The list of roles for the connected user. For internal use only; no configuration required. |
View Type (System-Managed)* | Yes | String | The type of view for the attributes to evaluate against the reference record. For internal use only; no configuration required. |
Record ID | No | String | The unique identifier of the reference record, applicable to basic entities and fuzzy-matching entities with enrichment scopes set to pre- and post-consolidation, post-consolidation, or none. |
Publisher ID** | No | String | The identifier of the source publisher system for the reference record, applicable to fuzzy-matching entities with enrichment scope set to pre-consolidation. Use in conjunction with Source ID. |
Source ID** | No | String | The identifier of the reference record in the source publisher system, applicable to fuzzy-matching entities with enrichment scope set to pre-consolidation. Use in conjunction with Publisher ID. |
* These inputs are automatically set to specific SemQL variables when the enricher executes and do not require configuration. Any modifications made to them are ignored.
** Use these inputs in conjunction with each other to ensure proper functionality.
Because the enricher currently supports only basic entities, the Source ID and Publisher ID inputs are not applicable.
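The applicability rules for the three identifier inputs can be summarized in a short Python sketch. The entity-type and enrichment-scope labels below are hypothetical string constants chosen for illustration; they are not values defined by the platform, and `identifier_inputs` is not a plugin function.

```python
def identifier_inputs(entity_type: str, enrichment_scope: str = "none") -> list[str]:
    """Return the names of the identifier inputs that apply to the reference record."""
    if entity_type == "basic":
        # Basic entities always identify the reference record by Record ID.
        return ["Record ID"]
    if entity_type == "fuzzy-matching":
        if enrichment_scope == "pre-consolidation":
            # Publisher ID and Source ID are always used together.
            return ["Publisher ID", "Source ID"]
        if enrichment_scope in ("pre- and post-consolidation",
                                "post-consolidation", "none"):
            return ["Record ID"]
    raise ValueError(
        f"Unsupported combination: {entity_type!r}, {enrichment_scope!r}"
    )


print(identifier_inputs("basic"))
print(identifier_inputs("fuzzy-matching", "pre-consolidation"))
```

Given that only basic entities are currently supported, only the first branch is relevant in practice; the remaining branches simply mirror the rules stated in the table.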
Plugin outputs
The following table lists the plugin outputs.
Output name | Type | Description |
---|---|---|
Best Match ID (NUMBER) | Number | The ID of the referenced entity with the highest lookup score, represented as a number. |
Best Match ID (STRING) | String | The ID of the referenced entity with the highest lookup score, represented as a string. |
Best Match ID (UUID) | UUID | The ID of the referenced entity with the highest lookup score, represented as a UUID. |
Lookup Score | Number | A numerical value that represents the degree of similarity between a record and its reference, according to the specified fuzzy lookup rule. |
Fuzzy Lookup Rule Name | String | The name of the fuzzy lookup rule applied to identify the most suitable reference. |
Examples and use cases
Relevant use cases for fuzzy lookup classification may include:
- Product categorization in e-commerce: classify products in a consolidated catalog when descriptions from multiple suppliers differ in formatting, terminology, and structure, which may make standardization challenging.
- Patient record matching in healthcare: link patient records from multiple clinics to a master index by evaluating similarities in names, birth dates, and addresses, even when information includes misspellings, address changes, or incomplete details.
- Employee record validation in HR systems: consolidate employee records from regional systems into a global directory while resolving discrepancies in names, IDs, or contact information that hinder accuracy.