Semarchy Fuzzy lookup classification enricher

The Semarchy Fuzzy lookup classification enricher automatically classifies records using SemQL-based fuzzy lookup rules.

Plugin ID

Semarchy Fuzzy Lookup Classification Enricher - com.semarchy.engine.plugins.fuzzy.lookup.classification

Description

The enricher compares child records with parent records, evaluates attribute similarity according to the fuzzy lookup rule, and generates a lookup score to identify the most suitable reference.
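The compare, score, and select flow described above can be sketched in Python. This is an illustration only, not the enricher's implementation: the similarity measure (`difflib.SequenceMatcher`), the function names, and the sample category records are all assumptions made for the example. In the product, the comparison is defined by a SemQL-based fuzzy lookup rule.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_reference(child_value: str, parents: dict[str, str]) -> tuple[str, float]:
    """Score the child value against every parent and return (parent_id, score)."""
    scored = ((pid, similarity(child_value, label)) for pid, label in parents.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical parent (reference) records: ID -> category label.
categories = {"C1": "Laptops", "C2": "Smartphones", "C3": "Headphones"}

# A child record whose category text does not match any label exactly.
best_id, score = best_reference("smart phone", categories)
print(best_id, round(score, 1))
```

The highest-scoring parent plays the role of the best match, and its score plays the role of the lookup score.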

This enricher serves as an alternative to AI-based classification methods, offering a rule-based approach for categorizing data.

Plugin parameters

The following table lists the plugin parameters.

| Parameter name | Mandatory | Type | Description |
| --- | --- | --- | --- |
| Creation Threshold | Yes | Integer | The score threshold above which the reference ID is automatically assigned after a lookup (range: 0-100). |
| Entity Name | Yes | String | The name of the entity to which the fuzzy lookup rule applies. This enricher currently supports only basic entities. |
| Fuzzy Lookup Rule | Yes | String | The fuzzy lookup rule to apply. |
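The role of the Creation Threshold parameter can be illustrated with a short Python sketch. The function name and signature are hypothetical, and reading "above" as strictly greater than the threshold is an assumption; the product's exact boundary behavior is not stated here.

```python
from typing import Optional


def assign_reference(best_match_id: str, lookup_score: float,
                     creation_threshold: int) -> Optional[str]:
    """Return the reference ID only when the score is above the threshold."""
    if not 0 <= creation_threshold <= 100:
        raise ValueError("Creation Threshold must be within the range 0-100")
    # Assumption: "above" is read as strictly greater than the threshold.
    return best_match_id if lookup_score > creation_threshold else None
```

With a threshold of 80, a lookup score of 91 would assign the reference, while a score of 80 or lower would leave it unassigned under this reading.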

Plugin inputs

The following table lists the plugin inputs.

| Input name | Mandatory | Type | Description |
| --- | --- | --- | --- |
| Data Location (System-Managed)* | Yes | String | The name of the data location containing the child and parent records to compare. For internal use only; no configuration required. |
| Load ID (System-Managed)* | Yes | String | The identifier of a specific load of records to compare. For internal use only; no configuration required. |
| Username (System-Managed)* | Yes | String | The name of the connected user. For internal use only; no configuration required. |
| User roles (System-Managed)* | Yes | String | The list of roles for the connected user. For internal use only; no configuration required. |
| View Type (System-Managed)* | Yes | String | The type of view for the attributes to evaluate against the reference record. For internal use only; no configuration required. |
| Record ID | No | String | The unique identifier of the reference record, applicable to basic entities and fuzzy-matching entities with enrichment scopes set to pre- and post-consolidation, post-consolidation, or none. If the record ID is not of type string, cast it to a string using the appropriate conversion expression (e.g., SEM_TO_CHAR, SEM_UUID_TO_CHAR). |
| Publisher ID** | No | String | The identifier of the source publisher system for the reference record, applicable to fuzzy-matching entities with enrichment scope set to pre-consolidation. Use in conjunction with Source ID. |
| Source ID** | No | String | The identifier of the reference record in the source publisher system, applicable to fuzzy-matching entities with enrichment scope set to pre-consolidation. Use in conjunction with Publisher ID. |

* These inputs are automatically bound to specific SemQL variables when the enricher runs and do not require configuration. Any modifications made to them are ignored.

** Use these inputs together; neither is meaningful without the other.

Since the enricher currently supports only basic entities, the Source ID and Publisher ID inputs are not applicable.
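The rules for choosing identifier inputs can be summarized in a small Python sketch. The entity-type and scope labels and both function names are hypothetical, chosen only for this illustration. The fuzzy-matching branch is shown for completeness, since the enricher currently supports only basic entities. The `as_string` helper is a Python analogue of the string cast, not a SemQL expression.

```python
from __future__ import annotations

import uuid

# Hypothetical label for the enrichment scope (not a Semarchy API value).
PRE_CONSOLIDATION = "pre-consolidation"


def identifier_inputs(entity_type: str, enrichment_scope: str | None = None) -> list[str]:
    """Return the identifier inputs to map for a reference record."""
    if entity_type == "fuzzy-matching" and enrichment_scope == PRE_CONSOLIDATION:
        # Publisher ID and Source ID are only meaningful as a pair.
        return ["Publisher ID", "Source ID"]
    # Basic entities, and fuzzy-matching entities with any other scope, use Record ID.
    return ["Record ID"]


def as_string(record_id: int | str | uuid.UUID) -> str:
    """Python analogue of casting a non-string record ID to a string."""
    return str(record_id)
```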

Plugin outputs

The following table lists the plugin outputs.

| Output name | Type | Description |
| --- | --- | --- |
| Best Match ID (NUMBER) | Number | The ID of the referenced entity with the highest lookup score, represented as a number. |
| Best Match ID (STRING) | String | The ID of the referenced entity with the highest lookup score, represented as a string. |
| Best Match ID (UUID) | UUID | The ID of the referenced entity with the highest lookup score, represented as a UUID. |
| Lookup Score | Number | A numerical value that represents the degree of similarity between a record and its reference, according to the specified fuzzy lookup rule. |
| Fuzzy Lookup Rule Name | String | The name of the fuzzy lookup rule applied to identify the most suitable reference. |
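As a reading aid, the outputs can be modeled as a small Python structure, with a helper that picks the Best Match ID variant matching the ID type of the referenced entity. The class, field names, and helper are hypothetical. Treating the unused ID variants as empty is an assumption; this page does not say whether all three are populated.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class EnricherOutputs:
    lookup_score: float
    fuzzy_lookup_rule_name: str
    # Assumption: variants that do not match the reference ID type are left empty.
    best_match_id_number: int | None = None
    best_match_id_string: str | None = None
    best_match_id_uuid: uuid.UUID | None = None


def best_match_id(outputs: EnricherOutputs, id_type: str) -> int | str | uuid.UUID | None:
    """Return the Best Match ID variant for the given ID type."""
    by_type = {
        "NUMBER": outputs.best_match_id_number,
        "STRING": outputs.best_match_id_string,
        "UUID": outputs.best_match_id_uuid,
    }
    if id_type not in by_type:
        raise ValueError(f"Unsupported ID type: {id_type}")
    return by_type[id_type]
```

When mapping outputs to entity attributes, only the variant whose type matches the reference attribute would typically be used.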

Examples and use cases

Typical use cases for fuzzy lookup classification include:

  • Product categorization in e-commerce: classify products in a consolidated catalog when descriptions from multiple suppliers differ in formatting, terminology, and structure, which may make standardization challenging.

  • Patient record matching in healthcare: link patient records from multiple clinics to a master index by evaluating similarities in names, birth dates, and addresses, even when information includes misspellings, address changes, or incomplete details.

  • Employee record validation in HR systems: consolidate employee records from regional systems into a global directory while resolving discrepancies in names, IDs, or contact information that hinder accuracy.
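The patient-matching case combines several attributes into one score. A minimal Python sketch of that idea follows, assuming a weighted average of per-field similarities. The weights, field names, sample records, and the use of `difflib.SequenceMatcher` are all illustrative assumptions; an actual fuzzy lookup rule would express the comparison in SemQL.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical weights per attribute; they sum to 1 so the result stays in 0-100.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "address": 0.2}


def field_score(a: str, b: str) -> float:
    """Return a 0-100 similarity; a missing value on either side scores 0."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def record_score(child: dict[str, str], parent: dict[str, str]) -> float:
    """Weighted similarity between a child record and one parent record."""
    return sum(w * field_score(child.get(f, ""), parent.get(f, ""))
               for f, w in WEIGHTS.items())


# Hypothetical master index and an incoming record with a misspelled name
# and a missing address.
master = {
    "P1": {"name": "John Smith", "birth_date": "1980-04-12", "address": "12 Oak Street"},
    "P2": {"name": "Joan Smithers", "birth_date": "1975-11-02", "address": "4 Elm Road"},
}
incoming = {"name": "Jon Smyth", "birth_date": "1980-04-12", "address": ""}

best = max(master, key=lambda pid: record_score(incoming, master[pid]))
print(best, round(record_score(incoming, master[best]), 1))
```

Scoring a missing field as 0 lowers the total without ruling out a match, so incomplete records can still reach the threshold when the remaining attributes agree closely.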