Semarchy AI zero-shot classification enricher
The Semarchy AI zero-shot classification enricher classifies text inputs into predetermined categories without requiring explicit training on labeled data.
Plugin ID
Semarchy AI Zero-Shot Classification Enricher - com.semarchy.engine.plugins.ai.classification.zeroshot
Description
The AI zero-shot classification enricher uses a pretrained machine learning model to analyze text inputs and assign each one to a predetermined category, without prior training on labeled datasets. This reduces the manual effort needed to classify and organize data.
Plugin parameters
The following table lists the plugin parameters.
Parameter name | Mandatory | Type | Description
---|---|---|---
API Key | Yes | String | Client-side API key for establishing connectivity with the Hugging Face API.
Model | Yes | String | Classification model to use. Any zero-shot classification model is applicable.
Base URL | Yes | String | Base URL for the Hugging Face model, available either directly on Hugging Face or on Azure.
Deployment | No | String | Preferred method for accessing the Hugging Face API, which can be done either through direct API calls to Hugging Face or by routing the requests via an alternative provider.
Datasource | No | String | Name of the platform datasource from which class information is retrieved. If not specified, the enricher defaults to using the data location’s datasource.
Candidate Class (JSON)* | No | String | JSON-formatted query containing one or more candidate classes.
Candidate Class Table | No | String | Name of a specific table or column from which to retrieve and organize class information.
Candidate Class (Custom SQL)* | No | String | Custom SQL query for retrieving potential classes for the input text.
Min Score To Classify | No | Number | Value between 0 and 100 used to define the minimum confidence score required for the enricher to classify input text into one of the potential classes.
Multi-Label | No | Boolean | Choice of whether multiple labels (i.e., candidate classes) can be assigned to a single input text sample.
Use Cache | No | Boolean | Choice of whether to use the cache layer on the inference API to accelerate the processing of requests that have been made previously.
Wait For Model | No | Boolean | Choice of whether to wait for the model to be ready before processing requests or to immediately return a 503 error indicating that the service is unavailable.

* Choose one of these options to specify the candidate classes for classification.
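The candidate-class options can be pictured as different routes to the same set of ID and label pairs. The following is a minimal sketch of that idea, not the enricher's implementation: the helper names `classes_from_json` and `classes_from_query` are hypothetical, an in-memory SQLite database stands in for the platform datasource, and the sample rows reuse the `GD_FAMILY` values from the example later on this page.

```python
import json
import sqlite3


def classes_from_json(raw: str) -> dict:
    """Parse a JSON object of {class_id: label} pairs (hypothetical helper)."""
    classes = json.loads(raw)
    if not isinstance(classes, dict) or not classes:
        raise ValueError("expected a non-empty JSON object of id/label pairs")
    return {str(class_id): str(label) for class_id, label in classes.items()}


def classes_from_query(conn: sqlite3.Connection, query: str) -> dict:
    """Run a query returning (id, label) rows and collect them (hypothetical helper)."""
    return {str(class_id): str(label) for class_id, label in conn.execute(query)}


# Assumption: SQLite stands in for the real datasource, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GD_FAMILY (ID TEXT PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO GD_FAMILY (ID, NAME) VALUES (?, ?)",
    [
        ("GIRLSCLOTHING", "Girls' clothing"),
        ("BOYSCLOTHING", "Boys' clothing"),
        ("GIRLSSHOES", "Girls' shoes"),
        ("BOYSSHOES", "Boys' shoes"),
    ],
)

CANDIDATE_JSON = """{"GIRLSCLOTHING": "Girls' clothing", "BOYSCLOTHING": "Boys' clothing",
 "GIRLSSHOES": "Girls' shoes", "BOYSSHOES": "Boys' shoes"}"""

from_json = classes_from_json(CANDIDATE_JSON)
# Table option: table GD_FAMILY with ID column "ID" and label column "NAME".
from_table = classes_from_query(conn, "SELECT ID, NAME FROM GD_FAMILY")
# Custom SQL option: any query that yields an ID column and a label column.
from_sql = classes_from_query(conn, "SELECT ID, NAME FROM GD_FAMILY WHERE 1=1")
```

In this sketch all three variables hold the same four ID and label pairs, which is why only one of the options needs to be configured.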
Classification models
The AI zero-shot classification enricher is powered by machine learning models that are accessed through the Hugging Face API. The enricher allows flexibility in choosing a model that suits the nature of the data to be classified, considering factors such as domain-specific requirements and performance characteristics.
Here are some relevant models for zero-shot classification:
- facebook/bart-large-mnli: versatile and highly accurate, making it suitable for general-purpose classification tasks, though it has longer inference times due to its large size.
- cross-encoder/nli-roberta-base: balances performance and efficiency, making it ideal for zero-shot classification tasks that require lower latency without significant accuracy trade-offs.
- MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli: robust and highly accurate, particularly effective for complex or nuanced texts and tasks involving fact-checking or adversarial examples.
For more information on classification models, see the official Hugging Face documentation or the official documentation about Hugging Face on Azure.
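To show how the Model, Multi-Label, Use Cache, and Wait For Model parameters could relate to a model call, here is a hedged sketch that only builds a request and does not send it. It assumes the JSON body shape historically documented for the public Hugging Face Inference API (`inputs`, `parameters.candidate_labels`, `parameters.multi_label`, `options.use_cache`, `options.wait_for_model`). The function `build_request`, the base URL placeholder, and the way the URL is joined are illustrative assumptions; the request the enricher actually sends is not described on this page.

```python
import json


def build_request(base_url, model, api_key, input_text, candidate_labels,
                  multi_label=False, use_cache=True, wait_for_model=False):
    """Assemble URL, headers, and JSON body for a zero-shot classification call.

    Hypothetical helper: mirrors the public Inference API body shape, which may
    differ from what the enricher sends internally.
    """
    if not candidate_labels:
        raise ValueError("at least one candidate label is required")
    # Assumption: the model name is appended to the base URL.
    url = f"{base_url.rstrip('/')}/{model}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "inputs": input_text,
        "parameters": {
            "candidate_labels": list(candidate_labels),
            "multi_label": multi_label,
        },
        "options": {
            "use_cache": use_cache,
            "wait_for_model": wait_for_model,
        },
    }
    return url, headers, body


url, headers, body = build_request(
    base_url="https://example.invalid/models",  # placeholder, not a real endpoint
    model="facebook/bart-large-mnli",
    api_key="hf_placeholder",                   # placeholder key
    input_text="This adorable summer dress features a vibrant floral print.",
    candidate_labels=["Girls' clothing", "Boys' clothing"],
    wait_for_model=True,
)
serialized = json.dumps(body)  # the body is plain JSON-serializable data
```

Keeping request construction separate from sending makes it easy to inspect what a given parameter combination would produce before any API call is made.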
Plugin inputs
The following table lists the plugin inputs.
Input name | Mandatory | Type | Description
---|---|---|---
Input Text | Yes | String | Text sample to classify, provided as a string.
Plugin outputs
The following table lists the plugin outputs.
Output name | Type | Description
---|---|---
Most Probable Class ID | String | The category that the enricher identifies as the most likely classification for the input text based on the predefined labels.
Classification Score | Number | The confidence level associated with the most probable class identified by the enricher, represented as a percentage.
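The relationship between the two outputs and the Min Score To Classify parameter can be sketched as follows. This is an illustration under stated assumptions, not the enricher's code: it assumes the model returns one score per label in the range 0 to 1, that the score is reported as a percentage, that the threshold comparison is inclusive, and that nothing is returned when the best score falls below the threshold. The function name `most_probable_class` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def most_probable_class(labels: Sequence[str], scores: Sequence[float],
                        min_score: float = 0.0) -> Optional[Tuple[str, float]]:
    """Return (label, score as a percentage) for the best-scoring label.

    Returns None when the best score is below min_score (assumed behavior).
    """
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and equally long")
    # Index of the highest score; ties resolve to the first occurrence.
    best = max(range(len(scores)), key=lambda i: scores[i])
    percent = round(scores[best] * 100, 2)
    if percent < min_score:
        return None
    return labels[best], percent


accepted = most_probable_class(["apparel", "electronics"], [0.75, 0.25], min_score=50)
rejected = most_probable_class(["apparel", "electronics"], [0.75, 0.25], min_score=80)
```

Here `accepted` holds the label `apparel` with a score of 75.0, while `rejected` is `None` because 75 is below the threshold of 80.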
Examples and use cases
Imagine a scenario where a new record is added to a product catalog with the following description:
"This adorable summer dress features a vibrant floral print on soft, breathable cotton, ensuring comfort and style for any occasion. Its A-line silhouette and practical details, like side pockets and a back zipper, make it a versatile wardrobe essential."
Suppose the zero-shot classification enricher is configured as follows:
- In the plugin parameters:
  - Candidate Class (JSON): {"GIRLSCLOTHING":"Girls' clothing", "BOYSCLOTHING":"Boys' clothing", "GIRLSSHOES":"Girls' shoes", "BOYSSHOES":"Boys' shoes"}
    or
  - Candidate Class Table: GD_FAMILY
    Candidate Class ID: ID
    Candidate Class Label: NAME
    or
  - Candidate Class (Custom SQL): SELECT ID, NAME FROM GD_FAMILY WHERE 1=1
  - Min Score To Classify: 50
- In the plugin input properties:
  - Input Text: Description
- In the plugin output properties:
  - FID_Family: Most Probable Class ID
  - EnrichmentConfidenceScore: Classification Score
The enricher automatically classifies the new product record into the Girls' clothing family with a confidence score of 77, using the facebook/bart-large-mnli model.
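The scenario above can be traced in a few lines of code. This is a sketch under assumptions rather than a reproduction of the enricher: it assumes the class labels are what the model scores and that the winning label is mapped back to its ID for the output. The `response` dictionary is a hypothetical model result in which only the top score of 0.77 reflects the example; the other scores are made up for illustration. The function `to_output_properties` is also hypothetical.

```python
import json

# Candidate classes as configured in the example: {class ID: label}.
CANDIDATE_CLASSES = json.loads(
    """{"GIRLSCLOTHING": "Girls' clothing", "BOYSCLOTHING": "Boys' clothing",
        "GIRLSSHOES": "Girls' shoes", "BOYSSHOES": "Boys' shoes"}"""
)

# Hypothetical model result: only 0.77 comes from the example above,
# the remaining scores are invented placeholders.
response = {
    "labels": ["Girls' clothing", "Boys' clothing", "Girls' shoes", "Boys' shoes"],
    "scores": [0.77, 0.12, 0.07, 0.04],
}


def to_output_properties(classes, result, min_score):
    """Map the best-scoring label back to its class ID and a percentage score."""
    label_to_id = {label: class_id for class_id, label in classes.items()}
    top_label, top_score = max(
        zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
    )
    percent = round(top_score * 100)
    if percent < min_score:
        # Assumed behavior: leave both output properties empty.
        return {"FID_Family": None, "EnrichmentConfidenceScore": None}
    return {"FID_Family": label_to_id[top_label],
            "EnrichmentConfidenceScore": percent}


record = to_output_properties(CANDIDATE_CLASSES, response, min_score=50)
```

With these inputs, `record` maps `FID_Family` to `GIRLSCLOTHING` and `EnrichmentConfidenceScore` to 77, matching the outcome described in the example.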
Common use cases for zero-shot classification may include:
- Supplier data classification: automatically classify supplier descriptions into predefined categories such as "electronics components," "raw materials," "office supplies," or "logistics services" to facilitate procurement, auditing, and strategic sourcing.
- Customer segmentation: segment customers, based on their activity descriptions, into categories such as "high-value," "medium-value," "low-value," or "loyalty program member" to enable targeted marketing, improved customer service, and personalized offerings.
- Product type categorization: analyze product names or descriptions and classify them into categories such as "electronics," "apparel," "home goods," or "personal care" to enhance searchability, inventory management, and reporting efficiency for large product inventories.