Semarchy AI-based few-shot classification enricher
The Semarchy AI-based few-shot classification enricher classifies text inputs into predetermined categories using a small number of labeled examples.
Plugin ID
Semarchy AI Few-Shot Classification Enricher - com.semarchy.engine.plugins.ai.classification.fewshot
Description
The few-shot classification enricher generalizes from a small set of labeled examples, called sentences, to identify and categorize new, unseen inputs. This approach is particularly useful in specialized contexts where little labeled data is available for training generic machine learning models.
Plugin parameters
The following table lists the plugin parameters.
Parameter name | Mandatory | Type | Description |
---|---|---|---|
API Key | Yes | String | Client-side API key for establishing connectivity with the Hugging Face API. |
Model | Yes | String | Classification model to use. Any few-shot classification model is applicable (e.g., |
Datasource | No | String | Name of the platform datasource from which class information is retrieved. If not specified, the enricher defaults to using the data location’s datasource. |
Sentence (JSON)* | No | String | JSON-formatted query containing labeled data examples, provided as |
Sentence Table | No | String | Labeled text samples associated with a specified table and identifiers, to help the model learn how to associate similar inputs with the correct labels. For example: |
Sentence (Custom SQL)* | No | String | Custom SQL query for retrieving existing classes and their corresponding labels to help the model learn how to associate similar inputs with the correct labels. Basic query: SELECT F_BRAND, DESCRIPTION FROM GD_PRODUCT WHERE DESCRIPTION IS NOT NULL. Query specifying a maximum number of sentences to retrieve per brand: SELECT ID, LABEL FROM ( SELECT F_BRAND AS ID, DESCRIPTION AS LABEL, ROW_NUMBER() OVER (PARTITION BY F_BRAND ORDER BY b_upddate DESC) AS n FROM GD_PRODUCT WHERE DESCRIPTION IS NOT NULL ) x WHERE n <= 20 |
Min Score To Classify | No | Number | Value between 0 and 100, used to define the minimum confidence score required for the enricher to classify input text into one of the potential classes. |
Max Samples Per Class | No | Number | Maximum number of examples to use for each candidate class. |
* Choose one of these options to specify sentences—that is, labeled examples from which the model will learn.
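As an illustration of the Sentence (JSON) option, the sketch below groups labeled rows into an object that maps each class ID to a list of sentences, which is the shape used in the example later on this page. The helper name `build_sentence_json` and its `max_per_class` argument are hypothetical and exist only for this sketch; the cap simply mimics the idea behind Max Samples Per Class and is not the plugin's own logic.

```python
import json


def build_sentence_json(labeled_rows, max_per_class=None):
    """Group (class_id, text) pairs into a {class_id: [text, ...]} JSON string.

    Hypothetical helper: not part of the plugin.
    """
    grouped = {}
    for class_id, text in labeled_rows:
        # Skip missing or blank texts, similar to an IS NOT NULL filter.
        if text is None or not text.strip():
            continue
        samples = grouped.setdefault(class_id, [])
        # Keep at most max_per_class sentences for each class.
        if max_per_class is None or len(samples) < max_per_class:
            samples.append(text.strip())
    return json.dumps(grouped, ensure_ascii=False)


rows = [
    ("PRESTIGE", "Luxury V-neck designed for a layered look."),
    ("ECO-CONSCIOUS", "GOTS-certified wool blend coat."),
    ("PRESTIGE", "Cashmere wool blended pea coat."),
]
print(build_sentence_json(rows, max_per_class=1))
```

The resulting string can be pasted into the Sentence (JSON) parameter, assuming the parameter accepts the same shape as the example configuration.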
Classification models
The few-shot classification enricher is powered by machine learning models that are accessed through the Hugging Face API. The enricher allows flexibility in choosing a model that suits the nature of the data to be classified, considering factors such as domain-specific requirements and performance characteristics.
Here are some relevant models for few-shot classification:
- sentence-transformers/all-MiniLM-L6-v2: a compact transformer model optimized for tasks like clustering or semantic search.
- sentence-transformers/all-mpnet-base-v2: an efficient model applicable for tasks such as information retrieval, clustering, or assessing sentence similarity.
For more information on classification models, see the official Hugging Face documentation.
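This page does not describe how the enricher scores an input against the sentences. As rough intuition only, sentence-embedding models are commonly used by mapping each text to a vector and comparing vectors with cosine similarity. The sketch below illustrates that general idea with a deliberately crude stand-in: `toy_embed` is a hypothetical word-count vectorizer, not a real model, and `classify` is an illustrative name rather than the plugin's implementation.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(input_text, sentences):
    """Return (class_id, score on a 0-100 scale) of the most similar sentence."""
    query = toy_embed(input_text)
    best_class, best_score = None, 0.0
    for class_id, examples in sentences.items():
        for example in examples:
            score = 100.0 * cosine(query, toy_embed(example))
            if score > best_score:
                best_class, best_score = class_id, score
    return best_class, best_score


sentences = {
    "ECO-CONSCIOUS": ["recycled wool coat made with fair wages"],
    "PRESTIGE": ["luxury cashmere pea coat"],
}
print(classify("organic cotton blazer with recycled lining", sentences))
```

With a real embedding model, texts can be similar without sharing any words, which is the main reason to use one instead of word counts.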
Plugin inputs
The following table lists the plugin inputs.
Input name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Text sample to classify, provided as a string. |
Plugin outputs
The following table lists the plugin outputs.
Output name | Type | Description |
---|---|---|
Most Probable Class ID | String | The category that the enricher identifies as the most likely classification for the input text based on the predefined labels. |
Classification Score | Number | The confidence level associated with the most probable class identified by the enricher, represented as a percentage. |
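The relationship between the Min Score To Classify parameter and these two outputs can be pictured as a simple threshold check. The sketch below is an illustration under stated assumptions: `apply_min_score` is a hypothetical helper, and both the inclusive comparison and the empty class ID below the threshold are assumed behaviors that this page does not confirm.

```python
def apply_min_score(class_id, score, min_score=None):
    """Keep the class only if the score (0-100) reaches the threshold.

    ASSUMPTIONS: a score equal to the threshold still classifies, and below
    the threshold the class ID is left empty (None). The actual enricher
    behavior in that case is not specified on this page.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if min_score is not None and score < min_score:
        return None, score
    return class_id, score


# A score of 72 with a threshold of 50 keeps the class.
print(apply_min_score("ECO-CONSCIOUS", 72, min_score=50))
# A score of 40 with the same threshold leaves the class empty.
print(apply_min_score("ECO-CONSCIOUS", 40, min_score=50))
```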
Examples and use cases
Imagine a scenario where a new record is added to a product catalog with the following description:
"The Heritage Houndstooth Blazer, crafted from a blend of 70% organic cotton and 30% Tencel™, is GOTS-certified, FLA-compliant, and produced in WRAP-certified facilities. Featuring PrecisionFit tailoring and Repreve® recycled polyester lining, it combines sustainability with modern design."
This product catalog includes various brands spanning regular, luxury, and ethical clothing lines. Below are descriptions of some of the products included in the inventory.
- In the Everyday Essentials (regular) line:
  - "Wear this Esperanza Contrast Color T-Shirt as a casual t-shirt, pairing it with jeans, or wear it as a smart shirt with bright shorts. It’s a perfect addition to any girl’s wardrobe."
  - "Relaxed-fit jean that fits easily over boots. Casual enough for a date, durable enough for hard work, rugged enough for motorcycle riding."
- In the Prestige (luxury) line:
  - "Luxury V-neck designed for a layered look, this Moretti sweater is the perfect knit fabric for a warm and comfortable pullover. This lightweight sweater provides a slim fit for the fashionable man who wants to look meticulous and well-dressed."
  - "Valdo Cashmere is a men’s stylish cashmere wool blended double-breasted pea coat with bronze parallel buttons. The finest materials are used to create this pea coat that makes this piece a warm overcoat for spring and winter seasons."
- In the Eco-Conscious (ethical) line:
  - "Introducing the Vegan Leather Jacket, made from cruelty-free materials and manufactured in facilities adhering to the FLA’s fair labor code. This jacket combines sleek design with ethical principles, ensuring both style and sustainability in every stitch."
  - "Our Fairtrade Wool Blend Coat blends merino wool with recycled fibers, promoting sustainable practices and fair wages for workers. GOTS-certified and dyed with low-impact methods, it offers warmth and style while supporting ethical fashion initiatives."
Suppose the few-shot classification enricher is configured as follows:
- In the plugin parameters:
  - Sentence (JSON): {"EVERYDAY ESSENTIALS":["Relaxed-fit jean that fits easily over boots. Casual enough for a date, durable enough for hard work, rugged enough for motorcycle riding.", "Wear this Esperanza Contrast Color T-Shirt as a casual t-shirt, pairing it with jeans, or wear it as a smart shirt with bright shorts. It’s a perfect addition to any girl’s wardrobe."], "PRESTIGE":["Luxury V-neck designed for a layered look, this Moretti sweater is the perfect knit fabric for a warm and comfortable pullover. This lightweight sweater provides a slim fit for the fashionable man who wants to look meticulous and well-dressed.", "Valdo Cashmere is a men’s stylish cashmere wool blended double-breasted pea coat with bronze parallel buttons. The finest materials are used to create this pea coat that makes this piece a warm overcoat for spring and winter seasons."], "ECO-CONSCIOUS":["Our Fairtrade Wool Blend Coat blends merino wool with recycled fibers, promoting sustainable practices and fair wages for workers. GOTS-certified and dyed with low-impact methods, it offers warmth and style while supporting ethical fashion initiatives.", "Introducing the Vegan Leather Jacket, made from cruelty-free materials and manufactured in facilities adhering to the FLA’s fair labor code. This jacket combines sleek design with ethical principles, ensuring both style and sustainability in every stitch."]}
  or
  - Sentence Table: GD_PRODUCT
    Sentence ID: F_LINE
    Sentence Text: DESCRIPTION
  or
  - Sentence (Custom SQL): SELECT F_LINE, DESCRIPTION FROM GD_PRODUCT WHERE 1=1
  - Min Score To Classify: 50
- In the plugin input properties:
  - Input Text: Description
- In the plugin output properties:
  - FID_Line: Most Probable Class ID
  - EnrichmentConfidenceScore: Classification Score
Based on the product description and provided sentences, the enricher automatically classifies the new product record into the Eco-Conscious clothing line with a confidence score of 72, using the sentence-transformers/all-MiniLM-L6-v2 model.
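Because the Sentence (JSON) value in this example is long and hand-edited, it can help to check its shape before pasting it into the plugin configuration. The sketch below assumes the shape seen in the example, a JSON object mapping each class ID to a non-empty list of sentence strings. `validate_sentence_json` is a hypothetical helper and does not reflect validation performed by the enricher itself.

```python
import json


def validate_sentence_json(raw):
    """Parse a Sentence (JSON) value and return the sentence count per class.

    Hypothetical check based on the shape of the example on this page.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or not data:
        raise ValueError("expected a non-empty JSON object keyed by class ID")
    for class_id, examples in data.items():
        if not isinstance(examples, list) or not examples:
            raise ValueError(f"class {class_id!r} needs a non-empty list of sentences")
        if not all(isinstance(s, str) and s.strip() for s in examples):
            raise ValueError(f"class {class_id!r} has an empty or non-string sentence")
    return {class_id: len(examples) for class_id, examples in data.items()}


raw = '{"PRESTIGE": ["Luxury V-neck sweater.", "Cashmere pea coat."], "ECO-CONSCIOUS": ["Vegan leather jacket."]}'
print(validate_sentence_json(raw))
```

The per-class counts also make it easy to spot classes with far fewer examples than the others.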
Relevant use cases for few-shot classification may include:
- Product code mapping in e-commerce: classify product listings using specific industry codes or SKU numbers (e.g., BSH-043 in "bookshelf systems", DSK-011 in "modular desks", WRD-408 in "wardrobes" for a furniture store) to streamline inventory management, improve search functionality, and enhance user experience.
- Clinical trial data classification: categorize clinical trial data using medical research codes (e.g., "RCT" for randomized controlled trials, "PK" for pharmacokinetics, "AE" for adverse events) to enhance data analysis, reporting, and regulatory compliance.
- Employee skill categorization: organize employee profiles according to their educational background (BSc, MEng, PhD, etc.), credentials (PMP, CPA, etc.), software proficiency (DBMS, CRM, CAD, GIS, etc.), and other relevant criteria into categories (e.g., "consulting," "accounting," "IT," "training," or "customer service" in the human resources industry) to support better workforce management, targeted training and development initiatives, and enhanced talent management strategies.