Welcome to Semarchy xDM.
Preface
Overview
This guide provides reference information about the plug-ins delivered with the Semarchy xDM Platform.
Using this guide, you will learn how to use these plug-ins in your MDM projects.
Audience
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept. |
italic | Italic type indicates special emphasis or a placeholder variable that you need to provide. |
monospace | Monospace type indicates a code example, text, or commands that you enter. |
Other Semarchy Resources
In addition to the product manuals, Semarchy provides other resources available on its web site: https://www.semarchy.com.
Obtaining Help
There are many ways to access the Semarchy Technical Support. You can call or email our global Technical Support Center (support@semarchy.com). For more information, see https://www.semarchy.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness
of this documentation.
If you find any error or have any suggestion for improvement, please
mail support@semarchy.com and indicate the title of the documentation
along with the chapter, section, and page number, if available. Please
let us know if you want a reply.
Introduction to Semarchy xDM
Semarchy xDM is the Intelligent Data Hub platform for Master Data
Management (MDM), Reference Data Management (RDM), Application Data Management
(ADM), Data Quality, and Data Governance.
It provides all the features for data
quality, data validation, data matching, de-duplication, data authoring,
workflows, and more.
Semarchy xDM brings extreme agility for defining and implementing data management applications and releasing them to production. The platform can be used as the target deployment point for all the data in the enterprise or in conjunction with existing data hubs to contribute to data transparency and quality.
Its powerful and intuitive environment covers all use cases for setting up a successful data governance strategy.
Semarchy xDM Plug-ins
Semarchy xDM implements plug-ins that use external services or information systems to contribute to the master data processing and enrichment.
Plug-ins are used in Semarchy xDM in:
- Enrichers: By adding new enrichers, you can perform record-level enrichment to update, augment or standardize existing attribute values, or create content in new attributes. For example, you can connect to an external web service to retrieve stock ticker symbols from company names.
- Validations: By adding new validations, you can perform record-level checks, that is, check the values of attributes in a record against complex rules. For example, you can connect to an external provider to check whether a billing or shipping address is valid.
INFO: Using Plug-ins is explained in the Semarchy xDM Developer’s Guide, in the Certification Process Design chapter. Installing plug-ins to your Semarchy xDM instance is explained in the Semarchy xDM Administration Guide, in the Configuring the Platform chapter.
Text Normalization and Transliteration
This plug-in applies normalization, transliteration and phonetic transformations to text strings.
Semarchy Text Enricher
Plug-in ID
Semarchy Text Enricher - com.semarchy.engine.plugins.convergence.text
Description
This enricher applies normalization, transliteration and phonetic transformations to text strings. It takes an Input Text and applies an Input Filter to this text, for example to remove all characters but letters. Then it applies a series of transformations defined in the Transformation parameter and returns a Transformed Text.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Input Filter | No | String | Filter applied to the input text before the transformation. Valid values are NONE, LETTERS and STANDARD. See the Input Filters section for a description of each filter. |
Transformation | Yes | String | A pipe-separated sequence of transformation definitions. Transformations include NORMALIZE, PHONETIC, BEIDERMORSE, DOUBLEMETAPHONE and TRANSLITERATE. See the Transformations section for a detailed description of each transformation. |
Synonyms Separator | No | String | Separator used between the synonyms returned by the enricher. Default value is a pipe (|). |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Text to transform. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Transformed Text | String | Filtered and transformed text. |
Secondary Transformed Text | String | Secondary transformed text. This text may contain the result of a Beidermorse or Double Metaphone transformation. See Other Transformations for more information. |
Input Filters
The following input filters are supported by the enricher:
- NONE: No filter is applied to the input text.
- LETTERS: This filter removes all non-letter characters from the input string.
- STANDARD: Breaks words in the input text according to the rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
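As a rough illustration of the LETTERS filter, the sketch below drops every character that is not a letter. The function name `letters_filter` is hypothetical, and Python's `str.isalpha` is an assumption standing in for whatever letter test the plug-in actually applies.

```python
def letters_filter(text: str) -> str:
    """Keep only the characters that Python classifies as letters."""
    # Assumption: "letter" means str.isalpha() is true for the character.
    return "".join(ch for ch in text if ch.isalpha())


print(letters_filter("O'Neil-Smith, 3rd"))
```

Spaces, punctuation and digits are all removed, so the filtered text is best suited to transformations that work on a single run of letters.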
Transformations
The following transformation definitions are supported by the enricher:
- Normalization
NORMALIZE: Performs a Normalization.
- Phonetic Transformation
PHONETIC [SOUNDEX | REFINEDSOUNDEX | METAPHONE [<max_code_length>] | DOUBLEMETAPHONE [<max_code_length>] | CAVERPHONE | CAVERPHONE1 | NYSIIS | MRA | COLOGNE | BEIDERMORSE ]: Applies Phonetic Transformations.
- Other Transformations
BEIDERMORSE [Split] [RuleType] [MaxPhonems] [NameType]
DOUBLEMETAPHONE [<max_code_length>] [split]
- Transliteration
TRANSLITERATE [<ID>]: Applies a Transliteration transformation to the string. The transliteration is identified by an ID. If no ID is provided, the Any-Latin transliteration is used.
It is possible to sequence transformations. Successive transformations are separated by a pipe (|) sign.
Examples of transformations:
- Normalize and apply Phonetic Soundex:
NORMALIZE | SOUNDEX
- Normalize and then transliterate to Latin script:
NORMALIZE | TRANSLITERATE Any-Latin
- Normalize, transliterate to Latin script and then apply Metaphone with
a maximum resulting length of 5 characters:
NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5
- Perform a BEIDERMORSE transformation for family names with an approximate transformation on generic name types:
BEIDERMORSE APPROX 10 FALSE GENERIC
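To make the pipe-separated syntax concrete, the following sketch splits a Transformation value into an ordered list of steps, each with a name and its arguments. It is an illustration only: `parse_transformations` is a hypothetical helper, and the plug-in's own parser may treat whitespace, case or invalid input differently.

```python
def parse_transformations(definition: str) -> list[tuple[str, list[str]]]:
    """Split a pipe-separated definition into (name, arguments) steps."""
    steps = []
    for part in definition.split("|"):
        tokens = part.split()
        if not tokens:
            continue  # ignore empty segments such as a trailing pipe
        # The first token names the transformation; the rest are its arguments.
        steps.append((tokens[0].upper(), tokens[1:]))
    return steps


for name, args in parse_transformations(
    "NORMALIZE | TRANSLITERATE Any-Latin | PHONETIC METAPHONE 5"
):
    print(name, args)
```

Each step would then be applied in order, with the output of one step becoming the input of the next.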
Normalization
The NORMALIZE transformation normalizes the string by applying a series of transformations that map similar characters to a common target, in order to ignore certain distinctions between similar characters. This includes accent removal, case folding, etc.
Example of transformations:
Original Text | Normalized Text | Comments |
---|---|---|
‒ – — ― | - - - - | 4 different dashes converted to 4 similar dashes. |
AbSoLuteLy TRUE | absolutely true | Case folding |
… | ... | Ellipsis character converted to three dots |
½ Tsp | 1/2 tsp | Symbol folding |
Æsop | aesop | |
Äsop | asop | |
Dürst | durst | |
Encyclopædia | encyclopaedia | |
œuvre | oeuvre | |
poſt | post | |
résumé français | resume francais | Accent removal and case folding |
Straße | strasse | |
٣ is a magic number | 3 is a magic number | Native digit folding |
The complete list of transformations is given below:
Accent removal | Hebrew Alternates folding | Overline folding | Suzhou Numeral folding |
Case folding | Jamo folding | Positional forms folding | Symbol folding |
Canonical duplicates folding | Letterforms folding | Small forms folding | Underline folding |
Dashes folding | Math symbol folding | Space folding | Vertical forms folding |
Diacritic removal (including stroke, hook, descender) | Multigraph Expansions: All | Spacing Accents folding | Width folding |
Greek letterforms folding | Native digit folding | Subscript folding | Han Radical folding |
For more information about these transformations, see UTR #30: Character Foldings.
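A few of the foldings shown in the examples can be approximated with Python's standard `unicodedata` module. This is only a sketch: `fold_basic` is a hypothetical helper that covers accent removal, case folding and some compatibility foldings. It does not implement the complete list of transformations, and it is not the plug-in's implementation.

```python
import unicodedata


def fold_basic(text: str) -> str:
    """Approximate accent removal and case folding."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and replaces compatibility characters (for example the long s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, which removes the accents.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() lowercases and also expands characters such as the sharp s.
    return stripped.casefold()


for sample in ("résumé français", "AbSoLuteLy TRUE", "Straße", "Dürst"):
    print(sample, "->", fold_basic(sample))
```

Foldings such as multigraph expansion (Æ to ae), native digit folding or dashes folding need additional mapping tables that this sketch leaves out.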
Phonetic Transformations
A phonetic transformation applied to the string transforms it to a string corresponding to its pronunciation. The default phonetic transformation is PHONETIC METAPHONE.
Phonetic transformations include:
- PHONETIC SOUNDEX and PHONETIC REFINEDSOUNDEX: Phonetic algorithms for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
- PHONETIC METAPHONE and PHONETIC DOUBLEMETAPHONE: Algorithms for indexing words by their English pronunciation. They are suitable for use with most English words, not just names. Double Metaphone can return both a primary and a secondary code for an input string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. These algorithms support a Max Code Length parameter which defines the maximum length of the encoded result. This value defaults to 4.
- PHONETIC CAVERPHONE and PHONETIC CAVERPHONE1: Algorithms for data matching for electoral rolls, optimized for accents present in parts of New Zealand.
- PHONETIC NYSIIS: New York State Identification and Intelligence System (NYSIIS), which maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
- PHONETIC MRA: Match Rating Approach developed by Western Airlines. This algorithm has an encoding and range comparison technique.
- PHONETIC COLOGNE: Phonetic algorithm optimized for the German language (Kölner Phonetik).
- PHONETIC BEIDERMORSE: Phonetic algorithm supporting greater accuracy in matching Slavic and Yiddish surnames with similar pronunciation but differences in spelling. It returns a list of tokens (separated by the string specified in the Synonyms Separator parameter): first the transformed input text, then the transformed synonyms of the input text.
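To show how a phonetic code brings differently spelled names together, here is a minimal sketch of the classic American Soundex algorithm. It is written from the commonly published rules and is not taken from the plug-in, whose SOUNDEX output may differ in edge cases; the function name `soundex` is chosen for illustration.

```python
# Consonant groups of the classic Soundex table.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name (ASCII letters only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets the previous code
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Both names receive the same code, which is what allows a match rule to treat them as candidates for the same person.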
Other Transformations
These other transformations return a list of tokens which can be split into the Transformed Text and Secondary Transformed Text outputs.
Other transformations include:
- BEIDERMORSE [<split>] [<rule_type>] [<max_phonems>] [<name_type>]: The Beidermorse transformation returns a list of tokens: first the transformed input text, then the transformed synonyms of the input text. Beidermorse supports the following parameters:
  - split: If this parameter is set to true, all synonyms after the first one are concatenated in the Secondary Transformed Text output. If this parameter is set to false (default value), all synonyms are appended to the first token in the Transformed Text output.
  - rule_type: EXACT for exact or APPROX for approximate phonetic transformation.
  - max_phonems: The maximum number of synonyms returned. Default is 20.
  - name_type: Default value is GENERIC. Use ASHKENAZI or SEPHARDIC if you specifically want phonetic encodings optimized for Ashkenazi or Sephardic Jewish family names.
- DOUBLEMETAPHONE [<max_code_length>] [<split>]: This transformation encodes the input string with the Double Metaphone algorithm and returns a primary code and a secondary code. If split is set to true, then the secondary code is pushed to the Secondary Transformed Text output. Otherwise, it is concatenated to the primary code in the Transformed Text output.
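The effect of the split parameter on the two outputs can be pictured with the sketch below. The helper `route_tokens` and the sample tokens are hypothetical, and returning an empty string for an unused secondary output is an assumption: the plug-in may return null instead.

```python
def route_tokens(tokens, split, separator="|"):
    """Return (transformed_text, secondary_transformed_text) for a token list."""
    if not tokens:
        return "", ""
    if split:
        # The first token is the primary result; the remaining tokens
        # are concatenated into the secondary output.
        return tokens[0], separator.join(tokens[1:])
    # Without split, every token stays in the primary output.
    return separator.join(tokens), ""


print(route_tokens(["abc", "abd", "abe"], split=True))
print(route_tokens(["abc", "abd", "abe"], split=False))
```

Splitting is convenient when the primary code is used for exact comparison and the synonyms are stored separately for looser matching.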
Transliteration
The TRANSLITERATE transformation transforms a text from one character script to another. For example, Traditional to Simplified Chinese, Japanese Hiragana to Katakana, Cyrillic to Latin script.
Each source/target transliteration is identified by an ID. The list of
supported transliteration IDs is provided in the list below. If no ID is
provided, the Any-Latin transliteration is used.
Each ID represents a transliteration from one script/language to another. For example: Katakana-Latin, Latin-Thai, etc. The special tag Any stands for any script/language. For example, Any-Latin converts any input script to Latin script.
Accents-Any | Any-Name | Devanagari-Bengali | Han-Latin | Latin-Greek | Pinyin-NumericPinyin |
Amharic-Latin/BGN | Any-NFC | Devanagari-Gujarati | Han-Latin/Names | Latin-Greek/UNGEGN | pl_FONIPA-ja |
Any-Accents | Any-NFD | Devanagari-Gurmukhi | Hangul-Latin | Latin-Gujarati | pl-ja |
Any-am | Any-NFKC | Devanagari-Kannada | Hans-Hant | Latin-Gurmukhi | pl-pl_FONIPA |
Any-Arabic | Any-NFKD | Devanagari-Latin | Hant-Hans | Latin-Han | Publishing-Any |
Any-Armenian | Any-Null | Devanagari-Malayalam | Hebrew-Latin | Latin-Hangul | ro_FONIPA-ja |
Any-Bengali | Any-Oriya | Devanagari-Oriya | Hebrew-Latin/BGN | Latin-Hebrew | ro-ja |
Any-Bopomofo | Any-pl_FONIPA | Devanagari-Tamil | Hex-Any | Latin-Hiragana | ro-ro_FONIPA |
Any-CaseFold | Any-Publishing | Devanagari-Telugu | Hex-Any/C | Latin-Jamo | ru-ja |
Any-cs_FONIPA | Any-Remove | Digit-Tone | Hex-Any/Java | Latin-Kannada | ru-zh |
Any-Cyrillic | Any-ro_FONIPA | es_419-ja | Hex-Any/Perl | Latin-Katakana | Russian-Latin/BGN |
Any-Devanagari | Any-ru | es_419-zh | Hex-Any/Unicode | Latin-Malayalam | Serbian-Latin/BGN |
Any-es_419_FONIPA | Any-sk_FONIPA | es_FONIPA-am | Hex-Any/XML | Latin-NumericPinyin | Simplified-Traditional |
Any-es_FONIPA | Any-Syriac | es_FONIPA-es_419_FONIPA | Hex-Any/XML10 | Latin-Oriya | sk_FONIPA-ja |
Any-FCC | Any-Tamil | es_FONIPA-ja | Hiragana-Katakana | Latin-Syriac | sk-ja |
Any-FCD | Any-Telugu | es_FONIPA-zh | Hiragana-Latin | Latin-Tamil | sk-sk_FONIPA |
Any-Georgian | Any-Thaana | es-am | IPA-XSampa | Latin-Telugu | Syriac-Latin |
Any-Greek | Any-Thai | es-es_FONIPA | it-am | Latin-Thaana | Tamil-Bengali |
Any-Greek/UNGEGN | Any-Title | es-ja | it-ja | Latin-Thai | Tamil-Devanagari |
Any-Gujarati | Any-Upper | es-zh | ja_Latn-ko | Macedonian-Latin/BGN | Tamil-Gujarati |
Any-Gurmukhi | Any-zh | Fullwidth-Halfwidth | ja_Latn-ru | Malayalam-Bengali | Tamil-Gurmukhi |
Any-Han | Arabic-Latin | Georgian-Latin | Jamo-Latin | Malayalam-Devanagari | Tamil-Kannada |
Any-Hangul | Arabic-Latin/BGN | Georgian-Latin/BGN | JapaneseKana-Latin/BGN | Malayalam-Gujarati | Tamil-Latin |
Any-Hans | Armenian-Latin | Greek-Latin | Kannada-Bengali | Malayalam-Gurmukhi | Tamil-Malayalam |
Any-Hant | Armenian-Latin/BGN | Greek-Latin/BGN | Kannada-Devanagari | Malayalam-Kannada | Tamil-Oriya |
Any-Hebrew | ASCII-Latin | Greek-Latin/UNGEGN | Kannada-Gujarati | Malayalam-Latin | Tamil-Telugu |
Any-Hex | Azerbaijani-Latin/BGN | Gujarati-Bengali | Kannada-Gurmukhi | Malayalam-Oriya | Telugu-Bengali |
Any-Hex/C | Belarusian-Latin/BGN | Gujarati-Devanagari | Kannada-Latin | Malayalam-Tamil | Telugu-Devanagari |
Any-Hex/Java | Bengali-Devanagari | Gujarati-Gurmukhi | Kannada-Malayalam | Malayalam-Telugu | Telugu-Gujarati |
Any-Hex/Perl | Bengali-Gujarati | Gujarati-Kannada | Kannada-Oriya | Maldivian-Latin/BGN | Telugu-Gurmukhi |
Any-Hex/Plain | Bengali-Gurmukhi | Gujarati-Latin | Kannada-Tamil | Mongolian-Latin/BGN | Telugu-Kannada |
Any-Hex/Unicode | Bengali-Kannada | Gujarati-Malayalam | Kannada-Telugu | Name-Any | Telugu-Latin |
Any-Hex/XML | Bengali-Latin | Gujarati-Oriya | Katakana-Hiragana | NumericPinyin-Latin | Telugu-Malayalam |
Any-Hex/XML10 | Bengali-Malayalam | Gujarati-Tamil | Katakana-Latin | NumericPinyin-Pinyin | Telugu-Oriya |
Any-Hiragana | Bengali-Oriya | Gujarati-Telugu | Kazakh-Latin/BGN | Oriya-Bengali | Telugu-Tamil |
Any-ja | Bengali-Tamil | Gurmukhi-Bengali | Kirghiz-Latin/BGN | Oriya-Devanagari | Thaana-Latin |
Any-Kannada | Bengali-Telugu | Gurmukhi-Devanagari | Korean-Latin/BGN | Oriya-Gujarati | Thai-Latin |
Any-Katakana | Bopomofo-Latin | Gurmukhi-Gujarati | Latin-Arabic | Oriya-Gurmukhi | Tone-Digit |
Any-ko | Bulgarian-Latin/BGN | Gurmukhi-Kannada | Latin-Armenian | Oriya-Kannada | Traditional-Simplified |
Any-Latin (default) | cs_FONIPA-ja | Gurmukhi-Latin | Latin-ASCII | Oriya-Latin | Turkmen-Latin/BGN |
Any-Latin/BGN | cs_FONIPA-ko | Gurmukhi-Malayalam | Latin-Bengali | Oriya-Malayalam | Ukrainian-Latin/BGN |
Any-Latin/Names | cs-cs_FONIPA | Gurmukhi-Oriya | Latin-Bopomofo | Oriya-Tamil | Uzbek-Latin/BGN |
Any-Latin/UNGEGN | cs-ja | Gurmukhi-Tamil | Latin-Cyrillic | Oriya-Telugu | XSampa-IPA |
Any-Lower | cs-ko | Gurmukhi-Telugu | Latin-Devanagari | Pashto-Latin/BGN | zh_Latn_PINYIN-ru |
Any-Malayalam | Cyrillic-Latin | Halfwidth-Fullwidth | Latin-Georgian | Persian-Latin/BGN |
Lookup
This plug-in performs a data lookup on a mapping table.
Semarchy Lookup Enricher
Plug-in ID
Semarchy Lookup Enricher - com.semarchy.engine.plugins.convergence.text
Description
This enricher performs a data lookup on a mapping table accessed via a JDBC datasource.
The mapping table is located in a datasource provided using the Datasource parameter, which defaults to the data location’s datasource. The mapping table is declared to the enricher:
- By giving a Mapping Table as well as a Lookup Column and a list of (up to 20) Output Columns from this table. The input lookup value is searched in the Lookup Column and the corresponding values from the Output Columns are returned.
- By giving a Custom SQL select statement executed on the datasource, which must return columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN1, …, OUTPUT_COLUMN20. These columns will be used as the lookup and output columns.
The lookup is performed on the mapping table with an optional memory cache configured with the Cache Lookup Data parameter.
When a null value is passed as the Lookup Value or when the lookup finds no matching value in the lookup column, the enricher returns the Fallback Value or the Lookup Value, depending on the Fallback Behavior parameter.
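The lookup, cache and fallback logic described above can be sketched as follows, using an in-memory SQLite database in place of a JDBC datasource. Everything here is illustrative: the table `COUNTRY_MAPPING`, its columns, the helpers `load_cache` and `lookup`, and the rule used to choose between the fallback value and the lookup value are assumptions, not the plug-in's code or its actual Fallback Behavior values.

```python
import sqlite3

# Custom SQL in the documented form; table and column names are made up.
CUSTOM_SQL = """
    select ISO_CODE LOOKUP_COLUMN, COUNTRY_NAME OUTPUT_COLUMN1, REGION OUTPUT_COLUMN2
    from COUNTRY_MAPPING
"""


def load_cache(connection):
    """Read the whole lookup dataset once and index it by lookup value."""
    rows = connection.execute(CUSTOM_SQL).fetchall()
    return {row[0]: tuple(row[1:]) for row in rows}


def lookup(cache, value, fallback_value=None, n_outputs=2):
    """Return the output columns, or the same fallback in every output."""
    if value is not None and value in cache:
        return cache[value]
    # Hypothetical rule: use the fallback value if given, else echo the input.
    fallback = fallback_value if fallback_value is not None else value
    return (fallback,) * n_outputs


connection = sqlite3.connect(":memory:")
connection.execute(
    "create table COUNTRY_MAPPING (ISO_CODE text, COUNTRY_NAME text, REGION text)"
)
connection.execute("insert into COUNTRY_MAPPING values ('FR', 'France', 'Europe')")
cache = load_cache(connection)

print(lookup(cache, "FR"))
print(lookup(cache, "XX", fallback_value="UNKNOWN"))
```

Loading the dataset once pays off when many records are processed in a batch, because each record then costs only a dictionary access.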
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Cache Lookup Data | No | String | Use this parameter to optionally use a memory cache for the lookup process. Possible values are:
Use the cache only to process batches of records. Do not use it when processing one record at a time. For example, it is recommended to set this parameter to NO_CACHE for enrichers running in steppers. If you configure the cache in such a situation, it is reloaded every time the stepper triggers the enricher, which degrades performance. |
Custom SQL | No | String | Leave this parameter empty to use a generated SQL query. Use this parameter instead of Mapping Table, Lookup Column and Output Columns to define the lookup dataset with a select statement in the following form: select <lookup_column> LOOKUP_COLUMN, <output_column> OUTPUT_COLUMN1, <output_column> OUTPUT_COLUMN2, <output_column> OUTPUT_COLUMN3, ... from <mapping_table> where ... This query must return a dataset with n+1 columns aliased LOOKUP_COLUMN and OUTPUT_COLUMN1 to OUTPUT_COLUMNn. These columns are used instead of the Lookup Column and Output Columns. |
Datasource | No | String | JNDI name of the datasource containing the lookup data. If this parameter is not defined, the enricher uses the data location datasource. This parameter should contain the full path of the datasource, for example: java:comp/env/jdbc/SEMARCHY_STAGING. |
Fallback Behavior | No | String | Behavior when the lookup value is not found in the lookup column. Possible values are:
When multiple output columns are specified, the same value - the fallback or lookup value - is sent to all these columns. |
Fallback Value | No | String | Value to return if the lookup value is not found in the lookup column. Default value: |
Lookup Column | No | String | Physical name of the column containing the lookup values. Default value: |
Mapping Table | No | String | Physical name of the mapping table containing the lookup and output columns. Default value: |
Output Columns | No | String | Comma-separated list of the physical names of the columns containing the values returned by the enricher. Default value: The (singular) Output Column parameter available in previous versions of this plug-in is deprecated and replaced by this parameter. |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Lookup Value | Yes | String | Value to look for in the mapping table’s lookup column. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Output Value<N> | String | Nth Value returned by the lookup. |
Translation
Google Translate Enricher
Plug-in ID
Google Translate Enricher - com.semarchy.engine.plugins.convergence.translate.v2
Description
This enricher translates an Input Text from a Source Language to a Target Language using the Google Translate service. The source language is automatically detected if unspecified. This enricher requires a valid Google Key.
The enricher calls the Google Translate service at the following URL: https://www.googleapis.com/language/translate/v2?<parameters>. Make sure that this URL is accessible through your firewalls.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Application Name | Yes | String | Name of the client application accessing
the Google Translate service. Application names should preferably have
the format |
Google Key | Yes | String | Google API Key. It is a unique key that you generate using the Google API Console. |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Text to translate. |
Source Language | No | String | Language of the input text. If it is unspecified, it is detected from the input text. |
Target Language | Yes | String | Target language for the translation. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Translated Text | String | Translated Text. |
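When checking firewall rules, it can help to see what a request to the service URL given in the Description looks like. The sketch below only builds the URL and sends nothing. The parameter names `key`, `q`, `source` and `target` are those of the Google Translate v2 REST API and are not taken from this guide; the helper `build_translate_url` is hypothetical, and how the plug-in itself assembles the request is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(text, target, api_key, source=None):
    """Build a translation request URL without sending it."""
    params = {"key": api_key, "q": text, "target": target}
    if source:
        # When no source is given, the service detects the language.
        params["source"] = source
    return BASE_URL + "?" + urlencode(params)


print(build_translate_url("Bonjour le monde", "en", "YOUR_API_KEY"))
```

The key placeholder must be replaced with the value of the Google Key parameter before the URL can be tested from the server hosting Semarchy xDM.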
Name Processing
Semarchy Person Name Enricher
Plug-in ID
Semarchy Person Name Enricher - com.semarchy.engine.plugins.convergence.personname.PersonNameEnricher
Description
This enricher extracts the Given Name, Surname and Gender from a person’s full name. It parses the Input Name and identifies a Given Name and Surname (with a Name Parsing Score confidence percentage). Then the given name is searched in a database of names for the source country code provided in the input. If a given name is matched, a Gender and a Most Frequent Gender (if the given name is unisex) are returned.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Surname Position | Yes | String | Position of the Surname. This parameter is used for parsing the input name to detect the first and last names, and for generating the Full Name output. Possible values ( |
Case Transformation | Yes | String | Case transformation for the name.
Possible values: |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Name | Yes | String | Person full name to enrich. |
Source Country Code | Yes | String | Code of the country of origin for the
name. This code indicates the database of names to search to determine a gender for the given name. Built-in databases include |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Full Name | String | The reconstructed full name, with the surname positioned according to the Surname Position parameter. |
Gender | String | The gender of the Matched Given Name. One of MALE, FEMALE, UNISEX, UNKNOWN. |
Gender Score | String | Confidence with which the Most Frequent Gender can be used [0-100]. |
Given Name | String | The part identified as Given Name in the input name. |
Matched Given Name | String | Given name matched in the given name database. |
Most Frequent Gender | String | The most frequent gender of the Matched Given Name for the given country. One of MALE, FEMALE, UNKNOWN. |
Names Parsing Score | String | Names Parsing confidence [0-100] |
Surname | String | The part identified as Surname in the input name. |
Surname Position | String | Position at which the surname was detected. |
International Phone Numbers Plug-In
The International Phone Numbers Plug-In for Semarchy xDM provides two features:
- An enricher to standardize and improve phone numbers formatting.
- A validator to check the validity of phone numbers.
Semarchy Phone Enricher
Plug-in ID
Semarchy Phone Enricher - com.semarchy.engine.plugins.convergence.phone
Description
This enricher takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Region Code. It returns a standardized Enriched Phone Number in the Enriched Phone Format. Geocoding Data is also returned and includes (depending on the country) the country, the region/state and the city name.
If a phone number is not valid, the enricher returns the original phone value in the Enriched Phone Number, a Status Code as well as a Status Text describing the issue with the input phone number.
Plug-in Parameters
This plug-in does not use any parameter.
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Phone Number | Yes | String | Input Phone Number. |
Region Code | No | String | Two-letter region code for a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the phone number provided in the Input Phone Number should include the international country calling code. |
Enriched Phone Format | No | String | Format of the Enriched Phone Number. Possible values are |
Region of Origin | No | String | Formats the phone output for international dialing from the country or region provided in this input. E.g.: |
Phone Formats
The following standards are supported to format the enriched phone number:
- E123_INTERNATIONAL and E123_NATIONAL refer to the ITU-T Recommendation E.123 for national and international phone numbers.
- INTERNATIONAL and NATIONAL use a format similar to the ITU-T Recommendation E.123 for national and international phone numbers, but use hyphens to separate blocks of numbers.
- E164 refers to the ITU-T Recommendation E.164.
- RFC3966 refers to the IETF RFC 3966.
Phone Format Examples:
- E123_NATIONAL (E.123 - National Notation): (042) 123 4594
- E123_INTERNATIONAL (E.123 - International Notation): +31 42 123 4567
- NATIONAL (E.123 - National Notation with hyphens): (042) 123-4594
- INTERNATIONAL (E.123 - International Notation with hyphens): +31 42-123-4567
- E164 (E.164 - International Notation): +31421234567 (equivalent to E.123 with no formatting)
- RFC3966 (RFC3966 - International Notation): +31-42-123-4567 (equivalent to E.123 with hyphens instead of spaces)
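The international notations differ only in how the blocks of digits are joined, which the following sketch reproduces for the sample number used in the examples. The helper `format_phone` is hypothetical and expects the number to be already split into a country code and blocks; the national notations are left out because they depend on country-specific prefix rules.

```python
def format_phone(country_code, blocks, style):
    """Format a number given its country code and its blocks of digits."""
    if style == "E164":
        return "+" + country_code + "".join(blocks)
    if style == "E123_INTERNATIONAL":
        return "+" + country_code + " " + " ".join(blocks)
    if style == "INTERNATIONAL":
        # Space after the country code, hyphens between the blocks.
        return "+" + country_code + " " + "-".join(blocks)
    if style == "RFC3966":
        return "+" + "-".join([country_code, *blocks])
    raise ValueError(f"Unsupported style: {style}")


for style in ("E123_INTERNATIONAL", "INTERNATIONAL", "E164", "RFC3966"):
    print(style, format_phone("31", ["42", "123", "4567"], style))
```

E164 is usually the safest choice for storage and matching because it contains no separators at all.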
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Enriched Phone Number | String | Phone number returned by the enricher in the format specified in the Enriched Phone Format input. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues. |
Geocoding Data | String | Geocoding data computed for a given number and country. Depending on the country and phone number, this value includes the country, region/state and city information. This string is null if the enricher was not able to process the input phone number. The Status Code and Status Text values help troubleshoot such issues. |
Status Code | String | Return code for the phone number processing. More details about the Status Codes. |
Status Text | String | Text explaining the status code. |
International Phone Prefix | String | International Phone Prefix for worldwide dialing. |
National Number | String | National number part of a phone number in International format. It is often the International number without the Country Prefix. |
Extension | String | Extension part of the phone number. |
Country Code Source | String | Explains how the Country Code was retrieved. Possible values are |
Leading Zero | String | Returns 0 or 1 to specify if leading zero is mandatory for foreign calls. |
Possible Phone Number | String | Returns 0 or 1 to indicate whether a phone number is a possible number, and the region where the number could be dialed from. |
Possible Phone Number Reason | String | Detailed explanation of why a phone number is a possible number or not. Possible values are |
Valid Phone Number | String | Returns 0 or 1 to indicate whether a phone number matches a valid pattern. |
Valid Phone Number For Region | String | Returns 0 or 1 to indicate that a phone number is valid for the specified Region Code. |
Phone Line Type | String | Provides the line type of a phone number. Possible values are : |
Region Code | String | Returns the region code for the Phone Number. |
International Phone Number | String | Phone number formatted for international dialing. |
Time Zones | String | List of corresponding time zones for a given number. For example: |
First Time Zone | String | First time zone from the list of corresponding time zones for a given number. |
Carrier Name | String | Name of the carrier for the phone number. |
Status Codes
The following status codes are returned by the enricher:
- 0 - OK: Optimal execution. No error detected.
- 1 - INPUT_WAS_NULL: Input phone number was not set.
- 2 - PARSING FAILED: The string supplied did not seem to be a phone number. Review the Status Text for more information.
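A downstream consumer of the enricher outputs might interpret the Status Code and Status Text as in the sketch below. The names `PhoneStatus` and `describe` are hypothetical; the code assumes that the status code arrives as a string, as the String type of the output suggests.

```python
from enum import IntEnum


class PhoneStatus(IntEnum):
    OK = 0
    INPUT_WAS_NULL = 1
    PARSING_FAILED = 2


def describe(status_code, status_text=""):
    """Turn the Status Code and Status Text outputs into a short message."""
    status = PhoneStatus(int(status_code))  # raises ValueError on unknown codes
    if status is PhoneStatus.OK:
        return "accepted"
    detail = f": {status_text}" if status_text else ""
    return f"rejected ({status.name}){detail}"


print(describe("0"))
print(describe("2", "The string supplied did not seem to be a phone number."))
```

Failing loudly on an unknown code is a deliberate choice here, so that a new status value is noticed rather than silently treated as valid.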
Semarchy Phone Extractor
Plug-in ID
Semarchy Phone Extractor - com.semarchy.engine.plugins.convergence.phone.extractor
Description
This enricher extracts a list of phone numbers from an Input Text and returns them as a Phone List, in a given Extraction Format.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Matching Leniency | No | String | Defines the phone number extraction
leniency. Possible values are |
Extraction Format | No | String | Format of the extracted phone numbers.
Possible values are |
List Separator | No | String | Defines the separator character used in the extracted phones list. |
Maximum Invalid Numbers | No | String | Maximum number of invalid numbers allowed before the extractor stops processing the text. This covers cases where the text contains a lot of false positives. |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Text | Yes | String | Input text to search for phone numbers. |
Accepted Region | No | String | Defines the region used when Matching Leniency is set to |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Extracted Phone List | String | List of phone numbers extracted. |
Phone 1 to Phone 5 | String | First, second… extracted phone number in the list. |
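The relationship between the Extracted Phone List and the Phone 1 to Phone 5 outputs can be illustrated with a deliberately naive extractor. The regular expression below is a crude stand-in for real phone number matching and ignores leniency, regions and validity; `extract_phones` and the `;` separator are assumptions made for the example.

```python
import re

# Naive pattern: optional "(" and "+", a digit, then digits or separators,
# ending on a digit. Real phone matching is considerably stricter.
CANDIDATE = re.compile(r"\(?\+?\d[\d\s().-]{6,}\d")


def extract_phones(text, separator=";", max_outputs=5):
    """Return (phone_list, [phone_1 .. phone_5]) for the given text."""
    found = [match.group().strip() for match in CANDIDATE.finditer(text)]
    phone_list = separator.join(found)
    # Pad with None so the individual outputs always exist.
    singles = (found + [None] * max_outputs)[:max_outputs]
    return phone_list, singles


phone_list, singles = extract_phones(
    "Call +31 42 123 4567 or (042) 123-4594 today."
)
print(phone_list)
print(singles)
```

Numbers beyond the fifth remain available in the list output only, which is why the separator should be a character that cannot appear inside a formatted number.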
Semarchy Phone Validator
Plug-in ID
Semarchy Phone Validator - com.semarchy.engine.plugins.convergence.phone
Description
This validator takes as the Input Phone Number either an international phone number (with the international prefix), or a national phone number provided with a Country Code. The validator checks whether this phone number is a valid international or national phone number.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Validation Leniency | No | String | Precise validation leniency for
possible phone numbers. Value may be |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Phone Number | Yes | String | Input Phone Number. |
Country Code | No | String | Two-letter country code for a national phone number, according to the ISO 3166-1 standard. If this input is left empty, the phone number provided in the Input Phone Number should include the international country calling code. |
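The interplay between the two inputs can be illustrated with a small sketch. It assumes that an international number is recognized by a leading `+`, and that such a number is treated as international even when a Country Code is supplied; both points are assumptions of this example, as is the function name `choose_validation_mode`. The actual validation of the number is not shown.

```python
def choose_validation_mode(input_phone_number, country_code=None):
    """Decide how a phone number would be interpreted (illustrative only).

    Assumption: a leading '+' marks the international prefix.
    Returns a (mode, region) tuple.
    """
    number = (input_phone_number or "").strip()
    region = (country_code or "").strip().upper()
    if not number:
        raise ValueError("Input Phone Number is mandatory")
    if number.startswith("+"):
        # The number carries its own country calling code.
        return ("international", None)
    if region:
        # ISO 3166-1 alpha-2 codes are two letters.
        if len(region) != 2 or not region.isalpha():
            raise ValueError("Country Code must be a two-letter ISO 3166-1 code")
        return ("national", region)
    # No prefix and no Country Code: the number cannot be interpreted reliably.
    return ("undetermined", None)


if __name__ == "__main__":
    print(choose_validation_mode("+33 1 23 45 67 89"))
    print(choose_validation_mode("01 23 45 67 89", "fr"))
```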
Email Plug-In
The Email Plug-In for Semarchy xDM provides an enricher to improve the quality of email addresses and a validator to check email validity.
Semarchy Email Enricher
Plug-in ID
Semarchy Email Enricher - com.semarchy.engine.plugins.convergence.email
Description
This enricher takes an Input Email Address and splits this address into the local-part (user name) and the domain name. Both these parts are checked syntactically and syntax errors are fixed automatically. The domain name validity is also checked using MX records lookup. The plug-in uses a Domain Name Cache for faster checks and automated fixes on domain names.
Domain Name Cache
The plug-in uses several mechanisms for faster checks and automated fixes on domain names:
- Domain names already checked as valid (MX record lookup) are persisted in a domain name cache stored in a JDBC Datasource. This avoids repeating MX lookup.
- A list of known domains (e.g., hotmail.com, gmail.com, etc.) is automatically seeded in the host name validation cache.
- Common domain mistakes are fixed using a seeded replace list. For example, gmai.com is automatically fixed to gmail.com using the cache.
- Invalid domains are automatically fixed to similar valid domains already present in the cache. For example, semarcyh.com is fixed to semarchy.com, as semarchy.com was previously checked as a valid domain name.
See Appendix A: Semarchy Email Enricher Domain Name Cache for more information about the domain name cache.
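The mechanisms listed above can be sketched with an in-memory cache. This is a simplified illustration under several assumptions: the real cache is persisted in a JDBC Datasource, the order in which the checks are applied is not documented here, and the similarity matching is approximated with Python's `difflib`. The class name `DomainCache` and the injected `mx_lookup` function are hypothetical.

```python
import difflib

KNOWN_DOMAINS = {"hotmail.com", "gmail.com"}   # seeded known domains (sample)
REPLACE_LIST = {"gmai.com": "gmail.com"}       # seeded replace list (sample)


class DomainCache:
    """In-memory stand-in for the domain name cache (illustrative only)."""

    def __init__(self, mx_lookup):
        self.valid = set(KNOWN_DOMAINS)
        self.replacements = dict(REPLACE_LIST)
        self.mx_lookup = mx_lookup  # callable: domain -> bool

    def resolve(self, domain):
        domain = domain.lower()
        if domain in self.valid:
            return domain                      # cached: no MX lookup needed
        if domain in self.replacements:
            return self.replacements[domain]   # common mistake, fixed from the list
        if self.mx_lookup(domain):
            self.valid.add(domain)             # remember the successful lookup
            return domain
        # Fall back to the most similar valid domain already in the cache.
        close = difflib.get_close_matches(domain, sorted(self.valid), n=1, cutoff=0.85)
        return close[0] if close else None


if __name__ == "__main__":
    cache = DomainCache(mx_lookup=lambda d: d == "semarchy.com")
    for d in ("gmai.com", "semarchy.com", "semarcyh.com"):
        print(d, "->", cache.resolve(d))
```

In this sketch, the similarity cutoff of 0.85 is an arbitrary choice; a lower value fixes more typos but increases the risk of rewriting a domain to an unrelated one.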
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Datasource | No | String | Full name of the JDBC Datasource used to store the host name validation cache. |
Lowercase User Name | No | String | Set to 1 to transform the local-part (user name) to lowercase in the cleansed email address. |
Offline Mode | No | String | Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup. |
Processing Mode | No | String | Processing mode: |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Email Address | Yes | String | Input email address to cleanse. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Cleansed Email Address | String | Cleansed email address returned by the enricher. This address may be valid or not. The syntactic validity or domain name validity of the email address is indicated in the other plug-in outputs. |
Valid Domain | String | Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the cleansed email address. In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns |
Valid Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid or not in the cleansed email address. |
Valid Email Syntax | String | Flag (0 or 1) indicating whether the cleansed email address is syntactically valid or not. |
Valid Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the cleansed email address. |
Valid Input Domain | String | Flag (0 or 1) indicating whether the domain name is valid or not (based on syntax and MX records lookup) in the input email address. In Offline mode, this output returns 1 or 0 if the domain name appears in the local domain cache as valid or invalid. It returns |
Valid Input Domain Syntax | String | Flag (0 or 1) indicating whether the domain name syntax is valid or not in the input email address. |
Valid Input Email Syntax | String | Flag (0 or 1) indicating whether the input email address is syntactically valid or not. |
Valid Input Username Syntax | String | Flag (0 or 1) indicating whether the local-part (user name) syntax is valid or not in the input email address. |
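The split into local-part and domain name, and the way the syntax flags relate to each part, can be sketched as follows. The regular expressions are deliberately simplified and are assumptions of this example; they do not reproduce the plug-in's actual syntax rules, and the sketch does not perform any automatic fix or MX lookup. The function name `syntax_flags` is hypothetical.

```python
import re

# Simplified patterns (assumptions): real email syntax rules are broader.
_LOCAL_RE = re.compile(
    r"^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*$"
)
_DOMAIN_RE = re.compile(
    r"^(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,}$"
)


def syntax_flags(email, lowercase_user_name=False):
    """Split an address and compute 0/1 syntax flags as strings."""
    # Split on the last '@': everything before it is the local-part.
    local, sep, domain = email.strip().rpartition("@")
    if lowercase_user_name:
        local = local.lower()
    user_ok = bool(sep) and bool(_LOCAL_RE.match(local))
    domain_ok = bool(sep) and bool(_DOMAIN_RE.match(domain))
    return {
        "local_part": local,
        "domain": domain,
        "Valid Username Syntax": "1" if user_ok else "0",
        "Valid Domain Syntax": "1" if domain_ok else "0",
        "Valid Email Syntax": "1" if (user_ok and domain_ok) else "0",
    }


if __name__ == "__main__":
    print(syntax_flags("John.Doe@semarchy.com", lowercase_user_name=True))
```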
Semarchy Email Validator
Plug-in ID
Semarchy Email Validator - com.semarchy.engine.plugins.convergence.email
Description
This validator takes an Input Email Address and checks its syntactic validity. The domain name validity is optionally also checked using MX records lookup.
The plug-in uses the same mechanisms as the Semarchy Email Enricher for checking the email validity, except that it does not modify the incoming email.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Accepted Domains | No | String | Value tolerated for the email domain. Possible values:
|
Offline Mode | No | String | Set to 1 to query only the local domain cache. The plug-in does not perform the MX Record Lookup. |
Processing Mode | No | String | Processing mode: |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Input Email Address | Yes | String | Input email address to check. |
Melissa Plug-ins
The Melissa Plug-in for Semarchy xDM provides enrichers to fix and complete contact data for US/Canada using the Personator service, and to validate international addresses in 240 countries using the Global Address Verification service.
Melissa Global Address Enricher
Plug-in ID
Melissa Global Address Enricher - com.semarchy.engine.plugins.melissa.GlobalAddressVerificationEnricher
Description
The Melissa Global Address Enricher validates international addresses in 240 countries using the Global Address Verification service.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
License String | Yes | String | Your license string. This must be valid for you to access the Melissa Service. |
Delivery Lines | No | Boolean | Specifies whether Address Lines 1-8 should contain only the delivery address or the entire address. |
Line Separator | No | String | Possible values: SemiColon, Pipe, CR, LF, CRLF, Tab, BR. This is the line separator used for the FormattedAddress result. |
Output Script | No | String | Possible values: NoChange, Latn, Native. This is the script type used for all applicable fields. |
Country Of Origin | No | String | Must contain a valid ISO 3166-1 Alpha-2, ISO 3166-1 Alpha-3, or ISO 3166-1 Numeric code. This is used to determine whether or not to include the country name as the last line in FormattedAddress. |
SSL Connection | No | Boolean | Default is true. Set to false if you don’t wish to use a secure connection. |
Failure Error Codes | No | String | Comma-separated list of codes (AE01, AE02) or code families (AE). When one of these result codes is returned by the API, the enrichment fails. |
Requests Limit | No | Number | When set, this numeric value limits the number of requests made to the Melissa API and the number of enriched records. Records after this limit are not enriched and the plug-in returns blank outputs. This parameter is intended for testing purposes only. |
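The way the Failure Error Codes parameter could be matched against the comma-separated Results output is sketched below. It assumes that a code family such as AE matches every result code starting with that prefix; this prefix rule, the function name `is_failed`, and the sample result strings are assumptions of the example.

```python
def is_failed(results, failure_error_codes):
    """Return True if any result code matches a failure code or code family.

    Assumption: a family (for example 'AE') matches by prefix ('AE01', 'AE02', ...).
    """
    result_codes = [c.strip() for c in (results or "").split(",") if c.strip()]
    failure_entries = [c.strip() for c in (failure_error_codes or "").split(",") if c.strip()]
    # startswith also covers exact matches such as 'AE01' == 'AE01'.
    return any(code.startswith(entry) for code in result_codes for entry in failure_entries)


if __name__ == "__main__":
    print(is_failed("AE01,GS05", "AE01,AE02"))  # exact code listed
    print(is_failed("AE03", "AE"))              # matched through the family
    print(is_failed("AV24,GS05", "AE"))         # no failure code present
```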
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
AddressLine1 | No | String | The input field for the address line 1. This should contain the delivery address information (house number, street, building, suite, etc.) but should not contain locality information (city, state, postal code, etc.) which have their own inputs. |
AddressLine2 | No | String | The input field for the address line 2. This can be a continuation of AddressLine1 (ex: suite) or another address. |
AddressLine3 | No | String | The input field for the address. This should contain the delivery address information (house number, thoroughfare, building, suite, etc.) but should not contain locality information (locality, administrative area, postal code, etc.) which have their own inputs. |
DependentLocality | No | String | The smaller population center data element. This depends on the Locality element. |
DoubleDependentLocality | No | String | The smallest population center data element. This depends on the Locality and DependentLocality elements. |
Locality | No | String | The most common population center data element. |
PostalCode | No | String | The postal code. |
SubAdministrativeArea | No | String | The smallest geographic data element. |
SubNationalArea | No | String | The administrative region within a country on an arbitrary level below that of the sovereign state. |
Country | No | String | The country. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
AddressKey | String | Returns a unique identifier for an address. This key can be used with other current and future Melissa services. |
AddressLine1 | String | These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code. |
AddressType | String | Returns the Address Type for US and Canada |
AdministrativeArea | String | The most common geographic data element. |
Building | String | Descriptive name identifying an individual location. This is a string value that is the parsed Building element from the output. |
CountryISO3166_1_Alpha2 | String | ISO 3166 2-character country code. |
CountryISO3166_1_Alpha3 | String | ISO 3166 3-character country code. |
CountryISO3166_1_Numeric | String | ISO 3166 3-digit numeric country code. |
CountryName | String | Returns the country name for the record. |
DependentLocality | String | A dependent locality is a logical area unit that is smaller than a locality but larger than a double dependent locality or thoroughfare. It can often be associated with a neighborhood or sector. Great Britain is an example of a country that uses double dependent locality. In the United States, this would correspond to Urbanization, which is used only in Puerto Rico. |
DependentThoroughfare | String | Block data element or dependent street. This is used when there is more than one thoroughfare with the same name in one locality. An adjoining thoroughfare is used to uniquely identify the target thoroughfare. This is rarely used. |
DependentThoroughfareLeadingType | String | Thoroughfare type at the beginning of the dependent thoroughfare. The leading type is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "St. Hickory E," the dependent thoroughfare leading type would be "St." |
DependentThoroughfareName | String | Dependent thoroughfare name parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "E Hickory Ln," the dependent thoroughfare name would be "Hickory." |
DependentThoroughfarePostDirection | String | Cardinal directional at the end of the dependent thoroughfare. The postfix directional is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "Hickory Ln N," the dependent thoroughfare post direction would be "N." |
DependentThoroughfarePreDirection | String | Cardinal directional at the beginning of the dependent thoroughfare. The prefix directional is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "W Hickory Ln," the dependent thoroughfare pre direction would be "W." |
DependentThoroughfareTrailingType | String | Thoroughfare type at the end of the dependent thoroughfare. The trailing type is parsed from the dependentThoroughfare parameter. For example, if the dependent thoroughfare is "W Hickory Ln," the dependent thoroughfare trailing type would be "Ln." |
DoubleDependentLocality | String | A double dependent locality is a logical area unit that is smaller than a dependent locality but bigger than a thoroughfare. This field is very rarely used. Great Britain is an example of a country that uses double dependent locality. |
FormattedAddress | String | Mailing address. The full mailing address in the preferred format for the country of the address. This includes the Organization as the first line, one or more lines in the origin country’s format, and the destination country (if required). Separate lines are delimited by the separator specified in the Line Separator parameter. |
Latitude | String | Returns the geocoded latitude for the address entered in the AddressLine field. |
Locality | String | This is the most common geographic area and used by virtually all countries. This is usually the value that is written on a mailing label and referred to by terms like City, Town, Postal Town, etc. |
Longitude | String | Returns the geocoded longitude for the address entered in the AddressLine field. |
Organization | String | This is a string value that matches the Organization request element. It is not modified or populated by the service. |
PostBox | String | Post box information for a particular delivery point. |
PostalCode | String | Returns the 9-digit postal code for U.S. addresses and 6-digit postal code for Canadian addresses. |
PremisesNumber | String | Alphanumeric indicator within premises field. Parsed from the premises parameter. |
PremisesType | String | Leading premise type indicator within premises field. Parsed from the premises parameter. |
Results | String | String value containing a comma-separated list of status, error codes, and change codes for the record. Refer to the Melissa documentation for more details. |
SubAdministrativeArea | String | The smallest geographic data element. |
SubNationalArea | String | A sub-national area is a logical area unit that is larger than an administrative area but smaller than the country itself. It is extremely rarely used. |
SubPremises | String | Alphanumeric code identifying an individual location. More specific than premises. |
SubPremisesNumber | String | Sub premises number indicator within premises field. Parsed from the subPremises parameter. |
SubPremisesType | String | Sub premises type indicator within premises field. Parsed from the subPremises parameter. |
Thoroughfare | String | This value is a part of the address lines and contains all the sub-elements of the thoroughfare like trailing type, thoroughfare name, pre direction, post direction, etc. |
ThoroughfareLeadingType | String | Leading thoroughfare type indicator parsed from the thoroughfare parameter. A leading type is a thoroughfare type that is placed before the thoroughfare. This value is a part of the Thoroughfare field. For example, the thoroughfare type of "Rue" in Canada and France is placed before the thoroughfare, making it a leading type. |
ThoroughfareName | String | Name indicator parsed from the thoroughfare parameter. |
ThoroughfarePostDirection | String | Postfix directional parsed from the thoroughfare parameter. |
ThoroughfarePreDirection | String | Prefix directional parsed from the thoroughfare parameter. |
ThoroughfareTrailingType | String | Trailing thoroughfare type indicator parsed from the thoroughfare parameter. A trailing type is a thoroughfare type that is placed after the thoroughfare. This value is a part of the Thoroughfare field. For example, the thoroughfare type of "Avenue" in the US is placed after the thoroughfare, making it a trailing type. |
TransmissionResults | String | This is a string value that lists error codes from any errors caused by the most recent request as a whole. |
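A consumer of the FormattedAddress output may need to split it back into lines according to the Line Separator parameter. The sketch below assumes a mapping from the parameter values to actual delimiter strings; in particular, mapping BR to `<br>` is an assumption, as are the function name `split_formatted_address` and the sample address.

```python
# Assumed mapping from Line Separator values to delimiter strings.
SEPARATORS = {
    "SemiColon": ";",
    "Pipe": "|",
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "Tab": "\t",
    "BR": "<br>",  # assumption: HTML line break
}


def split_formatted_address(formatted_address, line_separator):
    """Split a FormattedAddress value into its individual lines."""
    delimiter = SEPARATORS[line_separator]
    # Drop empty fragments and surrounding whitespace.
    return [line.strip() for line in formatted_address.split(delimiter) if line.strip()]


if __name__ == "__main__":
    sample = "ACME Corp|10 Example Street|75001 Paris"  # illustrative value
    print(split_formatted_address(sample, "Pipe"))
```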
Melissa Personator Enricher
Plug-in ID
Melissa Personator Enricher - com.semarchy.engine.plugins.melissa.PersonatorConsumerEnricher
Description
The Melissa Personator Enricher fixes and completes contact data for US/Canada using the Personator Consumer service.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
License String | Yes | String | Your license string. This must be valid for you to access the Melissa Service. |
Action Append | No | Boolean | The Append Action will return elements based on the selected point of centricity which can either be the address, email or phone. For example, an address centric Append will return the name, company, phone and email associated with the given address. US only. |
Action Check | No | Boolean | The Check Action will validate the individual input data pieces for validity and correct them if possible. If the data is correctable, additional information |
Action Move | No | Boolean | The Move Action will return the latest address for an individual or business if a previous address was entered. Move requires either a Last Name and Address, or a Business/Company Name and Address as inputs. US only. |
Action Verify | No | Boolean | The Verify Action will return to you the relationships between your different input data pieces. It can show you if your name, |
Advanced Address Correction | No | Boolean | Uses the name input to perform more advanced address corrections. This can correct or append house numbers, street names, cities, states, and ZIP codes. |
Append Options | No | String | Possible values: blank, checkError, always. Setting the Append option to Blank will cause the service to return information only when the input address, phone, email, name or company is blank. |
Centric Hint | No | String | Possible values: auto, address, phone, email. Default value is Auto. When set to Auto, it first uses Address if available, followed by Phone if no Address is available, and lastly Email if neither Address nor Phone are available. Use this to tell the service which piece of information to use as the primary point of reference when appending or verifying data. |
Columns | No | String | By default, the requested columns are restricted to the mapped outputs. This parameter forces the specified columns to be requested. See the Melissa documentation for more details. |
Diacritics | No | String | Possible values: auto, on, off. Determines whether or not French language characters are returned. If set to auto, those characters are only returned if they are in the input. |
Failure Error Codes | No | String | Comma-separated list of codes (AE01, AE02) or code families (AE). When one of these result codes is returned by the API, the enrichment fails. |
SSL Connection | No | Boolean | Default is true. Set to false if you don’t wish to use a secure connection. |
Use Preferred City | No | Boolean | A city may have an official name that is preferred by the USPS and one or more unofficial "vanity" names in use. Normally, Personator allows you to verify addresses using known vanity names. Setting this parameter to true returns the preferred city name. |
Requests Limit | No | Number | When set, this numeric value limits the number of requests made to the Melissa API and the number of enriched records. Records after this limit are not enriched and the plug-in returns blank outputs. This parameter is intended for testing purposes only. |
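The fallback order described for the Centric Hint parameter when it is set to auto (Address first, then Phone, then Email) can be sketched as follows. The function name `resolve_centric_hint` and the `None` result when no input is available are assumptions of this example; the actual resolution is performed by the Melissa service.

```python
def resolve_centric_hint(centric_hint="auto", address=None, phone=None, email=None):
    """Return the point of reference that an 'auto' hint would select."""
    hint = (centric_hint or "auto").lower()
    if hint != "auto":
        return hint  # explicit hint: address, phone or email
    if address:
        return "address"
    if phone:
        return "phone"
    if email:
        return "email"
    return None  # nothing to use as a point of reference


if __name__ == "__main__":
    print(resolve_centric_hint("auto", phone="555-0100", email="jane@example.com"))
```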
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
AddressLine1 | No | String | The input field for the address line 1. This should contain the delivery address information (house number, street, building, suite, etc.) but should not contain locality information (city, state, postal code, etc.) which have their own inputs. |
AddressLine2 | No | String | The input field for the address line 2. This can be a continuation of AddressLine1 (ex: suite) or another address. |
City | No | String | The city. |
CompanyName | No | String | The company name. |
Country | No | String | The country. |
Email | No | String | The email address. |
FirstName | No | String | The given (first) name. |
FreeForm | No | String | Single line contact information. Address, phone, email could be all in a single field and they will be parsed out. Please don’t map any other fields if using FreeForm. |
FullName | No | String | This field can contain a full name. The API will parse and check Names only if the First Name and Last Name fields are left blank. |
LastLine | No | String | The city, state, and ZIP. |
LastName | No | String | The family (last) name. |
Phone | No | String | The phone number. |
PostalCode | No | String | The postal code. |
State | No | String | The US state. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
AddressDeliveryInstallation | String | Returns the parsed delivery installation for the address entered in the AddressLine field. |
AddressExtras | String | Any extra information that does not fit in the AddressLine fields. |
AddressHouseNumber | String | Returns the parsed house number for the address entered in the AddressLine field. |
AddressKey | String | Returns a unique identifier for an address. This key can be used with other current and future Melissa services. |
AddressLine1 | String | These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code. |
AddressLine2 | String | These are the string values that will return the standardized or corrected contents of the input address. These lines will include the entire address including the locality, administrative area, and postal code. |
AddressLockBox | String | Returns the parsed lock box number for the address entered in the AddressLine field. |
AddressPostDirection | String | Returns the parsed post-direction for the address entered in the AddressLine field. |
AddressPreDirection | String | Returns the parsed pre-direction for the address entered in the AddressLine field. |
AddressPrivateMailboxName | String | Returns the parsed private mailbox name for the address entered in the AddressLine field. |
AddressPrivateMailboxRange | String | Returns the parsed private mailbox range for the address entered in the AddressLine field. |
AddressRouteService | String | Returns the parsed route service number for the address entered in the AddressLine field. |
AddressStreetName | String | Returns the parsed street name for the address entered in the AddressLine field. |
AddressStreetSuffix | String | Returns the parsed street suffix for the address entered in the AddressLine field. |
AddressSuiteName | String | Returns the parsed suite name for the address entered in the AddressLine field. |
AddressSuiteNumber | String | Returns the parsed suite number for the address entered in the AddressLine field. |
AddressTypeCode | String | Returns a code for the address type in the AddressLine field. |
CBSACode | String | Census Bureau’s Core Based Statistical Area (CBSA). Returns the 5-digit code for the CBSA associated with the requested record. |
CBSADivisionCode | String | Returns the code for a division associated with the requested record, if any. |
CBSADivisionLevel | String | Returns whether the CBSA division, if any, is metropolitan or micropolitan. |
CBSADivisionTitle | String | Returns the title for the CBSA division, if any. |
CBSALevel | String | Returns whether the CBSA is metropolitan or micropolitan. |
CBSATitle | String | Returns the title for the CBSA. |
CarrierRoute | String | Returns a 4-character code defining the carrier route for this record. |
CensusBlock | String | Returns a 4-digit string containing the census block number associated with the requested record. |
CensusTract | String | Returns a 4- to 6-digit string containing the census tract number associated with the requested record. |
City | String | Returns the city entered in the City field. |
CityAbbreviation | String | Returns an abbreviation for the city entered in the City field, if any. |
CompanyName | String | Returns the company name. |
CongressionalDistrict | String | Returns the 2-digit congressional district that belongs to the requested record. |
CountryCode | String | Returns the country code for the country in the Country field. |
CountryName | String | Returns the country name for the record. |
DeliveryIndicator | String | Returns an indicator of whether an address is a business address or residential address. |
DeliveryPointCheckDigit | String | Returns a string value containing the 1-digit delivery point check digit. |
DeliveryPointCode | String | Returns a string value containing the 2-digit delivery point code. |
EmailAddress | String | Returns the email address entered in the Email field. |
EmailDomainName | String | Returns the parsed domain name for the email entered in the Email field. |
EmailMailboxName | String | Returns the parsed mailbox name for the email entered in the Email field. |
EmailTopLevelDomain | String | Returns the parsed top-level domain name for the email entered in the Email field. |
FormattedAddress | String | Mailing address. The full mailing address in the preferred format for the country of the address. This includes the Organization as the first line, one or more lines in the origin country’s format, and the destination country (if required). Separate lines will be delimited by what is specified in the option. |
Gender | String | Returns a gender for the name in the FullName field. |
Gender2 | String | Only used if 2 names are in the FullName field. Returns a gender for the second name in the FullName field. |
Latitude | String | Returns the geocoded latitude for the address entered in the AddressLine field. |
Longitude | String | Returns the geocoded longitude for the address entered in the AddressLine field. |
NameFirst | String | Returns the first name in the FullName field. |
NameFirst2 | String | Only used if 2 names are in the FullName field. Returns the second name in the FullName field. |
NameFull | String | Returns the full name for the record. |
NameLast | String | Returns the last name in the FullName field. |
NameLast2 | String | Only used if 2 names are in the FullName field. Returns a last name for the second name in the FullName field. |
NameMiddle | String | Returns a middle name for the name in the FullName field. |
NameMiddle2 | String | Only used if 2 names are in the FullName field. Returns a middle name for the second name in the FullName field. |
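NamePrefix | String | Returns a prefix for the name in the FullName field. |
NamePrefix2 | String | Only used if 2 names are in the FullName field. Returns a prefix for the second name in the FullName field. |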
NameSuffix | String | Returns a suffix for the name in the FullName field. |
NameSuffix2 | String | Only used if 2 names are in the FullName field. Returns a suffix for the second name in the FullName field. |
PhoneAreaCode | String | Returns the parsed area code for the phone number entered in the Phone field. |
PhoneExtension | String | Returns the parsed extension for the phone number entered in the Phone field. |
PhoneNewAreaCode | String | Returns the parsed new area code for the phone number entered in the Phone field. |
PhoneNumber | String | Returns the standardized phone number for the record. |
PhonePrefix | String | Returns the parsed prefix for the phone number entered in the Phone field. |
PhoneSuffix | String | Returns the parsed suffix for the phone number entered in the Phone field. |
PlaceCode | String | When ZIP codes overlap, the City field will always return the city that covers most of the ZIP area. If the address is located outside of that city but within the ZIP Code, Place Code will refer to that area. |
PlaceName | String | When ZIP codes overlap, the City field will always return the city that covers most of the ZIP area. If the address is located outside of that city but within the ZIP Code, Place Name will refer to that area. |
PostalCode | String | Returns the 9-digit postal code for U.S. addresses and 6-digit postal code for Canadian addresses. |
Results | String | String value containing a comma-separated list of status, error codes, and change codes for the record. Refer to the Melissa documentation for more details. |
Salutation | String | Returns a salutation for the name in the FullName field. |
State | String | Returns the state for the record. |
StateName | String | Returns the full name of the state entered in the State field. |
TransmissionResults | String | This is a string value that lists error codes from any errors caused by the most recent request as a whole. |
UTC | String | Returns the time zone of the requested record. All Melissa products express time zones in UTC (Coordinated Universal Time). |
UrbanizationName | String | Returns the urbanization name for the address entered in the AddressLine field. Usually only used if the address is in Puerto Rico. |
Google Maps Plug-in
The Google Maps Plug-in for Semarchy xDM provides an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal addresses with geocoding information.
Google Maps Enricher
Plug-in ID
Google Maps Enricher - com.semarchy.integration.rowTransformers.googleMapsEnricher
Description
This enricher takes an input address, then enriches and validates this postal address using the Google Geocoding Service, which is called at the following URL:
http://maps.googleapis.com/maps/api/geocode/json?<parameters>
Make sure that this URL is accessible through your firewalls.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Client ID or API Key | No | String | This parameter may contain either an API Key (for Standard API usage) or the Client ID (for Premium Usage), both provided by Google. The Client ID should begin with the |
Channel | No | String | This parameter assigns a specific channel name and allows tracking usage for this plugin in the Google Maps usage reports. |
Default Language | No | String | Code of the default language used for the returned results. For example, for the same address, "Rue Mathieu Misery" would appear in French and "Mathieu Misery Street" in English. This code can be overridden by the Language plug-in input. See the list of supported domain languages for more information. |
Private Key | No | String | Cryptographic signature key provided by Google with the Client ID. |
Request per Second | No | Integer | This parameter limits the number of requests per second made by the enricher to remain within the limits of the API. It defaults to 50 requests per second. |
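The effect of a requests-per-second limit can be illustrated with a simple throttle that spaces calls by a minimum interval. This is a generic sketch and not the enricher's implementation; the class name `Throttle` and the injectable `clock` and `sleep` callables (used to make the example testable) are assumptions.

```python
import time


class Throttle:
    """Space successive calls so that at most N of them occur per second."""

    def __init__(self, requests_per_second=50, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time of the next request

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: pause until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


if __name__ == "__main__":
    throttle = Throttle(requests_per_second=50)
    start = time.monotonic()
    for _ in range(5):
        throttle.wait()  # a request to the service would follow here
    print(f"5 calls spaced over {time.monotonic() - start:.3f}s")
```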
You can use the Google Maps service with one of the following authentication methods:
- With an API Key, with the Standard Usage Limits and a pay-as-you-go plan above the limits.
- With a Client ID and a Signature (private key) with a Google Maps Premium Plan.
Keyless access to this API is not supported by Google.
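For the API Key method, a request to the geocoding URL given in the description can be sketched as follows. The `address`, `key`, and `language` query parameters are those of the Google Geocoding API; how the enricher actually combines its inputs into the `address` value is not documented here, so the comma-joined form below is an assumption, as is the function name `build_geocode_url`. The Client ID and signature method is not shown.

```python
from urllib.parse import urlencode

# Base URL as given in the plug-in description.
GEOCODE_URL = "http://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address_line, api_key, postal_code=None, city=None,
                      country=None, language=None):
    """Build a geocoding request URL authenticated with an API Key."""
    # Assumption: the inputs are combined into a single comma-separated address.
    parts = [p for p in (address_line, postal_code, city, country) if p]
    params = {"address": ", ".join(parts), "key": api_key}
    if language:
        params["language"] = language
    return GEOCODE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_geocode_url("1 Main St", "YOUR_API_KEY", city="Paris", language="fr"))
```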
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
Language | No | String | Code of the language for the returned result for this record. This language overrides the Default Language parameter. See the list of supported domain languages for more information. |
Address.City || ' ' || Address.State
Plug-in Outputs
The following table lists the plug-in outputs. Outputs marked with an * appear in a Full and a Short form in the output list.
Output Name | Type | Description |
---|---|---|
Address Types | String | Comma-separated list of address types (see Address Types for more information). |
Administrative Area Level 1* | String | First-order civil entity below the country level. Within the United States, these administrative levels are states. Not all countries exhibit these administrative levels. |
Administrative Area Level 2* | String | Second-order civil entity below the country level. Within the United States, these administrative levels are counties. Not all countries exhibit these administrative levels. |
Administrative Area Level 3* | String | Third-order civil entity below the country level. Not all countries exhibit these administrative levels. |
Airport | String | Indicates an airport. NOTE: This output is deprecated. |
Country* | String | The national political entity. |
East Bound Longitude | String | Bounding box eastern limit. |
Floor* | String | Indicates the floor of a building address. |
Formatted Address | String | Human-readable version of the geocoded address. |
Intersection | String | Major intersection, usually of two major roads. NOTE: This output is deprecated. |
Latitude | String | Latitude of the address. |
Locality* | String | Incorporated city or town political entity. |
Longitude | String | Longitude of the address. |
Natural Feature* | String | Prominent natural feature. |
Neighborhood* | String | Named neighborhood. |
North Bound Latitude | String | Bounding box northern limit. |
Park* | String | Named park. |
Point of Interest* | String | Named point of interest. |
Post Box* | String | Specific postal box. |
Postal Code* | String | Postal code as used to address postal mail within the country. |
Premise* | String | Named location, usually a building or collection of buildings with a common name. |
Quality | String | Granularity of the location described by the address, expressed as a value between 0 and 100 (100 being the best quality). |
Room* | String | The room of a building address. |
Route* | String | Named route (such as |
South Bound Latitude | String | Bounding box southern limit. |
Status | String | Status of the request. |
Street Address | String | Precise street address. NOTE: This output is deprecated. |
Street Number* | String | Precise street number. |
Sub-Locality* | String | First-order civil entity below a locality. |
Sub-Premise* | String | First-order entity below a named location, usually a singular building within a collection of buildings with a common name. |
West Bound Longitude | String | Bounding box western limit. |
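All the outputs listed above are returned as strings, including the coordinates and the bounding box limits. The following sketch shows one way a consumer of these values could convert the four bounding box outputs to numbers and test whether a point falls inside the box. The property names (`northBoundLatitude`, and so on) and the sample values are hypothetical; they are not identifiers defined by the plug-in.

```javascript
// Converts the four bounding box outputs (strings) to numbers.
// Returns null when any of them is missing or not numeric.
function parseBounds(outputs) {
  const bounds = {
    north: Number.parseFloat(outputs.northBoundLatitude),
    south: Number.parseFloat(outputs.southBoundLatitude),
    east: Number.parseFloat(outputs.eastBoundLongitude),
    west: Number.parseFloat(outputs.westBoundLongitude),
  };
  return Object.values(bounds).some(Number.isNaN) ? null : bounds;
}

// Tests whether a latitude/longitude pair lies inside the bounding box.
function boundsContain(bounds, lat, lng) {
  const inLat = lat >= bounds.south && lat <= bounds.north;
  // A box that crosses the antimeridian has its west limit > east limit.
  const inLng = bounds.west <= bounds.east
    ? lng >= bounds.west && lng <= bounds.east
    : lng >= bounds.west || lng <= bounds.east;
  return inLat && inLng;
}

// Hypothetical sample values, for illustration only.
const sampleOutputs = {
  northBoundLatitude: "47.2200",
  southBoundLatitude: "47.2100",
  eastBoundLongitude: "-1.5500",
  westBoundLongitude: "-1.5700",
};
const sampleBounds = parseBounds(sampleOutputs);
console.log(boundsContain(sampleBounds, 47.215, -1.56)); // true
console.log(boundsContain(sampleBounds, 48.0, -1.56));   // false
```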
Embedding a Google Map in a Form
The Google Geocoding service data must be used to display maps rendered with the Google Maps service.
You can display such a map in Semarchy xDM in a form, by embedding generated HTML and JavaScript.
- Create a new form field with the SemQL expression given below.
- In the SemQL expression, modify the following line to concatenate your address information:
var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
- If you are a Google Maps API for Work customer, modify the URL to the Google Maps service in the code to include your Google Client ID. Note that the embedded map stops working after you add the client ID until you register the authorized URLs with Google, following the instructions given on the Google Maps API for Work site:
<script src="https://maps.googleapis.com/maps/api/js?client=YOUR_CLIENT_ID&v=3.20&sensor=false"></script>
- Edit the field:
- In the Display Properties, set the Component Type to Object, and in Data, set the Source Type to Content.
This configuration tells Semarchy xDM to interpret this code as HTML and JavaScript in the browser.
'<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://maps.googleapis.com/maps/api/js?sensor=false"></script>
  <script>
    /* Modify the line below */
    var address= "' || AddressLine || ' ' || PostalCode || ' ' || City || '";
    var zoom = 18;
    var mapType = google.maps.MapTypeId.ROADMAP;
    var useMarker = true;
    var map;
    function initialize() {
      var geocoder = new google.maps.Geocoder();
      geocoder.geocode( { "address": address}, function(results, status) {
        if (status == google.maps.GeocoderStatus.OK) { displayMap(results[0].geometry.location); }
      });
      window.onresize = resize;
    }
    function displayMap(latlng) {
      var mapOptions = { zoom: zoom, center: latlng, mapTypeId: mapType };
      map = new google.maps.Map(document.getElementById("map_canvas"), mapOptions);
      if (useMarker) {
        var marker = new google.maps.Marker({ map: map, position: latlng});
      }
      resize("");
    }
    function resize(e) {
      var center = map.getCenter();
      map.getDiv().style.height = window.innerHeight +"px";
      map.getDiv().style.width = window.innerWidth +"px";
      google.maps.event.trigger(map, ''resize'');
      map.setCenter(center);
    }
    google.maps.event.addDomListener(window, "load", initialize);
  </script>
</head>
<body style="margin:0px;">
  <div id="map_canvas" style="margin:0px;"></div>
</body>
</html>'
OpenStreetMap Plug-in
The OpenStreetMap Plug-in for Semarchy xDM uses the OpenStreetMap API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address.
OpenStreetMap Enricher
Plug-in ID
OpenStreetMap Enricher - com.semarchy.engine.plugins.openstreetmap
Description
This enricher takes an input postal address, then enriches and validates it using the OpenStreetMap service.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
OpenStreetMap URL | Yes | String | URL used to query OpenStreetMap API.
Typically |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. If the address is composed of multiple lines, then these lines must be provided as a comma-separated list of address lines. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Address | String | Complete address of the location. |
City | String | City of the location. |
Country | String | Country of the location. |
Country Code | String | Country code of the location. |
County | String | County of the location. |
Latitude | String | Latitude of the location. |
Longitude | String | Longitude of the location. |
Postal Code | String | Postal code of the location. |
Process Code | String | Code that indicates the result status of the address processing. |
State | String | State of the location. |
Street Number | String | Street number of the location. |
Street Name | String | Street name of the location. |
Microsoft Bing Maps Plug-in
The Microsoft Bing Maps Plug-in for Semarchy xDM uses the Bing Location API to provide an enricher for international postal addresses. This enricher cleanses, standardizes and enriches the postal address with geocoding information.
Bing Maps Enricher
Plug-in ID
Bing Maps Enricher - com.semarchy.engine.plugins.bing.address
Description
This enricher takes an input postal address, then enriches and validates it using the Bing Maps service.
Plug-in Parameters
The following table lists the plug-in parameters.
Parameter Name | Mandatory | Type | Description |
---|---|---|---|
Bing Maps Key | Yes | String | To use the Bing Maps Services, you must have a Bing Maps Key. |
Bing Location URL | Yes | String | This URL will be used to query Bing Location API. |
Plug-in Inputs
The following table lists the plug-in inputs.
Input Name | Mandatory | Type | Description |
---|---|---|---|
Address Line | Yes | String | Address line to process. |
Postal Code | No | String | Postal code of the address. |
City | No | String | City of the address. |
Country | No | String | Country of the address. |
Plug-in Outputs
The following table lists the plug-in outputs.
Output Name | Type | Description |
---|---|---|
Administrative District | String | The subdivision name within the country or region for an address, such as the abbreviation of a US state. |
Administrative District 2 | String | The subdivision name within the administrative district for an address. |
Confidence | String | Defines the confidence of the location match found by the geocoding service. Possible values: High, Medium, Low. |
Country or Region | String | The country or region name of the address. |
Formatted Address | String | A string specifying the complete address. This address may not include the country or region. |
Status Code | String | The HTTP Status code for the request. |
Status Description | String | A description of the HTTP status code. |
Latitude | String | Latitude of the location. |
Locality | String | The locality, such as the primary city, that corresponds to an address. |
Longitude | String | Longitude of the address. |
Match Code | String | Defines the geocoding level of the location match found by the geocoder. One or more of the following values: Good, Ambiguous, UpHierarchy. |
Postal Code | String | The city or neighborhood that corresponds to the postal code. |
Process Code | String | Code that indicates the result status of the process. |
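The Status Code, Confidence and Match Code outputs can be combined to decide whether an enriched address is trustworthy enough to use. The following is a minimal sketch of such a check, not a rule prescribed by the plug-in. The property names are hypothetical, the acceptance policy (HTTP 200, High confidence, a Good match that is not Ambiguous) is only one possible choice, and it is an assumption that multiple match codes arrive as a comma-separated string.

```javascript
// Hypothetical acceptance rule for a Bing Maps enrichment result.
// `outputs` holds the plug-in outputs as strings.
function isReliableMatch(outputs) {
  // Assumption: several match codes are delivered comma-separated.
  const matchCodes = (outputs.matchCode || "")
    .split(",")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);

  return (
    outputs.statusCode === "200" &&
    outputs.confidence === "High" &&
    matchCodes.includes("Good") &&
    !matchCodes.includes("Ambiguous")
  );
}

console.log(isReliableMatch({ statusCode: "200", confidence: "High", matchCode: "Good" })); // true
console.log(isReliableMatch({ statusCode: "200", confidence: "Medium", matchCode: "Good" })); // false
```

A stricter or looser policy may suit a given project; for example, UpHierarchy matches could be accepted when only the city or region is needed.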
Appendices
Appendix A: Semarchy Email Enricher Domain Name Cache
The Semarchy Email Enricher uses a local cache to avoid repeating MX record lookups to check the validity of an email domain.
This domain name cache takes priority: if a record is found in the cache, the enricher uses the information available locally and does not issue an MX record lookup.
The plug-in stores the cache in a table named EXT_EMAIL_DOMAINS. This table is created at the first run of the enricher, by default in the data location served by the enricher.
You can specify a different datasource location to store this table in the Datasource enricher parameter.
Domain Name Cache Table Structure
The structure of the EXT_EMAIL_DOMAINS table is the following:
Column Name | Description |
---|---|
HOST_NAME | Domain name, for example "gmail.com". |
PREFIX | First two letters of the domain name, for example "gm". |
SUFFIX | Last two letters of the domain name, for example "om". |
HIT_COUNT | Number of times this host name was processed by the enricher. This value is automatically incremented by the enricher. |
SEED_DATA | Indicates whether this record was part of the seeded data, or created by the enricher. The value is |
VALID | Indicates whether the domain name is valid. |
SUGGESTION | Latest correction found for an invalid domain. |
FIRST_INVALID_DATE, LAST_INVALID_DATE | Additional date information used to reconsider a domain validity after a certain period of time. |
Fixing Domain Names
The enricher automatically fixes invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:
- The Edit Distance between the invalid domain and cached domain.
- The hit count of the cached domain.
A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
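The built-in algorithm is not detailed in this guide. The following sketch only illustrates the two criteria named above, using the classic Levenshtein edit distance and a simple rule: prefer the closest valid cached domain and break ties with the highest hit count. The function names, the `maxDistance` threshold, the record shape and the sample cache content are all assumptions made for illustration.

```javascript
// Levenshtein edit distance between two strings (insert, delete, substitute).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical selection rule: closest valid domain within `maxDistance`,
// ties resolved in favor of the most frequently processed domain.
function suggestFix(invalidDomain, cache, maxDistance = 2) {
  let best = null;
  for (const entry of cache) {
    if (!entry.valid) continue;
    const distance = editDistance(invalidDomain, entry.hostName);
    if (distance > maxDistance) continue;
    const isBetter =
      best === null ||
      distance < best.distance ||
      (distance === best.distance && entry.hitCount > best.hitCount);
    if (isBetter) best = { hostName: entry.hostName, distance, hitCount: entry.hitCount };
  }
  return best ? best.hostName : null;
}

// Hypothetical cache content.
const domainCache = [
  { hostName: "gmail.com", hitCount: 950, valid: true },
  { hostName: "gmx.com", hitCount: 40, valid: true },
  { hostName: "ymail.com", hitCount: 12, valid: true },
];
console.log(suggestFix("gmial.com", domainCache)); // gmail.com
```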
Adding Records to the Cache
It is possible to force the creation of new records in the cache, for example to create new fix suggestions.
To manually insert a domain correction <host_name_replacement> for an invalid domain <invalid_host_name>, use the following sample query:
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
<invalid_host_name>,
SUBSTR(<invalid_host_name>, 1, 2),
SUBSTR(<invalid_host_name>, -2, 2),
0,
'1',
'0',
<host_name_replacement>,
CURRENT_TIMESTAMP,
CURRENT_TIMESTAMP
);
Cache Refresh
The Email enricher refreshes local cache records after 3 months. This duration is not configurable. The cache stores date information for each record, and the enricher issues a new MX record lookup to refresh a record once this period has elapsed.
If there is good evidence that the cache is wrong about a domain’s validity, or if business users are certain they want to override the cache’s decision, the developer can manually set the Valid flag to 0 or 1. To prevent the cache from overriding this manual change, it is also important to set the date field to NULL so that the email enricher does not refresh the cache for that domain.
It is safe for developers to periodically clear the cache table if they want the cache to refresh its results sooner than the 3-month period after which the enricher automatically refreshes it. Developers can either drop the table entirely, or delete only the records they do not want and keep the seeded data as well as any other crucial domains they have manually overridden.
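The refresh rule described in this section can be summarized as a small decision function. This is only a sketch of the documented behavior under stated assumptions: the exact date column used and the way the enricher measures the 3 months are not specified in this guide, so the calendar-month arithmetic and the name `isDueForRefresh` are hypothetical.

```javascript
const REFRESH_MONTHS = 3; // fixed duration stated in this guide

// Returns true when a cache record should be refreshed with a new lookup.
// `recordDate` is the date stored for the record, or null when the date
// field was set to NULL to protect a manual override.
function isDueForRefresh(recordDate, now) {
  if (recordDate === null) return false; // manual override: never refreshed
  const dueDate = new Date(recordDate.getTime());
  dueDate.setUTCMonth(dueDate.getUTCMonth() + REFRESH_MONTHS);
  return now.getTime() >= dueDate.getTime();
}

const recorded = new Date(Date.UTC(2024, 0, 15)); // 15 January 2024
console.log(isDueForRefresh(recorded, new Date(Date.UTC(2024, 3, 14)))); // false
console.log(isDueForRefresh(recorded, new Date(Date.UTC(2024, 3, 15)))); // true
console.log(isDueForRefresh(null, new Date(Date.UTC(2030, 0, 1))));      // false
```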