Email enricher domain name cache

The Email enricher uses a local cache to prevent redundant MX record lookups when checking the validity of an email domain.
The domain name cache takes precedence; if a record exists in the cache, the enricher will use the locally available information, thus bypassing the need for an MX record lookup.

The plugin saves the cache in a table named EXT_EMAIL_DOMAINS. This table is created during the first execution of the enricher and is stored by default in the data location served by the enricher. You can designate a specific datasource location to store this table by configuring the enricher’s Datasource parameter.

Domain name cache table structure

The structure of the EXT_EMAIL_DOMAINS table is the following:

Column name Description

HOST_NAME

Domain name (e.g., gmail.com).

PREFIX

First two letters of the domain name (e.g., gm).

SUFFIX

Last two letters of the domain name (e.g., om).

HIT_COUNT

Number of times this hostname was processed by the enricher. This value is automatically incremented by the enricher.

SEED_DATA

Indicates whether this record was part of the seeded data or created by the enricher. The value is 1 for seeded data and 0 otherwise.

VALID

Indicates whether the domain name is valid (1) or invalid (0). The value is N/A if the validity is unknown (e.g., when a new domain is added to the cache in offline mode).

SUGGESTION

Latest correction found for an invalid domain.

FIRST_INVALID_DATE
LAST_INVALID_DATE
LAST_VALID_DATE

Additional date information used to reconsider a domain validity after a certain period.

Correcting domain names

The enricher automatically rectifies invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:

  • The edit distance between the invalid domain and cached domain.

  • The hit count of the cached domain.

A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.

Adding records to the cache

It is possible to force the creation of new records in the cache (e.g., to create new fix suggestions).

Inserting valid domains

To manually insert valid domain records (<valid_host_name>) into the cache, use the appropriate query sample provided below based on your database.

For Oracle
INSERT INTO EXT_EMAIL_DOMAINS (
    HOST_NAME,
    PREFIX,
    SUFFIX,
    HIT_COUNT,
    SEED_DATA,
    VALID
    )
VALUES (
    '<valid_host_name>',
    SUBSTR('<valid_host_name>', 0, 2),
    SUBSTR('<valid_host_name>', -2, 2),
    0,
    '1',
    '1',
    );
For PostgreSQL
INSERT INTO EXT_EMAIL_DOMAINS (
    HOST_NAME,
    PREFIX,
    SUFFIX,
    HIT_COUNT,
    SEED_DATA,
    VALID
    )
VALUES (
    '<valid_host_name>',
    LEFT('<valid_host_name>', 2),
    RIGHT('<valid_host_name>', 2),
    0,
    '1',
    '1'
    );
For SQL Server
INSERT INTO EXT_EMAIL_DOMAINS (
    HOST_NAME,
    PREFIX,
    SUFFIX,
    HIT_COUNT,
    SEED_DATA,
    VALID
    )
VALUES (
    '<valid_host_name>',
    LEFT('<valid_host_name>', 2),
    RIGHT('<valid_host_name>', 2),
    0,
    '1',
    '1',
    );

Inserting invalid domains

To manually insert a domain correction (<host_name_replacement>) for an invalid domain (<invalid_host_name>), use the appropriate query sample provided below based on your database.

For Oracle
INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	SUBSTR('<invalid_host_name>', 0, 2),
	SUBSTR('<invalid_host_name>', -2, 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	CURRENT_TIMESTAMP,
	CURRENT_TIMESTAMP
	);
For PostgreSQL
INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	LEFT('<invalid_host_name>', 2),
	RIGHT('<invalid_host_name>', 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	NOW(),
	NOW()
	);
For SQL Server
INSERT INTO EXT_EMAIL_DOMAINS (
	HOST_NAME,
	PREFIX,
	SUFFIX,
	HIT_COUNT,
	SEED_DATA,
	VALID,
	SUGGESTION,
	FIRST_INVALID_DATE,
	LAST_INVALID_DATE
	)
VALUES (
	'<invalid_host_name>',
	LEFT('<invalid_host_name>', 2),
	RIGHT('<invalid_host_name>', 2),
	0,
	'1',
	'0',
	'<host_name_replacement>',
	GETDATE(),
	GETDATE()
	);
In online mode, MX record lookups are resolved by the DNS and flag any valid domain (such as gail.com) as valid. Consequently, the Email enricher does not apply user-created suggestions to replace such domains. Suggestions are only applied to syntactically invalid domains in offline mode or non-existing domains in online mode.

Cache refresh

The Email enricher refreshes the local cache records every three months. This duration is not configurable. The cache records the date information and makes a new call to the MX server to refresh.

Developers can safely truncate the cache table periodically if they want the cache to refresh sooner than the default three-month interval. They can either drop the table entirely or delete unwanted values while retaining the seeded data and any critical domains that have been manually overridden. This approach ensures important information is preserved.

If there is evidence that the cache is incorrect about a domain’s validity, or if business users need to override the cache’s decision, developers can manually set the Valid flag to 0 or 1 and set the date field to NULL to prevent the enricher from refreshing that domain’s cache entry.

To completely control the cache refresh process and prevent the email enricher from automatically refreshing cache entries after three months, use the script below.

UPDATE EXT_EMAIL_DOMAINS
SET
    FIRST_INVALID_DATE = NULL,
    LAST_INVALID_DATE = NULL,
    LAST_VALID_DATE = NULL
WHERE
    HOST_NAME = '<domain>';

Optimizing performance

To optimize the performance of the email enricher, especially during an initial load with a large number of domains, consider the following recommendations and best practices:

  • Pre-seed the cache table with known valid domains using batch insertions to reduce the overhead of multiple database transactions.

  • Utilize parallel processing or multi-threading to perform MX lookups concurrently.

  • Adopt an incremental loading approach for large datasets.