Email enricher domain name cache
The Email enricher uses a local cache to prevent redundant MX record lookups when checking the validity of an email domain.
The domain name cache takes precedence; if a record exists in the cache, the enricher will use the locally available information, thus bypassing the need for an MX record lookup.
The plugin saves the cache in a table named EXT_EMAIL_DOMAINS
. This table is created during the first execution of the enricher and is stored by default in the data location served by the enricher.
You can designate a specific datasource location to store this table by configuring the enricher’s Datasource
parameter.
Domain name cache table structure
The structure of the EXT_EMAIL_DOMAINS
table is the following:
Column name | Description |
---|---|
|
Domain name (e.g., |
|
First two letters of the domain name (e.g., |
|
Last two letters of the domain name (e.g., |
|
Number of times this hostname was processed by the enricher. This value is automatically incremented by the enricher. |
|
Indicates whether this record was part of the seeded data or created by the enricher. The value is |
|
Indicates whether the domain name is valid ( |
|
Latest correction found for an invalid domain. |
|
Additional date information used to reconsider a domain validity after a certain period. |
Correcting domain names
The enricher automatically rectifies invalid domain names by finding the closest domain name in the cache using a built-in algorithm based on:
-
The edit distance between the invalid domain and cached domain.
-
The hit count of the cached domain.
A cached domain that is very similar to an invalid domain name and that is frequently processed by the enricher is more likely to be used as a fix for the invalid domain.
Adding records to the cache
It is possible to force the creation of new records in the cache (e.g., to create new fix suggestions).
Inserting valid domains
To manually insert valid domain records (<valid_host_name>
) into the cache, use the appropriate query sample provided below based on your database.
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID
)
VALUES (
'<valid_host_name>',
SUBSTR('<valid_host_name>', 0, 2),
SUBSTR('<valid_host_name>', -2, 2),
0,
'1',
'1',
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID
)
VALUES (
'<valid_host_name>',
LEFT('<valid_host_name>', 2),
RIGHT('<valid_host_name>', 2),
0,
'1',
'1'
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID
)
VALUES (
'<valid_host_name>',
LEFT('<valid_host_name>', 2),
RIGHT('<valid_host_name>', 2),
0,
'1',
'1',
);
Inserting invalid domains
To manually insert a domain correction (<host_name_replacement>
) for an invalid domain (<invalid_host_name>
), use the appropriate query sample provided below based on your database.
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
SUBSTR('<invalid_host_name>', 0, 2),
SUBSTR('<invalid_host_name>', -2, 2),
0,
'1',
'0',
'<host_name_replacement>',
CURRENT_TIMESTAMP,
CURRENT_TIMESTAMP
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
LEFT('<invalid_host_name>', 2),
RIGHT('<invalid_host_name>', 2),
0,
'1',
'0',
'<host_name_replacement>',
NOW(),
NOW()
);
INSERT INTO EXT_EMAIL_DOMAINS (
HOST_NAME,
PREFIX,
SUFFIX,
HIT_COUNT,
SEED_DATA,
VALID,
SUGGESTION,
FIRST_INVALID_DATE,
LAST_INVALID_DATE
)
VALUES (
'<invalid_host_name>',
LEFT('<invalid_host_name>', 2),
RIGHT('<invalid_host_name>', 2),
0,
'1',
'0',
'<host_name_replacement>',
GETDATE(),
GETDATE()
);
In online mode, MX record lookups are resolved by the DNS and flag any valid domain (such as gail.com) as valid. Consequently, the Email enricher does not apply user-created suggestions to replace such domains. Suggestions are only applied to syntactically invalid domains in offline mode or non-existing domains in online mode. |
Cache refresh
The Email enricher refreshes the local cache records every three months. This duration is not configurable. The cache records the date information and makes a new call to the MX server to refresh.
Developers can safely truncate the cache table periodically if they want the cache to refresh sooner than the default three-month interval. They can either drop the table entirely or delete unwanted values while retaining the seeded data and any critical domains that have been manually overridden. This approach ensures important information is preserved.
If there is evidence that the cache is incorrect about a domain’s validity, or if business users need to override the cache’s decision, developers can manually set the Valid
flag to 0 or 1 and set the date field to NULL
to prevent the enricher from refreshing that domain’s cache entry.
To completely control the cache refresh process and prevent the email enricher from automatically refreshing cache entries after three months, use the script below.
UPDATE EXT_EMAIL_DOMAINS
SET
FIRST_INVALID_DATE = NULL,
LAST_INVALID_DATE = NULL,
LAST_VALID_DATE = NULL
WHERE
HOST_NAME = '<domain>';
Optimizing performance
To optimize the performance of the email enricher, especially during an initial load with a large number of domains, consider the following recommendations and best practices:
-
Pre-seed the cache table with known valid domains using batch insertions to reduce the overhead of multiple database transactions.
-
Utilize parallel processing or multi-threading to perform MX lookups concurrently.
-
Adopt an incremental loading approach for large datasets.