Stateful ingestion

Semarchy xDG harvesting supports stateful ingestion: each run saves a checkpoint, and subsequent runs use the saved checkpoints to make harvesting decisions.

This feature applies only to specific sources. The documentation page for each source indicates whether stateful ingestion is supported.

Sample recipe

The following sample recipe configures stateful ingestion.

Example 1. Stateful ingestion sample recipe.
source:
    type: postgres # A source that supports stateful ingestion.
    config:
        # Connection parameters for the source
        # ...
        # Stateful ingestion configuration
        stateful_ingestion:
            enabled: True # False by default
            remove_stale_metadata: True # True by default

sink:
    # Sink configuration

Configure stateful ingestion

The following source parameters configure stateful ingestion:

stateful_ingestion.enabled

Set to True to enable stateful ingestion. Defaults to False.

stateful_ingestion.remove_stale_metadata

Set to True to soft-delete entities that were present in the last successful run but are missing from the current run. This parameter takes effect only when stateful ingestion is enabled. Defaults to True.
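
As an illustration of how the two parameters combine, the following sketch enables stateful ingestion but turns off soft-deletion, so that entities missing from the current run are left untouched. It reuses the postgres source from the sample recipe and assumes that the two parameters can be set independently; the connection parameters and sink configuration are elided as in the sample.

```yaml
source:
    type: postgres # Same source type as in the sample recipe
    config:
        # Connection parameters for the source
        # ...
        stateful_ingestion:
            enabled: True # Save checkpoints between runs
            remove_stale_metadata: False # Keep entities that are missing from the current run

sink:
    # Sink configuration
```

Leaving remove_stale_metadata at its default of True instead gives the behavior shown in the sample recipe, where entities absent from the current run are soft-deleted.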