Harvesting
Harvesting is the mechanism Semarchy xDG uses to collect metadata from sources and publish it as assets to Semarchy xDG.
Overview
Harvesting is performed by the harvesting client, which is provided as a Docker image. The harvesting client runs recipe files that contain the configuration used to collect metadata and metrics and push them to Semarchy xDG.
Recipe files
Recipes are YAML files that contain:
- The source configuration used to collect metadata. This configuration differs depending on your source system. Refer to Sources for the configuration details of each supported source technology.
- The sink configuration used to publish this metadata as assets. Refer to Sinks for the configuration details of each type of sink.
- Optionally, transformers that transform this metadata before it is published. Refer to Transformers for more information.
The following recipe harvests metadata from a PostgreSQL database and sends it to Semarchy xDG.
postgresql.yaml
source: # (1)
  type: postgres
  config:
    host_port: localhost:5432
    database: semarchyDemoDatabase
    username: username
    password: password
sink: # (2)
  type: "datahub-rest"
  config:
    server: "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog" # (3)
    token: "<your-personal-access-token>" # (4)
(1) Source configuration. Set the connection information for your source database in the config element.
(2) Sink configuration. Set where and how the harvested metadata is published.
(3) The server property must point to your Semarchy xDG site.
(4) Create a personal access token and set it in the token property.
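Before running the client, it can help to check that a recipe has the structure described above: a source and a sink, each with a type and a config, plus optional transformers. The following is a minimal, hypothetical sketch in Python, not part of the harvesting client. It assumes the YAML has already been parsed into a dict (for example with PyYAML) and assumes transformers are listed under a transformers key as mappings with a type; the function name validate_recipe is illustrative only.

```python
# Hypothetical pre-flight check for a harvesting recipe (not part of the
# harvesting client). Assumes the recipe YAML was already parsed into a dict.
from typing import Any


def validate_recipe(recipe: dict[str, Any]) -> list[str]:
    """Return the problems found in the recipe; an empty list means it looks complete."""
    problems: list[str] = []
    # The source and sink sections are both required, each with a type and a config.
    for section in ("source", "sink"):
        block = recipe.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing '{section}' section")
            continue
        if not block.get("type"):
            problems.append(f"'{section}' has no 'type'")
        if not isinstance(block.get("config"), dict):
            problems.append(f"'{section}' has no 'config' mapping")
    # Transformers are optional; assumed here to be a list of mappings with a 'type'.
    transformers = recipe.get("transformers", [])
    if not isinstance(transformers, list):
        problems.append("'transformers' must be a list")
    else:
        for i, t in enumerate(transformers):
            if not isinstance(t, dict) or not t.get("type"):
                problems.append(f"transformer #{i} has no 'type'")
    return problems


# The PostgreSQL recipe from this page, as a parsed dict.
recipe = {
    "source": {
        "type": "postgres",
        "config": {
            "host_port": "localhost:5432",
            "database": "semarchyDemoDatabase",
            "username": "username",
            "password": "password",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {
            "server": "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog",
            "token": "<your-personal-access-token>",
        },
    },
}

print(validate_recipe(recipe))  # prints []
print(validate_recipe({"source": {"type": "postgres"}}))
```

The second call reports that the source has no config mapping and that the sink section is missing. The check only covers structure; whether a given source or sink accepts specific config keys is described in Sources and Sinks.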
Run recipes
Once the harvesting client is configured, run the recipe above with the following command:
./xdg-harvest.sh -c postgresql.yaml
You can monitor harvesting runs from the Semarchy xDG user interface.