Sinks

Sinks are the destination of a harvesting process. This destination is typically Semarchy xDG, but you can also use a file or the console as the destination for harvesting.

xDG

This sink pushes metadata to Semarchy xDG using its harvesting API. You need a personal access token to use the API. See "Install and run the harvesting client" to configure this token.

Example 1. xDG sink sample recipe.
source:
  # source configuration

sink:
  type: "datahub-rest"
  config:
    server: "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog"
    token: "<your-personal-access-token>"

Parameters

The following table lists the sink parameters.

| Parameter | Mandatory | Description |
|---|---|---|
| server | Yes | URL of the Semarchy xDG site. |
| token | Yes | Personal access token used for authentication. |
| timeout_sec | No | Timeout in seconds for the HTTP requests made to the API. Defaults to 30 seconds. |
| retry_max_times | No | Maximum number of retries for failed HTTP requests. The delay between requests increases exponentially. Defaults to 1. |
| retry_status_codes | No | Also retry HTTP requests failing with these codes. Defaults to [429, 502, 503, 504]. |
| max_threads | No | Experimental: Number of parallel threads for REST API calls. Defaults to 15. |
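
As an illustration of the optional parameters, the following recipe sketches an xDG sink tuned for a slow or rate-limited connection. Only the parameter names come from the table above; the values are illustrative assumptions, not recommendations.

```yaml
source:
  # source configuration

sink:
  type: "datahub-rest"
  config:
    server: "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog"
    token: "<your-personal-access-token>"
    timeout_sec: 60                           # illustrative value; the default is 30
    retry_max_times: 3                        # illustrative value; the default is 1
    retry_status_codes: [429, 502, 503, 504]  # same as the default list
    max_threads: 5                            # illustrative value; experimental, the default is 15
```

Because the delay between retries increases exponentially, a higher retry_max_times can noticeably lengthen a run against an unavailable server.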

File

This sink writes the metadata events generated by the harvesting process to a file. You can later load the generated file with the File source.

Using this sink, you can decouple metadata extraction from pushing this metadata to Semarchy xDG.

Example 2. File sink sample recipe.
source:
  # source configuration

sink:
  type: "file"
  config:
    filename: ./path/file.json

Parameters

The following table lists the sink parameters.

| Parameter | Mandatory | Description |
|---|---|---|
| filename | Yes | Path to the target file. |
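
To make the decoupling concrete, a second recipe could later read the file written by this sink and push its content to Semarchy xDG. The sketch below is hedged: the source type "file" and its path parameter are assumptions that this section does not confirm, so check the File source documentation for the exact names. The sink part reuses the xDG sink parameters documented on this page.

```yaml
# Second-stage recipe (sketch): load previously extracted metadata and push it to xDG.
source:
  type: "file"                # assumption: type name of the File source
  config:
    path: ./path/file.json    # assumption: parameter name; points to the file written by the File sink

sink:
  type: "datahub-rest"
  config:
    server: "https://<your-tenant-name>.semarchy.net/api/xdg/v1/catalog"
    token: "<your-personal-access-token>"
```

With this split, the extraction recipe can run where the source system is reachable, and the push recipe can run separately where the xDG API is reachable.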

Console

This sink prints the metadata events generated by the harvesting process to the console. You can use this sink for testing and debugging purposes.

Example 3. Console sink sample recipe.
source:
  # source configuration

sink:
  type: "console"