Transformers

Transformers modify and enrich the metadata collected during harvesting. They run between the source and the sink as part of the harvesting process.
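
Because the transformers key is a YAML list, several transformers can presumably be declared in the same harvesting configuration. The following is a minimal sketch under that assumption. It reuses transformer types and URNs from the examples in this section and omits the source and sink definitions.

```yaml
transformers:
  # Assign a technical owner to every harvested dataset
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:john.doe"
      ownership_type: "TECHNICAL_OWNER"
  # Also tag every harvested dataset for review
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:Review"
```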

Manage ownership

Using transformers, you can manage dataset ownership.

Example 1. Add owners to datasets
transformers:
  - type: "simple_add_dataset_ownership" (1)
    config:
      owner_urns: (2)
        - "urn:li:corpuser:john.doe"  # User URN
        - "urn:li:corpGroup:sales"    # Group URN
      ownership_type: "TECHNICAL_OWNER" (3)
1 Transformer type.
2 List of owner URNs to assign to the assets. These URNs may be user or group URNs.
3 Ownership type for these owners.
Example 2. Add owners to datasets by pattern
transformers:
  - type: "pattern_add_dataset_ownership" (1)
    config:
      owner_pattern: (2)
        rules:
          # Assign owners to the assets
          # whose URN contains 'gd'
          ".*gd.*": ["urn:li:corpuser:john.doe", "urn:li:corpuser:jane.doe"]
      ownership_type: "TECHNICAL_OWNER" (3)
1 Transformer type.
2 List of regular expression patterns, each followed by a list of owner URNs. When a dataset URN matches a pattern, the owners listed for that pattern are assigned to that dataset.
3 Ownership type for these owners.

Add tags

Using transformers, you can assign tags to datasets and dataset schema fields.

Example 3. Add tags to datasets
transformers:
  - type: "simple_add_dataset_tags" (1)
    config:
      tag_urns: (2)
        - "urn:li:tag:ToDo"
        - "urn:li:tag:Review"
1 Transformer type.
2 List of tag URNs to assign to the assets.
Example 4. Add tags to datasets by pattern
transformers:
  - type: "pattern_add_dataset_tags" (1)
    config:
      tag_pattern: (2)
        rules:
          # Assign the Temporary and Review tags to the assets
          # whose URN contains 'tmp'
          ".*tmp.*": ["urn:li:tag:Temporary", "urn:li:tag:Review"]
          # Assign the Obsolete tag to the assets
          # whose URN contains 'old'
          ".*old.*": ["urn:li:tag:Obsolete"]
1 Transformer type.
2 List of regular expression patterns, each followed by a list of tag URNs. When a dataset URN matches a pattern, the tags listed for that pattern are assigned to that dataset.
Example 5. Add tags to dataset fields by pattern
transformers:
  - type: "pattern_add_dataset_schema_tags" (1)
    config:
      tag_pattern: (2)
        rules:
          # Assign the Review and Quality tags to the schema fields
          # whose field path contains 'email'
          ".*email.*": ["urn:li:tag:Review", "urn:li:tag:Quality"]
          # Assign the Obsolete tag to the schema fields
          # whose field path contains 'old'
          ".*old.*": ["urn:li:tag:Obsolete"]
1 Transformer type.
2 List of regular expression patterns, each followed by a list of tag URNs. When a schema field path matches a pattern, the tags listed for that pattern are assigned to that schema field.
Only the tags from the first matching pattern are applied. Tags from any subsequent matching patterns are ignored.
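
Because only the first matching pattern applies, the order of the rules matters when a field path can match more than one pattern. The sketch below assumes that rules are evaluated in the order in which they are written. The field name old_email is hypothetical.

```yaml
transformers:
  - type: "pattern_add_dataset_schema_tags"
    config:
      tag_pattern:
        rules:
          # A field path such as 'old_email' matches both rules.
          # Assuming rules are evaluated top to bottom, it receives
          # only the Obsolete tag, because this rule comes first.
          ".*old.*": ["urn:li:tag:Obsolete"]
          # Under the same assumption, this rule applies only to
          # field paths that contain 'email' but not 'old'
          ".*email.*": ["urn:li:tag:Review", "urn:li:tag:Quality"]
```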

Add glossary terms

Using transformers, you can assign glossary terms to datasets and dataset schema fields.

Example 6. Add glossary terms to datasets
transformers:
  - type: "simple_add_dataset_terms" (1)
    config:
      term_urns: (2)
        - "urn:li:glossaryTerm:GoldenData"
        - "urn:li:glossaryTerm:Regulated"
1 Transformer type.
2 List of glossary term URNs to assign to the assets.
Example 7. Add glossary terms to datasets by pattern
transformers:
  - type: "pattern_add_dataset_terms" (1)
    config:
      term_pattern: (2)
        rules:
          # Assign the GoldenData and Certified terms to the assets
          # whose URN contains 'gd'
          ".*gd.*": ["urn:li:glossaryTerm:GoldenData", "urn:li:glossaryTerm:Certified"]
          # Assign the MasterData term to the assets
          # whose URN contains 'md'
          ".*md.*": ["urn:li:glossaryTerm:MasterData"]
1 Transformer type.
2 List of regular expression patterns, each followed by a list of term URNs. When a dataset URN matches a pattern, the terms listed for that pattern are assigned to that dataset.
Example 8. Add glossary terms to dataset fields by pattern
transformers:
  - type: "pattern_add_dataset_schema_terms" (1)
    config:
      term_pattern: (2)
        rules:
          # Assign the PII and Email terms to the schema fields
          # whose field path contains 'email'
          ".*email.*": ["urn:li:glossaryTerm:PII", "urn:li:glossaryTerm:Email"]
          # Assign the Confidential term to the schema fields
          # whose field path contains 'internal'
          ".*internal.*": ["urn:li:glossaryTerm:Confidential"]
1 Transformer type.
2 List of regular expression patterns, each followed by a list of term URNs. When a schema field path matches a pattern, the terms listed for that pattern are assigned to that schema field.

Edit domains

Using transformers, you can assign domains to datasets.

Example 9. Add domains to datasets
transformers:
  - type: "simple_add_dataset_domain" (1)
    config:
      domains: (2)
        - "urn:li:domain:sales"
1 Transformer type.
2 List of domain URNs to assign to the assets. You can also use the domain name instead of the URN, for example, "sales".
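As the second callout notes, a domain can be referenced by name. The following is a minimal sketch of the same transformer using the name "sales" in place of the URN, assuming a domain with that name exists in the target tenant.

```yaml
transformers:
  - type: "simple_add_dataset_domain"
    config:
      domains:
        # Domain name used instead of "urn:li:domain:sales"
        - "sales"
```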
Example 10. Add domains to datasets by pattern
transformers:
  - type: "pattern_add_dataset_domain" (1)
    config:
      domain_pattern: (2)
        rules:
          # Assign the sales domain to the assets
          # whose URN matches the pattern
          "urn:li:dataset:\(urn:li:dataPlatform:postgres,postgres\.public\.n.*": ["urn:li:domain:sales"]
1 Transformer type.
2 List of regular expression patterns, each followed by a list of domain URNs or names. When a dataset URN matches a pattern, the domains listed for that pattern are assigned to that dataset.

Set properties

Using transformers, you can set properties on datasets.

Example 11. Set properties on datasets
transformers:
  - type: "simple_add_dataset_properties" (1)
    config:
      properties: (2)
        confidential: value_1
        property_2: value_2
1 Transformer type.
2 Properties to set on the dataset, as key-value pairs.

Assign to initiatives

Using transformers, you can assign datasets to initiatives.

Example 12. Assign assets to initiatives
transformers:
  - type: "add_to_xdg_initiative" (1)
    config:
      initiative_urn: "urn:li:dataProduct:dataQuality" (2)
      xdg_backend_url: 'https://<your-tenant-name>.semarchy.net/api/xdg/v1' (3)
1 Transformer type.
2 URN of the initiative to which the assets are assigned.
3 URL of the Semarchy xDG tenant.

Replace existing values

When using transformers, you can optionally define how the values (tags, terms, and so on) set by the transformer interact with the values collected from the source and with those already present on the assets stored in Semarchy xDG.

You can set two optional properties on each transformer to define this behavior:

  • replace_existing: When set to true, the values produced by the transformer replace those collected from the source instead of being added to them. This property defaults to false.

  • semantics: When set to OVERWRITE (default value), the transformer overwrites all the values stored in Semarchy xDG with those produced by the transformer. When set to PATCH, it adds the values to those in Semarchy xDG.

Example 13. Replace all tags
transformers:
  - type: "simple_add_dataset_tags" (1)
    config:
      tag_urns:
        - "urn:li:tag:ToDo"
        - "urn:li:tag:Review"
      replace_existing: true  (2)
      semantics: OVERWRITE (3)
1 Transformer adding tags to assets.
2 Replace all tags that may be present in the harvested assets.
3 Overwrite all tags for these assets in Semarchy xDG.
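
For comparison, the following sketch combines the default replace_existing value with PATCH semantics. According to the property descriptions above, the tags produced by the transformer are then added to the tags collected from the source and to those already stored in Semarchy xDG. It reuses the tag URNs from the previous example.

```yaml
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:ToDo"
        - "urn:li:tag:Review"
      # Keep the tags collected from the source (default value)
      replace_existing: false
      # Add to, rather than overwrite, the tags stored in Semarchy xDG
      semantics: PATCH
```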