Working with Google BigQuery partitioned tables

A partitioned table in Google BigQuery is divided into segments, called partitions. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the data read by a query. You specify one column to use for the segmentation.

In Semarchy xDI Designer, you configure BigQuery metadata to hold partition information, and optionally add a custom partition filter. xDI creates queries that only request the relevant partitions, improving query performance and reducing BigQuery costs.

To learn about partitions in Google BigQuery, see the official Google documentation.

Prepare the metadata

Metadata objects for BigQuery use a separate element to describe the partition. This element is saved at the same level as columns, under the DataStore element.

Screenshot of a BigQuery metadata object

To set up integration flows that recognize BigQuery partitions, start by creating BigQuery metadata that understands your partition. You can do this during the reverse engineering process, or manually.

Configure by reverse-engineering

If your BigQuery table already has a partition set up, the reverse engineering process is the easiest method to use.

Follow the process to create metadata for BigQuery tables. During the reverse engineering process, xDI Designer detects the partition, and adds the partition to the metadata automatically.

Configure manually

Add your partition information manually if you are editing or designing metadata manually, or if you cannot reverse-engineer the metadata.

  1. Find your BigQuery metadata object in the Project Explorer.

  2. Open the metadata.

  3. Navigate to the DataSchema element, then to the DataStore element representing the partitioned table.

  4. Open the context menu on the DataStore element, and select New > Partition. A Partition element appears next to the columns.

  5. Open the Partition element properties.

  6. Set the Type field, and other parameters, based on your BigQuery table.

The list of partition parameters is as follows:

Partition types

Type Description

Ingestion Timestamp

Use this type for tables partitioned by ingestion time.

Range

Use this type for tables partitioned based on ranges of values in a specific INTEGER column.

Timestamp

Use this type for tables partitioned on a DATE, TIMESTAMP, or DATETIME column in the table.

Parameters by partition type

Table 1. Parameters for Ingestion Timestamp partitions
Parameter Description

Timestamp Truncation

What time granularity your partitions use. Possible values are:

  • Day

  • Hour

  • Month

  • Year

Table 2. Parameters for Range partitions
Parameter Description

Column

The name of the column holding the partitioning integer range. You must define this column in your metadata first.

Min

The starting value for range partitioning (inclusive).

Max

The ending value for range partitioning (exclusive).

Interval

The interval of each range within the partition.

Table 3. Parameters for Timestamp partitions
Parameter Description

Column

The name of the column holding the partitioning time data. You must define this column in your metadata first.

Timestamp Truncation

The time granularity your partitions use. Possible values are:

  • Day

  • Hour

  • Month

  • Year

Turn on partition usage

After you define columns and a partition in your BigQuery metadata, you are ready to use them in your integration flows.

You do not need to drag and drop partition nodes to a mapping or process. xDI Designer detects partitions in your metadata automatically.

Use partitions automatically

In mappings and processes, xDI automatically calculates which partitions it needs to call.

  • xDI finds mapped BigQuery data.

  • xDI confirms that partitioning is enabled in the integration template.

  • xDI analyzes the source and target data.

    • For source data, it analyzes filters on the source.

    • For target data, it builds a filter based on the mapped source.

  • xDI makes an initial request to Google BigQuery to confirm the partitions that match the filters.

  • The integration flow continues, using the relevant partitions.

Use partitions with template parameters

If you want more control over partition usage, you can set partition parameters manually. This method avoids the initial query to Google BigQuery.

Manual partition settings are only available for the Template INTEGRATION BigQuery integration template.
  1. Open or create a mapping that integrates data to a BigQuery DataStore.

  2. Click the Integration icon on the BigQuery DataStore to open its properties.

  3. In the Integration Properties, make sure Template INTEGRATION BigQuery is the active template.

  4. Check one or both partitioning pruning parameters.

  5. If you checked target partition pruning, set a partition filter.

The parameters work as follows:

Parameter Description

Enable Partition Pruning

This checkbox turns partition usage on or off for source tables.

Source data partitions are calculated from filters on the source table. If there is no filter, xDI ignores partitions and reads the whole table.

Enable Target Partition Pruning

This checkbox turns partition usage on or off for target tables.

To use partitions on the target, the target column must be part of the update key, and mapped from a source.

Target partitions are calculated from the source data mapped to the target. You can override these calculations with the Target Partition Filter parameter.

Target Partition Filter

An SQL filter that xDI uses to set which partitions to use on a target table. This setting overrides automatic partition calculations on target tables.

When writing the filter, prefix the name of the target column with TRG..

For example:

TRG.DAT_VTE BETWEEN DATE_ADD(CURRENT_DATE, INTERVAL - 1 YEAR) AND CURRENT_DATE

Save and test your mapping to make sure your filter works.