Slowly Changing Dimensions

Overview

Use Slowly Changing Dimensions (SCD) to track changes in a dimension table over time while preserving the history of selected attributes.

With the IBM Netezza component, Semarchy xDI provides a dedicated template named INTEGRATION Netezza with Slowly Changing Dimension. This template reads the SCD roles defined on the target datastore columns and applies the corresponding insert, update, and historization logic.

Before you start

Before using the template, define an SCD role for each column in the target dimension datastore. The template requires at least one column with each of the following roles:

  • naturalKey

  • startDate

  • endDate

  • recordVersion

If your model uses a surrogate key, you can also configure a column with the surrogateKey role.

Configure the target metadata

Use the target datastore metadata to define how each column participates in SCD processing.

Role Description

historizeIfModified

When a change is detected on this column, the current row is closed and a new current row is inserted.

updateIfModified

When a change is detected on this column, the current row is updated in place.

startDate

Stores the date or timestamp when the current version of the row becomes active.

endDate

Stores the date or timestamp when the current version is closed.

recordVersion

Identifies the row status. The template sets this value to 1 for the current row and 0 for historized rows.

surrogateKey

Stores a technical key for the dimension row when your model uses one.

naturalKey

Identifies the business entity independently from its historized versions.

At minimum, define the key, version, and date roles required by the template. Then assign historizeIfModified and updateIfModified to the business attributes according to the behavior that you want.

Run the SCD integration

Use the INTEGRATION Netezza with Slowly Changing Dimension template to load the target dimension table. The template uses a pre-integration table to compare the incoming rows with the current target rows and then applies the following logic:

  • If no current target row matches the naturalKey, the template inserts a new current row.

  • If only updateIfModified columns changed, the template updates the current row in place.

  • If one or more historizeIfModified columns changed, the template closes the current row and inserts a new current row.

  • If no SCD-relevant column changed, the template ignores the row.

You can mix historized attributes and in-place updates in the same dimension table.

Date and version management

The template manages the recordVersion, startDate, and endDate columns automatically.

For inserted current rows, the following behavior applies:

  • recordVersion is set to 1

  • startDate comes from the mapped source value, a target-side expression, or current_timestamp if nothing is mapped

  • endDate comes from the Scd Default End Date parameter, which defaults to NULL

For historized rows, the following behavior applies:

  • recordVersion is set to 0

  • endDate comes from the mapped end-date value when one is defined

  • Otherwise, endDate falls back to the mapped startDate value or current_timestamp

Synchronize deletions

The Synchronize Deletions parameter closes rows that no longer exist in the source dataset. When you enable this parameter, the template does not physically delete the missing dimension members. Instead, it updates the current target row as follows:

  • recordVersion is set to 0

  • endDate comes from the Deletions Default End Date parameter, which defaults to now()

Use this parameter when your dimension history must reflect removals from the source system.

Netezza-specific considerations

The template also exposes options that can affect SCD execution and performance on Netezza:

  • Pre Integration Table Distribution Mode controls how the pre-integration table is distributed.

  • Generate Statistics On Integration Table and Generate Statistics On Target Table can improve subsequent query planning.

  • Groom Target Table can be useful after large SCD loads.

  • Clean Temporary Objects removes the pre-integration objects after execution.

Review and tune these options when they are relevant to the size of the dimension table, the volume of changed rows, or the performance of the integration.