Slowly Changing Dimensions
Overview
Use Slowly Changing Dimensions (SCD) to track changes in a dimension table over time while preserving the history of selected attributes.
With the IBM Netezza component, Semarchy xDI provides a dedicated template named INTEGRATION Netezza with Slowly Changing Dimension. This template reads the SCD roles defined on the target datastore columns and applies the corresponding insert, update, and historization logic.
Before you start
Before using the template, define an SCD role for each column in the target dimension datastore. The template requires at least one column with each of the following roles:
-
naturalKey -
startDate -
endDate -
recordVersion
If your model uses a surrogate key, you can also configure a column with the surrogateKey role.
Configure the target metadata
Use the target datastore metadata to define how each column participates in SCD processing.
| Role | Description |
|---|---|
|
When a change is detected on this column, the current row is closed and a new current row is inserted. |
|
When a change is detected on this column, the current row is updated in place. |
|
Stores the date or timestamp when the current version of the row becomes active. |
|
Stores the date or timestamp when the current version is closed. |
|
Identifies the row status. The template sets this value to |
|
Stores a technical key for the dimension row when your model uses one. |
|
Identifies the business entity independently from its historized versions. |
At minimum, define the key, version, and date roles required by the template. Then assign historizeIfModified and updateIfModified to the business attributes according to the behavior that you want.
Run the SCD integration
Use the INTEGRATION Netezza with Slowly Changing Dimension template to load the target dimension table. The template uses a pre-integration table to compare the incoming rows with the current target rows and then applies the following logic:
-
If no current target row matches the
naturalKey, the template inserts a new current row. -
If only
updateIfModifiedcolumns changed, the template updates the current row in place. -
If one or more
historizeIfModifiedcolumns changed, the template closes the current row and inserts a new current row. -
If no SCD-relevant column changed, the template ignores the row.
You can mix historized attributes and in-place updates in the same dimension table.
Date and version management
The template manages the recordVersion, startDate, and endDate columns automatically.
For inserted current rows, the following behavior applies:
-
recordVersionis set to1 -
startDatecomes from the mapped source value, a target-side expression, orcurrent_timestampif nothing is mapped -
endDatecomes from theScd Default End Dateparameter, which defaults toNULL
For historized rows, the following behavior applies:
-
recordVersionis set to0 -
endDatecomes from the mapped end-date value when one is defined -
Otherwise,
endDatefalls back to the mappedstartDatevalue orcurrent_timestamp
Synchronize deletions
The Synchronize Deletions parameter closes rows that no longer exist in the source dataset. When you enable this parameter, the template does not physically delete the missing dimension members. Instead, it updates the current target row as follows:
-
recordVersionis set to0 -
endDatecomes from theDeletions Default End Dateparameter, which defaults tonow()
Use this parameter when your dimension history must reflect removals from the source system.
Netezza-specific considerations
The template also exposes options that can affect SCD execution and performance on Netezza:
-
Pre Integration Table Distribution Mode controls how the pre-integration table is distributed.
-
Generate Statistics On Integration Table and Generate Statistics On Target Table can improve subsequent query planning.
-
Groom Target Table can be useful after large SCD loads.
-
Clean Temporary Objects removes the pre-integration objects after execution.
Review and tune these options when they are relevant to the size of the dimension table, the volume of changed rows, or the performance of the integration.