If you have not installed the Designer yet, follow the Install Semarchy xDI Designer tutorial from the main tutorials page. Additional setup information can also be found in the installation guide.
You must first create the metadata that you will use in this unit.
Reverse engineer the Hotel source database
In this tutorial, multiple mappings use a Hypersonic SQL database as a source.
Follow the same steps as in the Create Your First Data Flow With Semarchy xDI tutorial to create that metadata and run the reverse engineering, using the following parameters:
| Property | Value |
|---|---|
| Location | Tutorial – Fundamentals |
| Technology | Hypersonic SQL |
| Metadata Model name | Hotel |
| URL | jdbc:hsqldb:hsql://localhost:62210 |
| User | sa |
| Password | |
| Schema | HOTEL_MANAGEMENT |
Use default values for all the other properties.
Once you have created the Hotel metadata, review the content of the reverse-engineered tables: in the Project Explorer, expand Hotel > Hotel > HOTEL_MANAGEMENT, then right-click each datastore and select Actions > Consult data.
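The Consult data action opens a query editor on the selected datastore and runs a simple SELECT against it. As an illustrative sketch (the generated query may differ, for example by limiting the number of returned rows):

```sql
-- Illustrative query for one of the reverse-engineered Hotel datastores.
select * from HOTEL_MANAGEMENT.T_PAYMENT_TYPE;
```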
Reverse engineer the other delimited files
In the Create Your First Data Flow With Semarchy xDI tutorial, you created the Reference Files metadata corresponding to a folder that contains files used in the tutorials. You also reverse engineered the DiscountRanges.txt file. You will now add another delimited file from the same folder to this metadata.
Add and reverse engineer the Time.csv file
Declare and reverse engineer the Time.csv file from the metadata editor:
In the Project Explorer, double-click the Reference Files metadata.
In the metadata editor, expand the Server node, right-click the Reference Files folder, and then select Actions > Launch File wizard.
On the Properties page of the File Wizard, enter the following properties:
| Property | Value |
|---|---|
| Name | Time |
| Physical Name | Time.csv |
| Header Line Position | 0 |
Click Refresh to check the content of the file, and then click Next.
On the Reverse page of the wizard, click Reverse to launch the reverse engineering process. The following content appears in the results table:
| Name | Position | Size | Type |
|---|---|---|---|
| F1 | 1 | 66 | String |
Click the F1 column and rename it to DAY_DATE.
Click Finish to close the wizard.
Click the save icon in the Designer toolbar or press CTRL+S to save the metadata.
Right-click the Time datastore in the metadata editor and select Actions > Consult Data to view the file's content.
Add and reverse engineer the REF_US_STATES.csv file
Follow the same steps to define a delimited file with the following properties:
| Property | Value |
|---|---|
| Name | US_States |
| Physical Name | REF_US_STATES.csv |
| Field Separator | , |
| Header Line Position | 1 |
Reverse engineer this file and review the resulting columns:
| Name | Position | Type |
|---|---|---|
| STATE_UPPER_CASE | 1 | String |
| STATE | 2 | String |
| STATE_CODE | 3 | String |
Reverse engineer a positional file
A positional (or fixed-width) file is a file in which each column is defined by its position and size in the record, instead of being delimited by a separator character. For example, a record might store the zip code in characters 1 through 5 and the city name starting at character 6.
The ref_us_cities.txt file found in the same folder is a positional file. You will now add it to the Reference Files metadata.
Launch the File Wizard as previously.
Click Browse and select the ref_us_cities.txt file.
In the Name field, enter US_Cities.
Modify the Type and choose POSITIONAL.
Check the content of the file by clicking Refresh.
Click Next to go to the Reverse page.
Reverse engineering cannot automatically detect the contiguous columns of a positional file. You must define them manually.
To define the three columns for the ref_us_cities.txt file, perform the following steps:
Click to declare a new column.
Set the Name, Position, Size, and Type properties with the values provided in the following table.
Repeat these steps until all three columns have been added.
| Name | Position | Size | Type |
|---|---|---|---|
| ZIP_CODE | 1 | 5 | String |
| CITY | 6 | 72 | String |
| STATE_CODE | 78 | 10 | String |
Once done, click Refresh to preview the file and check that the layout is correctly defined.
Congratulations
You have successfully created all the metadata you will need for the rest of this tutorial.
In the next section, you will learn how to use a filter in a mapping.
Filters in Semarchy xDI allow you to refine data from a source datastore before passing it to a target. In this step, you will create a new mapping with a filter.
Create the Load DIM_PAYMENT_TYPE mapping
Create a new mapping under the mappings folder:
In the Project Explorer view, go to the Tutorial – Fundamentals project and right-click the Mappings folder.
Choose New > Mapping...
Set the File name to Load DIM_PAYMENT_TYPE, then click Finish.
Add elements to the mapping:
Add the source datastore by dragging and dropping the T_PAYMENT_TYPE datastore from the Hotel metadata into the mapping diagram.
Add the target datastore by dragging and dropping the DIM_PAYMENT_TYPE datastore from the Datamart metadata into the mapping diagram.
On the mapping diagram, drag and drop the PMT_CODE column from the T_PAYMENT_TYPE datastore onto DIM_PAYMENT_TYPE and select Map - all matching fields.
Map - all matching fields will automatically create mapping expressions for all source and target columns with identical names.
Add a filter
In this section, you will filter out obsolete data coming from the T_PAYMENT_TYPE source table by only keeping records which have their PMT_ACTIVE column set to 1.
To create a Filter on the source datastore:
Drag and drop the PMT_ACTIVE column to the empty space on the mapping diagram.
Select Create a Filter in the dialog.
Click on the created filter in the Mapping Diagram, and in the Expression Editor, review the generated filter expression: T_PAYMENT_TYPE.PMT_ACTIVE=1
Click the save icon in the Designer toolbar or press CTRL+S to save the mapping.
Run the mapping
Your mapping is now complete. To run it:
Make sure you have saved the mapping.
Right-click an empty area in the mapping editor and click Run.
Go to the Statistics view and review the results:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 12 |
| SUM(SQL_STAT_INSERT) | 4 |
| SUM(SQL_STAT_UPDATE) | 0 |
In the mapping, right-click the target table and select Actions > Consult Data. The query returns the following four lines:

| PMT_CODE | PMT_NAME | PMT_DESCRIPTION |
|---|---|---|
| BT | Bank Transfer | |
| CC | Credit Card | visa,mastercard |
| CH | Cash | |
| PP | PayPal | only web payment |
Explore the generated process
Semarchy xDI Designer allows you to browse the generated process and its code.
Go back to the process generated by xDI and open its editor.
This process is composed of 3 steps:
L1_DIM_PAYMENT_TYPE-Load is the Load step. It transfers the data from the source table to the target database. This step was generated using the Load template defined in the mapping.
L1_DIM_PAYMENT_TYPE-Integration is the Integration step. It was generated using the Integration template defined in the mapping.
L1_DIM_PAYMENT_TYPE-Cleanup is a Cleanup step that cleans up temporary objects created during the execution of the L1_DIM_PAYMENT_TYPE-Load step. This step was also generated using the Load template.
Double-click the L1_DIM_PAYMENT_TYPE-Integration step to open it further. It contains two sub-steps: L1_DIM_PAYMENT_TYPE-Prepare and L1_DIM_PAYMENT_TYPE-Integration.
L1_DIM_PAYMENT_TYPE-Integration integrates the source data into the target table.
Select the Load DIM_PAYMENT_TYPE tab above the editor to navigate back to the parent process.
Double-click the L1_DIM_PAYMENT_TYPE-Load step. It is composed of process Actions and Sub-Processes:
Process Actions are the atomic elements xDI uses during execution. They use the same color-coding scheme as process steps.
Sub-Processes are containers used to group and organize Actions.
Double-click the For Simple Table sub-process to open it, then select the Action named Select on source table (SRC).
Review the code that was executed on the source database in the Generated tab of the Step Detail view. Notice that the where clause contains the filter that you added to the mapping.
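The executed statement resembles the following sketch (illustrative; the code actually generated by the Load template may differ in aliases and column list):

```sql
-- The filter defined in the mapping is pushed down to the source database.
select T_PAYMENT_TYPE.PMT_CODE,
       T_PAYMENT_TYPE.PMT_NAME,
       T_PAYMENT_TYPE.PMT_DESCRIPTION
from   HOTEL_MANAGEMENT.T_PAYMENT_TYPE T_PAYMENT_TYPE
where  T_PAYMENT_TYPE.PMT_ACTIVE = 1;
```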
Congratulations! You have successfully used a filter in a mapping. In the next step, you will use complex expressions and learn about the importance of business rules' execution location.
In this step, you will create and optimize a mapping with complex expressions.
Create the Load DIM_BEDROOM mapping
Create a new mapping with the following properties:
| Property | Value |
|---|---|
| Parent folder | Mappings |
| Mapping name | Load DIM_BEDROOM |
| Target Datastore | DIM_BEDROOM |
| Source Datastore | T_BEDROOM |
Define the business rules: for each row in the following table, select the Target Column in the target datastore and enter its corresponding Transformation Expression in the Expression Editor.
| Target column | Transformation expression |
|---|---|
| BDR_ID | T_BEDROOM.BDR_ID |
| BDR_NUMBER | T_BEDROOM.BDR_NUMBER |
| BDR_FLOOR | case when lower(T_BEDROOM.BDR_FLOOR) = 'gf' then 0 when lower(T_BEDROOM.BDR_FLOOR) = '1st' then 1 when lower(T_BEDROOM.BDR_FLOOR) = '2nd' then 2 end |
| BDR_BATH | case when T_BEDROOM.BDR_BATH = 'true' then 1 else 0 end |
| BDR_SHOWER | case when T_BEDROOM.BDR_SHOWER = 'true' then 1 else 0 end |
| BDR_BAR | case when T_BEDROOM.BDR_BAR = 'true' then 1 else 0 end |
| BDR_BED_COUNT | convert(T_BEDROOM.BDR_BED_COUNT,NUMERIC) |
| BDR_PHONE_NUMBER | T_BEDROOM.BDR_PHONE_NUMBER |
| BDR_TYPE | T_BEDROOM.BDR_TYPE |
| UPDATE_DATE | current_timestamp |
Run the mapping.
Go to the Statistics view and review the results:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 60 |
| SUM(SQL_STAT_INSERT) | 20 |
| SUM(SQL_STAT_UPDATE) | 0 |
Modify the execution location of an expression
Run this mapping a second time. It will return the following statistics:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 80 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 20 |
20 rows are marked as updated (SUM(SQL_STAT_UPDATE)) because the UPDATE_DATE column is set to the source database's current_timestamp value at every execution. By default, this expression runs on the source datastore, and the mapping always considers it a change when it compares source and target records before integrating the data.
To prevent UPDATE_DATE from being included in the source/target data comparison, you must execute the business rule for this column on the target datastore instead of the source datastore. When you do this, the current_timestamp expression is computed after the data is integrated into the target table, and the field is excluded from the comparison.
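The following minimal sketch contrasts the two behaviors. It assumes a hypothetical staging table named STG_DIM_BEDROOM; the code xDI actually generates depends on the templates:

```sql
-- Rule executed on the source: UPDATE_DATE travels with the data and is
-- compared like any other column, so every record differs at each run.
select S.BDR_ID
from   STG_DIM_BEDROOM S, HOTEL_DATAMART.DIM_BEDROOM T
where  T.BDR_ID = S.BDR_ID
  and  T.UPDATE_DATE <> S.UPDATE_DATE;  -- always true with current_timestamp

-- Rule executed on the target: UPDATE_DATE is excluded from the comparison
-- and computed only when records are actually integrated.
update HOTEL_DATAMART.DIM_BEDROOM
set    UPDATE_DATE = current_timestamp
where  BDR_ID in (select BDR_ID from STG_DIM_BEDROOM);
```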
To change where the rule is executed:
Select the UPDATE_DATE target column on the mapping.
Click the Execution location icon to the left of the datastore.
Click the Target icon.
Run the mapping.
Go to the Statistics view to check the results. Changes to the UPDATE_DATE field are not counted as updates anymore.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 60 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 0 |
In the next step, you will learn how to remove duplicate rows in a mapping.
In this step, you will create and configure a mapping to deduplicate rows found in the source datastore in order to only work with unique data.
Create the Load DIM_TIME mapping
Create a new mapping with the following properties:

| Property | Value |
|---|---|
| Parent folder | Mappings |
| Mapping name | Load DIM_TIME |
| Target Datastore | DIM_TIME |
| Source Datastore | Time |
If you view the data in the Time file, you will notice that it contains 4 records per day (one record for every 6-hour time slot):
01/01/2010 00:00
01/01/2010 06:00
01/01/2010 12:00
01/01/2010 18:00
02/01/2010 00:00
02/01/2010 06:00
02/01/2010 12:00
02/01/2010 18:00
03/01/2010 00:00
03/01/2010 06:00
However, the DIM_TIME target table expects only one line per day. To filter out the duplicates from the dataflow, proceed as follows:
Select the Integration Template on the DIM_TIME datastore.
Open the Properties view.
Click the Use distinct link to enable the checkbox.
Check the Use distinct box.
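Under the hood, Use distinct adds the DISTINCT keyword to the query that feeds the integration step, so rows that are identical after transformation collapse into one. A minimal sketch, assuming a hypothetical staging table STG_TIME and a hypothetical day-level transformation rule:

```sql
-- The four 6-hour records of a given day become identical once the time
-- part is stripped, and DISTINCT keeps a single row per day.
select distinct substr(DAY_DATE, 1, 10) as TIME_DATE
from   STG_TIME;
```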
Run the mapping and review the results in the Statistics view. Observe that, thanks to the Use distinct option, the number of inserted rows is lower than the total number of processed rows.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 6,576 |
| SUM(SQL_STAT_INSERT) | 1,096 |
| SUM(SQL_STAT_UPDATE) | 0 |
In the next step, you will create another mapping and define the functional key for a datastore.
The functional key in a mapping identifies a record and is used to define whether an incoming record is an existing or new record for the target datastore. By default, xDI Designer uses the target datastore's primary key as the functional key.
In this step, you will change a target datastore's functional key, as the default functional key cannot be used.
Create the Load DIM_GEOGRAPHY mapping
First, create a new mapping:
Create a new mapping with the following properties:
| Property | Value |
|---|---|
| Parent folder | Mappings |
| Mapping name | Load DIM_GEOGRAPHY |
| Target Datastore | DIM_GEOGRAPHY |
| Source Datastores | US_States, US_Cities |
Define the transformation business rules and their respective execution location as described in the following table:
| Target column | Transformation expression | Execution location |
|---|---|---|
| GEO_KEY_ID | next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID | Target |
| GEO_ZIP_CODE | US_Cities.ZIP_CODE | Source |
| GEO_CITY | US_Cities.CITY | Source |
| GEO_STATE_CODE | US_Cities.STATE_CODE | Source |
| GEO_STATE | US_States.STATE_UPPER_CASE | Source |
Define the functional key of the mapping
In the DIM_GEOGRAPHY table, the primary key GEO_KEY_ID is not mapped with the source datastore because it is populated with a sequence number. It cannot be used as the functional key to identify new or existing records.
To define another functional key for this datastore:
In the mapping, select the GEO_KEY_ID column on the DIM_GEOGRAPHY datastore and then click the icon to disable this column as the functional key.
Select the GEO_ZIP_CODE column and click the icon to define it as the functional key for this mapping.
The functional key is now GEO_ZIP_CODE, which means that:
If a source record has a GEO_ZIP_CODE value that does not exist in the target table, xDI will create a new record.
If the GEO_ZIP_CODE value already exists in the target table, the matching record in the target datastore will be compared with the source record and updated if needed.
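Conceptually, the integration step uses the functional key to split the flow between inserts and updates, along these lines (illustrative SQL; STG_DIM_GEOGRAPHY is a hypothetical name for the staging table):

```sql
-- Records whose functional key is absent from the target are inserted;
-- the surrogate key comes from the sequence defined in the business rule.
insert into HOTEL_DATAMART.DIM_GEOGRAPHY
       (GEO_KEY_ID, GEO_ZIP_CODE, GEO_CITY, GEO_STATE_CODE, GEO_STATE)
select next value for HOTEL_DATAMART.SEQ_GEO_KEY_ID,
       S.GEO_ZIP_CODE, S.GEO_CITY, S.GEO_STATE_CODE, S.GEO_STATE
from   STG_DIM_GEOGRAPHY S
where  not exists (select 1
                   from   HOTEL_DATAMART.DIM_GEOGRAPHY T
                   where  T.GEO_ZIP_CODE = S.GEO_ZIP_CODE);
```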
In the next step, you will modify the Load DIM_GEOGRAPHY mapping to apply a specific business rule only to updates.
In the Load DIM_GEOGRAPHY mapping, the GEO_KEY_ID target column is populated by a database sequence that generates a new value every time it is invoked.
Now, imagine that a second US_Cities source file is loaded to update an existing record. If the business rule for GEO_KEY_ID is applied, a new identifier will be generated and a new record will be created instead of updating the existing one.
To avoid this issue, you must configure this business rule so that it is applied only for insertions and not for updates:
Select the GEO_KEY_ID column.
Click the icon next to the datastore's left border to disable the execution of the business rule for updates.
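After this change, the sequence expression participates only in the insert statement; updates leave the existing identifier untouched. A minimal sketch, again with the hypothetical STG_DIM_GEOGRAPHY staging table:

```sql
-- GEO_KEY_ID is absent from the SET clause, so updated records keep the
-- identifier they received when they were first inserted.
update HOTEL_DATAMART.DIM_GEOGRAPHY G
set    GEO_CITY = (select S.GEO_CITY
                   from   STG_DIM_GEOGRAPHY S
                   where  S.GEO_ZIP_CODE = G.GEO_ZIP_CODE)
where  exists (select 1
               from   STG_DIM_GEOGRAPHY S
               where  S.GEO_ZIP_CODE = G.GEO_ZIP_CODE);
```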
In the next step, you will modify the Load DIM_GEOGRAPHY mapping to perform an inner join between the two source datastores.
Semarchy xDI supports joins to combine data between several source datastores. In this step, you will use an inner join in a mapping.
Add an inner join to Load DIM_GEOGRAPHY
In this mapping, the two data sets US_States and US_Cities are currently disjoint, which means that xDI will combine each line of the first set with each line of the second set. To avoid performing this Cartesian product, you must create a join between these two datastores:
Drag and drop the STATE_CODE column from US_States to the STATE_CODE column in US_Cities.
Select Join in the dialog.
Review the generated join expression in the Expression Editor.
Run the mapping and go to the Statistics view to check the results. Note that the number of inserted lines corresponds to the number of rows in US_Cities having a STATE_CODE that exists in REF_US_STATES.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 125,628 |
| SUM(SQL_STAT_INSERT) | 41,693 |
| SUM(SQL_STAT_UPDATE) | 0 |
Run the mapping again.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 125,628 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 0 |
Check the data in the target table.
Great job! You have successfully added an inner join to a mapping. In the next step, you will create outer joins.
By default, the joins added by the Designer are inner joins. In this step, you will define outer joins between datastores to keep the rows that do not have matching values in both tables involved in the join.
Create the Load DIM_CUSTOMER mapping
Create a new mapping with the following properties. (One of the target columns will be loaded later in the tutorial.)

| Property | Value |
|---|---|
| Parent folder | Mappings |
| Mapping name | Load DIM_CUSTOMER |
| Target Datastore | DIM_CUSTOMER |
| Source Datastores | T_CUSTOMER, T_TITLE, T_ADDRESS, DIM_GEOGRAPHY |
Follow the same procedure as in the Create an inner join step to add the following joins. Make sure you select the correct execution location for each join.
| First Datastore | Second Datastore | Business rule | Execution location |
|---|---|---|---|
| T_CUSTOMER | T_TITLE | T_CUSTOMER.TIT_CODE=T_TITLE.TIT_CODE | Source |
| T_CUSTOMER | T_ADDRESS | T_CUSTOMER.CUS_ID=T_ADDRESS.CUS_ID | Source |
| T_ADDRESS | DIM_GEOGRAPHY | DIM_GEOGRAPHY.GEO_ZIP_CODE=T_ADDRESS.ADR_ZIP_CODE | Staging Area |
Define outer joins
In this mapping, we want to retrieve all the records that are in T_CUSTOMER, including the records that have no match in T_ADDRESS or T_TITLE according to the join expressions. Similarly, we want to include the records from T_ADDRESS that have no match in DIM_GEOGRAPHY. To do this, you must use outer joins instead of inner joins:
Select the join between T_CUSTOMER and T_ADDRESS.
In the Properties view, select the radio button next to T_CUSTOMER to define it as the master table (main table) for the outer join.
Follow the same procedure to define the following outer joins:
| First Datastore | Second Datastore | Master table |
|---|---|---|
| T_CUSTOMER | T_TITLE | T_CUSTOMER |
| T_ADDRESS | DIM_GEOGRAPHY | T_ADDRESS |
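Declaring a master table corresponds to a left outer join with the master on the preserved side. As an illustrative sketch of the two source-side joins (the T_ADDRESS to DIM_GEOGRAPHY join runs in the staging area and is omitted here):

```sql
-- Every T_CUSTOMER row is kept, even without a matching title or address;
-- the missing columns come back as NULL.
select C.CUS_ID, T.TIT_CODE, A.ADR_ZIP_CODE
from   HOTEL_MANAGEMENT.T_CUSTOMER C
       left outer join HOTEL_MANAGEMENT.T_TITLE   T on C.TIT_CODE = T.TIT_CODE
       left outer join HOTEL_MANAGEMENT.T_ADDRESS A on C.CUS_ID = A.CUS_ID;
```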
Run the mapping and check the results in the Statistics view:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 300 |
| SUM(SQL_STAT_INSERT) | 100 |
| SUM(SQL_STAT_UPDATE) | 0 |
Right-click the target datastore, select Action > Consult data, and then modify the query as follows:
select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID is null
Run the query and check in the result grid that it returns 11. The source data contains 11 customers that have no recorded address or have an incorrect address. The outer joins defined in this mapping include these customers, but with a NULL address.
In the next step, you will use aggregation in mappings.
In this step, you will create a mapping to load the FACT_BILLING table. This table will be loaded with data from the joined T_BILLING and T_BILLING_LINES source datastores from the Hotel metadata.
Two columns from FACT_BILLING will be populated by aggregating data from source data:
BIL_AMOUNT must contain the total amount of the billing lines (T_BILLING_LINES records) linked to a given billing (T_BILLING record).
BIL_QTY must contain the total quantity from these billing lines.
Define the following transformation business rules. Notice that the rules for BIL_AMOUNT and BIL_QTY contain the SUM aggregate function.
| Target column | Business rule | Execution location | Execution scope | Key |
|---|---|---|---|---|
| BIL_KEY_ID | | | | |
| BIL_ID | T_BILLING.BIL_ID | Source | I/U | Functional key |
| CUS_ID | T_BILLING.CUS_ID | Source | I/U | |
| TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U | |
| DIS_RANGE | DIM_DISCOUNT.DIS_RANGE | Source | I/U | Functional key |
| PMT_CODE | T_BILLING.PMT_CODE | Source | I/U | |
| BIL_AMOUNT | sum(T_BILLING_LINES.BLL_AMOUNT) | Staging Area | I/U | |
| BIL_QTY | sum(T_BILLING_LINES.BLL_QTY) | Staging Area | I/U | |
| UPDATE_DATE | current_timestamp | Target | I/U | |
Add the following outer joins:
| First Datastore | Second Datastore | Business rule | Execution location |
|---|---|---|---|
| T_BILLING | DIM_TIME | T_BILLING.BIL_DATE=DIM_TIME.TIME_DATE | Staging Area |
| T_BILLING | T_BILLING_LINES | T_BILLING_LINES.BIL_ID=T_BILLING.BIL_ID | Source |
| T_BILLING_LINES | DIM_DISCOUNT | round(case when T_BILLING_LINES.BLL_DISCOUNT_AMOUNT != 0 then (T_BILLING_LINES.BLL_DISCOUNT_AMOUNT / (T_BILLING_LINES.BLL_AMOUNT + T_BILLING_LINES.BLL_DISCOUNT_AMOUNT))*100 else T_BILLING_LINES.BLL_DISCOUNT_RATE end) between DIM_DISCOUNT.DIS_MIN and DIM_DISCOUNT.DIS_MAX | Staging Area |
Activate aggregation
Before running the mapping, you must indicate which columns use aggregate functions:
Select the BIL_AMOUNT target column.
Click the icon on the left border of the datastore to activate aggregation for this column.
Apply the same procedure to the BIL_QTY column.
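With aggregation active, the staging query groups rows by every non-aggregated column, conceptually as follows (illustrative SQL; the DIM_TIME and DIM_DISCOUNT joins are omitted for brevity):

```sql
-- One output row per group, with the line amounts and quantities summed.
select   B.BIL_ID, B.CUS_ID, B.PMT_CODE,
         sum(L.BLL_AMOUNT) as BIL_AMOUNT,
         sum(L.BLL_QTY)    as BIL_QTY
from     HOTEL_MANAGEMENT.T_BILLING B
         join HOTEL_MANAGEMENT.T_BILLING_LINES L on L.BIL_ID = B.BIL_ID
group by B.BIL_ID, B.CUS_ID, B.PMT_CODE;
```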
Run the mapping and check the results in the Statistics view:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 53,314 |
| SUM(SQL_STAT_INSERT) | 12,107 |
| SUM(SQL_STAT_UPDATE) | 0 |
View the data in the FACT_BILLING table and check that the BIL_AMOUNT and BIL_QTY columns are loaded as expected with the aggregated amounts and quantities.
In the next step, you will learn how to add constraints and leverage reject detection.
Data quality is a major topic in data integration. Semarchy xDI allows defining constraints on datastores that include:
Primary, alternate, and foreign keys
Format constraints
Nullable/mandatory constraints
Custom constraints, configured with conditions.
Designers configure reject detection in Semarchy xDI to detect and possibly quarantine those records that violate the constraints.
In this step, you will learn how to add constraints and enable reject detection.
Enable data quality for DIM_CUSTOMER
Create user-defined constraints
The DIM_GEOGRAPHY table in the target demo database contains two particular records that correspond to invalid geographical data ("No address" and "Unknown zip code").
| GEO_KEY_ID | GEO_ZIP_CODE | GEO_CITY | GEO_STATE_CODE | GEO_STATE |
|---|---|---|---|---|
| 0 | | No Address | | |
| 1 | ? | Unknown Zip Code | ? | ? |
You will now create two constraints on DIM_CUSTOMER to check that the value of the GEO_KEY_ID column is different from these values, and reject customer records with invalid geographical data.
To add the constraints:
In the Project Explorer, double-click the Datamart metadata to open the metadata editor.
Find and expand the DIM_CUSTOMER node in the editor.
Right-click DIM_CUSTOMER and select New > Condition.
In the Properties view, click on each field legend to activate the box, and enter the following values for the first constraint:
| Property | Value |
|---|---|
| Name | CK_CUS_GEO_KEY_ID_001 |
| User Message | No address |
| Reject Code | CUS_001 |
| Description | Keep track of each record loaded into DIM_CUSTOMER without any address. |
| Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 0 |
In the Project Explorer, right-click DIM_CUSTOMER and select New > Condition again to create a second constraint.
In the Properties view, click on each field legend to activate the box, and enter the following values for the second constraint:
| Property | Value |
|---|---|
| Name | CK_CUS_GEO_KEY_ID_002 |
| User Message | Unknown Zip Code |
| Reject Code | CUS_002 |
| Description | Keep track of each record loaded into DIM_CUSTOMER with an address containing an unknown zip code. |
| Condition Expression | DIM_CUSTOMER.GEO_KEY_ID <> 1 |
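At run time, reject detection evaluates each condition and copies the violating records, together with their reject code and user message, into a dedicated reject table. A minimal sketch of the idea for the first condition (the reject table name and layout shown here are hypothetical; xDI generates its own structures):

```sql
-- Records failing CK_CUS_GEO_KEY_ID_001 are recorded as rejects.
insert into HOTEL_DATAMART.R_DIM_CUSTOMER   -- hypothetical reject table
       (REJECT_CODE, USER_MESSAGE, CUS_ID, GEO_KEY_ID)
select 'CUS_001', 'No address', CUS_ID, GEO_KEY_ID
from   HOTEL_DATAMART.DIM_CUSTOMER
where  not (GEO_KEY_ID <> 0);
```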
Use default values in Load DIM_CUSTOMER
To use the conditions created in the previous section, you must modify the business rule for the GEO_KEY_ID column in DIM_CUSTOMER: the current rule populates the GEO_KEY_ID target field with the value found in DIM_GEOGRAPHY or sets it to NULL if no value is found.
You will now modify the business rule to set GEO_KEY_ID to 0 if the customer had no address in the source, or to 1 if the customer had an address with an invalid zip code:
Open the mapping Load DIM_CUSTOMER.
Select the target column GEO_KEY_ID.
In the Expression Editor, replace the existing business rule with the following expression:
case
when T_ADDRESS.ADR_ID is null then 0
when T_ADDRESS.ADR_ID is not null and DIM_GEOGRAPHY.GEO_KEY_ID is null then 1
else DIM_GEOGRAPHY.GEO_KEY_ID
end
An error icon appears next to the GEO_KEY_ID column to warn you that the execution location for this business rule has become invalid: the rule can no longer run in the source because it involves two different source datastores. Hover over the red icon for a few seconds to see the error details in a tooltip.
Modify the execution location so that it runs in the Staging Area.
Enable reject detection on Load DIM_CUSTOMER
You are now ready to enable reject detection on the Load DIM_CUSTOMER mapping:
Select the DIM_CUSTOMER datastore on the mapping diagram.
Click the icon on the left border of the datastore to enable reject management.
A Reject step is added to the mapping and the DIM_CUSTOMER datastore now shows the available constraints.
In this mapping, we want to keep track of the rejects without removing them from the dataflow. To do this:
Select the Reject step on the DIM_CUSTOMER datastore.
In the Properties view, click the Delete Rejects On Checked Table parameter name, and then unselect that option.
Your mapping is now configured to detect and track the rejects while keeping them in the dataflow.
Run the mapping and analyze the results
Run the mapping and review the results in the Statistics view. A new SUM(SQL_STAT_ERROR) statistic provides the number of rejects. Check that 11 rejects were detected:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 322 |
| SUM(SQL_STAT_ERROR) | 11 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 11 |
Right-click the DIM_CUSTOMER target datastore in the Project Explorer and select Actions > Consult Reject Table.
Run the query to review the 11 lines with errors.
Right-click the target datastore again and select Action > Consult data.
Modify the query as follows to return the number of customers without an address:
select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 0
Run the query and check it returns 10 in the result grid.
Modify the query as follows to return the number of customers having an invalid zip code in their address:
select count(*) from HOTEL_DATAMART.DIM_CUSTOMER where GEO_KEY_ID = 1
Run the query again and check it returns 1 in the result grid.
Enable data quality for FACT_BILLING
In this section, you will add a constraint to FACT_BILLING to check that the value of the BIL_AMOUNT column is greater than 10 and enable reject detection for this table. The rejects will be tracked in the rejects table but must still be inserted in FACT_BILLING.
Follow the same procedure as for DIM_CUSTOMER to add the following user-defined constraint to the BIL_AMOUNT column of the FACT_BILLING table: FACT_BILLING.BIL_AMOUNT > 10
Enable reject detection for FACT_BILLING.
The BIL_KEY_ID column is not loaded by this mapping but generated by the target database, so reject detection must ignore it. To disable the control of this column, select PK_FACT_BILLING (pk): true on the datastore, then click the icon on its left border.
The constraint name becomes PK_FACT_BILLING (pk): false, indicating that the primary key constraint is excluded from reject detection.
Run the mapping and check the results in the Statistics view. Observe that 2 rows were flagged by reject detection.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 53,316 |
| SUM(SQL_STAT_ERROR) | 2 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 0 |
View the 2 reject lines in the Reject table (BIL_ID 224 and 5994).
Congratulations
You have successfully added constraints and configured reject detection for two mappings. In the next step, you will create a stage to reuse some of the transformations defined in the Load DIM_TIME mapping.
A Stage in Semarchy xDI is a temporary area used in a mapping to combine multiple sources or host reusable transformations.
In this step, you will use a stage to host some of the transformation rules defined in the Load DIM_TIME mapping.
Create a stage
Create a stage in the Load DIM_TIME mapping:
Drag and drop the DAY_DATE column onto the mapping diagram.
Select Create a Stage > Browse all staging areas.
A dialog box opens to select the stage's schema. Select the HOTEL_DATAMART schema.
Give your stage a more user-friendly name: Select the stage on the diagram and set the Alias to mutualization in the Properties view. This name will be used in all transformation business rules that involve the stage.
Define the stage's columns
Define all the needed fields for the stage:
The DAY_DATE column was automatically added to the stage at creation. Select this column, go to the Expression Editor, and then replace the generated expression with the following business rule:
Run the mapping and review the results in the Statistics view:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 6,576 |
| SUM(SQL_STAT_INSERT) | 0 |
| SUM(SQL_STAT_UPDATE) | 0 |
Mutualizing the transformations in the stage did not change the target table's contents (no insert and no update were performed on DIM_TIME), but it reduced code redundancy in your project.
In the next step, you will create another stage to combine several sources.
In this step, you will combine two different sources to load the DIM_TIME table: the Time.csv file and the T_PLANNING table.
Data from several sources can be combined in a stage before being loaded to a target. Using that feature, designers do not have to create a mapping for each source.
Modify the DIM_TIME load mapping to load both sources in the mutualization stage:
Select the mutualization stage on the mapping diagram.
Click the icon to the left of the Stage to add a set. The stage now shows 2 sets that are represented as tabs: A and B.
Rename the first set: Select the A dataset and then set the Alias to FILE in the Properties view.
Follow the same procedure to rename the B dataset to RDBMS.
Specify the type of union between these datasets: Select the mutualization stage and, in the Expression Editor, enter the following business rule: [FILE] union [RDBMS].
In the Project Explorer, select the T_PLANNING table from the Hotel metadata and drag it into the mapping diagram.
Select the RDBMS set of the stage and enter the following business rules:
| Target column | Business rule |
|---|---|
| DAY_DATE | T_PLANNING.PLN_DAY |
| TIME_MONTH_DAY | lpad(day(T_PLANNING.PLN_DAY), 2, '0') |
| TIME_MONTH | lpad(month(T_PLANNING.PLN_DAY), 2, '0') |
| TIME_YEAR | convert(year(T_PLANNING.PLN_DAY), VARCHAR) |
Save the mapping.
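Before running it, note what the stage now computes: the union of both datasets, with duplicates removed by the union operator. An illustrative sketch, where STG_TIME_FILE stands for a hypothetical staging table loaded from Time.csv:

```sql
-- [FILE] union [RDBMS]: rows present in both branches appear only once.
select DAY_DATE, TIME_MONTH_DAY, TIME_MONTH, TIME_YEAR
from   STG_TIME_FILE
union
select PLN_DAY,
       lpad(day(PLN_DAY), 2, '0'),
       lpad(month(PLN_DAY), 2, '0'),
       convert(year(PLN_DAY), VARCHAR)
from   HOTEL_MANAGEMENT.T_PLANNING;
```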
Run the mapping and review the results in the Statistics view. Observe that 730 rows were inserted thanks to the union with the T_PLANNING datastore.
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 9,132 |
| SUM(SQL_STAT_INSERT) | 730 |
| SUM(SQL_STAT_UPDATE) | 0 |
Congratulations! You performed the union of two sources within a stage.
In this last step, you will create a mapping that loads the FACT_BOOKING table using an inner join. This table will be used in the next unit, Work With Processes.
Create the mapping with the following properties:
| Property | Value |
|---|---|
| Parent folder | Mappings |
| Mapping name | Load FACT_BOOKING |
| Target Datastore | FACT_BOOKING |
| Source Datastores | DIM_TIME and T_BDR_PLN_CUS |
Define the transformation business rules:
| Target column | Business rule | Execution location | Execution context | Key |
|---|---|---|---|---|
| BOK_KEY_ID | | | | |
| CUS_ID | T_BDR_PLN_CUS.CUS_ID | Source | I/U | Functional key |
| TIME_KEY_DAY | DIM_TIME.TIME_KEY_DAY | Source | I/U | Functional key |
| BDR_ID | T_BDR_PLN_CUS.BDR_ID | Source | I/U | Functional key |
| BOK_PEOPLE | T_BDR_PLN_CUS.PLN_CUS_PERS_COUNT | Source | I/U | |
| BOK_BOOKED | case when T_BDR_PLN_CUS.PLN_CUS_BOOKED = 'true' then 1 else 0 end | Source | I/U | |
| UPDATE_DATE | current_timestamp | Target | I/U | |
Define the following inner join:
| First Datastore | Second Datastore | Business rule | Execution Location |
|---|---|---|---|
| DIM_TIME | T_BDR_PLN_CUS | T_BDR_PLN_CUS.PLN_DAY=DIM_TIME.TIME_DATE | Staging Area |
Run the mapping and check the results in the Statistics view:
| Name | Value |
|---|---|
| SUM(SQL_NB_ROWS) | 43,650 |
| SUM(SQL_STAT_INSERT) | 14,550 |
| SUM(SQL_STAT_UPDATE) | 0 |
Great job going through this tutorial! With this unit, you learned to use essential design features available in Semarchy xDI.
What we've covered
You created metadata for various technologies: database, delimited, and fixed files.
You added a filter to a mapping.
You modified the functional key of a datastore.
You learned how to apply a business rule for updates only.
You used joins to combine datastores.
You enabled reject detection.
You learned how to handle duplicate rows in a mapping.
You created a stage to mutualize transformations and combine data sources.
What's next?
In the next unit, Work With Processes, you will learn how to use processes to design end-to-end integration flows.