Entities
Overview
Entities are the key components of logical modeling. Entities are not database tables. They represent business entities of the domain being implemented in the data hub. Examples of entities are Customers, Contacts, Parties, etc.
Attributes
Entities are defined by a set of properties, called attributes. Each attribute has a given datatype.
For example, the Contact entity may contain the following attributes:
-
FirstName and LastName: simple attributes using the user-defined type called GenericNameType.
-
Comments: simple attribute using the built-in type LongText.
-
Gender: simple attribute based on the GenderLov list of values.
-
Address: complex attribute using the GeocodedAddress complex type.
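To make the attribute list concrete, the Contact entity can be pictured as a typed record. The following is a plain Python sketch, not Semarchy xDM syntax: the class layout, the fields of GeocodedAddress, and the values in GenderLov are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GenderLov(Enum):
    # Hypothetical list-of-values entries; the real GenderLov content is not given here.
    MALE = "M"
    FEMALE = "F"


@dataclass
class GeocodedAddress:
    # Complex type: groups several fields under one attribute (field names are assumed).
    street: str
    city: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass
class Contact:
    first_name: str            # GenericNameType (user-defined type), modeled here as str
    last_name: str             # GenericNameType
    gender: GenderLov          # value restricted to the GenderLov list of values
    address: GeocodedAddress   # complex attribute
    comments: str = ""         # LongText (built-in type)


contact = Contact(
    first_name="Ada",
    last_name="Lovelace",
    gender=GenderLov.FEMALE,
    address=GeocodedAddress(street="12 Example St", city="London"),
)
```

The point of the sketch is the distinction between simple attributes (one value each) and a complex attribute (a group of values handled as a unit).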
Entity types
Each entity has a given entity type. This type expresses the entity capabilities for match and merge, and authoring. Entity types are:
-
Basic: basic entities do not support match and merge of records. They assume that data comes from a single datasource. This entity type is suitable for entities for which data is authored (or imported) exclusively in the hub, or for simple reference data entities.
-
ID-matched (formerly known as UDPK): ID-matched entities support match and merge of records. They assume that data comes from many data sources, and similar records share an identifier common to all sources. Records in entities using ID matching are matched if they have the same ID and then merged into golden records. This entity type is well suited when there is a truly unique identifier for all the applications communicating with the MDM hub.
-
Fuzzy-matched (formerly known as SDPK): fuzzy-matched entities support match and merge of records. They assume that records come from many data sources that do not share a common identifier. Records need to be matched using their data content. Records in entities using fuzzy matching are matched using a set of match rules defined in a matcher.
The choice of an entity type is important. When creating an entity, make sure to take the differentiators described below into account.
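The selection criteria stated above can be summarized as a small decision function. This is only an illustrative sketch: the function and parameter names are hypothetical, and a real design decision may weigh additional factors.

```python
def choose_entity_type(multiple_sources: bool, shared_unique_id: bool) -> str:
    """Suggest an entity type from two questions about the data sources."""
    if not multiple_sources:
        # Single source, or data authored exclusively in the hub.
        return "Basic"
    if shared_unique_id:
        # Many sources that all share one truly unique identifier.
        return "ID-matched"
    # Many sources without a common identifier: match on data content.
    return "Fuzzy-matched"


print(choose_entity_type(multiple_sources=True, shared_unique_id=False))
```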
Basic entity
Here is some useful information about basic entities:
-
Basic entities do not support match and merge.
-
With this type of entity, you must assume that all data comes from a single (de-duplicated) source or is authored exclusively in the MDM hub.
-
When authoring or loading data in a basic entity, you simply overwrite the existing golden record with the new data, possibly keeping track of the changes. There is no notion of multiple master records merging into a golden record.
-
This type of entity is particularly suitable for reference data or classification data, which are typically managed only in the hub.
-
Due to the simple nature of these entities, the certification process is simpler and faster for basic entity records than for ID- or fuzzy-matched entity records.
-
A matcher can be defined for the entity, for detecting potential duplicates when manually creating records in the hub.
Use basic entities for data that exists only in the hub, or for data coming from a single datasource in which there are no duplicates.
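The overwrite behavior of basic entities can be sketched as a keyed store in which loading a record replaces the golden record with the same ID. This is a simplified Python illustration, not the actual certification process; the `load_basic` function and the in-memory `golden` and `history` structures are hypothetical.

```python
golden = {}    # golden records keyed by primary key
history = []   # optional change tracking: (id, previous values)


def load_basic(record_id, data):
    """Overwrite the golden record for record_id, keeping the previous version."""
    if record_id in golden:
        history.append((record_id, golden[record_id]))
    golden[record_id] = dict(data)


load_basic("FR", {"name": "France"})
load_basic("FR", {"name": "French Republic"})  # same ID: overwrite, no merge
```

After the second call there is still a single golden record for `FR`; nothing is matched or merged.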
ID-matched entity
Here is some useful information about ID-matched entities:
-
ID matching assumes that the multiple applications providing data to this entity share a common ID. This ID can be used as a unique identifier, even for the golden records.
-
This ID is stored in a single attribute which will be the golden data primary key. If the ID in the information system is composed of several columns, you must concatenate these values into the primary key (PK) column.
-
As this ID is common to all systems, similar records are always matched using the ID. A survivorship rule defines how they are consolidated into a single golden record, and how users can override the consolidated values.
-
Although ID matching is faster than fuzzy matching, it still requires consolidating multiple records into a single golden record. The certification process for ID-matched records is slower than for basic entity records.
-
A matcher can be defined for the entity, for detecting potential duplicates when manually creating records in the hub.
-
When authoring an ID-matched entity record in a stepper, you may create a new record that only exists in the hub, or override the consolidated values. The survivorship rule defines how data creation or override takes place for attributes.
Use ID matching when you need to match and merge records from various sources and have a truly unique and shared identifier for all these sources.
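The ID-matching behavior described above can be sketched as a group-by on the shared ID followed by consolidation. The Python below is a minimal illustration under stated assumptions: the record layout, the `|` separator used to concatenate a multi-column identifier, and the survivorship rule (the most recently updated non-empty value wins) are all hypothetical choices, not the product's defaults.

```python
from collections import defaultdict


def make_pk(*parts, sep="|"):
    """Concatenate a multi-column source identifier into a single PK value."""
    return sep.join(str(p) for p in parts)


def consolidate(master_records):
    """Group master records by ID, then merge each group into one golden record.

    Assumed survivorship rule: for each attribute, the most recently
    updated non-empty value wins.
    """
    groups = defaultdict(list)
    for rec in master_records:
        groups[rec["id"]].append(rec)

    golden = {}
    for pk, recs in groups.items():
        merged = {"id": pk}
        # Apply records from oldest to newest so that newer values overwrite older ones.
        for rec in sorted(recs, key=lambda r: r["updated"]):
            for attr, value in rec["data"].items():
                if value not in (None, ""):
                    merged[attr] = value
        golden[pk] = merged
    return golden


masters = [
    {"id": make_pk("FR", 1001), "source": "CRM", "updated": 1,
     "data": {"name": "ACME", "phone": ""}},
    {"id": make_pk("FR", 1001), "source": "ERP", "updated": 2,
     "data": {"name": "Acme Corp", "phone": None}},
    {"id": make_pk("FR", 1001), "source": "WEB", "updated": 0,
     "data": {"name": "acme", "phone": "555-0100"}},
]
golden = consolidate(masters)
```

Because all three master records carry the same ID, they end up in one golden record without any comparison of their content.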
Fuzzy-matched entity
Here is some useful information about fuzzy-matched entities:
-
Fuzzy matching means that applications in the enterprise have different IDs, and Semarchy xDM needs to generate a unique identifier (i.e., the PK) for the golden records. This PK can be either a sequence or a universally unique identifier (UUID).
-
Similar records may exist in various systems, representing the same master data, with different IDs. These similar records must be matched using fuzzy-matching methods that compare their content.
-
A matcher defines how similar master records are matched. A survivorship rule defines how they are consolidated into a single golden record, and how users can override the consolidated values.
-
Duplicate managers define the user interfaces in which users review, merge, or split groups of matching records.
-
Due to the complex processing involved in fuzzy matching and then merging records, the certification process for fuzzy-matched records is slower than for ID-matched or basic entity records.
-
When authoring a fuzzy-matched entity record in a stepper, you may create a new record that only exists in the hub, or override the consolidated values. The survivorship rule defines how data creation or override takes place for attributes.
Use fuzzy matching when you need to match and merge records from various sources and do not have a shared identifier for all these sources.
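The idea of matching on content and then assigning a generated primary key can be sketched as follows. This is a rough Python illustration, not how a matcher is defined in the product: the match rule (a string-similarity ratio on names with a 0.85 threshold), the greedy clustering, and the sample records are assumptions.

```python
import uuid
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Hypothetical match rule: the two names are close enough, ignoring case."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def match_and_merge(masters):
    """Cluster master records with the match rule, then give each cluster a generated PK.

    Greedy clustering: a record joins the first cluster containing a match.
    It does not merge two existing clusters that a later record would bridge.
    """
    clusters = []
    for rec in masters:
        for cluster in clusters:
            if any(similar(rec, other) for other in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    # The golden record PK is generated by the hub, independent of the source IDs.
    return {str(uuid.uuid4()): cluster for cluster in clusters}


masters = [
    {"source": "CRM", "id": "C-17", "name": "Jon Smith"},
    {"source": "ERP", "id": "88231", "name": "John Smith"},
    {"source": "WEB", "id": "u42", "name": "Maria Garcia"},
]
golden = match_and_merge(masters)
```

The two "Smith" records have unrelated source IDs but fall into the same cluster because their content is similar, which is the situation fuzzy-matched entities are meant for.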
ID generation
The entity type impacts the method used for generating the record IDs:
-
Basic entities: the golden record primary key is the ID sent or authored when creating a record. When creating records, this ID may be:
  -
  Manually provided by users.
  -
  Automatically generated using a sequence, a universally unique identifier (UUID) generator, or a SemQL expression.
-
ID-matched entities: the golden record primary key is also the ID that exists in the source systems. When creating source or master records, this ID may be:
  -
  Manually provided by users.
  -
  Automatically generated using a sequence, a universally unique identifier (UUID) generator, or a SemQL expression.
-
Fuzzy-matched entities: the golden record primary key is managed and always generated by the system, using a sequence or UUID. When creating source or master records, this source or master ID may be:
  -
  Manually entered by users.
  -
  Automatically generated using a sequence, a universally unique identifier (UUID) generator, or a SemQL expression.
When using a sequence for ID generation, you must take into account external source records, that is, records integrated from publishers or records that you will import using applications. These records may arrive with ID values above the current sequence value. Such IDs will collide with future records created in the hub, potentially resulting in unexpected matching (for ID-matched entities) or record updates (for basic entities). To separate the records created in the hub with a sequence ID from external records, it is recommended to set the sequence Start With value above the range of IDs used by external records. Note that the interactive import available in applications prevents importing records when their ID is above the Start With value.
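The collision risk can be illustrated with a stand-in for a sequence. In this Python sketch, the external ID values and the chosen Start With value are made up, and `can_import` only mimics the import guard mentioned above; treating an ID equal to Start With as rejected is an assumption.

```python
import itertools


def hub_sequence(start_with):
    """Simple stand-in for a database sequence."""
    return itertools.count(start_with)


external_ids = {1, 2, 3, 250, 1200}  # IDs already used by external records (made up)

# Start With too low: the sequence produces IDs that external records already use.
low = hub_sequence(1)
first_five = [next(low) for _ in range(5)]
collisions = external_ids.intersection(first_five)

# Start With above the external range: generated IDs stay clear of external ones.
start_with = 1_000_000
assert max(external_ids) < start_with
safe = hub_sequence(start_with)
safe_ids = [next(safe) for _ in range(5)]


def can_import(record_id, start_with):
    """Reject external IDs that fall in the range reserved for the hub sequence."""
    return record_id < start_with
```

With a Start With of 1, the first five generated IDs already overlap three external IDs; with a Start With of 1,000,000, the two ranges stay separate as long as external IDs remain below that value.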
An ID generated with a SemQL expression is immutable, which means that it will not change after the initial record creation even if the value of the attributes used in the expression changes. This ID is created when a record form is saved for the first time (e.g., when a record is imported from a file containing no IDs).
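The immutability of an expression-generated ID can be pictured as "compute once, on first save". The Python class below is a hypothetical stand-in: the f-string plays the role of the SemQL expression, and the `Record` class and its attributes are invented for the example.

```python
class Record:
    def __init__(self, country, code):
        self.country = country
        self.code = code
        self.id = None  # not assigned until the first save

    def save(self):
        # The ID expression is evaluated only once, on first save.
        if self.id is None:
            self.id = f"{self.country}-{self.code}"
        return self.id


rec = Record("FR", "001")
rec.save()           # ID computed from the current attribute values
rec.country = "DE"   # an attribute used in the expression changes
rec.save()           # the existing ID is left untouched
```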
References
Entities are related using reference relationships. A reference relationship defines a relationship between two entities. For example, Employee is related to CostCenter by the EmployeeHasCostCenter relationship.
Data quality rules
Data quality rules are created in the design of an entity. These constraints include:
-
Mandatory columns
-
A list-of-values (LOV) range check
-
A unique key
-
Record-level validations
-
Reference relationships
These constraints are checked on the source records and the consolidated records as part of the certification process. They can also be checked to enforce data quality in data authoring.
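The kinds of constraints listed above can be sketched as checks applied to each record. This Python example is only an illustration of the rule types, not of how rules are declared in the product: the attribute names, the LOV values, the unique key on `email`, and the date validation are all assumptions.

```python
GENDER_LOV = {"M", "F"}             # hypothetical list of values
COST_CENTERS = {"CC-10", "CC-20"}   # IDs of existing CostCenter records (made up)


def check_records(records):
    """Return a list of (record id, violated rule) pairs."""
    errors = []
    seen_emails = set()
    for rec in records:
        rid = rec["id"]
        # Mandatory column
        if not rec.get("last_name"):
            errors.append((rid, "mandatory: last_name"))
        # List-of-values range check
        if rec.get("gender") not in GENDER_LOV:
            errors.append((rid, "lov: gender"))
        # Unique key
        email = rec.get("email")
        if email is not None:
            if email in seen_emails:
                errors.append((rid, "unique key: email"))
            seen_emails.add(email)
        # Record-level validation
        if rec["start_year"] > rec["end_year"]:
            errors.append((rid, "validation: start_year <= end_year"))
        # Reference relationship: the referenced record must exist
        if rec.get("cost_center") not in COST_CENTERS:
            errors.append((rid, "reference: EmployeeHasCostCenter"))
    return errors


records = [
    {"id": 1, "last_name": "Smith", "gender": "M", "email": "a@example.com",
     "start_year": 2019, "end_year": 2021, "cost_center": "CC-10"},
    {"id": 2, "last_name": "", "gender": "X", "email": "a@example.com",
     "start_year": 2022, "end_year": 2020, "cost_center": "CC-99"},
]
errors = check_records(records)
```

The first record passes every check, while the second violates one rule of each kind.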