I had an interesting discussion with some friends about Master Data Management, and one of them made the following claim:
With a data quality tool, a data integration product, and a database, I can build an MDM hub from scratch.
Needless to say, this friend is a very good one and a very skilled developer, so he can do what he claims.
I asked him to provide the details, and he did!
- The database would provide the storage and access capabilities for the MDM hub.
- The data integration tool (or ETL) would provide the movement and transformation capabilities within the hub.
- The DQ tool would provide the cleansing and de-duplication features (if the ETL does not provide sufficient capabilities).
My answer was:
First, this looks like a data warehouse project, and you would not have shared metadata across the board!
He could only agree: good metadata management has been the holy grail in data management for years. We still dream of centralized metadata stores supporting any DBMS, DI, DQ, BI (and MDM) product on the market. Such a store would expose the consolidated metadata in a single place for all of us (business users, developers, data stewards, etc.) to search, browse, and analyze.
Well, on the metadata front, as far as I know, we are not there yet! But what about metadata for MDM?
If you look at it, the relevant information for building a master data hub is very straightforward: it is the description of the things to store in the hub (customers, products, etc., aka entities), along with their attributes, constraints, references, enrichment and de-duplication rules, display options, etc.
This metadata may be complex to design (well, as complex as the entities to store in the hub), but should not be complex to manage and should be entirely leveraged for building/running the hub.
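To make this concrete, here is a minimal sketch of what such a description could look like when written down as plain data. The class names (Attribute, Entity), the field names, and the sample Customer entity are all hypothetical, invented for illustration; they do not come from any particular MDM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Attribute:
    name: str
    type: str                 # logical type, e.g. "string" or "date"
    mandatory: bool = False   # a simple constraint


@dataclass
class Entity:
    name: str
    attributes: List[Attribute]
    # attribute name -> name of the referenced entity
    references: Dict[str, str] = field(default_factory=dict)
    # attributes used to detect duplicates
    match_on: List[str] = field(default_factory=list)
    # attribute shown as the record label in a user interface
    display_label: str = ""


# Hypothetical entity description, for illustration only.
customer = Entity(
    name="Customer",
    attributes=[
        Attribute("customer_name", "string", mandatory=True),
        Attribute("email", "string"),
        Attribute("country_code", "string"),
    ],
    references={"country_code": "Country"},
    match_on=["email"],
    display_label="customer_name",
)


def check_model(entity: Entity) -> List[str]:
    """Return the inconsistencies found in the entity description itself."""
    known = {a.name for a in entity.attributes}
    used = list(entity.references) + entity.match_on + [entity.display_label]
    return [
        f"{entity.name}: unknown attribute '{attr}'"
        for attr in used
        if attr and attr not in known
    ]
```

Everything the hub needs to know about a customer sits in one place, and the description can be checked for consistency before anything is built from it.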
What is sold today as MDM “Suites” or “Offerings” is often a bundle of several products providing features to do MDM one way or another. These bundles frequently combine a modeling tool supporting hierarchies with an ETL and a DQ product.
Designing an MDM project using such sets of tools simply creates silos of metadata through which the MDM designer must navigate painfully. Besides, it requires mastering three distinct products:
- You must design the physical structure of your hub in a modeling tool and have this structure pushed to the DBMS.
- You must design the data movement and transformation within the hub in a Data Integration tool, and add some business processes as well (did I mention the BPM tool?).
- You must design and tie back the Data Quality processes to the data integration process manually.
In addition to mastering these various tools, you must then invent the patterns and logic to make the MDM work correctly. All the processes and successive structures needed for cleansing, de-duplicating, consolidating data from several sources, and so forth end up being homemade. In a nutshell, the metadata may be there, but it is not really used.
Now, let’s look again at the relevant information for building a master data hub. As you may remember, I wrote: … description of the things to store in the hub (customers, products, etc., aka entities), along with their attributes, constraints, references, enrichment and de-duplication rules, display options, etc.
With all this information, it should be possible to automatically generate the bits and bytes of the MDM hub, including:
- Physical structures for storing the golden data.
- Endpoints (inputs) for applications to publish their raw data to the hub.
- Endpoints (outputs) for applications to consume the master data from the hub.
- All the processes and structures to automatically enrich, validate, match, de-duplicate and certify golden data from the raw data.
- User interfaces to view and edit the master data as well as the various stages of the process described above.
- Human workflows to validate and correct erroneous data.
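As a rough illustration of the idea, the sketch below derives three of these pieces (a physical structure, a validation step, and a match-and-consolidate step) from a single entity description. It is a toy under stated assumptions, not how any real MDM platform works: the CUSTOMER dictionary, its keys, the gd_ table prefix, and the "first non-empty value wins" consolidation rule are all made up for the example.

```python
# Hypothetical metadata for one entity; every name here is illustrative.
CUSTOMER = {
    "entity": "customer",
    "attributes": {"name": "TEXT", "email": "TEXT", "city": "TEXT"},
    "mandatory": ["name"],
    "match_on": ["email"],
}


def generate_ddl(meta):
    """Derive the golden-data table from the entity description."""
    cols = ["id INTEGER PRIMARY KEY"]
    for attr, sql_type in meta["attributes"].items():
        not_null = " NOT NULL" if attr in meta["mandatory"] else ""
        cols.append(f"{attr} {sql_type}{not_null}")
    return f"CREATE TABLE gd_{meta['entity']} ({', '.join(cols)})"


def validate(meta, record):
    """Return the mandatory attributes that are missing from a raw record."""
    return [a for a in meta["mandatory"] if not record.get(a)]


def consolidate(meta, records):
    """Group raw records on the match attributes and merge each group,
    keeping the first non-empty value seen for every attribute."""
    golden = {}
    for index, rec in enumerate(records):
        key = tuple((rec.get(a) or "").strip().lower() for a in meta["match_on"])
        if not any(key):
            # Nothing to match on: keep the record on its own.
            key = ("unmatched", index)
        merged = golden.setdefault(key, {})
        for attr in meta["attributes"]:
            if not merged.get(attr) and rec.get(attr):
                merged[attr] = rec[attr]
    return list(golden.values())


raw = [
    {"name": "Ann Lee", "email": "ann@example.com", "city": ""},
    {"name": "Ann Lee", "email": "ANN@example.com ", "city": "Paris"},
    {"name": "Bob Ray", "email": "bob@example.com", "city": "Lyon"},
]
golden_records = consolidate(CUSTOMER, raw)  # two records: Ann (with city) and Bob
```

None of the three functions mentions customers: changing the entity description changes the table, the validation, and the matching at once, which is the point of driving the hub from metadata.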
This is the “magic” that you should expect from a true MDM platform. With a completely metadata-driven approach, you would focus on what really counts: the metadata, which is in essence the knowledge of your master data.
At the end of the discussion, my friend and I agreed (and I hope you will agree too) on the following statement:
MDM? It’s all about Metadata (Baby)!