Master Data Management

This is part four in our five-part series on the essential capabilities of the competitive, data-driven enterprise.

Businesses have been deploying enterprise data governance programs (defining what the data should be) and master data management programs (ensuring the data is as defined) for decades. Even if your company doesn’t have a formal master data management program by name, chances are good that it is already doing some form of master data management in its data warehouse, CRM, or ERP systems. As the trend toward decentralized data analysis continues, we see several forces that make the case for adding a master data management capability to your organizational roadmap:

The Power of a Single View

Organizations have recognized the benefits of bringing together all of their data from their disparate systems to maximize their data-driven problem-solving potential, identify new business opportunities, and increase the accuracy of machine learning models. This single view not only powers more accurate data analysis but is also flexible enough to drive operational business processes.

MDM, as a component of your broader data governance program, can power both the analytical and the operational functions of your business, creating a feedback loop that continuously improves data maturity.
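To make the idea of a single view more concrete, here is a minimal Python sketch of one common MDM step: building a "golden record" from overlapping source records. It assumes the records have already been matched to the same customer, and the source names, the trust ranking, and the survivorship rule (the newest non-null value wins, with ties going to the more trusted source) are hypothetical illustrations, not the behavior of any particular MDM platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    source: str        # hypothetical system name, e.g. "crm" or "erp"
    customer_id: str   # master key, assumed to be resolved already
    updated: date      # last-modified date in the source system
    attributes: dict   # attribute name -> value (None when the source lacks it)


# Hypothetical trust ranking: a lower number wins when update dates tie.
SOURCE_PRIORITY = {"crm": 0, "erp": 1, "web": 2}


def build_golden_record(records: list[SourceRecord]) -> dict:
    """For each attribute, keep the most recently updated non-null value,
    breaking ties in favor of the more trusted source."""
    golden: dict = {}
    best_rank: dict = {}  # attribute -> (updated, -priority) of the current winner
    for rec in records:
        # Unknown sources get the lowest trust.
        priority = SOURCE_PRIORITY.get(rec.source, len(SOURCE_PRIORITY))
        for name, value in rec.attributes.items():
            if value is None:
                continue  # a missing value never overwrites a known one
            rank = (rec.updated, -priority)
            if name not in best_rank or rank > best_rank[name]:
                best_rank[name] = rank
                golden[name] = value
    return golden


records = [
    SourceRecord("crm", "C001", date(2023, 5, 1), {"email": "ann@example.com", "phone": None}),
    SourceRecord("erp", "C001", date(2023, 6, 1), {"email": "ann.lee@example.com", "phone": "555-0100"}),
]
print(build_golden_record(records))
# {'email': 'ann.lee@example.com', 'phone': '555-0100'}
```

In practice, MDM tools usually let data stewards configure survivorship per attribute. A single global rule is used here only to keep the sketch short.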

Increased Machine Learning Potential

Businesses investing in data science and machine learning are quickly realizing that some of their greatest opportunities for optimization are stifled by a lack of usable training data. This is most often due to insufficient data collection, inconsistent execution of the business process (e.g., human error or weak governance), or both. An MDM program can benefit your data science initiatives by applying to the data the taxonomies and hierarchies that machine learning models need.
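As a rough illustration of what applying a taxonomy means for training data, the Python sketch below maps inconsistent free-text category values onto the canonical nodes of a small hierarchy and then walks each node up to the root. The taxonomy, the synonym table, and the function names are all hypothetical; a real program would draw them from governed reference data.

```python
from typing import Optional

# Hypothetical product taxonomy: child node -> parent node.
TAXONOMY_PARENT = {
    "laptops": "computers",
    "desktops": "computers",
    "computers": "electronics",
    "phones": "electronics",
}
ALL_NODES = set(TAXONOMY_PARENT) | set(TAXONOMY_PARENT.values())

# Hypothetical synonym table: messy source values -> canonical taxonomy nodes.
SYNONYMS = {
    "notebook": "laptops",
    "laptop": "laptops",
    "pc": "desktops",
    "desktop": "desktops",
    "cell phone": "phones",
    "smartphone": "phones",
}


def canonical_label(raw: str) -> Optional[str]:
    """Normalize case and whitespace, then map to a taxonomy node (None if unmapped)."""
    key = " ".join(raw.lower().split())
    if key in ALL_NODES:
        return key
    return SYNONYMS.get(key)


def ancestry(node: str) -> list[str]:
    """Walk from a node up to the root, e.g. to derive coarser labels or features."""
    path = [node]
    while path[-1] in TAXONOMY_PARENT:
        path.append(TAXONOMY_PARENT[path[-1]])
    return path


raw_values = ["Notebook", " PC ", "Cell  Phone", "tablet"]
labels = [canonical_label(v) for v in raw_values]
print(labels)               # ['laptops', 'desktops', 'phones', None]
print(ancestry("laptops"))  # ['laptops', 'computers', 'electronics']
```

Values that come back as `None`, such as "tablet" here, are exactly the ones a data steward or a human labeling service would need to review before the data is fit for training.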

Figure Eight (formerly CrowdFlower) provides a platform built on human micro-tasks for curating machine learning training data, or for performing any MDM-related data cleanup task. You can configure a variety of data labeling tasks and use its platform to improve your data without increasing your own labor force.

The Democratization of Data Analysis

Domain and subject matter experts are increasingly responsible for developing their own hypotheses about where the value in their data lies (and for modeling their own data), and they are doing so autonomously with accessible, capable self-service analytical tools. Often the most difficult, time-consuming, and error-prone data wrangling task is entity resolution: standardizing, deduplicating, cleansing, and keying records so that data resolves cleanly to the unique entity at the center of an analysis (e.g., customer, location, product, or employee). In this context, mastered data is far easier to blend, analyze, interpret, and trust, which reduces the time and cost for an individual to derive insight from business data.
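The entity resolution steps described above can be sketched in Python as follows. This is a deliberately simple, deterministic approach that assumes hypothetical record fields (`source_id`, `name`, `zip`), a hand-picked list of corporate suffixes, and US-style five-digit ZIP codes: records are standardized, grouped on an exact match key, and each group receives a stable master key. Production MDM tools generally layer fuzzy or probabilistic matching on top of rules like these.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def standardize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, drop suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def standardize_zip(zip_code: str) -> str:
    """Keep only digits and truncate to a five-digit ZIP (assumes US postal codes)."""
    return re.sub(r"\D", "", zip_code)[:5]


def resolve_entities(records: list[dict]) -> dict[str, dict]:
    """Group records sharing a standardized (name, zip) key and assign each group a master ID."""
    groups = defaultdict(list)
    for rec in records:
        key = (standardize_name(rec["name"]), standardize_zip(rec["zip"]))
        groups[key].append(rec)

    resolved = {}
    for (name, zip5), members in groups.items():
        # Derive the master key from the match key so reruns produce the same ID.
        master_id = hashlib.sha1(f"{name}|{zip5}".encode("utf-8")).hexdigest()[:12]
        resolved[master_id] = {
            "name": name,
            "zip": zip5,
            "source_ids": sorted(m["source_id"] for m in members),
        }
    return resolved


records = [
    {"source_id": "crm-1", "name": "Acme, Inc.", "zip": "02134"},
    {"source_id": "erp-7", "name": "ACME Inc", "zip": "02134-1234"},
    {"source_id": "crm-2", "name": "Globex Corp.", "zip": "10001"},
]
for master_id, entity in resolve_entities(records).items():
    print(master_id, entity)
```

Here the two Acme records collapse into one mastered entity while Globex stays separate, which is the kind of keyed, deduplicated data an analyst can blend without redoing this work.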

The Limits of Data Integration Silos

When master data management is implemented within specific functional business systems (e.g. CRM), it can limit the efficacy of the program because:

  • Access to these systems is not always universal
  • Not all relevant business data is integrated into such systems
  • These systems are often not designed with universal data integration in mind, and the cost to master or integrate data that is not native to the business processes they manage can be very high, both in initial development and in accumulated long-term technical debt

The Importance of Data Privacy

As data privacy regulations roll out, organizations will be required to manage how key customer data is used across the business. They will need to track consent and usage across all sources and ensure that information is used only for purposes its owner has authorized. In addition, techniques such as data obfuscation and masking could enable broader business innovation through both internal and external crowdsourcing.
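The consent and masking ideas above can be sketched in Python as a purpose check followed by de-identification. The record layout, purpose labels, and helper functions are hypothetical. The sketch pseudonymizes the customer ID with a keyed hash (HMAC-SHA256 from the standard library) and partially masks email addresses. Pseudonymized data can still count as personal data under regulations such as GDPR, so treat this as an illustration of the mechanics rather than a compliance recipe.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    customer_id: str
    email: str
    # Purposes the customer has authorized, e.g. {"analytics", "marketing"} (hypothetical labels).
    consented_purposes: set[str] = field(default_factory=set)


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token.
    The same input and key always yield the same token, so datasets can still be joined."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def release_for_purpose(records, purpose, secret_key):
    """Return de-identified rows only for customers who consented to `purpose`."""
    released = []
    for rec in records:
        if purpose not in rec.consented_purposes:
            continue  # no recorded consent for this use, so the record is excluded
        released.append({
            "customer_token": pseudonymize(rec.customer_id, secret_key),
            "email_masked": mask_email(rec.email),
        })
    return released


key = b"example-key-load-from-a-secrets-manager"  # hypothetical; never hard-code real keys
customers = [
    CustomerRecord("C001", "ann@example.com", {"analytics"}),
    CustomerRecord("C002", "bob@example.com", {"marketing"}),
]
rows = release_for_purpose(customers, "analytics", key)
print(rows[0]["email_masked"])  # a***@example.com
```

Because the token is deterministic for a given key, teams (or outside contributors) can work with joined, de-identified data while only the key holder can link tokens back to customers by recomputing them.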

Take, for example, Numerai, a hedge fund that has encrypted sensitive elements of the training data it uses to power its trading algorithms and then published that data as part of an ongoing Kaggle-style data science competition, where anyone can compete to improve the algorithms’ performance and earn financial rewards. Numerai has successfully used advanced data privacy techniques both to negate potential bias and to crowdsource the engine that runs its business, without revealing any of its most valuable intellectual property. What could your business do if it viewed data privacy as more than just risk management?

Master Data Management Essentials

If you agree that decentralized data analysis is the model toward which your organization will most likely continue to evolve and mature, then implementing a centralized master data management capability, one that maximizes the efficacy of enterprise data assets and unburdens knowledge workers at the edge, is a meaningful way to reduce the organization’s overall time and cost to produce valuable insight from data. Most centralized MDM programs and platforms that succeed in this pursuit demonstrate some or all of the following elements:

In addition to classic steward-driven MDM approaches to standardizing corporate data, new options have emerged from the data science world for outsourcing the labor of creating good training data for machine learning models. We recommend considering these alternatives as well, to accelerate the creation of good training data for time-sensitive business opportunities.

Vendor Spotlight: Pitney Bowes

Pitney Bowes is a relative newcomer to the master data management space, and it has used this position to its advantage by building its customer-domain-focused platform from the ground up on NoSQL and graph database technologies. These technologies allow for powerful relationship analysis and a great deal of design agility when building out your domain models.

In addition, Pitney Bowes has capitalized on roughly 100 years of experience managing addressable locations in the form of its Master Location Data (MLD) product. If your business deals with customers, addresses, or locations at all, and you are attempting to standardize and master that location data, we strongly advise you to look at MLD as a turnkey alternative to building your own. Its accuracy (as well as its ongoing management of change), paired with extensive geo-enrichment capabilities that extend locations with additional context (e.g., risk, household income, property attributes, and boundaries), is an incredibly compelling value proposition. To brainstorm the possibilities this could create for your business, you can sign up to browse and explore its data marketplace.

Next Up: Elastic Data Processing & Storage

Elastic Data Processing & Storage is the concept of using the cloud to change the economics of data storage and processing so that an organization can reduce its time and cost of problem solving. Moving your analytic queries, data processing, and storage to elastic cloud resources can bring many benefits to the business. There are several features to consider when deciding how to implement this capability.


Incorporate Master Data Management

Designing and deploying a master data management program for your organization will help you maximize your problem-solving potential, get more value from machine learning, and resolve your data cleanly to the entities at the center of your analysis. That may be easier said than done, but with a holistic approach and a framework like Ironside’s, you can be confident that all of your business needs are covered. We can help you implement a centralized MDM capability that maximizes the value of your enterprise data and your team’s knowledge, reducing your organization’s time and cost to insight.