Metadata Demystified: A Bottom-Line Definition
What Is Metadata?
Metadata is a broad term referring to information about data. Its purpose is to provide us with an easily understandable context for the information we use. It answers questions we’re likely to ask about any source of data, such as where it comes from, how it’s generated, and when it was created. In data analytics, metadata comes in three main types:
Business metadata consists of the business rules and concepts that define the terms we use to describe our data. It’s important because it allows business users to communicate about data in a way that’s meaningful and consistent. Say Sally is a data consumer at her company and she’s discussing requirements for a report with Paul, a developer. Sally asks Paul to create a report showing all transfers from retail stores to the main distribution warehouse due to seasonal inventory clear-outs. Sally explains that these orders are distribution orders from any of the retail store locations to the distribution warehouse and that post-season transfers are identified through a specific order type: 555. Retail store locations have a location type of ‘S’ and the distribution warehouse is location 5432. Sally’s explanation is business-related, and it tells Paul specifically what data he needs to build into the report.
Technical metadata refers to the locations, mappings, and specifications for technical data resources. Continuing with the previous example, imagine that Paul is unable to find the distribution order data required to create Sally’s report. Paul then meets with John, an analyst who has queried transfer data for Sally in the past. John explains to Paul that distribution orders are located in the LOC_TRANSFER_HIST table within the Global_Retail database, and that the location type information he needs is located in the LOC_DETAIL table. Paul is able to identify .. joined to .. on the field LOC_ID as the needed data source for the report. This information is technical, and it allows Paul to find and use the data described by Sally’s business metadata.
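To make the hand-off from business metadata to technical metadata concrete, here is a minimal sketch of the query Paul might end up writing. It assumes the two tables John named are the ones joined on LOC_ID. Only the table names, LOC_ID, order type 555, location type ‘S’, and location 5432 come from the example above; every other column name and all of the sample rows are hypothetical, and an in-memory SQLite database stands in for Global_Retail.

```python
import sqlite3

# In-memory stand-in for the Global_Retail database.
# Table names and the LOC_ID join field come from the article;
# ORDER_ID, ORDER_TYPE, DEST_LOC_ID, and LOC_TYPE are hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LOC_DETAIL (
        LOC_ID   INTEGER PRIMARY KEY,
        LOC_TYPE TEXT
    );
    CREATE TABLE LOC_TRANSFER_HIST (
        ORDER_ID    INTEGER PRIMARY KEY,
        ORDER_TYPE  INTEGER,
        LOC_ID      INTEGER,  -- assumed: the location the transfer ships from
        DEST_LOC_ID INTEGER   -- assumed: the location the transfer ships to
    );
    -- Made-up sample rows for illustration only.
    INSERT INTO LOC_DETAIL VALUES (101, 'S'), (102, 'S'), (5432, 'W');
    INSERT INTO LOC_TRANSFER_HIST VALUES
        (1, 555, 101, 5432),   -- store -> warehouse, post-season: wanted
        (2, 555, 102, 5432),   -- store -> warehouse, post-season: wanted
        (3, 200, 101, 5432),   -- wrong order type
        (4, 555, 5432, 101);   -- ships from the warehouse, not a store
""")

# Sally's business rules expressed against the technical resources.
POST_SEASON_TRANSFERS = """
    SELECT t.ORDER_ID, t.LOC_ID
    FROM LOC_TRANSFER_HIST AS t
    JOIN LOC_DETAIL AS d ON d.LOC_ID = t.LOC_ID
    WHERE t.ORDER_TYPE = 555      -- post-season transfer
      AND d.LOC_TYPE = 'S'        -- originates at a retail store
      AND t.DEST_LOC_ID = 5432    -- arrives at the distribution warehouse
    ORDER BY t.ORDER_ID
"""

rows = conn.execute(POST_SEASON_TRANSFERS).fetchall()
print(rows)
```

Each line of the WHERE clause corresponds to one of Sally’s business terms, which is the kind of mapping a metadata resource is meant to record so that it doesn’t live only in John’s head.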
Operational metadata includes information about the execution and processing of data-related jobs and applications. The last execution time of a job that populates a database is one example. Operational metadata can also describe front-end resources: the amount of time a report takes to execute is another example. Say Paul has developed Sally’s requested report and scheduled it to run and be emailed to Sally every Monday morning at 8 a.m. However, Sally never receives the report. Paul can review the run history of the report to see that the report did run at the correct time but failed due to a connection issue. This information is operational. It helps Paul troubleshoot problems that arise in the data system.
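As a rough illustration of what a run history might look like, the sketch below models each execution as a small record and looks up the most recent run of a job. The JobRun structure, the job name, the dates, and the error text are all hypothetical; real scheduling and reporting tools expose this information in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class JobRun:
    """One entry in a hypothetical run-history log."""
    job_name: str
    started_at: datetime
    status: str                  # "success" or "failed"
    error: Optional[str] = None  # populated only when the run failed


def latest_run(history: List[JobRun], job_name: str) -> Optional[JobRun]:
    """Return the most recent run of the named job, or None if it never ran."""
    runs = [r for r in history if r.job_name == job_name]
    return max(runs, key=lambda r: r.started_at) if runs else None


# Made-up history: two consecutive Monday 8 a.m. runs of Sally's report.
history = [
    JobRun("weekly_transfer_report", datetime(2024, 1, 1, 8, 0), "success"),
    JobRun("weekly_transfer_report", datetime(2024, 1, 8, 8, 0), "failed",
           "connection error"),
]

last = latest_run(history, "weekly_transfer_report")
if last is not None and last.status == "failed":
    print(f"{last.job_name} ran at {last.started_at:%Y-%m-%d %H:%M} "
          f"but failed: {last.error}")
```

With a record like this, Paul can tell a report that never ran apart from one that ran on schedule and failed, which is the distinction he needs in Sally’s case.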
The Value behind Metadata
The three types of metadata mentioned above have different levels of importance for different people within a company, but all are vital parts of a high-functioning business analytics environment. Here are a few key reasons:
Good metadata allows for fast and accurate data analytics. It creates consistency in how a business understands and uses its data, and is therefore one of the primary tools used for data governance.
Bridging the Technology Gap
Data consumers need to be able to ask for the data they need and make use of it once they get it; however, it is neither practical nor desirable for them to have in-depth knowledge of the underlying data structures. The solution is a good metadata framework, which provides consistent and understandable mappings from the business terms these users are familiar with to the technical data resources they need to access. When analysts and developers need to build applications and reports for consumers, good metadata resources provide an efficient tool for converting business requirements into technical requirements.
Tracking and Troubleshooting
You just received a daily inventory report but something doesn’t seem right. Did the data update last night? How do you find out? Technical metadata allows you to trace the data in that report back to its source, and operational metadata tells you when the source last updated and whether there were any issues. Together, these play a key role in defining the paths and flows of your data so that when there’s a problem, you can find it.
There’s a lot that can potentially go wrong with metadata. Developing a metadata layer that works with your environment and helps reveal and integrate your data assets is a complex undertaking. Here are a few things to watch out for as you manage it:
Where Is It?
Metadata is ingrained in data analytics software and applications. The problem is rarely whether or not it exists, but rather where it is and how people access it. If users can’t access the metadata they need to answer their questions, then it might as well not be there. If users at your company are forced to ask vague questions about why data resources failed to deliver expected results, they may not have access to the information necessary to understand their issues.
Allowing different business areas to define their relevant metadata can be an effective practice for building it within a company, but be wary of creating islands where definitions from different business areas are incompatible. Maybe Finance claims a sale occurs when the payment posts but Retail says it occurs when the order is placed. Both may be useful definitions in their areas, but if you don’t have a method to reconcile them, you can run into problems.
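One simple way to spot islands before they cause trouble is to compare each area’s glossary and flag any term that carries more than one definition. The sketch below does this for a made-up glossary; the entries, the tuple layout, and the find_conflicts helper are illustrative assumptions, not a feature of any particular metadata tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (business area, term, definition): hypothetical glossary entries.
glossary = [
    ("Finance", "sale", "payment posts"),
    ("Retail", "sale", "order is placed"),
    ("Finance", "store", "location type 'S'"),
    ("Retail", "store", "location type 'S'"),
]


def find_conflicts(
    entries: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, str]]:
    """Return terms that are defined differently by different business areas."""
    by_term: Dict[str, Dict[str, str]] = defaultdict(dict)
    for area, term, definition in entries:
        by_term[term][area] = definition
    # A term conflicts when its areas do not all agree on one definition.
    return {
        term: areas
        for term, areas in by_term.items()
        if len(set(areas.values())) > 1
    }


conflicts = find_conflicts(glossary)
for term, areas in conflicts.items():
    print(f"'{term}' needs reconciling: {areas}")
```

A check like this only surfaces the disagreement. Deciding whether to adopt one definition or to keep both under distinct names is still a conversation between the business areas.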
Inability to Adapt
Metadata needs to be able to grow and change as the business and the data behind it grow and change. If it can’t be updated, it will quickly become stale: not only useless, but potentially misleading and harmful. Avoid static resources in which changes can’t be made efficiently.
Tips for Success
There are a few easy ways to keep your metadata in control and avoid the pitfalls we just discussed. Here’s how you can ensure success:
Metadata should be actively managed and accounted for. This can be done in different ways depending on the business need, but it should not be ignored except in small operations where data is controlled by a very limited group of stakeholders. Even in those cases, a lack of management will become a problem when the company grows or stakeholders change. When planning for any data project, be sure to consider how you can effectively create and manage the metadata behind it.
Plan for Change
Acknowledge from the start that your business and the market you are in will change. If you want to keep up, your data is likely to change as well. Your metadata management plan should include processes for reviewing and modifying the metadata so it doesn’t become stale.
Keep It Simple
Recall that a primary function of metadata is to bridge technology gaps for consumers of data. Create an easy, user-friendly interface for those consumers to access the metadata they need. This will allow them to answer some questions themselves and ask more specific questions when calling on analysts and developers for assistance. The same principle applies to your processes for updating metadata. If updating is not easy for users to do, they will avoid doing it.
Now that you’ve got a handle on what metadata does in your environment and how you can take advantage of it, you can start seeing it as part of the larger analytics process going on at your organization. This includes both the back-end resources bringing you the data you transform and the front-end results you design. Ironside can help with both.