Data engineer's guide to data governance (part 1/3)
Musings / 02.03.2021
This blog series explains data governance as a process, describes the technology elements it covers, and examines the data engineer's role in it.
If you work in IT in any capacity, chances are you have heard of data governance, especially in recent years, when it gained popularity with the arrival of GDPR. Some might call it a buzzword, and like most buzzwords, it is something many people talk about but few actually do. We decided to spend some time finding out what it is all about and what role a data engineer should play in it. Before we get to the engineering part, however, we believe we first need to cover the basics and establish common ground on which to build further, which is what this blog aims to do. With that in mind, here are the questions we plan to cover today:
- WHAT is data governance?
- WHY data governance?
- HOW LONG and HOW MUCH?
What is data governance?
“The exercise of authority, control, and shared decision making (planning, monitoring and enforcement) over the management of data assets.” This is the official definition of data governance in DMBOK (the DAMA Data Management Body of Knowledge). While investigating this subject, however, we found a variety of definitions that mostly overlap but sometimes diverge. Since we like things simple and understandable, we settled on:
Data governance is a set of principles and practices that ensure high quality throughout the complete lifecycle of your data. It is a practical and actionable framework that helps stakeholders identify and meet their information needs.
To summarize, data governance should serve all stakeholders, meaning the people who need data within the organization (and even beyond it, for instance when data is shared between multiple companies), and it should be applicable to any organization, no matter what its primary business is (for example pharmacy, finance, retail or telco).
Put simply, data governance is what you do when you need data that is trustworthy, easily available, usable, integrated, and secure.
What does data governance cover?
You might wonder whether this is something most organizations already do; after all, who doesn't ensure high-quality data or follow some principles? The truth is that you probably already do some of the things that fall under data governance, especially master data management (MDM), which is a crucial step in any data-related business. Data governance, however, puts all of those activities “under one governing hat”, systematizes them, and drives the organizational and technological changes needed so that this work is not in vain. Other elements include risk management, metrics, data quality, policies, processes, and the others shown in the picture below.
We will cover the organizational and technology elements in more detail in our next blog, and the one after that will focus on MDM and data quality, where we plan to concentrate on the data engineer’s role in data governance.
What is NOT data governance?
Now that we have gone through the basic definition of data governance and what it covers, it is worth stating explicitly what data governance is not.
- Data governance is not an exact process. There is no algorithm or book that can tell you the exact steps to achieve it. As we said, it is a set of principles specific to your company and your data, created with global regulations and standards such as GDPR, PCI DSS or others in mind.
- Data governance is not only about data privacy. It does include implementing processes around data privacy and security, as well as inter-departmental data exchange agreements, but, as we stated in the previous section, it covers much more, such as MDM and organizational changes.
- Storing data in a central repository or a data lake is not governance. However, the controls around accessing and processing critical data in that repository, if they make sense for your organization and bring value, are part of data governance.
- Finally, data governance is NOT a function performed by those who manage information: there must be a separation of duties between those who manage data and those who govern it.
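To make the point about repository controls a little more concrete, here is a minimal Python sketch of one such control: a deny-by-default policy that decides which columns of a critical table a given role may read. The table, column and role names, as well as the `POLICY`, `AccessRequest`, `allowed_columns` and `filter_row` names, are hypothetical illustrations, not the API of any particular tool. In practice, rules like these would usually be enforced by the data platform itself, and, in line with the separation of duties above, defined by those who govern the data rather than those who manage it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy: which roles may read which columns of a critical table.
# All table, column, and role names here are illustrative assumptions.
POLICY: dict[str, dict[str, set[str]]] = {
    "customers": {
        "customer_id": {"analyst", "data_steward", "support"},
        "email": {"data_steward", "support"},
        "national_id": {"data_steward"},
    }
}


@dataclass(frozen=True)
class AccessRequest:
    role: str
    table: str
    columns: tuple[str, ...]


def allowed_columns(request: AccessRequest) -> list[str]:
    """Return the requested columns the role may read; unknown tables or columns are denied."""
    table_policy = POLICY.get(request.table, {})
    return [
        col for col in request.columns
        if request.role in table_policy.get(col, set())
    ]


def filter_row(row: dict[str, object], request: AccessRequest) -> dict[str, object]:
    """Keep only the requested and permitted columns of a single record."""
    permitted = set(allowed_columns(request))
    return {key: value for key, value in row.items() if key in permitted}


if __name__ == "__main__":
    req = AccessRequest("analyst", "customers", ("customer_id", "email", "national_id"))
    print(allowed_columns(req))
```

Denying by default (anything not explicitly listed is refused) is the design choice that keeps newly added sensitive columns from being exposed before someone has made a governance decision about them.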
Data governance goals
In his book “Data Governance: How to Design, Deploy, and Sustain an Effective Data Governance Program”, John Ladley states that the ultimate goal of data governance is to disappear as a stand-alone program and become part of the core business, much as financial controls and events are already seen as regular activities rather than special programs.
Although this is a great end goal, we need to find our own starting point and build processes from there. That means identifying a few goals that highlight what these data governance activities should achieve and that we can later incorporate into our everyday processes. We present four here as an example. Bear in mind that data governance is a tailored process, so the goals, although often very similar, may vary from business to business and from process to process.
Why data governance?
By now you hopefully have a basic understanding of what data governance is and is not, and why it might be important. To sum it up in an easy and understandable way:
Benefits of data governance
The benefits of successfully implemented data governance for a company are substantial and compound over time: cleaner and leaner data enables data-driven decision making, which in turn produces better business results. Positive business results strengthen your company's reputation, which can increase the company's overall market value.
How long and how much?
Now that we have gone through the definitions and benefits of data governance, you might be eager to start implementing its principles as soon as possible. Two final questions are worth considering, though: how long does it take to implement data governance, and how much does it cost?
The bad news is that there is no straight answer to the first question, and the shortest answer to the second is “a lot.” However, when done properly, data governance is still cheaper than paying for the consequences of not having it. In terms of direct financial loss alone, breaching GDPR can cost up to €20 million or 4% of the company’s global annual turnover, whichever is higher.
Now, CFO, you might want to take out the calculator and compare the cost of fines with the cost of setting up teams and implementing data governance (the good news is that you might already have some initiatives in place). Before you do, remember that leaking confidential or sensitive data can have consequences far greater than what is visible on the surface, such as customer churn or a loss of reputation that brings further financial harm to your company.
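The “€20 million or 4%” figure is a ceiling rather than a fixed amount: under GDPR Article 83(5), the upper limit for an undertaking is whichever of the two is higher. For a rough, back-of-the-envelope comparison, the ceiling can be sketched in Python as below. The function name `max_gdpr_fine` is hypothetical, and the sketch only computes the legal maximum; actual fines are set case by case by supervisory authorities and can be much lower.

```python
from decimal import Decimal

# Fixed ceiling for the most serious GDPR infringements (Article 83(5)).
FIXED_CAP_EUR = Decimal("20000000")
# Alternative ceiling: 4% of total worldwide annual turnover of the preceding financial year.
TURNOVER_RATE = Decimal("0.04")


def max_gdpr_fine(annual_turnover_eur: Decimal) -> Decimal:
    """Upper bound of a fine for an undertaking: whichever ceiling is higher."""
    if annual_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    return max(FIXED_CAP_EUR, TURNOVER_RATE * annual_turnover_eur)


if __name__ == "__main__":
    for turnover in (Decimal("100000000"), Decimal("1000000000")):
        print(turnover, max_gdpr_fine(turnover))
```

For a company with €100 million in turnover, 4% is only €4 million, so the €20 million cap applies; at €1 billion, the 4% rule dominates and the ceiling rises to €40 million. The two rules meet at €500 million. `Decimal` is used instead of `float` to avoid rounding surprises with monetary amounts.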
If you haven’t started with data governance yet, developing a plan is a good place to begin. Start small, go one step at a time, improve based on feedback and remember that data governance is a continuous and iterative process, and not a one-time project.
Not so fun for the companies in question, but here are some examples of when things went wrong for some well-known companies.
- Changing regulations are undoubtedly the biggest driver for data governance. For example, the EU’s General Data Protection Regulation (GDPR) was the first attempt at a near-global, uniform approach to regulating the way organizations use and store data.
- Data governance is mandatory under the new law, and failure to comply leaves organizations liable for huge fines of up to €20 million or 4% of the company’s global annual turnover, whichever is higher. For context, GDPR fines could wipe off two percentage points of revenue from Alphabet (Google).
- Reputation management can also be a major driver for data governance implementation. High-profile data breaches hit companies such as Equifax, Uber and Yahoo, and all were met with costly PR fallout. In the case of Equifax, the breach had a price tag of $90 million as of November 2017.
- A mother who had not yet disclosed her sexuality to her family sued Netflix for invasion of privacy, alleging that the movie rental company made it possible for her sexuality to be discovered against her wishes when it released insufficiently anonymized information about nearly half a million customers as part of its $1 million contest to improve its recommendation system.
LINKS AND REFERENCES
- Ladley, John. “Data Governance: How to Design, Deploy, and Sustain an Effective Data Governance Program”. Elsevier Inc., 2012.
- Collibra, “What Data Governance Is Not”
- Profisee, “Data Governance – What, Why, How, Who & 15 Best Practices”
- Snowflake, “Data Governance Best Practices”
- CIO, “What is data governance? A best practices framework for managing data assets”