Reaching data governance is a long journey, so let us start with the first steps. One step you cannot skip if you want to move forward is documenting what data you currently use, why you use it, and where you store it: the “grand audit”.
There are two major roads you can take to reach that destination: top-down and bottom-up.
You set out on the top-down highway for a specific reason. Picture a company that runs numerous reports, along with the databases those reports take their data from. The company has decided to establish proper data governance in order to keep track of what data is stored in those databases and how it is used. Now all the cogs must turn to make it so.
This company already has an idea of what data it has, and it uses that data on a daily basis. What it has to deal with, however, is redundancy and a lack of centralization. Because the company has evolved under the philosophy of “live and let everyone deal with their own data alone”, every report has its own separate metrics, each with its own data requirements, taking data from different sources. Report stakeholders seldom know what data was used to populate a report or where it is stored. In a setup like this, who could possibly keep track of all the data sources and of where their data is used? Making sense of such chaos by hand is beyond the capabilities of any one person. Yet, in order to keep using data for decision making, the company must make sense of it.
Because most of the data serves reporting, reporting is where the unraveling will begin.
Given that data are measured and understood through the metrics used in the reports, the first step must be making sure each metric is properly defined. To that end, companies can use a business glossary solution that documents each metric, the synonyms in use for it, its proper definition, and the way it is calculated.
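To make a glossary entry concrete, here is a minimal Python sketch that assumes a simple in-memory model. The names BusinessTerm and BusinessGlossary are hypothetical and not taken from Accurity or any other product. A real business glossary solution stores far more than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    """A hypothetical glossary entry describing one report metric."""

    name: str
    definition: str
    calculation: str
    synonyms: set[str] = field(default_factory=set)


class BusinessGlossary:
    """Minimal in-memory glossary; each term is stored once under its normalized name."""

    def __init__(self) -> None:
        self._terms: dict[str, BusinessTerm] = {}

    def add(self, term: BusinessTerm) -> None:
        key = term.name.strip().lower()
        # Refuse a second entry with the same name so each metric has one definition.
        if key in self._terms:
            raise ValueError(f"term already defined: {term.name}")
        self._terms[key] = term

    def lookup(self, name: str) -> BusinessTerm | None:
        """Find a term by its name or any of its synonyms, ignoring case."""
        wanted = name.strip().lower()
        for key, term in self._terms.items():
            if wanted == key or wanted in {s.strip().lower() for s in term.synonyms}:
                return term
        return None
```

With a "Net Revenue" term that lists "Net Sales" as a synonym, a lookup for “net sales” returns the single Net Revenue entry, so a synonym never becomes a second, competing definition.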
This first step alone can reveal where the reports are redundant. It is important to unify metrics that describe the same thing and to make sure all terms in the business glossary are mutually exclusive and collectively exhaustive.
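One rough way to spot redundancy is to compare the metrics' calculations. The Python sketch below assumes each report metric has been exported as a hypothetical (report, metric name, formula) row. Differently named metrics with the same normalized formula are candidates to merge into one term, while one name used with different formulas breaks mutual exclusivity. The function names and the whitespace-only normalization are illustrative assumptions; real formulas would need a smarter comparison.

```python
from collections import defaultdict

# Each row is a hypothetical (report, metric name, formula) export from a report.
MetricRow = tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase and drop all whitespace so trivially different spellings match."""
    return "".join(text.lower().split())


def find_duplicate_definitions(rows: list[MetricRow]) -> list[set[str]]:
    """Groups of differently named metrics that share one calculation (merge candidates)."""
    names_by_formula: dict[str, set[str]] = defaultdict(set)
    for _report, name, formula in rows:
        names_by_formula[normalize(formula)].add(name)
    return [names for names in names_by_formula.values() if len(names) > 1]


def find_conflicting_names(rows: list[MetricRow]) -> list[str]:
    """Normalized metric names used with more than one calculation (not mutually exclusive)."""
    formulas_by_name: dict[str, set[str]] = defaultdict(set)
    for _report, name, formula in rows:
        formulas_by_name[normalize(name)].add(normalize(formula))
    return sorted(name for name, formulas in formulas_by_name.items() if len(formulas) > 1)
```

For example, a "Net Sales" metric computed as gross - returns in one report and a "Net Revenue" metric computed as gross-returns in another land in the same duplicate group, while a "Margin" metric calculated two different ways is reported as a conflicting name.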
The next step comes with the realization that every metric represents an area of data the company cares about. Each business term acts as a headline explaining what data you keep and why. Now, for the sake of clarity, you also want to add a “where”.
Now that you know what data you have and use on a daily basis, your data infrastructure suddenly turns into a blank map. You know which countries are on the map; all you need to do is assign the proper names to the shapes on the paper. Now it is time to create a data catalog.
A data catalog helps you make sense of your data infrastructure. It is precisely a catalog of all places that you store data in. Think databases, tables, columns, and everything in between. We have a blog post that explains more about what a data catalog can do.
Normally, creating a list like this would take a person a long time and a lot of effort. That is why there are data catalog tools that connect to each of your data sources and use their metadata to draw a mirror image of your data infrastructure. Such tools (like Accurity’s Data Catalog) can reduce creating a data catalog to a matter of a few clicks and make setting up the connections quick and easy.
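The core idea of harvesting metadata can be illustrated with SQLite, which describes its own schema in the built-in sqlite_master table and through PRAGMA table_info. The Python sketch below is an illustration only: it does not describe how Accurity’s Data Catalog or any other product works, and CatalogEntry and harvest_sqlite are hypothetical names. Other database engines expose similar metadata, for example through information_schema views.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One column in the catalog, addressed by database, table, and column name."""

    database: str
    table: str
    column: str
    data_type: str


def harvest_sqlite(database_name: str, conn: sqlite3.Connection) -> list[CatalogEntry]:
    """Mirror a SQLite database's tables and columns from its own metadata."""
    entries: list[CatalogEntry] = []
    # sqlite_master lists the schema objects; skip SQLite's internal tables.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, *_rest in conn.execute(f'PRAGMA table_info("{quoted}")'):
            entries.append(CatalogEntry(database_name, table, column, col_type))
    return entries
```

Running it against a database with, say, orders and customers tables returns one CatalogEntry per column, ready to be described and linked.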
Once you have a data catalog in place, documenting every column of every table in every database you run, you will want to connect the previously defined metrics to places in your data catalog that they draw their data from.
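At its core, a business lineage link maps a glossary term to the catalog columns it draws on. The Python sketch below assumes terms are identified by name and columns by hypothetical (database, table, column) tuples; BusinessLineage is an illustrative name, not an API of any particular tool.

```python
from collections import defaultdict

# Hypothetical column address: (database, table, column).
Column = tuple[str, str, str]


class BusinessLineage:
    """Links glossary terms (by name) to the catalog columns they draw data from."""

    def __init__(self) -> None:
        self._columns_by_term: dict[str, set[Column]] = defaultdict(set)

    def link(self, term: str, column: Column) -> None:
        self._columns_by_term[term].add(column)

    def sources_of(self, term: str) -> list[Column]:
        """Columns a metric draws on; .get avoids creating empty entries."""
        return sorted(self._columns_by_term.get(term, set()))

    def unmapped(self, glossary_terms: list[str]) -> list[str]:
        """Glossary terms not yet linked to any catalog column."""
        return sorted(t for t in glossary_terms if not self._columns_by_term.get(t))
```

Besides answering “where does this metric come from?”, the unmapped helper lists glossary terms that do not yet point to any source, which serves as a simple completeness check while linking.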
You might see this as extra work. You have created the data catalog and filled the glossary, so you already have a sufficient overview of both your business areas of interest and your data storage places. Why would you want to connect them?
It turns out that connecting them brings many benefits, all enabled by the concept of business data lineage (also known as vertical data lineage). We cover the concept of data lineage in more detail in our free e-book.
Figure 1. Data governance framework with data catalog and business glossary management
The top-down approach offers the added benefit of clarity into your infrastructure. It is a goldmine for business analysts, who can use it to identify previously hidden opportunities for optimization and greater efficiency.
But this semantic highway does not always have to be built from the reporting perspective. There are other options, such as…
The bottom-up approach is beneficial because it helps you avoid mistakes while growing with your data, so data bloat in your infrastructure becomes much less of a worry.
This is not a highway. It is a slower kind of road, and the pace it demands is not as rapid as with the top-down approach. This time, the company’s systems are generally much less developed and mature; they grow along with the company that creates them. Then one day, for the first time, the people responsible for data sit down together and conclude that their growth is forcing them to change the way they approach data management. From that day on, things must be done in a more documented and organized fashion. In other words, things must be planned out much more methodically.
In this approach, the data catalog is built first, because the first need is an overview of what the data infrastructure looks like at the current moment. The goal here is not to identify and fix mistakes of the past, but rather to avoid making them as you grow and evolve. The objective is to make sure things are done properly from this point onward.
Since the motivation for setting out on the data governance journey comes, in this case, from technical personnel, their first course of action will be to gather information about the various databases, tables, and columns they use. Thus, the information about “what data do we have inside” will likely be described at the database, table, or column level itself. They will use the data catalog to describe what is inside columns and tables, but they will soon find that these definitions repeat themselves, and that rather than writing the same definition over and over again, it makes more methodological sense to write the definitions separately so they can easily be reused.
With each definition stored as a separate object, they can simply map places in their data catalog to it. These mappings also let users search for other places that share the same definition, and… wait! Aha! Do you see that?
Yes, that’s right. They just found out they need a business glossary and business lineage to connect it all together.
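The mechanism the technical team stumbles upon can be sketched as a two-way index: each catalog place points to one reusable definition, and each definition knows every place mapped to it. This Python sketch assumes places are dotted strings such as crm.customers.email and that each place has at most one definition; DefinitionIndex and its methods are hypothetical.

```python
from collections import defaultdict


class DefinitionIndex:
    """Hypothetical two-way map between catalog places and reusable definitions."""

    def __init__(self) -> None:
        self._definition_of: dict[str, str] = {}  # place -> definition id
        self._places_with: dict[str, set[str]] = defaultdict(set)  # definition id -> places

    def assign(self, place: str, definition_id: str) -> None:
        """Map a place to a definition, replacing any earlier mapping."""
        previous = self._definition_of.get(place)
        if previous is not None:
            self._places_with[previous].discard(place)
        self._definition_of[place] = definition_id
        self._places_with[definition_id].add(place)

    def places_sharing_definition(self, place: str) -> list[str]:
        """Other places mapped to the same definition as `place`."""
        definition_id = self._definition_of.get(place)
        if definition_id is None:
            return []
        return sorted(self._places_with[definition_id] - {place})
```

Keeping both directions in sync on every assignment turns the “find other places with this definition” search into a direct lookup rather than a scan of the whole catalog.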
Two very different ways of tackling data governance, and both naturally arrive at the same conclusion and result. It’s like carcinization, the evolutionary phenomenon in which several separate crustacean lineages have independently evolved crab-like body forms.
These two approaches are examples of methodologies for achieving data governance. They are heavily shaped by their use cases, because both examples were documented with real-world customers. But other scenarios can kickstart this process too; in the end, it all comes down to the individual company’s situation.
What always remains the same is the need for a methodology that reaches a sufficient level of data governance, and the way a proper methodology always ends up leading you to a business glossary and a data catalog connected through business data lineage.
How long and difficult your journey towards data governance will be depends on your choice of tools. Just as a good pair of quality shoes can turn a challenging hike into a pleasant walk and save you from blisters, a good data governance tool can take weeks or months off your journey and save you from methodological pitfalls.
Make sure to choose a tool that supports all three of the initial steps towards data governance: a business glossary, a data catalog, and business lineage – like Accurity!
Accurity was built to enable business and IT users to work in tandem on building proper data governance from whichever direction they choose.
If you would like to know how Accurity could help solve your particular use case, schedule a free demo with us and we will show you around.