A data catalog is a necessity for any data-conscious organization that understands proper data governance and wants to adhere to its principles to ensure that the data it generates, stores, and uses every day is of sufficient quality.
Yet the exact benefits of having a data catalog can be difficult to grasp at first. It’s easy to be preachy, tell you that you simply should have one (read “buy one”), and hide behind a cohort of buzzwords that leave most people scratching their heads in confusion rather than understanding what a data catalog can actually solve for them.
So, to turn this trend around, let us demonstrate the most beneficial use cases of a data catalog by likening them to five everyday things that you have definitely used to your benefit before.
Staying with the idea of an approaching summer for a while, let’s imagine you’re planning a summer vacation. The first thing you want to do is open a map of the region you will be visiting and plan out the journey. You know you will be visiting Italy, and you roughly know which cities and landmarks are there, but you want an exact picture of the places your journey will take you through. Where can you take a shortcut, where should you take a detour to go sightseeing, and which places should you avoid because they offer nothing of interest for your road trip?
It’s exactly the same with databases. You know that you have databases and are roughly aware of what can be found in them. But to use that data efficiently, you need the complete picture of what is where and how it all connects.
You can create this picture, a metadata model of your data, rather easily by connecting your database to a good data cataloguing tool and capturing the metadata of your data infrastructure. This keeps you always in the know about your data: you can learn what type of data is stored where, drilling down even to the column level.
For each column, a data catalog can tell you its permitted parameters, such as the precision, length, and scale of its values, or whether it may contain NULL values.
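To illustrate what capturing column-level metadata involves, here is a minimal Python sketch. It is not how Accurity or any particular cataloguing tool works internally; it assumes an SQLite database as a stand-in data source, reads each table's column definitions through SQLite's PRAGMA table_info, and splits each declared type into length or precision and scale. The ColumnMetadata record and the capture_metadata function are hypothetical names chosen for this example.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch: SQLite stands in for a real data source, and these
# names are illustrative, not part of any data cataloguing product's API.


@dataclass
class ColumnMetadata:
    table: str
    column: str
    data_type: str                      # base type, e.g. "VARCHAR" or "DECIMAL"
    length_or_precision: Optional[int]  # first number in the declared type, if any
    scale: Optional[int]                # second number in the declared type, if any
    nullable: bool


# Splits a declared type such as "DECIMAL(12,2)" into name, precision, and scale.
_TYPE_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?")


def capture_metadata(conn: sqlite3.Connection) -> List[ColumnMetadata]:
    """Read every user table's column definitions into a flat list of catalog entries."""
    entries = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, name, declared, notnull, _default, _pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            match = _TYPE_PATTERN.match(declared or "")
            base = match.group(1).upper() if match else ""
            size = int(match.group(2)) if match and match.group(2) else None
            scale = int(match.group(3)) if match and match.group(3) else None
            entries.append(ColumnMetadata(table, name, base, size, scale, not notnull))
    return entries


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        " product_code VARCHAR(20) NOT NULL,"
        " revenue DECIMAL(12,2),"
        " sold_on DATE NOT NULL)")
    for entry in capture_metadata(conn):
        print(entry)
```

SQLite records a declared type such as VARCHAR(20) without enforcing the length, so this sketch only captures what the schema declares. Other database systems expose the same facts through views such as INFORMATION_SCHEMA.COLUMNS, with columns like CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, and IS_NULLABLE.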
Overall, a data catalog created to capture the metadata model of a real data source is very much like a map of your holiday destination – an efficient overview of places of interest of varying size and their interconnections.
As established in use case one, while planning a vacation in Italy you want to have a map to see what’s out there. However, when you’re actually there and know exactly what place you want to visit, an entire map can be a little confusing. It’s much better to navigate using GPS or a mobile route-planning app that guides you directly to your destination by the shortest possible route.
With a properly created data catalog you are able to discover the physical location of your data of interest very quickly, as long as you know what you are specifically looking for.
Let’s say you want to provide information about revenue generated by a particular product in your next report. You know what product you’re interested in and you know that you’re looking for data about its sales performance.
Data cataloguing tools, like Accurity, feature a variety of ways to identify a certain object in your data catalog.
The most obvious is an intuitive, Google-like search bar or a set of content filters that lets you simply look up objects related to “product revenue”, just as you would look up a place in your favorite maps application (e.g., Rome on Google Maps) to plot a route there. The definitions and descriptions of catalog objects are an integral part of any complete data catalog and exist for exactly this purpose.
Another way presents itself when using the data catalog in tandem with a business glossary. If you know what information you’re looking for thematically, you can find a match among the business terms in a business glossary, then use the business data lineage to see which parts of the data catalog correspond to the information described by the business term.
Either way, a data catalog presents you with an easy and quick way to locate in which column, table, and database you can find the data you need to finish your report – directly and without detours.
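The two routes described above, searching the catalog directly or going through a business term and its business data lineage, can be sketched in a few lines of Python. This is a hypothetical in-memory model, not Accurity’s data model or API: CatalogColumn, BusinessTerm, and find_data are illustrative names, matching is naive case-insensitive substring search, and each business term simply lists the catalog columns its business data lineage points to. The sample objects borrow table and column names used in this article, and their pairing is invented.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory catalog; all object names are illustrative.


@dataclass
class CatalogColumn:
    database: str
    table: str
    column: str
    description: str = ""

    @property
    def path(self) -> str:
        return f"{self.database}.{self.table}.{self.column}"


@dataclass
class BusinessTerm:
    name: str
    definition: str
    # Business data lineage: catalog columns that hold the data this term describes.
    columns: List[CatalogColumn] = field(default_factory=list)


def find_data(columns: List[CatalogColumn], terms: List[BusinessTerm], query: str) -> List[str]:
    """Return catalog paths matching every word of the query, directly or via a business term."""
    words = query.lower().split()

    def matches(text: str) -> bool:
        text = text.lower()
        return all(word in text for word in words)

    hits = set()
    # Route 1: search the catalog objects' own names and descriptions.
    for col in columns:
        if matches(f"{col.table} {col.column} {col.description}"):
            hits.add(col.path)
    # Route 2: match a business term, then follow its business data lineage.
    for term in terms:
        if matches(f"{term.name} {term.definition}"):
            hits.update(col.path for col in term.columns)
    return sorted(hits)


if __name__ == "__main__":
    revenue = CatalogColumn("SalesDB", "dbo.QRep", "REV_NET", "Net revenue per product and quarter")
    delivery = CatalogColumn("SalesDB", "dbo.SUBJ_PL", "DEL_COSTS_YTD")
    glossary = [
        BusinessTerm("Product revenue", "Income generated by selling a product", [revenue]),
        BusinessTerm("Delivery costs", "Year-to-date cost of delivering goods", [delivery]),
    ]
    print(find_data([revenue, delivery], glossary, "product revenue"))
    print(find_data([revenue, delivery], glossary, "delivery costs"))
```

In this toy example, searching for “delivery costs” finds SalesDB.dbo.SUBJ_PL.DEL_COSTS_YTD only through the business glossary, because the cryptic column name alone never matches the query.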
Let’s say that, with the help of your map and route planner, you’ve arrived at your Italian vacation destination safe and sound. After a long journey, all you can think about is relaxing over a tasty meal and a drink when you go out for dinner. As you sit down, the waiter offers you a choice of local delicacies for a starter: either a Nduja or a Sanguinaccio. But you have no clue what either of these is and could end up being served something that is, in fact, quite gross. It is always better to know what something means.
The same thing can happen when you try to understand what data your organization keeps. Your databases are full of tables named “dbo.SUBJ_PL” or “dbo.QRep” and those contain columns with names like “DEL_COSTS_YTD” or “generaldisc”.
If you’re not the one who originally created those tables and columns, you may be forgiven for thinking “What does that even mean?” What good does it do to have data about something if you don’t know what that something is?
Just like in that restaurant, you are expected to consume this data without knowing what it is or what it is supposed to tell you, yet if you choose not to consume it, you risk missing out on valuable, monetizable insights.
That’s why a data catalog can be a very powerful comprehension tool. A complete data catalog not only describes what a column or a table contains; used in tandem with a business glossary, it also gives you a much deeper understanding of that data through business data lineage.
As explained previously, business data lineage in a data catalog is a semantic connection between business terms and the technical objects that hold the data they describe. By generating a business data lineage for a table or a column within the data catalog, you can discover with just a few clicks that the dreaded “dbo.QRep” contains the consolidated quarterly report data you were looking for all along.
Just like in a restaurant, it’s much more reassuring to consume something when you know exactly what’s on the menu.
Ever been to an IKEA store? IKEA is known for being quite accommodating to people who are designing their homes, offices, or other living spaces, to the point of having entire sections in its stores where customers can sit down and model their new home and furniture while the actual home is still under construction.
A data cataloguing tool can provide you with this exact option, only this time for databases and data warehouses (DWHs)!
Your data catalog doesn’t have to be just a captured image of an existing data source that you connected to the tool. You may also find yourself in a situation where a new data repository needs to be developed, and you can design it in the data catalog before it is built.
Making sure that everything in your newly designed space has its proper spot, is arranged and stored systematically (logically), and that everyone involved knows where it is, is crucial to designing your new database, just as it is when designing, for instance, your new kitchen.
Not only does this approach help database architects and developers build the new system in a far more systematic and organized way, but it also gives you, the business user, a clear and comprehensive view of exactly how it is going to look. Knowing the technical team’s plans also gives you a say in the development and a platform to bring your business vision into the new database.
You wouldn’t leave the task of furnishing your kitchen up to IKEA employees alone, would you? Why would you leave designing your database up to the technical team alone?
Once you have your new kitchen set up and furnished, it’s time to cook something, right? And if you want to make a meal properly, you use a recipe.
A recipe shows you what ingredients you need as you start cooking, what the resulting dish should look like, what the individual steps between start and finish are, and how exactly each ingredient is used and changed so that the mixture takes the form of the succulent dish you want to enjoy.
Keeping data also involves a lot of cooking, and you want to keep track of the recipes your organization uses for it.
These recipes are called technical data lineages, and they document data flows. They describe where a piece of data appears in your systems, where it goes after that, how it changes to fulfil its function in each place, and how it eventually reaches the endpoint of its journey.
This is especially useful when you want to keep track of all the transformations the data undergoes as part of its life cycle. It’s also crucial for impact analysis, which tells you how changing one object in the data catalog, or even slightly changing the data it holds, can affect the objects that depend on it.
Just as adding the baking powder later in the recipe than you’re supposed to can ruin the whole cake, breaking an important link in the data flow can ruin your data. Keeping those links in view is what a data catalog’s technical lineages are all about.
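Impact analysis over a technical lineage boils down to following data flows downstream. The Python sketch below assumes a lineage stored as hypothetical (source, target, transformation) edges, with object names and transformations invented for illustration, and uses a breadth-first search to list every object that a change to one column could reach. A real cataloguing tool’s lineage model will be considerably richer than this.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical technical lineage: (source, target, transformation) edges.
# Object names and transformations are illustrative only.
Edge = Tuple[str, str, str]

LINEAGE: List[Edge] = [
    ("crm.orders.amount", "staging.orders.amount_eur", "convert currency to EUR"),
    ("crm.orders.discount", "staging.orders.amount_eur", "subtract discount"),
    ("staging.orders.amount_eur", "dwh.fact_sales.revenue", "sum per product and day"),
    ("dwh.fact_sales.revenue", "dbo.QRep.REV_NET", "aggregate per quarter"),
]


def downstream_impact(lineage: List[Edge], changed: str) -> List[str]:
    """Breadth-first walk of the lineage graph: every object the change can reach."""
    graph: Dict[str, List[str]] = {}
    for source, target, _transformation in lineage:
        graph.setdefault(source, []).append(target)

    affected: List[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                affected.append(target)
                queue.append(target)
    return affected


if __name__ == "__main__":
    # Which objects could break if the discount column changes?
    print(downstream_impact(LINEAGE, "crm.orders.discount"))
```

In this example, a change to crm.orders.discount flags the staging column, the warehouse revenue column, and the dbo.QRep.REV_NET column built from them, in that order. The seen set keeps the walk from revisiting objects when several flows merge into the same target.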
The use cases above are the most common and, coincidentally, the most beneficial challenges you can solve with a data catalog. By relating buzzwords to real-life scenarios, this deep dive aims to show that data governance is not as complex as it may sometimes seem.
You can conveniently solve all the pain points highlighted above with Accurity, the metadata management and data governance platform by Simplity, which contains a powerful data catalog among many other modules.
Accurity enables the creation of metadata models and data lineages that together make up a detailed, yet holistic, picture of your entire data infrastructure. It is made to document past changes and help you plan future ones. There are also many features built on top of our data catalog, such as data profiling, which allows you to statistically analyze real data contained within your data catalog objects.
Overall, having a data catalog can solve major data problems and bring enormous benefits for a small investment of time and resources, compared with the losses in efficiency, time, and money that quietly stack up when no data governance framework is in place.
You may not see it just yet, but imagine what it would feel like to try to find a distant place without a map, converse in a foreign language without a dictionary, or prepare a meal with no recipe. Sure, it’s all possible, but oh, the time it would consume! The sheer number of trials and errors it would take!
Isn’t it easier to simply buy that map?
If you are interested in creating a data catalog for your organization’s data, or if you’d like to know more about Accurity and its features, feel free to schedule a demo with us! We will gladly showcase each of the use cases mentioned in this article to you in person and answer any questions you might have.