
Data Mesh: The Key to Advanced Data Governance and Efficiency

David Vavruska
April 25, 2023
Every now and then, the IT industry sweeps businesses around the world off their feet with a new trend. These trends promise to make day-to-day operations faster, cheaper, and more effective: simpler overall.

Trends rise and fade at different speeds, depending on how relevant the technology that enabled them is. If the technology required to adopt a new trend is complex and expensive, chances are the trend will die out before it sees wide practical use.

Lately, though, we have seen a data trend that is interesting mainly because it goes against the idea of technology-driven improvement. Don’t get me wrong: some technology is still required, but mostly things that have been around for ages, nothing new or groundbreaking.

No, this trend enables cheaper, faster, and more efficient data processes through the sheer power of organization. It is called data mesh.

What is data mesh?

To understand the difference data mesh makes, you must first understand how things have been done until now. IT and new technologies were originally treated as very distinct disciplines, both in the business world and outside it. People’s skill sets were valued for their ability to master technologies, programming languages, and the like, in a world where the vast majority could not. To this day, many prestigious specialized schools exist to turn young minds into computer science professionals.

Businesses mostly followed the same pattern. Each company would have many different teams and departments based on its market offering, plus a dedicated data team to solve all data-related issues. That team would manage what data got stored, where, how much of it, and who had access to what.

Do you need to build new reporting? Talk to the data team, and they will arrange for all the data you need to be stored and accessible. Do you need to collect customer information for compliance? Coordinate with the data team, and they will make sure everything is stored correctly and available when the auditor comes to check.

Convenient, isn’t it? It would seem so: a group of highly specialized professionals dedicated solely to all things data. The problem is that, as time went on, larger and larger parts of companies became dependent on data. Today every department needs to use data to some extent to do its job properly, and the number of requests to the data team only continues to grow.

Yet there are only so many people in the data team and vastly more people in the rest of the company, and all of them ask the data team for help every day.

Given this state of affairs, a well-known saying comes to mind:
“Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.”

The crux of the data mesh trend could be summarized by a paraphrased version of that saying:
“Give a colleague their data, and you satisfy their requirements for today. Teach a colleague to work with their own data themselves, and you satisfy every data-related requirement of theirs for the rest of their career.”

Data mesh versus centralized data architectures

As established, the strain businesses put on their data teams today is enormous. Day after day, specialized data teams get swamped by tasks with no end in sight. Naturally, this makes them appear to underperform in the eyes of their employers. It is not that they lack efficiency; there are simply too many requests coming from all sides.

As a result, many processes stall whenever cooperation with the data team is required, creating major bottlenecks. That, in turn, leads the company to conclude that its data team is intolerably slow. And as the organization grows, the problem only gets worse.

There are two possible ways out: hire more people to service the system, or change the system. Depending on how large the company already is, changing the system is likely the cheaper option.

But how do you change a system that depends on a single group of people holding all the knowledge about working with data? Simple: you distribute that knowledge. To whom? To practically everybody.

You see, as a society, we have made amazing leaps forward in technology literacy. Even without a degree in computer science, many of your line consultants, analysts, and project managers would most likely recognize SQL and understand it, at least at a basic level. Many companies have already recognized this and begun educating their employees further on the basics of working with data. You don’t have to be a professional coder to save time by writing a single line of SQL that gets you the results you need today, instead of waiting for somebody else to write it four days from now.
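To show how small that single line of SQL can be, here is a hypothetical example of an analyst answering their own question instead of filing a request. The `sales` table and its columns are invented for illustration, and Python’s built-in `sqlite3` module with an in-memory database stands in for whatever database your company actually uses.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# The one line of SQL an analyst might write on their own:
# total sales per region, largest first.
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
rows = conn.execute(query).fetchall()

for region, total in rows:
    print(region, total)
```

The surrounding Python is only scaffolding for a self-contained example; the part the analyst actually writes is the `query` string.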

This process of distributing data knowledge (and, by extension, responsibility for data) is called data democratization. Its premise is that if you require data for your work, you should be able to understand that data, read it, process it correctly, and draw conclusions from it. Everyone who needs data should be able to work with it to satisfy their own needs.

Data mesh goes one step further. It holds that the most efficient way to work with data is not to organize the company with data as a separate discipline, but to embed data in all the processes and areas that require it. Does the BI team require data to operate? Let’s find a way for the BI team to create, store, access, and use data for its reporting on its own!

Does the regulatory team need to check and improve the quality of crucial compliance data before the auditor visits? Maybe it would be easier if they were in charge of that data themselves. Data mesh takes many of the responsibilities of a dedicated data team and distributes them across the whole company by domain. Data relevant to a domain (along with whatever technology best suits that data) should be fully managed by the people who best understand the unique aspects of that domain and, therefore, its data. A BI specialist probably knows best what kind of data they need for their work, and a compliance specialist knows what form their data needs to take to keep their employer compliant.

This distribution of data responsibility removes many of the steps between identifying a data requirement and actually fulfilling it with the relevant data or data process.

It may seem intimidating at first, since many businesses adapt their teams’ data requirements to fit the database and application technology they use instead of the other way around. However, with the right tools, the benefits of this approach will eventually outweigh most of the initial discomfort such a change brings to the company.

Another change that may seem quite scary at the outset but will bring long-term benefits is distributing technology choices by business domain. If the best people to handle domain data are those intimately familiar with that domain, it follows that the same people know best which technology makes the most sense for them to use. The same goes for decisions about how your application environment is interconnected and who gets access to what: these should be based not on people’s IT expertise but on what they need to do their jobs.

Of course, you will not want each team to simply order whichever tool or platform they like at the moment and send you the bill. There needs to be a coordinated selection and implementation process that ensures the resulting IT architecture satisfies everyone. The difference is that instead of a centralized team of architects deciding what satisfies other people, groups of stakeholders representing each domain cooperate on creating a collectively beneficial, efficient infrastructure that actually reflects how your business works (not just what methodology your senior architect thinks is neat).

It is often tempting to let the data team select and connect the platforms that make up the infrastructure in whatever way makes the most technological sense: for example, choosing systems so that data does not need to be transformed as it moves between them, or pairing two platforms because they are quick and easy to connect rather than because they meet all the requirements of the people who are supposed to use them.

Designing an IT infrastructure that reflects the real connections between departments and their purposes also makes it easier for business analysts to identify process bottlenecks and inefficiencies down the road, since there is a clear, tangible connection between data, software, and the producers and consumers of data.

Indeed, the notion that a data infrastructure should reflect the business reality of an organization existed long before data mesh became a recognizable market trend; it has been around for over a decade. In fact, Accurity has been at the forefront of this push for closer alignment between the business and IT worlds.

That is what makes tools like Accurity great enablers for implementing data mesh. They allow non-technical users to get hands-on in defining what data their organization needs, checking how well the current IT infrastructure meets those needs, mapping how business domains connect to each other and to specific data, and much more.

Tools such as Accurity even enable the development of database and data warehouse systems from conceptual data models that directly follow the business’s organizational structure and business model, bringing the clarity and transparency to data management that any level of data democratization requires.

The bottom line is that easy-to-use, affordable, self-service tools that let non-technical domain specialists share data management responsibility are certainly out there, which makes adopting data mesh principles a goal any organization can realize relatively quickly.

Data mesh versus data fabric

An observant reader has probably noticed that I keep pointing out differences between data mesh and “centralized data architectures”, yet I never said that data mesh is a decentralized architecture. That’s because data mesh is not an architecture.

According to Dr. Torsten Priebe, lecturer at the Technical University of St. Pölten and Head of the Data Intelligence Research Institute (and the technical mastermind behind Accurity), data mesh is better understood as a set of organizational principles that govern how a company can most efficiently use a decentralized data architecture.

The actual decentralized data architecture that enables data mesh goes by another term you have probably also heard: data fabric.

If data mesh is the more conceptual set of organizational principles – a matter of business culture, some might say – data fabric is the technical expression of data democratization that enables data mesh.

A classic data infrastructure usually follows a strict scheme: a few central data repositories fed by a wide variety of data-producing systems. From these central repositories, data is distributed on demand across the organization.

To return to our fish metaphor, imagine the classic data infrastructure as the supply chain of a fish-producing company. Some parts of the company catch the fish, producing the raw product. The fish then needs to be stored in a distribution warehouse before it goes to market and can be bought. To keep, it has to be frozen, just as data needs to be transformed before it can be saved in a data warehouse. From there, the frozen fish is distributed to points of sale, where consumers pick it up and transform it again (baking, frying, or cooking it) based on their individual needs.

The problem with this setup is that you need a comparatively small, centralized data (or fish) team that services everyone else, and that team only has so much capacity.

Compare this to an example of a decentralized data architecture. Imagine a vast sea (to keep it topical).

Here you don’t have a dedicated fishing company that is taking care of the fish industry. The sea is home to teams of fishermen, each interested in different kinds of fish and with different ways to use them in mind. Each kind of fish is “stored” in a different area of the sea, and nothing really stands in the way of the fishing teams that try to catch them, except that they need to be trained on how to catch the fish, and they need to be given a boat and some fishing rods – the technology – to see it through.

But once you’ve done that, they will serve their own needs on their own, autonomously, and more efficiently than a centralized data (or fish) team ever could.

Having a data mesh without a data fabric architecture in place would be like gathering all those independent fishing groups and saying: “Here is the key to the warehouse. Now go manage this large fishing conglomerate and its supply chains.” That is why you can view data mesh and data fabric as two sides of the same coin. Each can exist on its own, of course, but when both exist in an organization at the same time, they complement each other perfectly and provide more benefits together than each would alone.

How to enable a data mesh in your company

As outlined above, adopting data mesh means trading the need for a small team of data experts for the need for a tool that enables most of your workforce to work with data like experts. The one downside of a decentralized architecture is that you lose a simple overview of, and control over, the data architecture. To mitigate that, you need a platform that documents all the disparate data sources, describes the data within them, and shows how they connect to each part of your business.

You will need a tool that non-technical professionals can use to connect domains to data, and data to the places where it originates and is consumed, so that a single point of control is established over the decentralized data landscape.

Platforms capable of this, Accurity among them, include a business glossary, a data catalog, and data lineages.

A business glossary is a place for defining the business side of data: its meaning, how each domain uses it, and its purpose. In a business glossary, you justify the business reason for the data’s existence. Some glossaries, like Accurity’s, even allow you to create a semantic business model out of the described items, documenting how the domains are interconnected in real life, or to project these domain definitions into applications commonly used by the business, such as Power BI. And, as established, there are tangible benefits to building a data infrastructure that essentially mirrors the real life of your business.

A data catalog is a place that holds the technical information about your entire data infrastructure. It is a large map of every data source, however different each one is from the others. In it, you document the data’s technology, type, and parameters. Catalogs give their users clarity in navigating a decentralized infrastructure by keeping all information about diverse data sources in one place, even though those sources have no centralized point of contact. Some catalogs, Accurity included, also provide statistical information about the data contained in these sources.
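As a rough illustration of what a catalog holds, the sketch below models catalog entries as plain Python records. It is a hypothetical structure, not Accurity’s data model or API: the source names, technologies, fields, and the `find_sources_with_column` helper are all invented to show how very different sources can be described and searched in one place.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class CatalogEntry:
    # One documented data source, whatever technology it runs on (hypothetical fields).
    source: str
    technology: str
    columns: list[Column] = field(default_factory=list)
    row_count: int | None = None  # optional statistics about the data


# A hypothetical catalog describing two very different sources side by side.
catalog = [
    CatalogEntry("crm.customers", "PostgreSQL",
                 [Column("customer_id", "integer"), Column("email", "text")], 52_000),
    CatalogEntry("web.clickstream", "Parquet files",
                 [Column("session_id", "string"), Column("url", "string")]),
]


def find_sources_with_column(catalog, column_name):
    """Return the names of all sources that contain a column with this name."""
    return [e.source for e in catalog if any(c.name == column_name for c in e.columns)]


print(find_sources_with_column(catalog, "email"))
```

A real catalog would be populated by scanning the sources rather than by hand, but the principle is the same: one uniform description per source, regardless of the technology behind it.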

Data lineages then act as the connective tissue that allows users to multiply the benefits of a business glossary and a data catalog used together. There are several types of data lineage, but the most common are the business data lineage and the technical data lineage.

A business data lineage documents the connection of data and domain. It tells you which data is relevant to which domain team and how that data needs to be used within that domain. This lineage is important for the concept of data mesh, as it enables untrained users to quickly locate the data relevant to their work and put it to use as fast as possible.
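A business data lineage can be pictured as links running from domains, through business terms, to the physical data that implements them. The minimal sketch below uses hypothetical domain names, business terms, and `table.column` names, plus an invented `data_for_domain` helper; it only illustrates the kind of lookup that lets a domain user find “their” data quickly.

```python
# Hypothetical business lineage: domain -> business terms -> physical data items.
domain_terms = {
    "Compliance": ["Customer Identity", "Consent Record"],
    "BI": ["Monthly Revenue", "Customer Identity"],
}
term_to_data = {
    "Customer Identity": ["crm.customers.customer_id", "crm.customers.email"],
    "Consent Record": ["crm.consents.consent_given_at"],
    "Monthly Revenue": ["finance.invoices.amount"],
}


def data_for_domain(domain):
    """List the physical data items relevant to a domain, in order, without duplicates."""
    items = []
    for term in domain_terms.get(domain, []):
        for item in term_to_data.get(term, []):
            if item not in items:
                items.append(item)
    return items


print(data_for_domain("Compliance"))
```

Because the business term sits between the domain and the physical data, two domains can share a term (here, “Customer Identity”) while each still sees only the data relevant to it.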

A technical data lineage documents the flow of data from one data source to another, including how the data is transformed in transit. Why is that important? In a decentralized architecture where you don’t insist that the different data sources store their data in compatible ways, the data will end up in many different formats. Without transformations between those formats, your decentralized data sources would be unable to communicate with one another. By documenting these transformations in the technical data lineage, you enable the data fabric concept to work efficiently.
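One way to picture a technical data lineage is as a directed graph whose edges record the transformation applied between two sources. The sketch below assumes hypothetical table names and transformations, and uses a plain breadth-first walk (the hypothetical `upstream` helper) to answer a typical question: which sources, and which transformations, feed a given report? It is not how any particular lineage tool stores its data.

```python
from collections import deque

# Hypothetical technical lineage: each edge is (source, target, transformation).
edges = [
    ("crm.customers", "staging.customers", "convert dates from text to ISO 8601"),
    ("staging.customers", "warehouse.dim_customer", "deduplicate by customer_id"),
    ("erp.orders", "warehouse.fact_orders", "convert currency to EUR"),
    ("warehouse.dim_customer", "bi.customer_report", "aggregate per region"),
    ("warehouse.fact_orders", "bi.customer_report", "join on customer_id"),
]


def upstream(target):
    """Walk the lineage backwards from `target` (breadth-first) and
    return every (source, target, transformation) edge that feeds it."""
    incoming = {}
    for src, dst, how in edges:
        incoming.setdefault(dst, []).append((src, dst, how))

    found, queue, seen = [], deque([target]), {target}
    while queue:
        node = queue.popleft()
        for edge in incoming.get(node, []):
            found.append(edge)
            if edge[0] not in seen:
                seen.add(edge[0])
                queue.append(edge[0])
    return found


for src, dst, how in upstream("bi.customer_report"):
    print(f"{src} -> {dst}: {how}")
```

Tracing backwards like this is what lets someone see, for example, that a currency conversion upstream could explain an unexpected figure in a report.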

Accurity’s unique offering

Of course, a variety of tools on the market today make this possible. However, very few provide all these crucial functions in a single package. Most offer a platform that efficiently serves one of the enablers above: a data catalog, or perhaps a technical data lineage tool. Most companies looking to adopt data mesh then end up assembling a platform from three or more tools (and paying for three or more annual licenses, plus the integration services needed to make those tools work together).

Accurity offers a unique combination of advanced semantic models and a business glossary with a capable data catalog, a wide array of lineages, and a data quality management solution, all in one platform at a competitive, pay-as-you-grow license price.

If your company is considering adopting data mesh, why not discuss your options with us in person? Our product experts will show you how to enable data mesh in your organization using the best tools for the job.

If you are interested in learning more about how data mesh can accelerate your data governance efforts, we invite you to watch the Accurity Data Vibes LinkedIn live event featuring Petr Mahdalicek, our CEO and co-founder, and Stefan Ruhland, Director at Deloitte Deutschland, as they discuss the topic. Alternatively, you can watch the recording here.

David Vavruska
Product Analyst
