Data Mesh

Data Mesh is a relatively new approach to data platform architecture that aims to address the challenges faced by organizations dealing with large-scale, complex data environments. It was introduced by Zhamak Dehghani, a thought leader in the data engineering space, in 2019. The core idea behind Data Mesh is to treat data as a product and shift away from the traditional centralized data architecture to a more decentralized, domain-oriented model.

DATA MESH BENEFITS

Domain-oriented ownership

Data Mesh encourages cross-functional teams to own, produce, and maintain their domain-specific data products. This approach fosters a sense of accountability and responsibility for data quality and availability.

Data as a product

Data Mesh emphasizes the importance of treating data as a valuable product rather than just a byproduct of operational systems. This perspective encourages teams to focus on providing high-quality, accessible, and useful data for internal and external consumers.

Self-serve data infrastructure

Data Mesh promotes the use of self-serve infrastructure and tooling, enabling teams to independently discover, access, and use data. This approach helps to reduce bottlenecks and streamline data workflows across the organization.

Federated governance

Data Mesh supports a decentralized governance model, where domain teams are responsible for managing their data products while adhering to organization-wide standards, policies, and practices.

Data Mesh approach

The Data Mesh approach is designed to overcome challenges such as data silos, complex data pipelines, and the inability to scale data platforms efficiently. It does so by promoting a decentralized, domain-centric approach to owning and sharing data.

How can Syntio help?

Syntio has been delivering Data Mesh projects for many enterprise customers over the years. We understand the obstacles in an organisation, and the changes needed for this to succeed. We know how to approach the solution to get the best outcome for value creation and cost reduction. We have a technology solution that can accelerate the delivery of data mesh principles, allowing our customers to focus on their business and organisation changes rather than the technology enablers.

More on Data Mesh

Every part of your business needs data to operate. Every decision you make is based on data; even a gut feeling is just an analysis of the data you have, followed by a best-guess decision. Your company is the same: you need data to make any business decision or to understand what is happening within your business. The bigger the business, the more data you have and will create over time.

Data Mesh is a paradigm for making data available and sharing data within an organisation. But what is Data Mesh? Well, this is what Wikipedia has to say about it:

“Data Mesh is a sociotechnical approach to build a decentralized data architecture by leveraging a domain-oriented, self-serve design (in a software development perspective) and borrows Eric Evans’ theory of domain-driven design and Manuel Pais’ and Matthew Skelton’s theory of team topologies. The main concern of Data Mesh is about the data itself, taking the Data Lake and the pipelines as a secondary concern. The main proposition is scaling analytical data by domain-oriented decentralization. With Data Mesh, the responsibility for analytical data is shifted from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform.”

This sounds great, but what does it really mean? And why would my company need it?

On top of the ability to share data, which most data platforms can do, Data Mesh looks at how to decentralise the data platform. Decentralisation is a very important part of the success of your Data Strategy. In this case, decentralisation has multiple benefits: it enables business agility by removing complex dependencies, and it moves the creation of shared data to the business domain that creates the data. This means that the data is better understood, and that any data shared is shared faster and has better quality. Did we also mention that it can be much cheaper to operate than a traditional data platform?

Many companies follow the crowd when it comes to innovation. Getting access to data to enable business use cases can take a very long time, with centralised teams working on operating and maintaining a service that is typically very complex and difficult to run. This focus on operations means that requests for new services often have a long lead time. Add to this that the teams running centralised systems usually have less understanding of the actual data they serve, and the result, once you do manage to get it, is often of questionable quality.

This is how slow data access affects innovation: if it takes several months and a large financial commitment to try something new, then you will only try something that is very likely to succeed. You can do this by copying what your competitors have done. However, you will always be behind them to market, and this can affect your market share, as customers will move to competitors that can offer them new, interesting, or easier ways to meet their needs. If you adopt the data mesh approach, however, you can accelerate innovation and reduce its cost. When data is already easily available via a product catalogue that any team in the organisation can use, it becomes very easy to try new things, whether that is a customer-facing change that offers new ways of interacting or of understanding customer needs, or an internal change that frees up the time of your employees so they can work smarter instead of harder. You can even use this approach to reduce cost and complexity within an organisation, reducing the overall cost of operations.

Let’s delve a little deeper into how Data Mesh can enable innovation and reduce risk. As we mentioned above, it’s very risky to spend large amounts of time and money on a hypothesis. It’s far less risky to copy another’s success, and indeed this is a valid business model. However, being at the forefront can help you win new customers, and not just keep your existing ones. When data is available to enable new use cases very quickly and easily, you can take more risks, because you don’t have to bet a lot of valuable time and money on those amazing hypotheses you have. You can try maybe a hundred of them and see what works, for the same time and cost as copying five use cases from others.

You may be saying to yourself that this sounds fantastic, but what steps do you need to follow to get to this utopia?

The internet is swimming with articles on Data Mesh right now, and many are still discussing what it means as a concept. In a nutshell, you can boil it down to decentralising the data platform, with each domain in your business having its own data platform built on modern decoupled software. Next, you will need to publish the data to make it available for others to use. This becomes the responsibility of the data creator in each domain, as they have the best knowledge of the data, its structure, and its quality. We call this data-as-a-product. Once you have the data published, people need to be able to find it and understand it, and for this you need a data catalogue. A data catalogue is typically a tool that lists all the data products you have, with an explanation of how to access them, a description of the data and its structure, and usually some information on its quality and the rules on its format, so you can use it without having to go to the creator and ask questions. The idea is that the data is described well enough that you can use it very quickly.
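To make the catalogue idea a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular catalogue tool: the class names (DataProduct, DataCatalogue), the fields, and the example product are all hypothetical and chosen to mirror the items listed above (how to access the data, a description, its structure, and a note on quality).

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """One catalogue entry. All field names are illustrative assumptions."""
    name: str
    domain: str                # business domain that creates and owns the data
    description: str           # plain-language explanation of the data
    access: str                # how a consumer reaches the data
    schema: dict[str, str]     # field name -> type
    quality_notes: str = ""    # known caveats, freshness, completeness


class DataCatalogue:
    """In-memory list of published data products, searchable by keyword."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # The owning domain publishes; names must be unique in the catalogue.
        if product.name in self._products:
            raise ValueError(f"data product already published: {product.name}")
        self._products[product.name] = product

    def search(self, keyword: str) -> list[DataProduct]:
        # Case-insensitive match on the name or the description.
        needle = keyword.lower()
        return [
            p for p in self._products.values()
            if needle in p.name.lower() or needle in p.description.lower()
        ]


if __name__ == "__main__":
    catalogue = DataCatalogue()
    catalogue.publish(DataProduct(
        name="orders",
        domain="sales",
        description="Confirmed customer orders, one row per order",
        access="table sales.orders (hypothetical location)",
        schema={"order_id": "string", "amount": "decimal"},
        quality_notes="Refreshed hourly",
    ))
    for product in catalogue.search("customer"):
        print(product.name, "owned by", product.domain)
```

A real catalogue would add access control, versioning, and persistent storage. The point of the sketch is only that each entry carries enough description for a consumer to use the data without asking its creator.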

Ideally, you would also share data in a decoupled way, such that any changes to the data or its structure are additive instead of breaking, or that you share it in such a way that you don’t impact the consumers of the data.
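As a rough illustration of what "additive instead of breaking" can mean in practice, the Python sketch below compares two versions of a data product's structure. It assumes, purely for the example, that a structure is described as a mapping from field name to type; the function names breaking_changes and is_additive are hypothetical. Under that assumption, adding a field is additive, while removing a field or changing its type is breaking.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List the ways `new` would break a consumer that relies on `old`."""
    problems = []
    for field_name, old_type in old.items():
        if field_name not in new:
            # A consumer reading this field would now fail.
            problems.append(f"removed field: {field_name}")
        elif new[field_name] != old_type:
            # A consumer parsing the old type may now fail.
            problems.append(
                f"changed type of {field_name}: {old_type} -> {new[field_name]}"
            )
    return problems


def is_additive(old: dict[str, str], new: dict[str, str]) -> bool:
    """True when every existing field is kept unchanged (new fields are allowed)."""
    return not breaking_changes(old, new)


if __name__ == "__main__":
    v1 = {"order_id": "string", "amount": "decimal"}
    v2 = {"order_id": "string", "amount": "decimal", "currency": "string"}
    v3 = {"order_id": "int", "currency": "string"}

    print(is_additive(v1, v2))       # only a new field was added
    print(breaking_changes(v1, v3))  # a type changed and a field was removed
```

A data creator could run a check like this before publishing a new version, and publish breaking changes as a separate version so that existing consumers are not affected.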

Once you have followed these steps of data-as-a-product, in a decentralised and decoupled way, you have essentially not just enabled Data Mesh, but also Data Democratisation. You will be able to quickly test your new hypotheses, unlocking new value from your existing data. On top of this, you will be reducing the complexity of your existing platforms, enabling business agility, and cutting considerable cost from operations.