How and why Data Mesh is shaping the data management’s evolution
Data Mesh is completely changing the perspective on how we look at data inside a company. Read about what Data Mesh is and how it works.
Automation can help companies scale their Data Mesh journey
As every informative piece on Data Mesh should, I’d like to start by quoting the reference article on the topic, by Zhamak Dehghani:
[…] For data to be usable there is an associated set of metadata including data computational documentation, semantic and syntax declaration, quality metrics, etc; metadata that is intrinsic to the data e.g. its semantic definition, and metadata that communicates the traits used by computational governance to implement the expected behavior e.g. access control policies.
A “set of metadata” is therefore to be associated with our Data Products. I’ve read a lot of articles (I’ll cite some of them along the way) about why this metadata is important and what can theoretically be achieved by leveraging it, and they all make a lot of sense. But, as frequently happens when someone talks about Data Mesh, there’s often a lack of “SO WHAT”.
For this reason, after a brief introduction, I’d like to share the practical approach we are proposing. It’s open-source and certainly not perfect for every use case, but it’s evolving along with our real-world experience of driving enterprise customers’ Data Mesh journeys. Most importantly, it’s currently used in production, not just in articles, by some of our important customers to enable the automation their Data Mesh platforms require.
I believe that automation is the only way to survive in a jungle of domains willing to create Data Products. Automation with respect to:
These pillars are all mandatory to reach scale. They are complex to set up and spread, both as cultural and as technical requirements, but they are foundational if we want to build a Data Mesh and harvest that widely promised value from our data.
But how can we get to this level of automation?
Such a model must be general, technology agnostic, and designed as the key enabler for the automation of a Data Mesh platform. By platform I mean an ecosystem of components and tools that take actions based on specific sections of this metadata specification, which must be standardized to allow interoperability across the platform’s components and services that need to interact with each other.
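To make the idea of components acting on specific sections more concrete, here is a minimal Python sketch. It is not the actual specification: the section names (`outputPorts`, `workloads`), the field names, and the two handlers are hypothetical placeholders, chosen only to show how a standardized descriptor lets independent components each read the part they understand.

```python
from typing import Callable, Dict, List

# Hypothetical descriptor: section and field names are illustrative assumptions,
# not fields taken from the actual specification.
descriptor = {
    "id": "finance.invoices",
    "version": "1.0.0",
    "outputPorts": [{"name": "invoices_view", "technology": "sql"}],
    "workloads": [{"name": "daily_ingest", "schedule": "0 2 * * *"}],
}

Handler = Callable[[str, list], List[str]]


def provision_output_ports(product_id: str, ports: list) -> List[str]:
    # Stand-in for a component that would create the serving infrastructure.
    return [f"provision {product_id}/{p['name']} on {p['technology']}" for p in ports]


def schedule_workloads(product_id: str, workloads: list) -> List[str]:
    # Stand-in for a component that would register jobs with a scheduler.
    return [f"schedule {product_id}/{w['name']} at '{w['schedule']}'" for w in workloads]


# Each platform component declares which section of the descriptor it understands.
HANDLERS: Dict[str, Handler] = {
    "outputPorts": provision_output_ports,
    "workloads": schedule_workloads,
}


def plan_actions(desc: dict) -> List[str]:
    """Route each known section to its component and collect the planned actions."""
    actions: List[str] = []
    for section, handler in HANDLERS.items():
        actions.extend(handler(desc["id"], desc.get(section, [])))
    return actions


if __name__ == "__main__":
    for action in plan_actions(descriptor):
        print(action)
```

Because every component depends only on the shared section names, a new tool can be added to the platform by registering one more handler, without touching the others.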
To provide a broader view on the topic, I think it’s important to list some references that explain what brought us to do what we did (OK, now I’m building hype on purpose).
According to these references, there is still no clear metadata-based representation of a Data Product that specifically addresses the Data Mesh and its automation requirements.
We believe in a declarative, idempotent, and versioned Infrastructure-as-Code approach. The goal is a standardized yet customizable specification that provides this holistic representation of the Data Product, the architectural quantum of the Data Mesh.
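As a rough illustration of what "declarative, idempotent, and versioned" means in practice, the Python sketch below reconciles a deployed state with a desired descriptor. The function name `apply_descriptor`, the in-memory state, and the descriptor fields are assumptions made for this example and are not part of the specification or of any real platform API.

```python
import copy
from typing import Dict, List, Tuple


def apply_descriptor(
    desired: dict, deployed: Dict[str, dict]
) -> Tuple[Dict[str, dict], List[str]]:
    """Reconcile the deployed state with a declarative descriptor.

    Returns the new state and the actions taken. Applying the same
    descriptor twice produces no actions the second time (idempotency).
    """
    key = desired["id"]
    current = deployed.get(key)
    new_state = dict(deployed)  # never mutate the caller's state
    if current == desired:
        return new_state, []  # already converged: nothing to do
    action = "create" if current is None else "update"
    new_state[key] = copy.deepcopy(desired)
    return new_state, [f"{action} {key} -> version {desired['version']}"]


if __name__ == "__main__":
    # Hypothetical descriptors; field names are illustrative only.
    v1 = {"id": "finance.invoices", "version": "1.0.0", "outputPorts": ["invoices_view"]}
    v2 = {**v1, "version": "1.1.0", "outputPorts": ["invoices_view", "invoices_api"]}

    state: Dict[str, dict] = {}
    for desc in (v1, v1, v2):
        state, actions = apply_descriptor(desc, state)
        print(actions)
```

The descriptor states what should exist rather than the steps to get there, so the same file can be re-applied safely, and each change to it can be tracked as a new version in git.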
Principles involved:
The Data Product Specification aims to gather together crucial pieces of information related to all these aspects, under the strict ownership of the Data Product Owner.
To fill this gap, we started an open-source initiative to create a Data Product Specification:
https://github.com/agile-lab-dev/Data-Product-Specification
The repo contains detailed field-by-field documentation; here, I’d like to point out some features I believe are important:
The Data Product Specification itself covers the main components of a Data Product:
NOTE: what is presented here as a “conceptual” quantum can be (and, in some of our real-world implementations, is) split into its main component parts, each version-controlled with git in its own repository (all belonging to a common group, the Data Product one).
The Data Product Specification is intended to be heavily exploited by a central Data Mesh-enabling platform (as a set of components and tools) for:
As I said, this specification is evolving and can surely be improved. Being open-source, every contribution is welcome.
How can a Data Mesh journey reach scale? With automation that guarantees out-of-the-box compliance of Data Products with the Data Mesh principles.