How and why Data Mesh is shaping the data management’s evolution
Data Mesh is completely changing the perspective on how we look at data inside a company. Read about what Data Mesh is and how it works.
Discover the potential of data contracts in Witboost and how they enhance data quality, governance, and interoperability.
Data contracts have emerged as one of the hottest topics in data engineering, gaining momentum over the past two years due to their critical role in managing data quality, governance, and interoperability.
In this comprehensive guide, we'll delve into the transformative potential of data contracts and explore how Witboost's unique approach stands out in the crowded field.
Unlike typical discussions, which often focus on theory, this guide takes a detailed, practical look at how data contracts can be implemented, managed, and enforced using Witboost's tools and technologies. Whether you're a data engineer, business analyst, or organizational leader, this post will equip you with actionable insights to leverage data contracts using Witboost.
A data contract is a promise made by a data producer to its data consumers. The consumers accept this promise, turning it into a contract.
Data contracts aim to establish ownership boundaries around data, their movement and characteristics. A good data contract defines the following:
Check out our recent panel at Data Innovation Summit 2024, in which we discussed how data contracts enhance data governance and data quality.
For more on these concepts, you can deep-dive into these resources:
In a data contract's lifecycle there are three phases:
A data contract must be both machine-readable and human-readable. Humans need to be able to understand it and agree on it. For this reason, data contracts must be defined in a declarative way. That's why Witboost leverages YAML whenever something needs to be declared.
Each data contract has its own YAML descriptor. There are several specifications to describe a data contract:
At Witboost, we strongly believe that each company should choose its own standards, whether that means adopting existing ones or creating new ones.
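To make the idea of a declarative descriptor concrete, here is a minimal sketch in Python. Every name in it (`DataContract`, `ContractField`, `owner`, `business_term`, and so on) is a hypothetical illustration, not the Witboost descriptor format and not any existing specification. It only shows how a contract can be expressed as plain data that both people and machines can read.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContractField:
    # Hypothetical field-level metadata, for illustration only
    name: str
    type: str
    description: str = ""    # business description
    business_term: str = ""  # term taken from a business glossary


@dataclass
class DataContract:
    # Hypothetical top-level structure, not a real specification
    name: str
    version: str
    owner: str
    schema: list = field(default_factory=list)


contract = DataContract(
    name="finance.cashflow",
    version="1.0.0",
    owner="finance-team",
    schema=[
        ContractField("amount", "decimal", "Net cash movement", "CashFlow"),
        ContractField("booking_date", "date", "Date the movement was booked"),
    ],
)

# asdict() also converts the nested dataclasses, so the result is plain
# dictionaries and lists that map directly onto a YAML or JSON document.
descriptor = asdict(contract)
print(json.dumps(descriptor, indent=2))
```

Because the descriptor is only data, the same document can be rendered in a UI form, reviewed by a business analyst, and validated by an automated policy.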
The first step to support data contracts in Witboost is to create a template that defines the standard way to describe the contract and makes it discoverable to the whole organization.
This makes it easy for the entire organization to adopt it. Witboost templates are versionable, so organizations can evolve them in parallel with their data contracts. It is just as easy to start with an initial success case and scale up from there. You can also define multiple data contract templates based on your needs (batch, streaming, with circuit breaker, etc.).
The interactive UI guides you to properly define the contract while respecting the standard and avoiding errors. This way, guardrails are put in place to ensure consistency and quality.
Data contracts are complex objects, but the goal is to democratize them in the organization. They should be easy to use for users who are not familiar with git, YAML editing, and CI/CD.
Our goal in Witboost is to make them completely configurable and deployable within the UI. This ensures that their creation will not be perceived as an extra effort and will not create a tech barrier across all the domains.
Witboost Templates create a user-friendly interface that prevents typos and limits the range of options available. At the same time, the integrated AI/LLM can assist and speed up the creation of all contract parts.
An important part of a contract is the semantic one, which includes business descriptions and business terms from the glossary. Data contracts should be described from a business standpoint as well as a technical one.
Consumers need to fully understand the meaning of each field they are consuming. Thinking that your functional/business analyst will collaborate with you directly on git for this simply will... not work.
Witboost helps this cross-functional collaboration by minimizing git and YAML editing complexities, while still relying on both. Here's how it does it.
Witty, the personal assistant within Witboost, helps generate business descriptions and suggests the right business terms. Its LLM/AI capabilities understand the functional context in which the data contract is defined, for example a data contract in the finance domain that exposes CashFlow data. This also supports the user in the metadata curation process, an aspect that is often overlooked.
Witboost can be integrated with your existing business ontology and taxonomy, retrieving the right business terms to properly document the data contract.
When you define and declare a data contract you must also physically create it.
Data contracts are an architectural pattern, and you can implement them in several ways. Each company needs to choose at least one of them. Because Witboost is technology-agnostic, it places no limits on what this architectural pattern looks like.
The platform will never make an architectural decision on your behalf (as it should be everywhere). It's one of the founding principles that guarantees no technology lock-in, setting your company up for success.
Our vision is to enable organizations to embrace the change and evolve their practices.
Under the hood, the templates link with tech adapters/provisioners. These are microservices capable of understanding the descriptor of the data contract and transforming it into physical resources. More than just IaC, tech adapters take care of the application level and the full orchestration across components.
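As a rough illustration of the tech adapter idea, the sketch below maps a descriptor to an ordered list of provisioning steps. It is a toy under stated assumptions: the `kind` key, the `tech_adapter` registry, and the steps themselves are hypothetical, and the steps are plain descriptive strings rather than real commands. An actual adapter would be a microservice calling the target platform.

```python
from typing import Callable, Dict, List

# Hypothetical registry: one provisioning routine per kind of descriptor.
ADAPTERS: Dict[str, Callable[[dict], List[str]]] = {}


def tech_adapter(kind: str):
    """Register a function as the adapter for one descriptor kind."""
    def register(fn):
        ADAPTERS[kind] = fn
        return fn
    return register


@tech_adapter("table")
def provision_table(descriptor: dict) -> List[str]:
    # Build the ordered steps needed to materialize a table-based contract.
    columns = ", ".join(f"{f['name']} {f['type']}" for f in descriptor["schema"])
    return [
        f"create table {descriptor['name']} ({columns})",
        f"grant read access on {descriptor['name']} to approved consumers",
        f"schedule contract guardian for {descriptor['name']}",
    ]


def provision(descriptor: dict) -> List[str]:
    """Dispatch the descriptor to the adapter registered for its kind."""
    kind = descriptor["kind"]
    if kind not in ADAPTERS:
        raise ValueError(f"no tech adapter registered for kind {kind!r}")
    return ADAPTERS[kind](descriptor)


steps = provision({
    "kind": "table",
    "name": "finance.cashflow",
    "schema": [
        {"name": "amount", "type": "decimal"},
        {"name": "booking_date", "type": "date"},
    ],
})
for step in steps:
    print(step)
```

The point of the registry is that the descriptor stays technology-neutral: supporting a new technology means registering another adapter, not changing the contract.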
Look at this example in which the adopted architectural pattern is the following:
When a data contract descriptor is submitted, Witboost creates a full self-service experience by:
After creating the data contract descriptor, users do not need to deal with IaC. They don't need to know anything about it or how it works, and they don't need to fill in variables in Terraform modules.
A computational policy checks each data contract descriptor before the contract is created. This check guarantees that all data contracts in the organization respect the standards, traits, and lifecycle.
Let's look at an example. In Witboost you can define computational policies at both deploy time and runtime. Here, the platform team creates a deploy-time policy that we will call the "deploy time data contract guardian".
This guardian verifies that each new deployment of a data contract describes all its metadata completely and in the proper format. It can also check whether the contract introduces a breaking change compared to its previous version.
These controls protect the ecosystem of data producers and consumers. They also protect the platform so that it can enforce all the contracts.
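The two checks described above, metadata completeness and breaking-change detection, can be sketched as follows. This is not how Witboost policies are written. The required keys, the descriptor layout, and the definition of "breaking" (a removed field or a changed type) are assumptions chosen for illustration.

```python
# Hypothetical list of keys every descriptor must provide.
REQUIRED_KEYS = ("name", "version", "owner", "schema")


def check_completeness(descriptor: dict) -> list:
    """Return one message per missing or empty piece of metadata."""
    errors = [f"missing '{key}'" for key in REQUIRED_KEYS if not descriptor.get(key)]
    for f in descriptor.get("schema", []):
        if not f.get("description"):
            errors.append(f"field '{f.get('name')}' has no business description")
    return errors


def breaking_changes(previous: dict, new: dict) -> list:
    """Assumed definition: removing a field or changing its type is breaking."""
    old = {f["name"]: f["type"] for f in previous.get("schema", [])}
    cur = {f["name"]: f["type"] for f in new.get("schema", [])}
    problems = [f"field '{n}' was removed" for n in old if n not in cur]
    problems += [
        f"field '{n}' changed type from {old[n]} to {cur[n]}"
        for n in old
        if n in cur and cur[n] != old[n]
    ]
    return problems


def deploy_time_guardian(new: dict, previous: dict = None) -> list:
    """An empty list means the descriptor may be deployed."""
    errors = check_completeness(new)
    if previous is not None:
        errors += breaking_changes(previous, new)
    return errors


v1 = {
    "name": "finance.cashflow", "version": "1.0.0", "owner": "finance-team",
    "schema": [
        {"name": "amount", "type": "decimal", "description": "Net cash movement"},
        {"name": "booking_date", "type": "date", "description": "Booking date"},
    ],
}
# v2 drops booking_date and turns amount into a string
v2 = {**v1, "version": "1.1.0", "schema": [
    {"name": "amount", "type": "string", "description": "Net cash movement"},
]}

print(deploy_time_guardian(v1))
print(deploy_time_guardian(v2, previous=v1))
```

Returning a list of messages instead of a boolean lets the same function feed a UI, a CI log, or a pull request comment.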
Policies can also be easily integrated into pull requests by implementing a GitHub Action or a webhook. They surface their results directly in the PR discussion and prevent non-compliant contracts from being approved and merged.
Data contracts can be versioned in two ways:
Finally, you can deploy through the UI or by triggering the CI/CD according to the git flow of the company. You can see the result of the deployment in the Witboost UI. This helps you understand potential errors and lets you obtain the URLs to inspect physical resources.
Once the data contract is in production, it will be visible in the marketplace and Witboost will start the enforcement.
Enforcing a data contract involves verifying if the promise is respected at runtime. The platform must be able to do this by itself.
To verify the promise, Witboost uses a runtime local policy called contract guardian, which runs for each data contract. This means that the contract guardian is only defined once, while each data project gets its own instance of the guardian.
The guardian can be implemented with any kind of pattern. You can use a microservice, a Spark job, or anything else, because there are no limitations.
It all depends on your organization's architectural choices.
The only requirement is that the guardian implements an API contract, so that Witboost can treat it as a local policy.
Once the guardian is up and running, it will take the data contract declaration as an input. It will then verify all the traits against the actual data. Every time the contract guardian detects a breach, it raises a policy result in Witboost.
When Witboost receives a "K.O." policy result, it notifies the data contract owner and all the downstream data consumers so they can react quickly.
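The runtime flow described above can be sketched as a small function that takes the contract declaration and a sample of the actual data, then produces a policy result. Everything here is a hypothetical illustration: the `PolicyResult` shape, the `nullable` trait, and the `recipients` helper are assumptions, not the API contract that Witboost defines for local policies.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyResult:
    # Hypothetical result shape; status is "OK" or "K.O."
    contract: str
    status: str
    breaches: list = field(default_factory=list)


def contract_guardian(contract: dict, rows: list) -> PolicyResult:
    """Verify the declared traits against the actual data."""
    breaches = []
    for f in contract["schema"]:
        name = f["name"]
        nulls = sum(1 for row in rows if row.get(name) is None)
        # Assumed trait: fields are non-nullable unless declared otherwise
        if nulls and not f.get("nullable", False):
            breaches.append(f"{name}: {nulls} null value(s) in a non-nullable field")
    status = "K.O." if breaches else "OK"
    return PolicyResult(contract["name"], status, breaches)


def recipients(result: PolicyResult, owner: str, consumers: list) -> list:
    """On a breach, the owner and every downstream consumer are notified."""
    return [owner, *consumers] if result.status == "K.O." else []


contract = {
    "name": "finance.cashflow",
    "schema": [
        {"name": "amount"},
        {"name": "booking_date"},
        {"name": "note", "nullable": True},
    ],
}
rows = [
    {"amount": 10.0, "booking_date": "2024-01-01", "note": None},
    {"amount": None, "booking_date": "2024-01-02", "note": "adjustment"},
]

result = contract_guardian(contract, rows)
print(result.status, result.breaches)
print(recipients(result, "finance-team", ["treasury", "reporting"]))
```

The guardian logic is written once and invoked with a different declaration for each contract, which mirrors the "defined once, one instance per contract" idea.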
All data contract details are visible in the marketplace, including breaches happening in real time.
Finally, data contracts in Witboost are discoverable and accessible. Data consumers who want to subscribe to a contract can request access through the access control mechanism. Requesting access to a data contract implicitly means accepting its terms, conditions, and clauses.
In the rapidly evolving landscape of data engineering, Witboost's approach to data contracts offers a distinctive blend of flexibility, user-friendliness, and technology-agnosticism.
By emphasizing advanced customization, AI-driven assistance, seamless integration capabilities, and a strong focus on security and governance, Witboost ensures that organizations can implement data contracts effectively without technological lock-in or excessive complexity.
As the concept moves forward, the ability to adapt and evolve with changing data practices will be crucial, and Witboost is uniquely positioned to support this journey. Explore the power of data contracts with Witboost, and transform your data management practices into a streamlined, future-proof system that delivers unparalleled value and reliability.