Why You Shouldn't Build Your Own Data Product Management Platform

Written by Witboost Team | 4/30/25 9:00 AM

With the explosion of Data Mesh and the Data Product concept, data architecture, data platform, and SDLC teams have been working to understand how to enable the adoption of these new patterns within their organizations.

The adoption of data products promises to solve three critical challenges within data organizations:

Data Ownership
Data Quality
Data Intelligibility

By increasing levels of ownership and autonomy, connecting the software development lifecycle (SDLC) with the way data use cases are developed, and providing platform infrastructure as a service, it is possible to improve the quality of data products at scale and accelerate the speed at which data connects to business problems.

However, adopting data products and increasing autonomy in the development of data-driven business solutions introduces several challenges:

It is necessary to change how people work, as they are traditionally accustomed to different processes.
Standardizing the architectural solutions adopted within data products is essential to prevent IT governance from losing control over tools, practices, and skills and, most importantly, to ensure technical interoperability between data products.
Governance processes must be automated to prevent excessive autonomy from leading to security and regulatory compliance issues.
Effort and data duplication within the ecosystem must be minimized to improve efficiency.
Organizations must have visibility of their data products across all domains, how they are performing, and ensure that users can easily find them.
Reducing the cognitive load of those developing these data solutions is crucial to prevent them from spending too much time on technical tasks instead of focusing on the actual use cases.

Many companies have decided to build an in-house platform to pursue these objectives. The reason is clear: these are not just technological problems but soft problems that involve processes and people. The real challenge is how people work, ensuring they follow best practices and produce high-quality artifacts.

Once organizations recognize these challenges and understand that solving them is critical for successfully adopting and scaling a Data Product-centric model within complex and diverse enterprises, IT has a huge opportunity.

For IT, this is a great chance to build something valuable for the business and position itself as an enabler of transformation. However, this opportunity, combined with the natural inclination of technical teams to build things, often leads to the decision to develop a completely in-house platform for managing data products.

We call these platforms DIY (Do-It-Yourself) Data Product Management Platforms.

Before creating Witboost, I developed several DIY platforms for Agile Lab customers. Over the following years, I carefully observed how these platforms evolved and what happened to them. Additionally, these experiences provided me with the information necessary to build the business case for Witboost, which we will share and analyze below.

Technical Solution Analysis

Now, let’s examine how DIY platforms are typically built.

DIY platforms almost always start with a strong reliance on CI/CD and Terraform (or other Infrastructure-as-Code tools). This happens for several reasons:

Companies already have experience building SDLC platforms, primarily for application and operational environments such as microservices.
An Infrastructure-as-Code (IaC) tool is already in place within the company to automate infrastructure provisioning.
Platform teams in large enterprises consist of technology experts rather than product experts. They prioritize what they believe to be important or what is technically interesting, rather than analyzing real user problems and what would provide the most value. This is not a criticism; it is a normal phenomenon. Throughout our experience, we've built many frameworks that we thought were brilliant, only to later discover that nobody wanted to use them.

The typical setup is a Data Product Descriptor, stored in Git and processed by CI/CD pipelines that trigger Terraform scripts accordingly.

Whenever a change is made to the Data Product Descriptor:

The CI/CD pipeline is triggered, detects the changes in the descriptor, and determines what has been modified.
The pipeline then activates the Terraform scripts to apply the necessary updates.
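A minimal sketch of the change-detection step, assuming the descriptor has already been parsed from YAML into a dict and that each output port maps to one Terraform module. The field names (outputPorts, name) and the plan_changes helper are hypothetical, not a standard descriptor format:

```python
from typing import Any


def index_by_name(components: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Key a list of descriptor components by their (hypothetical) 'name' field."""
    return {c["name"]: c for c in components}


def plan_changes(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Compare two descriptor versions and classify each output port.

    The pipeline would then map these lists to Terraform apply/destroy runs.
    """
    before = index_by_name(old.get("outputPorts", []))
    after = index_by_name(new.get("outputPorts", []))
    return {
        "create": sorted(after.keys() - before.keys()),
        "destroy": sorted(before.keys() - after.keys()),
        "update": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


old_descriptor = {
    "name": "customer-orders",
    "outputPorts": [
        {"name": "orders_table", "kind": "table", "retentionDays": 30},
        {"name": "legacy_export", "kind": "file"},
    ],
}
new_descriptor = {
    "name": "customer-orders",
    "outputPorts": [
        {"name": "orders_table", "kind": "table", "retentionDays": 90},
        {"name": "orders_view", "kind": "view"},
    ],
}

print(plan_changes(old_descriptor, new_descriptor))
# {'create': ['orders_view'], 'destroy': ['legacy_export'], 'update': ['orders_table']}
```

Even this toy version shows where complexity creeps in: every new component type needs its own diff and mapping rules, and all of that logic lives inside the pipeline.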

This approach seems logical and functional at first glance. After an initial MVP, teams begin expanding their capabilities.

However, as complexity grows, a series of problems starts to appear. Let's look at them one by one.

Issues with CI/CD

1. How can you control what the user is creating?

The pipeline must validate the structure and content of the Data Product Descriptor.
It must also enforce governance policies to ensure Terraform scripts comply with security and architecture rules.
As the pipeline grows in complexity, it must be modularized and managed like software, introducing dependencies and maintenance challenges.
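To make the first question concrete, here is a sketch of the checks such a pipeline step tends to accumulate: structural validation of the descriptor plus one governance rule. The required fields and the rule "ports exposing PII must be classified as restricted" are hypothetical examples, not rules from any specific standard. Errors are collected rather than raised one at a time, so users see every problem in a single run:

```python
from typing import Any

# Hypothetical required top-level fields for a descriptor.
REQUIRED_FIELDS = ("name", "domain", "owner", "outputPorts")


def validate_descriptor(descriptor: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the descriptor passed."""
    errors: list[str] = []

    # Structural checks: required fields must be present.
    for field in REQUIRED_FIELDS:
        if field not in descriptor:
            errors.append(f"missing required field '{field}'")

    # Content checks on each output port.
    for i, port in enumerate(descriptor.get("outputPorts", [])):
        where = f"outputPorts[{i}] ({port.get('name', '?')})"
        if "name" not in port:
            errors.append(f"{where}: missing 'name'")
        # Governance rule (hypothetical): ports exposing PII must be restricted.
        if port.get("containsPII") and port.get("classification") != "restricted":
            errors.append(f"{where}: PII data must have classification 'restricted'")

    return errors


descriptor = {
    "name": "customer-orders",
    "domain": "sales",
    "outputPorts": [
        {"name": "orders_table", "containsPII": True, "classification": "internal"},
    ],
}
for error in validate_descriptor(descriptor):
    print(error)
# missing required field 'owner'
# outputPorts[0] (orders_table): PII data must have classification 'restricted'
```

Every additional rule like this is more code inside the pipeline, which is how the pipeline gradually turns into software that needs its own lifecycle.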

2. How do you allow sufficient flexibility in Data Products?

Traditional CI/CD pipelines are linear sequences of predefined steps that work for simple Data Products.
However, as use cases grow more complex, dynamic workflows become necessary, and introducing dynamic workflows into CI/CD pipelines is incredibly difficult.
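The difference between a linear pipeline and a dynamic workflow can be sketched with Python's standard graphlib module (3.9+): instead of a fixed list of steps, each component declares what it depends on, and the execution order, including which steps could run in parallel, is derived per Data Product. The component names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical components of one Data Product and the components they depend on.
# Another Data Product would declare a different graph; no order is hard-coded.
dependencies = {
    "storage": set(),
    "ingestion_job": {"storage"},
    "transformation_job": {"ingestion_job"},
    "output_view": {"transformation_job"},
    "quality_checks": {"transformation_job"},
    "catalog_entry": {"output_view", "quality_checks"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle

# Deploy in "waves": every component in a wave has all its dependencies deployed,
# so the components within one wave could be deployed in parallel.
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

for n, wave in enumerate(waves, start=1):
    print(f"wave {n}: {', '.join(wave)}")
# wave 1: storage
# wave 2: ingestion_job
# wave 3: transformation_job
# wave 4: output_view, quality_checks
# wave 5: catalog_entry
```

A linear CI/CD definition would have to hard-code one such order for every Data Product; an orchestrator derives it from the descriptor and also has to detect invalid graphs, such as cycles.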

3. Who manages CI/CD updates and testing?

When the CI/CD pipeline becomes complex software, it requires its own SDLC.
Who ensures the CI/CD system is stable and backward-compatible?
If the CI/CD pipeline breaks, all Data Products are impacted, meaning the entire company is affected.

4. As complexity increases, debugging becomes harder.

Users struggle to configure the pipeline correctly and interpret error messages, leading to frustration and wasted time.

Problems That Arise with Terraform and Infrastructure-as-Code

1. IaC is designed for infrastructure, not for managing the application lifecycle.

Creating resource groups, workspaces, or databases works fine.

However, managing Data Products requires much more than infrastructure provisioning, such as:

Schema changes
Managing visibility rules and ACLs
Handling policy enforcement dynamically
Reverse engineering

Terraform was never designed for this, leading to unstable and fragile workarounds.
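Schema evolution illustrates the gap. Terraform can apply a new table definition, but it does not know who consumes the table, so deciding whether a change is safe is a lifecycle question rather than a provisioning one. A minimal sketch, assuming schemas are simple column-to-type mappings and using a deliberately simplified rule set (removing or retyping a column is breaking, adding one is not):

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Suggest a version bump for a schema change (simplified rules, for illustration).

    - removing a column or changing its type breaks consumers -> "major"
    - adding a column is backward compatible                  -> "minor"
    - no change                                               -> "none"
    """
    removed = old.keys() - new.keys()
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    added = new.keys() - old.keys()

    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "none"


v1 = {"order_id": "bigint", "amount": "decimal(10,2)", "created_at": "timestamp"}
v2 = {**v1, "currency": "string"}                       # adds a column
v3 = {"order_id": "string", "amount": "decimal(10,2)"}  # retypes one column, drops another

print(classify_schema_change(v1, v2))  # minor
print(classify_schema_change(v1, v3))  # major
```

Real rules are richer (nullability, type widening, renamed columns), which is why this logic tends to become a dedicated platform capability rather than a Terraform workaround.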

2. Terraform scripts and variables must reside within the Data Product repository.

This means users need to understand Terraform, which adds technical complexity.
Users can modify Terraform files, potentially breaking governance and implementation patterns.

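3. Pre-existing Data Products are difficult to import.

Enterprises often already have many Data Products or similar artifacts.
Terraform can bring pre-existing resources under management only through explicit imports (the terraform import command or, since Terraform 1.5, import blocks) that map each existing object to a resource in the configuration, which causes significant friction when onboarding many existing assets.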
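Onboarding existing assets is essentially reverse engineering: discover what already exists and turn it into descriptors the platform can manage. A minimal sketch, assuming a hypothetical inventory of existing resources where each one may carry a data_product tag; untagged resources are reported for manual review instead of being guessed:

```python
from collections import defaultdict
from typing import Any

# Hypothetical inventory, e.g. exported from a cloud account or a data catalog.
inventory = [
    {"type": "table", "name": "sales.orders", "tags": {"data_product": "customer-orders"}},
    {"type": "view", "name": "sales.orders_daily", "tags": {"data_product": "customer-orders"}},
    {"type": "bucket", "name": "raw-orders", "tags": {"data_product": "customer-orders"}},
    {"type": "table", "name": "tmp.scratch", "tags": {}},
]


def draft_descriptors(
    resources: list[dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], list[str]]:
    """Group tagged resources into draft descriptors; return untagged ones separately."""
    drafts: dict[str, dict[str, Any]] = defaultdict(lambda: {"components": []})
    unassigned: list[str] = []
    for res in resources:
        product = res["tags"].get("data_product")
        if product is None:
            unassigned.append(res["name"])  # needs a human decision
            continue
        drafts[product]["name"] = product
        drafts[product]["components"].append({"type": res["type"], "ref": res["name"]})
    return dict(drafts), unassigned


drafts, unassigned = draft_descriptors(inventory)
print(drafts["customer-orders"]["components"])
print("needs review:", unassigned)
```

The grouping is the easy part; most of the effort goes into mapping each discovered resource onto the templates and state that the provisioning tooling expects.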

🚨 Just to be clear:

I am not saying you shouldn’t use CI/CD or Terraform.
I am saying that these tools alone are insufficient to achieve scalable and successful Data Product adoption.

User Experience Challenges

One of the pillars of Data Product Management is creating cross-functional teams, which means integrating:

Software engineers to build data pipelines.
Data modeling & governance experts to ensure interoperability and quality.
Business stakeholders to define quality rules and ensure alignment with business needs.

If we truly want this model to succeed, we cannot:

❌ Force non-technical users to edit YAML and Terraform files.
❌ Make them work with Git commits and CI/CD debugging cycles.

🚨 Without a strong UX, business users will disengage, pushing all responsibility back to IT, and defeating the entire purpose of Data Products.

DIY platforms often fail because they are designed for hard-core engineers rather than for the business and governance users who also need to work with them.

Economic Analysis

Before making any evaluations, it is crucial to recognize that technical teams are strongly biased toward underestimating the effort required to build any platform or IT system. The stronger the internal technical team, the greater this bias tends to be.

Below is a list of the main capabilities that a Data Product Management Platform should have. I will divide them into Platform Foundation and Data Product Management-specific features.

Platform Foundation:

SSO (Single Sign-On)
Administration Panel
Integration with SDLC
Technology Abstraction
Notification Engine → to keep people informed about platform and data product changes
Authorization Workflows
RBAC (Role-Based Access Control)
Endpoint Security
Audit Logging
Professional UI/UX → must meet usability and intuitiveness criteria
Accessibility Compliance
Error Handling
Installation and Upgrade Management
Release Management
Documentation & Tutorials
User Support System

Data Product Management-Specific Features:

Data Product Automation for infrastructure creation
Data Product Automation for application deployment
Reverse Engineering of already created infrastructure
Data Product Descriptor Editing
Validation and Testing of Data Product descriptors
Data Product Versioning → minor and major versions
Data Product Cloning
Data Product Deployment Orchestration → must be flexible to support different data products, technologies, and patterns
Access Control List (ACL) Management
Data Product Marketplace
Data Product Prototyping
Computational Policy Engine → for deployment and runtime policy enforcement
Integration with Data Catalogs

Estimating Development Effort

To avoid biases, we asked ChatGPT to estimate the development cost for a platform with these characteristics.
Here is its response:

💡 Estimated total effort: ~3,800 man-days

In our experience, this number is very optimistic, but let’s assume it is a valid baseline.

If we assume a team structure consisting of:

1 Product Owner
1 UX/UI Designer
1 Software Architect
~13 Developers

After one year of focused work (without distractions or external dependencies), this team could deliver the platform's foundation. Keep in mind that a team of 16–17 people is not easy to manage and coordinate; companies very often start with much smaller teams before realizing they need more capacity to speed up the process.
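A quick capacity check makes the link between the estimate and the team explicit. It assumes about 220 working days per person per year, an assumption made here for illustration that varies by country and contract:

```latex
\begin{aligned}
\text{annual capacity of } 1 + 1 + 1 + 13 = 16 \text{ people} &\approx 16 \times 220 = 3{,}520 \text{ man-days} \\
\text{people needed to deliver } 3{,}800 \text{ man-days in one year} &\approx 3{,}800 / 220 \approx 17.3
\end{aligned}
```

In other words, even this team supplies slightly less than the optimistic 3,800 man-day baseline in a year.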

However, at this stage, a significant amount of work would still remain, including:

Technology integrations
Implementation of Data Product patterns
Governance policy enforcement
And much more…

This creates two major problems:

1. Time to Market

What do business teams do while waiting for the platform to become fully operational?

Do they create Data Products informally, without governance?
Are we generating technical debt that will be painful and costly to fix later?

Delaying the official platform may result in the unstructured development of Data Products, which contradicts the very purpose of governance. Paying down the technical and data debt accumulated before the platform reaches a usable state will also cost money.

2. Long-Term Maintenance Costs

In the software industry, a common rule of thumb puts the ongoing maintenance cost of developed code at about 15% of the initial development cost per year.

So, even assuming that no new features are added (an unrealistic assumption), let’s do some calculations:

Development Cost (CAPEX)

Developing the core functionalities of the platform will require between €1.5M and €3M.

This estimate assumes an FTE daily cost of €400–800 (varies by geography).

Annual Maintenance Cost (OPEX)

The developed platform will generate an annual maintenance cost of €225K–€450K.

This is equivalent to permanently occupying ~3 full-time employees (FTEs).
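The CAPEX and OPEX ranges follow from the 3,800 man-day baseline, the €400–800 daily rate, and the 15% annual maintenance rule; the text rounds them to €1.5M–€3M and €225K–€450K. The FTE conversion assumes about 220 working days per year, an assumption made here for illustration:

```latex
\begin{aligned}
\text{CAPEX} &= 3{,}800 \text{ man-days} \times \text{€}400\text{–}800 \approx \text{€}1.52\text{M}\text{–}\text{€}3.04\text{M} \\
\text{annual maintenance} &= 0.15 \times \text{CAPEX} \approx \text{€}228\text{K}\text{–}\text{€}456\text{K} \\
\text{FTE equivalent} &= 0.15 \times 3{,}800 / 220 = 570 / 220 \approx 2.6
\end{aligned}
```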

💡 However, platforms do not remain static: they require new features and improvements.
Thus, we must also budget at least €500K annually for new feature development.
These new features will, in turn, generate even higher maintenance costs over time. Progressively, after 3–4 years, the combined annual cost of maintenance and new development will reach or exceed €1M.
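To see how these costs compound, here is a small projection using the assumptions stated in this section: initial CAPEX of €1.5M–€3M, 15% annual maintenance on all code written so far, and €500K of new development every year. It is a simplified model for illustration (it ignores inflation, code retirement, and the 2–3x realism factor discussed below), not a forecast:

```python
def project_costs(capex: float, yearly_dev: float = 500_000,
                  maintenance_rate: float = 0.15, years: int = 5) -> list[dict]:
    """Yearly running cost when maintenance is a fixed share of all code built so far."""
    built = capex  # cumulative development investment that must be maintained
    rows = []
    for year in range(1, years + 1):
        maintenance = maintenance_rate * built
        rows.append({
            "year": year,
            "maintenance": maintenance,
            "new_development": yearly_dev,
            "total": maintenance + yearly_dev,
        })
        built += yearly_dev  # this year's features must be maintained from next year on
    return rows


for capex in (1_500_000, 3_000_000):
    print(f"CAPEX €{capex / 1e6:.1f}M")
    for row in project_costs(capex):
        print(f"  year {row['year']}: maintenance €{row['maintenance'] / 1e3:,.0f}K, "
              f"total €{row['total'] / 1e3:,.0f}K")
```

Under this model, maintenance grows from €225K–€450K in year one to €525K–€750K in year five, while the total yearly spend reaches €950K–€1,175K in year four.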

Is This Estimate Too High?

No — if you want a platform that can evolve.

This estimate is very conservative if the goal is to build a system that does not overfit current requirements and can expand in the future.

Why?

If the platform is too rigid, it may become obsolete when the next paradigm shift occurs (e.g., Data Fabric, Agentic AI).
If the platform is not built with modularity, its evolution will become too expensive.

💡 From our experience, a more realistic estimate is 2–3x higher than 3,800 man-days to reach a high-quality level.

Investment Strategy

Wardley Maps are a well-established tool for strategizing internal investments and identifying where it makes sense to develop internal tools — and where it does not.

Data Product Management is a domain with an emerging but rapidly maturing market. Several companies have already invested over €5 million in R&D, making it highly unlikely that an internal effort could achieve better results, faster time to market, or any meaningful differentiation.

In the 1990s, companies built custom in-house databases. Today, databases are fully commoditized, and developing one internally does not offer a competitive advantage. A similar shift is happening in Data Product Management.

Final Considerations

🚨 Why does this matter?
Because many companies underestimate the complexity of building and maintaining a Data Product Management Platform.

In many cases, an internal DIY approach:

❌ Takes too long to reach adoption
❌ Fails to meet real business needs
❌ Becomes a long-term maintenance burden

💡 This is why many companies are now considering hybrid models:
✅ Buying a mature platform as a foundation
✅ Extending it with adapters, webhooks, policies, plugins
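The "extend" half of the hybrid model usually means implementing a small contract that the platform calls, rather than forking the platform itself. A minimal sketch of such a contract, assuming a hypothetical provisioner interface and adapter registry; it is not the API of Witboost or any other product:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ProvisionResult:
    ok: bool
    message: str


class Provisioner(Protocol):
    """Hypothetical contract a platform could call for each component it deploys."""

    def validate(self, component: dict) -> list[str]: ...
    def provision(self, component: dict) -> ProvisionResult: ...


class InHouseWarehouseProvisioner:
    """Company-specific adapter: the only part built and maintained internally."""

    def validate(self, component: dict) -> list[str]:
        return [] if "schema" in component else ["component must declare a 'schema'"]

    def provision(self, component: dict) -> ProvisionResult:
        # A real adapter would call the company's own APIs or IaC modules here.
        return ProvisionResult(ok=True, message=f"created {component['name']}")


# The platform dispatches to registered adapters by component type.
registry: dict[str, Provisioner] = {"warehouse_table": InHouseWarehouseProvisioner()}


def deploy(component: dict) -> ProvisionResult:
    adapter = registry[component["type"]]
    errors = adapter.validate(component)
    if errors:
        return ProvisionResult(ok=False, message="; ".join(errors))
    return adapter.provision(component)


print(deploy({"type": "warehouse_table", "name": "orders", "schema": {"id": "bigint"}}))
print(deploy({"type": "warehouse_table", "name": "orders"}))
```

The design choice is the boundary: the bought platform owns orchestration, UI, and governance, while the company maintains only the adapters for its own technologies.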

This approach significantly reduces risk, accelerates time to market, and controls long-term costs. 🚀
