The Real Reason Why Chat with Your Data Fails (And Why the Problem Isn’t AI)
Why treating everything as code is the only path to data that AI can actually trust.
Ask any CDO whether their organisation is AI-ready, and you will hear "yes" — or at least "we're getting there." Press further and a familiar picture emerges: a proof of concept that ran successfully, a handful of LLM-powered dashboards, and an AI strategy deck that was presented to the board last quarter.
Now ask a different question: where does your metadata live? The answer is almost always a patchwork.
This is the real cost of AI-readiness that nobody puts in the business case. Not the model. Not the compute. Not the talent. The cost is the structural inability to produce data that AI can trust — because the artifacts that describe, govern, and guarantee that data are scattered across a dozen systems, maintained by different people, following different processes, with no automated quality control whatsoever.
The industry has spent two decades perfecting infrastructure-as-code. It is time to apply the same discipline to everything else — metadata, governance policies, data contracts, documentation, and quality rules. Not as a nice-to-have, but as the non-negotiable prerequisite for any serious AI initiative.
Infrastructure-as-code won. Nobody debates this anymore. Terraform, Pulumi, CloudFormation — the idea that infrastructure should be defined declaratively, versioned, reviewed, and deployed through CI/CD is settled practice. But step outside infrastructure and the picture collapses.
In a typical enterprise data platform, the assets that matter most for AI-readiness are managed through fundamentally different — and incompatible — workflows:
| Artifact | Where It Lives | How It Changes | Quality Gate |
| --- | --- | --- | --- |
| Pipeline code | Git repository | PR → review → CI/CD | Automated tests |
| Business metadata | Catalog UI | Manual edits, disconnected from change management | Human review |
| Governance policies | PDF/wiki/email | Committee meetings | Human judgment |
| Data contracts | Spreadsheet/registry | Ad-hoc updates | None, or manual |
| Documentation | Confluence/wiki | Sporadic updates | None |
| Quality rules | Embedded in pipelines | Code changes | Partial |
This fragmentation is not merely inconvenient. It is structurally incompatible with automated quality control. You cannot run a governance check across artifacts that live in six different systems. You cannot enforce metadata completeness if metadata is edited through a point-and-click interface with no validation pipeline. You cannot guarantee that documentation reflects reality if documentation and implementation follow completely different change management processes.
The result is what we call the Metadata Fragmentation Tax: an invisible but compounding cost that grows with every data product. Each product adds more untracked metadata, more ungoverned policies, more stale documentation. The organisation pays this tax in duplicate data, inconsistent semantics, compliance gaps, and — most critically — data that an AI model cannot trust because nobody can prove it means what it claims to mean.
The current conversation around AI-readiness focuses almost entirely on the data itself: is it clean? Is it complete? Is it semantically rich? These are the right questions — but they are asked in the wrong frame.
Making data AI-ready is not a one-time curation project. It is a continuous production process that must produce trustworthy, self-describing, semantically unambiguous data — every day, at scale, across hundreds of data products.
It is a matter of discipline and automation. This is a software engineering problem, not a data stewardship problem.
An AI-ready data product must continuously satisfy a demanding set of requirements: it must be trustworthy, self-describing, and semantically unambiguous, at scale and at every release.
In manufacturing, nobody would ship a product without quality gates and well-defined industrial processes. Data management should be no different. If you want to produce AI-ready data, you need processes, and you need quality controls. The question is: what kind of process makes this possible at scale?
The answer is a principle that software engineering settled decades ago: Everything as Code.
This does not mean that every contributor must write YAML or JSON. It means that the underlying model for authoring, change management, and release follows the same discipline used in software development — regardless of whether the user interacts through a form, a template, or a text editor. What matters is not the input surface. It is how the artifacts are represented and managed internally.
When everything-as-code is applied to a data platform, the following artifacts all become versioned, declarative definitions stored in version control:
| Artifact | As-Code Form | What Changes |
| --- | --- | --- |
| Metadata | Declarative descriptors in Git | Curated through PR workflow; validated automatically |
| Governance policies | Computational policy definitions | Evaluated programmatically at deploy time, not by committee |
| Data contracts | Machine-parsed contract specs | Breaking changes detected automatically; versioned semantically |
| Documentation | Structured docs co-located with code | Updated in the same PR as the implementation change |
| Quality rules | Declarative quality assertions | Tested pre-production; failures block deployment |
| Configuration | Environment-specific config files | Promoted across environments via CI/CD |
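To make the "data contracts" row concrete, here is a minimal sketch of automated breaking-change detection between two versions of a contract. It assumes contracts are parsed from versioned files in Git into plain dictionaries; the field names and structure are illustrative, not a real contract specification.

```python
# Sketch: detect backward-incompatible changes between two versions of a
# data contract. Assumes each contract is a dict parsed from a versioned
# YAML/JSON file; the schema shown is illustrative only.

def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes in `new` that would break consumers of `old`."""
    old_fields = {f["name"]: f for f in old.get("fields", [])}
    new_fields = {f["name"]: f for f in new.get("fields", [])}

    issues = []
    for name, field in old_fields.items():
        if name not in new_fields:
            issues.append(f"field removed: {name}")      # consumers break
        elif new_fields[name].get("type") != field.get("type"):
            issues.append(f"type changed: {name}")       # semantics break
    return issues

old = {"fields": [{"name": "order_id", "type": "string"},
                  {"name": "amount", "type": "decimal"}]}
new = {"fields": [{"name": "order_id", "type": "int"}]}

print(breaking_changes(old, new))
```

A non-empty result would force a major version bump (or block the change) instead of letting an edit slip through a spreadsheet unnoticed.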
The critical insight: when every artifact follows the same change management process, governance becomes enforceable in a single place — the delivery pipeline. The same automated checks validate infrastructure definitions, metadata descriptors, documentation completeness, contracts, and policies before any change is promoted. Governance becomes systematic rather than procedural.
It is also important to distinguish between how artifacts are stored and how they are presented. The internal representation may be a declarative definition in version control, but the way it is visualised can vary: metadata appears as searchable catalog entries, policies as structured rules, documentation as rich pages. The platform optimises presentation for usability while maintaining a consistent operational model behind the scenes.
Everything-as-code transforms governance from an organisational function into an engineering capability. When governance policies are expressed as computational rules rather than PDF guidelines, they can be evaluated automatically at every deployment — without human intervention, without committee meetings, without email chains.
This is the Governance Shift-Left Model, built on four pillars:
Pillar 1 — Metadata as code. Metadata is not an afterthought curated in a catalog UI after the fact. It is a first-class artifact produced during development, versioned alongside the code, and validated before deployment. If metadata is incomplete or meaningless, the CI/CD pipeline rejects the deployment.
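A minimal sketch of what such a CI gate might look like. The descriptor shape and the set of required keys are assumptions for illustration; a real platform would define its own.

```python
# Sketch: a CI step that rejects a deployment when the metadata
# descriptor is incomplete. REQUIRED and the descriptor shape are
# illustrative assumptions, not a real platform's schema.

REQUIRED = {"name", "domain", "owner", "description", "classification"}

def validate_metadata(descriptor: dict) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    missing = sorted(REQUIRED - descriptor.keys())
    empty = sorted(k for k in REQUIRED & descriptor.keys()
                   if not str(descriptor[k]).strip())
    return [f"missing: {k}" for k in missing] + [f"empty: {k}" for k in empty]

descriptor = {"name": "customer-orders", "domain": "sales",
              "owner": "", "description": "Daily order snapshots"}

print(validate_metadata(descriptor))
# In CI, a non-empty list would make this step exit non-zero
# and block the deployment.
```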
Pillar 2 — You build it, you govern it. The team that builds the data product is responsible for its governance. Policies are not imposed externally after the fact — they are injected into the team's development workflow as automated checks.
Pillar 3 — Turn guidelines into guardrails. Written governance guidelines become computational policies. "All data contract fields must have a description" is not a guideline in a wiki — it is an automated check that blocks deployment if violated. "DORA classification must be present" is not a recommendation — it is a deploy-time policy.
Pillar 4 — Context-aware computational policies. Policies are not one-size-fits-all. A data product classified as DORA-critical requires backup policy definitions, RTO and RPO declarations, and audit trail integration. A non-critical product does not. The policy engine evaluates context — classification, domain, sensitivity level — and applies the right rules automatically.
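The fourth pillar can be sketched as a small policy engine that reads the product's classification and applies the matching rules. The classification value and required keys below are illustrative assumptions.

```python
# Sketch: context-aware policy evaluation. A product classified as
# DORA-critical must declare backup, RTO, RPO and audit trail; a
# non-critical product is exempt. Field names are illustrative.

def evaluate(product: dict) -> list[str]:
    """Apply classification-dependent rules and return violations."""
    violations = []
    if product.get("classification") == "dora-critical":
        for key in ("backup_policy", "rto", "rpo", "audit_trail"):
            if key not in product:
                violations.append(f"{key} required for DORA-critical products")
    return violations

critical = {"name": "payments", "classification": "dora-critical", "rto": "4h"}
internal = {"name": "marketing-stats", "classification": "internal"}

print(evaluate(critical))  # three declarations still missing
print(evaluate(internal))  # no context-specific rules apply
```

The same engine evaluates every product, but the rules it applies depend entirely on the declared context.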
Concrete deploy-time policies replace manual governance: mandatory field descriptions in data contracts, required classification labels, and backup, RTO, and RPO declarations for critical products. Each is implemented as one of a small set of policy types, evaluated automatically in the delivery pipeline rather than by a reviewer.
There is a compounding benefit that most organisations miss entirely: when artifacts are structured code, AI-assisted tooling can generate, validate, and maintain them.
Modern AI tools — copilots, LLMs, code generation assistants — dramatically accelerate activities that operate on structured artifacts: generation, refactoring, validation, review, documentation, and debugging. Developers already experience this acceleration when writing software.

The same acceleration applies to metadata, contracts, policies, and documentation — but only if these elements are expressed as structured artifacts within the engineering lifecycle. If metadata is trapped inside a graphical catalog interface, if documentation lives in a wiki, if governance definitions exist only in PDFs — they remain largely inaccessible to the AI tooling that is transforming software engineering.
When these artifacts are instead defined as versioned definitions in Git, the same AI tooling can generate, validate, and maintain them directly.
Treating everything as code therefore unlocks the same productivity gains for the entire data product lifecycle that developers already experience in software engineering. Metadata becomes easier to create, governance becomes easier to enforce, documentation stays aligned with implementation, and the platform benefits from the accelerating capabilities of modern development tooling.
This is the virtuous cycle: everything-as-code makes governance automatable, automated governance produces trustworthy metadata, trustworthy metadata makes data AI-ready, and AI tools accelerate the creation of more metadata. Each turn of the cycle increases both quality and velocity.

Organisations cannot move from fragmented metadata management to everything-as-code in a single sprint. The transition follows a deliberate sequence:
| Stage | What You Do | What Changes |
| --- | --- | --- |
| 1. Inventory | Map every artifact type: where do metadata, governance policies, documentation, and contracts live today? | The fragmentation becomes visible and measurable |
| 2. Express | Convert the highest-value artifacts to declarative definitions in version control | Metadata and contracts enter the PR workflow; changes become reviewable |
| 3. Integrate | Plug artifact validation into the existing CI/CD pipeline as custom deployment steps | Governance checks run automatically at every deployment |
| 4. Enforce | Define computational policies for metadata completeness, business semantics, and compliance | Non-compliant data products cannot reach production |
| 5. Accelerate | Enable AI-assisted curation: copilots generate metadata, suggest terms, draft contracts | Velocity increases while quality remains enforced by the pipeline |
Most organisations jump directly to Stage 5 — deploying an AI tool on top of ungoverned metadata — and wonder why the results are unreliable. The sequence is non-negotiable: you cannot trust AI-generated metadata if you have no pipeline to validate it, and you cannot validate it if the artifacts are not versionable and testable in the first place.
The practical starting point: pick your next data product deployment and require that every artifact — metadata descriptor, data contract, quality rules, documentation — lives in the same Git repository as the pipeline code and goes through the same PR review and CI/CD deployment. Add one computational policy: "all data contract fields must have a meaningful description." That single constraint forces the entire workflow to change.
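That single policy is small enough to sketch in full. "Meaningful" is deliberately approximated here as non-empty and not a placeholder; a real check would be stricter, and the contract shape is an assumption for illustration.

```python
# Sketch of the starting-point policy: every data contract field must
# carry a meaningful description. "Meaningful" is approximated as
# non-empty and not a known placeholder; the contract shape is assumed.

PLACEHOLDERS = {"tbd", "todo", "n/a", "-"}

def check_descriptions(contract: dict) -> list[str]:
    """Return the names of fields whose descriptions fail the policy."""
    bad = []
    for field in contract.get("fields", []):
        desc = field.get("description", "").strip().lower()
        if not desc or desc in PLACEHOLDERS:
            bad.append(field["name"])
    return bad

contract = {"fields": [
    {"name": "order_id", "description": "Unique order identifier"},
    {"name": "amount", "description": "TBD"},
    {"name": "currency", "description": ""},
]}

print(check_descriptions(contract))  # these fields block the deployment
```

Wired into the PR pipeline, a non-empty result fails the build, and the workflow change follows from there.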
The organisations that will successfully scale AI are not the ones with the best models or the most GPUs. They are the ones that treated their data platform like a software engineering discipline — where every artifact is versioned, every change is reviewable, every quality expectation is automated, and governance is not a committee but a pipeline.