Building a Sovereign Data Platform: An EU-Native Stack Scenario

Written by Witboost Team | 6/5/26 2:02 PM

Let’s start with a clarification: this article is not about picking sides. The major cloud providers (AWS, Azure, and Google Cloud) offer outstanding services, and Witboost runs beautifully on all of them. We work with customers using every major hyperscaler, and we have no intention of changing that.

This article is a practical walkthrough for organisations that want to understand what a fully EU-native data platform looks like in practice. We’ll walk through a reference architecture that combines three European technologies: Scality, Stackable, and Witboost. This combination acts as an end-to-end stack for data products and data contracts. Not as the only way, but as one credible, production-ready way for those who have this specific requirement.

We’ve noticed something in the conversations we’ve been having with CDOs and Heads of Data across Europe over the past 18 months. A question that used to come from public-sector procurement offices is now landing in the boardrooms of private enterprises: "Can we run our entire data platform on infrastructure that is fully European-controlled?"

The drivers are varied. For some, it’s regulatory: NIS2, DORA, the AI Act, and the ongoing debate around the European Cybersecurity Certification Scheme (EUCS) are raising questions about jurisdictional control that didn’t exist five years ago.

For others, it’s about risk management in a geopolitical landscape that has become harder to predict. And for a growing number, it’s simply about having a credible option on the table, not as a replacement for hyperscaler services, but as a well-understood alternative that can be activated if circumstances change.

Why is Data Sovereignty Important?

In 2024, data sovereignty was mostly a concern for defence contractors and government agencies. Today, it’s a boardroom topic at banks, utilities, telcos, and manufacturing groups across Europe. What changed?

Several regulatory and geopolitical shifts converged:

Driver	Implementation date	What It Means for Data Platforms
NIS2 Directive	October 2024	Extended cybersecurity obligations to a wider set of “essential” and “important” entities. Supply chain risk assessment now explicitly includes cloud service dependencies.
DORA	January 2025	Financial entities must demonstrate operational resilience of their ICT supply chain, including concentration risk on non-EU providers.
AI Act	Phased 2025-2026	High-risk AI systems require transparency and auditability of data pipelines. Jurisdictional clarity of data processing is becoming a compliance accelerator.
EUCS debate	Ongoing	The European Cybersecurity Certification Scheme initially included a “sovereignty” tier requiring EU jurisdiction. The debate continues and signals the direction of travel.
Geopolitical dynamics	Ongoing	Shifting policies, supply chain bottlenecks due to regional conflict, and tariff discussions have increased awareness of dependency risks in technology stacks, even where no regulation mandates change.

None of these drivers, on their own, mandate that European enterprises abandon hyperscaler cloud services. Most organisations will, and should, continue to use them where they provide the best fit. But taken together, they create a strategic imperative for boards and CDOs: know your options.

Understand what a European-controlled alternative looks like, how it performs, and how quickly you could activate it if the regulatory or geopolitical landscape shifts further.

We’ve had customers who started exploring this question purely as a risk management exercise and ended up discovering that a sovereign stack gave them unexpected advantages. Just to name a few:

Full control over upgrade cycles
No surprise pricing changes
The ability to run in air-gapped environments that some of their business units required anyway

The Anatomy of "Sovereign Enough"

One of the traps in the sovereignty conversation is treating it as binary: either you’re on a hyperscaler or you’re sovereign. Reality is more nuanced. Data residency (where data is stored) is not the same as data sovereignty (who controls the infrastructure, the software, and the operational processes around it).

We help our customers with sovereignty requirements think in three layers that need to be addressed independently:

Layer	What It Covers	Sovereignty Question
Infrastructure	Compute, storage, networking	Is the hardware in an EU data centre, operated by an EU-headquartered entity, under EU legal jurisdiction?
Data Platform	Processing engines, query engines, orchestration, data formats, streaming	Is the software open-source or EU-controlled? Are there dependencies on non-EU SaaS services for core functionality?
Governance & Lifecycle	Data product management, data contracts, metadata, access control, change management	Does the governance layer impose technology choices, or does it work across any infrastructure?

A common pattern we see is organisations that solve the first layer by putting their data in an EU data centre, but overlook the other two. They run proprietary SaaS processing tools that route control plane traffic through non-EU jurisdictions. Or they adopt a governance tool that is tightly coupled to a specific cloud provider, making portability difficult.

True architectural sovereignty means addressing all three layers. And critically, it means doing so without sacrificing the governance and lifecycle management capabilities that make data products operationally viable. Sovereignty without governance is just ungoverned data sitting in a European data centre. It solves a compliance checkbox but not the actual business problem.
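To make the three-layer view concrete, here is a minimal Python sketch of a per-layer self-assessment. The class, the field names, and the example answers are hypothetical illustrations, not part of any product. The only point is that passing the infrastructure question does not by itself mean the other two layers pass.

```python
from dataclasses import dataclass

# Layer names mirror the three layers described above.
LAYERS = ("infrastructure", "data_platform", "governance")


@dataclass(frozen=True)
class LayerAssessment:
    layer: str
    passes: bool   # True if the layer's sovereignty question is answered satisfactorily
    note: str = ""


def sovereignty_gaps(assessments):
    """Return the layers that were not assessed or did not pass, in layer order."""
    by_layer = {a.layer: a for a in assessments}
    gaps = []
    for layer in LAYERS:
        assessment = by_layer.get(layer)
        if assessment is None or not assessment.passes:
            gaps.append(layer)
    return gaps


# Example: data sits in an EU data centre, but processing relies on a
# SaaS tool whose control plane is outside the EU.
example = [
    LayerAssessment("infrastructure", True, "EU data centre, EU operator"),
    LayerAssessment("data_platform", False, "non-EU SaaS control plane"),
    LayerAssessment("governance", True, "works across any infrastructure"),
]

print(sovereignty_gaps(example))  # ['data_platform']
```

An organisation that has only answered the residency question would see two or three layers reported as gaps, which is the situation described above.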

A Reference Architecture: Scality + Stackable + Witboost

What follows is a concrete, production-tested reference architecture for organisations that want to run data products and data contracts end-to-end on an EU-native stack. Each component is European-headquartered, open-source or open-core, and independently replaceable. All these components can run in private and air-gapped environments.

The stack is organised in three tiers, matching the three sovereignty layers we’ve outlined:

Tier 1 – Sovereign Storage: Scality

Scality is a French company that provides enterprise-grade object storage, deployed on-premises or in sovereign cloud environments. Its S3-compatible API means that any application written for cloud object storage works without modification.

Scalability: From petabytes to exabytes, with sub-millisecond latency and millions of S3 transactions per second.
Cyber resilience: CORE5 five-layer defence model with S3 Object Lock immutability, zero-trust IAM, and erasure coding for multi-fault tolerance.
EU jurisdiction: Headquartered in Paris. Deployed by European governments, healthcare institutions, and defence organisations. No dependency on non-EU control planes.

Scality replaces the role that S3, Azure Blob Storage, or Google Cloud Storage would play in a hyperscaler deployment, but with full EU jurisdictional control and on-premises flexibility.
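In practice, "works without modification" usually means that the application code stays the same and only the client configuration changes. The sketch below illustrates this with the boto3 SDK's `endpoint_url` parameter. The environment variable names `S3_ENDPOINT_URL` and `S3_REGION` and the example hostname are assumptions made for this illustration, not names defined by Scality or Witboost.

```python
import os


def s3_client_kwargs(env=None):
    """Build keyword arguments for boto3.client("s3", ...).

    S3_ENDPOINT_URL and S3_REGION are hypothetical variable names chosen
    for this sketch. When no endpoint is set, the SDK's default endpoint
    resolution applies.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("S3_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_s3_client(env=None):
    # boto3 is a third-party package; it is imported lazily so the
    # configuration logic above stays usable without it.
    import boto3
    return boto3.client("s3", **s3_client_kwargs(env))


# Same application code, different target: only configuration changes.
print(s3_client_kwargs({"S3_ENDPOINT_URL": "https://s3.storage.example.internal"}))
print(s3_client_kwargs({}))
```

Credentials are picked up through the SDK's usual mechanisms. Depending on the deployment, an S3-compatible endpoint may also need path-style addressing or a custom CA bundle, which this sketch leaves out.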

Tier 2 – Open-Source Data Platform: Stackable

Stackable is a German company that provides a modular, Kubernetes-native data platform built entirely on open-source components: Apache Spark, Trino, Apache NiFi, Apache Kafka, Apache Airflow, Apache Hive, and others.

Kubernetes-native: Every component runs as a Kubernetes operator, meaning the platform can be deployed on any K8s cluster: on-premises, in a sovereign cloud, or on a hyperscaler. The infrastructure choice is yours.
Modular composition: You pick only the components you need. Need Trino for SQL federation and Spark for batch processing? Deploy those. Need Kafka for event streaming later? Add it without rearchitecting.
Infrastructure-as-code: All configurations are declarative YAML, managed via GitOps. Repeatable, testable, auditable — no click-ops.
NIS2 and DORA support: Stackable explicitly positions itself for regulated environments, with data sovereignty and compliance traceability as first-class design principles.

Stackable replaces the managed data services that a hyperscaler would provide (e.g., EMR, Dataproc, Synapse) — but with fully open-source components that you operate on your own terms.

Tier 3 – Governance, Data Products, and Data Contracts: Witboost

Witboost is the governance and data product management layer. This is where data products are bootstrapped, data contracts are defined and enforced, metadata is curated, and the entire lifecycle (from creation to retirement) is managed.

Technology-agnostic by design: Witboost does not impose any technology choices. It works with Stackable and Scality in this reference architecture, but equally works on AWS, Azure, GCP, Databricks, or any combination. The governance layer is decoupled from the infrastructure layer.
Data contracts as an open pattern: Data contracts in Witboost are an architectural pattern, not a proprietary format. You define them in the way that fits your organisation. The platform uses tech adapters (microservices) to interpret contract descriptors and transform them into physical resources — regardless of the underlying technology.
Computational governance: Governance policies are expressed as code, tested in CI/CD, and enforced automatically at every lifecycle stage. A shift-left approach that scales governance without creating bottlenecks. This directly supports DORA and AI Act compliance requirements.
Full lifecycle management: From data product bootstrap through development, curation, validation, release, operation, and retirement. Every phase has defined entry and exit conditions, automated guardrails, and clear ownership.
Self-service marketplace: Data producers and consumers interact through a marketplace that handles discovery, access management, and consumption — with governance embedded in every interaction.

The critical point is that Witboost’s governance and lifecycle management capabilities are identical whether you run on a hyperscaler or on this EU-native stack. You don’t lose any functionality by choosing a sovereign deployment. Your data contracts, data products, policies, metadata, and lifecycle processes are portable across any infrastructure.
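The tech-adapter idea described above can be sketched in a few lines of Python. Everything here is hypothetical: the descriptor fields, the adapter classes, and the target names are invented for illustration and do not represent Witboost's descriptor format or adapter API. The sketch only shows the pattern of one technology-neutral descriptor being interpreted by interchangeable adapters.

```python
from abc import ABC, abstractmethod

# Hypothetical, technology-neutral descriptor. Field names are illustrative
# and do not represent a Witboost format.
DESCRIPTOR = {
    "name": "customer-orders",
    "owner": "sales-domain",
    "output_port": {"bucket": "orders", "format": "parquet"},
    "contract": {"schema": {"order_id": "string", "amount": "decimal"}},
}


class TechAdapter(ABC):
    """Translates a descriptor into provisioning steps for one technology."""

    @abstractmethod
    def plan(self, descriptor: dict) -> list[str]:
        ...


class OnPremObjectStoreAdapter(TechAdapter):
    def plan(self, descriptor):
        port = descriptor["output_port"]
        return [
            f"create bucket '{port['bucket']}' on the on-premises S3-compatible store",
            f"expose {port['format']} data for '{descriptor['name']}' via the SQL engine",
        ]


class CloudObjectStoreAdapter(TechAdapter):
    def plan(self, descriptor):
        port = descriptor["output_port"]
        return [
            f"create bucket '{port['bucket']}' in the cloud provider account",
            f"expose {port['format']} data for '{descriptor['name']}' via the managed query service",
        ]


# Hypothetical target names mapped to their adapters.
ADAPTERS = {
    "eu-native": OnPremObjectStoreAdapter(),
    "hyperscaler": CloudObjectStoreAdapter(),
}


def provision_plan(descriptor, target):
    """Pick the adapter for the target; the descriptor itself never changes."""
    return ADAPTERS[target].plan(descriptor)


for target in ADAPTERS:
    print(target, provision_plan(DESCRIPTOR, target))
```

The design choice worth noticing is that portability lives in the descriptor: moving a data product between environments means selecting a different adapter, not rewriting the contract.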

What This Stack Gives Enterprise Organisations and What It Doesn’t

We believe in honest assessments. A sovereign EU-native stack is not a free lunch. Here’s what you gain and what you’re trading off:

Dimension	What You Gain	What Requires More Effort
Jurisdictional control	Full EU control over data, infrastructure, and software. No non-EU entity can access your data by legal compulsion.	You need to manage your own infrastructure or work with an EU hosting partner (e.g., IONOS, OVHcloud, Hetzner).
Vendor independence	Fully open-source data platform components. No proprietary lock-in at any layer. Every component is replaceable.	You lose the convenience of fully managed services. Your platform team takes on operational responsibility.
Regulatory readiness	Clean compliance story for NIS2, DORA, AI Act. No concentration risk on a single non-EU provider.	You still need to do the regulatory work — the stack provides the foundation, not automatic compliance.
Pricing predictability	No surprise egress fees, no opaque pricing tiers. Infrastructure cost is fully within your control.	You need capacity planning skills. There’s no elastic auto-scaling managed by someone else.
Governance parity	The same Witboost governance capabilities as on any hyperscaler. No feature gaps.	The initial integration between Witboost, Stackable, and Scality requires platform team investment, but our starter kit covers that integration.

The truth is that this architecture is best suited for organisations that already have (or are willing to build) a capable platform team. The operational model is closer to what you’d expect from a self-hosted Kubernetes environment than from a managed cloud service. For organisations with the right skills, this is a feature, not a bug: it gives you complete control and eliminates the “someone else’s computer” risk factor.

For organisations that prefer fully managed services and don’t have sovereignty as a hard requirement, the hyperscaler path remains excellent. Witboost supports both equally well.

This is the power of technology agnosticism: the same governance layer, the same data contracts, the same lifecycle management, regardless of what runs underneath.

Start from Optionality, Not Ideology

If there’s one message we want to leave you with, it’s this: data sovereignty is an architectural requirement, not a political statement.

The organisations we see making the best decisions are those that approach sovereignty as a dimension of their platform architecture — like scalability, security, or cost efficiency. They don’t start from ideology (“we must avoid US cloud”) or from inertia (“we’ll deal with it if regulations force us”).

They start from optionality: building a governance and lifecycle layer that works across any infrastructure, so that the infrastructure choice becomes a deployment decision rather than an architectural constraint.

Here’s the practical approach we recommend:

Assess your sovereignty requirements honestly. Map your data assets against regulatory exposure (NIS2, DORA, AI Act, sector-specific rules). Identify which data products genuinely need sovereign infrastructure and which are fine on a hyperscaler.
Build your governance layer technology-agnostic from day one. This is the most important architectural decision. If your data contracts, lifecycle management, and metadata are coupled to a specific cloud provider’s tools, you’re locked in regardless of where the data physically lives.
Treat the EU-native stack as a validated option, not a mandatory destination. Having Scality + Stackable + Witboost as a tested, documented alternative gives you negotiating power, risk mitigation, and regulatory preparedness, even if you never deploy it at full scale.
Run your governance across all environments. The real unlock is not choosing one infrastructure over another. It’s running the same governance standards, the same data contracts, the same lifecycle management across all of them. Your data products should be portable; your governance should be universal.
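As a rough illustration of steps one and four together, the sketch below expresses a placement rule and two universal rules as code that could run as a CI check. The inventory, the field names, and the target labels are assumptions made for this example. In particular, `requires_sovereign` stands for the outcome of your own assessment, not for anything a regulation assigns automatically.

```python
# Hypothetical inventory: every field name here is an assumption for the sketch.
PRODUCTS = [
    {"name": "payments-ledger", "owner": "finance", "has_contract": True,
     "requires_sovereign": True, "target": "eu-native"},
    {"name": "web-clickstream", "owner": "marketing", "has_contract": True,
     "requires_sovereign": False, "target": "hyperscaler"},
    {"name": "risk-scores", "owner": "risk", "has_contract": False,
     "requires_sovereign": True, "target": "hyperscaler"},
]

SOVEREIGN_TARGETS = {"eu-native"}


def check_product(product):
    """Return a list of policy violations; an empty list means the product passes."""
    violations = []
    # Universal rules: applied identically on every infrastructure.
    if not product.get("owner"):
        violations.append("missing owner")
    if not product.get("has_contract"):
        violations.append("missing data contract")
    # Placement rule: only products flagged in the assessment step are constrained.
    if product.get("requires_sovereign") and product.get("target") not in SOVEREIGN_TARGETS:
        violations.append("must run on a sovereign target")
    return violations


def check_all(products):
    """Map each failing product name to its violations."""
    report = {p["name"]: check_product(p) for p in products}
    return {name: found for name, found in report.items() if found}


print(check_all(PRODUCTS))
# {'risk-scores': ['missing data contract', 'must run on a sovereign target']}
```

Products that are fine on a hyperscaler pass untouched, while the same ownership and contract rules apply to every product regardless of where it runs.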

Witboost was built on this principle from day one. Our founding conviction is that a platform should never make an architectural decision on your behalf. It should never constrain your technology choices, your cloud strategy, or your sovereignty posture. Whether you run on AWS, Azure, Google Cloud, or a fully European stack, the governance, the data contracts, and the lifecycle management work the same way.

That’s not neutrality for the sake of neutrality. It’s the recognition that the enterprises we serve operate in complex, multi-geography, multi-regulatory environments where the right answer today might not be the right answer tomorrow. The only responsible architecture is one that gives you the freedom to adapt.

If you’re exploring sovereignty requirements and want to understand how an EU-native stack would work with your specific data landscape, we’re happy to walk through it with you. No ideology. Just architecture.
