
Kubermatic is officially Kubernetes AI Conformant!

Kubernetes AI Conformance Badge

For years, building AI infrastructure felt like the “Wild West”: fragmented tools, proprietary stacks, and exploding costs. Today, that changes.

At KubeCon North America, the Cloud Native Computing Foundation (CNCF) officially launched the Kubernetes AI Conformance program. It is now the new global technical standard for how AI runs on Kubernetes.

We are incredibly proud to announce that Kubermatic is one of the very first platforms globally, and among the first in Europe, to be officially Kubernetes AI Conformant for Kubernetes v1.34.

What “AI Conformance” actually means

While ISO standards like ISO 42001 focus on management and governance, Kubernetes AI Conformance is purely technical. It defines the APIs, capabilities, and configurations a Kubernetes cluster must support to run AI/ML workloads efficiently and securely.

In short: If a platform is “AI Conformant,” you can train and deploy models on it, and move them to another conformant cluster without rewriting anything. This means freedom from vendor lock-in and a clear path toward portable, sovereign AI infrastructure.

Kubermatic’s journey to becoming Kubernetes AI Conformant

To define the standard for “Kubernetes AI Conformance”, a dedicated “Working Group” was formed within the Kubernetes project, sponsored by the Architecture and Testing Special Interest Groups (SIGs). Kubermatic is proud to be an active member.

Since KubeCon Europe in London, the Working Group has worked intensively to identify the core technical pillars needed to address the unique challenges of AI workloads.

The outcome of this effort was a requirements catalog that every platform must meet to be considered Kubernetes AI Conformant.

Kubermatic already had most of the foundational pieces, but the conformance process pushed us to validate, document, and harden them to production-grade levels. Kubermatic KKP 2.29 delivers on every one of these mandatory requirements.

Here’s how Kubermatic implemented each requirement to achieve full AI Conformance:

1. Efficient Accelerator Management for AI Training

AI training depends on scarce, expensive accelerators such as GPUs. In non-standard environments, naïve accelerator management leads to severe “resource fragmentation,” where large parts of valuable GPU memory remain unused, and “topology blindness,” where scheduling is not optimized for multi-GPU workloads. The result is massive over-provisioning and exploding costs.

KKP 2.29 implements this today. Dynamic Resource Allocation (DRA), stable since Kubernetes 1.34, provides a flexible new way to request and share complex hardware resources. Functioning much like the familiar PersistentVolumeClaim model for storage, DRA allows users to “claim” the exact resources they need from defined “classes” of devices, while Kubernetes handles all the complex scheduling and node placement automatically. Learn how to employ DRA in KKP.
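As an illustration, here is a minimal DRA sketch against the resource.k8s.io/v1 API (stable in Kubernetes 1.34). The device class name gpu.example.com and the trainer image are placeholders; in practice the class is installed by your accelerator vendor’s DRA driver:

```yaml
# A claim for exactly one device from a vendor-provided device class.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com   # placeholder; installed by the DRA driver
        allocationMode: ExactCount
        count: 1
---
# A Pod that consumes the claim; the scheduler places it on a node
# that can actually satisfy the allocation.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  restartPolicy: Never
  containers:
  - name: train
    image: ghcr.io/example/trainer:latest  # hypothetical image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
```

Because the claim, not the Pod, names the device class, several containers or Pods can share one allocated device by referencing the same claim.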

2. Advanced Ingress for AI Inference

AI inference workloads (the models in production) are fundamentally different from traditional, stateless web applications. They are often long-running, resource-intensive, and partially stateful, making them a poor fit for standard load balancers.

KKP delivers this through the KubeLB 1.2 AI Gateway. This feature integrates the open-source Kgateway and its powerful inference extension, providing optimized routing and load balancing specifically for AI workloads. Its capabilities include weighted traffic splitting and header-based routing, the latter being important for handling OpenAI protocol headers.
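For example, with the Gateway API that Kgateway implements, weighted splitting and header-based routing can be expressed in a single HTTPRoute. All names here (the ai-gateway Gateway, the two model Services, the x-model header) are hypothetical:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: ai-gateway              # assumed Gateway managed by the AI Gateway
  rules:
  - matches:
    - headers:
      - name: x-model             # hypothetical header selecting a model
        value: llama-3-70b
    backendRefs:
    - name: model-v1              # stable model-serving Service
      port: 8000
      weight: 90                  # 90% of matching traffic
    - name: model-v2              # canary revision
      port: 8000
      weight: 10                  # 10% canary traffic
```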

3. Scheduling & Orchestration

Distributed AI training jobs often require multiple components (Pods) to be started simultaneously. A traditional scheduler that places Pods one by one can lead to “deadlocks,” where a job gets stuck because not all of its parts can find resources, all while blocking the resources it has already been assigned.

KKP 2.29 includes the Kueue job scheduler in its default application catalog. Kueue admits a distributed job only once resources for all of its Pods are available, ensuring that distributed AI workloads are scheduled efficiently and without partial-scheduling deadlocks.
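A sketch of how a distributed training Job is handed to Kueue; the LocalQueue team-a-queue and the trainer image are assumptions about the cluster setup:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dist-train
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue  # assumed LocalQueue
spec:
  parallelism: 4
  completions: 4
  suspend: true          # Kueue unsuspends the Job only when all 4 Pods fit
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: ghcr.io/example/trainer:latest  # hypothetical image
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
```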

Furthermore, KKP meets the autoscaling requirements, providing a fully functional Cluster Autoscaler capable of scaling accelerator node groups, as well as support for the HorizontalPodAutoscaler (HPA) with custom GPU metrics.
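For instance, an HPA on a custom GPU metric could look like this sketch. It assumes a metrics adapter (e.g. prometheus-adapter) that exposes the DCGM exporter’s DCGM_FI_DEV_GPU_UTIL gauge as a per-Pod metric, and a hypothetical model-server Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server              # hypothetical inference Deployment
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL  # GPU utilization, as exported by DCGM
      target:
        type: AverageValue
        averageValue: "80"          # scale out above ~80% average utilization
```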

4. Monitoring and Metrics

The new generation of AI workloads and specialized hardware creates a “blind spot” in monitoring. There is “currently no standard way to collect accelerator metrics,” and operators lack the tools to quickly debug complex AI infrastructure problems.

KKP’s robust, built-in monitoring stack for user clusters fulfills these requirements. It enables the collection of detailed performance metrics from supported accelerator types (e.g., utilization, memory) via a standardized endpoint. It also provides a monitoring system capable of discovering and collecting metrics from any workload that exposes them in a standard format (e.g., Prometheus exposition format).
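As a sketch, scraping an accelerator metrics endpoint with the Prometheus Operator could look like this; the label selector and port name are assumptions about how the exporter’s Service is deployed:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: gpu-metrics
spec:
  selector:
    matchLabels:
      app: dcgm-exporter   # assumed label on the exporter's Service
  endpoints:
  - port: metrics          # named port serving /metrics in Prometheus format
    interval: 30s
```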

5. Security and Resource Separation

Expensive accelerators (GPUs) are a shared resource. Without strict isolation at the kernel and API levels, workloads in one container could access or interfere with the data or processes of workloads in another, posing a significant security risk in multi-tenant environments.

The conformance standard requires that access to accelerators be isolated and mediated via Kubernetes resource management frameworks (such as DRA or device plugins). KKP’s implementation of these frameworks ensures that a container can only access the devices it has been allocated, preventing unauthorized access or interference between workloads. This is a core requirement for secure, multi-tenant AI platforms.
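Concretely, a workload never opens the device nodes itself; it asks Kubernetes for an accelerator, and the kubelet mounts only the allocated device into the container. A minimal sketch using the device plugin resource name nvidia.com/gpu, with an example CUDA base image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: isolated-gpu-job
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example CUDA base image
    command: ["nvidia-smi"]   # sees exactly one GPU; all others stay invisible
    resources:
      limits:
        nvidia.com/gpu: 1     # mediated by the device plugin; no privileged access
```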

6. Robust Operator Support

Modern AI frameworks like Ray or Kubeflow are themselves complex, distributed systems that run as “Operators” on Kubernetes. If the core platform (e.g., its webhooks, CRD management, or API server stability) is not robust, these operators will fail, rendering the entire AI platform useless.

KKP 2.29 meets this requirement, with full support for reliably installing and running complex add-ons, including Kubeflow. This includes verifying that an operator’s pods, webhooks, and custom resource reconciliation function correctly.

KKP 2.29 is a Kubernetes AI Conformant Platform

After completing the conformance checklist and submitting our results, Kubermatic officially passed all validation criteria for Kubernetes 1.34.

You can see our certification record here: Kubermatic AI Conformance on GitHub.

Why This Matters for the Future of AI in Europe

For European enterprises, this is about digital sovereignty. With an open, community-driven standard backed by the CNCF, organizations can finally deploy AI workloads anywhere: public cloud, private data center, or edge, without sacrificing compliance or control.

As our CEO, Sebastian Scheele, puts it:

The future of AI will be built on open standards, not walled gardens. This conformance program is the foundation for that future, ensuring a level playing field where innovation and portability win.

We are ready to help you build that future today.

Learn More

Mario Fahlandt


Customer Delivery Architect
