AI/ML Model Operations

The Missing Control Plane for GPU Platforms: Policy as Code, Not Just Schedulers

GPU platforms have become the backbone of modern AI. Startups and enterprises now deploy large language models, embedding pipelines, and inference services on clusters of high-end GPUs. Yet despite all this sophistication, most GPU platforms today are still operated using a fragile mix of YAML, scripts, tickets, and human judgment.

What’s missing is not a better scheduler. What’s missing is a real control plane.

GPUs Are Being Sold as Products, But Operated Like Infrastructure

When customers buy GPU capacity, they are not buying machines. They are buying product guarantees such as:

entitlements (“I paid for 4 H100s”)
SLAs (“low latency, no preemption”)
isolation (“no noisy neighbors”)
compliance (“EU-only, dedicated hardware”)
lifecycle semantics (“scale up, scale down, offboard cleanly”)

But today, those guarantees live in contracts, internal runbooks, tribal knowledge and best-effort operational discipline. They are not encoded into the system. That gap is why GPU platforms struggle with:

noisy neighbors
broken SLAs
unpredictable onboarding
chaotic scaling
compliance gaps
constant manual firefighting

A Control Plane Is Not an Orchestrator

Most GPU platforms assume their “control plane” is Kubernetes, Slurm, or a scheduler. It isn’t. Those systems decide how workloads run. A real control plane decides:

who is allowed to run what
where workloads may be placed
how much capacity a tenant is entitled to
whether a request must be approved
what isolation rules apply
what happens on failure
how data must be retained or destroyed

Those are product and governance decisions, not scheduling decisions.

The 4-Lane Blueprint for a GPUaaS Control Plane

To make this concrete, I built a BPM-style swimlane blueprint that separates responsibilities into four distinct layers:

Press enter or click to view image in full size

Lane 1 — Northbound Product API / UI

Tenant and admin actions such as:

create tenant
deploy model
request scale-up
change batch size
request GPU sharing
update quotas
offboard tenant

These are product-level intents.

Lane 2 — Product Control Plane

Policy-as-Code • Workflow-as-Code • Lifecycle State Machines

This is the missing layer. It owns:

tenant entitlements and quotas
SLA tiers and isolation rules
admission control
placement policies
privacy and compliance rules
degradation and preemption policies
onboarding and offboarding workflows

Crucially, these are not scripts. They are:

declarative policies
versioned rules
durable workflows
explicit state machines

This is where Policy as Code actually lives.

Lane 3 — Orchestration & Substrate Control

This layer turns control-plane decisions into reality:

Kubernetes
Slurm / Kueue
Gateway API / inference gateways
GPU operators
Helm / Argo CD
Terraform / Ansible
CNI / CSI

It does not decide policy. It only realizes policy.

Lane 4 — Execution & Hardware

Where work actually happens:

vLLM, Triton, TGI
PyTorch
ROCm / CUDA
GPUs
NICs
NVMe

This is the physical reality layer.

Policy as Code: The Core of the Control Plane

The key idea is that GPU platforms need a policy-as-code control plane, not just a scheduler. Instead of writing documents like:

“Enterprise tenants must run on dedicated GPUs in EU clusters and cannot be preempted.”

You encode:

# Example: tenantpolicy-as-code configuration
tenant:
   tier: enterprise  
   isolation: dedicated
   data_residency: eu-only
   allow_preemption: false
   allow_gpu_sharing: false

# Example: tenantpolicy-as-code configuration
tenant:
   tier: enterprise  
   isolation: dedicated
   data_residency: eu-only
   allow_preemption: false
   allow_gpu_sharing: false

# Example: tenantpolicy-as-code configuration
tenant:
   tier: enterprise  
   isolation: dedicated
   data_residency: eu-only
   allow_preemption: false
   allow_gpu_sharing: false

And the control plane enforces it by:

rejecting invalid requests
constraining placement
selecting eligible clusters
configuring schedulers
applying isolation rules
blocking unsafe actions

This is exactly how hyperscalers work internally. GPU platforms today simply don’t have this layer.

Workflows, Not Scripts

A real control plane is workflow-driven. Examples:

tenant onboarding
quota increases
SLA upgrades
capacity exhaustion
incident response
offboarding and data destruction

These are not bash scripts. They are:

long-running processes
with approvals
retries
compensations
audit trails
explicit state transitions

That’s why Lane 2 must be implemented as: Policy-as-Code + Workflow-as-Code + Durable State Machines

Why Is This Important

Without this layer, GPU platforms are forced to operate like fragile infrastructure projects instead of reliable products. That’s why we see:

unpredictable performance
noisy neighbors
broken SLAs
chaotic onboarding
manual firefighting
compliance gaps

A programmable control plane fixes this by turning:

product promises
governance rules
lifecycle semantics

into machine-enforceable reality.

Closing Thought

Kubernetes orchestrates containers. Slurm schedules jobs. vLLM runs inference. Llm-d routes and scales inference traffic. But none of those systems know what a tenant, an SLA, or a privacy tier is. That knowledge belongs in a real control plane. And that control plane must be built on policy as code, not scripts.

Closing

I’m open-sourcing parts of this control-plane blueprint and documenting the workflows, policies, and lifecycle semantics behind it. You can find the blueprint here.

If you’re building a GPU platform or GPUaaS product and wrestling with multi-tenancy, SLAs, or onboarding reliability, I’d love to compare notes.

What’s missing is not a better scheduler. What’s missing is a real control plane.

GPUs Are Being Sold as Products, But Operated Like Infrastructure

When customers buy GPU capacity, they are not buying machines. They are buying product guarantees such as:

entitlements (“I paid for 4 H100s”)
SLAs (“low latency, no preemption”)
isolation (“no noisy neighbors”)
compliance (“EU-only, dedicated hardware”)
lifecycle semantics (“scale up, scale down, offboard cleanly”)

noisy neighbors
broken SLAs
unpredictable onboarding
chaotic scaling
compliance gaps
constant manual firefighting

A Control Plane Is Not an Orchestrator

Most GPU platforms assume their “control plane” is Kubernetes, Slurm, or a scheduler. It isn’t. Those systems decide how workloads run. A real control plane decides:

who is allowed to run what
where workloads may be placed
how much capacity a tenant is entitled to
whether a request must be approved
what isolation rules apply
what happens on failure
how data must be retained or destroyed

Those are product and governance decisions, not scheduling decisions.

The 4-Lane Blueprint for a GPUaaS Control Plane

To make this concrete, I built a BPM-style swimlane blueprint that separates responsibilities into four distinct layers:

Press enter or click to view image in full size

Lane 1 — Northbound Product API / UI

Tenant and admin actions such as:

create tenant
deploy model
request scale-up
change batch size
request GPU sharing
update quotas
offboard tenant

These are product-level intents.

Lane 2 — Product Control Plane

Policy-as-Code • Workflow-as-Code • Lifecycle State Machines

This is the missing layer. It owns:

tenant entitlements and quotas
SLA tiers and isolation rules
admission control
placement policies
privacy and compliance rules
degradation and preemption policies
onboarding and offboarding workflows

Crucially, these are not scripts. They are:

declarative policies
versioned rules
durable workflows
explicit state machines

This is where Policy as Code actually lives.

Lane 3 — Orchestration & Substrate Control

This layer turns control-plane decisions into reality:

Kubernetes
Slurm / Kueue
Gateway API / inference gateways
GPU operators
Helm / Argo CD
Terraform / Ansible
CNI / CSI

It does not decide policy. It only realizes policy.

Lane 4 — Execution & Hardware

Where work actually happens:

vLLM, Triton, TGI
PyTorch
ROCm / CUDA
GPUs
NICs
NVMe

This is the physical reality layer.

Policy as Code: The Core of the Control Plane

The key idea is that GPU platforms need a policy-as-code control plane, not just a scheduler. Instead of writing documents like:

“Enterprise tenants must run on dedicated GPUs in EU clusters and cannot be preempted.”

You encode:

# Example: tenantpolicy-as-code configuration
tenant:
   tier: enterprise  
   isolation: dedicated
   data_residency: eu-only
   allow_preemption: false
   allow_gpu_sharing: false

And the control plane enforces it by:

rejecting invalid requests
constraining placement
selecting eligible clusters
configuring schedulers
applying isolation rules
blocking unsafe actions

This is exactly how hyperscalers work internally. GPU platforms today simply don’t have this layer.

Workflows, Not Scripts

A real control plane is workflow-driven. Examples:

tenant onboarding
quota increases
SLA upgrades
capacity exhaustion
incident response
offboarding and data destruction

These are not bash scripts. They are:

long-running processes
with approvals
retries
compensations
audit trails
explicit state transitions

That’s why Lane 2 must be implemented as: Policy-as-Code + Workflow-as-Code + Durable State Machines

Why Is This Important

Without this layer, GPU platforms are forced to operate like fragile infrastructure projects instead of reliable products. That’s why we see:

unpredictable performance
noisy neighbors
broken SLAs
chaotic onboarding
manual firefighting
compliance gaps

A programmable control plane fixes this by turning:

product promises
governance rules
lifecycle semantics

into machine-enforceable reality.

Closing Thought

Closing

I’m open-sourcing parts of this control-plane blueprint and documenting the workflows, policies, and lifecycle semantics behind it. You can find the blueprint here.

If you’re building a GPU platform or GPUaaS product and wrestling with multi-tenancy, SLAs, or onboarding reliability, I’d love to compare notes.

AI/ML Model Operations

What Matters to a GPUaaS Tenant

AI/ML Model Operations

Beyond Prompt → Code: The Real Systems Challenges Behind Coding Foundation Models

AI/ML Model Operations

What Matters to a GPUaaS Provider

AI/ML Model Operations

What Matters to a GPUaaS Tenant

AI/ML Model Operations

Beyond Prompt → Code: The Real Systems Challenges Behind Coding Foundation Models

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Start for Free

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Start for Free

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Start for Free

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Start for Free

Meeting your AI infrastructure needs with scalable, secure, and seamless services.

Products

Introspect

Predictive Orchestration

ModelSpec

Services

Infrastructure Audit

Optimization Sprint

Managed Optimization

Company

Blog

Case Studies

About

Terms & Conditions

Meeting your AI infrastructure needs with scalable, secure, and seamless services.

Products

Introspect

Predictive Orchestration

ModelSpec

Services

Infrastructure Audit

Optimization Sprint

Managed Optimization

Company

Blog

Case Studies

About

Terms & Conditions

Meeting your AI infrastructure needs with scalable, secure, and seamless services.

Products

Introspect

Predictive Orchestration

ModelSpec

Services

Infrastructure Audit

Optimization Sprint

Managed Optimization

Company

Blog

Case Studies

About

Terms & Conditions

AI/ML Model Operations

The Missing Control Plane for GPU Platforms: Policy as Code, Not Just Schedulers

GPUs Are Being Sold as Products, But Operated Like Infrastructure

A Control Plane Is Not an Orchestrator

The 4-Lane Blueprint for a GPUaaS Control Plane

Lane 1 — Northbound Product API / UI

Lane 2 — Product Control Plane

Lane 3 — Orchestration & Substrate Control

Lane 4 — Execution & Hardware

Policy as Code: The Core of the Control Plane

Workflows, Not Scripts

Why Is This Important

Closing Thought

Closing

GPUs Are Being Sold as Products, But Operated Like Infrastructure

A Control Plane Is Not an Orchestrator

The 4-Lane Blueprint for a GPUaaS Control Plane

Lane 1 — Northbound Product API / UI

Lane 2 — Product Control Plane

Lane 3 — Orchestration & Substrate Control

Lane 4 — Execution & Hardware

Policy as Code: The Core of the Control Plane

Workflows, Not Scripts

Why Is This Important

Closing Thought

Closing

More articles

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.

Don’t let performance bottlenecks slow you down. Optimize your stack and accelerate your AI outcomes.