Kueue
What is Kueue?
Kueue is a Kubernetes-native job queueing system. It manages when workloads are admitted to run on a cluster based on available resources, quotas, and priorities — without replacing the existing Kubernetes scheduler.
The core problem it solves: Kubernetes has a scheduler that decides where to place pods, but it has no concept of job queueing. If you submit 1000 batch jobs and the cluster only has capacity for 50, vanilla Kubernetes creates pods for all of them at once. Most sit in Pending until capacity frees up, loading the API server and scheduler, and nothing orders the jobs by priority or team quota.
Kueue adds:
- Fair-sharing of resources between teams/tenants
- Prioritized queuing — important jobs run first
- Resource quotas with borrowing and preemption
- Budget-aware scheduling — admit only what fits, queue the rest
- Multi-tenancy — teams get guaranteed and ceiling quotas
Kueue is developed under Kubernetes SIG Scheduling as an official Kubernetes subproject (kubernetes-sigs/kueue).
How Kueue Fits Into Kubernetes
flowchart TD
A["User submits Job<br>(with queue-name label)"]
B["<b>Kueue</b><br>Wraps Job as Workload<br>Places in LocalQueue<br>Routes to ClusterQueue<br>Checks quotas & priorities<br>Admits or queues the Workload"]
C["<b>Kubernetes Scheduler</b><br>Decides WHERE to place pods<br>(node selection, affinity, etc.)"]
A --> B
B -->|admitted| C
Key distinction:
- Kueue decides IF and WHEN a workload should run (admission control).
- kube-scheduler decides WHERE it runs (pod placement on nodes).
Kueue does not replace the scheduler. It gates workloads: a Job’s pods are not created until Kueue admits the workload.
Core Concepts
1. Workload
A Workload is Kueue’s internal representation of a job. You don’t create Workloads directly — Kueue automatically creates one for each supported job type (Job, MPIJob, RayJob, PyTorchJob, etc.).
The Workload object tracks:
- Which queue the job belongs to
- How many resources it requests (CPU, memory, GPU, etc.)
- Its admission status (pending, admitted, finished)
- Priority
# You don't write this -- Kueue creates it automatically
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-my-training-xxxxx
spec:
  queueName: team-a-queue
  podSets:
  - count: 4
    name: main
    template:
      spec:
        containers:
        - resources:
            requests:
              cpu: "8"
              memory: "32Gi"
              nvidia.com/gpu: "1"
2. ResourceFlavor
A ResourceFlavor represents a type of resource in your cluster. It maps to a set of nodes with specific characteristics (GPU type, region, spot vs on-demand, etc.).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-a100
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100
  nodeTaints:
  - key: nvidia.com/gpu
    value: "true"
    effect: NoSchedule
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-t4
spec:
  nodeLabels:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot-cpu
spec:
  nodeLabels:
    cloud.google.com/gke-provisioning: spot
---
# A "default" flavor with no special labels
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default
spec: {}
Think of ResourceFlavors as tiers or classes of hardware. Kueue can try multiple flavors for a workload (e.g., try A100 first, fall back to T4).
3. ClusterQueue
A ClusterQueue is the central resource management object. It defines:
- What resources are available (and from which flavors)
- How much quota each resource has (guaranteed and borrowing limits)
- Preemption policies
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: research-cohort # borrowingLimit and lendingLimit only apply inside a cohort
  namespaceSelector: {} # Accept workloads from every namespace
  # How Kueue treats resource groups
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "cpu"
        nominalQuota: 40 # Guaranteed 40 CPUs from A100 nodes
        borrowingLimit: 20 # Can borrow up to 20 more
        lendingLimit: 10 # Will lend up to 10 to others
      - name: "memory"
        nominalQuota: 160Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8 # Guaranteed 8 A100 GPUs
        borrowingLimit: 4
    - name: gpu-t4 # Fallback flavor
      resources:
      - name: "cpu"
        nominalQuota: 20
      - name: "memory"
        nominalQuota: 80Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4
  # Preemption policy
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: LowerPriority
    withinClusterQueue: LowerPriority
  # Fair sharing weight (relative to other CQs in the cohort)
  fairSharing:
    weight: 1
4. LocalQueue
A LocalQueue is the namespaced entry point for users. Users submit jobs to a LocalQueue, which routes them to a ClusterQueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cq # Which ClusterQueue to route to
Multiple LocalQueues (in different namespaces) can point to the same ClusterQueue. This is how multi-tenancy works: each team has its own namespace and LocalQueue, but they share cluster resources through ClusterQueues.
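The routing described above can be pictured as a lookup keyed by namespace and LocalQueue name. This is only a toy model, not Kueue's code; the second entry (team-b and team-b-queue) is a hypothetical tenant added to show two LocalQueues sharing one ClusterQueue.

```python
# Toy lookup: (namespace, LocalQueue name) -> ClusterQueue name.
LOCAL_QUEUES = {
    ("team-a", "team-a-queue"): "team-a-cq",
    ("team-b", "team-b-queue"): "team-a-cq",  # hypothetical tenant sharing the same ClusterQueue
}


def resolve_cluster_queue(namespace, queue_name):
    """Return the ClusterQueue a job is routed to, or None if no such LocalQueue exists."""
    return LOCAL_QUEUES.get((namespace, queue_name))


print(resolve_cluster_queue("team-a", "team-a-queue"))  # team-a-cq
print(resolve_cluster_queue("team-b", "team-a-queue"))  # None: LocalQueues are namespaced
```

The key includes the namespace because a LocalQueue name is only meaningful inside its own namespace.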
5. Cohort
A Cohort is a group of ClusterQueues that can borrow resources from each other. Cohorts are defined by name — any ClusterQueues with the same cohort field belong to the same cohort.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: research-cohort # <-- same cohort name
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"] # Every flavor must define exactly these resources
    flavors:
    - name: gpu-a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        borrowingLimit: 4 # Can borrow up to 4 extra from cohort
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq
spec:
  cohort: research-cohort # <-- same cohort name
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: gpu-a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        borrowingLimit: 4
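If team-a uses only 4 of its 8 GPUs, team-b can borrow the idle 4 (up to its borrowingLimit). When team-a needs them back, Kueue can preempt team-b’s borrowing workloads, provided team-a’s ClusterQueue sets preemption.reclaimWithinCohort; with the default (Never), team-a waits until the borrowed GPUs are released.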
flowchart LR
subgraph cohort ["Cohort: research-cohort"]
direction LR
A["team-a-cq<br>8 GPUs nominal"]
B["team-b-cq<br>8 GPUs nominal"]
A <-->|"Borrowing / lending"| B
end
cohort -.- note["Total cohort: 16 GPUs<br>Each team guaranteed 8<br>Can borrow up to 4 from idle"]
style note fill:none,stroke:none
6. WorkloadPriorityClass
Defines priority levels for workloads. Higher-priority workloads are admitted first and can preempt lower-priority ones.
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: production
value: 1000
description: "Production workloads -- highest priority"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: research
value: 100
description: "Research workloads -- normal priority"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: background
value: 10
description: "Background workloads -- preemptable"
How Admission Works — Step by Step
- User submits a Job with a kueue.x-k8s.io/queue-name label.
- Kueue suspends the Job (sets .spec.suspend = true) so no pods are created.
- Kueue creates a Workload object representing the Job.
- The Workload enters the LocalQueue, which routes it to the ClusterQueue.
- Kueue sorts the queue by priority (highest first), then FIFO within the same priority.
- For each pending Workload (in order), Kueue checks:
  - Does the ClusterQueue have enough quota (nominal or borrowed)?
  - Which ResourceFlavor can satisfy the request?
  - Should a lower-priority workload be preempted?
- If resources are available, Kueue admits the Workload:
  - Sets an Admission status on the Workload.
  - Unsuspends the Job (sets .spec.suspend = false).
  - The Kubernetes scheduler then places the pods normally.
- If resources are not available, the Workload stays queued until resources free up.
- When the Job finishes, Kueue releases the quota back to the ClusterQueue.
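The ordering and quota check in the steps above can be sketched as a small simulation. This is a toy model under simplifying assumptions (one resource, one flavor, no borrowing or preemption), not Kueue's implementation; the Workload class and admit function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int
    created: int  # submission order; smaller means earlier
    gpus: int     # requested GPUs


def admit(pending, free_gpus):
    """Admit workloads in (priority desc, FIFO) order while quota remains.

    Returns (admitted_names, still_queued_names).
    """
    admitted, queued = [], []
    for w in sorted(pending, key=lambda w: (-w.priority, w.created)):
        if w.gpus <= free_gpus:
            free_gpus -= w.gpus
            admitted.append(w.name)
        else:
            queued.append(w.name)
    return admitted, queued


pending = [
    Workload("research-1", priority=100, created=1, gpus=4),
    Workload("prod-1", priority=1000, created=2, gpus=6),
    Workload("background-1", priority=10, created=3, gpus=2),
]
print(admit(pending, free_gpus=8))
# -> (['prod-1', 'background-1'], ['research-1'])
```

Note that this sketch lets a small workload pass a blocked one (best-effort behavior); a strict FIFO variant would stop at the first workload that does not fit.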
flowchart TD
A["Job submitted<br>(suspend=true)"] --> B["Workload created"]
B --> C["LocalQueue"]
C --> D["ClusterQueue"]
D --> E{"Enough<br>quota?"}
E -->|yes| F["Admitted<br>(suspend=false)"]
E -->|no| G{"Preempt lower<br>priority workload?"}
G -->|yes| F
G -->|no| H["Stay queued<br>(wait for resources)"]
F --> I["Pods created"]
I --> J["Job runs"]
J --> K["Job finishes<br>Quota released"]
Preemption
Kueue supports several preemption strategies:
Within a ClusterQueue
preemption:
  withinClusterQueue: LowerPriority
A higher-priority workload can preempt lower-priority workloads in the same ClusterQueue.
Within a Cohort (reclaim lent resources)
preemption:
  reclaimWithinCohort: Any # or LowerPriority
If team-a’s resources are being borrowed by team-b, team-a can reclaim them by preempting team-b’s workloads.
Borrowing within a Cohort
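preemption:
  borrowWithinCohort:
    policy: LowerPriority
    maxPriorityThreshold: 100 # Only preempt workloads at or below this priority
A workload that needs to borrow can preempt lower-priority workloads in other ClusterQueues of the cohort. maxPriorityThreshold restricts the candidates to workloads whose priority does not exceed the threshold.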
When a workload is preempted:
- Kueue sets the Job back to suspend=true.
- The running pods are terminated.
- The Workload goes back to the queue and waits for re-admission.
- When it’s re-admitted, the Job resumes (new pods are created).
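To make the LowerPriority policy concrete, here is a minimal sketch of picking preemption victims. It assumes a single resource (GPUs) and evicts the lowest priorities first; Kueue's real candidate ordering considers more factors, and pick_victims is a hypothetical helper.

```python
def pick_victims(running, incoming_priority, gpus_needed):
    """Pick lower-priority workloads to evict until gpus_needed is freed.

    running: list of (name, priority, gpus) tuples.
    Returns the names to evict, or [] if evicting every lower-priority
    workload would still not free enough.
    """
    candidates = sorted(
        (w for w in running if w[1] < incoming_priority),
        key=lambda w: w[1],  # lowest priority first
    )
    victims, freed = [], 0
    for name, _priority, gpus in candidates:
        if freed >= gpus_needed:
            break
        victims.append(name)
        freed += gpus
    return victims if freed >= gpus_needed else []


running = [("bg-1", 10, 2), ("research-1", 100, 4), ("prod-1", 1000, 2)]
print(pick_victims(running, incoming_priority=1000, gpus_needed=4))
# -> ['bg-1', 'research-1']
```

Workloads at the same or higher priority than the incoming one are never candidates, which is why prod-1 is left alone.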
Supported Job Types
Kueue is not limited to batch/v1 Job. Its integration framework supports several workload types:
| Type | API Group | Use Case |
|---|---|---|
| Job | batch/v1 | Standard batch jobs |
| JobSet | jobset.x-k8s.io/v1alpha2 | Multi-template jobs (e.g., driver + workers) |
| MPIJob | kubeflow.org/v2beta1 | MPI distributed training |
| PyTorchJob | kubeflow.org/v1 | PyTorch distributed training |
| TFJob | kubeflow.org/v1 | TensorFlow distributed training |
| PaddleJob | kubeflow.org/v1 | PaddlePaddle training |
| XGBoostJob | kubeflow.org/v1 | XGBoost training |
| RayJob | ray.io/v1 | Ray workloads |
| RayCluster | ray.io/v1 | Ray clusters |
| Pod | v1 | Plain pods (with the queue-name label) |
Each integration knows how to:
- Suspend/unsuspend the workload type
- Extract resource requests from the job spec
- Map the job’s pod templates to Kueue’s podSets
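The three responsibilities listed above can be pictured as a small adapter interface. The sketch below is hypothetical: Kueue's integrations are written in Go, and the names JobIntegration, BatchJob, and pod_sets are illustrative rather than the project's API. It operates on a Job manifest held as a plain dict.

```python
from typing import Protocol


class JobIntegration(Protocol):
    """Hypothetical interface; method names are illustrative only."""

    def suspend(self) -> None: ...
    def unsuspend(self) -> None: ...
    def pod_sets(self) -> list[dict]: ...


class BatchJob:
    """Toy adapter for a batch/v1 Job represented as a plain dict."""

    def __init__(self, manifest: dict):
        self.manifest = manifest

    def suspend(self) -> None:
        self.manifest["spec"]["suspend"] = True

    def unsuspend(self) -> None:
        self.manifest["spec"]["suspend"] = False

    def pod_sets(self) -> list[dict]:
        spec = self.manifest["spec"]
        containers = spec["template"]["spec"]["containers"]
        return [{
            "name": "main",
            "count": spec.get("parallelism", 1),
            "requests": [c.get("resources", {}).get("requests", {}) for c in containers],
        }]


job = BatchJob({"spec": {"parallelism": 4, "template": {"spec": {"containers": [
    {"name": "trainer", "resources": {"requests": {"cpu": "4", "nvidia.com/gpu": "1"}}},
]}}}})
job.suspend()
print(job.manifest["spec"]["suspend"], job.pod_sets()[0]["count"])  # True 4
```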
Practical Examples
Example 1: Simple Batch Job
apiVersion: batch/v1
kind: Job
metadata:
  name: my-training-job
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue # <-- This is all Kueue needs
spec:
  parallelism: 4
  completions: 4
  template:
    spec:
      containers:
      - name: trainer
        image: my-training:latest
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            nvidia.com/gpu: "1"
      restartPolicy: Never
That’s it. The label kueue.x-k8s.io/queue-name is the only thing you add. Kueue handles the rest.
Example 2: Job with Priority
apiVersion: batch/v1
kind: Job
metadata:
  name: urgent-inference
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
    kueue.x-k8s.io/priority-class: production # <-- Kueue priority
spec:
  template:
    spec:
      containers:
      - name: inference
        image: my-model:latest
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            nvidia.com/gpu: "2"
      restartPolicy: Never
Example 3: Full Setup — Two Teams Sharing a GPU Cluster
# 1. Resource Flavors
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100-80gb
spec:
  nodeLabels:
    gpu-type: a100-80gb
---
# 2. ClusterQueue for Team ML
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: ml-team-cq
spec:
  cohort: gpu-pool # Shared pool
  namespaceSelector: {} # Accept workloads from every namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: a100-80gb
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8 # Guaranteed 8 GPUs
        borrowingLimit: 8 # May borrow up to 8 more, if the cohort has them idle
  preemption:
    reclaimWithinCohort: LowerPriority
    withinClusterQueue: LowerPriority
---
# 3. ClusterQueue for Team Data
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: data-team-cq
spec:
  cohort: gpu-pool # Same shared pool
  namespaceSelector: {} # Accept workloads from every namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: a100-80gb
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4 # Guaranteed 4 GPUs
        borrowingLimit: 4 # Can use up to 8 total
  preemption:
    reclaimWithinCohort: LowerPriority
    withinClusterQueue: LowerPriority
---
# 4. LocalQueues (one per namespace)
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ml-queue
  namespace: ml-team
spec:
  clusterQueue: ml-team-cq
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: data-queue
  namespace: data-team
spec:
  clusterQueue: data-team-cq
In this setup:
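- ML team is guaranteed 8 GPUs. Its borrowingLimit would allow 16, but the cohort only holds 12, so it can burst to 12 when the data team is idle.
- Data team is guaranteed 4 GPUs, can burst to 8.
- Total cluster has at least 12 GPUs across A100 nodes.
- If both teams need all their GPUs, the lending team reclaims its quota by preempting the other team’s borrowing workloads. With reclaimWithinCohort: LowerPriority, only borrowers with a lower priority than the incoming workload are preempted.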
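A borrowingLimit is only an upper bound: a ClusterQueue can never use more than the cohort actually holds. The arithmetic for this setup can be checked with a short sketch, assuming no lendingLimit is set and counting GPUs only; effective_ceiling is a hypothetical helper, not a Kueue API.

```python
def effective_ceiling(nominal, borrowing_limit, cohort_total):
    """Upper bound on usage: own quota plus allowed borrowing, capped by the cohort total."""
    return min(nominal + borrowing_limit, cohort_total)


# (nominalQuota, borrowingLimit) for nvidia.com/gpu, taken from the manifests above.
quotas = {"ml-team-cq": (8, 8), "data-team-cq": (4, 4)}
cohort_total = sum(nominal for nominal, _limit in quotas.values())

for name, (nominal, limit) in quotas.items():
    print(name, effective_ceiling(nominal, limit, cohort_total))
# ml-team-cq 12
# data-team-cq 8
```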
Fair Sharing
When multiple ClusterQueues in a cohort compete for resources, Kueue uses Dominant Resource Fairness (DRF) to decide admission order.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: shared
  fairSharing:
    weight: 2 # Gets 2x the share compared to weight=1
The weight determines relative share. If team-a has weight 2 and team-b has weight 1, then under contention team-a is entitled to roughly two thirds (~67%) of the shared resources and team-b to roughly one third (~33%).
Fair sharing kicks in when there’s contention. When there’s no contention, any team can use whatever’s available (up to their borrowingLimit).
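The weight arithmetic can be sketched as a simple proportional split. This is a simplification for intuition only: Kueue's fair sharing compares per-queue share values rather than dividing resources up front, and weighted_shares is a hypothetical helper.

```python
def weighted_shares(weights):
    """Split a contended pool in proportion to each ClusterQueue's weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = weighted_shares({"team-a-cq": 2, "team-b-cq": 1})
print({name: round(share, 2) for name, share in shares.items()})
# -> {'team-a-cq': 0.67, 'team-b-cq': 0.33}
```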
Admission Checks
Kueue supports AdmissionChecks — external controllers that must approve a workload before it’s admitted. Use cases:
- Provisioning: trigger node auto-provisioning before admitting (e.g., create GPU nodes on demand).
- Custom validation: external webhook that validates the workload.
- Budget approval: check with a cost system before admitting expensive jobs.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: provisioning-check
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: spot-provisioning
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  admissionChecks:
  - provisioning-check # Must pass before admission
  resourceGroups: [...]
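The gating effect of admission checks can be summarized in a few lines. In this sketch the state names ("Pending", "Ready") and the is_admitted helper are assumptions for illustration: a workload runs only once quota is reserved and every configured check has passed.

```python
def is_admitted(quota_reserved, check_states):
    """True only when quota is reserved and every admission check reports Ready."""
    return quota_reserved and all(state == "Ready" for state in check_states.values())


print(is_admitted(True, {"provisioning-check": "Pending"}))  # False
print(is_admitted(True, {"provisioning-check": "Ready"}))    # True
print(is_admitted(False, {"provisioning-check": "Ready"}))   # False
```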
ProvisioningRequest Integration
Kueue can integrate with cluster autoscaler’s ProvisioningRequest API to ensure nodes exist before admitting:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: spot-provisioning
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
  parameters:
    spot: "true"
MultiKueue — Multi-Cluster Federation
MultiKueue lets you distribute workloads across multiple clusters from a single management cluster.
flowchart TD
subgraph mgmt ["Management Cluster"]
A["User submits Job"] --> B["LocalQueue"]
B --> C["ClusterQueue"]
C --> D["MultiKueue"]
end
D --> E["Worker Cluster 1<br>(on-prem)"]
D --> F["Worker Cluster 2<br>(cloud-A)"]
D --> G["Worker Cluster 3<br>(cloud-B)"]
The user interacts only with the management cluster. MultiKueue routes workloads to whichever worker cluster has capacity.
Visibility and Monitoring
Check queue status
# List all ClusterQueues and their status
kubectl get clusterqueues
# Detailed view of a ClusterQueue (shows usage vs quota)
kubectl describe clusterqueue team-a-cq
# List all pending/admitted workloads
kubectl get workloads -A
# Check a specific workload's status
kubectl describe workload job-my-training-xxxxx -n team-a
# List LocalQueues in a namespace
kubectl get localqueues -n team-a
ClusterQueue status fields
status:
  admittedWorkloads: 3
  pendingWorkloads: 7
  reservingWorkloads: 3
  flavorsReservation:
  - name: gpu-a100
    resources:
    - name: "nvidia.com/gpu"
      total: "6" # 6 of 8 GPUs in use
  conditions:
  - type: Active
    status: "True"
Prometheus Metrics
Kueue exposes metrics on :8080/metrics:
| Metric | Description |
|---|---|
| kueue_admitted_active_workloads | Currently running admitted workloads |
| kueue_pending_workloads | Workloads waiting for admission |
| kueue_admitted_workloads_total | Total workloads admitted (counter) |
| kueue_evicted_workloads_total | Total workloads evicted/preempted |
| kueue_cluster_queue_resource_usage | Current resource usage per CQ |
| kueue_cluster_queue_nominal_quota | Configured quota per CQ |
| kueue_admission_wait_time_seconds | Time workloads spend waiting |
| kueue_admission_checks_wait_time_seconds | Time waiting for admission checks |
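As a rough illustration of consuming these metrics without a Prometheus server, the sketch below parses text in the Prometheus exposition format and sums pending workloads per queue. The sample values and the label names (cluster_queue, status) are assumptions made for the example; check the labels your Kueue version actually emits.

```python
import re

# Illustrative sample only; values and label names are assumed for this sketch.
SAMPLE = '''\
# TYPE kueue_pending_workloads gauge
kueue_pending_workloads{cluster_queue="team-a-cq",status="active"} 7
kueue_pending_workloads{cluster_queue="team-b-cq",status="active"} 2
kueue_admitted_active_workloads{cluster_queue="team-a-cq"} 3
'''

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')


def pending_by_queue(text):
    """Sum kueue_pending_workloads samples per cluster_queue label."""
    totals = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match or match.group(1) != "kueue_pending_workloads":
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        queue = labels.get("cluster_queue", "")
        totals[queue] = totals.get(queue, 0.0) + float(match.group(3))
    return totals


print(pending_by_queue(SAMPLE))
# -> {'team-a-cq': 7.0, 'team-b-cq': 2.0}
```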
Installation
# Install a pinned release (v0.9.1 here; check the releases page for newer versions)
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.9.1/manifests.yaml
# Or with Helm
helm install kueue oci://registry.k8s.io/charts/kueue \
--version v0.9.1 \
--namespace kueue-system \
--create-namespace
Kueue Configuration
# Configuration fields sit at the top level; there is no metadata or spec wrapper
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "batch/job"
  - "kubeflow.org/mpijob"
  - "kubeflow.org/pytorchjob"
  - "ray.io/rayjob"
  - "ray.io/raycluster"
  - "jobset.x-k8s.io/jobset"
  - "pod" # Enable plain pod integration
fairSharing:
  enable: true
  preemptionStrategies:
  - LessThanOrEqualToFinalShare
  - LessThanInitialShare
Common Patterns
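Pattern 1: Default LocalQueue per Namespace
Instead of labeling every Job, create a LocalQueue named default in the namespace. This relies on Kueue’s LocalQueueDefaulting feature gate, which is not available in every release, so check the documentation for your version:
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default
  namespace: team-a
spec:
  clusterQueue: team-a-cq
With the feature enabled, any Job in the team-a namespace without an explicit queue label is automatically queued in the default LocalQueue.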
Pattern 2: Spot + On-Demand Flavors (Cost Optimization)
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cost-optimized-cq
spec:
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: spot # Try spot first (cheaper)
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 400Gi
    - name: on-demand # Fall back to on-demand
      resources:
      - name: "cpu"
        nominalQuota: 50
      - name: "memory"
        nominalQuota: 200Gi
Kueue tries flavors in order. Jobs land on spot nodes first; if spot quota is exhausted, they go to on-demand.
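The in-order fallback can be sketched as a first-fit search over the flavor list. This is a toy model (no borrowing, no preemption, CPU only), and pick_flavor is a hypothetical helper; the quotas match the manifest above.

```python
def pick_flavor(flavors, used, request):
    """Return the first flavor, in listed order, whose remaining quota covers the request."""
    for name, quota in flavors:
        fits = all(
            used.get(name, {}).get(resource, 0) + amount <= quota.get(resource, 0)
            for resource, amount in request.items()
        )
        if fits:
            return name
    return None


flavors = [("spot", {"cpu": 100}), ("on-demand", {"cpu": 50})]  # order matters
used = {"spot": {"cpu": 96}}

print(pick_flavor(flavors, used, {"cpu": 4}))   # spot
print(pick_flavor(flavors, used, {"cpu": 8}))   # on-demand
print(pick_flavor(flavors, used, {"cpu": 60}))  # None
```

Listing the cheaper flavor first is what makes it the preferred choice; swapping the order would reverse the preference.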
Pattern 3: StopPolicy — Draining a Queue
# Stop admitting new workloads and evict the ones already admitted
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  stopPolicy: HoldAndDrain # Use "Hold" to leave admitted workloads running
- Hold: stop admitting, but don’t touch running workloads.
- HoldAndDrain: stop admitting and evict running workloads.
Useful for maintenance windows.
Pattern 4: Lending Limits
Control how much of your quota others can borrow:
resources:
- name: "nvidia.com/gpu"
  nominalQuota: 8
  lendingLimit: 4 # Only 4 of my 8 GPUs can be borrowed by others
The remaining 4 are reserved exclusively for this ClusterQueue even when idle.
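The effect of lendingLimit can be checked with a little arithmetic. This is a sketch under the assumption that a queue's own usage consumes its reserved (non-lendable) share first; lendable is a hypothetical helper.

```python
def lendable(nominal, lending_limit, in_use):
    """GPUs other ClusterQueues may borrow right now: idle quota, capped by lendingLimit."""
    idle = max(nominal - in_use, 0)
    return min(idle, lending_limit)


print(lendable(nominal=8, lending_limit=4, in_use=0))  # 4: the other 4 stay reserved
print(lendable(nominal=8, lending_limit=4, in_use=6))  # 2: only 2 GPUs are idle
print(lendable(nominal=8, lending_limit=4, in_use=8))  # 0
```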
Mental Model
flowchart TD
subgraph cluster ["Cluster"]
subgraph cohort ["Cohort"]
CQA["ClusterQueue A<br>nominalQuota / borrowLimit<br>lendLimit / preemption"]
CQB["ClusterQueue B<br>nominalQuota / borrowLimit<br>lendLimit / preemption"]
CQA <-->|"borrow / lend"| CQB
end
LQA["LocalQueue<br>(ns: team-a)"]
LQB["LocalQueue<br>(ns: team-b)"]
CQA --> LQA
CQB --> LQB
LQA --> JA["Jobs with<br>queue-name label"]
LQB --> JB["Jobs with<br>queue-name label"]
RF1["ResourceFlavor: gpu-a100<br>nodeLabels: gpu=a100"]
RF2["ResourceFlavor: spot-cpu<br>nodeLabels: spot=true"]
WPC["WorkloadPriorityClass<br>production 1000<br>research 100<br>background 10"]
end
style RF1 fill:#e8f4f8,stroke:#888
style RF2 fill:#e8f4f8,stroke:#888
style WPC fill:#f8f0e8,stroke:#888
Kueue vs Alternatives
| Feature | Kueue | Volcano | YARN (Hadoop) |
|---|---|---|---|
| Kubernetes-native | Yes | Yes | No (separate system) |
| Scope | Admission control | Scheduler + admission | Full resource manager |
| Replaces kube-scheduler | No | Yes (custom scheduler) | N/A |
| Multi-tenancy | Cohorts + quotas | Queues + quotas | Queues + quotas |
| Preemption | Yes | Yes | Yes |
| Fair sharing | DRF | DRF + others | DRF + others |
| Gang scheduling | All-or-nothing admission (waitForPodsReady) | Built-in | Built-in |
| Multi-cluster | MultiKueue | No | Federation |
| Complexity | Low (CRDs only) | Higher | High (separate infra) |
Kueue’s philosophy: do one thing well (admission control) and let Kubernetes handle the rest. Volcano replaces more of the Kubernetes stack. Kueue is simpler and composes better with existing Kubernetes tooling.
Troubleshooting
# Why is my workload not being admitted?
kubectl describe workload <name> -n <namespace>
# Look at Conditions and Events
# Common reasons:
# - "Inadmissible": not enough quota in any flavor
# - "Pending": waiting in queue (lower priority than others)
# - "AdmissionCheck": waiting for external check
# Check ClusterQueue capacity
kubectl get clusterqueue <name> -o yaml
# Look at status.flavorsReservation vs spec quota
# Check Kueue controller logs
kubectl logs -n kueue-system deployment/kueue-controller-manager
# Workload stuck? Check if the Job is properly labeled
kubectl get job <name> -n <namespace> -o yaml | grep -A2 labels
# Verify LocalQueue points to the right ClusterQueue
kubectl get localqueue -n <namespace> -o yaml
Further Resources
- Official docs — comprehensive reference
- GitHub — source code and issues
- KEP (Kubernetes Enhancement Proposal) — design rationale
- API reference — CRD specs
- Kueue examples — sample configurations