EvalHub is a service for running large language model (LLM) evaluation benchmarks in Kubernetes environments. As organizations scale their AI/ML workloads, they face increasing challenges around resource management, fair sharing, and job prioritization. This is where Kueue comes in.
Kueue is a Kubernetes-native system for queueing and managing workloads. This guide explores why and how to use Kueue with EvalHub to build a production-ready evaluation platform.
Series note
This is part 8 in a series covering how to build a scalable, reproducible AI evaluation infrastructure using the EvalHub project and Red Hat AI. Catch up on the other parts in the series:
- Part 1: How EvalHub manages two-layer Kubernetes control planes
- Part 2: EvalHub: Because "looks good to me" isn't a benchmark
- Part 3: Evaluation-driven development with EvalHub
- Part 4: Understanding evaluation collections in EvalHub
- Part 5: Bring your own evaluation framework to EvalHub
- Part 6: Add automated AI evaluations to your CI/CD pipeline
- Part 7: Store immutable AI evaluation records with EvalHub and OCI
Why EvalHub needs Kueue
To resolve these operational bottlenecks, you must implement a management layer that governs how jobs access compute resources. The following section details the challenges caused by resource contention and how a native queueing system addresses them.
The challenge: Resource contention in shared clusters
In a typical AI/ML platform deployment without a centralized controller, several issues frequently arise:
- Unmanaged resource consumption: Multiple teams run evaluation jobs simultaneously, often exceeding available GPU and CPU capacity.
- Lack of prioritization: Urgent evaluations (production model validation) compete with experimental evaluations (research experiments).
- Cluster instability: Resource sprawl can lead to cluster instability or quota exhaustion.
- Operational inefficiency: Jobs that fail due to insufficient resources waste valuable time and compute cycles, requiring manual intervention to retry or reschedule.
Without a formal scheduling system, resource allocation is chaotic and unpredictable, as illustrated in Figure 1.

The solution: Intelligent workload management
With Kueue, you move from an "uncontrolled" model to a queue-based system. Kueue manages the lifecycle of your workloads by deciding when each job is admitted, so jobs enter the cluster only when sufficient quota is available to support them. Pod placement itself is still handled by the regular Kubernetes scheduler. The structured flow of this managed approach is shown in Figure 2.

Key advantages of Kueue
Using Kueue for benchmark evaluations offers several operational benefits for managing evaluation workloads at scale.
Fair resource sharing across tenants
Kueue supports multitenancy with configured quotas:
# Team A gets 50% of resources
ClusterQueue: team-a-cq
CPU: 32 cores
Memory: 128Gi
GPU: 4
# Team B gets 50% of resources
ClusterQueue: team-b-cq
CPU: 32 cores
Memory: 128Gi
GPU: 4
Each team's evaluation jobs stay within their quota, preventing one team from monopolizing cluster resources.
Priority-based job scheduling
Critical production evaluations can preempt lower-priority research jobs:
- Production model validation: High priority (1000); must complete quickly.
- Routine evaluations: Medium priority (500); normal SLA.
- Experimental benchmarks: Low priority (100); can wait or be preempted.
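The three tiers above could be expressed as Kueue WorkloadPriorityClass objects. The following is a minimal sketch under stated assumptions, not a confirmed EvalHub configuration: the class names (`eval-production`, `eval-routine`, `eval-experimental`) and both helper functions are hypothetical, the API version defaults to the one used elsewhere in this guide (override it with `KUEUE_API_VERSION` if your installation differs), and how EvalHub attaches a priority class to a job is not covered here.

```shell
# Hypothetical helper: prints one WorkloadPriorityClass manifest to stdout.
# Usage: render_priority_class <name> <value> <description>
render_priority_class() {
  cat <<EOF
apiVersion: ${KUEUE_API_VERSION:-kueue.x-k8s.io/v1beta2}
kind: WorkloadPriorityClass
metadata:
  name: $1
value: $2
description: "$3"
EOF
}

# Renders the three tiers listed above as a multi-document YAML stream.
render_priority_tiers() {
  render_priority_class eval-production 1000 "Production model validation"
  echo "---"
  render_priority_class eval-routine 500 "Routine evaluations"
  echo "---"
  render_priority_class eval-experimental 100 "Experimental benchmarks"
}
```

To create the objects, pipe the output to kubectl: `render_priority_tiers | kubectl apply -f -`.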
Resource quota enforcement
Prevents runaway jobs from consuming all cluster resources:
# Quota limits per ClusterQueue
resources:
- name: cpu
  nominalQuota: 32
- name: memory
  nominalQuota: 128Gi
- name: nvidia.com/gpu
  nominalQuota: 4
Automatic queueing and admission
When your cluster reaches quota, Kueue prevents job failures by automatically queueing workloads until resources become available:
- Without Kueue: The job fails with an Insufficient resources error, which requires a manual retry.
- With Kueue: The job is queued automatically and admitted once resources become available.
Cohort-based resource borrowing
Teams can borrow unused quota from other teams within the same cohort.
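As a rough illustration of what borrowing means for the two-team quotas shown earlier, the arithmetic can be sketched as follows. This is a deliberately simplified model, not Kueue's admission logic: it assumes a cohort of exactly two queues and ignores the `borrowingLimit` and `lendingLimit` settings that can cap borrowing. The helper name `max_usable_quota` is hypothetical.

```shell
# Simplified model (an assumption for illustration): a queue can use its own
# nominal quota plus whatever its single cohort peer is not currently using.
# Usage: max_usable_quota <own_nominal> <peer_nominal> <peer_used>
max_usable_quota() {
  echo $(( $1 + $2 - $3 ))
}

# Team A and Team B each have a nominal quota of 4 GPUs; Team B is using 1.
max_usable_quota 4 4 1   # prints 7
```

In this example, Team A could run workloads needing up to 7 GPUs while Team B leaves 3 of its 4 unused.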
Visibility into job queue status
Track why jobs are pending and their position in the queue:
kubectl get localqueue -n team-a
NAME CLUSTERQUEUE PENDING ADMITTED
local-queue team-a-cq 3 5
kubectl get workload -n team-a
NAME QUEUE ADMITTED AGE
eval-job-1-abc123 local-queue True 2m
eval-job-2-def456 local-queue False 30s # Waiting in queue
Understanding the personas
Enabling Kueue for benchmark evaluations involves three key personas, each with distinct responsibilities: the cluster administrator, the namespace owner, and the machine learning (ML) engineer.
| Persona | Role | Responsibilities | Scope |
|---|---|---|---|
| Cluster administrator | Manages the Kubernetes cluster and Kueue installation. | Install and configure the Kueue operator. Create ClusterQueue and ResourceFlavor objects. Define cluster-wide preemption policies. Set up multitenancy boundaries. Monitor cluster-wide resource utilization. | Cluster-wide |
| Namespace owner or team lead | Manages resources for a specific team or namespace. | Create LocalQueue objects in team namespaces. Map LocalQueue objects to appropriate ClusterQueue objects. Configure namespace labels for Kueue management. Monitor the team's quota usage. | Namespace-specific |
| EvalHub user or ML engineer | Submits evaluation jobs via the EvalHub API. | Specify the queue name when creating evaluation jobs. Understand job queueing and preemption behavior. Monitor job status through the EvalHub API or kubectl. | Individual jobs |
Setup guide by persona
Follow these configuration steps tailored to your specific operational role within the cluster environment.
Cluster administrator: Installing and configuring Kueue
Cluster administrators handle the initial cluster-wide setup, including operator installation and global queue definitions.
Step 1: Install the Kueue operator
Install the Kueue operator, create a Kueue cluster instance, and configure ResourceFlavor objects. Refer to the Red Hat build of Kueue installation documentation.
Step 2: Create ClusterQueues for multitenancy
Create ClusterQueue objects for multitenancy. Refer to the Configuring ClusterQueues documentation to set up the cluster-scoped team-a-cq and team-b-cq resources.
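For orientation, a ClusterQueue for one team might look like the output of the helper below. Treat it as a sketch rather than the documented configuration: the `render_cluster_queue` function is hypothetical, the `default-flavor` name assumes a ResourceFlavor with that name already exists, and the field layout follows the upstream Kueue ClusterQueue API as commonly documented, so check it against the version installed on your cluster.

```shell
# Hypothetical helper: prints a ClusterQueue manifest for one team.
# Usage: render_cluster_queue <team> <cpu> <memory> <gpus>
render_cluster_queue() {
  cat <<EOF
apiVersion: ${KUEUE_API_VERSION:-kueue.x-k8s.io/v1beta2}
kind: ClusterQueue
metadata:
  name: $1-cq
spec:
  namespaceSelector:
    matchLabels:
      team: $1              # matches the team=<team> namespace label
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor  # assumes a ResourceFlavor with this name exists
      resources:
      - name: cpu
        nominalQuota: $2
      - name: memory
        nominalQuota: $3
      - name: nvidia.com/gpu
        nominalQuota: $4
EOF
}
```

For example, `render_cluster_queue team-a 32 128Gi 4 | kubectl apply -f -` would create `team-a-cq` with the quotas used in this guide, and the same call with `team-b` would create `team-b-cq`.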
Namespace owner: Setting up team resources
Namespace owners configure local namespace labels and connect team resources to the main cluster queues.
Step 1: Label the namespace
Configure your namespace with the following labels:
- team=team-a: Matches the ClusterQueue namespaceSelector
- kueue.openshift.io/managed=true: Enables Kueue management
- evalhub.trustyai.opendatahub.io/tenant=true: EvalHub tenant marker
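The labels above can be applied in a single command. This is a minimal sketch: the helper name `label_tenant_namespace` is hypothetical, and `team-a-namespace` is the namespace used in the examples in this guide.

```shell
# Applies the three labels listed above to a namespace.
# Usage: label_tenant_namespace <namespace> <team>
label_tenant_namespace() {
  kubectl label namespace "$1" \
    "team=$2" \
    "kueue.openshift.io/managed=true" \
    "evalhub.trustyai.opendatahub.io/tenant=true" \
    --overwrite
}
```

Run `label_tenant_namespace team-a-namespace team-a`, then confirm the result with `kubectl get namespace team-a-namespace --show-labels`.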
Step 2: Create LocalQueue
The LocalQueue connects your namespace to the ClusterQueue:
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: eval-queue
  namespace: team-a-namespace
spec:
  clusterQueue: team-a-cq # References the ClusterQueue created by the admin
EvalHub user: Submitting jobs with Kueue
Submit and monitor your evaluation workloads directly through the application interface or command-line tools.
Job submission via API
Jobs submitted via the EvalHub API are assigned priority 0 by default:
curl --request POST \
--url https://evalhub-team-a.example.com/api/v1/evaluations/jobs \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"name": "standard-eval",
"model": {
"url": "http://llm-service.team-a.svc.cluster.local:8080/v1",
"name": "granite-3.1-8b"
},
"queue": {
"kind": "kueue",
"name": "eval-queue"
},
"benchmarks": [
{
"id": "mmlu",
"provider_id": "lm_evaluation_harness",
"parameters": {
"num_fewshot": 5
}
}
]
}'
After you submit the job, the system queues it with priority 0 and admits it when quota becomes available.
Checking job queue status
You can check the job queue status through the EvalHub API with this command:
curl --request GET \
--url https://evalhub-team-a.example.com/api/v1/evaluations/jobs/<resource-id> \
--header 'Authorization: Bearer <token>'
The API returns a JSON response indicating the current high-level state:
{
"resource": {
"id": "abc123-def456-...",
"created_at": "2026-04-13T10:30:00Z"
},
"status": {
"state": "pending",
"message": {
"message": "Evaluation job created",
"message_code": "evaluation_job_created"
}
}
}
The EvalHub API currently shows high-level states only:
- pending: Job created but not yet admitted
- running: Job admitted and executing
- completed: Job finished
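If you want to wait for a job from a script, the state can be polled in a loop. The sketch below is illustrative only: both function names are hypothetical, the `sed` expression is line-oriented and assumes the response layout shown above (a JSON-aware tool such as `jq` is more robust), and the loop handles only the three states listed here, so add a timeout or extra exit conditions for real use.

```shell
# Extracts the "state" value from an EvalHub job response read on stdin.
# Line-oriented sketch: assumes "state" appears once, as in the sample response.
evalhub_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical polling loop: prints the state until it reaches "completed".
# Usage: wait_for_evalhub_job <job-url> <token>
wait_for_evalhub_job() {
  while :; do
    state=$(curl --silent --request GET --url "$1" \
      --header "Authorization: Bearer $2" | evalhub_state)
    echo "state: $state"
    [ "$state" = "completed" ] && break
    sleep "${POLL_SECONDS:-30}"   # poll interval, 30 seconds by default
  done
}
```

A job that stays in `pending` for a long time in this loop is usually waiting for quota, which is where the cluster-side checks come in.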
To view detailed status conditions, query the cluster directly using the command-line interface:
# Find the Kubernetes Job
JOB_NAME=$(kubectl get jobs -n team-a-namespace | grep "$RESOURCE_ID" | awk '{print $1}')
# Check job status
kubectl get job "$JOB_NAME" -n team-a-namespace
# Check workload status (shows queue position, preemption, etc.)
WORKLOAD=$(kubectl get workloads -n team-a-namespace -o json | \
jq -r --arg job "$JOB_NAME" '.items[] | select(any(.metadata.ownerReferences[]?; .name == $job)) | .metadata.name')
kubectl get workload "$WORKLOAD" -n team-a-namespace -o yaml
Understanding preemption in evaluation jobs
Preemption is a critical concept when using Kueue. Here's what every persona needs to know about how the system manages resource contention.
What is preemption?
Preemption occurs when a high-priority job needs resources, but the cluster is at quota. Kueue performs the following sequence:
- Suspends (stops) a lower-priority running job.
- Terminates its pod(s).
- Admits the higher-priority job.
- Requeues the preempted job.
- Resumes the preempted job when resources become available.
Default preemption behavior
When you create a ClusterQueue without specifying preemption settings, the system applies these defaults:
# Default behavior (no preemption section specified)
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: my-queue
spec:
  resourceGroups: [...]
  # preemption not specified
The resulting effective configuration is:
preemption:
  withinClusterQueue: Never # No preemption within queue
  reclaimWithinCohort: Never # Can't reclaim from cohort
  borrowWithinCohort:
    policy: Never # Can't preempt when borrowing
By default, jobs queue in FIFO order. No preemption occurs, even if you assign different priorities to specific jobs.
Enabling preemption
Use the following configuration to enable priority-based preemption:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: my-queue
spec:
  preemption:
    withinClusterQueue: LowerPriority # Enable preemption
  resourceGroups: [...]
With this setting, higher-priority jobs preempt lower-priority jobs within the same ClusterQueue.
Where is preemption status reported?
Understanding where to look for preemption data is essential for effective debugging.
Kubernetes Workload resource
The Workload resource contains the most detailed preemption information:
kubectl get workload <workload-name> -n <namespace> -o yaml
Check the status.conditions field for the following transitions:
- During preemption: The Admitted condition transitions to False, while the Evicted, Preempted, and Requeued conditions become True.
- After resume: The Admitted and Requeued conditions show as True, while Evicted and Preempted return to False.
Note that the Requeued condition remains True even after resume, preserving the history that the job was preempted.
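The transitions described above can be turned into a quick one-word summary. The following is a sketch, not part of Kueue or EvalHub: both function names are hypothetical, and the classification encodes only the condition combinations described in this section, so a workload evicted for another reason may be reported imprecisely.

```shell
# Prints each condition of a Workload as Type=Status, one per line.
# Usage: workload_conditions <workload> <namespace>
workload_conditions() {
  kubectl get workload "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Reads Type=Status lines on stdin and prints a one-word summary.
# The body runs in a subshell so its variables and helper stay local.
classify_workload() (
  conditions=$(cat)
  has() { printf '%s\n' "$conditions" | grep -qx "$1"; }
  if has 'Preempted=True'; then
    echo preempted
  elif has 'Admitted=True' && has 'Requeued=True'; then
    echo resumed      # admitted again after an earlier requeue
  elif has 'Admitted=True'; then
    echo admitted
  else
    echo pending
  fi
)
```

For example, `workload_conditions "$WORKLOAD" team-a-namespace | classify_workload` prints `preempted` while the job is suspended and `resumed` once it has been readmitted.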
Kubernetes Job resource (basic)
The Job resource displays basic suspension status:
kubectl get job <job-name> -n <namespace> -o yaml
status:
  conditions:
  # When preempted:
  - type: Suspended
    status: "True"
    reason: JobSuspended
    message: "Job suspended"
  # After resume:
  - type: Suspended
    status: "False"
    reason: JobResumed
    message: "Job resumed"
The Job resource does not indicate why the suspension occurred (for example, preemption versus manual intervention), nor does it provide the preemption UID.
Kubernetes Events
Events provide a historical timeline of cluster actions:
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep <job-name>
This sequence illustrates a typical event log for a preempted workload:
7m30s Normal QuotaReserved workload Quota reserved in ClusterQueue
7m30s Normal Admitted workload Admitted by ClusterQueue
6m55s Normal EvictedDueToPreempted workload Preempted to accommodate workload (UID: ...)
6m55s Normal Preempted workload Preempted to accommodate workload (UID: ...)
6m55s Normal Suspended job Job suspended
6m55s Normal Stopped job Preempted to accommodate workload (UID: ...)
5m50s Normal Resumed job Job resumed
5m49s Normal QuotaReserved workload Quota reserved in ClusterQueue (after waiting 65s)
5m49s Normal Admitted workload Admitted by ClusterQueue
EvalHub API response
The EvalHub API does not currently expose Kueue-specific states like preemption or requeueing.
{
"status": {
"state": "pending", // High-level only: pending, running, completed
"message": {
"message": "Evaluation job created"
}
}
}
To track preemption for EvalHub jobs, follow these steps:
- Retrieve the resource.id from the API response.
- Identify the Kubernetes Job (the name contains the resource ID).
- Locate the associated Workload resource.
- Check the Workload status.conditions for detailed status.
You can automate this check using the following script:
RESOURCE_ID="abc123-def456-..."
# Find Job
JOB_NAME=$(kubectl get jobs -n team-a-namespace | grep "$RESOURCE_ID" | awk '{print $1}')
# Find Workload
WORKLOAD=$(kubectl get workloads -n team-a-namespace -o json | \
jq -r --arg job "$JOB_NAME" '.items[] | select(any(.metadata.ownerReferences[]?; .name == $job)) | .metadata.name')
# Check for preemption
kubectl get workload "$WORKLOAD" -n team-a-namespace -o jsonpath='{.status.conditions}' | \
jq '.[] | select(.type == "Preempted" or .type == "Evicted" or .type == "Requeued")'
Impact on evaluation results
When a job is preempted and resumed, it restarts from the beginning.
The state transition diagram highlights how preemption alters execution flow (Figure 3).

Preemption introduces several critical operational implications:
- No progress is saved: The job doesn't checkpoint its state.
- Increased total runtime: Job age includes the suspension period.
- Unpredictable completion times: Jobs can be preempted multiple times.
To avoid these issues, create a dedicated ClusterQueue for evaluation jobs and set withinClusterQueue: Never. Because evaluation workloads cannot checkpoint their progress, this configuration helps make sure your jobs complete without interruption.
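If the dedicated ClusterQueue already exists, the setting can be applied with a merge patch. This is a minimal sketch: the helper name `disable_queue_preemption` and the queue name `eval-cq` are hypothetical, and the patch only sets `withinClusterQueue`, leaving any cohort-related preemption settings untouched.

```shell
# Sets withinClusterQueue to Never on an existing ClusterQueue.
# Usage: disable_queue_preemption <clusterqueue-name>
disable_queue_preemption() {
  kubectl patch clusterqueue "$1" --type merge \
    --patch '{"spec":{"preemption":{"withinClusterQueue":"Never"}}}'
}
```

After running `disable_queue_preemption eval-cq`, you can confirm the value with `kubectl get clusterqueue eval-cq -o jsonpath='{.spec.preemption.withinClusterQueue}'`.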
Job lifecycle with Kueue
Understanding the complete job lifecycle helps with monitoring and troubleshooting. Use these flows to identify your evaluation job's current stage.
Normal flow (no preemption)
In a standard execution, a job is submitted, reserved in a queue, and admitted to the cluster where it runs to completion without interruption (Figure 4).

Preemption flow
When preemption is enabled, a job might be suspended if a higher-priority workload requires resources. Understanding this flow is essential for interpreting job status changes during peak cluster utilization.

Monitoring and troubleshooting
Monitoring your EvalHub jobs ensures you can identify and resolve resource contention issues quickly. Here are common scenarios you might encounter while managing your evaluation workloads.
Scenario 1: Job stuck in pending
If a job remains in the pending state, the Kubernetes job status will appear as follows:
kubectl get job my-eval-job -n team-a-namespace
# NAME STATUS COMPLETIONS AGE
# my-eval-job Suspended 0/1 5m
If a job is stuck in Suspended status, use this command to diagnose the cause:
# Check workload status
WORKLOAD=$(kubectl get workloads -n team-a-namespace -o json | \
jq -r '.items[] | select(any(.metadata.ownerReferences[]?; .name == "my-eval-job")) | .metadata.name')
kubectl get workload "$WORKLOAD" -n team-a-namespace -o jsonpath='{.status.conditions}' | \
jq '.[] | select(.type == "QuotaReserved" or .type == "Admitted")'
Common causes include:
- Insufficient quota: You might need to wait for resources to free up or request a quota increase.
{ "type": "QuotaReserved", "status": "False", "reason": "Pending", "message": "couldn't assign flavors to pod set main: insufficient unused quota for cpu in flavor default-flavor, 8 more needed" }
- Invalid queue name: Verify that the LocalQueue exists and the name is correct in your job specification.
kubectl get workload "$WORKLOAD" -n team-a-namespace
# NAME QUEUE RESERVED IN ADMITTED
# job-my-eval-job-abc12 non-existent-queue False
- Waiting for higher-priority jobs: Increase job priority or wait for the queue to clear.
kubectl get workloads -n team-a-namespace --sort-by=.spec.priority
Scenario 2: Job was preempted
If a job remains suspended after it has already begun execution, it may have been preempted. You can verify this by checking the job status:
kubectl get job my-eval-job -n team-a-namespace
# NAME STATUS COMPLETIONS AGE
# my-eval-job Suspended 0/1 10m
If a job shows as Suspended after previously running, check for preemption:
# Check for preemption
kubectl get workload "$WORKLOAD" -n team-a-namespace -o jsonpath='{.status.conditions}' | \
jq '.[] | select(.type == "Preempted" or .type == "Evicted")'
The following output confirms that the job was preempted to accommodate a higher-priority workload:
{
"type": "Preempted",
"status": "True",
"reason": "InClusterQueue",
"message": "Preempted to accommodate a workload (UID: 641031a6-be4d-43f5-b51f-24a4d05dffe6, JobUID: 1f1c675a-711f-4a13-a3bd-da3d50e6f893)"
}
To resolve this, wait for the preempting job to complete, which allows your job to auto-resume. Alternatively, increase your job's priority to avoid future preemption.
Scenario 3: Job running but progress unknown
If a job has been running for an extended period, verify if the pod was restarted due to preemption:
# Find the pod created for the job (the Job controller sets the job-name label)
POD=$(kubectl get pods -n team-a-namespace -l job-name=my-eval-job -o jsonpath='{.items[0].metadata.name}')
kubectl get pod "$POD" -n team-a-namespace -o jsonpath='{.status.containerStatuses[0].restartCount}'
# 0 (no restarts)
# Check pod age vs job age
kubectl get pod "$POD" -n team-a-namespace -o jsonpath='{.metadata.creationTimestamp}'
kubectl get job my-eval-job -n team-a-namespace -o jsonpath='{.metadata.creationTimestamp}'
# If pod is much newer than job, it was likely preempted and recreated
Useful monitoring commands
You can use these command-line entries to monitor workload states, check queue positions, and track resource availability across your cluster:
# View all queued workloads
kubectl get workloads -n team-a-namespace
# View quota usage
kubectl get clusterqueue team-a-cq -o yaml | grep -A 20 "flavorsUsage:"
# View pending workloads count
kubectl get localqueue eval-queue -n team-a-namespace
# Get workload events
kubectl get events -n team-a-namespace --field-selector involvedObject.kind=Workload
# View all preempted workloads
kubectl get workloads -n team-a-namespace -o json | \
jq -r '.items[] | select(.status.conditions[]? | select(.type == "Preempted" and .status == "True")) | .metadata.name'
Conclusion
Using Kueue with EvalHub transforms ad-hoc evaluation job execution into a managed, fair, and efficient system. By understanding the roles of each persona and following the practices outlined in this guide, you can establish a resilient foundation for the entire AI/ML lifecycle, so that your evaluation platform can evolve alongside your needs.
Adopting this integrated approach allows you to:
- Prevent resource contention through quota enforcement
- Enable fair sharing across multiple teams
- Prioritize critical work with intelligent preemption
- Increase cluster utilization through cohort-based borrowing
- Improve visibility into job queueing and resource usage
By aligning your infrastructure with these standard patterns, you ensure that your evaluation platform is ready to support the next generation of LLM development.