AI agents and assistants share operational needs that typical web services do not have. LangGraph agents, CrewAI agent crews, custom assistants, and OpenClaw all hold API keys, maintain session state, call tools, execute code, and make decisions on behalf of users. They communicate with large language models (LLMs) that incur per-token costs. They might run safety checks against every message. They need identity, not just authentication.
Red Hat AI addresses these problems at the platform level. It handles model serving, safety guardrails, inference routing, agent identity, and supply chain security before you write your first agent config.
We deployed OpenClaw to put this to the test. OpenClaw is an open source personal AI assistant that runs on your infrastructure, connects to model providers, integrates with messaging platforms, and provides a web interface to interact with your agent. We chose it because it showcases how to get the most out of the Red Hat AI stack for reliable agent deployment: model inference, safety guardrails, agent identity, and persistent state. The patterns here apply to any agent workload you bring to the platform.
This article explains what the Red Hat AI platform provides and how we put it to work.
Model connectivity: Three paths to inference
Agents need LLM inference. You can call a hosted API, but that means sending every prompt off-cluster, paying per token, and trusting a third party with your data. For regulated environments or cost-sensitive workloads, you want options.
Red Hat AI gives you three: vLLM, Llama Stack, and Models-as-a-Service (MaaS).
vLLM
Now generally available as part of Red Hat AI, vLLM is the direct way to serve models on your cluster: deploy a model with KServe and point your agent at the /v1/chat/completions endpoint. KServe handles GPU scheduling and scaling. This path gives you full control over a single, self-hosted model.
We deployed Llama 3.2 3B Instruct on an A10G GPU this way, and OpenClaw was talking to it within minutes.
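Connecting an agent to that endpoint is plain OpenAI-style HTTP. The sketch below builds a /v1/chat/completions request body; the service URL and model name are placeholders for whatever your own InferenceService exposes, not values from our deployment.

```python
import json

# Hypothetical in-cluster KServe URL -- substitute your own InferenceService
# address (for example, from `oc get inferenceservice -n <namespace>`).
BASE_URL = "http://llama-32-3b-instruct.models.svc.cluster.local/v1"

def build_chat_request(prompt: str, model: str = "llama-3-2-3b-instruct") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

# POST the JSON body to f"{BASE_URL}/chat/completions" with any HTTP client;
# the network call itself is omitted here because it needs a live cluster.
body = json.dumps(build_chat_request("Summarize today's alerts."))
```

Any OpenAI-compatible client library works the same way; only the base URL changes.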
Llama Stack
Available in technology preview as part of Red Hat AI, Llama Stack provides a unified API that simplifies inference routing and multiturn conversations. It provides chat completions across multiple backends (vLLM, OpenAI, Anthropic), retrieval-augmented generation (RAG) APIs (file_search, vector stores), and an implementation of the stateful OpenAI Responses API for multiturn agent conversations.
Deploy it using the Red Hat OpenShift AI operator to get inference, retrieval, and state management through a single endpoint. You can swap between self-hosted and remote inference by changing the model parameter:
```yaml
# Self-hosted (vLLM via Llama Stack)
model: vllm-local/llama3-2-8b

# Remote (OpenAI via Llama Stack)
model: openai-hosted/gpt-4o-mini
```

Because the endpoint and API remain the same, the agent does not know which backend handles the request.
Models-as-a-Service (MaaS)
Available in technology preview in Red Hat AI, Models-as-a-Service (MaaS) is a managed model-serving platform. It includes built-in API key management, rate limiting, and policy enforcement through Gateway API and Kuadrant. Models are served through KServe with an API gateway in front, providing OpenAI-compatible endpoints with production controls (such as authentication, quotas, traffic routing) out of the box.
All three options expose standard OpenAI-compatible APIs. Your agent connects in the same way regardless of the path you choose.
Agent identity and zero trust
Agents call other services, such as LLMs, tools, databases, and other agents. Most of these calls use long-lived API keys with broad permissions. There is no standard way to declare that a Deployment is an agent, scope its access, or verify its identity when it calls a downstream service.
Kagenti addresses this with two layers: an operator for agent lifecycle visibility and AuthBridge for zero trust service-to-service authentication. Kagenti is planned as part of Red Hat AI in the second half of 2026, with a preview coming soon.
AgentRuntime
The AgentRuntime custom resource definition (CRD) binds an agent's operational configuration to its workload. You declare "this Deployment is an agent," and the controller takes over. The controller resolves the target workload (such as a Deployment or StatefulSet) and computes a config hash from a three-layer merge of cluster defaults, namespace defaults, and CR overrides.
It applies that hash to the pod template to trigger rolling updates when the configuration changes. The controller also tracks the runtime phase (such as Pending, Active, or Error) using structured conditions. It also watches for changes to the target workload and to cluster/namespace ConfigMaps, so config changes at any level automatically reconcile the agent's pods.
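The merge-then-hash idea can be sketched in a few lines. This is an illustration of the mechanism, not the controller's actual code: later layers win key by key, and the hash of the merged result is what gets stamped onto the pod template.

```python
import hashlib
import json

def merge_layers(cluster: dict, namespace: dict, cr: dict) -> dict:
    """Three-layer merge: CR overrides beat namespace defaults beat cluster defaults."""
    merged = dict(cluster)
    merged.update(namespace)
    merged.update(cr)
    return merged

def config_hash(config: dict) -> str:
    """Stable hash of the merged config; a change at any layer changes the hash."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Illustrative keys only -- not the controller's real config schema.
cluster_defaults = {"trace.sampling.rate": 0.1, "trace.protocol": "http"}
namespace_defaults = {"trace.sampling.rate": 0.5}
cr_overrides = {"trace.sampling.rate": 1.0}

merged = merge_layers(cluster_defaults, namespace_defaults, cr_overrides)
# Applying config_hash(merged) to the pod template is what triggers a
# rolling update whenever any layer changes.
```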
The CR includes per-agent overrides for tracing (OpenTelemetry endpoint, protocol, and sampling rate) and identity (SPIFFE trust domain):
```yaml
apiVersion: agent.kagenti.dev/v1alpha1
kind: AgentRuntime
metadata:
  name: openclaw
  namespace: openclaw
spec:
  type: agent
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw
  trace:
    endpoint: "mlflow-service.mlflow.svc.cluster.local:4318"
    protocol: http
    sampling:
      rate: 1.0
  identity:
    spiffe:
      trustDomain: "example.com"
```

AgentCard
The AgentCard provides the agent's metadata, including its capabilities, endpoints, and protocols. Together, AgentRuntime and AgentCard give the platform visibility into what agents are running, where, what they can do, and how they are configured.
AuthBridge
AuthBridge is the identity layer. AuthBridge provides transparent token management for agent workloads through sidecars injected by the Kagenti webhook:
- client-registration: Automatically registers the agent as a Keycloak client using its Secure Production Identity Framework for Everyone (SPIFFE) ID. No manual client configuration or static credentials.
- AuthProxy: An Envoy-based proxy and external processor that intercepts inbound and outbound traffic. It validates JSON Web Tokens (JWTs) for inbound traffic and exchanges the caller's token for one scoped to the target service for outbound traffic.
- SPIFFE Helper: Provides the agent's workload identity (SVID) from SPIRE.
When Agent A calls Agent B, the token is automatically exchanged for one scoped to Agent B's audience. The application code does not change. The sidecar handles validation, exchange, and credential rotation transparently. Static API keys are replaced with short-lived, audience-scoped JWTs. Each agent gets its own identity and can only call services it has been authorized to reach.
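This kind of exchange is standard OAuth 2.0 token exchange (RFC 8693), which Keycloak supports. The sketch below builds the form an AuthProxy-style component would POST to the token endpoint; the client ID, audience, and token values are placeholders, not Kagenti's actual configuration.

```python
# Illustrative RFC 8693 token-exchange request, as a proxy like AuthProxy
# might issue against Keycloak's token endpoint. All names are placeholders.

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"

def build_token_exchange(subject_token: str, audience: str, client_id: str) -> dict:
    """Form fields for exchanging the caller's JWT for one scoped to `audience`."""
    return {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
        "client_id": client_id,
    }

form = build_token_exchange(
    subject_token="<caller-jwt>", audience="agent-b", client_id="agent-a"
)
# POST `form` (urlencoded) to the realm's token endpoint, e.g.
#   https://<keycloak>/realms/<realm>/protocol/openid-connect/token
```

The returned access token carries Agent B's audience, which is what lets Agent B's sidecar validate the call without any shared static key.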
The AgentRuntime controller is being developed further to integrate natively with AuthBridge (currently a Kagenti extension) and other secure identity and sandboxing solutions, so that declaring an agent workload automatically provisions its identity, scopes its access, and enforces its security boundaries. More to come on this in future posts and platform releases.
Platform security: What OpenShift enforces by default
On vanilla Kubernetes, containers can run as root, hold ambient credentials, and accept unauthenticated traffic. OpenShift prevents all three by default:
- Security Context Constraints (SCCs): Every container runs as a random non-root UID with all capabilities dropped. You do not need a custom SCC for agent workloads.
- Built-in OAuth: An `oauth-proxy` sidecar authenticates users against the OpenShift OAuth server without requiring an external identity provider. If you can run `oc login`, you can access your agent.
- Automatic TLS: Routes terminate TLS by using the cluster wildcard certificate. WebSocket upgrades work natively.
Deploy OpenClaw
To deploy OpenClaw, you must first ensure your environment meets the following requirements.
Prerequisites
- An OpenShift cluster where you can create a namespace; cluster-admin privileges are not required.
- The `oc` CLI, authenticated (`oc login`).
- An API key or endpoint URL for a model provider.
A note on storage: OpenClaw uses SQLite for its agent memory index, which requires POSIX file locking via fcntl(). Block storage classes (such as gp3-csi on AWS, managed-csi on Azure, or thin-csi on vSphere) work correctly. Avoid NFS-backed storage classes.
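You can check whether a mounted volume supports the locking SQLite needs with a few lines of Python run inside the pod. This is a quick diagnostic sketch; the mount path is a placeholder for wherever your PVC is attached.

```python
import fcntl
import tempfile

def supports_posix_locks(directory: str) -> bool:
    """Return True if fcntl() byte-range locks work in `directory`.

    SQLite's default locking relies on fcntl(); NFS-backed volumes
    often mishandle it, which is why block storage is recommended.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            fcntl.lockf(f, fcntl.LOCK_EX)
            fcntl.lockf(f, fcntl.LOCK_UN)
        return True
    except OSError:
        return False

# Inside the pod, point this at the PVC mount, e.g.:
#   supports_posix_locks("/data")
```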
Deploy with the openclaw-installer
The openclaw-installer is a community-supported utility that automates deployment. It generates standard Kubernetes manifests, detects OpenShift, and automatically adds OAuth proxy integration:
```shell
git clone https://github.com/sallyom/openclaw-installer.git
cd openclaw-installer
npm install && npm run build && npm run dev
```
Open http://localhost:3000, fill in the deploy form (agent name, image, API key), and click Deploy. The installation takes about two minutes, primarily for the container image pull. When the installation is complete, the installer prints the Route URL with a preloaded gateway token.
What gets deployed
The installer creates a dedicated namespace that includes the following resources:
| Resource | Purpose |
|---|---|
| Namespace | An isolated namespace labeled for installer discovery. |
| ServiceAccount | A service account for the oauth-proxy that includes an OAuth redirect annotation. |
| Secrets | Secrets that store the OAuth configuration, gateway token, and model provider API keys. |
| ConfigMaps | Configuration maps for the agent configuration file (openclaw.json) and workspace files, such as AGENTS.md and SOUL.md. |
| PVC (10Gi) | All persistent state, including session transcripts, agent memory, and configuration. |
| Deployment | A pod that includes an init container, an oauth-proxy sidecar, and the OpenClaw gateway. |
| Service + Route | A TLS-terminated route that targets the oauth-proxy. |
The Deployment runs a single pod with three containers: an init container that configures the gateway, an oauth-proxy sidecar that handles authentication, and the OpenClaw gateway. All three run under the default restricted-v2 SCC without requiring modifications.
For the full YAML of each resource, see the example manifests.
Access your instance
After deployment, open the Route URL printed by the installer. You will be redirected to the OpenShift login page. After authenticating, the gateway token is included in the URL, so no manual copy-paste is needed.
To customize your agent, edit the workspace files locally (AGENTS.md for instructions, SOUL.md for personality, IDENTITY.md for who the agent is) and click Re-deploy in the installer's Instances tab.
What is supported vs. what is community tooling
The following table summarizes the support status for the components used in this deployment, distinguishing between enterprise-ready Red Hat products and community-supported projects.
| Component | Source | Status |
|---|---|---|
| Red Hat OpenShift | Red Hat | Supported product |
| Red Hat OpenShift AI (vLLM, KServe, TrustyAI, model serving) | Red Hat | Supported product |
| Llama Stack (via the Red Hat OpenShift AI operator) | Red Hat / upstream | Supported in Red Hat OpenShift AI |
| Kagenti operator | kagenti.dev | Open source, upstream. Planned for Red Hat AI 2H 2026 (preview soon). |
| OpenClaw | openclaw | Open source, upstream |
| openclaw-installer | sallyom/openclaw-installer | Community utility |
Take the next step with OpenClaw and Red Hat AI
Moving operational needs like identity, guardrails, observability, and hybrid inference to the platform level lets you focus on building your agent's logic rather than its infrastructure. Start by experimenting with the Red Hat AI platform and stay tuned for more on deploying and managing agentic AI.
Learn more: