Every layer counts: Defense in depth for AI agents with Red Hat AI

You are either running AI agents in production now or you will be soon. And if you are anything like the platform engineers we have been talking to, you are probably already feeling the tension: your AI teams want agents with shell access, file system access, and network access. Your security team wants to know who is watching these things. Both are right.

The reality is that we are handing root-level capabilities to systems that can be tricked by a well-crafted paragraph. When a chatbot hallucinates, you get a wrong answer. When an agent hallucinates, it might rm -rf the wrong directory or POST your credentials to an external endpoint. The stakes changed. The guardrails mostly did not.

At Red Hat AI, we have been working on this problem across the entire stack, from metal to the agent covering the entire deployment lifecycle. We have also been talking to platform engineers running agent workloads on Kubernetes and collaborating with projects like OpenShell. We have found that teams succeeding with these deployments rely on a compound security approach rather than a single layer.

This article presents that framework. It is the third post in a series covering how to operationalize AI agents with Red Hat AI and the OpenClaw project. Catch up on the other parts in the series:

Part 1: Operationalizing "Bring Your Own Agent" on Red Hat AI, the OpenClaw edition
Part 2: Deploying agents with Red Hat AI: The curious case of OpenClaw
Part 3: Every layer counts: Defense in depth for AI agents with Red Hat AI
Part 4: Testing infrastructure red teaming with abliterated models

The threat of semantic malware

Before we look at solutions, it helps to understand how this differs from traditional security.

A VirusTotal scan of the OpenClaw agent skill marketplace found 314 malicious skills from a single publisher. These skills were disguised as legitimate tools already circulating in the ecosystem. The payload was natural language. There was no code to hash-check and nothing for a malware scanner to flag. The attack was a sentence in a README file that instructed the agent to send sensitive data to an external server.

Researchers call this semantic malware, where the delivery package contains no malicious code. The malware exists entirely in the workflow, as instructions that look like documentation but direct the agent to do something harmful.

Sound familiar? If you have ever asked an agent to summarize a .zip file from a colleague and felt a little uneasy about it, your instinct was correct.

Why "just put it in a container" is not enough

We hear this a lot. While it is not wrong, it is not nearly enough. There are three specific gaps:

Default security profiles are often too permissive. An agent can execute arbitrary binaries, delete files, and access resources it has no business touching. That is not a sandbox; that is a suggestion.
Network policies are IP-based. Agents need to reach dynamic software-as-a-service (SaaS) APIs, such as GitHub and various cloud services. Because these endpoints change frequently, an IP is difficult to maintain. You need domain-level filtering, and vanilla Kubernetes does not give you that.
Pods automatically mount service account tokens. If an agent is compromised, an attacker can use that token to move laterally across your cluster. Most agent pods do not need cluster API access at all.

Each of these gaps is present in the default configuration.

A six-layer security framework

The teams we work with that deploy agents safely all follow the same principle: each layer assumes the one above it has been breached. Figure 1 shows the framework from bottom to top.

A vertical stack illustrating 6 security layers: hardened platform, isolated runtime, sandboxed process, guarded conversation, proactive scanning, and identity-based deployment. — Figure 1: The six-layer defense-in-depth stack. Each layer assumes the one above it has been breached. All layers converge in Red Hat AI as a single platform.

Layer 1: Start with a hardened platform

The job: Ensure the platform is hardened before the agent writes a single token.

As part of Red Hat OpenShift, security context constraints (SCCs) enforce restricted-v2 profiles on agent pods. These profiles prohibit privilege escalation, host namespaces, and root user IDs (UIDs). SELinux adds mandatory access control (MAC) at the kernel level, regardless of the UID. Without SELinux, a compromised agent could read the /etc/shadow file, open a raw socket, and phone home. With SELinux in enforcing mode, the kernel blocks all three actions.

The most overlooked setting is automountServiceAccountToken: false on every agent pod. For network policy, the OpenShift EgressFirewall filters by DNS name rather than just an IP address. This allows you to permit traffic to api.openai.com and github.com while blocking all other traffic.

Platform hardening is a minimum requirement, but a surprising number of agent deployments skip it because default settings work and configurations often go unquestioned. Red Hat AI and OpenShift provide default configurations focused on your security needs.

Layer 2: Isolate the runtime

The job: Limit the impact if an agent triggers a container escape.

Built on Kata Containers and generally available with Red Hat OpenShift, OpenShift sandboxed containers (Figure 2) wrap each pod in a lightweight virtual machine (VM). If an agent runs unexpected code from a prompt injection and triggers an exploit, the attacker lands in a throwaway VM rather than the host node. The performance overhead is manageable for most agent workloads, which are often I/O-bound, such as when waiting for model APIs or reading files.

Architecture comparing pods sharing a host kernel versus sandboxed pods using individual kernels and a hypervisor for isolation. — Figure 2: OpenShift sandboxed containers based on Kata containers as a kernel-isolated runtime native to Kubernetes and OpenShift.

The upstream agent-sandbox project within Kubernetes SIG Apps is building a dedicated Sandbox custom resource for agent runtimes that uses warm pools to enable sub-second cold starts. Worth watching.

Layer 3: Sandbox the process and enforce policy

The job: Control what individual binaries can do inside the container.

This is where things get interesting. Layers 1 and 2 operate at the pod level: either the pod can reach an endpoint, or it cannot. But inside that pod, node and curl have the same permissions. That is the gap.

Looking ahead to future releases, the planned addition of OpenShell, an open source sandboxed runtime for autonomous AI agents, to Red Hat AI, closes this gap. OpenShell operates at the process level. It knows which binary inside the sandbox is making each request, verified by SHA-256 hash, and enforces different rules for each one. So node can reach api.github.com, but curl spawned by a prompt injection cannot. OpenShell is designed to provide per-binary network policy, a feature often missing in other agent sandboxes. Existing tools cannot distinguish node from curl within the same sandbox.

Three capabilities worth highlighting:

Per-binary network policy using an HTTP CONNECT proxy that inspects /proc to identify the exact binary and its full ancestor chain. Open Policy Agent (OPA) and Rego policies define per-binary, per-host allowlists.
Kernel-enforced file system allowlisting using Landlock, a Linux kernel security module. Mark directories read_only or read_write. Once applied, restrictions are irreversible for the process tree. Not even root can loosen them. A prompt injection that tells the agent to cat ~/.ssh/id_rsa receives an EACCES error from the kernel.
Credential-free inference routing. The agent calls a virtual host (inference.local) with no authentication headers. The sandbox supervisor intercepts, injects real credentials from its own memory, and forwards them to the backend. Run env | grep KEY inside the sandbox and you receive no output.

The workflow is iterative: start locked down, the agent hits walls, OpenShell's denial aggregator proposes policy updates with confidence scores; once you approve or reject them, the rules hot-reload. This allows your security policy to grow from actual behavior rather than guesswork.

Seccomp, Landlock, and network namespaces isolate an agent, routing traffic through a 7-stage supervisor inspection for API access or rejection. — Figure 3: OpenShell’s process isolation and sandboxing.

OpenShell supports five deployment backends: Kubernetes, Docker, libkrun microVM, QEMU VM, and Podman for rootless developer workstations. This design is extensible with drivers to allow the same user experience across different deployment environments.

Layer 4: Guard the conversation

The job: Control what the model can say and the tools it can call.

Available as part of Red Hat AI, the generally available NeMo Guardrails and FMS guardrails orchestrator, both operated by the TrustyAI Service Operator, sit between the user and the model. These tools intercept prompt injections before they reach the model and filter unsafe outputs before they reach the user.

One pattern to consider is the reviewer versus executor split. The model that reviews a tool definition should not be the same model that executes it. This requires attackers to create a payload that appears benign to one model while functioning as malware for another, which is more difficult than deceiving a single model.

An MCP Gateway, built on the Kuadrant project and available as technology preview in the product, can operate in front of Model Context Protocol (MCP) servers, applying authentication, authorization, and rate limiting to every tool call. The agent cannot call execute_shell if the gateway has no route for it.

Layer 5: Attack yourself first

The job: Identify vulnerabilities before your adversaries can exploit them.

Guardrails are reactive. Red teaming is proactive.

Integrated into the Red Hat AI portfolio as a technology preview, Garak (Generative AI Red-teaming and Assessment Kit) is an open source large language model (LLM) vulnerability scanner with more than 120 probes across prompt injection, jailbreaks, data leakage, and more. Red Hat's acquisition of Chatterbox Labs in December 2025 added enterprise-grade automated red teaming that specifically measures how agents respond to adversarial inputs and detects when MCP server actions are triggered by injected instructions.

Dashboard displaying a red-teaming report summary with 780 attempts, a success rate chart, and a heatmap analyzing specific adversarial intents. — Figure 4: An example of Red Teaming report results with Garak (triggered with EvalHub and KFP).

The workflow: Run Garak (via Eval-hub or KubeFlow pipelines) in a continuous integration (CI) pipeline to identify regressions. Feed findings back into your guardrails. Repeat.

Layer 6: Deploy with identity and observability

The job: Ensure AI engineers can ship agents without becoming security experts.

This is where the human factor matters as much as solving the technical problem. The people building agents and the people securing them are rarely the same. AI engineers care about tool bindings and prompt chains. Platform engineers care about SCCs, runtime classes, and egress rules.

The AgentRuntime custom resource definition (CRD) is part of Kagenti and on track to become part of Red Hat OpenShift AI. It bridges that gap with two capabilities: SPIFFE-based workload identity, where every agent gets a cryptographic identity instead of a shared API key, and OpenTelemetry trace integration, which captures every tool call with full lineage. Platform engineers configure these settings at the cluster level. AI engineers inherit it automatically.

Workflow showing Keycloak and SPIRE issuing JWTs to agents, providing secure tool access with auditable workload identities. — Figure 5: Identity flows enabled by Kagenti’s authbridge and the Red Hat AI stack.

We are working to integrate OpenShell as the sandbox runtime for this layer. This adds process-level enforcement alongside identity and tracing.

The full stack at a glance

Layer	The job	Red Hat AI components
Platform	Harden the platform before the agent starts	Red Hat OpenShift (SCCs, SELinux, EgressFirewall)
Runtime	Contain escapes in a throwaway VM	OpenShift sandboxed containers
Process sandbox and policy engine	Per-binary network policy, file system allowlisting, credential-free inference	OpenShell
Guardrails	Intercept injections and gate tool calls	Red Hat OpenShift AI + TrustyAI + MCP Gateway
Red teaming	Identify vulnerabilities before adversaries do	Garak (infused with Chatterbox Labs techniques)
Deployment	SPIFFE identity, OpenTelemetry tracing, and process-level sandboxing	AgentRuntime + OpenShell (coming to Red Hat OpenShift AI)

All of these converge in Red Hat AI as a single platform, not six products to stitch together.

A note on tenancy: Six layers are the baseline, not the ceiling. The number of isolation boundaries depends on how you define your tenancy model. A cluster-per-tenant model with hosted control planes adds a control plane isolation layer that does not exist in a shared cluster. A project-per-tenant model with network policies and RBAC provides a different boundary. A sandboxed container-per-tenant model adds VM isolation at the tenant level, rather than only at the agent level. Red Hat OpenShift provides platform teams with the building blocks to add security layers based on their specific multi-tenancy requirements. A six-layer approach is a starting point, not the final state.

The one thing to remember

If you take away one idea from this post, start with the assumption that the agent will be compromised. Not might. Will.

A prompt injection will land. A malicious tool will slip through review. A model will do something nobody anticipated. The question is not whether these events happen, but whether your stack limits the impact when they do.

While the threat model for autonomous agents is new, the infrastructure patterns for containing them are not. Principles such as least privilege, workload isolation, egress filtering, and zero trust identity are proven ideas applied to a new class of workload. Red Hat AI provides a unified foundation where these capabilities become one opinionated stack, so your teams do not have to build a security framework from scratch.

Get started

If you want to put this framework into practice, start with these resources:

Learn how to build agents on Red Hat AI with a focus on security:

Explore the upstream communities driving this work:

OpenShell: A process-level agent sandboxing tool with per-binary network policy
Kagenti: A cloud-native agent deployment with SPIFFE identity and OpenTelemetry tracing
agent-sandbox: A Kubernetes SIG Apps project for dedicated agent runtime isolation
Garak: An open source LLM vulnerability scanner with more than 120 adversarial probes
NeMo Guardrails: An open source toolkit for programmable guardrails
TrustyAI: An AI safety operator for Red Hat OpenShift AI
MCP Gateway: A service for authentication and authorization of MCP tool calls
SPIFFE/SPIRE: A zero trust workload identity framework for agent-to-agent communication

Last updated: June 2, 2026

Every layer counts: Defense in depth for AI agents with Red Hat AI

The threat of semantic malware

Why "just put it in a container" is not enough

A six-layer security framework

Layer 1: Start with a hardened platform

Layer 2: Isolate the runtime

Layer 3: Sandbox the process and enforce policy

Layer 4: Guard the conversation

Layer 5: Attack yourself first

Layer 6: Deploy with identity and observability

The full stack at a glance

The one thing to remember

Get started

Learn to optimize, deploy, and benchmark LLMs with vLLM: A New Free Course

Build modular AI pipelines with OpenShift AI and reusable components

Kafka Monthly Digest: May 2026

UBI 9 and 10 builders on Paketo Buildpacks with multi-arch support

Deploy Hermes Agent on OpenShift AI with vLLM model serving

Open source AI for developers

Platforms

Build

Quicklinks

Communicate

RED HAT DEVELOPER

Red Hat legal and privacy links

Red Hat legal and privacy links