Integrate Claude Code with Red Hat AI Inference Server on OpenShift

March 26, 2026
Alexander Barbosa Ayala
Related topics:
Artificial intelligence, Developer productivity
Related products:
Red Hat AI Inference, Red Hat AI, Red Hat Enterprise Linux AI, Red Hat OpenShift AI, Red Hat OpenShift Container Platform

    Agentic coding tools help developers build software efficiently. Claude Code, Anthropic's terminal-based coding agent, improves productivity by letting you interact with your codebase through natural language—directly from the console.

    One advantage of Claude Code is its flexibility. Rather than being locked to Anthropic's cloud models, you can connect it to any backend that implements the Anthropic Messages API.

    This article explores how to integrate Claude Code with a local model served by Red Hat AI Inference Server (a downstream version of vLLM) on Red Hat OpenShift. This approach keeps all prompts and responses within your own infrastructure while retaining Claude Code's full developer-focused workflow.

    Prerequisites

    You will need:

    • An OpenShift cluster with GPUs enabled and the NVIDIA GPU Operator installed. For a local OpenShift installation, follow the steps in How to enable NVIDIA GPU acceleration in OpenShift Local.
    • A Hugging Face account and active API token.
    • Access to the Red Hat image registry.

    Environment

    I executed the steps in this article using an environment with the following specifications:

    • Single-node OpenShift 4.21
    • GPU: NVIDIA RTX 4060 Ti
    • CPU: Intel Core i7-14700 × 28
    • Host machine operating system: Fedora 43

    Disclaimer

    Because this machine is not a supported environment, this demo is intended for testing only and does not represent an official Red Hat support procedure.

    Deploy the Red Hat AI Inference Server

    The first step is to deploy Red Hat AI Inference Server. For this demo, I created a Helm chart to simplify the deployment in an OpenShift 4.21 environment. You can alternatively follow the manual deployment procedure. 

    Clone the project:

    git clone https://github.com/alexbarbosa1989/rhai-helm

    Set the minimal required environment variables:

    export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    export AUTHFILE=$XDG_RUNTIME_DIR/containers/auth.json
    export STORAGECLASS=<ocp-storageclass>

    Alternatively, you can configure your own values in the rhai-helm/values.yaml file—for example, a different model from Hugging Face or a custom namespace.
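    For reference, a customized values.yaml might look like the sketch below. The persistence and secrets keys match the --set flags used in the install command later in this article; the model and namespace key names are hypothetical, so confirm them against the chart's actual rhai-helm/values.yaml before using them.

    ```yaml
    # Sketch of a customized values.yaml for the rhai-helm chart.
    # model.* key names are hypothetical; verify against the chart.
    model:
      name: qwen-coder                          # alias the inference service exposes
      hfRepo: Qwen/Qwen2.5-Coder-1.5B-Instruct  # hypothetical Hugging Face repo ID
    persistence:
      storageClass: <ocp-storageclass>          # matches --set persistence.storageClass
    secrets:
      hfToken: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  # matches --set secrets.hfToken
    ```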

    Hint: Before setting AUTHFILE, check whether auth.json already exists at the expected path. Podman creates this file automatically when you log in to a registry:

    podman login registry.redhat.io
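    A quick way to check is the sketch below; the fallback path assumes a standard rootless Podman setup, where auth.json lives under the user runtime directory.

    ```shell
    # Check for an existing Podman auth file before exporting AUTHFILE.
    # The fallback path assumes a standard rootless Podman setup.
    AUTHFILE="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/containers/auth.json"
    if [ -f "$AUTHFILE" ]; then
      echo "auth file found: $AUTHFILE"
    else
      echo "auth file missing; run: podman login registry.redhat.io"
    fi
    ```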

    Once you define the required environment variables, you can install the Helm chart. For example, to use the default rhai-helm/values.yaml, run:

    helm install rhai-helm ./rhai-helm \
    --create-namespace --namespace rhai-helm \
    --set persistence.storageClass=$STORAGECLASS \
    --set secrets.hfToken=$HF_TOKEN \
    --set-file secrets.docker.dockercfg=$AUTHFILE

    Check the created resources:

    oc get secrets
    oc get pvc model-cache
    oc get deployment
    oc get svc
    oc get route

    Finally, check the running pod. This might take a few minutes, depending on hardware resources.

    oc get pod
    NAME                          READY   STATUS    RESTARTS   AGE
    qwen-coder-5f6668b767-hp585   1/1     Running   0          5m11s

    Install and configure Claude Code

    Configure this on your developer workstation. Follow the official installation instructions, or install it directly using the convenience script for Linux and macOS:

    curl -fsSL https://claude.ai/install.sh | bash

    Claude Code uses environment variables for configuration. By overriding the default Anthropic settings, you can redirect requests to a local model served by vLLM. Use this example configuration:

    ANTHROPIC_BASE_URL="<RHAI-Inference-exposed-route>" \
    ANTHROPIC_API_KEY="vllm" \
    ANTHROPIC_DEFAULT_OPUS_MODEL="qwen-coder" \
    ANTHROPIC_DEFAULT_SONNET_MODEL="qwen-coder" \
    ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen-coder" \
    CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS="2000" \
    CLAUDE_CODE_MAX_OUTPUT_TOKENS="4096" \
    MAX_THINKING_TOKENS="0" \
    claude

    The ANTHROPIC_BASE_URL environment variable must point to the exposed OpenShift route of the Red Hat AI inference service. This is the endpoint Claude Code uses for all requests. 

    Replace the example value with the route generated in your OpenShift cluster. Retrieve the route by running:

    oc get route -n <namespace>
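    To wire the route into ANTHROPIC_BASE_URL, you can capture the route's host field as sketched below. The route and namespace names are assumptions based on the chart defaults, and the hostname is a placeholder; on a real cluster, use the commented oc line instead.

    ```shell
    # Build the Claude Code base URL from the exposed OpenShift route.
    # Route/namespace names are assumed from the chart defaults; on a real
    # cluster, uncomment the oc line and drop the placeholder.
    # ROUTE_HOST=$(oc get route qwen-coder -n rhai-helm -o jsonpath='{.spec.host}')
    ROUTE_HOST="qwen-coder-rhai-helm.apps.example.com"  # placeholder for illustration
    export ANTHROPIC_BASE_URL="https://${ROUTE_HOST}"
    echo "$ANTHROPIC_BASE_URL"
    ```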

    Also, tune the values of CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS and CLAUDE_CODE_MAX_OUTPUT_TOKENS according to your hardware's capabilities to avoid exhausting the context window.

    Once you set the environment variables, launching Claude Code prompts an interactive setup to initialize the workspace (Figure 1).

    Figure 1: Claude Code setup.

    Select ❯ 1. Yes, I trust this folder. At this point, Claude Code is fully initialized and ready for use, as shown in Figure 2.

    Figure 2: Claude Code initialization.

    In this example, I provided the following instruction:

    ❯ create a basic quarkus "hello" service

    Claude Code immediately begins processing the request using the locally served model, as illustrated in Figure 3.

    Figure 3: Claude Code interactive session.

    You can also verify the interaction directly from the vLLM backend pod in the OpenShift cluster. Successful requests appear in the logs as calls to the /v1/messages API endpoint:

    (APIServer pid=1) INFO:     10.128.0.2:43662 - "POST /v1/messages?beta=true HTTP/1.1" 200 OK
    (APIServer pid=1) INFO:     10.128.0.2:43664 - "POST /v1/messages?beta=true HTTP/1.1" 200 OK

    This confirms that Claude Code successfully routes requests to the OpenShift-hosted inference service.
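    You can also exercise the endpoint by hand. The sketch below builds a minimal Anthropic-style Messages request (the model alias matches the one served in this demo); the commented curl shows how it would be sent to your route, using the Anthropic API's standard x-api-key and anthropic-version headers.

    ```shell
    # Build a minimal Messages API payload; "qwen-coder" is the demo's model alias.
    PAYLOAD='{"model": "qwen-coder", "max_tokens": 256, "messages": [{"role": "user", "content": "Say hello"}]}'
    echo "$PAYLOAD"
    # To send it against your route (requires a live cluster and ANTHROPIC_BASE_URL):
    # curl -sk "$ANTHROPIC_BASE_URL/v1/messages?beta=true" \
    #   -H "x-api-key: vllm" \
    #   -H "anthropic-version: 2023-06-01" \
    #   -H "content-type: application/json" \
    #   -d "$PAYLOAD"
    ```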

    Key takeaways

    By integrating Claude Code with a vLLM-based inference service on OpenShift, you gain access to effective AI-assisted coding workflows while keeping models, data, and inference under your control.

    This demonstration uses a lightweight Qwen model. With specialized, higher-performance hardware, you can serve larger models that provide advanced coding and reasoning capabilities.

    Overall, this approach combines the productivity of Claude Code with the security and scalability of OpenShift. It is a practical solution for organizations that need private, on-premises AI development environments.
