
Use OpenShift Lightspeed with locally served LLMs to drive security-focused, cost-efficient enterprise solutions for Red Hat products

November 5, 2025
Samy Zemmouri
Related topics: Artificial intelligence, Developer Productivity
Related products: Red Hat AI, Red Hat OpenShift AI, Red Hat OpenShift Container Platform, Red Hat OpenShift Lightspeed

    Red Hat OpenShift Lightspeed is an integrated assistant within the Red Hat OpenShift web console designed to improve user efficiency. It streamlines operations and enables users to quickly access information about Red Hat OpenShift Container Platform and its various components, aiding in troubleshooting and simplifying management.

    Integrating OpenShift Lightspeed with locally served large language models (LLMs) via Red Hat OpenShift AI offers significant benefits over relying on external APIs. This approach helps keep sensitive data internal and reduce operational costs. It also gives you greater control over performance and model customization. 

    This blog demonstrates the practical steps involved in integrating OpenShift Lightspeed with an LLM that is actively serving inference within Red Hat OpenShift AI. By following along, you will gain a clear understanding of how to connect these tools and use a locally served model for enhanced assistance within your OpenShift environment.

    Prerequisites

    Before you begin, ensure you have the following prerequisites in place:

    • An OpenShift cluster (4.17 or later)
    • The OpenShift AI operator (2.19 or later)
    • S3-compatible object storage (for this blog, I used Red Hat OpenShift Data Foundation)
    • The AWS command-line interface (CLI)
    • 1 GPU node (I used a g6.4xlarge instance type powered by an NVIDIA L4 GPU)

    Model upload

    To store the model and make it accessible for serving in OpenShift AI, you first need an S3-compatible object storage back end. In OpenShift environments, OpenShift Data Foundation provides a seamless way to provision S3-compatible object storage. Using the OpenShift Data Foundation console, you can create an ObjectBucketClaim to automatically provision a backing bucket and generate access credentials (including AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). See Figure 1.

    OpenShift Data Foundation console showing a form to create an ObjectBucketClaim with fields prefilled for llm-bucket, StorageClass, and BucketClass.
    Figure 1: Configuring an ObjectBucketClaim for storage provisioning.
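
    If you prefer to create the claim declaratively, a minimal ObjectBucketClaim manifest looks roughly like the following; the name and storage class are assumptions based on a default OpenShift Data Foundation (NooBaa) installation:

    apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: llm-bucket
      namespace: openshift-storage   # any namespace works; credentials are generated alongside the claim
    spec:
      generateBucketName: llm-bucket   # ODF appends a random suffix to produce the final bucket name
      storageClassName: openshift-storage.noobaa.io

    Once the claim is bound, OpenShift Data Foundation creates a Secret and a ConfigMap with the same name as the claim, holding the access credentials and bucket details.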

    Once your S3 bucket is ready, you can clone a pre-trained model from Hugging Face. Here I used the Qwen3-8B model:

    git clone https://huggingface.co/Qwen/Qwen3-8B

    Next, use the AWS CLI to upload the model to the S3 bucket. Configure an AWS profile with the credentials provided by the ObjectBucketClaim, and point --endpoint-url to the S3 route URL, which you can retrieve from the openshift-storage namespace. The general form of the upload command is:

    aws s3 sync --profile <profile-name> --endpoint-url <s3-endpoint-url> ./<local-model-directory> s3://<bucket-name>/<model-path>/

    This command recursively uploads the model files from your local directory to the specified path inside the S3 bucket (see Figure 2). Once uploaded, the model is ready to be referenced by your OpenShift AI ServingRuntime or pipeline workflow.

    The object storage bucket displaying the uploaded Qwen3-8B model's components, including safetensors and configuration files.
    Figure 2: The uploaded Qwen3-8B model files are available for use.
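
    As a concrete sketch, assuming the claim from Figure 1 is named llm-bucket and was created in the openshift-storage namespace (and using the bucket name shown later in this post), the profile setup and upload might look like this:

    # Read the generated credentials from the Secret created by the ObjectBucketClaim
    KEY_ID=$(oc get secret llm-bucket -n openshift-storage -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
    SECRET_KEY=$(oc get secret llm-bucket -n openshift-storage -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)

    # Store them in a named AWS CLI profile
    aws configure set aws_access_key_id "$KEY_ID" --profile llm-upload
    aws configure set aws_secret_access_key "$SECRET_KEY" --profile llm-upload

    # The external S3 route is exposed in the openshift-storage namespace
    S3_URL=https://$(oc get route s3 -n openshift-storage -o jsonpath='{.spec.host}')

    # Upload the cloned model (add --no-verify-ssl if the route uses a self-signed certificate)
    aws s3 sync ./Qwen3-8B s3://llm-bucket-17e4f051-fb08-4605-a8b5-255d05c89721/Qwen3-8B/ \
      --profile llm-upload --endpoint-url "$S3_URL"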

    Model deployment

    Once your model is uploaded to an S3 bucket, you can reference it within OpenShift AI to serve it. This involves two key steps: creating a connection to the S3 bucket, then configuring the model deployment.

    1. Create a connection to your S3 bucket

    Connections link your OpenShift AI project to external data sources like S3. This allows your workbenches and models to access data without needing to embed credentials directly into your code.

    To create a connection:

    1. From the OpenShift AI dashboard, navigate to Data Science Projects.
    2. Select your project and go to the Connections tab.
    3. Click Add connection.
    4. Choose S3 compatible object storage as the connection type.
    5. Complete the form with your details. For this example, you would use:
      • Connection Name: llm-bucket-connection
      • Access Key and Secret Key: Use your S3 credentials
      • Endpoint: http://s3.openshift-storage.svc
      • Region: us-east-2
      • Bucket: llm-bucket-17e4f051-fb08-4605-a8b5-255d05c89721
    6. Click Add connection.

    Once created, the connection will be available for use in your workbenches and when configuring model servers.
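
    Under the hood, OpenShift AI stores the connection as a Kubernetes Secret in the project namespace. A rough equivalent is sketched below; the exact labels and annotations vary slightly between OpenShift AI versions, and the llms namespace is assumed from the internal URL used later in this post:

    apiVersion: v1
    kind: Secret
    metadata:
      name: llm-bucket-connection
      namespace: llms
      labels:
        opendatahub.io/dashboard: "true"
      annotations:
        opendatahub.io/connection-type: s3
        openshift.io/display-name: llm-bucket-connection
    type: Opaque
    stringData:
      AWS_ACCESS_KEY_ID: <access-key>
      AWS_SECRET_ACCESS_KEY: <secret-key>
      AWS_S3_ENDPOINT: http://s3.openshift-storage.svc
      AWS_DEFAULT_REGION: us-east-2
      AWS_S3_BUCKET: llm-bucket-17e4f051-fb08-4605-a8b5-255d05c89721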

    2. Configure and deploy the model

    The recommended deployment method for large models is the single-model serving platform (Figure 3), where each model gets its own dedicated server.

    The option to select the recommended single-model serving platform method. The user interface instructs the user to choose this option when deploying a large model such as a large language model (LLM).
    Figure 3: Choosing the single-model serving platform deployment option.

    To deploy your model:

    1. Go to Models → Model Deployments.
    2. Click Deploy Model.
    3. Fill out the form with the following details:
      • Model deployment name: Qwen3-8B
      • Runtime: vLLM NVIDIA GPU Serving Runtime
      • Deployment mode: Standard (this uses Kubernetes resources with fewer dependencies)
      • Number of replicas: 1
      • Model server size (see Figure 4): Custom
    Part of the OpenShift AI deployment form with selections in place. Model server size is set to Custom. 3 cores are requested for the CPUs, with a CPU limit of 4 cores. The requested memory is set to 8 GiB and memory limit is set to 14 GiB.
    Figure 4: Configuring the model deployment in OpenShift AI.

    The Model server size values for CPU and memory (requests and limits) might need to be adjusted to fit your hardware. Continue filling in the form:

    • Accelerator: nvidia-gpu-node
    • Model route: Select Make deployed models available through an external route and Require Token Authentication.
    • Source model location:
      • Existing connection: llm-bucket-connection
      • Path: Qwen3-8B/
    • Additional server runtime arguments:
      • --max-model-len=6144: Caps the context length (prompt plus generated tokens) that the model can process in a single request.
      • --enable-auto-tool-choice: Enables the model to automatically decide which tool to use, if any, based on the user's request.
      • --tool-call-parser=granite: Specifies the parser for interpreting the model's output when it calls a tool.

    Once you've populated all the details, click Deploy. After a few minutes, you'll see a green checkmark under the Status column, indicating a successful deployment (Figure 5).

    The granite-3.3-8b-instruct model shown in the Model deployments list with a green checkmark in the Status column.
    Figure 5: The model has been deployed.
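
    Behind the scenes, the single-model serving platform creates a KServe InferenceService, so you can also confirm the deployment from the CLI. The commands below assume the data science project (namespace) is named llms, as suggested by the internal URL used later:

    # The InferenceService name is derived from the model deployment name (qwen3-8b here)
    oc get inferenceservice -n llms
    oc get pods -n llms | grep qwen3-8b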

    Once your model is deployed, you'll need to retrieve the necessary details to interact with it:

    1. Navigate to your deployed model: Data science projects → LLMs → Models
    2. Select Internal and external endpoints details. This displays both the internal and external URLs for your model (Figure 6).
    Internal and External URLs shown on the Inference endpoints details page in Red Hat OpenShift AI.
    Figure 6: Viewing the model's internal and external URLs.

    You'll use the internal URL to allow OpenShift Lightspeed to interact with the deployed model.

    To get the token secret, click the drop-down menu next to the model's name. This secret is required for authentication and is used along with the internal URL to make requests to the model.
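
    Optionally, you can sanity-check the endpoint yourself. The model is served by vLLM, which exposes an OpenAI-compatible API; the secret name and external URL below are placeholders for the values shown in your console:

    # Placeholder secret name; use the token secret shown under the model's drop-down menu
    TOKEN=$(oc get secret <token-secret-name> -n llms -o jsonpath='{.data.token}' | base64 -d)

    # List the served models (use the internal URL instead if calling from inside the cluster)
    curl -sk https://<external-url>/v1/models -H "Authorization: Bearer $TOKEN"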

    OpenShift Lightspeed

    Create a secret in the openshift-lightspeed namespace. This secret stores the API token needed for OpenShift Lightspeed to authenticate with and access the model.

    oc create secret generic credentials \
      --namespace openshift-lightspeed \
      --from-literal=apitoken='<API_TOKEN>' \
      --type=Opaque

    Create the OLSConfig custom resource, which the OpenShift Lightspeed operator uses to configure the assistant's behavior, including which LLM to use.

    apiVersion: ols.openshift.io/v1alpha1
    kind: OLSConfig
    metadata:
      name: cluster
    spec:
      llm:
        providers:
          - name: red_hat_openshift_ai
            type: rhoai_vllm
            url: https://qwen3-8b-predictor.llms.svc.cluster.local:8443/v1
            credentialsSecretRef:
              name: credentials
            models:
              - name: qwen3-8b
      ols:
        defaultProvider: red_hat_openshift_ai
        defaultModel: qwen3-8b
        logLevel: INFO
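
    Save the manifest to a file (the name olsconfig.yaml below is arbitrary) and apply it:

    oc apply -f olsconfig.yaml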

    Once the OLSConfig object is applied, you can start interacting with the OpenShift Lightspeed chatbot located in the OpenShift console, as shown in Figure 7.

    An interaction between a user and the Red Hat OpenShift Lightspeed chatbot. The user asked "what is openshift" and OpenShift Lightspeed responded with a paragraph summary.
    Figure 7: Prompting the OpenShift Lightspeed chatbot from the OpenShift console.

    Enable cluster interaction for OpenShift Lightspeed

    OpenShift Lightspeed uses an LLM to answer user questions. You can significantly improve the accuracy of these responses by enabling cluster interaction, a feature that enriches the model's knowledge with real-time information from your cluster.

    When you enable this feature, it installs a Model Context Protocol (MCP) server. MCP standardizes how applications provide context to an LLM; in this case, the server grants the OpenShift Lightspeed service read access to your cluster's API, allowing the LLM to gather the context it needs to provide specific, relevant responses tailored to your environment.

    Choosing the right model

    The effectiveness of MCP tool calling is highly dependent on the LLM itself. An LLM's ability to effectively select and use a tool is a key factor in its performance. Generally, larger models with more parameters are better at this. If you use a smaller model, you may notice degraded performance in tool selection or other aspects of cluster interaction, as these models may struggle to correctly select the appropriate tool and interpret the information provided by the MCP server.

    The Berkeley Function-Calling Leaderboard's agentic evaluation highlights that the Qwen3-8B model delivers good tool-calling performance relative to its size. This makes it an ideal choice for showcasing the cluster interaction feature, especially in environments with limited GPU resources.

    Note

    Cluster interaction is a Technology Preview feature as of this writing.

    How to enable cluster interaction

    To enable cluster interaction, you need to set the introspectionEnabled parameter to true. You can do this by running the following command in your terminal:

    oc patch olsconfig cluster --type=json -p='[{"op": "add", "path": "/spec/ols/introspectionEnabled", "value": true}]' -n openshift-lightspeed

    After running this command, the OpenShift Lightspeed pods will restart. Once they are in a Running state, you can begin using this feature to get real-time details about your cluster, as shown in Figures 8 and 9.
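
    To confirm the restart completed, you can watch the pods in the openshift-lightspeed namespace:

    oc get pods -n openshift-lightspeed -w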

    OpenShift Lightspeed provides an overview of the nodes in a user's cluster, including a list of key details such as "All nodes are in a Ready state." It then asks if the user would like details about specific nodes.
    Figure 8: Using the cluster interaction feature to get details about cluster nodes and their status.
    Tool output shows details for nodes in an OpenShift cluster, with columns for APIVERSION, KIND, NAME, STATUS, ROLES, AGE, VERSION, INTERNAL-IP, and EXTERNAL-IP.
    Figure 9: Viewing the tool output with node details.

    Conclusion

    This blog showcased a practical example of how to leverage generative AI in a way that is tailored and cost-effective for organizations. By connecting Red Hat OpenShift Lightspeed with a locally served large language model in an OpenShift environment, you can create an intelligent assistant that streamlines your work and addresses key enterprise concerns around data privacy and operational costs. This approach gives you greater control over model performance and customization while keeping everything on your organization's private infrastructure.
