Deploy ChatQnA on OpenShift with AMD Instinct

July 23, 2025
Sandip Gahlot, Jeremy Ary
Related topics: Artificial intelligence, Summit 2025
Related products: Red Hat AI, Red Hat OpenShift AI, Red Hat OpenShift, Red Hat OpenShift Container Platform


    The goal of the Open Platform for Enterprise AI (OPEA) is to provide an open ecosystem for enterprise-level generative AI (gen AI) solutions, with a focus on retrieval-augmented generation (RAG). Red Hat OpenShift AI provides an open ecosystem of software and hardware for model serving and hardware acceleration, and it manages the lifecycle of gen AI models. Red Hat OpenShift Container Platform handles building (if needed), deploying, and scaling the various components of an application.

    OPEA's platform provides various microservices that are building blocks for gen AI systems, such as:

    • Dataprep
    • Retrieval
    • Embedding
    • Reranking
    • LLM Text Generation Inference (TGI)

    OPEA also has various megaservices that combine these microservices to showcase end-to-end capabilities (e.g., a chatbot, text-to-image generation, or a code generator). AMD Instinct GPU acceleration and the ROCm software stack provide the optimization needed for the high-performance computing typical of gen AI workloads.

    This article guides you through deploying OPEA's ChatQnA megaservice on Red Hat OpenShift, along with the various microservices the megaservice requires.

    Deployment process

    The deployment process makes use of a validated pattern to install and set up various resources in an enterprise-ready, GitOps-capable manner. Figure 1 shows the overall architecture used in this article.

    Figure 1: Architectural diagram of the ChatQnA app on Red Hat OpenShift with an AMD GPU.

    Prerequisites

    The following prerequisites must be satisfied before you can successfully deploy and use the ChatQnA application:

    • Install and set up Red Hat OpenShift Container Platform.

    • Install OpenShift client.

    • Install Podman.

    • Install Git.

    • Set up a Hugging Face account and acquire necessary model permissions by following these steps:

      1. Create an account at Hugging Face or log in if you already have an account.

      2. Create a read access token.

      3. Share your contact information to access the meta-llama/Llama-3.1-8B-Instruct model. (You can confirm access with the sketch after this list.)

    • Set up MinIO or AWS S3:

      Before serving the model in OpenShift AI, you must upload it to either MinIO or Amazon Web Services (AWS) S3. This guide does not cover setting up MinIO or AWS S3, so make sure one of them is already available.

    • Install Red Hat OpenShift AI by following these instructions if it is not already installed in the OpenShift Container Platform cluster. Then follow these steps to set up the project used to deploy the model for this article:

      1. Open OpenShift AI by selecting it from the OpenShift application launcher. OpenShift AI opens in a new tab.

      2. In the OpenShift AI window, select Data Science projects in the sidebar and click the Create project button. 

      3. Name the project chatqna-llm.
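
    If you want to confirm that your Hugging Face read token has been granted access to the gated Llama model before moving on, a minimal check from any Python environment looks like the following. This snippet is not part of the validated pattern; it only assumes the huggingface_hub package is installed:

      from huggingface_hub import HfApi

      HF_TOKEN = "hf_..."  # your read access token

      # The call succeeds only if the token is valid and your request for the
      # gated model has been approved.
      api = HfApi(token=HF_TOKEN)
      info = api.model_info("meta-llama/Llama-3.1-8B-Instruct")
      print(info.id)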

    Create a connection

    Follow these steps to create a connection that the init container will use to fetch the model (uploaded in a later step) when deploying the model for inferencing:

    1. Click the Create connection button in the Connections tab in your newly created project.
    2. Select S3 compatible object storage - v1 in the Connection type dropdown, as shown in Figure 2.
    Figure 2: Select S3 compatible object storage - v1 in the Connection type dropdown menu.
    3. Use the following values for this data connection, as shown in Figure 3:
      • Connection name: model-store
      • Connection description: Connection that points to the model store (provide any relevant description here).
      • Access key: The MinIO username if using MinIO; otherwise, your AWS access key.
      • Secret key: The MinIO password if using MinIO; otherwise, your AWS secret key.
      • Endpoint: The minio-api route location from the OpenShift Container Platform cluster if using MinIO; otherwise, the AWS S3 endpoint in the format https://s3.<REGION>.amazonaws.com.
      • Region: us-east-1 if using MinIO; otherwise, the correct AWS region.
      • Bucket: models
    Figure 3: Enter the values for the data connection.

    If the bucket does not exist, the Jupyter notebook will create it when uploading the model. If you are using AWS S3 and the bucket does not exist, make sure the IAM user has permission to create buckets.
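
    If you want to sanity-check the values you entered before continuing, a small script like the one below can confirm that the endpoint, credentials, and bucket line up. This is a hedged sketch, not part of the validated pattern; the endpoint URL and keys are placeholders for whatever you entered in the connection:

      import boto3
      from botocore.exceptions import ClientError

      # Use the same values as the model-store connection.
      s3 = boto3.client(
          "s3",
          endpoint_url="https://<minio-api-route-or-s3-endpoint>",
          aws_access_key_id="<access-key>",
          aws_secret_access_key="<secret-key>",
          region_name="us-east-1",
      )

      try:
          s3.head_bucket(Bucket="models")
          print("Bucket 'models' is reachable")
      except ClientError as err:
          # A missing bucket is fine at this point; the notebook can create it later.
          print("Bucket not reachable yet:", err)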

    Create the workbench

    To upload the model needed for this article, create a workbench first:

    1. In the chatqna-llm data science project, create a new workbench by clicking the Create workbench button in the Workbenches tab.

    2. Enter the following values to create the workbench:

      • Name: chatqna
      • Image selection: ROCm-PyTorch
      • Version selection: 2025.1
      • Container size: Medium
      • Accelerator: AMD
      • Cluster storage: Make sure the storage is at least 50GB.
      • Connection: Click the Attach existing connections button and attach the connection named model-store created in the previous step (Figure 4). The connection values are passed to the workbench when it starts and are used to upload the model.

    Click the Create workbench button. The workbench starts and soon moves to Running status.

    Figure 4: Attach the existing model-store connection.
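
    The attached connection is typically surfaced inside the workbench as AWS_* environment variables, which is how the upload notebook later finds the object store. You can confirm this from a notebook cell or terminal in the workbench; this quick check assumes the standard OpenShift AI behavior of injecting the connection as environment variables:

      import os

      for var in ("AWS_ACCESS_KEY_ID", "AWS_S3_ENDPOINT", "AWS_DEFAULT_REGION", "AWS_S3_BUCKET"):
          print(var, "=", os.environ.get(var, "<not set>"))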

    Upload the model using OpenShift AI

    To serve the model, we first need to download it using the workbench created in the previous step and upload it to either MinIO or AWS S3 using the model-store connection created earlier. Follow the steps in this section to prepare the model for serving.

    Open the workbench

    Open the workbench named chatqna by following these steps:

    1. Once the chatqna workbench is in Running status, open it by clicking its name in the Workbenches tab.

      The workbench opens in a new tab. The first time you open it, you will see an Authorize Access page.

    2. Click the Allow selected permissions button on the Authorize Access page.

    Clone the repo

    Now that the workbench is created and running, follow these steps to set up the project:

    1. In the open workbench, click the Terminal icon in the Launcher tab.
    2. Run the following command in the terminal to clone the repository containing code to upload the model:

      git clone https://github.com/validatedpatterns-sandbox/qna-chat-amd.git

    Run Jupyter notebook

    Use the notebook mentioned in this section to download the meta-llama/Llama-3.1-8B-Instruct model and upload it to either MinIO or AWS S3. Follow these steps to run the notebook:

    1. After cloning the repository, select the folder where you cloned the repository (in the sidebar) and open the scripts/model-setup/upload-model.ipynb Jupyter notebook.
    2. Run the notebook cell by cell. When prompted for the Hugging Face token, enter your read access token and click the Login button, as shown in Figure 5.
    Figure 5: The Hugging Face token prompt.

    Once all the cells in the notebook complete successfully, the Llama model should be uploaded to either MinIO or AWS S3 under the Llama-3.1-8B-Instruct directory in the models bucket.

    By default, this notebook will upload the model to MinIO. To choose AWS S3, modify the last cell in the notebook by changing the value of XFER_LOCATION to AWS as follows:

    XFER_LOCATION = 'MINIO'  # <= current value
    XFER_LOCATION = 'AWS'    # <= modify to "AWS" to upload to AWS S3
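
    Conceptually, the notebook downloads the model from Hugging Face and copies its files into the bucket. The simplified sketch below is for orientation only; the notebook in the repository is the source of truth, and the environment variable names assume the attached model-store connection:

      import os
      import boto3
      from huggingface_hub import snapshot_download

      # Download the gated model locally using your Hugging Face token.
      local_dir = snapshot_download(
          repo_id="meta-llama/Llama-3.1-8B-Instruct",
          token="hf_...",  # your read access token
      )

      # Upload every file to the models bucket under Llama-3.1-8B-Instruct/.
      s3 = boto3.client(
          "s3",
          endpoint_url=os.environ["AWS_S3_ENDPOINT"],
          aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
          aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
      )
      for root, _, files in os.walk(local_dir):
          for name in files:
              path = os.path.join(root, name)
              key = "Llama-3.1-8B-Instruct/" + os.path.relpath(path, local_dir)
              s3.upload_file(path, "models", key)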

    Deploy the model

    Once the notebook has run successfully and the model is uploaded, deploy the model by following these steps:

    1. In the chatqna-llm data science project, select the Models tab, click the Deploy model button, and fill in the fields as follows:

      • Model name: llama-31b
      • Serving runtime: vLLM AMD GPU ServingRuntime for KServe
      • Model framework: vLLM
      • Deployment mode: Advanced
      • Model server size: Small
      • Accelerator: AMD
      • Model route: Enable the Make deployed models available through an external route checkbox.
      • Source Model location: Select the Existing connection option.
        • Name: model-store (the name used when creating the connection in the Create a connection step)
        • Path: Llama-3.1-8B-Instruct (the directory the model was copied to in the previous step)
    2. Click Deploy to deploy the model. 

    3. Once the model is successfully deployed (this takes a few minutes), copy the inference endpoint to use in the ChatQnA application.

    Make sure the model name is set to "llama-31b", as this is the value used by the llm microservice deployment that invokes the inference endpoint.
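
    Before wiring the endpoint into ChatQnA, you can optionally verify that it responds. The vLLM serving runtime exposes an OpenAI-compatible API, so a quick check might look like the following hedged sketch; the endpoint placeholder is whatever you copied from OpenShift AI, and the exact path may vary with your runtime version:

      import requests

      INFERENCE_ENDPOINT = "https://<your-inference-endpoint>"  # copied from OpenShift AI

      resp = requests.post(
          f"{INFERENCE_ENDPOINT}/v1/chat/completions",
          json={
              "model": "llama-31b",
              "messages": [{"role": "user", "content": "Say hello in one sentence."}],
              "max_tokens": 50,
          },
          timeout=120,
      )
      resp.raise_for_status()
      print(resp.json()["choices"][0]["message"]["content"])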

    Deploy the ChatQnA application

    This section provides details on installing the ChatQnA application as well as verifying the deployment and configuration by querying the application.

    Install ChatQnA application

    After meeting all the prerequisites, we can install the ChatQnA application by following these steps in a terminal:

    1. Clone the repository by running the following commands:

      git clone https://github.com/validatedpatterns-sandbox/qna-chat-amd.git
      cd qna-chat-amd
    2. Configure secrets for Hugging Face and the inference endpoint by running the following command:

      cp values-secret.yaml.template ~/values-secret-qna-chat-amd.yaml
    3. Modify the value fields in the ~/values-secret-qna-chat-amd.yaml file as shown here:

      secrets:
        - name: huggingface
          fields:
          - name: token
            value: null  # <- CHANGE THIS TO YOUR HUGGING FACE TOKEN
            vaultPolicy: validatePatternDefaultPolicy
        - name: rhoai_model
          fields:
          - name: inference_endpoint
            value: null  # <- CHANGE THIS TO YOUR MODEL'S INFERENCE ENDPOINT
    4. Deploy the application by running the following command:

      ./pattern.sh make install

      This command will install the application by deploying the ChatQnA megaservice along with the following required microservices:

      • Dataprep
      • LLM text generation
      • Retriever
      • Hugging Face Text Embedding Inference
        • Embedding service
        • Reranker service
      • ChatQnA backend
      • ChatQnA UI

    Building and installing all the required services can take some time to complete. To monitor progress in the Argo CD application dashboard, follow these steps:

    1. Open the Argo CD dashboard in a browser using the URI returned by running the following command:

      echo https://$(oc get route hub-gitops-server -n qna-chat-amd-hub -o jsonpath="{.spec.host}")
    2. Get the password by running the following command:

      echo $(oc get secret hub-gitops-cluster -n qna-chat-amd-hub -o jsonpath="{.data['admin\.password']}" | base64 -d)
    3. Log in to the Argo CD dashboard using the following information:
      • Username: admin
      • Password: password from the previous step
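
    If you prefer to check status from a script instead of the dashboard, the same credentials work against the Argo CD REST API. The sketch below lists each application's sync and health status; the URL and password placeholders come from the two commands above, and verify=False is only there because the route may use a self-signed certificate:

      import requests

      ARGOCD_URL = "https://<hub-gitops-server-route>"
      PASSWORD = "<admin password from the previous step>"

      # Log in to get a session token.
      session = requests.post(
          f"{ARGOCD_URL}/api/v1/session",
          json={"username": "admin", "password": PASSWORD},
          verify=False,
      )
      token = session.json()["token"]

      # List applications with their sync and health status.
      apps = requests.get(
          f"{ARGOCD_URL}/api/v1/applications",
          headers={"Authorization": f"Bearer {token}"},
          verify=False,
      )
      for app in apps.json().get("items", []):
          status = app["status"]
          print(app["metadata"]["name"], status["sync"]["status"], status["health"]["status"])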

    Verify the ChatQnA application

    Once the application is deployed and running successfully, we can connect to the UI and query the application by following these steps:

    1. Run the following command to get the ChatQnA UI URI:

      echo https://$(oc get route chatqna-ui-secure -n amd-llm -o jsonpath="{.spec.host}")
    2. Open the ChatQnA UI in a browser by using the URI returned from this command.

    Query ChatQnA without RAG

    Type the following query in the prompt: "What is the revenue of Nike inc in 2023?"

    Since we have not yet provided the application with an external knowledge base for this query, it does not return the correct answer and instead returns the generic response shown in Figure 6.

    Figure 6: The ChatQnA UI response without RAG.
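
    You can also send the same question to the ChatQnA backend directly instead of through the UI. In OPEA's upstream ChatQnA example, the megaservice exposes a /v1/chatqna endpoint; the route name below is an assumption, so list the routes in the amd-llm namespace to find the actual host:

      import requests

      BACKEND_URL = "https://<chatqna-backend-route>"  # hypothetical route; check `oc get route -n amd-llm`

      resp = requests.post(
          f"{BACKEND_URL}/v1/chatqna",
          json={"messages": "What is the revenue of Nike inc in 2023?"},
          timeout=300,
      )
      print(resp.text)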

    Query ChatQnA in RAG mode

    In the ChatQnA UI, follow these steps to add an external knowledge base (a Nike PDF) to perform the above query using RAG:

    1. Click the upload icon (cloud with an arrow) in the top right corner.
    2. Click the Choose File button and select nke-10k-2023.pdf from the scripts directory (Figure 7). 

      When you select the PDF and close the dialog box, the upload will start automatically.

    Figure 7: The ChatQnA UI upload file dialog box.
    3. Allow a few minutes for the file to be ingested, processed, and uploaded to the Redis vector database.
    4. Refresh the page after a few minutes to verify the file has been uploaded.
    5. Type the following query at the prompt: "What is the revenue of Nike inc in 2023?"

    The response for this query now makes use of the Nike knowledge base added in the previous step, as shown in Figure 8.

    Figure 8: The ChatQnA UI RAG response.

    ChatQnA: Remove external knowledge base

    Follow the steps in this section to remove the external knowledge base that was added to the app:

    1. Click the upload icon in the top right corner.
    2. Hover over the file in the Data Source section and click the trashcan icon that appears in the top right corner of the file icon, as shown in Figure 9.
    3. Select Yes, I'm sure when prompted in the Confirm file deletion? dialog box.
    Figure 9: ChatQnA UI delete knowledge base.

    Query ChatQnA: General questions

    When no knowledge base is added to the application, you can also ask it general questions (Figure 10). For example, you could ask:

    • Tell me more about Red Hat.
    • What services does Red Hat provide?
    • What is deep learning?
    • What is a neural network?
    Figure 10: ChatQnA UI response for general questions.

    Wrap up

    In this article, we deployed the Open Platform for Enterprise AI's ChatQnA megaservice on Red Hat OpenShift Container Platform using Red Hat OpenShift AI and AMD hardware acceleration. The ChatQnA application uses OPEA's microservices to return RAG responses from an external knowledge base (in this case, the Nike PDF) and falls back to the Llama LLM when no external knowledge base is present.

    Installing and setting up the application was made easy by a validated pattern, which uses Argo CD for continuous integration/continuous delivery (CI/CD) to deploy the application's components and keep them in sync with the Git repository whenever the configuration changes.

    Learn more about the various technologies used in this article:

    • Config repository
    • Red Hat OpenShift AI
    • Red Hat OpenShift Container Platform
    • Validated patterns
    • Open Platform for Enterprise AI (OPEA)
      • Various microservices
      • ChatQnA application
    • Hugging Face Llama-3.1-8B-Instruct model
    • AMD Instinct
