Run a WebUI chat application against three GPU-hosted models

Get started with consuming GPU-hosted large language models on Developer Sandbox

Learn the many ways you can interact with GPU-hosted large language models (LLMs) on Developer Sandbox, including connecting the model endpoints, interacting with the API endpoints using the hosted Red Hat OpenShift AI component, and standing up a web-based chat user interface.

Access the Developer Sandbox

Now that you have interacted with the models, it's time to deploy a fully functional chatbot that talks to all the models at once.

Prerequisites:

  • Log in to Developer Sandbox.

In this lesson, you will:

  • Deploy a Helm chart.  
  • Run a WebUI chatbot application.
  • Connect to three GPU-hosted models.
  • Interact with a web application to send a prompt to all the models simultaneously.

Deploy a Helm chart

In this step, we will deploy a Helm chart to the Developer Sandbox. By default, the terminal provided within the Red Hat OpenShift user interface (UI) does not have Helm preinstalled.

Log back into the OpenShift user interface, open the Terminal (using the >_ icon in the top right), and enter the following:

curl -Lo /wto/bin/helm https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64

chmod +x /wto/bin/helm

helm

The first two commands download the Helm binary and make it executable; the last prints Helm's help message, confirming the installation.

Once you have Helm running, we will clone a repository from the Red Hat AI business unit that contains a quickstart Helm chart for running the WebUI application. We will then need to configure its values file.

In the terminal, type the following:

cd
git clone https://github.com/rh-aiservices-bu/vllm-quickstart
cd vllm-quickstart/charts/openwebui

This subdirectory contains the Helm chart and the values.yaml it uses. We will be editing the values.yaml to add the three inference servers and their models to the chatbot, as well as load the appropriate certificate authority (CA) from the system itself. 

Using an editor such as the user-friendly nano, or vi if you're old-school, edit the values.yaml file. Scroll down to where it reads:

configuration:
  env:
    ENABLE_SIGNUP: "false"
    WEBUI_AUTH: "false"
  extraArgs: []
  vllmEndpoints:
    - endpoint: http://vllm.example.svc:8000/v1
      token: ""
  data:

We will replace the vllmEndpoints: entry with the following, but first you need your OpenShift token.

Open an additional terminal tab by clicking on the + in the terminal portion of the interface. Then type the following command:

oc whoami -t

Copy the output token, then remove the default endpoint from the values.yaml file and replace it with the following. Remember to replace INSERT OC WHOAMI -t with your token from above.

  vllmEndpoints:
    - endpoint: https://isvc-granite-31-8b-fp8-predictor.sandbox-shared-models.svc.cluster.local:8443/v1
      token: "INSERT OC WHOAMI -t"
    - endpoint: https://isvc-qwen3-8b-fp8-predictor.sandbox-shared-models.svc.cluster.local:8443/v1
      token: "INSERT OC WHOAMI -t"
    - endpoint: https://isvc-nemotron-nano-9b-v2-fp8-predictor.sandbox-shared-models.svc.cluster.local:8443/v1
      token: "INSERT OC WHOAMI -t"

Note that each endpoint: and its URL belong on a single line; if they appear wrapped here, that is only the rendering of the text box.
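If you would rather not paste the token by hand three times, you can substitute it with sed. This is an optional sketch, not part of the official steps: replace_token is a hypothetical helper name, and it assumes the literal placeholder INSERT OC WHOAMI -t appears verbatim in values.yaml.

```shell
# Optional sketch: substitute the token placeholder in values.yaml.
# replace_token is a hypothetical helper; it assumes the literal
# placeholder "INSERT OC WHOAMI -t" is present in the file.
replace_token() {
  local token="$1" file="$2"
  sed -i "s|INSERT OC WHOAMI -t|${token}|g" "$file"
}

# Usage (in the Sandbox terminal, after logging in with oc):
#   replace_token "$(oc whoami -t)" values.yaml
```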

Before you save this file in the terminal, scroll down to the injectServiceCA line. Set it to true:

injectServiceCA: true

Save the file. Now you can instantiate the Helm chart with the following:

helm upgrade --install --namespace [YOUR_USER_HERE]-dev openwebui ./ -f values.yaml
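If you are unsure what to put for [YOUR_USER_HERE], Sandbox namespaces follow the pattern username-dev. As a small illustrative sketch (dev_namespace is my own helper name, not part of the chart), you can derive the namespace from oc whoami:

```shell
# Hypothetical helper: derive your Sandbox dev namespace, which follows
# the <username>-dev pattern used in the helm command above.
dev_namespace() {
  printf '%s-dev\n' "$1"
}

# Usage:
#   helm upgrade --install --namespace "$(dev_namespace "$(oc whoami)")" \
#     openwebui ./ -f values.yaml
```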

In the OpenShift UI navigation, click on Helm/Releases. It should look similar to Figure 1:

Figure 1: A correctly installed Helm release.

Now switch to the Topology view (Workloads > Topology). You will see a running instance of the WebUI chatbot. Click on the route icon shown in Figure 2.

Figure 2: The WebUI route icon.

When you click on the icon, a new tab opens and asks you to log in using OpenShift (trusted via the service CA we injected earlier). Once you click through the login, the WebUI should render.

Dismiss the welcome message. Note that, by default, the Granite model is selected.

Click on the plus button next to it to add a model. Choose the qwen3 one. 

Then click on the plus button again and select the nemotron model.

All three models should be selected as shown in Figure 3.

Figure 3: Add additional models to the chat application.

In the prompt box, type anything you want. As a suggestion, try this:

in 20 words tell me about the roman empire

The system will then query all three GPU-hosted models and return information (Figure 4).

Figure 4: Query all models at the same time.
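Under the hood, the WebUI sends your prompt to each endpoint's OpenAI-compatible /v1/chat/completions API, which vLLM exposes. As a hedged sketch, you can reproduce a single query from the terminal with curl; build_chat_payload is a hypothetical helper of mine, and the model name in the usage comment is an assumption (list the real names via the endpoint's /v1/models path).

```shell
# Sketch: build a minimal OpenAI-compatible chat-completions payload.
# build_chat_payload is a hypothetical helper; the JSON shape follows
# the OpenAI-compatible API that vLLM serves.
build_chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Usage against one of the endpoints from values.yaml (requires an
# active oc login; -k skips TLS verification, or pass the service CA
# with --cacert instead; the model name here is an assumption):
#   curl -sk https://isvc-granite-31-8b-fp8-predictor.sandbox-shared-models.svc.cluster.local:8443/v1/chat/completions \
#     -H "Authorization: Bearer $(oc whoami -t)" \
#     -H "Content-Type: application/json" \
#     -d "$(build_chat_payload granite-31-8b-fp8 'in 20 words tell me about the roman empire')"
```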

Next steps

And there you go! You have successfully connected to and queried the three GPU-hosted models provided by the Red Hat Developer Sandbox.

Ready to learn more about AI?

You now have the basic mechanisms to directly query and consume the hosted LLMs, and you can write test applications on the Red Hat Developer Sandbox that use them.

For more information on writing applications and developing on the Sandbox, read the free eBook, The Grumpy Developer’s Guide to OpenShift.
