Earlier this year, Red Hat announced plans to integrate support for NVIDIA NIM microservices on Red Hat OpenShift AI to help streamline inferencing for dozens of AI/ML models on a consistent, flexible hybrid cloud platform. NVIDIA NIM, part of the NVIDIA AI Enterprise software platform, is a set of easy-to-use inference microservices for accelerating the deployment of foundation models and keeping your data secure.
Combined with Red Hat OpenShift AI, NVIDIA NIM can help developers speed up the development and deployment of AI and generative AI (gen AI) applications. Now, NVIDIA NIM is available in technology preview on Red Hat OpenShift AI, so users can manually enable NVIDIA NIM and NVIDIA GPUs as a model serving accelerator on premises or in the public cloud where OpenShift is running.
In this how-to article, we will demonstrate how to integrate NVIDIA NIM with Red Hat OpenShift AI to create and deliver AI-enabled applications at scale.
Enable NVIDIA NIM
Note
NVIDIA NIM is available as a technology preview feature in OpenShift AI 2.14. To enable it, set disableNIMModelServing to false in your OdhDashboardConfig instance. See the configuration documentation for details.
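If you prefer to apply the change programmatically rather than editing the resource by hand, the following is a minimal sketch using the kubernetes Python client. The API group and version (opendatahub.io/v1alpha), the namespace (redhat-ods-applications), and the object name (odh-dashboard-config) are assumptions based on a default OpenShift AI installation, so adjust them for your cluster.

from kubernetes import client, config

# Assumes you are logged in with sufficient privileges (e.g., cluster-admin)
config.load_kube_config()

# Patch the OdhDashboardConfig custom resource to enable the NIM tile;
# group/version, namespace, and name are assumptions for a default install
client.CustomObjectsApi().patch_namespaced_custom_object(
    group="opendatahub.io",
    version="v1alpha",
    namespace="redhat-ods-applications",
    plural="odhdashboardconfigs",
    name="odh-dashboard-config",
    body={"spec": {"dashboardConfig": {"disableNIMModelServing": False}}},
)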
First, go to the NVIDIA NGC catalog to generate an API key. From the top right profile menu, select the setup option and click to generate your API key, as shown in Figure 1.
In your Red Hat OpenShift AI dashboard, select the explore option from the left navigation bar and locate the NVIDIA NIM tile. See Figure 2.
Next, click the NVIDIA NIM tile, click Enable, and enter the API key you generated from the NVIDIA NGC catalog (Figure 1) to enable NVIDIA NIM. See Figure 3.
Verify the enablement by selecting the Enabled option from the left navigation bar, as marked in Figure 4. The NVIDIA NIM card should now appear as one of your apps.
Create and deploy the model
Create a Data Science Project. Data science projects allow you to collect your work—including Jupyter workbenches, storage, data connections, models, and servers—into a single project.
From the left navigation bar, select Data Science Projects and click to create a project. Enter a project name and description, then click Create, as shown in Figure 5.
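Under the hood, a data science project is an OpenShift project whose namespace carries a dashboard label. If you prefer automation, here is a hedged sketch using the kubernetes Python client; the opendatahub.io/dashboard label and the nim-demo project name are illustrative assumptions, so verify them against your installation.

from kubernetes import client, config

config.load_kube_config()

# Create a namespace labeled so the OpenShift AI dashboard treats it as a
# data science project; the label and names here are illustrative assumptions
ns = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="nim-demo",  # hypothetical project name
        labels={"opendatahub.io/dashboard": "true"},
        annotations={"openshift.io/display-name": "NIM demo"},
    )
)
client.CoreV1Api().create_namespace(ns)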
Once the project is created, you can use the Overview tab to specify NIM as a model serving microservice (this can also be accomplished from the Models tab). Click the NVIDIA NIM model serving platform card, as shown in Figure 6.
Select the target NIM model from the drop-down list and describe your inference service. In the following example, we’ve selected Llama-3-8B-Instruct (served as meta/llama3-8b-instruct). You will also have the option to enter the number of server replicas to deploy, the model size (small, medium, etc.), and the type and number of NVIDIA accelerator cards that you have access to in your cluster. See Figure 7.
Verify that the model appears as a deployed model within this Data Science Projects screen, as shown in Figure 8. Note that it might take a couple of minutes to appear.
Take note of the internal service URL link within the NIM deployed model card, highlighted in Figure 9; you will need this in the next section.
Configure and create a workbench
Now that the model is deployed and we have our internal link, let’s create a workbench. A workbench is an instance of your development environment. In it, you can select a notebook image, such as Jupyter, for your data science work.
From the same Data Science Projects Overview tab (or the Workbenches tab), click Create a workbench, as shown in Figure 10.
Complete the fields shown in Figure 11 to describe your workbench, including the notebook image you want to use, cluster deployment size, accelerator, and other fields, then click Create workbench.
Wait for the workbench to be in a running state. When the status displays Running, click Open (see Figure 12).
From the opened workbench, clone a repo, as shown in Figure 13. Use the repo https://github.com/RHEcosystemAppEng/notebooks.
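Alternatively, you can open a terminal tab in the workbench and run git clone https://github.com/RHEcosystemAppEng/notebooks.git to clone the same repository from the command line.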
Verify model access
Open the example notebook (Figure 14): notebooks/openshift_ai/nvidia_nim_access_from_workbench.ipynb
Follow the instructions to verify access to the running NIM model, and have fun!
Example notebook walkthrough
Use the internal service URL you noted in Figure 9 for the nim_url parameter.
import json
import requests
import urllib3

# Suppress the InsecureRequestWarning raised by the verify=False requests below
urllib3.disable_warnings()

# Internal service URL of the deployed NIM model (noted in Figure 9)
nim_url = "https://nim-deploy.nim-openshift-ai.svc.cluster.local"
Fetch a list of the available models and note the model deployed in Figure 7, meta/llama3-8b-instruct.
# List the models served by the NIM endpoint (OpenAI-compatible API)
response = requests.get(nim_url + "/v1/models", verify=False)
print(json.dumps(response.json()['data'], indent=2))
Example response:
[
  {
    "id": "meta/llama3-8b-instruct",
    "object": "model",
    "created": 1729805092,
    "owned_by": "system",
    "root": "meta/llama3-8b-instruct",
    "parent": null,
    "permission": [
      {
        "id": "modelperm-937bf9a2ec6d475da69a8f04752b32b7",
        "object": "model_permission",
        "created": 1729805092,
        "allow_create_engine": false,
        "allow_sampling": true,
        "allow_logprobs": true,
        "allow_search_indices": false,
        "allow_view": true,
        "allow_fine_tuning": false,
        "organization": "*",
        "group": null,
        "is_blocking": false
      }
    ]
  }
]
Send a chat request to the target model, again using the model deployed in Figure 7, meta/llama3-8b-instruct.
# Build an OpenAI-style chat completion request for the deployed model
headers = {
    "Content-Type": "application/json"
}
payload = {
    "model": "meta/llama3-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is Red Hat OpenShift AI?"
        },
        {
            "role": "user",
            "content": "What is NVIDIA NIM?"
        }
    ],
    "temperature": 0.5,
    "top_p": 1,
    "max_tokens": 1024,
    "stream": False
}

# Send the request and print the model's reply
response = requests.post(nim_url + "/v1/chat/completions", verify=False, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
Example response:
I think I can help you with that!
NVIDIA NIM is a cloud-based platform that enables data scientists and developers to easily deploy, manage, and scale AI and machine learning (ML) workloads across multiple cloud environments. NIM provides a unified platform for building, testing, and deploying AI and ML models, and it integrates with various NVIDIA technologies, such as NVIDIA GPU acceleration, NVIDIA Tensor Core processors, and NVIDIA DGX-1 systems.
NIM offers a range of features, including:
1. Model development and testing: NIM provides a cloud-based environment for data scientists to develop, test, and fine-tune their AI and ML models using popular frameworks like TensorFlow, PyTorch, and Caffe.
2. Model deployment: NIM allows developers to deploy their trained models to various cloud environments, such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and on-premises data centers.
3. Model management: NIM provides a centralized platform for managing and monitoring AI and ML models, including version control, model tracking, and performance monitoring.
4. Collaboration: NIM enables collaboration among data scientists, developers, and other stakeholders through features like model sharing, version control, and real-time feedback.
By using NIM, organizations can accelerate their AI and ML projects, improve collaboration, and reduce the complexity of deploying and managing AI workloads across multiple environments.
Now, regarding Red Hat OpenShift AI, it's a cloud-native platform that enables organizations to deploy and manage AI and ML workloads on-premises or in the cloud. OpenShift AI is built on top of Red Hat OpenShift, a popular container orchestration platform, and provides a scalable and secure environment for building, deploying, and managing AI and ML applications.
OpenShift AI integrates with various AI and ML frameworks, such as TensorFlow, PyTorch, and scikit-learn, and provides features like:
1. Containerized AI and ML workloads: OpenShift AI enables data scientists and developers to package their AI and ML workloads into containers, making it easier to deploy, manage, and scale them.
2. Scalable infrastructure: OpenShift AI provides a scalable infrastructure that can handle large-scale AI and ML workloads, with support for NVIDIA GPU acceleration and other specialized hardware.
3. Security and compliance: OpenShift AI provides a secure environment for AI and ML workloads, with features like encryption, access controls, and compliance with industry regulations.
4. Collaboration: OpenShift AI enables collaboration among data scientists, developers, and other stakeholders through features like version control, model tracking, and real-time feedback.
By using OpenShift AI, organizations can accelerate their AI and ML projects, improve collaboration, and reduce the complexity of deploying and managing AI workloads.
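In the request above, "stream": False returned the full reply at once. As a variation, here is a minimal sketch of consuming a streamed reply, continuing from the variables defined above and assuming the endpoint follows the OpenAI-style server-sent events format ("data: {...}" chunks terminated by "data: [DONE]"):

# Flip the payload to streaming mode (assumption: OpenAI-style SSE chunks)
payload["stream"] = True

with requests.post(nim_url + "/v1/chat/completions", verify=False,
                   headers=headers, json=payload, stream=True) as response:
    for line in response.iter_lines():
        # Skip keep-alive blank lines and anything that is not a data chunk
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        # Each chunk carries an incremental "delta" with the next tokens
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)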
Get started with NVIDIA NIM on OpenShift AI
We hope you found this short tutorial instructive!
NVIDIA NIM integration on Red Hat OpenShift AI is now available in technology preview. With this integration, enterprises can increase productivity by implementing gen AI to address real business use cases such as expanding customer service with virtual assistants, summarizing IT support tickets, and accelerating business operations with domain-specific copilots.
Get started today with NVIDIA NIM on Red Hat OpenShift AI. You can also find more information on the OpenShift AI product page.