Earlier this year, Red Hat announced plans to integrate support for NVIDIA NIM microservices on Red Hat OpenShift AI to help streamline inferencing for dozens of AI/ML models on a consistent, flexible hybrid cloud platform. NVIDIA NIM, part of the NVIDIA AI Enterprise software platform, is a set of easy-to-use inference microservices for accelerating the deployment of foundation models and keeping your data secure.
Combined with Red Hat OpenShift AI, NVIDIA NIM can help developers speed up the development and deployment of AI and generative AI (gen AI) applications. Now, NVIDIA NIM is available in technology preview on Red Hat OpenShift AI, so users can manually enable NVIDIA NIM and NVIDIA GPUs as a model serving accelerator on premises or in the public cloud where OpenShift is running.
In this how-to article, we will demonstrate how to integrate NVIDIA NIM with Red Hat OpenShift AI to create and deliver AI-enabled applications at scale.
Enable NVIDIA NIM
Note
NVIDIA NIM is available as a technology preview feature in OpenShift AI 2.14. To enable it, set disableNIMModelServing to false in your OdhDashboardConfig instance. See the configuration documentation for details.
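If you prefer to apply the change programmatically rather than editing the resource by hand, the following is a minimal sketch using the kubernetes Python client. The API group and version (opendatahub.io/v1alpha), the namespace (redhat-ods-applications), and the object name (odh-dashboard-config) are assumptions based on a default OpenShift AI installation, so adjust them for your cluster.

from kubernetes import client, config

# Assumes you are logged in with sufficient privileges (e.g., cluster-admin)
config.load_kube_config()

# Patch the OdhDashboardConfig custom resource to enable the NIM tile;
# group/version, namespace, and name are assumptions for a default install
client.CustomObjectsApi().patch_namespaced_custom_object(
    group="opendatahub.io",
    version="v1alpha",
    namespace="redhat-ods-applications",
    plural="odhdashboardconfigs",
    name="odh-dashboard-config",
    body={"spec": {"dashboardConfig": {"disableNIMModelServing": False}}},
)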
First, go to the NVIDIA NGC catalog to generate an API key. From the top right profile menu, select the setup option and click to generate your API key, as shown in Figure 1.
In your Red Hat OpenShift AI dashboard, select the explore option from the left navigation bar and locate the NVIDIA NIM tile. See Figure 2.
Next, click the NVIDIA NIM tile, click Enable, and enter the API key you generated from the NVIDIA NGC catalog (Figure 1) to enable NVIDIA NIM. See Figure 3.
Verify the enablement by selecting the Enabled option from the left navigation bar, as marked in Figure 4. The NVIDIA NIM card should now appear as one of your apps.
Create and deploy the model
Create a Data Science Project. Data science projects allow you to collect your work—including Jupyter workbenches, storage, data connections, models, and servers—into a single project.
From the left navigation bar, select Data Science Projects and click to create a project. Enter a project name and description, then click Create, as shown in Figure 5.
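Under the hood, a data science project is an OpenShift project whose namespace carries a dashboard label. If you prefer automation, here is a hedged sketch using the kubernetes Python client; the opendatahub.io/dashboard label and the nim-demo project name are illustrative assumptions, so verify them against your installation.

from kubernetes import client, config

config.load_kube_config()

# Create a namespace labeled so the OpenShift AI dashboard treats it as a
# data science project; the label and names here are illustrative assumptions
ns = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="nim-demo",  # hypothetical project name
        labels={"opendatahub.io/dashboard": "true"},
        annotations={"openshift.io/display-name": "NIM demo"},
    )
)
client.CoreV1Api().create_namespace(ns)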
Once the project is created, you can use the Overview tab to specify NIM as a model serving microservice (this can also be accomplished from the Models tab). Click the NVIDIA NIM model serving platform card, as shown in Figure 6.
Select the target NIM model from the drop-down list and describe your inference service. In the following example, we’ve selected Llama-3-8B-Instruct (served as meta/llama3-8b-instruct). You will also have the option to enter the number of server replicas to deploy, the model size (small, medium, etc.), and the type and number of NVIDIA accelerator cards that you have access to in your cluster. See Figure 7.
Verify that the model appears as a deployed model within this Data Science Projects screen, as shown in Figure 8. Note that it might take a couple of minutes to appear.
Take note of the internal service URL link within the NIM deployed model card, highlighted in Figure 9; you will need this in the next section.
Configure and create a workbench
Now that the model is deployed and we have our internal link, let’s create a workbench. A workbench is an instance of your development environment. In it, you can select a notebook image, such as Jupyter, for your data science work.
From the same Data Science Projects Overview tab (or the Workbenches tab), click Create a workbench, as shown in Figure 10.
Complete the fields shown in Figure 11 to describe your workbench, including the notebook image you want to use, cluster deployment size, accelerator, and other fields, then click Create workbench.
Wait for the workbench to be in a running state. When the status displays Running, click Open (see Figure 12).
From the opened workbench, clone a repo, as shown in Figure 13. Use the repo https://github.com/RHEcosystemAppEng/notebooks.
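Alternatively, you can open a terminal tab in the workbench and run git clone https://github.com/RHEcosystemAppEng/notebooks.git to clone the same repository from the command line.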
Verify model access
Open the example notebook (Figure 14): notebooks/openshift_ai/nvidia_nim_access_from_workbench.ipynb
Follow the instructions to verify access to the running NIM model, and have fun!
Example notebook walkthrough
Use the internal service URL you noted in Figure 9 for the nim_url parameter.
import json
import requests
import urllib3

# Suppress the InsecureRequestWarning raised by the verify=False requests below
urllib3.disable_warnings()

# Internal service URL of the deployed NIM model (noted in Figure 9)
nim_url = "https://nim-deploy.nim-openshift-ai.svc.cluster.local"
Fetch a list of the available models and note the model deployed in Figure 7, meta/llama3-8b-instruct.
# List the models served by the NIM endpoint (OpenAI-compatible API)
response = requests.get(nim_url + "/v1/models", verify=False)
print(json.dumps(response.json()['data'], indent=2))
Example response:
[
  {
    "id": "meta/llama3-8b-instruct",
    "object": "model",
    "created": 1729805092,
    "owned_by": "system",
    "root": "meta/llama3-8b-instruct",
    "parent": null,
    "permission": [
      {
        "id": "modelperm-937bf9a2ec6d475da69a8f04752b32b7",
        "object": "model_permission",
        "created": 1729805092,
        "allow_create_engine": false,
        "allow_sampling": true,
        "allow_logprobs": true,
        "allow_search_indices": false,
        "allow_view": true,
        "allow_fine_tuning": false,
        "organization": "*",
        "group": null,
        "is_blocking": false
      }
    ]
  }
]
Send a chat request to the target model, again using the model deployed in Figure 7, meta/llama3-8b-instruct.
# Build an OpenAI-style chat completion request for the deployed model
headers = {
    "Content-Type": "application/json"
}
payload = {
    "model": "meta/llama3-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "What is Red Hat OpenShift AI?"
        },
        {
            "role": "user",
            "content": "What is NVIDIA NIM?"
        }
    ],
    "temperature": 0.5,
    "top_p": 1,
    "max_tokens": 1024,
    "stream": False
}

# Send the request and print the model's reply
response = requests.post(nim_url + "/v1/chat/completions", verify=False, headers=headers, json=payload)
print(response.json()['choices'][0]['message']['content'])
Example response:
I think I can help you with that!
NVIDIA NIM is a cloud-based platform that enables data scientists and developers to easily deploy, manage, and scale AI and machine learning (ML) workloads across multiple cloud environments. NIM provides a unified platform for building, testing, and deploying AI and ML models, and it integrates with various NVIDIA technologies, such as NVIDIA GPU acceleration, NVIDIA Tensor Core processors, and NVIDIA DGX-1 systems.
NIM offers a range of features, including:
1. Model development and testing: NIM provides a cloud-based environment for data scientists to develop, test, and fine-tune their AI and ML models using popular frameworks like TensorFlow, PyTorch, and Caffe.
2. Model deployment: NIM allows developers to deploy their trained models to various cloud environments, such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and on-premises data centers.
3. Model management: NIM provides a centralized platform for managing and monitoring AI and ML models, including version control, model tracking, and performance monitoring.
4. Collaboration: NIM enables collaboration among data scientists, developers, and other stakeholders through features like model sharing, version control, and real-time feedback.
By using NIM, organizations can accelerate their AI and ML projects, improve collaboration, and reduce the complexity of deploying and managing AI workloads across multiple environments.
Now, regarding Red Hat OpenShift AI, it's a cloud-native platform that enables organizations to deploy and manage AI and ML workloads on-premises or in the cloud. OpenShift AI is built on top of Red Hat OpenShift, a popular container orchestration platform, and provides a scalable and secure environment for building, deploying, and managing AI and ML applications.
OpenShift AI integrates with various AI and ML frameworks, such as TensorFlow, PyTorch, and scikit-learn, and provides features like:
1. Containerized AI and ML workloads: OpenShift AI enables data scientists and developers to package their AI and ML workloads into containers, making it easier to deploy, manage, and scale them.
2. Scalable infrastructure: OpenShift AI provides a scalable infrastructure that can handle large-scale AI and ML workloads, with support for NVIDIA GPU acceleration and other specialized hardware.
3. Security and compliance: OpenShift AI provides a secure environment for AI and ML workloads, with features like encryption, access controls, and compliance with industry regulations.
4. Collaboration: OpenShift AI enables collaboration among data scientists, developers, and other stakeholders through features like version control, model tracking, and real-time feedback.
By using OpenShift AI, organizations can accelerate their AI and ML projects, improve collaboration, and reduce the complexity of deploying and managing AI workloads.
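In the request above, "stream": False returned the full reply at once. As a variation, here is a minimal sketch of consuming a streamed reply, continuing from the variables defined above and assuming the endpoint follows the OpenAI-style server-sent events format ("data: {...}" chunks terminated by "data: [DONE]"):

# Flip the payload to streaming mode (assumption: OpenAI-style SSE chunks)
payload["stream"] = True

with requests.post(nim_url + "/v1/chat/completions", verify=False,
                   headers=headers, json=payload, stream=True) as response:
    for line in response.iter_lines():
        # Skip keep-alive blank lines and anything that is not a data chunk
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        # Each chunk carries an incremental "delta" with the next tokens
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)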
Get started with NVIDIA NIM on OpenShift AI
We hope you found this short tutorial instructive!
NVIDIA NIM integration on Red Hat OpenShift AI is now available in technology preview. With this integration, enterprises can increase productivity by implementing gen AI to address real business use cases such as expanding customer service with virtual assistants, summarizing IT support tickets, and accelerating business operations with domain-specific copilots.
Get started today with NVIDIA NIM on Red Hat OpenShift AI. You can also find more information on the OpenShift AI product page.