Configure NVIDIA Blackwell GPUs for Red Hat AI workloads

Driving AI acceleration with the NVIDIA RTX PRO 4500 Blackwell Server Edition on Red Hat AI Enterprise

March 16, 2026
Erwan Gallen, Tarun Kumar, Antonin Stefanutti, Selbi Nuryyeva, Michey Mehta
Related topics:
Artificial intelligence, Containers, Edge computing
Related products:
Red Hat AI Inference Server, Red Hat AI, Red Hat Enterprise Linux AI, Red Hat OpenShift AI

    The NVIDIA RTX PRO 4500 Blackwell Server Edition brings GPU acceleration to the world's most widely adopted enterprise data center and edge computing platforms. It offers a significant performance increase over traditional CPU-only servers. For Red Hat customers, this server edition provides compact acceleration across the Red Hat AI portfolio, including Red Hat AI Inference Server, Red Hat Enterprise Linux AI, and Red Hat AI Enterprise. This gives organizations a practical path to build, optimize, deploy, and scale AI workloads across enterprise data center and edge environments.

    Optimized for Red Hat AI

    The NVIDIA RTX PRO 4500 Blackwell Server Edition is a reliable choice for compact, power-efficient AI deployments. It provides inference performance without adding unnecessary operational complexity. For Red Hat AI users, it offers a practical mix of memory capacity, performance, and efficiency for running modern models in enterprise data center and edge environments.

    This hardware also stands out as a compelling successor to the NVIDIA L4 for this type of deployment. With more memory, greater performance headroom, and support for low-precision inference, organizations can better tune model size, throughput, latency, and overall deployment efficiency to match workload requirements.

    Quantization provides much of that value. 8-bit integer (INT8) is a widely adopted option for inference, while 4-bit integer (INT4) helps fit larger models into more constrained memory footprints. FP8 has also become increasingly important for modern accelerator-based deployments. Blackwell supports NVFP4, giving Red Hat AI users flexibility for advanced model optimization and inference.
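    To make the memory trade-off concrete, here is a rough weight-only estimate (an illustration added here, not a measurement from this validation) for a 30B-parameter model at the precisions mentioned above:

```shell
# Approximate weight-only memory for a 30B-parameter model.
# Real deployments also need KV-cache and activation memory on top,
# and 4-bit formats carry a small overhead for per-block scales.
params=30   # parameters, in billions

for fmt in "FP16 16" "INT8 8" "NVFP4 4"; do
  set -- $fmt
  name=$1; bits=$2
  gb=$(( params * bits / 8 ))   # GB = params(B) * bits / 8
  echo "$name: ~${gb} GB of weights"
done
```

    At 4 bits, a 30B model's weights (roughly 15 GB) fit comfortably within this GPU's 32 GB, which FP16 (roughly 60 GB) would not.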

    NVIDIA RTX PRO Servers with RTX PRO 4500 Blackwell Server Edition are also featured as part of the updated NVIDIA Enterprise AI Factory validated design and the NVIDIA AI Data Platform, a customizable reference design for building modern storage systems for enterprise agentic AI.

    Configure the RTX PRO 4500 Blackwell Server Edition on Red Hat AI Enterprise

    To use the RTX PRO 4500 Blackwell Server Edition in Red Hat OpenShift, install the Node Feature Discovery and the NVIDIA GPU Operator (Figure 1).

    Software Catalog in Red Hat OpenShift showing a search for NVIDIA GPU Operator with one result provided by NVIDIA Corporation.
    Figure 1: Search for and select the NVIDIA GPU Operator from the OpenShift Software Catalog.

    Set these parameters in the NVIDIA GPU Operator installation UI:

    1. Set the driver version in the NVIDIA GPU Operator ClusterPolicy to 580.126.16 (version 595 will be the officially supported NVIDIA driver release). Enter this value in the driver version field to deploy the required driver image tag across the cluster.
    2. Enter nvcr.io/nvidia in the repository field of the ClusterPolicy so the operator pulls the container from the correct registry.
    3. Enter driver in the image field of the ClusterPolicy to reference the correct driver container image.
    4. Set kernelModuleType to open in the NVIDIA GPU Operator ClusterPolicy to use open GPU kernel modules during installation.

    You can also edit the ClusterPolicy directly and add these parameters:

    $ oc edit clusterpolicy
    driver:
       version: 580.126.16
       image: driver
       repository: nvcr.io/nvidia
       kernelModuleType: open
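    If you prefer a non-interactive change, the same settings can be applied with a merge patch; gpu-cluster-policy is the operator's default instance name, so adjust it if your cluster uses a different one:

```shell
# Apply the driver settings in one step (assumes the default
# ClusterPolicy instance name "gpu-cluster-policy").
oc patch clusterpolicy/gpu-cluster-policy --type merge -p '{
  "spec": {
    "driver": {
      "version": "580.126.16",
      "repository": "nvcr.io/nvidia",
      "image": "driver",
      "kernelModuleType": "open"
    }
  }
}'
```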

    Once installed, you can use the RTX PRO 4500 Blackwell Server Edition with OpenShift (Figure 2).

    Red Hat OpenShift terminal showing nvidia-smi output for two NVIDIA RTX PRO 4500 Blackwell GPUs with no running processes found.
    Figure 2: Use the terminal to verify the installation of the NVIDIA GPUs using the nvidia-smi command.

    Running nvidia-smi from the NVIDIA driver daemonset in the OpenShift web console confirms that both NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are detected correctly.

    Verify the hardware

    This validation environment uses Red Hat OpenShift 4.20.15.

    $ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.20.15   True        False         24m     Cluster version is 4.20.15

    The deployment uses a single-node Red Hat OpenShift cluster running Kubernetes 1.33.6.

    $ oc get nodes
    NAME                         STATUS   ROLES                         AGE     VERSION
    redhat-validation-02-gpu01   Ready    control-plane,master,worker   6h24m   v1.33.6
    Node Feature Discovery Operator installation modal in the Red Hat OpenShift Software Catalog highlighting the install button and software components.
    Figure 3: Use the Node Feature Discovery Operator to manage hardware-specific labeling within your cluster.

    After you install the Node Feature Discovery Operator (Figure 3), the node identifies as hosting an NVIDIA PCI device with Single Root I/O Virtualization (SR-IOV) capabilities.

    $ oc describe node/redhat-validation-02-gpu01 | grep pci-10de
                        feature.node.kubernetes.io/pci-10de.present=true
                        feature.node.kubernetes.io/pci-10de.sriov.capable=true

    The NVIDIA GPU Operator deploys into the nvidia-gpu-operator project.

    $ oc project nvidia-gpu-operator
    Now using project "nvidia-gpu-operator" on server "https://api.launchpad.nvidia.com:6443".

    During installation, the NVIDIA GPU Operator starts components in sequence. These include the driver daemonset, container toolkit, device plug-in, NVIDIA Data Center GPU Manager (DCGM), GPU Feature Discovery, node status exporter, and operator validator.

    $ oc get pods
    NAME                                           READY   STATUS     RESTARTS   AGE
    gpu-feature-discovery-sftmv                    0/1     Init:0/1   0          2m21s
    gpu-operator-595d9f95cf-rv2jr                  1/1     Running    0          13m
    nvidia-container-toolkit-daemonset-5h99p       1/1     Running    0          2m21s
    nvidia-dcgm-exporter-6trh8                     0/1     Init:0/2   0          2m21s
    nvidia-dcgm-r5gsn                              0/1     Init:0/1   0          2m21s
    nvidia-device-plugin-daemonset-j7s74           0/1     Init:0/1   0          2m21s
    nvidia-driver-daemonset-9.6.20250925-0-cdrcf   2/2     Running    0          2m28s
    nvidia-node-status-exporter-5wflx              1/1     Running    0          2m27s
    nvidia-operator-validator-vbwlr                0/1     Init:0/4   0          2m21s

    Once the installation completes, verify that the NVIDIA GPU Operator components are operational. These include the driver daemonset, MIG Manager, and the node status exporter.

    $ oc get pods
    NAME                                           READY   STATUS      RESTARTS      AGE
    gpu-feature-discovery-sftmv                    1/1     Running     0             3m16s
    gpu-operator-595d9f95cf-rv2jr                  1/1     Running     0             14m
    nvidia-container-toolkit-daemonset-5h99p       1/1     Running     0             3m16s
    nvidia-cuda-validator-pv4mv                    0/1     Completed   0             42s
    nvidia-dcgm-exporter-6trh8                     1/1     Running     2 (22s ago)   3m16s
    nvidia-dcgm-r5gsn                              1/1     Running     0             3m16s
    nvidia-device-plugin-daemonset-j7s74           1/1     Running     0             3m16s
    nvidia-driver-daemonset-9.6.20250925-0-cdrcf   2/2     Running     0             3m23s
    nvidia-mig-manager-w5ncg                       1/1     Running     0             23s
    nvidia-node-status-exporter-5wflx              1/1     Running     0             3m22s
    nvidia-operator-validator-vbwlr                1/1     Running     0             3m16s
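    Beyond the pod listing, the operator's overall state can be checked on the ClusterPolicy itself (again assuming the default instance name gpu-cluster-policy):

```shell
# The ClusterPolicy status reports "ready" once all operands are healthy.
oc get clusterpolicy gpu-cluster-policy \
  -o jsonpath='{.status.state}{"\n"}'
```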

    Running nvidia-smi confirms that OpenShift exposes the NVIDIA RTX PRO 4500 Blackwell Server Edition. The output shows driver version 580.126.16 and CUDA 13.0, with the GPUs idle and ready for workload validation.

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf   -- nvidia-smi
    Tue Mar 10 20:46:45 2026       
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 580.126.16             Driver Version: 580.126.16     CUDA Version: 13.0     |
    +-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA RTX PRO 4500 Blac...    On  |   00000000:17:00.0 Off |                    0 |
    | N/A   33C    P8             16W /  165W |       0MiB /  32623MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA RTX PRO 4500 Blac...    On  |   00000000:63:00.0 Off |                    0 |
    | N/A   34C    P8             17W /  165W |       0MiB /  32623MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    Verify the full GPU names with the following command:

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi --query-gpu=name --format=csv
    name
    NVIDIA RTX PRO 4500 Blackwell Server Edition
    NVIDIA RTX PRO 4500 Blackwell Server Edition

    At idle, the NVIDIA RTX PRO 4500 Blackwell Server Edition reports temperatures of 32–33°C and a power draw of approximately 17 W against a 165 W power limit.

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,power.limit,fan.speed --format=csv
    index, name, temperature.gpu, power.draw [W], power.limit [W], fan.speed [%]
    0, NVIDIA RTX PRO 4500 Blackwell Server Edition, 32, 16.74 W, 165.00 W, [N/A]
    1, NVIDIA RTX PRO 4500 Blackwell Server Edition, 33, 17.40 W, 165.00 W, [N/A]

    Each GPU exposes 32 GB of memory:

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi --query-gpu=index,name,utilization.gpu,utilization.memory,memory.total,memory.used,memory.free --format=csv
    index, name, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.used [MiB], memory.free [MiB]
    0, NVIDIA RTX PRO 4500 Blackwell Server Edition, 0 %, 0 %, 32623 MiB, 0 MiB, 32128 MiB
    1, NVIDIA RTX PRO 4500 Blackwell Server Edition, 0 %, 0 %, 32623 MiB, 0 MiB, 32128 MiB

    At idle, the graphics and streaming multiprocessor (SM) clocks run at 180 MHz, with memory clocks at 405 MHz.

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi --query-gpu=index,name,clocks.current.graphics,clocks.current.sm,clocks.current.memory --format=csv
    index, name, clocks.current.graphics [MHz], clocks.current.sm [MHz], clocks.current.memory [MHz]
    0, NVIDIA RTX PRO 4500 Blackwell Server Edition, 180 MHz, 180 MHz, 405 MHz
    1, NVIDIA RTX PRO 4500 Blackwell Server Edition, 180 MHz, 180 MHz, 405 MHz

    Topology reporting shows that the GPUs and Mellanox NICs are attached within the same platform fabric, with both GPUs sharing NUMA affinity and standard PCIe-based connectivity.

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi topo -m
            GPU0    GPU1    NIC0    NIC1    NIC2    NIC3    CPU Affinity    NUMA Affinity   GPU NUMA ID
    GPU0     X      NODE    NODE    SYS     SYS     SYS     0-31,64-95      0               N/A
    GPU1    NODE     X      NODE    SYS     SYS     SYS     0-31,64-95      0               N/A
    NIC0    NODE    NODE     X      SYS     SYS     SYS
    NIC1    SYS     SYS     SYS      X      PIX     NODE
    NIC2    SYS     SYS     SYS     PIX      X      NODE
    NIC3    SYS     SYS     SYS     NODE    NODE     X 
    Legend:
      X    = Self
      SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
      NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
      PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
      PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
      PIX  = Connection traversing at most a single PCIe bridge
      NV#  = Connection traversing a bonded set of # NVLinks
    NIC Legend:
      NIC0: mlx5_0
      NIC1: mlx5_1
      NIC2: mlx5_2
      NIC3: mlx5_3

    MIG is disabled, compute mode remains in the default setting, and both persistence mode and ECC are enabled.

    $ oc exec -it nvidia-driver-daemonset-9.6.20250925-0-cdrcf -- \
      nvidia-smi --query-gpu=index,mig.mode.current,compute_mode,persistence_mode,ecc.mode.current --format=csv
    index, mig.mode.current, compute_mode, persistence_mode, ecc.mode.current
    0, Disabled, Default, Enabled, Enabled
    1, Disabled, Default, Enabled, Enabled

    Run Red Hat AI inference

    Use the registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0-1771898916 container image to run Red Hat AI Inference Server 3.3.

    The Red Hat AI inference CUDA image supports NVIDIA's NVFP4 quantization format on RTX PRO 4500 Blackwell-based GPUs. This allows for efficient, low-cost large-model inference with vLLM. NVFP4 is a 4-bit floating-point format introduced with the NVIDIA Blackwell architecture that uses hardware acceleration.
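    Although this article deploys on OpenShift, the same container image can also be run directly with Podman, for example on a RHEL host. This is a sketch that assumes the NVIDIA Container Toolkit has generated CDI device specs on the host and that <your-token> is replaced with a valid Hugging Face token:

```shell
# Serve the NVFP4 model locally with Podman and CDI GPU passthrough.
# Assumes CDI specs exist on the host (e.g., via "nvidia-ctk cdi generate").
podman run --rm -it \
  --device nvidia.com/gpu=all \
  --shm-size 4g \
  -p 8000:8000 \
  -e HUGGING_FACE_HUB_TOKEN=<your-token> \
  registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0-1771898916 \
  vllm serve RedHatAI/Qwen3-30B-A3B-NVFP4 --max-model-len 10000
```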

    We have reliably deployed NVFP4-quantized models with Red Hat AI Inference Server 3.3; results for completions, tool calling, reasoning, and accuracy are consistent with the original full-precision models. Tests confirm good accuracy for RedHatAI/Qwen3-30B-A3B-NVFP4 (TP1) and RedHatAI/Llama-3.3-70B-Instruct-NVFP4 (TP2).

    | Model name                            | Completions | Chat completion | Tool calling | Accuracy |
    |---------------------------------------|-------------|-----------------|--------------|----------|
    | RedHatAI/Qwen3-30B-A3B-NVFP4          | Yes         | Yes             | Yes          | 80%      |
    | RedHatAI/Llama-3.3-70B-Instruct-NVFP4 | Yes         | Yes             | Yes          | 93%      |

    The following is a sample deployment that serves the model using Red Hat AI Inference Server. An init container downloads the model weights from Hugging Face, and the main container launches vLLM with tensor parallelism across two GPUs with tool-calling support enabled.

    Create the necessary resources, such as a Hugging Face secret for authentication (needed for gated models) and a persistent volume claim for caching the model weights, and then apply the deployment:

    # Create the HF token secret
    oc create secret generic hf-token-secret \
      --from-literal=HUGGING_FACE_TOKEN=<your-token>
    # Create a PVC for model caching
    oc apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-cache
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 100Gi
    EOF
    # Create model deployment with basic configuration
    oc apply -f - <<EOF
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: llm-deploy-929
      namespace: test-rhaiis
      labels:
        app: rhaiis-runner
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rhaiis-runner
      template:
        metadata:
          name: rhaiis-runner
          labels:
            app: rhaiis-runner
        spec:
          restartPolicy: Always
          initContainers:
            - name: download
              command:
                - /bin/bash
                - '-c'
              env:
                - name: HF_HUB_OFFLINE
                  value: '0'
                - name: HF_HOME
                  value: /mnt/model
                - name: HUGGING_FACE_HUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token-secret
                      key: HUGGING_FACE_TOKEN
              volumeMounts:
                - name: cache-volume
                  mountPath: /mnt/model
              terminationMessagePolicy: File
              image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0-1771898916
              args:
                - huggingface-cli download RedHatAI/Qwen3-30B-A3B-NVFP4
          imagePullSecrets:
            - name: quay-secrets
          containers:
            - resources:
                limits:
                  cpu: '16'
                  memory: 30Gi
                  nvidia.com/gpu: '1'
                requests:
                  cpu: 10m
                  memory: 29Gi
                  nvidia.com/gpu: '1'
              name: rhaiis
              command:
                - /bin/bash
                - '-c'
              env:
                - name: HF_HUB_OFFLINE
                  value: '0'
                - name: HF_HOME
                  value: /mnt/model
                - name: HUGGING_FACE_HUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token-secret
                      key: HUGGING_FACE_TOKEN
              ports:
                - containerPort: 8000
                  protocol: TCP
              volumeMounts:
                - name: cache-volume
                  mountPath: /mnt/model
                - name: dshm
                  mountPath: /dev/shm
              image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0-1771898916
              args:
                - vllm serve RedHatAI/Qwen3-30B-A3B-NVFP4 --uvicorn-log-level debug --trust-remote-code --enable-chunked-prefill --tensor-parallel-size 1 --max-model-len 10000
          volumes:
            - name: cache-volume
              persistentVolumeClaim:
                claimName: model-cache
            - name: dshm
              emptyDir:
                medium: Memory
    EOF
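    The validation requests that follow target http://localhost:9000; one way to reach the pod (an assumption, since the original steps do not show it) is a Service in front of the deployment plus a local port-forward:

```shell
# Front the deployment with a Service, then forward local port 9000 to it.
oc apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: rhaiis-svc
  namespace: test-rhaiis
spec:
  selector:
    app: rhaiis-runner
  ports:
    - port: 8000
      targetPort: 8000
EOF
oc -n test-rhaiis port-forward svc/rhaiis-svc 9000:8000
```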

    Use the following commands and outputs to validate model completions, chat performance, and accuracy benchmarks.

    1. Completion (POST /v1/completions)
      curl -s -X POST http://localhost:9000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "RedHatAI/Qwen3-30B-A3B-NVFP4",
    "prompt": "The capital of France is",
    "max_tokens": 32,
    "temperature": 0.0
    }' | jq -r '.choices[0].text'
    " Paris. The capital of the United Kingdom is London. The capital of the United States is Washington, D.C. The capital of Germany is Berlin. The capital",
    2.  Chat completion - single turn:
     curl -X POST http://localhost:9000/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{
            "model": "RedHatAI/Qwen3-30B-A3B-NVFP4",
            "messages": [{"role": "user", "content": "What is the capital of France? Answer in one sentence."}],
            "max_tokens": 64,
            "temperature": 0.0
        }'
    HTTP STATUS: 200
    3.  Accuracy (gsm8k):
    local-completions ({'model': 'RedHatAI/Qwen3-30B-A3B-NVFP4', 'base_url': 'http://localhost:9000/v1/completions', 'num_concurrent': 100, 'tokenized_requests': False}), gen_kwargs: ({'max_gen_toks': 4048}), limit: None, num_fewshot: None, batch_size: 16
    |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
    |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
    |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9067|±  |0.0080|
    |     |       |strict-match    |     5|exact_match|↑  |0.9052|±  |0.0081|
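    The accuracy log above has the shape of an lm-evaluation-harness run; a command along these lines reproduces it against the local endpoint (flag names are from recent lm_eval releases and may vary):

```shell
# Evaluate gsm8k through the OpenAI-compatible completions endpoint.
# Arguments mirror the logged run configuration.
lm_eval --model local-completions \
  --model_args "model=RedHatAI/Qwen3-30B-A3B-NVFP4,base_url=http://localhost:9000/v1/completions,num_concurrent=100,tokenized_requests=False" \
  --gen_kwargs "max_gen_toks=4048" \
  --tasks gsm8k \
  --batch_size 16
```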

    Performance validation

    After confirming accuracy with NVFP4 quantized models, we validated performance characteristics using the GuideLLM benchmarking tool. The tests measured throughput and latency across five NVFP4 models deployed with Red Hat AI Inference Server 3.3 on the RTX PRO 4500 Blackwell Server Edition GPUs. See the full list of NVFP4 quantized models from Red Hat.

    Test configuration

    The validation used a standardized workload profile with 1,000 input tokens and 1,000 output tokens per request. We tested multiple concurrency levels to identify throughput limits and latency behavior under load. Each concurrency level ran for 2-4 minutes to ensure stable measurements.

    All deployments used a dual-replica configuration with tensor parallelism set to 1 (TP=1), meaning each replica ran on a single GPU.
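    A GuideLLM invocation along these lines drives the workload profile described above; treat it as a sketch, since argument names differ between GuideLLM releases:

```shell
# Fixed-concurrency run at one of the tested levels, with ~1,000
# input and 1,000 output tokens per request. Flags may vary by version.
guidellm benchmark \
  --target "http://localhost:9000" \
  --model RedHatAI/Qwen3-30B-A3B-NVFP4 \
  --rate-type concurrent \
  --rate 100 \
  --max-seconds 240 \
  --data "prompt_tokens=1000,output_tokens=1000"
```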

    Performance results

    The following table shows peak throughput and peak SLO-compliant throughput for each model. Peak SLO-compliant concurrency is the highest level where P99 Time to First Token (TTFT) is at or below 3,000 ms and P99 Inter-Token Latency (ITL) is at or below 80 ms.

    Performance summary for the 2x NVIDIA RTX PRO 4500 Blackwell Server Edition. Throughput values represent output tokens per second. SLO-compliant concurrency is the maximum concurrent requests while maintaining SLO (P99 TTFT ≤ 3s, P99 ITL ≤ 80ms).
    | Model                 | Size | Peak throughput (tok/s) | Peak concurrency | Peak SLO-compliant throughput (tok/s) | Peak SLO-compliant concurrency | P99 TTFT (ms) | P99 ITL (ms) |
    |-----------------------|------|-------------------------|------------------|---------------------------------------|--------------------------------|---------------|--------------|
    | Llama-3.1-8B          | 8B   | 3,515                   | 225              | 2,847                                 | 100                            | 2,645         | 31           |
    | Qwen3-8B              | 8B   | 2,966                   | 225              | 2,421                                 | 100                            | 2,531         | 32           |
    | Qwen3-14B             | 14B  | 2,225                   | 150              | 1,339                                 | 50                             | 2,719         | 33           |
    | Mistral-Small-3.2-24B | 24B  | 1,625                   | 170              | 688                                   | 30                             | 2,137         | 34           |
    | Qwen3-32B             | 32B  | 666                     | 50               | 333                                   | 20                             | 2,076         | 43           |

    Key findings:

    • The 8B models demonstrate linear throughput scaling up to 100 concurrent requests and maintain sub-3 second P99 response times.
    • The 14B model provides a balance between capability and performance, supporting up to 50 concurrent requests within the prescribed SLO.
    • The 24B and larger models are best suited for lower-concurrency workloads where model capability is prioritized over throughput.

    The scaling behavior for these models across concurrent requests is shown in Figure 4, and the comparison of peak versus SLO-compliant throughput is shown in Figure 5.

    Line chart comparing LLM output throughput across concurrent requests, showing Llama-3.1-8B achieving the highest peak performance.
    Figure 4: Output throughput versus concurrency comparison for various models, highlighting peak and SLO-compliant operating points. Blue stars mark peak throughput for each model. Green stars mark the peak SLO-compliant operating point (P99 TTFT ≤3s, P99 ITL ≤80ms).
    Bar chart comparing peak and SLO-compliant throughput across five LLMs, with Llama-3.1-8B achieving the highest performance.
    Figure 5: Comparison of peak versus SLO-compliant throughput for Llama, Qwen, and Mistral models. SLO-compliant throughput represents the maximum sustainable performance while meeting strict latency SLOs.

    Conclusion

    The NVFP4 quantized models running on dual RTX PRO 4500 Blackwell Server Edition GPUs deliver high-speed inference performance across various model sizes. This platform demonstrates that 4-bit NVFP4 quantization, combined with modern GPU architecture and optimized inference engines, delivers reliable AI inference at scale.

    Red Hat OpenShift AI

    With the accelerator environment already prepared and validated, the next step is to add Red Hat OpenShift AI so teams can start using those resources for model serving, inference, and other AI workflows at scale. This is the point where the validated hardware configuration becomes available through the OpenShift AI experience and can be used by data scientists, developers, and platform teams.

    Install Red Hat OpenShift AI from the Software Catalog using the stable-3.x channel and version 3.3.0. Once installed, the platform can use the available accelerator resources for AI workloads (Figure 6).

    OpenShift AI installation dialog in the Software Catalog showing selections for the stable-3.x channel and version 3.3.0.
    Figure 6: The Red Hat OpenShift AI operator installation interface within the Software Catalog.

    To make the NVIDIA RTX PRO 4500 Blackwell Server Edition available as a reusable accelerator option in Red Hat OpenShift AI, we created a dedicated hardware profile. In OpenShift AI, hardware profiles define the resource configuration that users can select for workbenches and other AI workloads, combining CPU, memory, and accelerator settings into a single reusable profile.

    For this configuration, we created a profile named NVIDIA RTX PRO 4500 Blackwell Server Edition and associated it with the accelerator resource identifier nvidia.com/gpu. We then defined the default and allowed resource ranges for CPU, memory, and GPU allocation. In this example, the profile was configured with a default of 2 CPU cores, 16 GiB of memory, and 1 GPU, with support for scaling to 8 CPU cores, 32 GiB of memory, and 2 GPUs as required (Figure 7).

    Hardware profiles table in OpenShift AI showing resource settings for a profile with CPU, memory, and GPU identifiers.
    Figure 7: The Hardware profiles interface displaying configured node resource limits for CPU, memory, and NVIDIA accelerators.

    After the profile is updated, it is listed as an enabled hardware profile in OpenShift AI and can be used as a standard accelerator-backed configuration for supported workloads (Figure 8).

    Hardware profiles list in Red Hat OpenShift AI showing the newly created NVIDIA RTX PRO 4500 Blackwell Server Edition profile as enabled.
    Figure 8: The enabled hardware profile is now available in the Red Hat OpenShift AI dashboard for workload allocation.

    For example, we created a distributed training job using Kubeflow Trainer to fine-tune a large language model (LLM) on Red Hat OpenShift AI using two NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs. Figure 9 illustrates the training configuration and metrics during the distributed model fine-tuning process directly from a Jupyter notebook using TensorBoard.

    Four line graphs in TensorBoard showing increasing training epoch and decreasing gradient norm, learning rate, and loss over time.
    Figure 9: Distributed model training job metrics using Kubeflow Trainer with the RTX PRO 4500 Blackwell Server Edition.

    Figure 10 displays the OpenShift web console observability dashboard, which allows you to monitor the GPU metrics in real time and shows the high utilization of the two NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs during the fine-tuning job.

    OpenShift console Metrics dashboard displaying a stacked area chart tracking real-time utilization peaks for two NVIDIA GPU instances.
    Figure 10: OpenShift AI GPU metrics.

    Summary and next steps

    The NVIDIA RTX PRO 4500 Blackwell Server Edition provides a clear upgrade path for teams moving beyond the NVIDIA L4. By using the NVFP4 format on Red Hat OpenShift, you can maximize inference efficiency while maintaining a compact hardware footprint. Use the configuration steps in this guide to begin validating Blackwell-class workloads in your environment.

    Learn more about the NVIDIA RTX PRO 4500 Blackwell Server Edition GPU and view the technical specifications.
