Sensitive information detection using the NVIDIA Morpheus AI framework

October 18, 2021
Vitaliy Emporopulo, Costin Gament
Related topics:
Artificial intelligence, DevSecOps, Kafka, Security
Related products:
Red Hat OpenShift, Red Hat Enterprise Linux

    The growth of cloud-native applications has driven an explosion of east-west network traffic within the datacenter, where applications can create hundreds of thousands of network connections among virtual machines and containers. As a consequence, tracking, monitoring, and securing a datacenter in a timely manner has outgrown what any individual or team can do alone. AI and machine learning (AI/ML) are needed to enable ITOps, infrastructure security, and DevSecOps teams to manage the complexity of modern cloud-native applications and the underlying platforms.

    Red Hat and NVIDIA have been working together to bring the security analytics capabilities of the NVIDIA Morpheus AI application framework to Red Hat infrastructure platforms for cybersecurity developers. This article provides configuration instructions for Red Hat developers working on applications that use the NVIDIA Morpheus AI application framework and NVIDIA BlueField data processing units (DPUs) to secure interservice communication.

    Prerequisites

    • Free Red Hat Developer account
    • Active Amazon Web Services (AWS) account
    • NVIDIA Morpheus Early Access program (EAP) account
    • A server with NVIDIA BlueField-2 DPU

    Architecture overview

    This architecture consists of several software and hardware components.

    NVIDIA's Morpheus AI Cybersecurity Framework is at the heart of the system. The software runs on Red Hat OpenShift and uses AI/ML to continuously inspect network and server telemetry at scale. As a quick refresher, the NVIDIA Morpheus AI application framework is built on the RAPIDS libraries, deep learning frameworks, and NVIDIA Triton Inference Server. It can run on site or in the cloud, and simplifies the analysis of logs and telemetry to help detect and mitigate security threats so that developers can create and deploy AI-powered security solutions more quickly. The framework helps clean, filter, and pre-process the telemetry data before sending it to the NVIDIA Triton server for inference. The framework also post-processes inference results before returning them to a client application such as a monitoring dashboard.

    The telemetry is gathered by the NVIDIA NetQ Agent running on the NVIDIA BlueField-2 DPU installed on the local server and running Red Hat Enterprise Linux. The agent captures network packets from the DPU on the server and sends them to the NVIDIA Morpheus AI framework for inspection.

    An Apache Kafka broker is used for streaming telemetry data from data sources to the NVIDIA Morpheus AI engine and sending inspection output from the engine, via a Kafka output topic, to a client application (e.g., monitoring dashboard).

    Note: This guide does not cover a client application for consuming inspection output. Instead, we will be directly monitoring the Kafka output topic.

    Collecting telemetry with the NVIDIA NetQ Agent additionally requires the NVIDIA Endpoint Gateway (also part of the NVIDIA Morpheus AI framework), which accepts telemetry samples on a gRPC endpoint and forwards them to the input topic of the Apache Kafka broker.

    The architecture is depicted at a high level in Figure 1.

    Diagram of the NVIDIA Morpheus AI architecture.
    Figure 1: The high-level architecture.

    It is possible to deploy NVIDIA Morpheus AI with Red Hat OpenShift on both physical and cloud infrastructures. However, to simplify access for a wide variety of developers, we are using AWS infrastructure to host the NVIDIA Morpheus AI software and selecting an EC2 instance type that includes NVIDIA GPU resources. Additionally, we assume that the developer has access to a physical server running Red Hat Enterprise Linux with an NVIDIA BlueField-2 DPU installed.
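
    If you want to confirm which GPU a candidate instance type provides before committing to it in install-config.yaml, the AWS CLI can report this. The following is a minimal sketch, assuming the AWS CLI is installed and configured for your account:

      aws ec2 describe-instance-types \
          --filters Name=instance-type,Values='g4dn.*' \
          --query 'InstanceTypes[].[InstanceType, GpuInfo.Gpus[0].Name, GpuInfo.Gpus[0].Count]' \
          --output table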

    Installing and running NVIDIA Morpheus AI on Red Hat OpenShift

    Now that you have all the prerequisites and requirements, let's get going! In the next few steps, you will:

    • Install a Red Hat OpenShift cluster on AWS, including configuration of an AWS account, connecting the installation program to AWS, and customizing installation files.
    • Install the NVIDIA GPU Operator on the OpenShift cluster. You will also obtain a cluster entitlement and install the Node Feature Discovery (NFD) Operator as part of this procedure.
    • Install the NVIDIA Morpheus AI Engine with an Apache Kafka broker and Endpoint Gateway, and deploy the sensitive information detection (SID) ML model.
    • Configure required Kafka topics.
    • Configure and install the NVIDIA Morpheus SDK CLI.
    • Install Red Hat Enterprise Linux on the NVIDIA BlueField-2 DPU on a local server.
    • Install and configure NVIDIA NetQ Agent on the DPU.
    • Simulate a Web server that receives sensitive data over HTTP, and observe how the NVIDIA Morpheus AI Framework performs sensitive information detection.

    Installing Red Hat OpenShift cluster

    First, we are going to install an OpenShift cluster on AWS using a preselected EC2 instance type with appropriate NVIDIA GPU resources.

    1. To prepare for installing Red Hat OpenShift on AWS, follow the Preparing to install on AWS and Configuring an AWS account guides.
    2. Proceed to Installing a cluster on AWS with customizations. In step 2 of Creating the installation configuration file, modify the worker instance type and replicas count in the generated install-config.yaml as follows:
      compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform: 
          aws:
            type: g4dn.xlarge
        replicas: 1
      controlPlane:
    3. Deploy the cluster (a sketch of the command appears after this list).
    4. Note the location of kubeconfig once the cluster deployment has completed. Example:
      INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/iris/iris-20210825/auth/kubeconfig'
    5. Install the OpenShift CLI client.
    6. Verify that the cluster is up and running. Run:
      export KUBECONFIG=<path to your kubeconfig file> 

      Then, run:

      oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.8.10    True        False         20m     Cluster version is 4.8.10
    7. Verify that the cluster has a single worker:
      oc get node --selector=node-role.kubernetes.io/worker
      NAME                                           STATUS   ROLES    AGE   VERSION
      ip-10-0-133-217.eu-central-1.compute.internal  Ready    worker   42m   v1.21.1+9807387
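
    For reference, deploying the cluster in step 3 comes down to a single command. This is a sketch; point --dir at the directory that contains your customized install-config.yaml:

      openshift-install create cluster --dir=<installation directory> --log-level=info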

    Installing NVIDIA GPU operator

    The NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs within OpenShift or Kubernetes. Follow the NVIDIA Cloud Native Technologies documentation to install the NVIDIA GPU Operator and its prerequisites on the Red Hat OpenShift cluster, and verify the installation.
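
    Once the operator is installed, a quick way to confirm that the worker's GPU is visible is to run nvidia-smi inside the driver daemonset pod. This is a sketch; the namespace (gpu-operator-resources here) and pod label can vary across operator versions:

      export GPU_POD=$(oc get pods -n gpu-operator-resources -l app=nvidia-driver-daemonset -o jsonpath='{.items[0].metadata.name}')
      oc exec -n gpu-operator-resources $GPU_POD -- nvidia-smi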

    Deploying the NVIDIA Morpheus AI framework

    As described in the Architecture overview section, NVIDIA's Morpheus AI Framework has multiple components that need to be set up and configured. The NVIDIA Morpheus AI Engine is installed using a Helm chart that contains NVIDIA Triton Inference Server, a Kafka broker, and an Endpoint Gateway.

    Note: In this example, the cluster must have a single GPU-enabled worker node for the NVIDIA Morpheus AI Engine to function properly. Also, both the NFD Operator and NVIDIA GPU Operator must already be installed, according to instructions provided in the previous section.

    Here are the steps:

    1. Run:
      export KUBECONFIG=<path you copied in the cluster installation step>
    2. Assuming you have registered and have been selected to participate in the NVIDIA Morpheus Early Access program (EAP), generate an API key as explained in Generating Your NGC API Key. If you already have an API key, you can copy it from https://ngc.nvidia.com/setup.
    3. Run:
      export API_KEY=<your API key>
    4. Install Helm. At the time of writing, version 3.7.0 is available, but any 3.x version should work.
    5. Download the NVIDIA Morpheus AI Engine Helm chart:
      helm fetch https://helm.ngc.nvidia.com/ea-nvidia-morpheus/charts/morpheus-ai-engine-0.1.14.tgz --username='$oauthtoken' --password=$API_KEY --untar
    6. Create a morpheus-ai namespace:
      oc create namespace morpheus-ai
    7. Switch to the morpheus-ai project:
      oc project morpheus-ai
    8. Deploy the chart:
      helm install --set ngc.apiKey="$API_KEY",serviceAccount.create=true,platform.openshift=true,gateway.enabled=true,loadBalancer.enabled=true morpheus-ai-engine ./morpheus-ai-engine

    Note: Setting gateway.enabled=true tells Helm to include the Endpoint Gateway in the installation. As already mentioned, the gateway is needed in this example so that the NVIDIA NetQ Agent can send traffic samples to the NVIDIA Morpheus AI framework. For the gateway to have a publicly accessible DNS name on AWS, we specify loadBalancer.enabled=true in the preceding command. Without this argument, or with loadBalancer.enabled=false, the gateway is exposed via a NodePort.

    9. Verify the deployment. The engine might take several minutes to start, so use the watch command:
      watch -n5 oc get pod
      
      NAME                                READY   STATUS    RESTARTS   AGE
      ai-engine-7c694bc64c-hgnxl          1/1     Running   0          62m
      broker-9f8bf48b8-ph9qs              1/1     Running   0          62m
      endpoint-gateway-5cdb9f8f9d-qshht   1/1     Running   0          62m
      zookeeper-54f5cc7d8c-frgsj          1/1     Running   0          62m

    Configuring Kafka topics

    Apache Kafka is installed as part of the Helm chart that also installs the NVIDIA Morpheus AI framework; it is used to send input data (telemetry) and receive output (inspection results).

    To use the Kafka broker with this use case, you need to create the input and output topics as follows:

    • netq_message_Packet_raw is written to by the endpoint gateway and read by the NVIDIA Morpheus SDK CLI.
    • morpheus-output is used by the NVIDIA Morpheus SDK CLI to stream post-processed inference output that contains the results of sensitive information detection.
    export KAFKA_POD=$(oc get pods -l=app.kubernetes.io/name=broker -o jsonpath='{.items[0].metadata.name}')
    
    oc exec $KAFKA_POD -- kafka-topics \
        --create \
        --bootstrap-server broker:9092 \
        --replication-factor 1 \
        --partitions 1 \
        --topic netq_message_Packet_raw
    oc exec $KAFKA_POD -- kafka-topics \
        --create \
        --bootstrap-server broker:9092 \
        --replication-factor 1 \
        --partitions 1 \
        --topic morpheus-output

    Verify that Kafka topics have been successfully created by running:

    oc exec $KAFKA_POD -- kafka-topics --list --bootstrap-server broker:9092
    morpheus-output
    netq_message_Packet_raw
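
    If you also want to check partition and replication settings rather than just topic names, kafka-topics supports --describe:

      oc exec $KAFKA_POD -- kafka-topics \
          --describe \
          --bootstrap-server broker:9092 \
          --topic netq_message_Packet_raw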

    You can then monitor the output topic in a new terminal window:

    • export KUBECONFIG=<path to the kubeconfig>
    • export KAFKA_POD=$(oc get pods -l=app.kubernetes.io/name=broker -o jsonpath='{.items[0].metadata.name}')
    • oc exec $KAFKA_POD -- kafka-console-consumer --topic morpheus-output --bootstrap-server broker:9092 --from-beginning
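
    Optionally, you can smoke-test the broker and your consumer by publishing a message to the output topic yourself. This is a hedged sketch; older Kafka releases take --broker-list instead of --bootstrap-server:

      echo '{"smoke_test": true}' | oc exec -i $KAFKA_POD -- kafka-console-producer \
          --topic morpheus-output \
          --bootstrap-server broker:9092

    The message should appear immediately in the consumer window.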

    Deploying SID model

    The SID model is one of the pre-trained models that comes with the NVIDIA Morpheus AI framework. In this example, we need the model to detect sensitive data in packets captured by the NVIDIA NetQ Agent. The model must be loaded into the NVIDIA Triton Server and requires a GPU to run successfully. Here is how you can deploy the model:

    1. Install NGC CLI as described in NVIDIA NGC CLI Install.
    2. Run ngc config set and follow the prompt.
    3. Create a directory for ML models: mkdir -p models
    4. Run:
      ngc registry resource download-version "ea-nvidia-morpheus/sid_bert_triton_package:sid-minibert-20211002-t4"
    5. Then, run:
      unzip sid_bert_triton_package_vsid-minibert-20211002-t4/sid-minibert-trt-t4.zip -d models
    6. Verify that the target directory now contains the SID model: ls -R models
      models:
      sid-minibert-trt
      
      models/sid-minibert-trt:
      1  config.pbtxt
      
      models/sid-minibert-trt/1:
      README.md  sid-minibert-trt_b1-8_b1-16_b1-32.engine
    7. Copy the SID model to Triton's models repository:
      • MORPHEUS_POD=$(oc get pods -l=app.kubernetes.io/name=ai-engine -o jsonpath='{.items[0].metadata.name}')
      • oc rsync models/sid-minibert-trt $MORPHEUS_POD:/common/triton-model-repo
    8. List all available models:
      oc exec -ti $MORPHEUS_POD -- curl -f -X POST http://localhost:8000/v2/repository/index | jq
      
      [
        {
          "name": "sid-minibert-trt"
        }
      ]
    9. Load the SID model into Triton:
      oc exec -ti $MORPHEUS_POD -- curl -f -X POST http://localhost:8000/v2/repository/models/sid-minibert-trt/load
      
    10. Verify that the model has been successfully loaded:
      oc exec -ti $MORPHEUS_POD -- curl -f -X POST http://localhost:8000/v2/repository/index | jq
      [
        {
          "name": "sid-minibert-trt",
          "version": "1",
          "state": "READY"
        }
      ]
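
    Besides the repository index, Triton exposes standard v2 health endpoints that you can poll while waiting; both should return HTTP 200 once the server and model are ready:

      oc exec -ti $MORPHEUS_POD -- curl -f http://localhost:8000/v2/health/ready
      oc exec -ti $MORPHEUS_POD -- curl -f http://localhost:8000/v2/models/sid-minibert-trt/ready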

    Deploying NVIDIA Morpheus SDK CLI with SID pipeline

    The NVIDIA Morpheus SDK CLI pre-processes input data before sending it for inference to the NVIDIA Triton Inference Server and is installed using a separate Helm chart. The Helm chart is instructed to deploy the pipeline on the same OpenShift worker node as the NVIDIA Morpheus AI engine because they share GPU memory for better inference performance. One of the pipeline types supported by the SDK is Natural Language Processing (NLP). It is used here with the SID model.

    1. Download the Helm chart to install and configure NVIDIA Morpheus SDK CLI:
      helm fetch https://helm.ngc.nvidia.com/ea-nvidia-morpheus/charts/morpheus-sdk-client-0.1.8.tgz --username='$oauthtoken' --password=$API_KEY --untar
      
    2. Deploy the Helm chart:
      helm install --set ngc.apiKey="$API_KEY" --set sdk.args="morpheus --debug --log_level=DEBUG run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 pipeline-nlp --model_seq_length=256 from-kafka --input_topic netq_message_Packet_raw --bootstrap_servers broker:9092 buffer deserialize preprocess --vocab_hash_file=./data/bert-base-uncased-hash.txt --do_lower_case=True monitor --description='Preprocessing rate' buffer inf-triton --force_convert_inputs=True --model_name=sid-minibert-trt --server_url=ai-engine:8001 monitor --description='Inference rate' --smoothing=0.001 --unit inf add-class filter serialize --exclude '^ts_' to-kafka --output_topic morpheus-output --bootstrap_servers broker:9092",platform.openshift=true morpheus-sid-pipeline morpheus-sdk-client

    Let's take a closer look at the arguments used in sdk.args. The pipeline runs NLP using the sid-minibert-trt model (--model_name). It reads input from the netq_message_Packet_raw Kafka topic (--input_topic) and calls the NVIDIA Triton Inference Server at ai-engine:8001 (--server_url). Finally, it writes output into the morpheus-output Kafka topic (--output_topic).
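
    Because the whole pipeline is packed into a single sdk.args string, it is worth seeing it with one stage per line. The content below is identical to the command above, only reformatted:

      morpheus --debug --log_level=DEBUG \
          run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
          pipeline-nlp --model_seq_length=256 \
          from-kafka --input_topic netq_message_Packet_raw --bootstrap_servers broker:9092 \
          buffer \
          deserialize \
          preprocess --vocab_hash_file=./data/bert-base-uncased-hash.txt --do_lower_case=True \
          monitor --description='Preprocessing rate' \
          buffer \
          inf-triton --force_convert_inputs=True --model_name=sid-minibert-trt --server_url=ai-engine:8001 \
          monitor --description='Inference rate' --smoothing=0.001 --unit inf \
          add-class \
          filter \
          serialize --exclude '^ts_' \
          to-kafka --output_topic morpheus-output --bootstrap_servers broker:9092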

    3. Wait until the pipeline pod has started:
      watch -n5 oc get pods -l=app.kubernetes.io/name=sdk-cli-morpheus-sid-pipeline
      NAME                                             READY   STATUS    RESTARTS   AGE
      sdk-cli-morpheus-sid-pipeline-69b49c654d-qrc9d   1/1     Running   0          9m20s
    4. Verify that there are no errors in the pipeline log:
      oc logs $(oc get pods -l=app.kubernetes.io/name=sdk-cli-morpheus-sid-pipeline -o jsonpath='{.items[0].metadata.name}')
      Configuring Pipeline via CLI
      ...
      Added stage: <to-kafka-11; WriteToKafkaStage(bootstrap_servers=broker:9092, output_topic=morpheus-output)>
        └─ List[str] -> List[str]
      ====Building Pipeline Complete!====
      ====Starting Pipeline====
      ...

    Setting up NVIDIA NetQ agent on NVIDIA BlueField-2 DPU

    Now we'll walk through the steps to set up the NVIDIA NetQ agent.

    Installing Red Hat Enterprise Linux on NVIDIA BlueField-2 DPU

    In our testing, we used a Dell EMC PowerEdge R740 server in the following configuration:

    • 40-core Intel Xeon Silver 4210 CPU at 2.20 GHz
    • 92 GB memory
    • NVIDIA BlueField-2 DPU (model MT42822)

    To install Red Hat Enterprise Linux on both the server and the DPU, the respective installation images must be obtained using a free Red Hat Developer subscription, as detailed in this guide.

    Note: The provisioning scripts have only been tested with Red Hat Enterprise Linux 8.4, so we recommend that you use this version.

    To install Red Hat Enterprise Linux on the Dell PowerEdge R740 server, follow this installation guide.

    To start installing Red Hat Enterprise Linux on the NVIDIA BlueField-2 card, run these commands:

    • git clone https://github.com/kwozyman/rhel-on-bf2
    • cd rhel-on-bf2
    • export RHEL_ISO=/path/to/redhat_iso_file
    • bash bluefield_provision.sh -a

    The preceding commands will, in order:

    • Prepare the host system and install dependencies.
    • Update BlueField-2 firmware.
    • Enable SR-IOV on the host.
    • Prepare and boot PXE.

    This README file contains important information on installing Red Hat Enterprise Linux on DPUs.

    After dependencies are installed, you will see this output for the firmware update:

    Complete!
    === STATUS === Performing firmware update
    --2021-05-20 10:22:18--  https://www.mellanox.com/downloads/BlueField/BlueField-3.5.1.11601/BlueField-3.5.1.11601_install.bfb
    Resolving www.mellanox.com (www.mellanox.com)... 151.101.2.133
    Connecting to www.mellanox.com (www.mellanox.com)|151.101.2.133|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://content.mellanox.com/BlueField/BlueField-3.5.1.11601/BlueField-3.5.1.11601_install.bfb [following]
    --2021-05-20 10:22:19--  https://content.mellanox.com/BlueField/BlueField-3.5.1.11601/BlueField-3.5.1.11601_install.bfb
    Resolving content.mellanox.com (content.mellanox.com)... 107.178.241.102
    Connecting to content.mellanox.com (content.mellanox.com)|107.178.241.102|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 528205736 (504M) [application/octet-stream]
    Saving to: 'BlueField-3.5.1.11601_install.bfb'
     
    BlueField-3.5.1.11601_install.bfb 100%[==========>] 503.74M 97.6MB/s in 6.9s
     
    2021-05-20 10:22:27 (72.9 MB/s) - 'BlueField-3.5.1.11601_install.bfb' saved [528205736/528205736]
     
    === STATUS === Sending firmware to BF2. Please wait.
    

    You might see the following message for more than 10 seconds; stand by:

    Welcome to minicom 2.7.1
    OPTIONS: I18n
    Compiled on Aug 13 2018, 16:41:28.
    Port /dev/rshim0/console, 10:22:16
     
    Press CTRL-A Z for help on special keys
    

    Yocto boot will commence:

    Welcome to minicom 2.7.1
    OPTIONS: I18n
    Compiled on Aug 13 2018, 16:41:28.
    Port /dev/rshim0/console, 10:22:16
     
    Press CTRL-A Z for help on special keys
     
    [5.228955] mlxbf_gige MLNXBF17:00 oob_net0: renamed from eth0
    [5.295481] virtio_net virtio1 tmfifo_net0: renamed from eth0
    [5.370656] mlx5_core 0000:03:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
    [5.388735] mlx5_core 0000:03:00.0: E-Switch: Total vports 18, per vport: max uc(1024) max mc(16384)
    [5.410144] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
    [5.424763] mlx5_core 0000:03:00.0: mlx5_pcie_event:286:(pid 7): PCIe slot power capability was not advertised.
    [5.431846] mlx5_core 0000:03:00.1: enabling device (0400 -> 0402)
    [5.457855] mlx5_core 0000:03:00.1: firmware version: 24.29.2002
    [5.469980] mlx5_core 0000:03:00.1: 252.048 Gb/s available PCIe bandwidth (16 GT/s x16 link)
    [5.736384] mlx5_core 0000:03:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
    [5.754472] mlx5_core 0000:03:00.1: E-Switch: Total vports 18, per vport: max uc(1024) max mc(16384)
    [5.776826] mlx5_core 0000:03:00.1: Port module event: module 1, Cable unplugged
    [5.791805] mlx5_core 0000:03:00.1: mlx5_pcie_event:2861(pid 266): PCIe slot power capability was not advertised.
    [5.799161] mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
    [6.084276] mlx5_core 0000:03:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
    [7.074338] random: dd: uninitialized urandom read (512 bytes read)
    [7.104375] random: rngd: uninitialized urandom read (16 bytes read)
    [7.117171] random: rngd: uninitialized urandom read (4 bytes read)
    [9.211564] urandom_read: 1 callbacks suppressed
    [9.211568] random: rngd: uninitialized urandom read (2500 bytes read) 
    [9.234163] random: crng init done
    

    The firmware upgrade will take place automatically, and the PXE server setup will follow:

    ************************************************************
    ***                                                      ***
    ***         Platform firmware updates complete.          ***
    ***                                                      ***
    ************************************************************
     
    root@localhost:~# /lib/firmware/mellanox/mlxfwmanager_sriov_dis_aarch64_41686
    Querying Mellanox devices firmware ...
     
    Device #1:
    Device Type: BlueField2
    Part Number: MBF2H516A-CEEO_Ax_Bx
    Description: BlueField-2 DPU 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto; 16GB on-board DDR; 1GbE OOB management; FHHL
    PSID: MT 0000000702
    PCI Device Name: 0000:03:00.0
    Base GUID: b8cef603003dd0le
    Base MAC: b8cef63dd0le
    Versions: Current        Available
           FW 24.29.2002     24.29.2002
         NVMe 20.1.0003      20.1.0003
          PXE 3.6.0204       3.6.0204
         UEFI 14.22.0016     14.22.0016
     
    Status: Up to date
     
     
    === STATUS === Checking usability of SRIOV for PCI 0000:af:00.0
    SRIOV enabled
    EMBEDDED_CPU mode enabled
    === STATUS === Checking usability of SRIOV for PCI 0000:af:00.1
    SRIOV enabled
    EMBEDDED_CPU mode enabled
    === STATUS === Setting up PXE environment
    --2021-05-20 10:25:17--  http://download.eng.bos.redhat.com/released/rhel-8/RHEL-8/8.4.0-Beta-1/BaseOS/aarch64/iso/RHEL-8.4.0-20210309.1-aarch64-dvd1.iso
    Resolving download.eng.bos.redhat.com (download.eng.bos.redhat.com)... 10.19.43.4
    Connecting to download.eng.bos.redhat.com (download.eng.bos.redhat.com)|10.19.43.4|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 7368329216 (6.9G) [application/octet-stream]
    Saving to: '/tmp/RHEL-8.4.0-20210309.1-aarch64-dvd1.iso'
     
    .1-aarch64-dvd1.iso      40%[======            ]2.77G 77.8MB/s eta 61s

    Then the BlueField-2 DPU will PXE boot from the network interface:

    Done
    Next step: PXE boot from target (make sure to select the correct port!)
    === INFO === The BF2 is about to be rebooted and minicom console
    === INFO === started. You must manually select the PXE boot device.
    === INFO === This can't be fully automated because the list of
    === INFO === options is not consistent.
    === INFO ===
    === INFO === ACTION: When you see the "Boot Option Menu" select the
    === INFO === option with following device path (and press enter):
    === INFO === MAC(001ACAFFFF01,0x1)/
    === INFO === IPv4(0.0.0.0)
    === INFO === In most cases, this is "EFI NETWORK 4". After that the
    === INFO === automation picks up again. Let it take over. The console
    === INFO === and reboot are slow. Have patience.
    === INFO ===
    === INFO === Press enter when you're ready.
    

    Note: The device name can vary, but it follows the naming convention EFI NETWORK X (e.g., EFI NETWORK 4).

    Press Enter and wait until you see the screen shown in Figure 2.

    Screenshot of the NVIDIA BlueField-2 Boot Manager menu.
    Figure 2: NVIDIA BlueField-2 Boot Manager menu.

    You should shortly see a successful PXE boot:

    Start PXE over IPv4.
      Station IP address is 172.31.100.10
     
      Server IP address is 172.31.100.1
      NBP filename is /BOOTAA64.EFI
      NBP filesize is 857984 Bytes
     Downloading NBP file...
     
      NBP file downloaded successfully.
    Fetching Netboot Image
    
    === INFO === The RHEL install has been started. This is the end of the automation.
    === INFO === I will reattach the minicom console to see the install progress.
    === INFO === You can drop it anytime with key sequence: ctrl-a X
    === INFO ===
    === INFO === Press enter when you're ready.
    

    You can exit the script at this point or watch the installation take place by pressing Enter.

    There are two ways to get the DPU IP address:

    1. Log in via console from the host to the DPU:
      minicom --color on --baudrate 115200 --device /dev/rshim0/console

      Log in as the root user with the password defined in the installation kickstart.

      You can now use standard Linux tools, for example:

      ip a

      To exit the console, you can press Ctrl-A and then Q.

    2. Examine DHCPD lease files on the host:

      cat /var/lib/dhcpd/dhcpd.leases

      Be sure to look for the lease that is in binding state "active" and has the hardware address of 00:1a:ca:ff:ff:01.
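
    If you prefer not to scan the lease file by eye, here is a hedged awk one-liner that prints the IP of every lease block matching the DPU's hardware address; the last line printed usually corresponds to the active lease:

      awk '/^lease/ {ip=$2} /hardware ethernet 00:1a:ca:ff:ff:01/ {print ip}' /var/lib/dhcpd/dhcpd.leases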

    Configuring NVIDIA NetQ agent for packet mirroring

    Next, we will install an NVIDIA NetQ agent on the DPU, configure it to capture packets on a network interface, and mirror them to the Morpheus endpoint gateway.

    Open a terminal session to the DPU, either via SSH or minicom console.

    1. Install the ARM64 version of the NGC CLI as described in the documentation.
    2. Configure the CLI by running:
      ngc config set
    3. Download an RPM of the NVIDIA NetQ Agent for ARM64:
      ngc registry resource download-version "ea-nvidia-morpheus/morpheus_telemetry:netq4-rhel8"
    4. Install the RPM:
      rpm -i morpheus_telemetry_vnetq4-rhel8/netq-agent-4.0.0-rh8u34~1626785319.908c36d5.aarch64.rpm
    5. Create /etc/netq/pcap.yml using the name of your DPU's network interface (e.g., eth0); you can test the filter expression with tcpdump, as shown after this list:
      netq-pcap-config:
        interface: <network-interface> 
        filter: "tcp port 80"
    6. Find out the DNS name of the endpoint gateway:
      oc get svc netq-endpoint-gateway
      NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP              PORT(S)           AGE
      netq-endpoint-gateway   LoadBalancer   172.30.254.220   a0...elb.amazonaws.com   31980:31357/TCP   128m
    7. Modify the /etc/netq/netq.yml configuration file to point to the endpoint gateway. For instance:
      netq-agent:
        port: 31980
        server: a0...elb.amazonaws.com
        vrf: default
        is-pcap-enabled: True
        suppress-sub-agents: all
    8. Start the agent:
      sudo systemctl start netq-agent
    9. Verify that the agent is running and can connect to the endpoint gateway:
      sudo systemctl status -l netq-agent
      Aug 26 05:54:10 agent.redhat.com netq-go-agent[160240]: INFO: Initializing Client Connection to gateway.example.com:31980
      Aug 26 05:54:10 agent.redhat.com netq-agent[160211]: 2021/08/26 05:54:10.550318 agent.redhat.com netq-go-agent[160240]: INFO: Reading Live Capture Started
      Aug 26 05:54:10 agent.redhat.com netq-go-agent[160240]: INFO: Reading Live Capture Started
      Aug 26 05:54:10 agent.redhat.com netq-agent[160211]: 2021/08/26 05:54:10.722425 agent.redhat.com netq-go-agent[160240]: INFO: Connection Successful to OPTA
      Aug 26 05:54:10 agent.redhat.com netq-go-agent[160240]: INFO: Connection Successful to OPTA
      Aug 26 05:54:10 agent.redhat.com netq-agent[160211]: Successfully Connected to gateway.example.com:319802021/08/26 05:54:10.722519 agent.redhat.com netq-go-agent[160240]: INFO: starting stream to gateway.example.com:31980
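
    As mentioned in step 5, the filter field in /etc/netq/pcap.yml accepts standard BPF capture-filter syntax. If you want to confirm that an expression matches the traffic you expect before handing it to the agent, you can test it with tcpdump on the DPU (eth0 here is an assumed interface name):

      sudo tcpdump -i eth0 -c 5 'tcp port 80'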

    Simulating an application service

    Let's simulate an application that runs on the DPU and exchanges sensitive data in JSON payloads over HTTP. For this purpose, we will use Python's http.server module to echo the headers and body of incoming HTTP requests to stdout.

    1. Open a new terminal session to the DPU because the server will run in the foreground.
    2. Change to your home directory: cd $HOME
    3. Create a file named echo-server.py with the following content:
      #!/usr/bin/env python3
      
      from http.server import BaseHTTPRequestHandler, HTTPServer
      import logging
      
      class PostHandler(BaseHTTPRequestHandler):
      
          def do_POST(self):
              # Acknowledge the request, then log its headers and body to stdout
              self.send_response(200)
              self.send_header('Content-Type', 'application/json')
              self.end_headers()
              logging.info(f'-- Headers --:\n{self.headers}')
              content_len = int(self.headers.get('Content-Length'))
              logging.info(f'-- Body --:\n{self.rfile.read(content_len)}')
              return
      
      logging.basicConfig(level=logging.INFO)
      # Bind to all interfaces on port 80 (hence running the script with sudo)
      server = HTTPServer(('', 80), PostHandler)
      server.serve_forever()
    4. Make the file executable:
      sudo chmod u+x echo-server.py
    5. Open a new terminal window (the server runs in the foreground).
    6. Execute:
      sudo ./echo-server.py

    Detecting sensitive information through ML inference

    Finally, to test sensitive information detection, we will send a request that contains sensitive data using a cURL command (you can send the request from the host to the DPU if the DPU does not have an external IP address):

    curl -X POST http://agent.redhat.com -H 'Content-Type: application/json' --data-binary '{"user": "Teresa Bruce"}'

    See inference results on the output topic (the messages have been formatted for readability):

    {"timestamp": 1633448662, "host_ip": "agent.redhat.com", "data_len": 183, "data": "POST / HTTP/1.1Host: agent.redhat.comUser-Agent: curl/7.69.1Accept: */*Content-Type: application/jsonContent-Length: 24{\"user\": \"Teresa Bruce\"}", "src_mac": "14:58:d0:58:95:92", "dest_mac": "3c:94:d5:4f:d9:c1", "protocol": "TCP", "src_ip": "10.8.2.97", "dest_ip": "10.35.206.159", "src_port": 80, "dest_port": 39932, "flags": 32784, "si_address": false, "si_bank_acct": false, "si_credit_card": false, "si_email": false, "si_govt_id": false, "si_name": true, "si_password": false, "si_phone_num": false, "si_secret_keys": false, "si_user": false}

    As you can see in the preceding output, Teresa Bruce has been correctly identified as a person's name ("si_name": true). While this is a relatively simple example, the model is continuously refined to improve the accuracy of detection.
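
    If jq is installed on the machine where you run oc, you can trim the consumer output down to the fields you care about. A sketch selecting the host and a few of the si_* flags:

      oc exec $KAFKA_POD -- kafka-console-consumer --topic morpheus-output \
          --bootstrap-server broker:9092 --from-beginning | jq '{host_ip, si_name, si_email, si_phone_num}'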

    Try sending other types of sensitive data to the server to see SID in action. For example:

    curl -X POST http://agent.redhat.com -H 'Content-Type: application/json' --data-binary '{"email": "tbruce@acme.com"}'
    curl -X POST http://agent.redhat.com -H 'Content-Type: application/json' --data-binary '{"phone": "+1-609-361-2775x8385"}'
    curl -X POST http://agent.redhat.com -H 'Content-Type: application/json' --data-binary '{"home": "9951 Miller Street Apt. 113, Laurenshire, IA 67656"}'

    And more than one sensitive item:

    curl -X POST http://agent.redhat.com -H 'Content-Type: application/json' --data-binary '{"user": "Teresa Bruce", "email": "tbruce@acme.com", "home": "9951 Miller Street Apt. 113, Laurenshire, IA 67656", "phone": "+1-609-361-2775x8385"}'

    Deployment cleanup

    Follow the instructions at Uninstalling a cluster on AWS to delete the cluster and all associated AWS resources.

    How to avoid common issues

    During a complex installation and configuration process involving this many software and infrastructure components, there is a good chance that some things will not work right away. We ran into several errors in our own deployment and list them here, along with their likely causes, to help you get up and running sooner.

    • The Triton server might complain that the model requires a GPU while the node does not have one. This can happen when the NVIDIA GPU Operator has not fully started. Wait for the operator to complete initialization, then delete the ai-engine pod and let OpenShift recreate it (see the sketch after this list).
      Poll failed for model directory 'sid-minibert-trt': instance group sid-minibert-trt_0 of model sid-minibert-trt has kind KIND_GPU but no GPUs are available
    • You might see a message that the pipeline cannot connect to Kafka. This can happen because the Kafka service is not up yet, the topics were not created, or their names were misspelled.
    • If there are no messages on the Kafka input topic, that is likely because the topic name was misspelled.
    • If the pipeline complains about the model not being available, the model might not have been loaded yet, or failed to load.
      Traceback (most recent call last):
        File "/opt/conda/envs/morpheus/lib/python3.8/threading.py", line 932, in _bootstrap_inner
          self.run()
        File "/opt/conda/envs/morpheus/lib/python3.8/threading.py", line 870, in run
          self._target(*self._args, **self._kwargs)
        File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/inference/inference_triton.py", line 615, in main_loop
          self.init(loop)
        File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/inference/inference_triton.py", line 459, in init
          assert self._triton_client.is_model_ready(self._model_name), \
      AssertionError: Triton model sid-minibert-trt is not ready
    • Another common issue is when the NVIDIA NetQ agent cannot connect to the endpoint gateway, or the agent tries to initialize the client connection but never reaches Connection Successful to OPTA. This can happen because the gateway specified in /etc/netq/netq.yml is unreachable (e.g., no route to the gateway's host, the gateway is down, or the port is incorrect or blocked by a firewall).
      Aug 26 06:00:37 agent.redhat.com netq-go-agent[160446]: INFO: Initializing Client Connection to gateway.example.com:31980
      Aug 26 06:00:37 agent.redhat.com netq-agent[160415]: 2021/08/26 06:00:37.670319 agent.redhat.com netq-go-agent[160446]: INFO: Reading Live Capture Started
      Aug 26 06:00:37 agent.redhat.com netq-go-agent[160446]: INFO: Reading Live Capture Started
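
    For the first issue in the list above, recreating the ai-engine pod is a one-liner, using the same label selector as earlier in this article:

      oc delete pod -n morpheus-ai -l app.kubernetes.io/name=ai-engine

    The deployment controller creates a replacement pod, which should now find the GPU.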

    Conclusion

    Congratulations! If you are reading this, you have likely set up, configured, and tested the NVIDIA Morpheus AI application framework and NVIDIA BlueField-2 data processing units (DPUs) with Red Hat OpenShift and Red Hat Enterprise Linux.

    The model supplied with the Morpheus SDK for testing purposes is just one example of how to build an application for a particular use case using this powerful AI-enabled software. Additional models are available from NVIDIA, and Morpheus is designed to make it easy for customers to modify GPU-powered data science models or create new ones.

    Combining the advanced Kubernetes capabilities of OpenShift, the trusted capabilities of Red Hat Enterprise Linux, and the data sampling power of the BlueField DPU helps you develop and deploy custom cybersecurity applications tailored to your particular organization's needs.

    We hope that by going through this set of instructions, you now feel more confident about getting further involved in real-world projects. If you have encountered any issues while working with this configuration guide, please leave a comment on this article.

    Last updated: July 25, 2023
