How to install Kubeflow 1.2 on Red Hat OpenShift

As artificial intelligence (AI) adoption increases across industries, particularly through machine learning (ML), the job of integrating the often disparate tools, libraries, packages, and dependencies also grows in complexity. This makes development and operations (DevOps) for ML a daunting task, one that both organizations and open source communities are actively working to address. To quote the authors of Hidden Technical Debt in Machine Learning Systems, "developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive."

If you are in the throes of tackling DevOps for AI/ML (MLOps), two open source projects worth your attention are the upstream Kubeflow and the downstream Open Data Hub (ODH). The goal of these projects is to provide machine learning toolkits that handle the complex parts of orchestration that traditional software DevOps does not.

Note: For more about MLOps, see Dotscience on OpenShift: Enabling DevOps for MLOps.

As the name indicates, Kubeflow is based on Kubernetes. In this article, we'll show it running on Red Hat OpenShift Container Platform with an Istio service mesh.

Objective

Use this article as a startup procedure to install a default Kubeflow toolkit on an OpenShift Container Platform instance to explore the tools and capabilities. Figure 1 shows the Kubeflow dashboard running on OpenShift Container Platform, providing access to a suite of machine learning tools that span the system life cycle.

Figure 1: The Kubeflow central dashboard.

Note: The latest release of Kubeflow at the time of this writing incorporates changes to the file structure for distribution-specific platforms, such as OpenShift. If you are interested in the details, you can read the source pull request that explains the reason for the change.

Overview of major steps

The following list summarizes the steps needed to get Kubeflow running on OpenShift Container Platform:

  1. Install the Open Data Hub Operator.
  2. Create the Kubeflow project.
  3. Install Kubeflow.
  4. Monitor the installation.
  5. Access the Kubeflow user interface (UI).

Requirements

To use Kubeflow as shown in this article, please note the following requirements:

  • You must have an OpenShift Container Platform cluster 4.2+ installed with cluster admin privileges.
  • You should not have an existing Istio service mesh, because it will lead to name collisions.
  • You should not have an existing project named istio-system, because Kubeflow deploys Istio along with its configuration.
  • You must not have leftover mutating or validating webhooks from prior tests. (The commands after this list can help you verify these preconditions.)
  • You must not deploy Kubeflow in a project or namespace other than kubeflow.
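Before moving on, a few oc queries can confirm these preconditions. This is a minimal sketch; on a clean cluster the first command should report that the project is not found, and the webhook lists should contain no entries left over from prior Kubeflow tests:

$ oc get project istio-system               # expect a "not found" error on a clean cluster
$ oc get mutatingwebhookconfigurations      # check for leftovers from prior tests
$ oc get validatingwebhookconfigurations    # check for leftovers from prior tests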

Running on an OpenShift cluster

Here are some options for getting access to an OpenShift cluster so you can run through the procedure in this article. Getting a cluster running is beyond the scope of this article, but the resources in this section offer a starting point.

On a cluster on your local machine (recommended)

Red Hat CodeReady Containers is designed to run on a local computer to simplify setup and testing. The product emulates the cloud development environment with all of the tools needed to develop container-based applications.

On a 60-minute temporary cluster (only for learning)

Katacoda offers an OpenShift cluster as a playground that can be used to perform this installation, as long as you complete the task in an hour or less. It can be done.

More options

See the OpenShift trial page for other options.

Installing the Open Data Hub Operator

Kubeflow should be installed on OpenShift using the Open Data Hub Operator from the OpenShift Operators catalog. The upstream Kubeflow Operator from OperatorHub.io will not run successfully on OpenShift because it is intended for a general-purpose Kubernetes cluster.

As an administrator from the OpenShift web console, do the following:

  1. Go to Operators.
  2. Go to OperatorHub.
  3. Search for "Open Data Hub."
  4. Click the Open Data Hub Operator button.
  5. Click the Continue button.
  6. Click the Install button.
  7. Accept the default installation strategy, which uses the following settings:
    • Update Channel: beta
    • Installation mode: All namespaces on the cluster (default)
    • Installed Namespace: openshift-operators
    • Approval strategy: Automatic
  8. Click the Install button.
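If you prefer the command line, these steps are equivalent to creating an Operator Lifecycle Manager Subscription. The following is a sketch, not a verified manifest: the package name (opendatahub-operator) and catalog source (community-operators) are assumptions you should confirm against your cluster's OperatorHub catalog:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator          # assumed package name; verify in your catalog
  namespace: openshift-operators      # the Installed Namespace above
spec:
  channel: beta                       # the Update Channel above
  installPlanApproval: Automatic      # the Approval strategy above
  name: opendatahub-operator          # assumed package name
  source: community-operators         # assumed catalog source
  sourceNamespace: openshift-marketplace

Save the snippet to a file (for example, odh-subscription.yaml) and apply it with oc apply -f odh-subscription.yaml.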

Figure 2 illustrates the Open Data Hub Operator selection from the OpenShift OperatorHub.

Figure 2: Open Data Hub Operator installation.

Creating the Kubeflow project

Kubeflow must be installed in a namespace called kubeflow. A request for an alternative namespace is an open issue at the time of this writing.

As an administrator from the OpenShift web console, do the following:

  1. Go to Home.
  2. Go to Projects.
  3. Click the Create Project button.
  4. Set the following values (a command-line equivalent appears after this list):
    • Name: kubeflow (cannot be altered)
    • Display Name: kubeflow (unlike the previous name, you can choose another value here)
    • Description: Kubeflow ML toolkit (you can choose another value)
  5. Change to the kubeflow project.
  6. Go to Operators → Installed Operators.
  7. Wait for the Operator to display "Succeeded" in the Status field.
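As a command-line alternative to steps 1 through 4, a single oc command creates the project with the same values:

$ oc new-project kubeflow \
    --display-name="kubeflow" \
    --description="Kubeflow ML toolkit"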

Figure 3 displays the expected result when the operator is completely installed.

Figure 3: A successful installation of the Open Data Hub Operator.

Installing Kubeflow

By default, the Open Data Hub Operator includes a manifest that lets you try out different components for MLOps. Because the toolset used in this article is different from the one in the default manifest, you should paste in a different manifest.

As an administrator from the OpenShift web console, do the following:

  1. Click the Open Data Hub Operator button.
  2. Click the Open Data Hub link under Provided APIs.
  3. Click the Create KfDef button.
  4. Click the YAML View radio button.
  5. Delete all the YAML code.
  6. Copy and paste in all the YAML code from kfctl_openshift.v1.2.0.yaml. (A trimmed sketch of the file's structure appears after Figure 5.)

    Note: For reference, the HTML version can be found in the Kubeflow GitHub manifests repository.

  7. Click the Create button.

Figure 4 shows the Provided APIs selection.

Figure 4: Open Data Hub Provided APIs.

Figure 5 shows the YAML code you will replace.

Figure 5: Open Data Hub Provided API KfDef YAML View.
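For orientation, a KfDef resource follows the general shape sketched below. This skeleton is illustrative only: the application entry, paths, and repo URI are hypothetical placeholders, and the real content must come from kfctl_openshift.v1.2.0.yaml:

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kubeflow                 # instance name
  namespace: kubeflow            # must be the kubeflow project
spec:
  applications:
    # Each entry points at a kustomize directory in a manifests repo.
    # This single entry is a placeholder; the real file enumerates
    # every Kubeflow component to deploy.
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: example/component-path     # hypothetical path
      name: example-component              # hypothetical component name
  repos:
    - name: manifests
      uri: https://github.com/kubeflow/manifests/archive/v1.2.0.tar.gz   # assumed archive URI
  version: v1.2.0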

Monitoring the installation

In the background, the Open Data Hub Operator performs the commands a system administrator would execute on the command line to install Kubeflow, such as kfctl build -f... and kfctl apply -f... The web console doesn't show when the installation is complete, so this section shows a few ways to monitor the installation. If all the pods are running without errors, the installation is complete.
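From the command line, the simplest check is to list the pods in the projects the installation creates:

$ oc get pods -n kubeflow        # the main Kubeflow components
$ oc get pods -n istio-system    # the Istio service mesh
$ oc get pods -n cert-manager    # certificate management

All pods should eventually reach the Running or Completed state.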

Monitoring from the administrator perspective

Streaming events are a great way to get a sense of what major activity is occurring after an action such as a deployment. To view the events:

  1. Go to Home.
  2. Select the project: either kubeflow to see just the events for Kubeflow, or "All projects" to see the multiple projects being updated during installation.
  3. Go to Events to monitor the deployment events stream. (A command-line equivalent follows this list.)
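The same stream is available from the command line; sorting by timestamp keeps the newest events at the bottom:

$ oc get events -n kubeflow --sort-by='.lastTimestamp'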

Figure 6 shows the events streaming during an installation in the kubeflow project.

Figure 6: Event stream during installation.

Workload status and alerts are a quick way to gauge installation progress. To view the workloads:

  1. Under Home, click Projects.
  2. Click the kubeflow project link.
  3. Click the Workloads menu item in the body of the screen to review pods.
  4. Investigate workloads that don't recover on their own (give them time to self-correct first).

Figure 7 shows workloads from the project overview page. Workloads in the project are also viewable from the vertical menu.

Figure 7: Overview of the Kubeflow project workloads.

A project called cert-manager gets created during installation. Its events and pods provide good insight into the installation's progress. To view these events or pods:

  1. Select Project: cert-manager.
  2. Under Home, click Events to review events. Under Workloads, click Pods to review pods.

Figure 8 shows the pods for cert-manager.

Figure 8: Status of the pods in the cert-manager project.

Another important project, istio-system, is created during installation. This project hosts the Istio service mesh that handles all the networking between the services. To view the project:

  1. Select Project: istio-system.
  2. Under Home, click Events to review events. Under Workloads, click Pods to review pods. Under Networking, click Routes to access the URL to the Kubeflow central dashboard.

Figure 9 shows the routes in the project.

Figure 9: The istio-system route for the istio-ingressgateway.

Monitoring from the developer perspective

In addition to the administrator perspective, the web console offers a developer perspective that hides infrastructure details to leave an uncluttered developer experience. To see this perspective:

  1. Go to the developer perspective.
  2. Select Project: kubeflow.
  3. Go to Topology.

Figure 10 shows the results.

Figure 10: Kubeflow project topology in the developer perspective.

If there are no errors across the projects and the Kubeflow UI launches, the installation has succeeded.

Accessing the Kubeflow UI

This section offers two ways to access the Kubeflow central dashboard from the web console. For reference, a command-line query would look like:

$ oc get routes -n istio-system istio-ingressgateway -o jsonpath='http://{.spec.host}/'

Going to the dashboard from the administrator perspective

From the administrator perspective, do the following:

  1. Select Project: istio-system.
  2. Go to Networking.
  3. Go to Routes.
  4. Click the location URL http://istio-ingressgateway....

Figure 11 shows how to find the location URL.

Figure 11: Route to the Kubeflow dashboard in the istio-system project.

Going to the dashboard from the developer perspective

From the developer perspective, do the following:

  1. Select Project: istio-system.
  2. Go to Topology.
  3. Search for "istio-ingressgateway."
  4. Click the Open URL arrow icon, or click the istio-ingressgateway pod and then the URL under Resources → Routes.

Figure 12 shows the location of the URL.

Figure 12: Developer perspective route to Kubeflow from the istio-system project.

Viewing the Kubeflow central dashboard

Once you complete the registration process and create a namespace, you will see a dashboard like the one in Figure 13.

Figure 13: The Kubeflow central dashboard.

Uninstalling Kubeflow

No proper installation procedure is truly complete without an uninstallation procedure.

As an administrator from the OpenShift web console, do the following:

  1. Select Project: kubeflow.
  2. Click the Open Data Hub Operator.
  3. Click the Open Data Hub link under Provided APIs.
  4. Click the Kebab button (the one with three vertical dots) for your kubeflow instance.
  5. Click Delete KfDef to begin the delete process for your kubeflow instance.
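The deletion can also be performed with oc. A minimal sketch, assuming the KfDef instance lives in the kubeflow project as installed above:

$ oc get kfdef -n kubeflow                     # find the name of your instance
$ oc delete kfdef <instance-name> -n kubeflow

Deleting the KfDef triggers the Operator's cleanup; give its finalizer time to remove the resources it created.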

Summary

The procedure in this article illustrates a best practice you can follow to install Kubeflow on Red Hat OpenShift using the Open Data Hub Operator. The manifest file used provides an example toolkit from the Kubeflow project that you can fork, modify, and update to fit your production MLOps needs. Furthermore, the Operator framework simplifies installation, operations, and maintenance as the community continues to publish enhancements to both the Operator and the machine learning tooling, complementing the overall benefits of AI/ML on Red Hat OpenShift.

Last updated: November 8, 2023