From notebooks to pipelines: Using Open Data Hub and Kubeflow on OpenShift

July 29, 2020
Juana Nakfour
Related topics:
Artificial intelligence, CI/CD, DevOps
Related products:
Red Hat OpenShift

    Data scientists often use notebooks to explore data and to create and experiment with models. At the end of this exploratory phase comes the product-delivery phase, which is getting the final model into production. Serving a model in production is not a one-time final step, however; it is a continuous cycle of training, development, and data monitoring that is best captured and automated with pipelines. This brings us to a dilemma: How do you move code from notebooks to containers orchestrated in a pipeline, and schedule the pipeline to run in response to specific triggers such as time of day, the arrival of new batch data, or monitoring metrics?

    Today there are multiple tools and proposed methods for moving code from notebooks to pipelines. For example, the nachlass tool uses the source-to-image (S2I) method to convert notebooks into containers. In this article and in my original DevNation presentation, we explore a new Kubeflow-proposed tool for converting notebooks to Kubeflow pipelines: Kale. Kale is a Kubeflow extension that integrates with JupyterLab's user interface (UI). It offers data scientists a UI-driven way to convert notebooks to Kubeflow pipelines and run those pipelines in an experiment.

    We run these tools as part of the Open Data Hub installation on Red Hat OpenShift. Open Data Hub is composed of multiple open source tools packaged by the Open Data Hub (ODH) Operator. When you install ODH, you can specify which tools you want to install, such as Airflow, Argo, Seldon, JupyterHub, and Spark.

    Prerequisites

    To run this demo, you will need access to an OpenShift 4.x cluster with cluster-admin rights to install a cluster-wide Open Data Hub Operator.
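
    If you are unsure whether your account has those rights, a quick check from the command line can confirm cluster-admin-level access before you start (the API URL below is a placeholder for your cluster's):

```bash
# Log in to the cluster (the URL is a placeholder)
oc login https://api.example-cluster.example.com:6443

# A cluster-admin can do anything in any namespace; this prints "yes" if so
oc auth can-i '*' '*' --all-namespaces
```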

    Install Kubeflow on OpenShift

    We can use the Open Data Hub Operator to install Kubeflow on OpenShift. From the OpenShift web console, go to OperatorHub and search for Open Data Hub, as shown in Figure 1.

    A screenshot of the OperatorHub with the Open Data Hub Operator selected and installed.
    Figure 1: Installing Open Data Hub from the OpenShift OperatorHub.

    Click Install and move to the next screen. Currently, Open Data Hub offers two channels for installation: beta and legacy. The beta channel is for the new Open Data Hub releases that include Kubeflow. Select the beta channel, keep the other default settings, and click Subscribe, as shown in Figure 2.

    A screenshot of the subscription page with the option to choose the beta update channel.
    Figure 2: Create a subscription to the Open Data Hub Operator using the beta update channel.

    After you subscribe, the Open Data Hub Operator will be installed in the openshift-operators namespace, where it is available cluster-wide.

    Next, create a new namespace called kubeflow. In that namespace, go to Installed Operators, click the Open Data Hub Operator, and create a new instance of the kfdef resource. The default is an example kfdef instance (a YAML file) that installs Open Data Hub components such as Prometheus, Grafana, JupyterHub, Argo, and Seldon. To install Kubeflow, you need to replace the example kfdef instance with the one from Kubeflow, then click Create. You will see the file shown in Figure 3.
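
    For orientation, a kfdef instance is essentially a list of applications that the operator should deploy. The sketch below only shows the general shape of the resource; the component name, path, and repository URI are illustrative, so use the actual Kubeflow kfdef file referenced above rather than this snippet:

```yaml
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: kubeflow            # illustrative name
  namespace: kubeflow
spec:
  applications:
    - name: example-component              # illustrative component entry
      kustomizeConfig:
        repoRef:
          name: manifests
          path: example/path/to/component  # illustrative path
  repos:
    - name: manifests
      uri: https://github.com/kubeflow/manifests/archive/master.tar.gz  # illustrative URI
```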

    A screenshot of the new kfdef resource file for Kubeflow.
    Figure 3: Create the kfdef resource for Kubeflow.

    That is all it takes to install Kubeflow on OpenShift. Watch the pods come up in the kubeflow namespace and wait until all of them are running before starting the next steps.
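
    One way to watch the rollout from the command line, assuming the kubeflow namespace created above:

```bash
# Watch the Kubeflow pods until they all reach the Running state
oc get pods -n kubeflow -w
```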

    Notebooks to pipeline

    After the installation has completed successfully, the next step is to create a notebook server in your development namespace, then create a notebook that includes tasks for creating and validating models. To get to the Kubeflow portal, head over to the istio-system namespace and click the istio-ingressgateway route. This route brings you to the main Kubeflow portal, where you must create a new profile and a working namespace. From the menu on the left side of the portal, go to Notebook Server and click New Server. A form opens where you can create a notebook server to host your notebooks. Be sure that the namespace you just created is selected in the drop-down menu.

    In this form, you must specify a custom image that includes the Kale component: gcr.io/arrikto-public/tensorflow-1.14.0-notebook-cpu:kubecon-workshop.

    Add a new data volume, as shown in Figure 4, then hit Launch.

    A screenshot of the setup page for the custom image.
    Figure 4: Launching a new notebook server.

    Once it is ready, connect to the notebook server you just created. It takes you to the main JupyterLab portal, which includes Kubeflow's Kale extension.

    An example notebook

    We will use a very simple notebook based on this example. The notebook predicts whether a house's value is above or below the average house value. For this demonstration, we simplified the notebook and prepared it for a pipeline. You can download the converted notebook from GitHub.

    Tasks in this notebook include downloading the housing data, preparing the data, and creating a three-layer neural network that can predict the value of a given house. At this point, the example notebook looks like a normal notebook, and you can run the cells to make sure they work.
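
    To give a feel for what the notebook cells contain, here is a minimal sketch of that kind of three-layer model, assuming tf.keras from the TensorFlow 1.14 image used above; the data source, column names, and layer sizes are illustrative and differ from the actual notebook on GitHub:

```python
import pandas as pd
from tensorflow import keras

# Download and prepare the housing data (URL and column names are placeholders)
df = pd.read_csv("https://example.com/housing.csv")
y = (df["price"] > df["price"].mean()).astype(int)  # 1 = above average, 0 = below
X = df.drop(columns=["price"]).values

# Three-layer neural network that classifies a house as above or below the average value
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, validation_split=0.2)
```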

    To enable Kale in the notebook, click on the Kubeflow icon on the left-side menu bar and click Enable. You should see something similar to the screenshot in Figure 5.

    A screenshot of the Kale deployment panel and details.
    Figure 5: Click the Kubeflow icon on the left-side menu bar and enable Kale.

    You can specify the role of each cell by clicking the Edit button in the top-right corner of the cell. As shown in Figure 5, we have an imports section and a prepdata pipeline step, as well as a trainmodel pipeline step (not shown) that depends on the prepdata step. Name the experiment and the pipeline, then click Compile and Upload.
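
    Under the hood, Kale records these choices as tags in each cell's notebook metadata. As a rough sketch, the trainmodel cell in this example would carry tags along these lines (following Kale's block:/prev: tag convention):

```json
{
  "tags": [
    "block:trainmodel",
    "prev:prepdata"
  ]
}
```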

    For now, we will just create the pipeline and defer running it until later. When you get an Okay message, head over to the main Kubeflow portal and select Pipelines. The first pipeline listed is the Kale-generated pipeline. If you click on it, you should see the pipeline details shown in Figure 6.

    A diagram of the Kale-generated pipeline, showing the steps in the pipeline's flow.
    Figure 6: Exploring the Kale-generated pipeline.

    Adjusting the pipeline

    You can explore the code and see the different steps in the pipeline. This generated pipeline assumes that the underlying Argo installation uses the Docker container runtime (the docker executor). As a result, this pipeline will not run on OpenShift, which uses the CRI-O container engine and the k8sapi executor for Argo.
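
    For context, Argo's executor is configured in the workflow controller's ConfigMap. A trimmed sketch of what selecting the k8sapi executor looks like, assuming the ConfigMap name and namespace used by a typical Argo deployment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # typical Argo ConfigMap name
  namespace: kubeflow                   # assumes Argo runs in the kubeflow namespace
data:
  config: |
    containerRuntimeExecutor: k8sapi    # instead of the default docker executor
```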

    Also, note that the container image used for each step requires root permissions, so we had to give root privileges to the service account running the workflow (oc adm policy add-role-to-user admin system:serviceaccount:namespace:default-editor). Obviously, this way of running containers on OpenShift is not advised. In the future, we hope to change the container so that it does not require root privileges.

    You can download the adjusted pipeline and a volume YAML resource from GitHub. Create the volume before uploading and running the adjusted pipeline, which is shown in Figure 7.
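
    The volume resource is a plain PersistentVolumeClaim. A minimal sketch, with placeholder name, namespace, and size (the actual YAML is in the GitHub repository linked above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kale-pipeline-volume   # placeholder; use the name from the downloaded YAML
  namespace: kubeflow          # the namespace the pipeline runs in
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi             # placeholder size
```

    Apply it with oc apply -f volume.yaml (or create it from the OpenShift console) before uploading the adjusted pipeline.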

    Note: This adjustment does not change the containers themselves; only the pipeline structure, permissions, and added volumes change.

    A screenshot of the Kale-generated pipeline with the adjustments for OpenShift.
    Figure 7: The adjusted pipeline.

    Conclusion

    In this article, you learned how to install Kubeflow on OpenShift using the Open Data Hub Operator, and we explored using Kubeflow's Kale extension to convert notebooks to pipelines. Moving code from notebooks to pipelines is a critical step in the artificial intelligence and machine learning (AI/ML) end-to-end workflow, and multiple technologies are addressing it. While these conversion tools are still young and under active development, we see great potential and plenty of room for improvement. Please join our Open Data Hub community and contribute to developing AI/ML end-to-end technologies on OpenShift.

    Last updated: February 5, 2024

    Recent Posts

    • How to run a fraud detection AI model on RHEL CVMs

    • How we use software provenance at Red Hat

    • Alternatives to creating bootc images from scratch

    • How to update OpenStack Services on OpenShift

    • How to integrate vLLM inference into your macOS and iOS apps

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue