
From notebooks to pipelines: Using Open Data Hub and Kubeflow on OpenShift

July 29, 2020
Juana Nakfour
Related topics:
Artificial intelligence, CI/CD, DevOps
Related products:
Red Hat OpenShift

    Data scientists often use notebooks to explore data and create and experiment with models. At the end of this exploratory phase is the product-delivery phase, which is basically getting the final model to production. Serving a model in production is not a one-step final process, however. It is a continuous phase of training, development, and data monitoring that is best captured or automated using pipelines. This brings us to a dilemma: How do you move code from notebooks to containers orchestrated in a pipeline, and schedule the pipeline to run after specific triggers like time of day, new batch data, and monitoring metrics?

    Today there are multiple tools and proposed methods for moving code from notebooks to pipelines. For example, the nachlass tool uses the source-to-image (S2I) method to convert notebooks into containers. In this article, and in my original DevNation presentation, we explore Kale, a Kubeflow-proposed tool for converting notebooks to Kubeflow pipelines. Kale is a Kubeflow extension integrated with JupyterLab's user interface (UI). It offers data scientists a UI-driven way to convert notebooks to Kubeflow pipelines and run the pipelines in an experiment.

    We run these tools as part of the Open Data Hub installation on Red Hat OpenShift. Open Data Hub is composed of multiple open source tools that are packaged in the ODH Operator. When you install ODH, you can specify which tools you want to install, such as Airflow, Argo, Seldon, JupyterHub, and Spark.

    Prerequisites

    To run this demo, you will need access to an OpenShift 4.x cluster with cluster-admin rights to install a cluster-wide Open Data Hub Operator.

    Install Kubeflow on OpenShift

    We can use the Open Data Hub Operator to install Kubeflow on OpenShift. From the OpenShift portal, go to the OperatorHub and search for Open Data Hub, as shown in Figure 1.

    A screenshot of the OperatorHub with the Open Data Hub Operator selected and installed.
    Figure 1: Installing Open Data Hub from the OpenShift OperatorHub.

    Click Install and move to the next screen. Currently, Open Data Hub offers two channels for installation: beta and legacy. The beta channel is for the new Open Data Hub releases that include Kubeflow. Keep the default settings on that channel, and click Subscribe, as shown in Figure 2.

    A screenshot of the subscription page with the option to choose the beta update channel.
    Figure 2: Create a subscription to the Open Data Hub Operator using the beta update channel.

    After you subscribe, the Open Data Hub Operator will be installed in the openshift-operators namespace, where it is available cluster-wide.

    Next, create a new namespace called kubeflow. From there, go to Installed Operators, click on the Open Data Hub Operator, and create a new instance of the kfdef resource. The default is an example kfdef instance (a YAML file) that installs Open Data Hub components such as Prometheus, Grafana, JupyterHub, Argo, and Seldon. To install Kubeflow, you will need to replace the example kfdef instance with the one from Kubeflow. Replace the example file with this one, then click Create. You will see the file shown in Figure 3.
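    As a rough sketch of what you are creating (the component list here is abbreviated and illustrative; use the actual kfdef file linked above), a kfdef resource has this general shape:

    ```yaml
    apiVersion: kfdef.apps.kubeflow.org/v1
    kind: KfDef
    metadata:
      name: kubeflow
      namespace: kubeflow
    spec:
      applications:
        # One entry per component to install; the real file lists many more.
        - kustomizeConfig:
            repoRef:
              name: manifests
              path: istio/istio   # illustrative path
          name: istio
      repos:
        - name: manifests
          uri: ''   # manifests repo URI elided; see the linked kfdef file
    ```

    The Operator watches for kfdef instances and reconciles the listed applications into the namespace.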

    A screenshot of the new kfdef resource file for Kubeflow.
    Figure 3: Create the kfdef resource for Kubeflow.

    That is all it takes to install Kubeflow on OpenShift. Watch the pods install in the namespace and wait until all of the pods are running before starting on the next steps.

    Notebooks to pipeline

    After the installation has successfully completed, the next step is to create a notebook server in your development namespace, then create a notebook that includes tasks for creating and validating models. To get to the Kubeflow portal, head over to the istio-system namespace and click on the istio-ingressgateway route. This route brings you to the main Kubeflow portal, where you must create a new profile and a working namespace. From the menu on the left side of the main portal, head over to Notebook Servers and click on New Server. A new form will open, where you can create a notebook server to host your notebooks. Be sure that the namespace that you just created is selected in the drop-down menu.

    In this form, you must specify a custom image that includes the Kale component. Specify the custom image: gcr.io/arrikto-public/tensorflow-1.14.0-notebook-cpu:kubecon-workshop.

    Add a new data volume, as shown in Figure 4, then hit Launch.

    A screenshot of the setup page for the custom image.
    Figure 4: Launching a new notebook server.

    Once it is ready, you can connect to the notebook server that you just created. The new notebook server gets you to the main JupyterLab portal, which includes Kubeflow's Kale extension.

    An example notebook

    We will use a very simple notebook based on this example. The notebook predicts whether a house's value is below or above the average house value. For this demonstration, we simplified the notebook and prepared it for a pipeline. You can download the converted notebook from GitHub.

    Tasks in this notebook include downloading the house-prediction data, preparing the data, and creating a neural network with three layers that can predict the value of a given house. At this point, the example notebook looks like a normal notebook, and you can run the cells to ensure that they are working.
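    To make the model-building step concrete, here is a minimal sketch of the same idea, not the notebook itself: a small three-layer network trained to label points as above or below average, using only numpy on synthetic data (the real notebook uses the housing dataset and a deep-learning framework).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the housing features and the above/below-average label.
    X = rng.normal(size=(500, 4))
    y = (X @ np.array([1.5, -2.0, 0.7, 0.3]) > 0).astype(float).reshape(-1, 1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Three dense layers (4 -> 8 -> 8 -> 1), matching the "three layers" in the text.
    W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)
    W2, b2 = rng.normal(scale=0.5, size=(8, 8)), np.zeros(8)
    W3, b3 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

    lr = 1.0
    for _ in range(1000):
        # Forward pass through the three layers.
        h1 = np.tanh(X @ W1 + b1)
        h2 = np.tanh(h1 @ W2 + b2)
        p = sigmoid(h2 @ W3 + b3)
        # Backward pass for binary cross-entropy loss (mean over the batch).
        d3 = (p - y) / len(X)
        d2 = (d3 @ W3.T) * (1.0 - h2 ** 2)
        d1 = (d2 @ W2.T) * (1.0 - h1 ** 2)
        W3 -= lr * (h2.T @ d3); b3 -= lr * d3.sum(0)
        W2 -= lr * (h1.T @ d2); b2 -= lr * d2.sum(0)
        W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(0)

    accuracy = float(((p > 0.5) == (y > 0.5)).mean())
    ```

    In the actual notebook, the download, preparation, and training steps live in separate cells, which is exactly what lets Kale map them to separate pipeline steps.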

    To enable Kale in the notebook, click on the Kubeflow icon on the left-side menu bar and click Enable. You should see something similar to the screenshot in Figure 5.

    A screenshot of the Kale deployment panel and details.
    Figure 5: Click the Kubeflow icon on the left-side menu bar and enable Kale.

    You can specify the role for each cell by clicking the Edit button in the top-right corner of each cell. As shown in Figure 5, we have an imports section and a prepdata pipeline step, as well as a trainmodel pipeline step (not shown) that depends on the prepdata step. Name the experiment and the pipeline, then click Compile and Upload.
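    Under the hood, the Edit button records each cell's role as tags in the notebook's JSON cell metadata. As an illustrative sketch (the step names are from this example; check your own notebook's JSON for the exact tags Kale writes), a tagged cell looks roughly like:

    ```json
    {
      "cell_type": "code",
      "metadata": {
        "tags": ["block:trainmodel", "prev:prepdata"]
      },
      "source": ["..."]
    }
    ```

    This is why the conversion is reproducible: the pipeline structure lives in the notebook file itself, not only in the UI.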

    For now, we will just create the pipeline and defer running it until later. When you get an Okay message, head over to the main Kubeflow portal and select Pipelines. The first pipeline listed is the Kale-generated pipeline. If you click on it, you should see the pipeline details shown in Figure 6.

    A diagram of the Kale-generated pipeline, showing the steps in the pipeline's flow.
    Figure 6: Exploring the Kale-generated pipeline.

    Adjusting the pipeline

    You can explore the code and see the different steps in the pipeline. This generated pipeline assumes that the underlying Argo installation uses the Docker executor. As a result, this pipeline will not run unchanged on OpenShift, which uses the CRI-O container engine and the k8sapi executor for Argo.
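    For reference, the Argo executor is selected in the workflow-controller ConfigMap. This is a sketch following upstream Argo conventions of that era; your installation's ConfigMap name and contents may differ:

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-controller-configmap
    data:
      config: |
        # Use the Kubernetes API server to manage step containers,
        # instead of talking to a Docker socket that CRI-O nodes do not have.
        containerRuntimeExecutor: k8sapi
    ```

    The k8sapi executor has some limitations (for example, around output artifacts), which is part of why the generated pipeline needs adjusting.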

    Also, note that the container image that is used for each step requires root permissions, so we had to grant elevated privileges to the service account running the workflow (oc adm policy add-role-to-user admin system:serviceaccount:namespace:default-editor). Obviously, this method of running containers on OpenShift is not advised. In the future, we hope to change the container so that it does not require root privileges.

    You can download the adjusted pipeline and a volume YAML resource from GitHub. Create the volume before uploading and running the adjusted pipeline, which is shown in Figure 7.
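    The volume resource on GitHub is the one to use; as an illustration of its general shape (the name and size here are made up), it is a PersistentVolumeClaim along these lines:

    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: kale-pipeline-volume   # illustrative name
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    ```

    The pipeline steps mount this claim to pass data between them, which is why it must exist before the run starts.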

    Note: This adjustment does not change the containers themselves. Instead, it changes the pipeline structure, the permissions, and the volumes that are added.

    A screenshot of the Kale-generated pipeline with the adjustments for OpenShift.
    Figure 7: The adjusted pipeline.

    Conclusion

    In this article, you learned how to install Kubeflow on OpenShift using the Open Data Hub Operator, and we explored using Kubeflow's Kale extension to convert notebooks to pipelines. Moving code from notebooks to pipelines is a critical step in the artificial intelligence and machine learning (AI/ML) end-to-end workflow, and there are multiple technologies addressing this issue. While these conversion tools are still immature and under active development, we see great potential in them and plenty of room for improvement. Please join our Open Data Hub community and contribute to developing AI/ML end-to-end technologies on OpenShift.

    Last updated: February 5, 2024
