
Installing Kubeflow v0.7 on OpenShift 4.2

February 10, 2020
Juana Nakfour, Václav Pavlín, Landon LaSmith, Chad Roberts
Related topics: Containers, Kubernetes, Service Mesh
Related products: Red Hat OpenShift

    As part of the Open Data Hub project, we see potential and value in the Kubeflow project, so we dedicated our efforts to enabling Kubeflow on Red Hat OpenShift. We decided to use Kubeflow 0.7, as that was the latest released version at the time this work began. The work included adding new installation scripts that provide all of the necessary changes, such as the permissions service accounts need to run on OpenShift.

    The installation of Kubeflow is limited to the following components:

    • Central dashboard
    • JupyterHub
    • Katib
    • Pipelines
    • PyTorch and TF-Jobs (training)
    • Seldon (serving)
    • Istio

    All of the new fixes and features will be proposed upstream to the Kubeflow project in the near future.

    Prerequisites

    To install Kubeflow on OpenShift, there are prerequisites regarding the platform and the tools.

    Platform

    To run this installation, OpenShift is needed as a platform. You can use either OpenShift 4.2 or Red Hat CodeReady Containers (CRC). If you choose OpenShift 4.2, all you need is an available OpenShift 4.2 cluster; if you don't have one, you can try a cluster at try.openshift.com.

    If you choose CodeReady Containers, you need a CRC-generated OpenShift cluster. Here are the recommended specifications:

    • 16GB RAM
    • 6 CPUs
    • 45GB disk space

    The minimum specifications are:

    • 10GB RAM
    • 6 CPUs
    • 30GB disk space (the default for CRC)

    Note: At the minimum specs, the CRC OpenShift cluster might be unresponsive for approximately 20 minutes while the Kubeflow components are being deployed.

    When installing Kubeflow on a CRC cluster, there is an extra overlay (named "crc") in kfctl_openshift.yaml that enables the metadata component. This overlay is commented out by default; uncomment it to enable it, as sketched below.
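
    For illustration, the relevant fragment of kfdef/kfctl_openshift.yaml looks roughly like this (a sketch; the exact fields and surrounding application entries differ in the real file):

    # metadata application entry in kfdef/kfctl_openshift.yaml
    - kustomizeConfig:
        overlays:
        # - crc          # uncomment this line when deploying to CRC
        repoRef:
          name: manifests
          path: metadata
      name: metadata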

    Tools

    The installation tool kfctl is needed to install/uninstall Kubeflow. Download the tool from GitHub. Version 0.7.0 is required for this installation.
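
    For example, on Linux the v0.7.0 release can be fetched and unpacked along these lines (the artifact name below is an assumption; verify it against the actual release page before downloading):

    $ wget https://github.com/kubeflow/kubeflow/releases/download/v0.7.0/kfctl_v0.7.0_linux.tar.gz
    $ tar -xvf kfctl_v0.7.0_linux.tar.gz
    $ sudo mv kfctl /usr/local/bin/
    $ kfctl version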

    Installing Kubeflow with Istio enabled

    As noted earlier, we added a KFDef file to specifically install Kubeflow on OpenShift and included fixes for different components. To install Kubeflow 0.7 on OpenShift 4.2, follow the steps below. It is assumed that this installation will run on an OpenShift 4.2 cluster:

    1. Clone the opendatahub-manifest fork repo, which defaults to the branch v0.7.0-branch-openshift:
    $ git clone https://github.com/opendatahub-io/manifests.git
    $ cd manifests
    2. Install using the OpenShift configuration file and the locally downloaded manifests, since at the time of writing we ran into a Kubeflow bug that would not allow downloading the manifests during a build process:
    $ sed -i 's#uri: .*#uri: '$PWD'#' ./kfdef/kfctl_openshift.yaml
    $ kfctl build --file=kfdef/kfctl_openshift.yaml
    $ kfctl apply --file=./kfdef/kfctl_openshift.yaml
    3. Verify your installation:
    $ oc get pods
    4. Launch the Kubeflow portal:
    $ oc get routes -n istio-system istio-ingressgateway -o jsonpath='http://{.spec.host}/'
    http://<istio ingress route>/

    Deleting a Kubeflow installation

    To delete a Kubeflow installation, follow these steps:

    $ kfctl delete --file=./kfdef/<kfctl file name>.yaml
    $ rm -rf kfdef/kustomize/
    $ oc delete mutatingwebhookconfigurations.admissionregistration.k8s.io --all
    $ oc delete validatingwebhookconfigurations.admissionregistration.k8s.io --all
    $ oc delete namespace istio-system

    Kubeflow components

    To enable the installation of Kubeflow 0.7 on OpenShift 4.2, we added features and fixes to alleviate the installation issues we encountered. The following is a list of components along with a description of the changes and usage examples.

    OpenShift KFDef

    KFDef is a specification designed to control the provisioning and management of a Kubeflow deployment. This spec is generally distributed in YAML format and follows the custom resource pattern that is popular in Kubernetes for extending the platform. With the upcoming addition of the Kubeflow Operator, KFDef is becoming the custom resource used for Kubeflow deployment and lifecycle management.

    KFDef is built on top of Kustomize, which is a Kubernetes-native configuration management system. To deploy Kubeflow to OpenShift, we had to create a new KFDef YAML file that customizes the deployment manifests of Kubeflow components for OpenShift. With Kustomize as a configuration management layer for every component, it was necessary to add OpenShift-specific Kustomize overlays (patches applied to the default set of resource manifests when an overlay is selected).

    Take a look at the OpenShift-specific KFDef file used in the deployment steps above in the opendatahub-io/manifests repository.
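
    For illustration, a trimmed fragment in the style of that file (a sketch; field names follow the KfDef schema used by Kubeflow 0.7, and the real file lists many more applications):

    apiVersion: kfdef.apps.kubeflow.org/v1alpha1
    kind: KfDef
    metadata:
      name: kubeflow-openshift
      namespace: kubeflow
    spec:
      applications:
      - kustomizeConfig:
          overlays:
          - openshift              # OpenShift-specific Kustomize overlay
          repoRef:
            name: manifests
            path: jupyter/notebook-controller
        name: notebook-controller
      repos:
      - name: manifests
        uri: <local manifests path>   # rewritten by the sed command in the installation steps above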

    Central dashboard

    The central dashboard works out of the box, provided that you access the Kubeflow web UI using the route for istio-ingressgateway in the istio-system namespace.

    Upon first accessing the web UI, you will be prompted to create a Kubeflow user namespace. This is a one-time action for creating a single namespace. If you want to make additional namespaces accessible for Kubeflow deployment of notebook servers, Pipelines, etc., you can create a Kubeflow profile. By default, the central dashboard does not have authentication enabled.
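
    For example, an additional namespace can be requested with a Profile resource along these lines (a sketch; the Profile apiVersion and fields may differ slightly in Kubeflow 0.7, and the names here are hypothetical):

    apiVersion: kubeflow.org/v1beta1   # may be a different version on Kubeflow 0.7
    kind: Profile
    metadata:
      name: my-extra-namespace         # hypothetical namespace name
    spec:
      owner:
        kind: User
        name: user@example.com         # hypothetical owner identity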

    Jupyter controller

    We are using three Jupyter controller customizations: a custom notebook controller, a custom profile controller, and a custom notebook image. Let's take a look at each.

    Custom notebook controller

    We are using a customized notebook controller to avoid the default behavior of setting fsGroup: 100 in the StatefulSet that is created when spawning a notebook. That value would require a special security context constraint (SCC) for the service account in OpenShift. To further complicate matters, that SCC would need to be granted to a service account that is created only when the profile is created, so it’s not something that can be done during installation.

    Related links:

    • The related upstream issue.
    • The repository for this controller image.
    • The source is here.

    Custom profile controller

    We are using a customized profile controller to avoid the default behavior of newly created profiles having the label istio-injection: enabled. That label causes the pod to attempt to start an istio-init init container that, in turn, tries to use iptables, which is not available in OpenShift 4.x. That init container fails, which in turn causes the notebook startup to fail.

    Related links:

    • The related upstream issue.
    • The repository for the controller image.
    • The source is here.

    Custom notebook image

    We also added our own custom notebook image, which is prepopulated in the image selection dropdown. This image sets up the filesystem permissions needed in the /home/jovyan directory. It offers the functionality described here.
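
    The usual pattern for such permissions on OpenShift, where containers run with an arbitrary UID that belongs to the root group, looks roughly like this in a notebook image Dockerfile (an illustrative sketch, not the exact content of our image):

    # Make /home/jovyan writable by the arbitrary UID OpenShift assigns,
    # which is always a member of group 0:
    RUN chgrp -R 0 /home/jovyan && \
        chmod -R g=u /home/jovyan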

    Katib

    Katib suffered two main problems. The first was not being able to run cleanly as an unprivileged user (#960, #962, #967). The second was that it damaged a generated security context in a pod by mutating the pod (#964). Both issues have been fixed in the upstream Katib repositories, and Katib now runs without issues on OpenShift.

    The second issue, in particular, reflects a pattern common in applications relying on mutating webhooks, where part of the mutation adds a sidecar container to the pod being deployed. If the new container does not have an initialized security context, the pod admission controller will prevent its deployment. We have seen the same issue in the KFServing component.
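
    As a sketch of the fix pattern (hypothetical names; not Katib's actual manifests), a webhook that injects a sidecar should also give it an explicit security context so that admission can validate it:

    containers:
    - name: injected-sidecar             # hypothetical webhook-added container
      image: example.com/sidecar:latest  # hypothetical image
      securityContext:                   # initialized so SCC admission passes
        allowPrivilegeEscalation: false
        runAsNonRoot: true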

    Pipelines

    To get Kubeflow Pipelines working on OpenShift, we had to specify the k8sapi executor for Argo, because OpenShift 4.2 does not include a Docker daemon and CLI; it uses CRI-O as the default container engine instead. We also had to add the finalizers resource to the workflow permissions so that OpenShift could set owner references.
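
    The executor is selected in Argo's workflow controller ConfigMap; a minimal sketch (the kubeflow namespace is an assumption):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-controller-configmap
      namespace: kubeflow
    data:
      config: |
        containerRuntimeExecutor: k8sapi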

    This practice allows running YAML-based Pipelines that conform to Argo’s specification for k8sapi Pipeline execution, specifically the requirement to save params and artifacts in volumes (such as emptyDir) rather than in a path that is part of the base image layer (for example, /tmp). This requirement caused all of the example Kubeflow Python Pipelines to fail. To test your Pipelines, use the fraud detection Pipelines provided in this article.
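
    A sketch of the relevant pod spec fragment for a pipeline step (illustrative names):

    # Outputs must live on a volume when using the k8sapi executor:
    volumes:
    - name: outputs
      emptyDir: {}
    containers:
    - name: step                         # hypothetical pipeline step
      image: example.com/step:latest     # hypothetical image
      volumeMounts:
      - name: outputs
        mountPath: /outputs              # write params/artifacts here, not /tmp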

    For the minio installation, we also created a service account and gave that account permission to run as anyuid.
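
    With the oc CLI, that grant looks like this (the service account name and namespace here are assumptions):

    $ oc adm policy add-scc-to-user anyuid -z minio -n kubeflow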

    Training

    For training, we had to make changes for two of the apps: PyTorch and TensorFlow jobs (tf-jobs).

    PyTorch

    For PyTorch, we did not have to make any changes to the component itself. However, we did have to change the Dockerfile of one of the examples found here, adding the required folders and permissions. To run the example MNIST test:

    1. Change the Dockerfile to include the following:
    FROM pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
    RUN pip install tensorboardX==1.6.0
    # Open up /var and /data so the job can run as an arbitrary, non-root UID:
    RUN chmod 777 /var
    WORKDIR /var
    ADD mnist.py /var
    RUN mkdir /data
    RUN chmod 777 /data
    ENTRYPOINT ["python", "/var/mnist.py"]
    2. Build and push the image to your registry:
    $ podman build -f Dockerfile -t <your registry name>/pytorch-dist-mnist-test:2.0 ./
    $ podman push <your registry name>/pytorch-dist-mnist-test:2.0
    3. Add the registry image URL to the installation YAML file. We tested this setup without a GPU, and our file is the following:
    apiVersion: "kubeflow.org/v1"
    kind: "PyTorchJob"
    metadata:
      name: "pytorch-dist-mnist-gloo"
    spec:
      pytorchReplicaSpecs:
        Master:
          replicas: 1
          restartPolicy: OnFailure
          template:
            spec:
              containers:
              - name: pytorch
                image: <your registry name>/pytorch-dist-mnist-test:2.0
                args: ["--backend", "gloo"]
                # Resources are left empty to run on CPU only.
                resources: {}
        Worker:
          replicas: 1
          restartPolicy: OnFailure
          template:
            spec:
              containers:
              - name: pytorch
                image: <your registry name>/pytorch-dist-mnist-test:2.0
                args: ["--backend", "gloo"]
                # Resources are left empty to run on CPU only.
                resources: {}
    4. Create a PyTorch job by running the following command:
    $ oc create -f v1/<filename.yaml>
    5. Check that the worker and master PyTorch pods are running with no errors (see the example commands below).
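
    A quick way to check, assuming the job runs in the kubeflow namespace (pod names derive from the job name):

    $ oc get pods -n kubeflow | grep pytorch-dist-mnist-gloo
    $ oc logs -n kubeflow pytorch-dist-mnist-gloo-master-0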

    TF-Jobs

    To get TF-Jobs training working on OpenShift, we had to add the tfjob/finalizers resource to the tf-job-operator ClusterRole so that OpenShift can set owner references.
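
    For illustration, the added rule looks roughly like this (a sketch; the real ClusterRole in the tf-operator manifests lists additional resources and verbs):

    - apiGroups:
      - kubeflow.org
      resources:
      - tfjobs
      - tfjobs/status
      - tfjobs/finalizers
      verbs:
      - '*'

    Follow these steps to run the example MNIST training job: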

    1. Run:
    $ git clone https://github.com/kubeflow/tf-operator
    $ cd tf-operator/examples/v1/mnist_with_summaries
    2. Create the PersistentVolumeClaim shown below (we did have to change the accessModes to ReadWriteOnce for our cluster):
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: tfevent-volume
      namespace: kubeflow
      labels:
        type: local
        app: tfjob
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    3. Run:
    $ oc apply -f tfevent-volume/<new pvc filename>.yaml
    $ oc apply -f tf_job_mnist.yaml
    $ oc describe tfjob mnist
    Events:
    Type    Reason                   Age   From      Message
    ----    ------                   ----  ----      -------
    Normal  SuccessfulCreatePod      12m tf-operator Created pod: mnist-worker-0
    Normal  SuccessfulCreateService  12m tf-operator Created service: mnist-worker-0
    Normal  ExitedWithCode           11m tf-operator Pod: kubeflow.mnist-worker-0 exited with code 0
    Normal  TFJobSucceeded           11m tf-operator TFJob mnist successfully completed.

    Serving

    For serving, we had to make changes for one of the apps: Seldon.

    Seldon

    To get Seldon to work on OpenShift, we had to delete the "8888" UID value assigned to the engine container that is part of a served model pod. This value meant that every time a model was served, its engine container ran with UID 8888, which is not within the range of UIDs that OpenShift allows.

    To try this out for yourself, here is an example fraud detection model:

    1. Create a Seldon deployment YAML file using the following example:
    {
      "apiVersion": "machinelearning.seldon.io/v1alpha2",
      "kind": "SeldonDeployment",
      "metadata": {
        "labels": { "app": "seldon" },
        "name": "modelfull",
        "namespace": "kubeflow"
      },
      "spec": {
        "annotations": {
          "project_name": "seldon",
          "deployment_version": "0.1"
        },
        "name": "modelfull",
        "oauth_key": "oauth-key",
        "oauth_secret": "oauth-secret",
        "predictors": [
          {
            "componentSpecs": [{
              "spec": {
                "containers": [
                  {
                    "image": "nakfour/modelfull",
                    "imagePullPolicy": "Always",
                    "name": "modelfull",
                    "resources": {
                      "requests": { "memory": "10Mi" }
                    }
                  }
                ],
                "terminationGracePeriodSeconds": 40
              }
            }],
            "graph": {
              "children": [],
              "name": "modelfull",
              "endpoint": { "type": "REST" },
              "type": "MODEL"
            },
            "name": "modelfull",
            "replicas": 1,
            "annotations": { "predictor_version": "0.1" }
          }
        ]
      }
    }
    2. Install this configuration by running:
    $ oc create -f <filename>.yaml
    3. Verify that there is a running pod whose name includes modelfull.
    4. Verify that there is a virtual service whose name includes modelfull.
    5. From a terminal, send a predict request to the model using this example curl command:
    $ curl -X POST -H 'Content-Type: application/json' -d '{"strData": "0.365194527642578,0.819750231339882,-0.5927999453145171,-0.619484351930421,-2.84752569239798,1.48432160780265,0.499518887687186,72.98"}' http://<istio ingress domain name>/seldon/kubeflow/modelfull/api/v0.1/predictions
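
    For steps 3 and 4, commands along these lines should work (the kubeflow namespace is an assumption):

    $ oc get pods -n kubeflow | grep modelfull
    $ oc get virtualservices -n kubeflow | grep modelfull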

    Istio

    Installing the default Istio provided with Kubeflow 0.7 required adding a route for the Istio ingress gateway service and granting the anyuid security context constraint to the multiple service accounts used by Istio's components, which allows those components to run with the user IDs they expect.
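
    For illustration, the grant and the route look roughly like this (the service account name here is an assumption; the KFDef overlay applies the real list of accounts):

    $ oc adm policy add-scc-to-user anyuid -n istio-system -z istio-ingressgateway-service-account
    $ oc -n istio-system expose service istio-ingressgateway --port=http2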

    Next steps

    The Open Data Hub team is currently focused on several next steps:

    • Resolving the component issues already discussed in this document, such as those in Pipelines and Katib.
    • Integrating Kubeflow 0.7 with Red Hat Service Mesh on OpenShift 4.2.
    • Proposing the changes discussed in this document back upstream to the Kubeflow community.
    • Working with the Kubeflow community to document OpenShift as an officially supported platform on the Kubeflow website.
    • Architecting and designing a solution for tight integration between Open Data Hub and Kubeflow that includes Operator redesign.
    Last updated: March 28, 2023
