
Open Data Hub (ODH) is a blueprint for building an AI-as-a-Service (AIaaS) platform on Red Hat OpenShift 4. Version 0.7 of Open Data Hub includes support for deploying Kubeflow 1.0 on OpenShift, as well as increased component testing on the OpenShift continuous integration (CI) system. This article explores the recent updates.

Kubeflow 1.0 on OpenShift

For the 0.7 release of Open Data Hub, we focused on providing updates and fixes for the Kubeflow 1.0 deployment in the v1.0-branch-openshift branch of opendatahub-io/manifests. Each component's openshift overlay now contains all of the updates and fixes required for a successful OpenShift deployment, and each component has been carefully tested and verified. The kfctl_openshift kfdef manifest combines all of these overlays into one file that you can use to install Kubeflow 1.0 on OpenShift.
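To make the structure of such a manifest concrete, here is a minimal Python sketch that assembles a KfDef-style manifest in which every application points at its component's openshift overlay. The component paths, the metadata names, and the repository URL below are illustrative assumptions, not the exact contents of kfctl_openshift; check the v1.0-branch-openshift branch for the real list. The sketch prints JSON, which `oc apply -f` accepts as readily as YAML.

```python
import json

# Assumed repository URL; the real kfctl_openshift manifest pins its own.
MANIFESTS_URI = (
    "https://github.com/opendatahub-io/manifests/"
    "tarball/v1.0-branch-openshift"
)

# Illustrative subset of component paths inside opendatahub-io/manifests.
COMPONENTS = {
    "istio-crds": "istio/istio-crds",
    "katib-controller": "katib/katib-controller",
    "pytorch-operator": "pytorch-job/pytorch-operator",
    "notebook-controller": "jupyter/notebook-controller",
    "seldon-core-operator": "seldon/seldon-core-operator",
}


def build_kfdef(name, namespace, components, repo_uri, overlay="openshift"):
    """Return a KfDef dict whose applications all use the given overlay."""
    applications = [
        {
            "name": app_name,
            "kustomizeConfig": {
                "overlays": [overlay],
                "repoRef": {"name": "manifests", "path": path},
            },
        }
        for app_name, path in components.items()
    ]
    return {
        "apiVersion": "kfdef.apps.kubeflow.org/v1",
        "kind": "KfDef",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "applications": applications,
            "repos": [{"name": "manifests", "uri": repo_uri}],
        },
    }


if __name__ == "__main__":
    kfdef = build_kfdef("kubeflow", "kubeflow", COMPONENTS, MANIFESTS_URI)
    print(json.dumps(kfdef, indent=2))
```

Keeping the overlay name as a parameter makes it easy to see that the only OpenShift-specific part of each entry is the overlay selection; the per-component fixes live inside the overlays themselves.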

If you're curious, here are the fixes we applied to resolve issues that arose when deploying Kubeflow 1.0 on OpenShift:

  • Istio: Add the OpenShift overlays from the Kubeflow 0.7 release.
  • PyTorch: Add finalizers to the PyTorch roles. Add an overlay to customize the initContainer through a ConfigMap.
  • Katib: Add a missing finalizer to the assigned role.
  • Notebook controller: Update the notebook-controller and jupyter-web-app components to version 1.0.
  • Seldon: Increase the memory limit for the seldon-core-operator to prevent OutOfMemory errors. Change the name and port of the mutating and validating webhooks.

Combining Open Data Hub and Kubeflow

We are still testing the ability to mix Kubeflow with ODH components. We plan to provide full support for this feature in the next Open Data Hub release. If you want to test the combined deployment of ODH components and the TensorFlow Training (TFJob) Operator, you can deploy the opendatahub-mix kfdef manifest.
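Conceptually, a mixed manifest such as opendatahub-mix draws its applications from two repositories: the ODH manifests and the Kubeflow manifests. The Python sketch below merges the application and repository lists of two KfDef dicts into one, assuming that repositories sharing a name are identical. The helper name `merge_kfdefs`, the repository names, and the sample URLs are hypothetical and are not taken from the ODH repositories.

```python
import copy


def merge_kfdefs(base, extra, name=None):
    """Merge two KfDef dicts, keeping the metadata of `base`.

    Applications from `extra` are appended unless `base` already has one with
    the same name; repos are de-duplicated by name (assumed identical).
    """
    merged = copy.deepcopy(base)  # never mutate the caller's manifests
    if name is not None:
        merged["metadata"]["name"] = name
    spec = merged.setdefault("spec", {})
    apps = spec.setdefault("applications", [])
    repos = spec.setdefault("repos", [])
    extra_spec = extra.get("spec", {})

    seen_apps = {app["name"] for app in apps}
    for app in extra_spec.get("applications", []):
        if app["name"] not in seen_apps:
            apps.append(copy.deepcopy(app))
            seen_apps.add(app["name"])

    seen_repos = {repo["name"] for repo in repos}
    for repo in extra_spec.get("repos", []):
        if repo["name"] not in seen_repos:
            repos.append(copy.deepcopy(repo))
            seen_repos.add(repo["name"])
    return merged


# Hypothetical inputs: one ODH component plus the TFJob Operator.
odh = {
    "apiVersion": "kfdef.apps.kubeflow.org/v1",
    "kind": "KfDef",
    "metadata": {"name": "opendatahub", "namespace": "odh"},
    "spec": {
        "applications": [{
            "name": "odh-common",
            "kustomizeConfig": {
                "repoRef": {"name": "manifests", "path": "odh-common"}},
        }],
        "repos": [{"name": "manifests",
                   "uri": "https://example.com/odh-manifests.tar.gz"}],
    },
}
kubeflow = {
    "apiVersion": "kfdef.apps.kubeflow.org/v1",
    "kind": "KfDef",
    "metadata": {"name": "kubeflow", "namespace": "kubeflow"},
    "spec": {
        "applications": [{
            "name": "tf-job-operator",
            "kustomizeConfig": {
                "overlays": ["openshift"],
                "repoRef": {"name": "kf-manifests",
                            "path": "tf-training/tf-job-operator"}},
        }],
        "repos": [{"name": "kf-manifests",
                   "uri": "https://example.com/kf-manifests.tar.gz"}],
    },
}

mixed = merge_kfdefs(odh, kubeflow, name="opendatahub-mix")
print([app["name"] for app in mixed["spec"]["applications"]])
# prints ['odh-common', 'tf-job-operator']
```

Giving the two repositories distinct names is what lets each application's `repoRef` say unambiguously which source tree it comes from.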

Continuous integration testing available for all components

In the previous Open Data Hub release, we successfully integrated with the OpenShift continuous integration (CI) system and created basic smoke tests for a few components available in odh-manifests. For this release, we focused on adding functionality tests for all of the components that ODH deploys. On each new pull request, we run a full ODH deployment that includes all of the available components. After a quick smoke test, we run a suite of functionality tests to confirm that each of the deployed components is working properly.
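A smoke test of this kind can be as simple as confirming that every pod in the deployment namespace has reached a healthy phase. The following Python sketch illustrates the idea and is not the code used in the ODH CI: it runs the standard `oc get pods -o json` command and reports any pod whose `status.phase` is neither Running nor Succeeded. The default namespace name `opendatahub` is an assumption.

```python
import json
import subprocess

# A pod in one of these phases is treated as healthy for a smoke test.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list):
    """Return (name, phase) pairs for pods in a PodList dict that are not healthy."""
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            problems.append((pod["metadata"]["name"], phase))
    return problems


def smoke_test(namespace="opendatahub"):
    """Fetch the namespace's pods with `oc` and raise if any are unhealthy."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    pod_list = json.loads(result.stdout)
    # An empty namespace would otherwise pass vacuously.
    if not pod_list.get("items"):
        raise RuntimeError(f"no pods found in namespace {namespace}")
    problems = unhealthy_pods(pod_list)
    if problems:
        raise RuntimeError(f"unhealthy pods in {namespace}: {problems}")
```

Keeping the parsing in `unhealthy_pods` separate from the `oc` call makes the check testable without a cluster. A stricter version would also inspect container readiness, since a pod in the Running phase can still have containers that are not ready.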

Here is a quick rundown of tests we added for each of the components:

  • AI Library and Seldon: Integration tests to verify that the AI Library Operator creates our SeldonDeployment models and that all of the deployed model APIs are online.
  • Airflow: Verify that the Airflow Operator can successfully deploy the AirflowBase and AirflowCluster custom resources.
  • Prometheus: Verify that the Prometheus application is running and that its portal is up.
  • Grafana: Verify that the GrafanaDashboard and GrafanaDataSource custom resources deploy successfully.
  • JupyterHub: Verify that the JupyterHub server and database are online. We are currently working on tests that use full user-interface (UI) automation of the JupyterHub server and notebooks.
  • Spark Operator: Verify that the SparkCluster and SparkApplication custom resources deploy successfully and generate the expected outputs for specific Spark jobs.
  • Argo: Verify that the required pods are running and that the example "Hello, world" workflow runs successfully.
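Several of these checks, such as waiting for the Argo "Hello, world" workflow to finish or for a custom resource to become ready, come down to polling a status field until it reaches a terminal value or a deadline passes. Here is a minimal, library-agnostic Python sketch of that pattern. The `get_phase` callable is a placeholder for whatever query a real test makes (for example, reading `status.phase` of an Argo Workflow, whose phases include Pending, Running, Succeeded, Failed, and Error); the function name and the timeout and interval defaults are arbitrary choices for illustration.

```python
import time


def wait_for_phase(get_phase, success=("Succeeded",),
                   failure=("Failed", "Error"), timeout=300.0, interval=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_phase() until it returns a success or failure phase.

    Returns the final phase on success; raises RuntimeError on a failure
    phase and TimeoutError if the deadline passes first. `clock` and
    `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        phase = get_phase()
        if phase in success:
            return phase
        if phase in failure:
            raise RuntimeError(f"workflow ended in phase {phase!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still in phase {phase!r} after {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    # Simulated workflow that finishes on the third poll.
    phases = iter(["Pending", "Running", "Succeeded"])
    print(wait_for_phase(lambda: next(phases), interval=0))  # Succeeded
```

Treating Failed and Error as terminal lets a test fail fast instead of waiting out the full timeout on a workflow that will never succeed.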

Seldon update to 1.2

While testing the ODH 0.7 release, we noticed deployment failures in the Seldon Operator. The failures were caused by stale webhooks that remained when the namespaced Seldon Operator was automatically updated from version 1.1 to 1.2. Starting with Seldon version 1.2, existing webhooks are updated automatically during Operator upgrades.
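If you run into a similar failure after an Operator upgrade, one way to spot stale webhooks is to compare each webhook's service reference with the Services that actually exist. The Python sketch below is a hypothetical diagnostic, not part of Seldon or ODH. It takes a MutatingWebhookConfiguration or ValidatingWebhookConfiguration as a dict (for example, parsed from `oc get mutatingwebhookconfigurations <name> -o json`) plus a mapping of existing Services to their ports, and lists the webhooks that point at a Service or port that no longer exists. The function name and the sample Service names are illustrative assumptions.

```python
# Kubernetes uses port 443 when a webhook's service reference omits one.
DEFAULT_WEBHOOK_PORT = 443


def stale_webhooks(webhook_config, service_ports):
    """List webhooks whose clientConfig.service targets a missing Service/port.

    webhook_config: a Mutating- or ValidatingWebhookConfiguration as a dict.
    service_ports: mapping of (namespace, name) -> set of Service port numbers.
    """
    stale = []
    for hook in webhook_config.get("webhooks", []):
        svc = hook.get("clientConfig", {}).get("service")
        if svc is None:
            continue  # URL-based webhooks are not checked here
        key = (svc["namespace"], svc["name"])
        port = svc.get("port", DEFAULT_WEBHOOK_PORT)
        if port not in service_ports.get(key, set()):
            stale.append(hook["name"])
    return stale


# Hypothetical example: one webhook still targets a Service that was renamed.
config = {"webhooks": [
    {"name": "v1.mutate.example.io",
     "clientConfig": {"service": {"namespace": "odh", "name": "old-webhook-svc"}}},
    {"name": "v1.validate.example.io",
     "clientConfig": {"service": {"namespace": "odh", "name": "new-webhook-svc",
                                  "port": 4443}}},
]}
existing = {("odh", "new-webhook-svc"): {4443}}
print(stale_webhooks(config, existing))  # ['v1.mutate.example.io']
```

The webhook `port` field refers to the Service port rather than the container's target port, so the mapping should be built from each Service's `spec.ports[].port` values.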

Conclusion

If you want to stay current with all of the Open Data Hub updates, feel free to join our ODH community meetings and mailing list. If you have any questions about ODH deployment, please contact us via the ODH GitHub issues page.

Last updated: February 5, 2024