Open Data Hub (ODH) is a blueprint for building an AI-as-a-Service (AIaaS) platform on Red Hat OpenShift 4. Version 0.7 of Open Data Hub includes support for deploying Kubeflow 1.0 on OpenShift, as well as increased component testing on the OpenShift continuous integration (CI) system. This article explores the recent updates.
Kubeflow 1.0 on OpenShift
For the 0.7 release of Open Data Hub, we focused on providing updates and fixes for the Kubeflow 1.0 deployment in the v1.0-branch-openshift branch of opendatahub-io/manifests. This release contains all of the required updates or fixes for a successful OpenShift deployment in each component's openshift overlay, with each component carefully tested and verified. The kfctl_openshift kfdef manifest combines all of these overlays into one file that you can use to install Kubeflow 1.0 on OpenShift.
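As a rough sketch of how that single manifest might be consumed, the Python snippet below assembles a kfctl invocation. The `kfctl apply -V -f` form follows the Kubeflow 1.0 installation instructions; the raw-file URL layout (a `kfdef/kfctl_openshift.yaml` path on the branch) is an assumption, so check the repository before relying on it.

```python
from typing import List

# Repository and branch come from the article; the file path is an assumption.
REPO = "opendatahub-io/manifests"
BRANCH = "v1.0-branch-openshift"
KFDEF_PATH = "kfdef/kfctl_openshift.yaml"  # hypothetical path, verify in the repo


def kfdef_url(repo: str = REPO, branch: str = BRANCH, path: str = KFDEF_PATH) -> str:
    """Build a raw GitHub URL for a kfdef manifest."""
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{path}"


def kfctl_apply_command(config_uri: str) -> List[str]:
    """Return the argument list for applying a kfdef with kfctl (verbose mode)."""
    return ["kfctl", "apply", "-V", "-f", config_uri]


command = kfctl_apply_command(kfdef_url())
print(" ".join(command))
```

Building the argument list separately from running it keeps the sketch side-effect free; passing `command` to `subprocess.run` would be the natural next step on a machine where kfctl is installed.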
If you're curious, here are the fixes we applied to resolve issues that arose when deploying Kubeflow 1.0 on OpenShift:
- Istio: Add the OpenShift overlays from the Kubeflow 0.7 release.
- PyTorch: Add finalizers to the PyTorch roles. Add an overlay to customize the initContainer via a ConfigMap.
- Katib: Add a missing finalizer to the assigned role.
- Notebook controller: Update the notebook-controller and jupyter-web-app components to version 1.0.
- Seldon: Increase the memory limit for the seldon-core-operator to prevent OutOfMemory errors. Change the name and port for mutating and validating webhooks.
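To illustrate the kind of change behind the finalizer fixes, here is a minimal Python sketch that appends a `finalizers` rule to a role-shaped dictionary. The API group `kubeflow.org`, the resource `pytorchjobs`, the role name, and the choice of the `update` verb are all illustrative assumptions; the overlays in the manifests repository are the source of truth.

```python
import copy
from typing import Any, Dict


def add_finalizers_rule(role: Dict[str, Any], api_group: str, resource: str) -> Dict[str, Any]:
    """Return a copy of `role` that also grants update on <resource>/finalizers."""
    patched = copy.deepcopy(role)  # leave the caller's manifest untouched
    rule = {
        "apiGroups": [api_group],
        "resources": [f"{resource}/finalizers"],
        "verbs": ["update"],
    }
    rules = patched.setdefault("rules", [])
    if rule not in rules:  # keep the patch idempotent
        rules.append(rule)
    return patched


# Hypothetical role, trimmed to the fields that matter for this sketch.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "pytorch-operator"},
    "rules": [
        {"apiGroups": ["kubeflow.org"], "resources": ["pytorchjobs"], "verbs": ["*"]},
    ],
}

patched = add_finalizers_rule(role, "kubeflow.org", "pytorchjobs")
```

In practice a change like this would live in a kustomize overlay rather than in Python; the function only makes the shape of the added rule explicit.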
Combining Open Data Hub and Kubeflow
We are still testing the ability to mix Kubeflow with ODH components. We plan to provide full support for this feature in the next Open Data Hub release. If you want to test the combined deployment of ODH components and the TensorFlow Training (TFJob) Operator, you can deploy the opendatahub-mix kfdef manifest.
Continuous integration testing available for all components
In the previous Open Data Hub release, we successfully integrated with the OpenShift continuous integration (CI) system and created basic smoke tests for a few components available in odh-manifests. For this release, we focused on adding functionality tests for all of the components that ODH deploys. On each new pull request, we run a full ODH deployment that includes all of the available components. After a quick smoke test, we run a suite of functionality tests to confirm that each of the deployed components is working properly.
Here is a quick rundown of tests we added for each of the components:
- AI Library and Seldon: Integration tests to verify that the AI Library Operator creates our SeldonDeployment models, and all of the deployed model APIs are online.
- Airflow: Verify that the Airflow Operator can successfully deploy the AirflowBase and AirflowCluster custom resources.
- Prometheus: Verify that the Prometheus application is running successfully, and the portal is up and running.
- Grafana: Verify that the GrafanaDashboard and GrafanaDataSource custom resources deploy successfully.
- JupyterHub: Verify that the JupyterHub server and database are online. Currently, we are working on including tests that will utilize full user-interface (UI) automation of the JupyterHub server and notebooks.
- Spark Operator: Verify that the SparkCluster and SparkApplication custom resources deploy successfully and generate the expected outputs for specific Spark jobs.
- Argo: Verify that the required pods are running, and the example "Hello, world" workflow runs successfully.
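A check such as "verify that the required pods are running" could be sketched along the following lines. This is not the project's actual test code; it assumes the input is the parsed JSON from a command like `oc get pods -o json`, and the function name and sample pod names are hypothetical.

```python
from typing import Any, Dict, List

# Phases treated as healthy for the purposes of this sketch.
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Return names of pods whose phase is neither Running nor Succeeded."""
    bad = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            bad.append(pod["metadata"]["name"])
    return bad


# Hand-written sample shaped like `oc get pods -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "example-ui-abc"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "example-controller-xyz"}, "status": {"phase": "Pending"}},
    ]
}

print(unhealthy_pods(sample))
```

A functionality test would typically retry a check like this until a timeout expires, since pods often need time to reach the Running phase after deployment.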
Seldon update to 1.2
While testing the ODH 0.7 release, we noticed deployment failures in the Seldon Operator. These failures were caused by stale webhooks left behind when the namespaced Seldon Operator was automatically updated from version 1.1 to 1.2. Starting with Seldon version 1.2, existing webhooks are updated automatically during Operator upgrades.
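One plausible way to spot a stale webhook of the kind described above is to look for webhook configurations whose backing service no longer exists. The sketch below assumes the input is a parsed list of webhook configurations (for example, from `oc get mutatingwebhookconfigurations -o json`) plus a set of known `(namespace, name)` service pairs. The helper name and sample data are hypothetical, and this is not how the Seldon Operator itself handles upgrades.

```python
from typing import Any, Dict, List, Set, Tuple


def stale_webhooks(configs: Dict[str, Any], services: Set[Tuple[str, str]]) -> List[str]:
    """Return names of webhook configurations that point at a missing service."""
    stale = []
    for config in configs.get("items", []):
        for hook in config.get("webhooks", []):
            svc = hook.get("clientConfig", {}).get("service")
            if svc is None:
                continue  # URL-based webhooks have no service to check
            if (svc["namespace"], svc["name"]) not in services:
                stale.append(config["metadata"]["name"])
                break  # one missing service is enough to flag the config
    return stale


# Hypothetical data: one webhook still references a service that is gone.
configs = {
    "items": [
        {
            "metadata": {"name": "old-webhook-config"},
            "webhooks": [
                {"clientConfig": {"service": {"namespace": "odh", "name": "old-webhook-svc"}}}
            ],
        },
        {
            "metadata": {"name": "current-webhook-config"},
            "webhooks": [
                {"clientConfig": {"service": {"namespace": "odh", "name": "new-webhook-svc"}}}
            ],
        },
    ]
}
existing_services = {("odh", "new-webhook-svc")}

print(stale_webhooks(configs, existing_services))
```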
Conclusion
If you want to stay current with all of the Open Data Hub updates, feel free to join our ODH community meetings and mailing list. If you have any questions about ODH deployment, please contact us via the ODH GitHub issues page.
Last updated: February 5, 2024