Open Data Hub and Kubeflow installation customization

It has been just a few short weeks since we released Open Data Hub (ODH) 0.6.0, which brought many changes to the underlying architecture along with some new features. Since then, we found a few issues with the Kubeflow Operator in this new version and a few regressions that came in with the new JupyterHub updates. To make sure your experience with ODH 0.6 does not suffer because we wanted to release early, we offer a new (mostly) bugfix release: Open Data Hub 0.6.1.

Operator

Probably the most important bug fixes went into the Operator's code itself. The changes there matter to the project not only because of what they fix, but also because they demonstrate our working relationship with the Kubeflow community, since most of the code went directly upstream. We have also rebased our repository to make sure any new features in the Operator get in. Let’s take a look at a few things that have been fixed.

Deleting namespace with the instance

This issue caused behavior you would not expect from any application. When a user deleted the KFDef custom resource, the Operator also deleted the namespace where Open Data Hub had been deployed. This is definitely something we do not want, since there might be other applications running in the namespace that ODH does not control. We filed the issue and worked with the community to get the Operator to behave correctly.
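The corrected behavior can be pictured as a filter over the resources scheduled for removal. The sketch below is only an illustration in Python, not the Operator's actual code: the function name `resources_to_delete` and the dictionary shape of each resource are hypothetical, chosen to show the idea that a namespace the Operator does not own is never part of the cleanup.

```python
from typing import Dict, Iterable, List

# Hypothetical helper: the real Operator works with Kubernetes API objects,
# not plain dictionaries. This only illustrates the intended rule.
def resources_to_delete(
    resources: Iterable[Dict[str, str]],
    protected_kinds: Iterable[str] = ("Namespace",),
) -> List[Dict[str, str]]:
    """Return the resources that may be removed when a KFDef is deleted.

    Anything whose kind is protected (by assumption, the namespace itself)
    is left alone, because other applications may still be running there.
    """
    protected = set(protected_kinds)
    return [r for r in resources if r["kind"] not in protected]


deployed = [
    {"kind": "Namespace", "name": "my-project"},
    {"kind": "Deployment", "name": "jupyterhub"},
    {"kind": "Service", "name": "jupyterhub"},
]

for resource in resources_to_delete(deployed):
    print(f"would delete {resource['kind']}/{resource['name']}")
```

With this rule in place, the workloads created for the instance are cleaned up while the namespace and anything else living in it stay untouched.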

Caches and collisions

Kubeflow’s kfctl tool downloads and caches the manifests locally so you do not need to download them on each command run. It also generates the Kustomize structure and manifests based on the KFDef content.

This flow works fine locally, where you can manually move things around, but in the Operator it meant that every KFDef custom resource used the same cache as the first one. As a result, all of the instances were also deployed from the same generated Kustomize manifests. That behavior is clearly wrong and caused a lot of problems.

The new version of the Operator handles this problem much better. It puts the cache and Kustomize manifests in directories based on the namespace and KFDef name, and it also reloads the cache whenever necessary to accommodate any potential manifest changes.
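To make the idea of per-instance directories concrete, here is a minimal Python sketch. It is not the Operator's implementation: the base directory, the `.cache` and `kustomize` subdirectory names, and the function `instance_dirs` are all assumptions made for illustration. The only point taken from the text is that the paths are derived from the namespace and the KFDef name, so two instances can no longer collide.

```python
from pathlib import Path
from typing import Dict

def instance_dirs(base: Path, namespace: str, kfdef_name: str) -> Dict[str, Path]:
    """Derive separate working directories for one KFDef instance.

    The layout (<base>/<namespace>/<kfdef_name>/...) is a hypothetical one;
    what matters is that both identifiers are part of the path.
    """
    root = base / namespace / kfdef_name
    return {
        "cache": root / ".cache",         # downloaded manifests (assumed name)
        "kustomize": root / "kustomize",  # generated Kustomize structure (assumed name)
    }


base = Path("/tmp/odh-operator")  # assumed location, for illustration only
first = instance_dirs(base, "team-a", "opendatahub")
second = instance_dirs(base, "team-b", "opendatahub")

# Same KFDef name, different namespaces: the directories no longer overlap.
print(first["cache"])
print(second["cache"])
```

Keying the path on both values matters because KFDef names are only unique within a namespace, so the name alone would not be enough to avoid collisions.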

Building the Operator image

The last change we needed in the Operator concerns how the image gets built. Since operator-sdk names the manager binary after the cloned directory, we needed to parameterize that name in the Dockerfile to accommodate the change from kfctl to opendatahub-operator. We also prefer to run on top of the Universal Base Image (UBI), which is not the case with the Kubeflow Operator, so we made the build process customizable to allow alternative Dockerfiles to be plugged in.

Manifests

We made a few updates in the odh-manifests repository mainly to accommodate missing components and improve the documentation.

READMEs

We added basic descriptions of all the components in the READMEs. Simply go to the repository and click on a component. You will see a README describing the component's purpose, its dependencies, and its configuration options, along with examples of how to enable the component in the KFDef resource.
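As a rough illustration of what enabling a component in the KFDef resource involves, the Python sketch below assembles a KFDef-shaped dictionary and prints it as JSON. The helper `kfdef_with_components`, the repository alias `manifests`, the component paths, and the tarball URI are assumptions made for this example; the README of each component remains the authoritative source for its actual path and parameters.

```python
import json
from typing import Any, Dict

def kfdef_with_components(
    name: str, components: Dict[str, str], manifests_uri: str
) -> Dict[str, Any]:
    """Build a KFDef-like document that enables the given components.

    `components` maps an application name to its path inside the manifests
    repository. Both are illustrative values supplied by the caller.
    """
    return {
        "apiVersion": "kfdef.apps.kubeflow.org/v1",
        "kind": "KfDef",
        "metadata": {"name": name},
        "spec": {
            "applications": [
                {
                    "name": app_name,
                    "kustomizeConfig": {
                        "repoRef": {"name": "manifests", "path": path}
                    },
                }
                for app_name, path in components.items()
            ],
            # "manifests" is an alias that the applications above refer to.
            "repos": [{"name": "manifests", "uri": manifests_uri}],
        },
    }


kfdef = kfdef_with_components(
    name="opendatahub",
    components={
        "odh-common": "odh-common",            # assumed path
        "jupyterhub": "jupyterhub/jupyterhub",  # assumed path
    },
    # Assumed URI format for a tagged release of the manifests repository.
    manifests_uri="https://github.com/opendatahub-io/odh-manifests/tarball/v0.6.1",
)

print(json.dumps(kfdef, indent=2))
```

Adding or removing a component then amounts to adding or removing one entry in `spec.applications`, which is the pattern the README examples follow.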

JupyterHub

As mentioned in our announcement for ODH 0.6.0, one of the JupyterHub dependencies (JupyterHub Singleuser Profiles) went through some changes, and we found a few regressions. They have all been fixed, including one that blocked the successful deployment of Jupyter notebook servers on GPU-enabled nodes.

AI Library and Seldon

We omitted AI Library from the previous release because we were missing Seldon, which is a dependency of AI Library. Since the Seldon Operator was recently certified, we were able to add it via the Operator Lifecycle Manager and thus enable AI Library again.

Testing and continuous integration

One of the main long-standing issues with Open Data Hub development and maintenance was the lack of automated testing for incoming pull requests (PRs). As a result, all of our verification was manual and took a lot of time. Ever since we began planning our move to GitHub, we have held high hopes for OpenShift CI as a viable solution for our continuous integration infrastructure.

We are happy to share that we are now hooked into OpenShift CI and the tests are running on all of the PRs in the odh-manifests repository. We will work on adding more tests and keep an eye on new PRs to make sure they come with tests to avoid introducing regressions in the future.

Last updated: February 5, 2024