A development roadmap for Open Data Hub
Open Data Hub (ODH) is a blueprint for building an AI-as-a-Service (AIaaS) platform on Red Hat’s Kubernetes-based OpenShift 4.x. The Open Data Hub team recently released Open Data Hub 0.6.0, followed by a smaller update, Open Data Hub 0.6.1.
We recently got together and discussed our plans and timeline for the next two releases. Our plans are based on the roadmap slide deck that we put together and presented during the Open Data Hub community meeting on April 6.
In this article, we present our roadmap for the next several Open Data Hub releases. We want to emphasize that the target dates are optimistic: they describe what we would like to achieve. Given the current state of the world and the upcoming vacation season, these dates might change.
Open Data Hub 0.7: End of June 2020
Kubeflow 1.0 on OpenShift
The primary goal of this initiative is to verify that Kubeflow 1.0 works on Red Hat OpenShift and fix the issues that we find. Another goal is to document and ideally automate some of the verification processes to start enabling continuous integration (CI) for Kubeflow on OpenShift.
Note: Because KFServing is still a young project that changes frequently, we have disabled it while we work to enable Kubeflow 0.7 on OpenShift. As part of our work in this area, we will continue to investigate the state of KFServing and any potential issues with it.
Improve Open Data Hub CI
Continuous integration (CI) based on OpenShift CI is already running for the odh-manifests repository, but the test suite is minimal. The goal of this initiative is to extend and improve the tests for all components. We want to verify not only that the containers have started but also that the components actually work, which means adding functional tests.
Another aspect of this project is to enable CI for other Open Data Hub repositories, mainly opendatahub-operator.
Start mixing Open Data Hub and Kubeflow components
Ever since we started building Open Data Hub on top of Kubeflow, we have planned to support mixing components from Open Data Hub and Kubeflow. This is currently possible, but there is no guarantee that the components will run and work well together.
We have found issues with running Kubeflow in custom namespaces. Kubeflow also depends heavily on Istio, which Open Data Hub does not yet use. For this project, we will choose one Kubeflow component and test it while it runs as part of Open Data Hub. The first candidate is one of the training-job operators: TFJob or PyTorchJob.
Add an object storage component
Open Data Hub relies on Amazon Simple Storage Service (Amazon S3)-compatible object storage. Currently, we have no easy way to provide such storage to our users, so we plan to add a component that implements S3-compatible object storage. Candidates include OpenShift Container Storage (OCS) and Rook Ceph.
Because the default installation of OCS is very resource-intensive, we will work with the OCS team on a smaller (non-production) installation that can be used for development, testing, workshops, and similar use cases.
Convert Data Catalog to Kustomize
Open Data Hub 0.8: End of August 2020
For the summer release, we are focusing on automation and making sure Open Data Hub and Kubeflow work well together.
We will keep improving CI for Open Data Hub, but the second part of the story is continuous deployment (CD). We have two targets for this initiative: the Internal Data Hub, which runs inside Red Hat, and an Open Data Hub instance deployed in the Mass Open Cloud.
The idea is to automatically deploy new versions of Open Data Hub into staging on both of these targets. Because Open Data Hub consists of at least three main parts (the operator, the Open Data Hub manifests, and the Kubeflow manifests), we first need to work out what continuous deployment actually means for Open Data Hub. Our goal is to come up with a plan for what to automate and how we will automate the deployment.
Universal base images (UBI) for Open Data Hub
Red Hat is doing great work developing the Red Hat Universal Base Image (UBI), and we want to leverage it as much as possible. The goal of this initiative is to verify that all of our components run on UBI and to work with the upstream communities wherever they do not.
Continue mixing Open Data Hub and Kubeflow components
Assuming our experiment with mixing components in Open Data Hub 0.7 goes well, we will continue to test, fix, and verify more of these components. For Open Data Hub 0.8, we will finish the components on our priority list before we start on the next set.
Open Data Hub 0.9: Autumn 2020
This is the furthest-out release for which we currently have even a rough plan. With this release, we will focus on enterprise requirements for Open Data Hub.
Disconnected deployment of Open Data Hub and Kubeflow is a big topic, especially for the financial sector and for AI on the Edge use cases. The Kubeflow community has attempted to solve this problem in the past, so we will need to decide whether we can build on that work or should start from scratch.
Along with our push for a UBI-based Open Data Hub, we would like to take a look at Kubeflow components and images and find a way to port and maintain them on UBI. Reproducibility and automation are key to the long-term success of this project.
Follow the roadmap!
As part of creating the new roadmap, we are also redesigning the Open Data Hub Roadmap page on opendatahub.io. We will keep that page updated so that you can always see where we stand in our current and future plans for Open Data Hub.