Red Hat OpenShift Data Science is a managed cloud service built from a curated set of components from the upstream Open Data Hub project. It aims to provide a stable sandbox in which data scientists can develop, train, and test their machine learning (ML) workloads and then deploy results in a container-ready format. This article summarizes the advantages of using OpenShift Data Science in your machine learning projects.
Containers make data science easy
While tools like JupyterLab (shown in Figure 1) already offer intuitive ways for data scientists to develop models on their own machines, collaborating and sharing work still involves inherent complexity. Moreover, specialized hardware such as powerful GPUs can be very expensive to buy and maintain yourself. JupyterHub, which is included with OpenShift Data Science, lets data scientists take their development environments to the cloud. Because all workloads run as containers, collaboration is as easy as sharing an image with your team members, or simply adding it to the list of default containers they can use. GPUs and large amounts of memory also become much more accessible, because you are no longer limited by what your laptop can support. And you keep the same user experience and development workflow you've always loved.
Securely built notebook images
Software stacks, especially those involved in machine learning, tend to be complex beasts. The Python ecosystem offers numerous modules and libraries, so deciding which libraries to use, and which versions of them, can be very challenging. As Figure 2 shows, OpenShift Data Science comes with many packaged notebook images that have been built with insight from data scientists and from recommendation engines such as Thoth Adviser. This lets data scientists start new projects quickly and on the right foot, without worrying about downloading unproven and possibly insecure images from random upstream repositories.
Integrations with third-party machine learning tools
We have all run into situations where our favorite tools or services don't play well with one another. OpenShift Data Science is designed with flexibility in mind. As Figure 3 shows, a wide range of open source and third-party AI/ML tools can be used with OpenShift Data Science. These tools support the complete machine learning lifecycle, from data engineering and feature extraction to model deployment and management. No more leaving your favorite toys behind.
Tried and tested with Operate First
The Open Data Hub is an open source community project consisting of over 30 AI/ML tools that together cover the needs of a machine learning initiative across its entire lifecycle. The Operate First initiative deploys a subset of the most-used components in an open environment to gain additional operational expertise and to help harden the upstream project. OpenShift Data Science takes a core set of the most commonly used and stable components and delivers them as a managed cloud service on Red Hat OpenShift Dedicated and Red Hat OpenShift Service on AWS. This means that data scientists can focus on rapid iteration and experimentation while benefiting from Red Hat's experience in running complex workloads on Red Hat OpenShift.
Conclusion
Find out more about OpenShift Data Science or watch this video demo to see it in action. You can try out the upstream Open Data Hub project yourself at https://opendatahub.io/.
Last updated: February 5, 2024