We hope you have enjoyed the first four Red Hat OpenShift Data Science learning paths.

To complement these resources, we have released a new data science learning path that will guide you through developing a PyTorch model that will be used to predict the onset of diabetes. This article describes the PyTorch learning path and provides an overview of OpenShift Data Science.

Note: Visit the OpenShift Data Science page to see our complete library of learning paths and other resources for developers and data scientists collaborating on intelligent applications.

Build, train, and run a PyTorch model

In How to create a PyTorch model, you will perform the following tasks:

  1. Start your Jupyter notebook server for PyTorch.
  2. Explore the diabetes data set.
  3. Build, train, and run your PyTorch model.

This learning path is the first in a three-part series about working with PyTorch models. It shows you how to explore your data set and create a basic PyTorch model that predicts whether a person might have diabetes based on current medical readings. You will work with a data set that contains medical readings for female patients with and without diabetes.
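To give a feel for what "build, train, and run" can look like in PyTorch, here is a minimal sketch. It is not the code from the learning path: the network size, learning rate, and epoch count are illustrative assumptions, and the inputs are random placeholder tensors rather than real patient readings. The only detail taken from the data set is the assumption of eight input features per patient.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the sketch reproducible

# Placeholder data: 64 random rows with 8 features each, standing in for
# the diagnostic measurements. These are NOT real patient readings.
n_features = 8
X = torch.randn(64, n_features)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)  # synthetic 0/1 labels

# Build: a small feed-forward network that outputs one logit per row.
# The layer sizes are arbitrary choices for illustration.
model = nn.Sequential(
    nn.Linear(n_features, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

# Train: binary cross-entropy on the logits, optimized with Adam.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Run: turn logits into probabilities, then threshold at 0.5.
model.eval()
with torch.no_grad():
    probabilities = torch.sigmoid(model(X))
    predictions = (probabilities >= 0.5).float()

accuracy = (predictions == y).float().mean().item()
print(f"final loss {losses[-1]:.3f}, training accuracy {accuracy:.2f}")
```

With real data you would typically hold out part of the data set for evaluation instead of measuring accuracy on the training rows, as this sketch does for brevity.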

The Diabetes data set

The Diabetes data set can be used to predict the onset of diabetes based on medical diagnostic measurements. It is available on Kaggle, where it is described as follows:

“This data set is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the data set is to diagnostically predict whether a patient has diabetes based on diagnostic measurements included in the data set. Several constraints were placed on selecting these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.”

The data set consists of 768 examples of various medical readings for female patients who are members of an indigenous nation. Some of the patients have diabetes. Knowing what medical readings look like for a person with diabetes, can we predict which people might have diabetes based on the medical readings we have gathered?
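A first look at a data set like this might go as in the sketch below, which uses pandas. The column names follow the layout of the Kaggle CSV as we understand it, so treat them as an assumption and check them against your own copy. The four rows are made up for illustration and are not real patient records; in practice you would load the downloaded file instead (the `diabetes.csv` name in the comment is hypothetical).

```python
import io

import pandas as pd

# Made-up rows in the assumed layout of the Kaggle CSV.
# These are NOT real patient records.
sample_csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,30,0,31.5,0.45,34,1\n"
    "0,95,64,22,85,24.1,0.20,23,0\n"
    "5,160,80,0,0,36.2,0.80,51,1\n"
    "1,88,0,18,60,22.7,0.15,27,0\n"
)

# In practice: df = pd.read_csv("diabetes.csv")  (hypothetical file name)
df = pd.read_csv(sample_csv)

# Size of the table and summary statistics for each reading.
print(df.shape)
print(df.describe())

# How many patients fall into each class (assumed: 1 = diabetes, 0 = none).
class_counts = df["Outcome"].value_counts()
print(class_counts)

# A zero in these columns is not a plausible measurement, so it may be
# standing in for a missing value. Count zeros before deciding how to
# handle them.
suspect_columns = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
zero_counts = (df[suspect_columns] == 0).sum()
print(zero_counts)
```

Checking class balance and implausible zeros early helps you decide whether to impute values or reweight classes before training a model.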

Let's dive in and see if we can create a PyTorch model to achieve this. Start the learning path now.

What is OpenShift Data Science?

OpenShift Data Science is a platform that makes it easier for developers and data scientists to develop, deploy, and monitor machine learning models. As a comprehensive environment built on top of Red Hat OpenShift, OpenShift Data Science integrates Jupyter notebooks—the core IDE where data scientists train models—with model development frameworks such as TensorFlow and PyTorch.

You can think of OpenShift Data Science as a meta-operator that sits above other Kubernetes Operators and combines them into a coherent, integrated environment. Currently, OpenShift Data Science partner technologies include:

  • Anaconda Commercial Edition for secure distribution and package management
  • IBM Watson Studio for building and managing models at scale and for AutoML
  • Intel OpenVINO and oneAPI AI analytics toolkits for optimizing and tuning models
  • Seldon Deploy for deploying, managing, and monitoring models
  • Starburst Galaxy for data integration

Support for NVIDIA accelerated computing is also coming soon.

Note: You can also try OpenShift Data Science in the Developer Sandbox for Red Hat OpenShift.

Where can I learn more?

Visit the OpenShift Data Science landing page to learn more about how data scientists, data engineers, and application developers use this service to collaborate across the intelligent application life cycle.

Last updated: September 20, 2023