
Introduction to machine learning with Jupyter notebooks

May 21, 2021
Ishu Verma
Related topics:
Artificial intelligence, Edge computing, Kubernetes
Related products:
Red Hat OpenShift

    Recently, I was working on an edge computing demo that uses machine learning (ML) to detect anomalies at a manufacturing site. This demo is part of the AI/ML Industrial Edge Solution Blueprint announced last year. As stated in the documentation on GitHub, the blueprint enables declarative specifications that can be organized in layers and that define all the components used within an edge reference architecture, such as hardware, software, management tools, and tooling.

    At the beginning of the project, I had only a general understanding of machine learning and lacked the practitioner's knowledge to do something useful with it. Similarly, I’d heard of Jupyter notebooks but didn’t really know what they were or how to use one.

    This article is geared toward developers who want to understand machine learning and how to carry it out with a Jupyter notebook. You'll learn about Jupyter notebooks by building a machine learning model to detect anomalies in the vibration data for pumps used in a factory. An example notebook will be used to explain the notebook concepts and workflow. There are plenty of great resources available if you want to learn how to build ML models.

    What is a Jupyter notebook?

    Computational notebooks have long been used as electronic lab notebooks to document procedures, data, calculations, and findings. Jupyter notebooks provide an interactive computational environment for developing data science applications.

    Jupyter notebooks combine software code, computational output, explanatory text, and rich content in a single document. Notebooks allow in-browser editing and execution of code and display computation results. A notebook is saved with an .ipynb extension. The Jupyter Notebook project supports dozens of programming languages, its name reflecting support for Julia (Ju), Python (Py), and R.

    You can try a notebook by using a public sandbox or by running your own server, such as JupyterHub. JupyterHub serves notebooks for multiple users: It spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. In this article, JupyterHub will be running on Kubernetes.

    The Jupyter notebook dashboard

    When the notebook server first starts, it opens a new browser tab showing the notebook dashboard. The dashboard serves as a homepage for your notebooks. Its main purpose is to display the portion of the filesystem accessible by the user and to provide an overview of the running kernels, terminals, and parallel clusters. Figure 1 shows a notebook dashboard.

    Figure 1: A notebook dashboard.

    The following sections describe the components of the notebook dashboard.

    Files tab

    The Files tab provides a view of the filesystem accessible by the user. This view is typically rooted to the directory in which the notebook server was started.

    Adding a notebook

    A new notebook can be created by clicking the New button or uploaded by clicking the Upload button.

    Running tab

    The Running tab displays the currently running notebooks known to the server.

    Working with Jupyter notebooks

    When a notebook is opened, a new browser tab is created that presents the notebook's user interface. Components of the interface are described in the following sections.

    Header

    At the top of the notebook document is a header that contains the notebook title, a menu bar, and a toolbar, as shown in Figure 2.

    Figure 2: A notebook header.

    Body

    The body of a notebook is composed of cells. Cells can be included in any order and edited at will. The contents of the cells fall under the following types:

    • Markdown cells: These contain text with markdown formatting, explaining the code or containing other rich media content.
    • Code cells: These contain the executable code.
    • Raw cells: These are used when text needs to be included in raw form, without execution or transformation.

    Users can read the markdown and text cells and run the code cells. Figure 3 shows examples of cells.

    Figure 3: Examples of cells.

    Editing and executing a cell

    The notebook user interface is modal. This means that the keyboard behaves differently depending on what mode the notebook is in. A notebook has two modes: edit and command.

    When a cell is in edit mode, it has a green cell border and shows a prompt in the editor area, as shown in Figure 4. In this mode, you can type into the cell, like a normal text editor.

    Figure 4: A cell in edit mode.

    When a cell is in command mode, it has a blue cell border, as shown in Figure 5. In this mode, you can use keyboard shortcuts to perform notebook and cell actions. For example, pressing Shift+Enter in command mode executes the current cell.

    Figure 5: A cell in command mode.

    Running code cells

    To run a code cell:

    1. Click anywhere inside the [ ] area at the top left of a code cell. This will bring the cell into command mode.
    2. Press Shift+Enter or choose Cell > Run from the menu.

    Code cells are run in order; that is, each code cell runs only after all the code cells preceding it have run.

    Getting started with Jupyter notebooks

    The Jupyter Notebook project supports many programming languages. We’ll use IPython in this example; it runs standard Python syntax while providing a more interactive experience. You’ll need the following Python libraries to do the mathematical computations needed for machine learning:

    • NumPy: For creating and manipulating vectors and matrices.
    • Pandas: For analyzing data and for data wrangling or munging. Pandas takes data such as a CSV file or a database, and creates from it a Python object called a DataFrame. A DataFrame is the central data structure in the Pandas API and is similar to a spreadsheet as follows:
      • A DataFrame stores data in cells.
      • A DataFrame has named columns (usually) and numbered rows.
    • Matplotlib: For visualizing data.
    • Scikit-learn: For supervised and unsupervised learning. This library provides various tools for model fitting, data preprocessing, model selection, and model evaluation. It has built-in machine learning algorithms and models called estimators. Each estimator can be fitted to some data using its fit method.
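    The DataFrame structure described above can be demonstrated in a few lines. This sketch builds a small DataFrame in memory; the pump readings and column names are made up for illustration:

```python
import pandas as pd

# A tiny in-memory DataFrame illustrating the structure described
# above: named columns and numbered rows. All values are made up.
df = pd.DataFrame(
    {"ts": [1, 2, 3],          # time series
     "id": ["pump-1"] * 3,     # pump ID
     "value": [77, 45, 84]}    # vibration value
)

print(df.columns.tolist())  # ['ts', 'id', 'value']
print(len(df))              # 3 rows, numbered 0..2
```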

    Using a Jupyter notebook for machine learning

    We’ll be using the MANUela ML model as a notebook example to explore various components needed for machine learning. The data used to train the model is located in the raw-data.csv file.

    The notebook follows the workflow shown in Figure 6. An explanation of the steps follows.

    Figure 6: Notebook workflow for machine learning.

    Step 1: Explore raw data

    Use a code cell to import the required Python libraries. Then, convert the raw data file (raw-data.csv) to a DataFrame with a time series, an ID for the pump, a vibration value, and a label indicating an anomaly. The required Python code is shown in a code cell in Figure 7.

    Figure 7: Importing libraries and converting raw data.
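    Because the code in Figure 7 appears as a screenshot, here is a minimal sketch of the same step. It assumes raw-data.csv has columns along the lines of ts, id, value, and label; the notebook's real file may differ:

```python
import io

import pandas as pd

# Stand-in for raw-data.csv: a time series, a pump ID, a vibration
# value, and a label (1 = anomaly, 0 = normal). The column names
# are assumptions for illustration.
csv_text = """ts,id,value,label
1,pump-1,77,0
2,pump-1,45,1
3,pump-1,84,0
"""

# In the notebook this would be pd.read_csv("raw-data.csv").
df_raw = pd.read_csv(io.StringIO(csv_text))
print(df_raw.shape)  # (3, 4)
```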

    Running the cell produces a DataFrame with raw data, shown in Figure 8.

    Figure 8: DataFrame with raw data.

    Now visualize the DataFrame. The upper graph in Figure 9 shows a subset of the vibration data. The lower graph shows manually labeled data with anomalies (1 = anomaly, 0 = normal). These are the anomalies that the machine learning model should detect.

    Figure 9: Visualizing raw data and anomalies.
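    A rough sketch of such a two-panel plot with Matplotlib, using made-up data in place of the notebook's DataFrame:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up vibration readings and hand-labeled anomalies.
df = pd.DataFrame({"value": [77, 45, 84, 30, 90],
                   "label": [0, 1, 0, 1, 0]})

# Upper panel: vibration data; lower panel: anomaly labels.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(df.index, df["value"])
ax1.set_ylabel("vibration")
ax2.plot(df.index, df["label"])
ax2.set_ylabel("anomaly")
fig.savefig("raw-data.png")
```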

    Before it can be analyzed, the raw data needs to be transformed, cleaned, and structured into other formats more suitable for analysis. This process is called data wrangling or data munging.

    We’ll be converting the raw time series data into small episodes that can be used for supervised learning. The code is shown in Figure 10.

    Figure 10: Creating a new DataFrame.

    We want to convert the data to a new DataFrame with episodes of length 5. Figure 11 shows a sample time series data set.

    Figure 11: Example of time series data.

    If we convert our sample data into episodes with length = 5, we get results similar to Figure 12.

    Figure 12: New DataFrame with episodes.

    Let’s now convert our time series data into episodes, using the code in Figure 13.

    Figure 13: Converting data into episodes.
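    The conversion amounts to sliding a window of length 5 over the series and attaching the label of the newest point in each window. A simplified sketch, not the notebook's exact code:

```python
import pandas as pd

def to_episodes(values, labels, length=5):
    """Slide a window of `length` over the series. Each row holds
    F1 (oldest) through F5 (newest) plus the newest point's label L."""
    rows = []
    for i in range(len(values) - length + 1):
        rows.append(list(values[i:i + length]) + [labels[i + length - 1]])
    cols = [f"F{j}" for j in range(1, length + 1)] + ["L"]
    return pd.DataFrame(rows, columns=cols)

# Made-up vibration values and anomaly labels.
values = [77, 45, 84, 30, 90, 66, 55]
labels = [0, 0, 0, 1, 0, 0, 1]
episodes = to_episodes(values, labels)
print(episodes.shape)  # (3, 6): three episodes of F1..F5 plus L
```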

    Figure 14 shows the data with episodes of length 5 and the label in the last column.

    Figure 14: Episodes of length 5 with the label in the last column.

    Note: In Figure 14, column F5 is the latest data value and column F1 is the oldest for a given episode. The label L indicates whether there is an anomaly.

    The data is now ready for supervised learning.

    Step 2: Feature and target columns

    Like many machine learning libraries, scikit-learn requires separate feature (X) and target (Y) columns. So Figure 15 splits our data into feature and target columns.

    Figure 15: Splitting data into feature and target columns.
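    A sketch of this split, assuming the episode DataFrame has feature columns F1..F5 and label column L as described above:

```python
import pandas as pd

# Hypothetical episode DataFrame in the shape built earlier.
episodes = pd.DataFrame(
    [[77, 45, 84, 30, 90, 0],
     [45, 84, 30, 90, 66, 0],
     [84, 30, 90, 66, 55, 1]],
    columns=["F1", "F2", "F3", "F4", "F5", "L"],
)

X = episodes.drop(columns=["L"])  # features: everything but the label
Y = episodes["L"]                 # target: the label column
print(X.shape, Y.shape)  # (3, 5) (3,)
```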

    Step 3: Training and testing data sets

    It’s good practice to divide your data set into two subsets: one to train the model and the other to test the trained model.

    Our goal is to create a model that generalizes well to new data. Our test set will serve as a proxy for new data. We’ll split the data set into 67% for the training set and 33% for the test set, as shown in Figure 16.

    Figure 16: Splitting data into training and test data sets.
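    A sketch of the 67/33 split with scikit-learn's train_test_split, using made-up episodes in place of the notebook's data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Nine made-up episodes: feature columns F1..F5 and labels Y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 100, size=(9, 5)),
                 columns=[f"F{i}" for i in range(1, 6)])
Y = pd.Series([0, 0, 1, 0, 0, 1, 0, 0, 1], name="L")

# 67% training set, 33% test set, as in Figure 16.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42
)
print(len(X_train), len(X_test))  # 6 3
```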

    We can see that the anomaly rate is similar in the training and test sets; that is, the data set is divided fairly evenly.

    Step 4: Model training

    We will perform model training with a DecisionTreeClassifier. Decision trees are a supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

    DecisionTreeClassifier is a class that can perform multi-class classification on a dataset, although in this example we’ll use it for binary classification (anomaly or normal). DecisionTreeClassifier takes as input two arrays: an array X of features and an array Y of labels. After being fitted, the model can be used to predict labels for the test data set. Figure 17 shows our code.

    Figure 17: Model training with DecisionTreeClassifier.
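    A toy sketch of the fit-and-score step (the notebook trains on the real episode data; the arrays below are made up):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up episodes: high-vibration rows labeled 1, quiet rows 0.
X_train = [[77, 45, 84, 30, 90],
           [45, 84, 30, 90, 66],
           [10, 12, 11, 13, 12],
           [11, 13, 12, 10, 11]]
Y_train = [1, 1, 0, 0]

# Fit the estimator, then report accuracy with score().
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, Y_train)
print(clf.score(X_train, Y_train))  # 1.0 on this toy data
```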

    We can see that the model achieves a high accuracy score.

    Step 5: Save the model

    Save the model and load it again to validate that it works, as shown in Figure 18.

    Figure 18: Saving the model.
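    One common way to persist a scikit-learn model is joblib; the sketch below saves a stand-in model and reloads it to confirm the round trip works (the notebook's exact persistence code may differ):

```python
import os
import tempfile

from joblib import dump, load
from sklearn.tree import DecisionTreeClassifier

# Train a tiny stand-in model so the example is self-contained.
clf = DecisionTreeClassifier(random_state=42).fit([[0], [1]], [0, 1])

# Save the model to disk, then load it back and validate it.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
dump(clf, path)
restored = load(path)
print(restored.predict([[0], [1]]))  # same predictions as the original
```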

    Step 6: Inference with the model

    Now that we've created the machine learning model, we can use it for inference on real-time data.

    In this example, we’ll be using Seldon to serve the model. For our model to run under Seldon, we need to create a class that has a predict method. The predict method can receive a NumPy array X and return the result of the prediction as:

    • A NumPy array
    • A list of values
    • A byte string

    Our code is shown in Figure 19.

    Figure 19: Using Seldon to serve the machine learning model.
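    A minimal sketch of such a class (the class name and inline training are illustrative, not the notebook's code; the real class would load the saved model from disk):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class AnomalyDetection:
    """Seldon-style wrapper: Seldon calls predict() with a NumPy
    array X. Names here are illustrative, not the notebook's code."""

    def __init__(self):
        # The deployed class would load the persisted model here;
        # a toy model is trained inline to keep this runnable.
        self.clf = DecisionTreeClassifier(random_state=42).fit(
            [[77, 45, 84, 30, 90], [10, 12, 11, 13, 12]], [1, 0]
        )

    def predict(self, X, feature_names=None):
        # May return a NumPy array, a list of values, or a byte string.
        return self.clf.predict(np.asarray(X))

model = AnomalyDetection()
print(model.predict([[11, 12, 10, 13, 11]]))  # [0]: no anomaly
```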

    Finally, let’s test whether the model can predict anomalies for a list of values, as shown in Figure 20.

    Figure 20: Inference using the model.

    We can see that the model achieves a high score for inference, as well.

    References for this article

    See the following sources for more about the topics discussed in this article:

    • Welcome to Colaboratory
    • About GESIS Notebooks
    • The Project Jupyter homepage

    Last updated: August 15, 2022
