Methods for understanding data

MLOps, short for machine learning operations, is a set of practices and tools that combines DevOps principles applied to the development cycle of artificial intelligence applications.

Kubeflow Pipelines is an open source platform for implementing MLOps, providing a framework for building, deploying, and managing machine learning workflows in a scalable, repeatable, secure, and cloud-oriented manner on Kubernetes.

With the ability to drive agility and efficiency in the development and deployment of machine learning models, MLOps with Kubeflow Pipelines can also improve collaboration between data scientists and machine learning engineers, ensuring consistency and reliability throughout every step of the workflow.

MLOps

Created from the DevOps discipline, MLOps can be defined as a combination of cultural philosophies, practices and processes supported by platforms and tools with the goal of increasing an organization's ability to deliver artificial intelligence and machine learning applications and services with greater speed, without compromising safety and quality.

In general, a machine learning model development cycle has these steps, as shown in Figure 1 and described below:

  • Gather and prepare data
  • Develop model
  • Deploy models in an application
  • Model monitoring and management
Figure 1: Machine learning workflow
Figure 1: Machine Learning Workflow
Figure 1: Machine Learning Workflow

Gather and prepare data

Gather and prepare structured and/or unstructured data from data storages, data lakes, databases, and real-time data from streams like Kafka.

Develop model

Develop the model using libraries such as TensorFlow and PyTorch.

Deploy models in an application

Automate the model build and deployment process, implementing CI/CD, using tools such as Kubeflow Pipelines.

Model monitoring and management

Use tools for observability, such as Prometheus and Grafana, implementing logging and monitoring and using this feedback to improve model performance, in addition to detecting biases and model drift.

Kubeflow Pipelines

In the context of MLOps, an AI/ML pipeline refers to the end-to-end process of deploying and managing machine learning models in a production environment. It encompasses various stages from data preparation and model training to deployment, monitoring, and continuous improvement. An AI/ML pipeline is crucial for automating and streamlining the workflow of machine learning projects, ensuring efficiency, reproducibility, and scalability.

Kubeflow Pipelines is an open source machine learning platform designed to orchestrate and automate the building, deployment, and management of machine learning workflows on Kubernetes, with the ability to implement every step of operations on development of machine learning models, MLOps.

By leveraging Kubeflow Pipelines, organizations can streamline their machine learning operations and enable collaboration between team members, data scientists and machine learning engineers. Figure 2 shows an example of a pipeline run.

Figure 2: Kubeflow Pipelines example
Figure 2: Kubeflow Pipelines Example
Figure 2: Kubeflow Pipelines example

Creating a pipeline

As a requirement to create your pipeline, you need access to a Kubernetes environment with Kubeflow Pipelines installed. In addition to the infrastructure, Python (also possible with R) and Jupyter notebook (optional) will be used.

There are no restrictions, but we suggest using Red Hat OpenShift AI, as all these components are already installed by default. It is possible to create a sandbox environment through the Red Hat Developer website at no cost.

The pipeline to be created will have 3 tasks: generating a message for a given name, generating a random number within a given range and generating a final message, an odds or evens game message, based on the previous message and the random number.

All the code can be found in the ml_pipelines repository.

Creating the tasks

The tasks will be functions implemented in Python and executed by the pipeline. There are other ways to implement tasks, but in this format it is simple to maintain and reuse the code, making it possible to scale and make it also possible for different teams to work together.

You can find the implementation for all tasks in the components folder of the repository.

Create Hello World message

Creates a personalized greeting message for the given name. Here is the implementation of this task, in which you can find in the script create_hello_world_message.py:

def create_hello_world_message(name : str) -> str:
    """
    Creates a personalized greeting message for the given name.

    Parameters:
        - name (str) : The name for which the message is created.

    Returns:
        - hello_world_message (str) : A personalized greeting message for the given name.

    Raises:
        - ValueError : If the given name is empty or None.
    """

    if not name:

        raise ValueError

    hello_world_message = f'Hello World, {name}!'

    print(f'name                : { name }')
    print(f'hello_world_message : { hello_world_message }')

    return hello_world_message

Create random number

Creates a random integer within the specified range, minimum and maximum, both included.

Create odds or evens message

Creates a game result message based on the given random number, containing the given hello world message.

Create the pipeline using Kubeflow Pipelines SDK for Tekton

There is an SDK that makes creating pipelines easier: Kubeflow Pipelines SDK. In this example, we will use the Kubeflow Pipelines SDK for Tekton to compile, upload, and run Kubeflow Pipeline DSL scripts on a Kubeflow Pipelines back end with Tekton.

The code for this pipeline implementation can be found in the kfp_tekton folder of the repository, specifically on the notebook 01_hello_world.ipynb.

Install kfp-tekton package

!pip install kfp-tekton==1.5.9

Import the necessary packages, including task functions

import os
import sys
sys.path.append(os.path.dirname(os.getcwd()))

import kfp
import kfp_tekton

from components.create_hello_world_message   import create_hello_world_message
from components.create_odds_or_evens_message import create_odds_or_evens_message
from components.create_random_number         import create_random_number

Create the tasks components

Building Python function-based components.

task_base_image = 'registry.access.redhat.com/ubi9/python-311'

create_hello_world_message_op = kfp.components.create_component_from_func(
    func       = create_hello_world_message,
    base_image = task_base_image
)

create_random_number_op = kfp.components.create_component_from_func(
    func       = create_random_number,
    base_image = task_base_image
)

create_odds_or_evens_message_op = kfp.components.create_component_from_func(
    func       = create_odds_or_evens_message,
    base_image = task_base_image
)

Create the pipeline

pipeline_name        = '01_hello_world'
pipeline_description = 'Hello World Pipeline'

@kfp.dsl.pipeline(
    name        = pipeline_name,
    description = pipeline_description
)
def pipeline(
    name    : str,
    minimum : int,
    maximum : int
):

    create_hello_world_message_task = create_hello_world_message_op(
        name = name
    )

    create_random_number_task = create_random_number_op(
        minimum = minimum,
        maximum = maximum
    )

    create_odds_or_evens_message_op(
        hello_world_message = create_hello_world_message_task.output,
        random_number       = create_random_number_task.output
    )

Create pipeline YAML

In this step, a Tekton-compatible PipelineRun YAML file will be created.

pipeline_package_path = os.path.join('yaml', f'{ pipeline_name }.yaml')

kfp_tekton.compiler.TektonCompiler().compile(
    pipeline_func = pipeline,
    package_path  = pipeline_package_path
)

Using the Kubeflow Pipelines graphical interface, you can use the generated YAML file to import the pipeline, as shown in Figure 3.

Figure 3: Importing the pipeline YAML
Figure 3: Importing pipeline yaml
Figure 3: Importing the pipeline YAML

The newly imported pipeline should be visible, showing the flow of tasks and available to be executed (Figure 4).

Figure 4: Viewing the imported pipeline through the YAML file
Figure 4: Pipeline
Figure 4: Viewing the imported pipeline through the YAML file

Once imported, we can run the pipeline through the graphical interface and monitor its progress (Figure 5).

Figure 5: Running the pipeline
Figure 5: Running the pipeline
Figure 5: Running the pipeline

Create pipeline run

It is possible to execute a pipeline via code, either through the function annotated by @pipeline or by executing the pipeline definition YAML file.

kubeflow_host  = ''
kubeflow_token = ''

pipeline_arguments = {
    'name'    : 'ML Pipelines',
    'minimum' : 1,
    'maximum' : 100
}

kfp_tekton.TektonClient(host = kubeflow_host, existing_token = kubeflow_token).create_run_from_pipeline_package(
    pipeline_file = pipeline_package_path,
    arguments     = pipeline_arguments
)

As before, we can observe the pipeline execution through the Kubeflow Pipelines graphical interface (Figure 6).

Figure 6: Running pipeline from Jupyter Notebook
Figure 6: Pipeline run
Figure 6: Running pipeline from Jupyter Notebook

Creating the pipeline using Elyra

Elyra is an open source set of extensions to JupyterLab notebooks focused on AI/ML development. It provides a Pipeline Visual Editor for building AI pipelines from notebooks, Python scripts, and R scripts, simplifying the conversion of multiple notebooks or scripts files into batch jobs or workflows.

Pipelines created by this plug-in are .json files with a .pipeline extension. You can check the Hello World Pipeline in the elyra folder of the repository, and it can be opened directly in a Jupyter notebook (make sure you have the extension installed).

Before creating the pipeline, configure the runtime and runtime images to run the tasks. If you are using Red Hat OpenShift AI, all configuration will be set by default.

Drag and drop the components to create the tasks

You can also connect the outputs and inputs of the tasks (see Figure 7).

Figure 7: Elyra pipeline tasks
Figure 7: Elyra Pipeline Tasks
Figure 7: Elyra pipeline tasks

Configure tasks

Create the pipeline parameters and inject them into the corresponding tasks. Also, configure the runtime image and output file. Everything can be done from the graphical interface.

create_hello_world_message.py
runtime image: Python 3
pipeline parameter: name (str)
output: create_hello_world.json
create_random_number.py
runtime image: Python 3
pipeline parameter: minimum (int)
pipeline parameter: maximum (int)
output: create_random_number.json
create_odd_or_evens_message.py
runtime image: Python 3
output: create_odds_or_evens_message.json
 

Warning alert:

Elyra Pipeline

Using the Elyra Pipeline plug-in, components are executed in script format. It is necessary to code a wrapper to handle the execution. Example from create_hello_world_message.py:

if __name__ == '__main__':
    """
    Elyra Pipelines
    """

    import os
    import json

    name = os.getenv('name')

    hello_world_message = create_hello_world_message(
        name = name
    )

    output = {
        'name'                : name,
        'hello_world_message' : hello_world_message
    }

    with open('create_hello_world_message.json', 'w', encoding = 'utf-8') as output_file:

        json.dump(output, output_file, ensure_ascii = False, indent = 4)

If everything is configured correctly, you will be able to run the pipeline. Figure 8 shows the Hello World pipeline from the repository 01_hello_world.pipeline:

Figure 8: Elyra Hello World pipeline
Figure 8: Elyra Hello World Pipeline
Figure 8: Elyra Hello World pipeline

When running the pipeline, you should receive a job submission confirmation message, as illustrated in Figure 9.

Figure 9: Elyra Hello World pipeline job submission
Figure 9: Elyra Hello World Pipeline Job Submission
Figure 9: Elyra Hello World pipeline job submission

You can follow the execution through the Kubeflow Pipelines graphical interface (Figure 10).

Figure 10: Elyra Hello World pipeline running
Figure 10: Elyra Hello World Pipeline Run
Figure 10: Elyra Hello World pipeline running

Red Hat OpenShift AI

If you are using OpenShift AI, just like before using kfp-tekton, you can import PipelineRun YAML files or check imported pipelines directly through the OpenShift console (Figure 11).

Figure 11: OpenShift AI pipeline
Figure 11: RHOAI Pipeline
Figure 11: OpenShift AI pipeline

You can also search and check previous execution of pipelines executed through kfp-tekton or Elyra, as shown in Figure 12.

Figure 12: OpenShift AI pipeline runs
Figure 12: RHOAI Pipeline Runs
Figure 12: OpenShift AI pipeline runs

Finally, you can create new runs and experiments (Figure 13).

Red Hat OpenShift AI Pipeline Run.
Figure 13: RHOAI Pipeline Run
Figure 13: OpenShift AI pipeline run.

Conclusion

There are several tools for automating machine learning workflows. In this article, we presented 2 ways to create and run pipelines in a code (kfp-tekton) or visual way (elyra). Choose the way that best suits your process, that you and your team feel most comfortable with.

In conclusion, implementing MLOps with Kubeflow Pipelines represents a fundamental advancement in establishing a robust and efficient machine learning lifecycle.

As organizations increasingly recognize the importance of seamless collaboration between data scientists, machine learning engineers, and operations teams, Kubeflow Pipelines emerges as a powerful orchestration platform that simplifies deployment, monitoring, and management of machine learning workflows.

Last updated: April 1, 2024