
Hyperparameter Optimisation with Ray Tune

August 16, 2024
Nicholas Caughey, Kusuma Chalasani
Related topics:
Data Science
Related products:
Red Hat OpenShift AI


    An introduction to hyperparameter optimization

    In the dynamic world of machine learning, optimizing model performance is not just a goal; it's a necessity. This guide aims to simplify the intricate process of hyperparameter optimization, leveraging the power of OpenShift AI, Ray Tune, and the Model Registry to enhance model accuracy and efficiency. It is based on the example code provided in this repository, offering a practical, hands-on approach to the optimization process.

    Setup

    Before embarking on this journey, it's essential to have the right tools and resources at your disposal. You'll need:

    • An OpenShift cluster (4.0+) with OpenShift AI (RHOAI) 2.10+ installed, with:
      • The codeflare, dashboard, ray, and workbenches components enabled
    • Sufficient worker nodes for your configuration(s)
    • An AWS S3 bucket to store experimentation results

    Setting the Stage: Creating a Data Science Project and Ray Clusters on OpenShift AI

    The initial step in our optimization journey is setting up a Data Science project within the OpenShift AI cluster. To get started, ensure you have the Red Hat OpenShift AI operator installed from OperatorHub. Once installed, the operator becomes available as a service, facilitating the creation and management of Data Science projects.


    Once installation is complete, access the OpenShift AI dashboard from the top navigation bar menu:

    Image showing how to open up the OpenShift AI dashboard

    Next, create a new Data Science project. This is where you will view and manage all of the workbenches you create.

    Image showing an example of the input text for the "Create data science project" section

    Following this, if one does not already exist, create cluster storage, where local information will be stored.

    Image showing example input for "Add cluster storage" section

    Once this is complete, create a new workbench with a standard data science image. Below is an example of the workbench settings; the notebook image selections should be set to the latest versions to avoid issues. If you wish to use preexisting persistent storage, change the configuration as necessary. When the workbench is operational, you can access the Jupyter notebook environment directly.

    Image showing example input for the configuration for a new data science workbench

    After the workbench is created, we can add a data connection with the relevant details. If you are simply running this as an example and wish to create local MinIO storage (not meant for production), feel free to follow the steps found here.

    Image showing example input for "Add data connection" section

    Assuming all the previous steps have been followed, we can now open the workbench and access the Jupyter notebook environment. Here we will clone the relevant repository and begin using Ray Tune to perform hyperparameter optimization.

    After cloning, you will see three examples in the examples folder. These examples use the CodeFlare SDK to configure and launch a Ray cluster, ensuring it is fully equipped to manage the demands of our machine learning tasks. OpenShift AI optimizes for distributed workloads by integrating Ray clusters: a collection of worker nodes connected to a central Ray head node, facilitating the efficient execution of distributed workloads.

    To tailor the Ray cluster to our needs, we specify the CPU and memory resources allocated to each node. Once configured, we bring up the cluster, ready to be utilized for our Hyperparameter Optimization (HPO) tasks. This setup ensures that our project is not only well-organized but also prepared to handle the computational demands of our optimization process.

    # Import the cluster classes from the CodeFlare SDK (the exact import
    # path may vary slightly between SDK versions)
    from codeflare_sdk import Cluster, ClusterConfiguration

    # Create and configure our cluster object (and appwrapper)
    cluster = Cluster(ClusterConfiguration(
        name='terrestial-raytest',
        num_workers=2,      # number of Ray worker nodes
        min_cpus=1,         # CPU request per worker
        max_cpus=1,         # CPU limit per worker
        min_memory=4,       # memory request per worker, in GiB
        max_memory=4,       # memory limit per worker, in GiB
        num_gpus=0,
        image="quay.io/rhoai/ray:2.23.0-py39-cu121"
    ))

    An example of the configuration using the CodeFlare SDK in /demos/raytune-oai-demo.ipynb
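
    With the configuration in place, bringing the cluster up takes only a couple of SDK calls. Below is a minimal sketch; the method names match recent CodeFlare SDK releases, and it assumes you have already authenticated to the cluster:

    # Bring up the Ray cluster and block until all nodes are ready
    cluster.up()
    cluster.wait_ready()

    # Retrieve the Ray endpoint, used by ray.init() in the next section
    ray_cluster_uri = cluster.cluster_uri()
    cluster.details()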

    In the next few sections we will discuss the code contained in the examples folder of this repository. Feel free to follow along with the code contained in the notebooks.

    The Heart of the Matter: Hyperparameter Optimization with Ray Tune

    In this demo, we're focusing on finding the optimal hyperparameters for a simple neural network model using Ray Tune. This involves tuning two key parameters: hidden_size and learning_rate. Because we're leveraging a PyTorch example, it's crucial that all necessary packages, including torch and Ray Tune, are available in our cluster environment; we declare them in a runtime environment passed to ray.init() so they are installed on the cluster's nodes.

    import ray

    # Additional libs, installed on the cluster via the runtime environment
    runtime_env = {"pip": ["ipython", "torch", "onnx", "ray[train]", "protobuf==3.20.1"]}

    # ray_cluster_uri is the endpoint of the CodeFlare-managed Ray cluster
    ray.init(address=ray_cluster_uri, runtime_env=runtime_env,
             _system_config={"PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION": "python"})

    print("Ray cluster is up and running: ", ray.is_initialized())

    Ensuring the environment is correctly set up

    Once the Ray Clusters are operational, we proceed to tune the model with Ray Tune, specifying the number of samples (trials) we wish to run. After each trial concludes, we return the model's accuracy and the trial model itself.

    Upon completion of the Ray Tune process, we're presented with the best trial and the corresponding optimal hyperparameters. Ray Tune enables us to explore a broad spectrum of hyperparameters, testing various combinations to identify the one that achieves the highest accuracy. There are multiple strategies we can employ within Ray Tune to enhance our tuning process.
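
    To make the flow concrete, here is a minimal, self-contained sketch of what such a tuning run looks like with Ray Tune's Tuner API. The network, synthetic data, and search space values are illustrative assumptions, not the repository's exact notebook code:

    import torch
    import torch.nn as nn
    from ray import train, tune

    def train_model(config):
        # Synthetic binary classification data (stand-in for a real dataset)
        torch.manual_seed(0)
        X = torch.randn(512, 10)
        y = (X.sum(dim=1) > 0).long()

        # Simple neural network whose hidden size is a tunable hyperparameter
        model = nn.Sequential(
            nn.Linear(10, config["hidden_size"]),
            nn.ReLU(),
            nn.Linear(config["hidden_size"], 2),
        )
        optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(50):
            optimizer.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            optimizer.step()

        # Report the trial's accuracy back to Ray Tune
        accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
        train.report({"accuracy": accuracy})

    tuner = tune.Tuner(
        train_model,
        param_space={
            "hidden_size": tune.choice([16, 32, 64]),
            "learning_rate": tune.loguniform(1e-4, 1e-1),
        },
        tune_config=tune.TuneConfig(metric="accuracy", mode="max", num_samples=10),
    )
    results = tuner.fit()

    best = results.get_best_result()
    print("Best config:", best.config, "accuracy:", best.metrics["accuracy"])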

    The Importance of Metadata

    Metadata in HPO experiments is the goldmine of insights. It includes details about each trial's configuration, performance metrics, and even the state of the model at the end of the trial. This information is invaluable for understanding the optimization process, identifying trends, and refining future experiments.

    What Is the Model Registry?

    The example raytune-oai-MR-gRPC-demo.ipynb makes use of the Model Registry, a central repository where model developers store and manage model versions and artifact metadata. This Go-based application leverages the ml_metadata project and provides a Python API for ease of use. It is a key component of model management; in our use case, it is particularly useful for tracking the large number of model versions that are generated and used.

    Integrating Model Registry for Metadata Management

    To seamlessly integrate Model Registry into our HPO process, it's crucial to confirm its setup and readiness.

    To utilize the default Model Registry service, install the model registry operator and start the service as per the instructions provided here.

    Upon integration, each trial's metadata is captured and stored via the Model Registry. This encompasses the hyperparameters utilized, the performance metrics achieved, and any other pertinent details. The Model Registry ensures that this data is well-organized, easily accessible, and prepared for analysis.

    In our example code, we generate various types of metadata, including:

    • kf.HPOConfig (Artifact): saves HPO configurations.
    • kf.HPOExperiment (Context): saves HPO experiment details, serving as a parent to HPOTrial.
    • kf.HPOTrial (Context): saves trial information for each experiment, acting as a child of HPOExperiment.

    This metadata is saved as part of the HPO run, facilitating comprehensive analysis. The example code uses the Python gRPC API to access the Model Registry metadata, with support for REST APIs planned for the future. This allows different trials to be compared and the top optimized models to be identified in real time, even as the HPO experiment continues running other trials.
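
    As a rough illustration, trial metadata can be read back over gRPC with the ml_metadata client stubs. The endpoint address below is a placeholder for your Model Registry service; the type name kf.HPOTrial matches the context type described above:

    import grpc
    from ml_metadata.proto import metadata_store_service_pb2 as mlmd_pb2
    from ml_metadata.proto import metadata_store_service_pb2_grpc as mlmd_grpc

    # Placeholder endpoint; point this at your Model Registry gRPC service
    channel = grpc.insecure_channel("model-registry.example.svc:9090")
    stub = mlmd_grpc.MetadataStoreServiceStub(channel)

    # Fetch every kf.HPOTrial context and inspect its stored properties
    request = mlmd_pb2.GetContextsByTypeRequest(type_name="kf.HPOTrial")
    response = stub.GetContextsByType(request)
    for ctx in response.contexts:
        print(ctx.name, dict(ctx.custom_properties))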

    Enhancing Model Deployment

    Having a comprehensive record of our HPO experiments is not just about understanding past performance. It also informs our future model deployments. By analyzing the metadata, we can identify the most effective configurations and apply them to new models. This ensures that our deployments are not just successful but also optimized for the best possible performance.


    The Final Frontier: Saving and Sharing the Best Model

    Upon identifying the optimal model, it's time to make it known to the world. We choose to save our model in ONNX format, a universal standard that guarantees compatibility across a wide range of platforms and frameworks. This step is pivotal for deploying our model in various environments, ensuring its accessibility to a broader audience.

    OpenShift AI supports deployment in multiple frameworks, with ONNX being one of them. By saving our model in ONNX format, we align with this support, facilitating a smooth deployment process across different platforms.
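
    As a minimal sketch, exporting a PyTorch model to ONNX looks like the following. The architecture and hidden size are carried over from the tuning sketch above as assumptions:

    import torch
    import torch.nn as nn

    # Rebuild the winning architecture from the best trial's config
    hidden_size = 64  # e.g. best.config["hidden_size"]
    best_model = nn.Sequential(
        nn.Linear(10, hidden_size), nn.ReLU(), nn.Linear(hidden_size, 2)
    )
    best_model.eval()

    # Export to ONNX with a dynamic batch dimension
    torch.onnx.export(
        best_model,
        torch.randn(1, 10),          # dummy input matching the training shape
        "best_model.onnx",
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )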


    The Grand Finale: Deploying the Model for Inference

    The culmination of our optimization journey is the deployment of our refined model for inference. We upload our model to an AWS S3 bucket, ensuring its accessibility for practical applications. To facilitate deployment, we navigate to our Data Science project, deploy the model directly from the dashboard, and obtain the inference URLs. This allows us to access the model for real-world applications.
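
    Uploading the ONNX file with boto3 is a one-liner once credentials are in place. The bucket name, object key, and environment variables below are placeholders; on OpenShift AI, these values typically come from the data connection:

    import os
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        endpoint_url=os.environ.get("AWS_S3_ENDPOINT"),  # set for MinIO; omit for AWS
    )
    s3.upload_file("best_model.onnx", "my-bucket", "models/best_model.onnx")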

    If our interest lies in exploring the top 5 optimized models from a 50-trial experiment, we have the flexibility to save multiple models. This approach enables us to conduct further experiments with these models, enhancing our understanding and refining our optimization efforts.

    Utilizing a REST API, we can now send data to our model and receive predictions, demonstrating the effectiveness of our optimized model in a practical setting.
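
    Model servers on OpenShift AI that serve ONNX models typically expose the KServe v2 inference protocol, so a request can look like the sketch below. The endpoint URL, model name, and input tensor name are placeholders for the values shown on your dashboard:

    import requests

    infer_url = "https://<inference-endpoint>/v2/models/best_model/infer"
    payload = {
        "inputs": [
            {
                "name": "input",       # must match the ONNX input name
                "shape": [1, 10],
                "datatype": "FP32",
                "data": [0.1] * 10,    # one sample with 10 features
            }
        ]
    }
    response = requests.post(infer_url, json=payload, timeout=30)
    print(response.json())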


    Conclusion: The Journey Continues

    This guide has provided a glimpse into the world of hyperparameter optimization, showcasing how OpenShift AI, Ray Tune, and the Model Registry can be used to optimize machine learning models. As we continue our journey, we'll explore more advanced techniques and tools, always striving to push the boundaries of what's possible in machine learning.

    Last updated: October 29, 2024
    Disclaimer: Please note the content in this blog post has not been thoroughly reviewed by the Red Hat Developer editorial team. Any opinions expressed in this post are the author's own and do not necessarily reflect the policies or positions of Red Hat.
