One of the chief benefits of the CodeFlare SDK within Red Hat OpenShift AI is its ability to provide Ray features to a data scientist, abstracting away many infrastructure requirements. A user of the CodeFlare SDK simply needs to execute a small number of Python cells to get access to a Ray cluster running on OpenShift.
While this is relatively simple, some users might wish to abstract this further and have a cluster administrator manage the Ray cluster custom resources. The admin might provide a data scientist with access to the Ray dashboard and allow them to submit their jobs directly.
To demonstrate how to achieve this, the CodeFlare SDK GitHub repository features a guided Jupyter notebook to help users get started. This blog post walks through that notebook in detail, explaining each step.
Set up the remote RayJobClient
Note
Before you begin, you will need a Ray dashboard URL from your cluster admin. Then, simply open a Jupyter notebook to get started.
The first step is to import the CodeFlare SDK Python package. To do this, you can paste the following code into a cell and execute it. Alternatively, you can simply copy the guided notebook as a template and change cells as needed.
# Import dependencies from codeflare-sdk
from codeflare_sdk import RayJobClient
Next, we'll need to declare some authorization-related variables. To do this, you'll need to get an OpenShift auth token from your cluster administrator. Once you have the token, create a new cell matching the snippet below and paste your token into the auth_token placeholder.
# Setup Authentication Configuration
auth_token = "XXXX" # Replace with the actual token
header = {
'Authorization': f'Bearer {auth_token}'
}
Connect to the Ray dashboard
To use the CodeFlare SDK in this way, we need to authenticate against the Ray dashboard. You'll also need to declare the dashboard URL itself. Replace the placeholder in the cell below with your Ray dashboard URL and execute it.
# Gather the dashboard URL (provided by the creator of the RayCluster)
ray_dashboard = "XXXX" # Replace with the Ray dashboard URL
Now we need to pass all of these variables to the CodeFlare SDK's RayJobClient. Execute the following cell to perform this final piece of setup.
# Initialize the RayJobClient
client = RayJobClient(address=ray_dashboard, headers=header, verify=True)
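If your Ray dashboard is served over HTTPS with a certificate your environment doesn't already trust, the verify argument can also take a path to a CA bundle instead of a boolean, following the convention of Ray's underlying job submission client. Here is a minimal sketch, assuming the CodeFlare SDK passes this value through; the certificate path is a placeholder:
# Initialize the RayJobClient against a dashboard with a custom certificate
client = RayJobClient(
    address=ray_dashboard,
    headers=header,
    verify="/path/to/ca-bundle.crt",  # Placeholder: path to your CA bundle
)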
Submit and monitor your Ray job
You're now ready to submit a job on the Ray cluster. To do this, we'll need:
- An entrypoint_command that refers to the execution command for your job script (i.e., python <your_job_script_name>).
- A runtime_env that declares the directory in which your script is located and any prerequisites, such as installing requirements.
With the above, we'll invoke the client.submit_job() function, assigning its return value to a submission_id variable. This lets us easily pass the job's submission_id to other client functions to observe the job results within the notebook itself. Replace the placeholders within the following snippet and execute the cell to submit the job.
# Submit a job using the RayJobClient
entrypoint_command = "python XXXX" # Replace with the training script name
submission_id = client.submit_job(
    entrypoint=entrypoint_command,
    runtime_env={"working_dir": "./", "pip": "requirements.txt"},
)
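For reference, the entrypoint script itself is ordinary Ray code that runs inside the cluster. The following is a minimal, hypothetical sketch of what such a script might contain; your actual training script will differ:
# your_job_script.py -- a minimal, hypothetical job script for illustration
import ray

# When run as a submitted job, ray.init() connects to the existing cluster
ray.init()

@ray.remote
def square(x):
    return x * x

# Fan a few tasks out across the cluster and gather the results
results = ray.get([square.remote(i) for i in range(5)])
print("Results:", results)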
With the job submitted, you can visit the Ray dashboard URL and go to the Jobs tab to observe it. Alternatively, you can execute the following cell to use the client to check the job status.
# Get the job's status
client.get_job_status(submission_id)
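Assuming get_job_status returns Ray's JobStatus enum (the CodeFlare client wraps Ray's job submission API), you can also poll until the job reaches a terminal state. A minimal sketch:
# Poll the job's status until it reaches a terminal state
import time

from ray.job_submission import JobStatus

while True:
    status = client.get_job_status(submission_id)
    print(f"Job status: {status}")
    if status in {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED}:
        break
    time.sleep(5)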
You can also surface the job logs within your Jupyter notebook by executing the following cell.
# Get the job's logs
client.get_job_logs(submission_id)
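If you'd rather stream the logs live while the job runs, Ray's job submission client provides an async tail_job_logs iterator; assuming the CodeFlare RayJobClient exposes it as well, a Jupyter cell (which supports top-level await) might look like this:
# Stream the job's logs as they are produced
# (assumes tail_job_logs is passed through from Ray's JobSubmissionClient)
async for lines in client.tail_job_logs(submission_id):
    print(lines, end="")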
Summary
This article offers just a quick example of some of the ways you can leverage the CodeFlare SDK within OpenShift AI. You can find additional examples in the demo notebooks directory of the CodeFlare SDK GitHub repository.
For additional information, check out the CodeFlare SDK documentation.