One of the chief benefits of the CodeFlare SDK within Red Hat OpenShift AI is its ability to provide Ray features to a data scientist, abstracting away many infrastructure requirements. A user of the CodeFlare SDK simply needs to execute a small number of Python cells to get access to a Ray cluster running on OpenShift.
While this is relatively simple, some users might wish to abstract this further and have a cluster administrator manage the Ray cluster custom resources. The admin might provide a data scientist with access to the Ray dashboard and allow them to submit their jobs directly.
To demonstrate how to achieve this, the CodeFlare SDK GitHub repository features a guided Jupyter notebook to help users get started. This blog post walks through that notebook in detail, explaining each step.
Set up the remote RayJobClient
Note
Before you begin, you will need a Ray dashboard URL from your cluster admin. Then, simply open a Jupyter notebook to get started.
The first step is to import the CodeFlare SDK Python package. To do this, you can paste the following code into a cell and execute it. Alternatively, you can simply copy the guided notebook as a template and change cells as needed.
# Import dependencies from codeflare-sdk
from codeflare_sdk import RayJobClient
Next, we'll need to declare some authorization-related variables. To do this, you'll need to get an OpenShift auth token from your cluster administrator. Once you have the token, create a new cell matching the snippet below and paste your token into the auth_token placeholder.
# Setup Authentication Configuration
auth_token = "XXXX" # Replace with the actual token
header = {
'Authorization': f'Bearer {auth_token}'
}
Connect to the Ray dashboard
To use the CodeFlare SDK in this way, we need to authenticate against the Ray dashboard. You'll also need to declare the dashboard URL itself. Replace the placeholder in the cell below with your Ray dashboard URL and execute it.
# Gather the dashboard URL (provided by the creator of the RayCluster)
ray_dashboard = "XXXX" # Replace with the Ray dashboard URL
Now we need to pass all of these variables to the CodeFlare SDK's RayJobClient. Execute the following cell to perform this final piece of setup.
# Initialize the RayJobClient
client = RayJobClient(address=ray_dashboard, headers=header, verify=True)
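If your Ray dashboard is served over HTTPS with a certificate your environment doesn't already trust, the verify argument can also take a path to a CA bundle instead of a boolean, following the convention of Ray's underlying job submission client. Here is a minimal sketch, assuming the CodeFlare SDK passes this value through; the certificate path is a placeholder:
# Initialize the RayJobClient against a dashboard with a custom certificate
client = RayJobClient(
    address=ray_dashboard,
    headers=header,
    verify="/path/to/ca-bundle.crt",  # Placeholder: path to your CA bundle
)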
Submit and monitor your Ray job
You're now ready to submit a job on the Ray cluster. To do this, we'll need:
- An entrypoint_command that refers to the execution command for your job script (i.e., python <your_job_script_name>).
- A runtime_env that declares the directory in which your script is located and any prerequisites, such as installing requirements.
With the above, we'll invoke the client.submit_job() function, assigning its return value to a submission_id variable. This lets us easily pass the job's submission_id to other client functions to observe the job results within the notebook itself. Replace the placeholders within the following snippet and execute the cell to submit the job.
# Submit a job using the RayJobClient
entrypoint_command = "python XXXX" # Replace with the training script name
submission_id = client.submit_job(
    entrypoint=entrypoint_command,
    runtime_env={"working_dir": "./", "pip": "requirements.txt"},
)
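For reference, the entrypoint script itself is ordinary Ray code that runs inside the cluster. The following is a minimal, hypothetical sketch of what such a script might contain; your actual training script will differ:
# your_job_script.py -- a minimal, hypothetical job script for illustration
import ray

# When run as a submitted job, ray.init() connects to the existing cluster
ray.init()

@ray.remote
def square(x):
    return x * x

# Fan a few tasks out across the cluster and gather the results
results = ray.get([square.remote(i) for i in range(5)])
print("Results:", results)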
With the job submitted, you can visit the Ray dashboard URL and go to the Jobs tab to observe it. Alternatively, you can execute the following cell to use the client to check the job status.
# Get the job's status
client.get_job_status(submission_id)
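Assuming get_job_status returns Ray's JobStatus enum (the CodeFlare client wraps Ray's job submission API), you can also poll until the job reaches a terminal state. A minimal sketch:
# Poll the job's status until it reaches a terminal state
import time

from ray.job_submission import JobStatus

while True:
    status = client.get_job_status(submission_id)
    print(f"Job status: {status}")
    if status in {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED}:
        break
    time.sleep(5)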
You can also surface the job logs within your Jupyter notebook by executing the following cell.
# Get the job's logs
client.get_job_logs(submission_id)
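If you'd rather stream the logs live while the job runs, Ray's job submission client provides an async tail_job_logs iterator; assuming the CodeFlare RayJobClient exposes it as well, a Jupyter cell (which supports top-level await) might look like this:
# Stream the job's logs as they are produced
# (assumes tail_job_logs is passed through from Ray's JobSubmissionClient)
async for lines in client.tail_job_logs(submission_id):
    print(lines, end="")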
Summary
This article offers just a quick example of some of the ways you can leverage the CodeFlare SDK within OpenShift AI. You can find additional examples in the demo notebooks directory of the CodeFlare SDK GitHub repository.
For additional information, check out the CodeFlare SDK documentation.