In this article, you will learn how to perform word embedding tasks with a Sentence Transformer model deployed on the Caikit Standalone serving runtime in Red Hat OpenShift AI.
Introduction
Word embeddings are representations of text as real-valued vectors. They are the foundation of several natural language processing (NLP) applications, such as retrieval-augmented generation (RAG). Caikit NLP is a toolkit/runtime that provides modules and exposes inference endpoints for large language models (LLMs).
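To make this concrete, here is a minimal sketch of what an embedding model does, using the sentence-transformers library directly; the model name and sentences are illustrative:
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any supported embedding model behaves similarly
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Each sentence is mapped to a real-valued vector (1,024 dimensions for this model)
embeddings = model.encode([
    "What is OpenShift AI?",
    "OpenShift AI is a platform for building and serving AI models.",
])
print(embeddings.shape)  # (2, 1024)

# Semantically similar sentences yield similar vectors; RAG retrieval ranks
# documents by exactly this kind of similarity score
print(util.cos_sim(embeddings[0], embeddings[1]))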
In this tutorial, we will serve a Sentence Transformer model using the Caikit Standalone serving runtime. You will learn how to convert an LLM to Caikit format, deploy the model onto a serving runtime, and send requests to the served model.
Prerequisites
- Access to an OpenShift cluster with Red Hat OpenShift AI 2.11+ installed.
Demo setup
First, let’s set up your working environment as a project within OpenShift AI:
- Log in to the OpenShift AI dashboard on your OpenShift cluster.
- Navigate to Data Science Projects.
- Click the Create data science project button.
- Give your project a name; for example, caikit-embeddings.
- Finally, click Create.
Storage data connection
For this demo, you will need an S3-compatible object storage bucket, such as MinIO or AWS S3. If you already have a bucket deployed, skip ahead to the Create workbench section; otherwise, follow the steps below to install a local MinIO bucket.
In your terminal, log in to your OpenShift cluster:
oc login --token=<token> --server=<server>
Run the following command to install a local MinIO bucket in your project namespace:
export NAMESPACE="caikit-embeddings"
oc apply -n ${NAMESPACE} -f https://github.com/christinaexyou/caikit-embeddings-demo/raw/main/minio.yaml
Expected output:
persistentvolumeclaim/minio-pvc created
secret/minio-secret created
deployment.apps/minio created
service/minio-service created
route.route.openshift.io/minio-api created
route.route.openshift.io/minio-ui created
As a sanity check, run the following command to confirm that a MinIO pod has been deployed in your namespace:
oc get pods -n ${NAMESPACE}
Expected output:
NAME                     READY   STATUS    RESTARTS   AGE
minio-69b985c6b5-w599x   1/1     Running   0          14m
Create workbench
Next, create a workbench, where you define the image, container size, and compute resources needed to run your workload.
- Click the Workbenches tab and create a workbench with the following specifications:
Name: caikit-embeddings
Image selection: PyTorch
Version selection: 2024.1
Container size: Large
Accelerator: None
- Under Data connections, select Create new connection if you deployed your own S3 bucket prior to this demo, or select Use existing data connection if you installed MinIO in the previous section.
- If you selected Create new connection, fill out the fields according to your S3 object storage credentials. If you selected Use existing data connection, select My Storage from the list.
- Click Create workbench. You will be redirected to the project dashboard, where the workbench is starting. Wait a few minutes for its status to change from Starting to Running.
- Access your workbench by clicking Open.
Bootstrap the model
We need to bootstrap our model using the Caikit library to convert it to Caikit format. This bootstrapping step creates a config.yml file that specifies the task module in the Caikit NLP library, and saves the model's files in a new path named artifacts. For the purposes of this tutorial, we will use BAAI/bge-large-en-v1.5 as our model, but you can replace it with any of the supported embedding models.
- Once you are in your Jupyter environment, click the Git icon on the left side of the screen. Click Clone a repository and paste the following repository URL:
https://github.com/christinaexyou/caikit-embeddings-demo/tree/main
Our first step is to install the required libraries. Navigate to caikit-embeddings > notebooks > 1-boostrap-model.ipynb. A requirements.txt file has been preconfigured with the required libraries and their correct versions. Run the following command to install them:
!pip install -r requirements.txt
Import the required libraries and packages into your Jupyter environment:
from caikit_nlp.modules.text_embedding import EmbeddingModule as module
Bootstrap the model, which creates a config.yml file:
model_name_or_path = "BAAI/bge-large-en-v1.5"
output_path = "../models/bge-en-large-caikit"
module.bootstrap(model_name_or_path).save(output_path)
For a sanity check, ensure that the following files are in the models/bge-en-large-caikit directory:
ls -l ../models/bge-en-large-caikit
Expected output:
drwxr-xr-x@ 14 chrxu  staff  448 Jul 16 07:45 artifacts
-rw-r--r--@  1 chrxu  staff  336 Jul 16 07:45 config.yml
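Optionally, you can verify that the bootstrapped model loads and produces embeddings before uploading it. This is a minimal sketch; the load and run_embedding calls reflect the caikit-nlp releases current at the time of writing and may differ in other versions:
# Load the bootstrapped model back from disk and embed a test sentence.
# Assumes `module` (caikit_nlp's EmbeddingModule) is imported as shown earlier.
loaded_model = module.load(output_path)
result = loaded_model.run_embedding(text="Hello, world!")
print(result)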
Upload the model to S3 storage
Before we deploy the model, we need to upload it to S3-compatible storage (for example, AWS S3 or MinIO). Click 02-upload_model.ipynb.
Import the required libraries:
import os
import time
import boto3
import botocore
Run the following code block to retrieve your credentials and define helper functions for uploading and listing objects in your storage bucket:
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

session = boto3.session.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name
)

bucket = s3_resource.Bucket(bucket_name)

def upload_directory_to_s3(local_directory, s3_prefix):
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            s3_key = os.path.join(s3_prefix, relative_path)
            print(f"{file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)

def list_objects(prefix):
    filter = bucket.objects.filter(Prefix=prefix)
    for obj in filter.all():
        print(obj.key)
Upload your model to your S3 storage bucket:
upload_directory_to_s3("../models", "models")
Perform a sanity check to make sure that all the files have been uploaded:
list_objects("models")
Expected output: the S3 object keys of the uploaded files, including models/bge-en-large-caikit/config.yml and the model files under models/bge-en-large-caikit/artifacts/.
Deploy model
- Click File > Hub Control Panel to navigate back to the OpenShift AI dashboard. Click the Models tab and then in the Single-model serving tile, click Deploy Model.
- In the form:
  - Fill out the Model Name with the value bge-en-caikit.
  - Select the Serving runtime: Caikit Standalone Serving Runtime for KServe.
  - Select the Model framework: caikit.
  - Set the Model server replicas to 1.
  - Select the Model Server size: Small.
  - Select the Existing data connection: My Storage.
  - Enter the Path to your uploaded model: models.
- Click Deploy.
- Wait until the Status shows a green checkmark to confirm that your model has been deployed.
Model inference
- In the OpenShift AI dashboard, click the Models tab and copy the model’s inference endpoint.
- Return to the Jupyter environment and click notebooks > 3-rest_requests.ipynb.
- You can either run each cell individually or click Run all to test the different embedding endpoints. An example request is sketched below.
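To show the shape of such a request outside of the notebook, here is a minimal sketch of a REST call to the embedding task endpoint. The endpoint URL is a placeholder, and the /api/v1/task/embedding path reflects the caikit-nlp REST interface at the time of writing; consult your runtime's documentation if the request fails:
import requests

# Placeholder values: substitute the inference endpoint copied from the Models
# tab and the model name you chose during deployment
infer_endpoint = "https://bge-en-caikit-caikit-embeddings.apps.example.com"
model_id = "bge-en-caikit"

# The Caikit Standalone runtime exposes task-based REST endpoints such as
# /api/v1/task/embedding (paths can vary between caikit-nlp versions)
response = requests.post(
    f"{infer_endpoint}/api/v1/task/embedding",
    json={"model_id": model_id, "inputs": "Word embeddings represent text as vectors."},
)
print(response.json())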
Conclusion
In this article, you set up a working project in OpenShift AI, deployed a local MinIO bucket, prepared a model for serving with the Caikit NLP library, deployed it on the Caikit Standalone serving runtime, and made inference requests to perform RAG-related tasks.
The single serving platform on Red Hat OpenShift AI supports additional LLM tasks. For more information, visit the OpenShift AI product page.