In this article, you will learn how to perform word embedding tasks with a Sentence Transformer model deployed on the Caikit Standalone serving runtime in Red Hat OpenShift AI.
Introduction
Word embeddings are representations of text as real-valued vectors. They are the foundation of several natural language processing (NLP) applications, such as retrieval-augmented generation (RAG). Caikit NLP is a toolkit/runtime that provides modules and exposes inference endpoints for large language models (LLMs).
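To make this concrete, here is a minimal sketch of what an embedding model does, using the sentence-transformers library directly; the model name and sentences are illustrative:
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any supported embedding model behaves similarly
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Each sentence is mapped to a real-valued vector (1,024 dimensions for this model)
embeddings = model.encode([
    "What is OpenShift AI?",
    "OpenShift AI is a platform for building and serving AI models.",
])
print(embeddings.shape)  # (2, 1024)

# Semantically similar sentences yield similar vectors; RAG retrieval ranks
# documents by exactly this kind of similarity score
print(util.cos_sim(embeddings[0], embeddings[1]))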
In this tutorial, we will serve a Sentence Transformer model using the Caikit Standalone serving runtime. You will learn how to convert an LLM to Caikit format, deploy the model onto a serving runtime, and send requests to the served model.
Prerequisites
- Access to an OpenShift cluster with Red Hat OpenShift AI 2.11+ installed.
Demo setup
First, let’s set up your working environment as a project within OpenShift AI:
- Log in to the OpenShift AI dashboard on your OpenShift cluster.
- Navigate to Data Science Projects.
- Click the Create data science project button.
- Give your project a name; for example, caikit-embeddings.
- Finally, click Create.
Storage data connection
For this demo, you will need an S3-compatible object storage bucket, such as MinIO or AWS S3. If you already have a bucket deployed, skip ahead to the Create workbench section; otherwise, follow the steps below to install a local MinIO bucket.
In your terminal, log in to your OpenShift cluster:
oc login --token=<token> --server=<server>
Run the following command to install a local MinIO bucket in your project namespace:
export NAMESPACE="caikit-embeddings"
oc apply -n ${NAMESPACE} -f https://github.com/christinaexyou/caikit-embeddings-demo/raw/main/minio.yaml
Expected output:
persistentvolumeclaim/minio-pvc created
secret/minio-secret created
deployment.apps/minio created
service/minio-service created
route.route.openshift.io/minio-api created
route.route.openshift.io/minio-ui created
As a sanity check, run the following command to confirm that a MinIO pod has been deployed in your namespace:
oc get pods -n ${NAMESPACE}
Expected output:
NAME                     READY   STATUS    RESTARTS   AGE
minio-69b985c6b5-w599x   1/1     Running   0          14m
Create workbench
Next, create a workbench, where you define the image, container size, and compute resources needed to run your workload.
- Click the Workbenches tab and create a workbench with the following specifications:
Name: caikit-embeddings
Image selection: PyTorch
Version selection: 2024.1
Container size: Large
Accelerator: None
- Under Data connections, select Create new connection if you deployed your own S3 bucket prior to this demo, or select Use existing data connection if you installed MinIO in the previous section.
- If you selected Create new connection, fill out the fields according to your S3 object storage credentials. If you selected Use existing data connection, select My Storage from the list.
- Click Create workbench. You will be redirected to the project dashboard, where the workbench is starting. Wait a few minutes for its status to change from Starting to Running.
- Access your workbench by clicking Open.
Bootstrap the model
We need to bootstrap our model using the Caikit library to convert it to Caikit format. This bootstrapping step creates a config.yml file that specifies the task module in the Caikit NLP library, and saves the model's files in a new path named artifacts. For the purposes of this tutorial, we will use BAAI/bge-large-en-v1.5 as our model, but you can replace it with any of the supported embedding models.
- Once you are in your Jupyter environment, click the Git icon on the left side of the screen. Click Clone a repository and paste the following repository URL:
https://github.com/christinaexyou/caikit-embeddings-demo/tree/main
Our first step is to install the required libraries. Navigate to caikit-embeddings > notebooks > 1-boostrap-model.ipynb. A requirements.txt file has been preconfigured with the required libraries and their correct versions. Run the following command to install them:
!pip install -r requirements.txt
Import the required libraries and packages into your Jupyter environment:
from caikit_nlp.modules.text_embedding import EmbeddingModule as module
Bootstrap the model, which creates a config.yml file:
model_name_or_path = "BAAI/bge-large-en-v1.5"
output_path = "../models/bge-en-large-caikit"
module.bootstrap(model_name_or_path).save(output_path)
For a sanity check, ensure that the following files are in the models/bge-en-large-caikit directory:
ls -l ../models/bge-en-large-caikit
Expected output:
drwxr-xr-x@ 14 chrxu  staff  448 Jul 16 07:45 artifacts
-rw-r--r--@  1 chrxu  staff  336 Jul 16 07:45 config.yml
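Optionally, you can verify that the bootstrapped model loads and produces embeddings before uploading it. This is a minimal sketch; the load and run_embedding calls reflect the caikit-nlp releases current at the time of writing and may differ in other versions:
# Load the bootstrapped model back from disk and embed a test sentence.
# Assumes `module` (caikit_nlp's EmbeddingModule) is imported as shown earlier.
loaded_model = module.load(output_path)
result = loaded_model.run_embedding(text="Hello, world!")
print(result)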
Upload the model to S3 storage
Before we deploy the model, we need to upload it to S3-compatible storage (for example, AWS S3 or MinIO). Click 02-upload_model.ipynb.
Import the required libraries:
import os
import time
import boto3
import botocore
Run the following code block to retrieve your credentials and define helper functions for uploading and listing objects in your storage bucket:
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

session = boto3.session.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name
)

bucket = s3_resource.Bucket(bucket_name)

def upload_directory_to_s3(local_directory, s3_prefix):
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            s3_key = os.path.join(s3_prefix, relative_path)
            print(f"{file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)

def list_objects(prefix):
    filter = bucket.objects.filter(Prefix=prefix)
    for obj in filter.all():
        print(obj.key)
Upload your model to your S3 storage bucket:
upload_directory_to_s3("../models", "models")
Perform a sanity check to make sure that all the files have been uploaded:
list_objects("models")
Expected output: the S3 object keys of the uploaded files, including models/bge-en-large-caikit/config.yml and the model files under models/bge-en-large-caikit/artifacts/.
Deploy model
- Click File > Hub Control Panel to navigate back to the OpenShift AI dashboard. Click the Models tab and then in the Single-model serving tile, click Deploy Model.
- In the form:
  - Fill out the Model Name with the value bge-en-caikit.
  - Select the Serving runtime: Caikit Standalone Serving Runtime for KServe.
  - Select the Model framework: caikit.
  - Set the Model server replicas to 1.
  - Select the Model Server size: Small.
  - Select the Existing data connection: My Storage.
  - Enter the Path to your uploaded model: models.
- Click Deploy.
- Wait until the Status shows a green checkmark to confirm that your model has been deployed.
Model inference
- In the OpenShift AI dashboard, click the Models tab and copy the model’s inference endpoint.
- Return to the Jupyter environment and click notebooks > 3-rest_requests.ipynb.
- You can either run each cell individually or click Run all to test the different embedding endpoints. An example request is sketched below.
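To show the shape of such a request outside of the notebook, here is a minimal sketch of a REST call to the embedding task endpoint. The endpoint URL is a placeholder, and the /api/v1/task/embedding path reflects the caikit-nlp REST interface at the time of writing; consult your runtime's documentation if the request fails:
import requests

# Placeholder values: substitute the inference endpoint copied from the Models
# tab and the model name you chose during deployment
infer_endpoint = "https://bge-en-caikit-caikit-embeddings.apps.example.com"
model_id = "bge-en-caikit"

# The Caikit Standalone runtime exposes task-based REST endpoints such as
# /api/v1/task/embedding (paths can vary between caikit-nlp versions)
response = requests.post(
    f"{infer_endpoint}/api/v1/task/embedding",
    json={"model_id": model_id, "inputs": "Word embeddings represent text as vectors."},
)
print(response.json())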
Conclusion
In this article, you set up a working project in OpenShift AI, deployed a local MinIO bucket, prepared a model for serving with the Caikit NLP library, deployed it on the Caikit Standalone serving runtime, and made inference requests to perform RAG-related tasks.
The single serving platform on Red Hat OpenShift AI supports additional LLM tasks. For more information, visit the OpenShift AI product page.