
How to serve embeddings models on OpenShift AI

September 25, 2024
Christina Xu
Related topics: Artificial intelligence, Data Science
Related products: Red Hat OpenShift AI


    In this article, you will learn how to perform word embedding tasks using a Sentence Transformer model deployed on the Caikit Standalone serving runtime in Red Hat OpenShift AI.

    Introduction 

    Word embeddings are representations of text in the form of real-valued vectors. They are the foundation of several natural language processing (NLP) applications, such as retrieval-augmented generation (RAG). Caikit NLP is a toolkit/runtime that provides modules and exposes inference endpoints for large language models (LLMs).
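    To make this concrete, here is a minimal sketch (separate from the Caikit flow used in this tutorial) that computes an embedding directly with the sentence-transformers library; the model name matches the one bootstrapped later, and the printed dimension is specific to that model:

      # Illustrative only: embed one sentence and inspect the resulting vector.
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("BAAI/bge-large-en-v1.5")
      vector = model.encode("Word embeddings map text to real-valued vectors.")
      print(vector.shape)  # (1024,): one 1024-dimensional vector for this model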

    In this tutorial, we will serve a Sentence Transformer model using the Caikit Standalone serving runtime. You will learn how to convert an LLM to Caikit format, deploy the model onto a serving runtime, and send requests to the served model. 

    Prerequisites

    • Access to an OpenShift cluster with Red Hat OpenShift AI 2.11+ installed.

    Demo setup

    First, let’s set up your working environment as a project within OpenShift AI: 

    1. Log in to the OpenShift AI dashboard on your OpenShift cluster.
    2. Navigate to Data Science Projects.
    3. Click the Create data science project button.
    4. Give your project a name; for example, caikit-embeddings.
    5. Finally, click Create.

    Storage data connection

    For this demo, you will need an S3-compatible object storage bucket such as MinIO or AWS S3. If you already have a bucket deployed, skip to the next section, Create workbench. If not, continue with the following steps to install a local MinIO bucket.

    1. In your terminal, log in to your OpenShift cluster:

      oc login --token=<token> --server=<server>
    2. Run the following command to install a local MinIO bucket in your project namespace:

      export NAMESPACE="caikit-embeddings"
      oc apply -n ${NAMESPACE} -f https://github.com/christinaexyou/caikit-embeddings-demo/raw/main/minio.yaml

      Expected output:

      persistentvolumeclaim/minio-pvc created
      secret/minio-secret created
      deployment.apps/minio created
      service/minio-service created
      route.route.openshift.io/minio-api created
      route.route.openshift.io/minio-ui created
    3. As a sanity check, run the following command to confirm that a MinIO pod has been deployed in your namespace:

      oc get pods -n ${NAMESPACE} 

      Expected output:

      NAME                     READY   STATUS    RESTARTS   AGE
      minio-69b985c6b5-w599x   1/1     Running   0          14m

    Create workbench

    You can define the cluster size and compute resources needed to run the workload.

    1. Click the Workbenches tab and create a workbench with the following specifications:
    • Name: caikit-embeddings

    • Image selection: PyTorch

    • Version selection: 2024.1

    • Container size: Large

    • Accelerator: None

    2. Under Data connections, select Create new connection if you deployed your own S3 bucket prior to this demo, or select Use existing data connection if you followed the previous section.
    3. If you selected Create new connection, fill out the fields with your S3 object storage credentials. If you selected Use existing data connection, select My Storage from the list.
    4. Click Create workbench. You will be redirected to the project dashboard, where the workbench is starting. Wait a few minutes for your workbench status to change from Starting to Running.
    5. Access your workbench by clicking Open.

    Bootstrap the model

    We need to bootstrap our model using the Caikit library to convert it to Caikit format. This bootstrapping step creates a config.yml file that specifies the task module in the Caikit-NLP library and saves the model's files in a new path named artifacts. For the purposes of this tutorial, we will use BAAI/bge-large-en-v1.5 as our model, but you can replace it with any of the supported embeddings models.

    1. Once you are in your Jupyter environment, click the Git icon on the left side of the screen. Click Clone a repository and paste the following repository URL: https://github.com/christinaexyou/caikit-embeddings-demo/tree/main
    2. Our first step is to install the required libraries. Navigate to caikit-embeddings > notebooks > 1-boostrap-model.ipynb. A requirements.txt file has been preconfigured with the required libraries and their correct versions. Run the following command to install them:

      !pip install -r requirements.txt
    3. Import the required libraries and packages into your Jupyter environment:

      from caikit_nlp.modules.text_embedding import EmbeddingModule as module
    4. Bootstrap the model, which creates a config.yml file:

      model_name_or_path = "BAAI/bge-large-en-v1.5"
      output_path = "../models/bge-en-large-caikit"
      module.bootstrap(model_name_or_path).save(output_path)
    5. For a sanity check, ensure that the following files are in the models/bge-en-large-caikit directory:

      ls -l ../models/bge-en-large-caikit

      Expected output:

      drwxr-xr-x@ 14 chrxu  staff  448 Jul 16 07:45 artifacts
      -rw-r--r--@  1 chrxu  staff  336 Jul 16 07:45 config.yml

    Upload the model to S3 storage

    Before we deploy the model, we need to upload it to S3 storage (AWS S3 or MinIO). Click 02-upload_model.ipynb.

    1. Import the required libraries:

      import os
      import time
      import boto3
      import botocore
    2. Run the following code block to retrieve your credentials and define helper functions to upload and list objects in your storage bucket:

      aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
      aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
      endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
      region_name = os.environ.get('AWS_DEFAULT_REGION')
      bucket_name = os.environ.get('AWS_S3_BUCKET')

      # Create an S3 session and resource from the data connection credentials
      # injected into the workbench as environment variables.
      session = boto3.session.Session(
          aws_access_key_id=aws_access_key_id,
          aws_secret_access_key=aws_secret_access_key
      )
      s3_resource = session.resource(
          's3',
          config=botocore.client.Config(signature_version='s3v4'),
          endpoint_url=endpoint_url,
          region_name=region_name
      )
      bucket = s3_resource.Bucket(bucket_name)

      # Walk a local directory and upload every file under the given S3 prefix,
      # preserving the relative directory structure.
      def upload_directory_to_s3(local_directory, s3_prefix):
          for root, dirs, files in os.walk(local_directory):
              for filename in files:
                  file_path = os.path.join(root, filename)
                  relative_path = os.path.relpath(file_path, local_directory)
                  s3_key = os.path.join(s3_prefix, relative_path)
                  print(f"{file_path} -> {s3_key}")
                  bucket.upload_file(file_path, s3_key)

      # Print the key of every object in the bucket under the given prefix.
      def list_objects(prefix):
          filter = bucket.objects.filter(Prefix=prefix)
          for obj in filter.all():
              print(obj.key)
    3. Upload your model to your S3 storage bucket:

      upload_directory_to_s3("../models", "models")
    4. Perform a sanity check to make sure that all the files have been uploaded:

      list_objects("models")

      Expected output (truncated):

      models/bge-en-large-caikit/artifacts/...
      models/bge-en-large-caikit/config.yml

    Deploy model

    1. Click File > Hub Control Panel to navigate back to the OpenShift AI dashboard. Click the Models tab and then in the Single-model serving tile, click Deploy Model.
    2. In the form:
      1. Fill out the Model Name with the value bge-en-caikit.
      2. Select the Serving runtime: Caikit Standalone Serving Runtime for KServe.
      3. Select the Model framework: caikit.
      4. Set the Model server replicas to 1.
      5. Select the Model Server size: Small.
      6. Select the Existing data connection: My Storage.
      7. Enter the Path to your uploaded model: models.
    3. Click Deploy.
    4. Wait until the Status column shows a green checkmark, confirming that your model has been deployed.

    Model inference

    1. In the OpenShift AI dashboard, click the Models tab and copy the model’s inference endpoint.
    2. Return to the Jupyter environment and click notebooks > 3-rest_requests.ipynb. 
    3. You can either run each cell individually or click Run all to test out the different embeddings endpoints; a minimal request sketch follows this list.
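    If you prefer to assemble a request by hand, the following is a minimal sketch using Python's requests library. The /api/v1/task/embedding path and the payload shape follow the Caikit runtime's HTTP API; the placeholder endpoint URL and the verify=False flag are assumptions to adapt to your cluster:

      # Hedged sketch: send a single embedding request to the served model.
      import requests

      # Assumption: replace with the inference endpoint copied from the Models tab.
      inference_endpoint = "https://<your-inference-endpoint>"

      payload = {
          "model_id": "bge-en-caikit",  # the Model Name chosen during deployment
          "inputs": "What is a word embedding?",
      }

      response = requests.post(
          f"{inference_endpoint}/api/v1/task/embedding",
          json=payload,
          verify=False,  # assumption: cluster uses a self-signed certificate
      )
      print(response.json())  # JSON response containing the embedding vector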

    Conclusion

    In this article, you set up a working project in OpenShift AI, deployed a local MinIO bucket, prepared a model for serving with the Caikit NLP module, deployed it on the Caikit Standalone serving runtime, and made various inference requests to perform RAG-related tasks.

    The single-model serving platform on Red Hat OpenShift AI supports additional LLM tasks. For more information, visit the OpenShift AI product page.
