
Deploy a coding copilot model with OpenShift AI

October 7, 2024
Haoting Zhao

    Continue.dev is a plug-in for Visual Studio Code that improves developer productivity by connecting to a variety of model providers and models. Some projects require an isolated environment, particularly when a team needs to deploy a fine-tuned base model that aligns more closely with its own coding practices.

    Red Hat OpenShift AI

    Red Hat OpenShift AI addresses these scenarios with a secure, isolated platform that supports model serving even in environments with strict connectivity restrictions. It uses OpenShift Service Mesh and Knative to expose inference APIs, so a model served inside the cluster can be consumed like any other network service. Teams can therefore run a code generation model tuned to their own development ecosystem while still meeting their operational requirements.

    KServe

    KServe is a Kubernetes-based model serving platform built around a custom resource definition (CRD) called InferenceService, which serves machine learning models across many frameworks. Designed for production inference, it offers a high-performance, high-level interface for popular ML frameworks such as TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
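    For predictive frameworks like these, KServe defines a standard REST inference protocol (V1), in which a client POSTs a JSON body with an `instances` list to `/v1/models/<name>:predict`. The minimal Python sketch below only builds such a request with the standard library; the host name, model name, and feature values are hypothetical. The custom TGI runtime described later in this article exposes TGI's own REST API rather than this protocol.

```python
import json
import urllib.request


def build_v1_predict_request(base_url: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build (but do not send) a KServe V1-protocol prediction request.

    base_url and model_name are placeholders for an InferenceService's
    URL and name; they are illustrative assumptions.
    """
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and input row.
req = build_v1_predict_request("https://my-model.example.com", "sklearn-iris", [[6.8, 2.8, 4.8, 1.4]])
print(req.full_url)  # https://my-model.example.com/v1/models/sklearn-iris:predict
# Against a live endpoint, urllib.request.urlopen(req) would send the request.
```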

    Built on Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh, KServe handles the complexities of autoscaling, networking, health checks, and service configuration (Figure 1). It brings serverless features to ML deployments, such as GPU autoscaling, scale-to-zero, traffic monitoring, and canary releases. KServe offers a simple, pluggable, and complete solution for production ML serving, covering prediction, preprocessing, postprocessing, and explainability.

    Figure 1: KServe model serving architecture.
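    Knative's default autoscaler scales on request concurrency. The Python function below is a deliberately simplified sketch of that idea, not the actual Knative Pod Autoscaler (which also averages metrics over stable and panic windows): the desired replica count is the number of in-flight requests divided by a per-pod target concurrency, rounded up and clamped to the configured bounds. A minimum of 0 corresponds to scale-to-zero, which KServe exposes through the predictor's `minReplicas` field. The target and bounds used here are arbitrary example values.

```python
import math


def desired_replicas(in_flight_requests: float, target_concurrency: float,
                     min_replicas: int = 0, max_replicas: int = 3) -> int:
    """Simplified concurrency-based autoscaling decision.

    min_replicas=0 allows scale-to-zero when there is no traffic.
    The default bounds are illustrative, not KServe defaults.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    wanted = math.ceil(in_flight_requests / target_concurrency)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Replica counts for increasing load with a target of 4 concurrent requests per pod.
print([desired_replicas(load, target_concurrency=4) for load in (0, 1, 5, 40)])  # [0, 1, 2, 3]
```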

    Caikit

    Caikit is a toolkit that provides developer-friendly APIs for model management, with standard gRPC and HTTP interfaces for querying base models. Caikit forwards these requests to the TGIS inference service. TGIS (Text Generation Inference Server), an early fork of Hugging Face Text Generation Inference (TGI), improves LLM serving performance with features such as continuous batching, dynamic batching, tensor parallelism (model sharding), and support for PyTorch 2 compilation.
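    Continuous batching is a large part of why servers like TGIS outperform naive batching. The toy Python simulation below compares it with static batching under simplified assumptions: every active sequence produces exactly one token per step, and the request lengths are made up. It illustrates the scheduling idea only and is not TGIS code.

```python
from collections import deque


def static_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Each batch runs until its longest sequence finishes; later requests wait."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Finished sequences free their slot; waiting requests join at the next step."""
    queue = deque(lengths)
    active: list[int] = []  # remaining tokens for each running sequence
    steps = 0
    while queue or active:
        # Fill free slots from the waiting queue.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decoding step generates one token for every active sequence.
        active = [remaining - 1 for remaining in active]
        active = [remaining for remaining in active if remaining > 0]
        steps += 1
    return steps


lengths = [10, 2, 2, 2]  # tokens to generate per request (made-up workload)
print(static_batching_steps(lengths, batch_size=2))      # 12
print(continuous_batching_steps(lengths, batch_size=2))  # 10
```

    With a batch size of 2, static batching makes the second batch wait until the long request finishes, while continuous batching moves the short requests through the freed slot as soon as it opens, finishing in 10 steps instead of 12.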

    NVIDIA GPU Operator

    The NVIDIA GPU Operator automates the management of software components required for configuring and utilizing NVIDIA GPUs within OpenShift. This includes driver containers, device plug-ins, and the DCGM metrics exporter, which integrates with OpenShift’s Prometheus monitoring to provide insights into GPU metrics such as memory utilization and streaming multiprocessor (SM) utilization.
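    Because the DCGM exporter's metrics land in OpenShift monitoring, they can be queried through the standard Prometheus HTTP API. The sketch below only builds an instant-query URL with the Python standard library. `DCGM_FI_DEV_GPU_UTIL` and `DCGM_FI_DEV_FB_USED` are DCGM exporter metric names; the querier host is a placeholder, the `Hostname` label is an assumption about how your exporter labels series, and on OpenShift the actual request also needs a bearer token with monitoring access.

```python
from urllib.parse import urlencode

# Metric names exported by NVIDIA's DCGM exporter.
GPU_UTIL = "DCGM_FI_DEV_GPU_UTIL"  # GPU utilization, percent
FB_USED = "DCGM_FI_DEV_FB_USED"    # framebuffer (GPU memory) used, MiB


def prometheus_instant_query_url(prometheus_url: str, promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Average GPU utilization per node; the Hostname label is an assumption.
query = f"avg by (Hostname) ({GPU_UTIL})"
url = prometheus_instant_query_url("https://thanos-querier.example.com", query)
print(url)
# https://thanos-querier.example.com/api/v1/query?query=avg+by+%28Hostname%29+%28DCGM_FI_DEV_GPU_UTIL%29
```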

    To serve the model with Hugging Face Text Generation Inference (TGI), you need a serving runtime tailored to it. You can do this by importing a custom serving runtime based on one that ships with the product (Figure 2), then customizing it to match the Hugging Face TGI image's requirements.

    Also make sure the runtime is configured for the REST inference API type, so that clients can communicate with the service over HTTP.

    Figure 2: Create a new serving runtime definition.
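    With a REST runtime, TGI accepts a POST to `/generate` whose JSON body carries the prompt in `inputs` and generation settings in `parameters`, and it returns the completion in `generated_text`. The Python sketch below builds such a request and parses a response using only the standard library; the endpoint URL is a placeholder for your model's route, and the request is not actually sent.

```python
import json
import urllib.request


def build_generate_request(endpoint: str, prompt: str, max_new_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST to TGI's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(raw: bytes) -> str:
    """Extract the completion from a TGI /generate JSON response."""
    return json.loads(raw)["generated_text"]


# Hypothetical route for the deployed model.
req = build_generate_request("https://tgi-coder.example.com", "def fibonacci(n):")
print(req.full_url)  # https://tgi-coder.example.com/generate
# Against a live endpoint: parse_generate_response(urllib.request.urlopen(req).read())
print(parse_generate_response(b'{"generated_text": "return n"}'))  # return n
```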

    The core of the runtime's YAML spec is as follows:

    spec:
      containers:
        - args:
            - --model-id=/mnt/models/  # KServe places the downloaded model files here
            - --port=3000
          command:
            - text-generation-launcher
          env:
            - name: TRANSFORMERS_CACHE
              value: /tmp/transformers_cache
            - name: MAX_INPUT_LENGTH  # longest accepted prompt, in tokens
              value: "1024"
            - name: MAX_TOTAL_TOKENS  # prompt tokens + generated tokens per request
              value: "2048"
          image: ghcr.io/huggingface/text-generation-inference:1.3.3
          livenessProbe:
            exec:
              command:
                - curl
                - --fail  # treat non-2xx health responses as probe failures
                - localhost:3000/health
            initialDelaySeconds: 5
          name: kserve-container
          ports:
            - containerPort: 3000
              name: h2c
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - curl
                - --fail
                - localhost:3000/health
            initialDelaySeconds: 5
      multiModel: false
      supportedModelFormats:
        - autoSelect: true
          name: pytorch
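    The two token limits in this runtime interact: TGI requires `MAX_INPUT_LENGTH` to be smaller than `MAX_TOTAL_TOKENS`, and for each request the prompt tokens plus the requested new tokens must fit within `MAX_TOTAL_TOKENS`. With the values above, a maximum-length prompt of 1024 tokens leaves 2048 - 1024 = 1024 tokens for generation. The helper below is a hypothetical pre-flight check that mirrors this arithmetic in Python; it does not call TGI.

```python
# Values copied from the env section of the serving runtime above.
runtime_env = {"MAX_INPUT_LENGTH": "1024", "MAX_TOTAL_TOKENS": "2048"}


def token_budget(env: dict[str, str], prompt_tokens: int) -> int:
    """Return how many new tokens fit after a prompt of the given length (hypothetical helper)."""
    max_input = int(env["MAX_INPUT_LENGTH"])
    max_total = int(env["MAX_TOTAL_TOKENS"])
    if max_input >= max_total:
        raise ValueError("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    if prompt_tokens > max_input:
        raise ValueError(f"prompt has {prompt_tokens} tokens; the limit is {max_input}")
    return max_total - prompt_tokens


print(token_budget(runtime_env, 1024))  # 1024
print(token_budget(runtime_env, 300))   # 1748
```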

    Next, deploy the model directly in the data science project, using model data imported into S3 storage (Figure 3). During the init stage, OpenShift AI copies the model files into the serving pod and sets up the disconnected serving runtime.

    Figure 3: Deploy the model.
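    Behind the dashboard form, the deployment is a KServe InferenceService that names the custom runtime and the model location. The Python sketch below builds a minimal manifest as a dictionary and prints it as JSON, which `oc apply -f` also accepts. The resource name, runtime name, and S3 URI are hypothetical; the dashboard may reference the S3 data connection differently than through `storageUri`, and S3 credentials (normally supplied through a secret and service account) are omitted.

```python
import json


def inference_service(name: str, runtime: str, storage_uri: str) -> dict:
    """Minimal KServe InferenceService that uses a custom ServingRuntime."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # Must match a supportedModelFormats entry in the runtime.
                    "modelFormat": {"name": "pytorch"},
                    # Name of the custom ServingRuntime (hypothetical).
                    "runtime": runtime,
                    # Downloaded to /mnt/models by KServe's storage initializer.
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = inference_service("coder-model", "tgi-runtime", "s3://models/coder/")
print(json.dumps(manifest, indent=2))
```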

    For the first few minutes, the pod may restart several times, most likely because of resource constraints (Figure 4).

    Figure 4: The running model pod.

    Also note that the TLS certificate is self-signed by default. We have to replace it with a properly signed certificate; otherwise, the client cannot complete its requests to the server. By placing the TLS secret in the ods-operator project, we can reference it from the DataScienceCluster custom resource (Figure 5).

    Figure 5: TLS cert config.
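    The client-side failure follows from standard TLS verification: a client that checks certificates trusts only the certificate authorities in its trust store, so a self-signed gateway certificate is rejected. The Python sketch below shows the two options using the standard `ssl` module; the CA bundle path is a placeholder.

```python
import ssl
from typing import Optional


def client_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server certificate and hostname.

    With ca_bundle=None, only the system trust store is used, so a
    self-signed serving certificate fails verification. Passing the PEM
    file of the CA that signed the gateway certificate lets it succeed.
    """
    return ssl.create_default_context(cafile=ca_bundle)


ctx = client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
# Example use with a placeholder path:
# urllib.request.urlopen(req, context=client_context("/path/to/ca.pem"))
```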

    Then we can configure Continue's config.json to point to the new model serving endpoint exposed through Service Mesh (Figure 6).

    Figure 6: VS Code plug-in config.
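    As a rough sketch, assuming Continue's JSON configuration format at the time of writing (a `models` list whose entries set `provider`, `model`, and `apiBase`, plus an optional `tabAutocompleteModel`), the Python snippet below generates a config for the TGI endpoint. The title, model name, and route URL are hypothetical; check the provider name and keys against the Continue documentation for your plug-in version.

```python
import json


def continue_config(api_base: str, model: str) -> dict:
    """Continue config.json content pointing at a self-hosted TGI endpoint (assumed format)."""
    entry = {
        "title": "OpenShift AI coder",  # display name in the Continue UI (hypothetical)
        "provider": "huggingface-tgi",  # Continue's TGI provider name (assumption)
        "model": model,
        "apiBase": api_base,            # route exposed through Service Mesh (hypothetical)
    }
    return {"models": [entry], "tabAutocompleteModel": entry}


config = continue_config("https://coder-model-my-project.apps.example.com", "coder")
print(json.dumps(config, indent=2))
# Merge the output into ~/.continue/config.json rather than overwriting other settings.
```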

    Then test some of the features in VS Code with Continue (see Figure 7).

    Figure 7: VS Code plug-in menu action.

    Comments are generated as expected, as shown in Figure 8.

    Figure 8: Coding pilot results.
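    To check the endpoint outside VS Code, you can also call TGI's streaming route, `/generate_stream`, which returns server-sent events whose `token.text` fields carry the generated text. The Python sketch below parses such a stream from lines of text. The sample events are abbreviated, hand-written examples rather than captured output, and skipping `special` tokens (such as end-of-sequence markers) is a design choice of this sketch.

```python
import json
from typing import Iterable, Iterator


def stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token text from TGI /generate_stream server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and comments
        event = json.loads(line[len("data:"):])
        token = event["token"]
        if token.get("special", False):
            continue  # drop special tokens such as end-of-sequence
        yield token["text"]


# Hand-written, abbreviated sample events.
sample = [
    'data:{"token": {"id": 1, "text": "# Returns", "special": false}}',
    '',
    'data:{"token": {"id": 2, "text": " the sum", "special": false}}',
    '',
    'data:{"token": {"id": 3, "text": "</s>", "special": true}}',
]
print("".join(stream_tokens(sample)))  # # Returns the sum
```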

    Conclusion

    In industries where data sensitivity is paramount, such as finance and manufacturing, teams need a robust, trusted model serving runtime inside their own data center, so that sensitive code and data are not exposed to the internet.

    The runtime should also be compatible with open source model serving protocols, so that standard clients such as Continue can integrate with it without custom work. This lets organizations benefit from AI-assisted development while still enforcing strict data governance, security, confidentiality, and compliance policies.
