Deploy a coding copilot model with OpenShift AI

October 7, 2024
Haoting Zhao
Related topics: Artificial intelligence, Developer productivity
Related products: Red Hat OpenShift, Red Hat OpenShift AI

    Continue.dev is a plug-in for Visual Studio Code designed to enhance developer productivity by connecting to a variety of providers and models. Some projects, however, call for an isolated environment, particularly when a team needs to deploy a fine-tuned base model that aligns more closely with its specific coding practices.

    Red Hat OpenShift AI

    Red Hat OpenShift AI offers a robust solution for such scenarios: a secure, isolated platform that supports model serving in environments with stringent connectivity restrictions. It leverages service mesh and Knative to expose inference APIs, ensuring seamless integration and operation within these specialized setups. Teams can thus maintain high accuracy in code generation while operating within the confines of their own development ecosystem, which makes OpenShift AI a critical tool for harnessing AI in development processes under strict operational requirements.

    KServe

    KServe is a model serving framework built around a Kubernetes Custom Resource Definition called InferenceService, which facilitates machine learning model serving across various frameworks. Designed for production inference scenarios, it provides a high-performance, highly abstracted interface for popular ML frameworks such as TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.

    Leveraging Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh, KServe simplifies the complexities of auto-scaling, networking, health checks, and service configuration (Figure 1). It brings cutting-edge service features to ML deployments, such as GPU auto-scaling, scale-to-zero capabilities, traffic monitoring, and canary releases. KServe presents a straightforward, pluggable, and comprehensive solution for production ML serving, encompassing prediction, preprocessing, postprocessing, and explainability.

    Model Serving Architecture in OpenShift AI

    Figure 1: KServe model serving architecture.
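
    For illustration, here is a minimal sketch of an InferenceService manifest. The name, namespace, runtime, and storage URI are hypothetical placeholders, not values from this article's environment; in practice, the OpenShift AI dashboard typically generates this resource for you when you deploy a model:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: codegen-model          # hypothetical model name
      namespace: my-ai-project     # hypothetical data science project
    spec:
      predictor:
        model:
          modelFormat:
            name: pytorch          # must match a supportedModelFormat of the runtime
          runtime: tgi-runtime     # hypothetical custom ServingRuntime name
          storageUri: s3://my-bucket/models/codegen/   # hypothetical S3 path to the model files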

    Caikit

    The Caikit toolkit, in turn, offers developer-friendly APIs for model management, with standard gRPC and HTTP interfaces for querying base models and channeling requests to the TGIS inference service. TGIS, an early fork of the Hugging Face Text Generation Inference (TGI) toolkit, enhances LLM performance with features such as continuous batching, dynamic batching, tensor parallelism (model sharding), and support for PyTorch 2 compilation.
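
    As a rough sketch of what querying such an HTTP interface looks like, a text-generation request body might resemble the following. The field names are assumptions based on the caikit-nlp runtime (which commonly serves this task at /api/v1/task/text-generation), and the model ID and prompt are placeholders:

    {
      "model_id": "codegen-model",
      "inputs": "def quicksort(arr):",
      "parameters": { "max_new_tokens": 64 }
    }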

    NVIDIA GPU Operator

    The NVIDIA GPU Operator automates the management of software components required for configuring and utilizing NVIDIA GPUs within OpenShift. This includes driver containers, device plug-ins, and the DCGM metrics exporter, which integrates with OpenShift’s Prometheus monitoring to provide insights into GPU metrics such as memory utilization and streaming multiprocessor (SM) utilization.
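
    For illustration, a minimal sketch of the operator's ClusterPolicy custom resource enabling these components follows; treat the exact fields as assumptions to verify against the GPU Operator documentation for your version:

    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: gpu-cluster-policy
    spec:
      driver:
        enabled: true        # deploy the NVIDIA driver containers
      devicePlugin:
        enabled: true        # expose GPUs to the kubelet as schedulable resources
      dcgmExporter:
        enabled: true        # export GPU metrics (memory, SM utilization) to Prometheus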

    To serve models with TGI effectively, you need a serving runtime tailored to the specific needs of the application. You can achieve this by importing a custom serving runtime into the product (Figure 2) and then customizing it to match the Hugging Face TGI image specifications.

    It is also important to configure the runtime to expose a RESTful inference API, which facilitates seamless interaction and communication with the service.

    Figure 2: Create a new serving runtime definition.

    The core of the serving runtime's YAML spec is as follows:

    spec:
      containers:
        - args:
            - --model-id=/mnt/models/     # serve the model files mounted by the storage initializer
            - --port=3000
          command:
            - text-generation-launcher    # TGI's launcher entry point
          env:
            - name: TRANSFORMERS_CACHE
              value: /tmp/transformers_cache
            - name: MAX_INPUT_LENGTH
              value: "1024"
            - name: MAX_TOTAL_TOKENS
              value: "2048"
          image: ghcr.io/huggingface/text-generation-inference:1.3.3
          livenessProbe:
            exec:
              command:
                - curl
                - localhost:3000/health
            initialDelaySeconds: 5
          name: kserve-container
          ports:
            - containerPort: 3000
              name: h2c
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - curl
                - localhost:3000/health
            initialDelaySeconds: 5
      multiModel: false                   # one model per server instance
      supportedModelFormats:
        - autoSelect: true
          name: pytorch
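
    For context, this spec fragment sits inside a ServingRuntime resource. A minimal sketch of the surrounding wrapper might look like this, where the name and display name are hypothetical placeholders (the name matches the runtime referenced in the earlier InferenceService sketch):

    apiVersion: serving.kserve.io/v1alpha1
    kind: ServingRuntime
    metadata:
      name: tgi-runtime                               # hypothetical runtime name
      annotations:
        openshift.io/display-name: Hugging Face TGI   # name shown in the OpenShift AI dashboard
    spec:
      # ...the containers, multiModel, and supportedModelFormats fields shown above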

    Next, create a model deployment directly in the AI project scope, pointing to the model data imported into S3, as shown in Figure 3. OpenShift AI copies the model files during the init stage and sets up the disconnected serving runtime.

    Figure 3: Deploy the model.
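
    Behind this dialog, the S3 connection is a data connection. As a sketch of the usual OpenShift AI convention, it is stored as an annotated secret like the following; every value here is a placeholder:

    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-connection-models        # hypothetical data connection name
      labels:
        opendatahub.io/dashboard: "true" # makes the secret visible to the dashboard
      annotations:
        opendatahub.io/connection-type: s3
    stringData:
      AWS_ACCESS_KEY_ID: <access-key>
      AWS_SECRET_ACCESS_KEY: <secret-key>
      AWS_S3_ENDPOINT: https://s3.example.com   # hypothetical endpoint
      AWS_S3_BUCKET: my-bucket                  # hypothetical bucket holding the model files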

    After a few minutes, you'll see the pod running, likely after several restarts (most often due to resource constraints). See Figure 4.

    Figure 4: The running model pod.

    Also note that the TLS certificate is self-signed by default; we have to replace it with a properly signed certificate, or the client cannot complete requests to the server. By placing the TLS secret in the ods-operator project, we can point to it with a config in the DataScienceCluster custom resource. See Figure 5.

    Figure 5: TLS cert config.

    Then we can configure Continue's config.json (Figure 6) to point to the model serving entry point newly exposed by Service Mesh.

    Figure 6: VS Code plug-in config.

    Then test some of Continue's features in VS Code (see Figure 7).

    Figure 7: VS Code plug-in menu action.

    Comments are generated as expected, as shown in Figure 8.

    Figure 8: Coding copilot results.

    Conclusion

    In industries where data sensitivity is paramount, such as finance and manufacturing, secure and private model serving calls for a robust, trusted model serving runtime inside a private data center. This approach ensures that sensitive code and data are never exposed to the vulnerabilities of the public internet.

    To ease integration and maintain industry standards, the runtime environment should be compatible with open source model serving protocols. Organizations can then leverage advanced analytics and machine learning while upholding strict data governance and security policies, harnessing the power of their data without compromising confidentiality or compliance.
