
The Quarkus community recently released the first version of the Red Hat OpenShift AI integration with Quarkus. You can now invoke large language models (LLMs) served by OpenShift AI from Quarkus applications (as illustrated in Figure 1). And that's the topic of this article.

Figure 1: Invoking LLMs served by OpenShift AI from Quarkus.

A brief history: Quarkus and Quarkus LangChain4j

Quarkus is a Java framework designed to build truly cloud-native applications. It offers fast startup times and a reduced memory footprint, making it highly efficient. Tailored specifically for the cloud, containers, and Kubernetes, Quarkus has become a go-to framework for many sectors, including banking, logistics, and Software-as-a-Service (SaaS). Despite being under five years old, its impact is significant and widespread.

In addition, Quarkus greatly enhances developer productivity with its robust tooling, extensive set of integrations (known as extensions), live reload, dev services, continuous testing, and much more. It not only accelerates development velocity but also enhances the resilience, scalability, and observability of applications. Therefore, when integrating LLMs, we sought the same characteristics: a joyful developer experience and production-ready capabilities. We needed a library that would let us incorporate LLMs while preserving all of these features.

LangChain4j, a Java-based variant of LangChain, is designed to streamline the integration of AI and LLM capabilities into your Java application. It offers a straightforward and consistent layer of abstractions, ensuring your code remains independent of specific LLM APIs. LangChain4j provides integrations with a wide range of LLMs and vector databases, along with features like autonomous agents, prompt templates, and structured outputs. It allows for swift integration of LLMs into a Java application. The Quarkus community started collaborating with the LangChain4j team to gain the necessary flexibility. A few weeks later, they released the first version of the Quarkus LangChain4j extension.
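
To give a feel for that abstraction outside Quarkus, here is a minimal sketch of invoking a model with plain LangChain4j (the OpenAI model is just one of the supported providers, and reading the key from an environment variable is an assumption):

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class PlainLangChain4j {

    public static void main(String[] args) {
        // Every provider-specific model implements the same ChatLanguageModel
        // abstraction, keeping the calling code provider-independent.
        ChatLanguageModel model = OpenAiChatModel.withApiKey(System.getenv("OPENAI_API_KEY"));

        // Send a prompt and receive the completion as a plain String.
        String answer = model.generate("Summarize Quarkus in one sentence.");
        System.out.println(answer);
    }
}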

With this extension, your Quarkus applications can:

  • Invoke an LLM using a declarative approach (an AI Service), naturally integrated with the CDI programming model promoted by Quarkus.
  • Integrate the Quarkus fault-tolerance features such as timeout, retry, fallback, and rate limiting.
  • Trace the calls to the LLMs using OpenTelemetry.

The Quarkus LangChain4j extension integrates many LLM providers, such as OpenAI GPTs, Azure OpenAI, Ollama, and Hugging Face. It also supports retrieval-augmented generation (RAG) with multiple vector stores (Chroma, Redis, pgvector, and more).
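
As a taste of the RAG building blocks, the following is an illustrative sketch of ingesting a document into an in-memory vector store with plain LangChain4j (the embedding model, splitter parameters, and sample text are assumptions; a real application would use one of the stores listed above):

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class IngestionSketch {

    public static void main(String[] args) {
        // In-memory store for demonstration; swap in Chroma, Redis, or pgvector in production.
        var store = new InMemoryEmbeddingStore<TextSegment>();

        // Local embedding model (assumes the langchain4j-embeddings-all-minilm-l6-v2
        // artifact is on the classpath).
        var embeddingModel = new AllMiniLmL6V2EmbeddingModel();

        var ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 0)) // segment size/overlap are assumptions
                .embeddingModel(embeddingModel)
                .embeddingStore(store)
                .build();

        // Split, embed, and store the document so it can later be retrieved
        // as context for LLM prompts.
        ingestor.ingest(Document.from("Quarkus is a Kubernetes-native Java framework."));
    }
}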

This post focuses on the latest addition to the Quarkus LangChain4j extension: the OpenShift AI integration.

Integrating OpenShift AI with Quarkus LangChain4j

The integration of OpenShift AI with Quarkus LangChain4j relies on a new model runtime provided by OpenShift AI: caikit (shown in Figure 2). LLMs served with this runtime can be invoked easily from any application.

Figure 2: The caikit model runtime, as listed in the Models and model servers tab.

Nothing beats an example to understand the integration better. The complete code of this example is available at https://github.com/cescoffier/quarkus-customer-review-triage.

A customer review triage application

The customer review triage application, illustrated in Figure 3, uses the Mistral-7B LLM served by OpenShift AI to determine the sentiment of customer reviews, categorizing them as positive or negative.

Figure 3: The customer review triage sample application.

The entire integration process in Quarkus unfolds in three straightforward steps:

  1. Add the dependency: Begin by adding the LLM provider dependency; in our case, this means including the following in the pom.xml file.
    <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-openshift-ai</artifactId>
        <version>0.6.3</version>
    </dependency>
  2. Write the AI Service: Quarkus LangChain4j AI Services are the cornerstone of the LLM integration. This step involves creating an AI Service interface that models the interaction between the application and the LLM. The declarative model abstracts implementation details, facilitates fault tolerance, and simplifies tracing, metrics, and auditing. In our application, the AI Service is the following:

package me.escoffier.demo;

import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface TriageAi {

    @UserMessage("""
            [INST] Instructions: Analyze the review and determine if it is
            positive or negative.

            ---

            {review}

            ---

            If the review is positive, say: 'POSITIVE',
            otherwise: 'NEGATIVE'.

            [/INST]
            """)
    String triage(String review);
}

Note: The [INST] and [/INST] tags delimit the instructions for the LLM. The Quarkus and LangChain4j communities are working on adding them automatically.

In the application logic, the HTTP endpoint injects the TriageAi instance and seamlessly incorporates sentiment analysis into the review triage process. The resulting triaged reviews are then persisted in a PostgreSQL database:

package me.escoffier.demo;

import jakarta.inject.Inject;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import java.util.List;
import java.util.stream.Collectors;

@Path("/reviews")
public class ReviewEndpoint {

    // The AI Service is a regular CDI bean; Quarkus generates the implementation.
    @Inject TriageAi ai;

    public record Review(String customerId, String review) {}

    @POST
    @Transactional
    public TriagedReview triage(Review review) {
        // Ask the LLM to classify the review, then persist the result.
        // TriagedReview is a Panache entity (not shown here).
        var sentiment = ai.triage(review.review());
        var triaged = new TriagedReview();
        triaged.review = review.review();
        triaged.customerId = review.customerId();
        triaged.sentiment = TriagedReview.Sentiment.from(sentiment);
        triaged.persist();
        return triaged;
    }

    @GET
    @Transactional // Panache stream methods must run inside a transaction.
    public List<TriagedReview> getAllTriagedReviews() {
        return TriagedReview.<TriagedReview>streamAll().limit(5).collect(Collectors.toList());
    }
}
  3. Configure the OpenShift AI LLM provider: Finally, configure the LLM provider in the application.properties file. Specify the base URL, model ID, and any necessary timeouts.
quarkus.langchain4j.openshift-ai.base-url=https://<url>:443/api
quarkus.langchain4j.openshift-ai.chat-model.model-id=mistral7b-xl
quarkus.langchain4j.openshift-ai.timeout=60s
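
With the configuration in place, you can exercise the endpoint. For example, against a local dev-mode instance (the host, port, and sample payload are assumptions):

curl -X POST http://localhost:8080/reviews \
     -H "Content-Type: application/json" \
     -d '{"customerId": "c-123", "review": "Fast delivery and the product works great!"}'

The response is the persisted TriagedReview, including the sentiment computed by the model.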

With these steps, your Quarkus application effortlessly interacts with the Mistral-7B LLM served by OpenShift AI, enhancing your customer review triage process with advanced language understanding capabilities (Figure 4).

Figure 4: The customer review triage process with the OpenShift AI LLM provider configured.

Observability and fault tolerance

Ensuring resilience and observability is crucial, as emphasized throughout this post. Let's delve into how these features are implemented in our AI service.

We leverage fault tolerance annotations within our AI service to harden our application against potential issues. The code snippet below showcases these annotations applied to the triage method.

@UserMessage("""

       [INST] Instructions: Analyze the review and determine if it is positive or negative.

       ---

       {review}

       ---

       If the review is positive, say:'POSITIVE', otherwise:'NEGATIVE'.

       [/INST]

       """)
@Timeout(value = 1, unit = ChronoUnit.MINUTES)
@Retry(maxRetries = 2, delay = 1, delayUnit =ChronoUnit.SECONDS)
@RateLimit(value = 1, window = 2, windowUnit = ChronoUnit.SECONDS)
@Fallback(fallbackMethod = "fallback")
String triage(String review);

static String fallback(String review) {
   return "NEGATIVE";
}

In the provided code snippet, we combine timeout, retry, rate limiting, and fallback mechanisms. This approach ensures robustness when the LLM invocation times out, fails, or is called more often than the rate limit allows (more than once every 2 seconds). In such cases, the designated fallback method (fallback) is invoked, responding gracefully to the failure with a predefined value; in this example, "NEGATIVE."
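
These annotations come from MicroProfile Fault Tolerance and its SmallRye implementation (@RateLimit is SmallRye-specific). If they are not already available in your project, you will likely need the corresponding Quarkus extension; a sketch of the dependency to add, assuming Maven:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-smallrye-fault-tolerance</artifactId>
</dependency>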

It's worth noting that the invocation of the LLM is automatically measured and traced, providing inherent observability into the system (Figure 5). This ensures that the performance of LLM invocations can be monitored seamlessly, contributing to a comprehensive understanding of the application's behavior and aiding in identifying and resolving potential issues.
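
As a sketch of how to enable this, assuming the standard Quarkus OpenTelemetry extension and an OTLP-compatible collector running locally (the endpoint value is an assumption), the setup typically amounts to a dependency and a couple of properties:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-opentelemetry</artifactId>
</dependency>

# application.properties
quarkus.application.name=customer-review-triage
quarkus.otel.exporter.otlp.traces.endpoint=http://localhost:4317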

Figure 5: Monitoring LLM invocation performance for the sample application.

Summary

In conclusion, this article has highlighted the seamless integration of AI capabilities into Quarkus applications using models served by OpenShift AI. With the Quarkus LangChain4j extension, the process is remarkably straightforward:

  1. Add the io.quarkiverse.langchain4j:quarkus-langchain4j-openshift-ai dependency to your project.
  2. Craft the AI Service interface along with the corresponding prompt.
  3. Configure the base URL and model ID in the application.properties file.

Furthermore, Quarkus enriches the development experience by offering essential features such as tracing, metrics, fault tolerance, and diverse vector stores. These capabilities empower developers to create robust, production-ready, cloud-native, AI-powered Java applications. Embrace the power of Quarkus and embark on a journey to integrate cutting-edge OpenShift AI capabilities into your Java applications seamlessly.