Integrating the powerful capabilities of large language models (LLMs) into real-world workflows is one of the most significant advancements in modern application development. This article explores how to integrate Red Hat AI Inference Server with LangChain to build agentic workflows, with a focus on document processing. In this example, Red Hat AI Inference Server serves the LLM, while LangChain manages the application state and workflow.
About LangChain
LangChain is a framework designed to facilitate the development of applications powered by large language models. It implements a standard interface for LLMs and related technologies, such as embedding models and vector stores, as well as integrations with other providers.
LangChain also integrates with the LangGraph framework, which enables fine-grained control over workflows that involve LLMs. If a use case consists of a sequence of steps requiring orchestration and decision-making at key points, LangGraph provides the structure to manage that complexity effectively.
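To illustrate what that control looks like, here is a minimal, self-contained LangGraph sketch that routes between two nodes depending on the state produced by a previous step. All names here (ReviewState, check, approve, reject) are illustrative and are not part of the application built later in this article:
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ReviewState(TypedDict):
    compliant: bool

def check(state: ReviewState) -> ReviewState:
    return state  # a real node would call an LLM or a tool here

def approve(state: ReviewState) -> ReviewState:
    print("Approved")
    return state

def reject(state: ReviewState) -> ReviewState:
    print("Rejected")
    return state

graph = StateGraph(ReviewState)
graph.add_node("check", check)
graph.add_node("approve", approve)
graph.add_node("reject", reject)
graph.add_edge(START, "check")
# Decision point: choose the next node based on the state returned by "check"
graph.add_conditional_edges("check", lambda s: "approve" if s["compliant"] else "reject")
graph.add_edge("approve", END)
graph.add_edge("reject", END)

graph.compile().invoke({"compliant": True})  # prints "Approved"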
About Red Hat AI Inference Server
Red Hat AI Inference Server is a robust, enterprise-grade solution for LLM inference and serving, supported and maintained by Red Hat and built on the popular community-supported vLLM inference server.
One of Red Hat AI Inference Server's key advantages is its flexibility in model serving options, offering a wide range of validated models, extensive parametrization, and more. It is also cloud native, allowing it to scale in any hybrid cloud OpenShift environment. For more details, refer to the Red Hat AI Hugging Face space and the Red Hat AI Inference Server official documentation.
Use case description
To demonstrate the integration between LangChain and Red Hat AI Inference Server, a document processing use case was developed. Depending on the LLM's reasoning, the document will be either approved or rejected.
The application contains a document-processing agent that extracts information from an uploaded PDF and embeds it in a vector store. The LLM then processes this vectorized information, checks it against the compliance rules, and finally responds with a status of Approved or Rejected, all coordinated via LangGraph states.
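For reference, the shared state that the graph nodes pass between each other can be modeled as a typed dictionary. The following sketch is illustrative only; the field names used in the repository may differ:
from typing import Any, List, TypedDict

# Illustrative state schema; the repository's actual fields may differ.
class State(TypedDict, total=False):
    chunks: List[Any]    # chunked contents of the uploaded PDF
    vector_store: Any    # vector store holding the embedded chunks
    compliant: bool      # result of the compliance check
    response: str        # final approval or rejection message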
Planning the app workflow
Having described the use case, the next step is to design the graph nodes and workflow (Figure 1). For this use case, there are three main nodes to build:
- Upload the file: Locate the document at a valid local path. Its contents will be processed by the LLM.
- Check compliance rules: Verify whether the previously uploaded document satisfies the desired rule; in this case, whether it includes a termination clause.
- Respond: Generate a final response approving or rejecting the document, including the LLM's reasoning.

Translating this into LangChain Python code, one option is to define the nodes as functions (you could also define them as classes, depending on the use case and your development practices). In this example, each node is defined as a function:
def upload_and_extract(state):
    print("Loading & Chunking PDF...")
    ...

def check_compliance_with_retrieval(state):
    print("Checking for 'termination clause'...")
    ...

def respond(state):
    if state["compliant"]:
        msg = "Document Approved."
    else:
        msg = "Document Rejected. Missing termination clause."
    ...
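To give a sense of what a node body can contain, here is a sketch of the upload-and-extract step, assuming LangChain's PyPDFLoader and RecursiveCharacterTextSplitter, an in-memory vector store, and a local sentence-transformers embedding model; the repository's implementation may use different components:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

def upload_and_extract(state):
    print("Loading & Chunking PDF...")
    # Read the document location from the environment (populated from the
    # .env file described in the next section)
    pdf_path = os.environ["DOCUMENT_PATH"]
    docs = PyPDFLoader(pdf_path).load()
    # Split the document into overlapping chunks suited for retrieval
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)
    # Embed the chunks into an in-memory vector store
    # (the embedding model below is an assumption, not the repository's choice)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    store = InMemoryVectorStore.from_documents(chunks, embedding=embeddings)
    # Return the fields to merge into the shared graph state
    return {"chunks": chunks, "vector_store": store}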
Once the node functions and logic are defined, the next step is to build the state graph that manages the use case workflow:
# imports needed for graph construction
from langgraph.graph import StateGraph
from langchain_core.runnables import RunnableLambda

# build the state graph
graph = StateGraph(State)
graph.add_node("upload", RunnableLambda(upload_and_extract))
graph.add_node("check", RunnableLambda(check_compliance_with_retrieval))
graph.add_node("respond", RunnableLambda(respond))
graph.set_entry_point("upload")
graph.add_edge("upload", "check")
graph.add_edge("check", "respond")
graph.set_finish_point("respond")

# execute the workflow
workflow = graph.compile()
workflow.invoke({})
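The node functions that need the LLM can reach the model served by Red Hat AI Inference Server through its OpenAI-compatible API. The following sketch (not the repository's exact code) shows how such a client could be created with the langchain-openai package, assuming the default port and the model name used in the serving command later in this article:
from langchain_openai import ChatOpenAI

# Point LangChain at the OpenAI-compatible endpoint exposed by the
# Red Hat AI Inference Server container started in the next section.
llm = ChatOpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="EMPTY",  # the server does not require a real API key by default
    model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",
    temperature=0,
)

# Example call: ask the model whether retrieved text contains a termination clause
answer = llm.invoke("Does the following text contain a termination clause? ...")
print(answer.content)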
The full application example, along with its local execution configuration, is detailed in the next section.
Set up the environment and execute the application
With the use case workflow and the purpose of each node designed, it's time to implement the code. The procedure described below was configured in an environment with the following technical details:
- OS: Fedora 41
- GPU: NVIDIA RTX 4060 Ti (16 GB vRAM)
- RHAIIS 3.0.0 image
- Python 3.12
Configure the application environment
Clone the GitHub repository:
git clone https://github.com/alexbarbosa1989/rhaiis-langchain.git
Move to the application directory:
cd rhaiis-langchain
Create an .env file to hold sensitive information, such as variables and passwords. In this case, define the DOCUMENT_PATH variable with the full path to the PDF document that will be analyzed. An example PDF document is provided in the application repository at the rhaiis-langchain/docs/example/contract-template.pdf path:
echo "DOCUMENT_PATH=/home/user/rhaiis-langchain/docs/example/contract-template.pdf" >> .env
Verify the contents:
cat .env
DOCUMENT_PATH=/home/user/rhaiis-langchain/docs/example/contract-template.pdf
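For context, the application can read this variable at runtime with the python-dotenv package. A minimal sketch of that pattern (not necessarily the repository's exact code):
import os
from dotenv import load_dotenv

# Load the variables defined in the .env file into the process environment
load_dotenv()
print(os.getenv("DOCUMENT_PATH"))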
Create a Python virtual environment:
python3.12 -m venv --upgrade-deps venv
Activate the environment:
source venv/bin/activate
Install the required libraries to run the application correctly:
pip install -r requirements.txt
Start the Red Hat AI Inference Server container
In another terminal window, start a Red Hat AI Inference Server instance to serve an LLM; in this case, the quantized RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 model was served. Make sure the Podman environment has been configured beforehand by following the suggested procedure in the product documentation, specifically the Serving and inferencing with AI Inference Server section:
podman run -ti --rm --pull=newer \
--user 0 \
--shm-size=0 \
-p 127.0.0.1:8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
-v ./rhaiis-cache:/opt/app-root/src/.cache \
--device nvidia.com/gpu=all \
--security-opt=label=disable \
--name rhaiis \
registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
--model RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 \
--max_model_len=4096
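Once the container is running and the model has finished loading, you can optionally verify that the server responds on its OpenAI-compatible endpoint. A small sketch using the requests library, assuming the default port mapping shown above:
import requests

# List the models served by the Red Hat AI Inference Server instance
resp = requests.get("http://127.0.0.1:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])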
Run the application
Go back to the terminal session where the application was cloned and the Python virtual environment is activated. There, run the Python application:
python app.py
The application will automatically follow the designed workflow, processing the PDF document specified in the DOCUMENT_PATH variable. It will analyze the document's contents using the served LLM to check for the presence of a termination clause. Based on this review, the document will either be approved or rejected. The provided contract-template.pdf document complies with the established rule; therefore, it should be approved:
Loading & Chunking PDF...
Checking for 'termination clause'...
Final Response: Document Approved.
Reasoning: yes.
the document contains a termination clause in section 3, which states: "either party may terminate this agreement with 30 days written notice. upon termination, all outstanding balances must be paid." this clause outlines the conditions for terminating the agreement.
To test a rejected document, you only need to update the DOCUMENT_PATH variable in the .env file with another document that doesn't contain any termination clause, and re-run the application:
Loading & Chunking PDF...
Checking for 'termination clause'...
Final Response: Document Rejected. Missing termination clause.
Reasoning: no, the document does not contain a termination clause.
a termination clause is a provision in a contract that specifies the conditions under which the contract may be terminated. the document provided appears to be an instruction manual for installing a topcase for a royal enfield himalayan motorcycle, and it does not mention anything about terminating a contract or agreement.
Finally, it is possible to print the graph of the configured workflow by uncommenting the last line in app.py and re-running the application:
# print the graph in ASCII format (OPTIONAL)
print(workflow.get_graph().draw_ascii())
Additional expected output:
+-----------+
| __start__ |
+-----------+
*
*
*
+--------+
| upload |
+--------+
*
*
*
+-------+
| check |
+-------+
*
*
*
+---------+
| respond |
+---------+
*
*
*
+---------+
| __end__ |
+---------+
Wrapping up
This is just a basic example that shows one way to integrate LLMs into an application's design. Using frameworks like LangChain takes the power of LLMs further by adding workflow and state management that fit specific business needs. Incorporating agentic approaches and tools like vector stores for document processing is just one of many advanced capabilities you can leverage with these integrations.
Next, you can explore more alternatives for RHAIIS serving and integrations. We encourage you to walk through the steps to run OpenAI’s Whisper model using Red Hat AI Inference Server on a Red Hat Enterprise Linux 9 environment: Speech-to-text with Whisper and Red Hat AI Inference Server