Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Integrate Red Hat AI Inference Server & LangChain in agentic workflows

June 19, 2025
Alexander Barbosa Ayala
Related topics:
Artificial intelligenceContainersPython
Related products:
Red Hat AIRed Hat Enterprise Linux AIRed Hat OpenShift AI

    Integrating the powerful capabilities of large language models (LLMs) into real-world workflows is one of the most significant advancements in modern application development. This article explores how to integrate Red Hat AI Inference Server with LangChain to build agentic workflows, with a focus on document processing. In this example, Red Hat AI Inference Server serves the LLM, while LangChain manages the application state and workflow.

    About LangChain

    LangChain is a framework designed to facilitate the development of applications powered by large language models. LangChain implements a standard interface for LLMs and related technologies, such as embedding models and vector stores, and integration with other providers.

    LangChain can also be integrated with the LangGraph framework, which enables fine-grained control over workflows that integrate LLMs. Therefore, if the use case involves a sequence of steps requiring orchestration and decision-making at key points, LangGraph provides the structure to manage that complexity effectively.

    About Red Hat AI Inference Server

    Red Hat AI Inference Server is a robust, enterprise-grade solution for LLM inference and serving, supported and maintained by Red Hat, which is leveraged from the popular community-supported vLLM inference server.

    One of Red Hat AI Inference Server's great advantages is the flexibility for model serving options, having a wide range of validated models, parametrization, and more. Also, it is cloud native, allowing it to scale in any hybrid cloud OpenShift environment. For more details, refer to the  Red Hat AI Hugging Face space and the Red Hat AI Inference Server official documentation.

    Use case description

    To demonstrate the integration between LangChain and Red Hat AI Inference Server, a document processing use case was developed. Depending on the LLM's reasoning, the document will be either approved or rejected. 

    The application contains a document-processing agent that extracts information from an uploaded PDF and embeds it in a vector store. This vectorized information is processed by the LLM, which checks compliance rules, and finally responds with a status, Approved or Rejected. All coordinated via LangGraph states.

    Planning the app workflow

    Having described the use case, the next step is to design the graph nodes and workflow (Figure 1). For this use case, there are three main nodes to build:

    • Upload the file: Locate the document in a valid local path. Its contents will be processed by the LLM.
    • Checking compliance rules: Verify if the previously uploaded document contains the desired rule, in this case, if it includes a termination clause.
    • Response: Generated a final response approving or rejecting the document. It also includes a reasoning from the LLM.
    App Workflow
    Figure 1: Designed application workflow.

     

    Translating this to LangChain in Python code, an option is to define the states as functions (you could also define them as classes, depending on each use case and development practices). In the current case, each node is defined as a function:

    def upload_and_extract(state):
        print("Loading & Chunking PDF...")
    ...
    def check_compliance_with_retrieval(state):
        print("Checking for 'termination clause'...")
    ...
    def respond(state):
        if state["compliant"]:
            msg = "Document Approved."
        else:
            msg = "Document Rejected. Missing termination clause."
    ...

    Once we define the node functions and logic, it should build the state graph to manage the use case workflow:

    # build the state graph
    graph = StateGraph(State)
    graph.add_node("upload", RunnableLambda(upload_and_extract))
    graph.add_node("check", RunnableLambda(check_compliance_with_retrieval))
    graph.add_node("respond", RunnableLambda(respond))
    graph.set_entry_point("upload")
    graph.add_edge("upload", "check")
    graph.add_edge("check", "respond")
    graph.set_finish_point("respond")
    # execute the workflow
    workflow = graph.compile()
    workflow.invoke({})

    The full application example, along with its local execution configuration, is detailed in the next section.

    Set up the environment and execute the application

    Once the use case workflow and the nodes' purposes are designed, it’s time to implement the code details. The below-described procedure was configured in an environment with the following technical details:

    • OS: Fedora 41
    • GPU: NVIDIA RTX 4060 Ti (16 GB vRAM)
    • RHAIIS 3.0.0 image
    • Python 3.12 

    Configure the application environment

    1. Clone the GitHub repository:

      git clone https://github.com/alexbarbosa1989/rhaiis-langchain.git
    2.  Move to the application directory:

      cd rhaiis-langchain
    3. Create an .env file where will be configured sensitive information, such as variables and passwords. In this case, define the DOCUMENT_PATH variable with the full path to the PDF document that will be analyzed. An example PDF document is provided in the application repository in the rhaiis-langchain/docs/example/contract-template.pdf path:

      echo "DOCUMENT_PATH=/home/user/rhaiis-langchain/docs/example/contract-template.pdf" >> .env

      Verifying the contents:

      cat .env 
      DOCUMENT_PATH=/home/user/Downloads/rhaiis-langchain/docs/example/contract-template.pdf
    4. Create a Python virtual environment:

      python3.12 -m venv --upgrade-deps venv
    5. Activate the environment:

      source venv/bin/activate
    6. Install the required libraries to run the application correctly:

      pip install -r requirements.txt

    Start the Red Hat AI Inference Server container

    In another terminal window, start a Red Hat AI Inference Server instance to serve an LLM; in this case, the quantized RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 model was served. Make sure that the Podman environment is previously configured by following the suggested procedure in the product documentation, specifically the Serving and inferencing with AI Inference Server section:

    podman run -ti --rm --pull=newer \
    --user 0 \
    --shm-size=0 \
    -p 127.0.0.1:8000:8000 \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    --env "HF_HUB_OFFLINE=0" \
    -v ./rhaiis-cache:/opt/app-root/src/.cache  \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    --name rhaiis \
    registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
    --model RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 \
    --max_model_len=4096

    Run the application

    Go back to the terminal session where the application was cloned and the Python virtual environment is activated. There, run the Python application:

    python app.py

    The application will automatically follow the design & build workflow, processing the set PDF document specified in the DOCUMENT_PATH variable. It will analyze the document's contents using the served LLM to check for the presence of a termination clause. Based on this review, the document will either be approved or rejected. The provided contract-template.pdf document complies with the established rule; therefore, it should be approved:

    Loading & Chunking PDF...
    Checking for 'termination clause'...
    Final Response: Document Approved.
    Reasoning: yes.
    the document contains a termination clause in section 3, which states: "either party may terminate this agreement with 30 days written notice. upon termination, all outstanding balances must be paid." this clause outlines the conditions for terminating the agreement.

    For a rejected document test, only need to update the DOCUMENT_PATH variable in the .env file with another document that doesn’t contain any termination clause, and re-run the application:

    Loading & Chunking PDF...
    Checking for 'termination clause'...
    Final Response: Document Rejected. Missing termination clause.
    Reasoning: no, the document does not contain a termination clause. 
    a termination clause is a provision in a contract that specifies the conditions under which the contract may be terminated. the document provided appears to be an instruction manual for installing a topcase for a royal enfield himalayan motorcycle, and it does not mention anything about terminating a contract or agreement.

    Finally, it is possible to print the graph based on the configured workflow by uncommenting the last line in app.py and re-running the application:

    # print the graph in ASCII format (OPTIONAL)
    #print(workflow.get_graph().draw_ascii())
    # print the graph in ASCII format (OPTIONAL)
    print(workflow.get_graph().draw_ascii())

    Additional expected output:

    +-----------+  
    | __start__ |  
    +-----------+  
          *        
          *        
          *        
      +--------+   
      | upload |   
      +--------+   
          *        
          *        
          *        
      +-------+    
      | check |    
      +-------+    
          *        
          *        
          *        
     +---------+   
     | respond |   
     +---------+   
          *        
          *        
          *        
     +---------+   
     | __end__ |   
     +---------+

    Wrapping up

    This is just a basic example that shows one of the alternatives to integrate LLMs into an application's design. Using frameworks like LangChain takes the power of LLMs to another level by adding workflow and state management to fit into specific business needs. Incorporating agentic approaches and tools like vector stores for document processing is just one of many advanced capabilities you can leverage with these integrations.

    Next, you can explore more alternatives for RHAIIS serving and integrations. We encourage you to walk through the steps to run OpenAI’s Whisper model using Red Hat AI Inference Server on a Red Hat Enterprise Linux 9 environment: Speech-to-text with Whisper and Red Hat AI Inference Server

    Related Posts

    • How to use LLMs in Java with LangChain4j and Quarkus

    • Enable 3.5 times faster vision language models with quantization

    • Introducing Podman AI Lab: Developer tooling for working with LLMs

    • LLMs and Red Hat Developer Hub: How to catalog AI assets

    • Structured outputs in vLLM: Guiding AI responses

    • vLLM V1: Accelerating multimodal inference for large language models

    Recent Posts

    • Protect data offloaded to GPU-accelerated environments with OpenShift sandboxed containers

    • Case study: Measuring energy efficiency on the x64 platform

    • How to prevent AI inference stack silent failures

    • Preventing GPU waste: A guide to JIT checkpointing with Kubeflow Trainer on OpenShift AI

    • How to manage TLS certificates used by OpenShift GitOps operator

    What’s up next?

    Configure your RHEL AI machine, download, serve, and interact with large language models (LLM) using RHEL AI and InstructLab, and discover how you can benefit from AI models tailored to your needs.

    Start the activity
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.