Hybrid loan-decisioning with OpenShift AI and Vertex AI

Building a hybrid loan-decisioning demo with Vertex AI, OpenShift AI, and a Llama chatbot

March 19, 2026
Harshil Sabhnani
Related topics:
Artificial intelligence, Data science, Hybrid cloud
Related products:
Red Hat AI Inference Server, Red Hat AI, Red Hat Enterprise Linux AI, Red Hat OpenShift AI

    This blog post presents a practical solution pattern that demonstrates how a modern financial application can make loan decisions using multiple machine learning (ML) systems deployed across hybrid environments. The architecture reflects real-world financial services requirements, where regulatory, compliance, and data residency constraints influence where models are deployed.

    A distributed architecture for regulated environments

    In this pattern, a loan approval classifier runs on Google Cloud using Vertex AI, while an ONNX-based regression model for interest rate prediction is deployed on Red Hat OpenShift AI running on premises. Many financial institutions require sensitive customer and risk data to remain on premises or within tightly controlled environments. Deploying OpenShift AI on premises enables these organizations to run ML workloads close to regulated data while still integrating with cloud-based services.

    A lightweight React frontend and a FastAPI backend orchestrate both models, delivering a unified application experience despite the distributed deployment. The models are intentionally hosted in different environments to illustrate a core hybrid principle: models run where data resides.

    The architecture also includes a Llama-based chatbot deployed on Red Hat OpenShift Container Platform using Red Hat AI Inference Server. This setup provides efficient on-premises inference and contextual guidance while maintaining full control over enterprise data.

    This article walks through the implementation, integration, and operational considerations of this hybrid AI setup. It shares a realistic and reproducible pattern for organizations building intelligent applications across OpenShift and Google Cloud, especially for teams with strong systems or DevOps backgrounds. You can find the project on GitHub.

    Figure 1 shows the overall architecture.

    Figure 1: A loan approval workflow using Red Hat AI for interest rate prediction and financial advisory services, integrated with Google Cloud for model training and data storage.

    Build a hybrid AI foundation with Google Cloud and Vertex AI

    The process begins with a data ingestion pipeline and a gatekeeper model deployed in Vertex AI. This model serves as the first line of defense, determining whether a loan application is approved or rejected.

    The architecture: Cloud-first ingestion

    Our architecture begins by ingesting historical loan data. To handle this data reliably, we use Google Cloud Pub/Sub. The application publishes the loan data to a specific topic, which triggers a subscription to push that data directly into BigQuery.

    This setup ensures that our training data is decoupled from our application logic, allowing for scalable data accumulation.
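    The publishing side of this pipeline can be sketched in a few lines of Python. The serialization helper below is runnable as-is; the commented publish call requires the google-cloud-pubsub client and GCP credentials, and the project and topic names are illustrative assumptions, not the project's actual names.

```python
import json

def encode_loan_event(record: dict) -> bytes:
    """Pub/Sub message bodies are bytes, so serialize the loan record as JSON."""
    return json.dumps(record).encode("utf-8")

# Publishing the event (requires google-cloud-pubsub and GCP credentials;
# project and topic names here are illustrative assumptions):
#
# from google.cloud import pubsub_v1
# publisher = pubsub_v1.PublisherClient()
# topic = publisher.topic_path("my-project", "loan-applications")
# publisher.publish(topic, encode_loan_event({"credit_score": 720, "annual_income": 85000}))
```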

    Feature engineering with SQL

    Once the data is in BigQuery, you can use SQL for initial preprocessing instead of complex Python scripts. The goal is to create a synthetic label called loan_approval_status that the model learns to predict.

    A SQL query applies business logic to the raw data:

    • Auto-approve: If the credit score and income meet the thresholds (for example, a score greater than 700), the status is set to 1.
    • Reject: If the criteria aren't met, the status is set to 0.

    This creates a clean, labeled dataset (loan_training_data_v1) ready for training.
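    The labeling rule is easiest to see as code. This is a Python mirror of the SQL logic, a sketch rather than the project's actual query; the article only specifies the credit-score example, so the income threshold below is an assumption.

```python
def label_loan(credit_score: int, annual_income: float,
               score_min: int = 700, income_min: float = 50_000) -> int:
    """Synthetic loan_approval_status label: 1 = auto-approve, 0 = reject.
    The income threshold of 50,000 is an illustrative assumption."""
    if credit_score > score_min and annual_income >= income_min:
        return 1
    return 0
```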

    Vertex AI AutoML: The gatekeeper model

    With the dataset ready, use the Vertex AI AutoML tabular training feature to build a classification model.

    1. Input: The BigQuery table we just created.
    2. Target column: loan_approval_status.
    3. Training: Vertex AI automatically tests various algorithms to find the best fit.

    Once trained, deploy this model to a Vertex AI endpoint. This endpoint acts as the gatekeeper for our application.

    The logic is as follows:

    • If the model predicts Approve with a confidence score greater than 75%, the request proceeds to the next stage.
    • If the prediction is Reject, or the confidence is low, the process stops, saving computational resources for the downstream models.
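    In code, the gatekeeper check reduces to a single predicate (a sketch; the function name is ours, not part of the project):

```python
def passes_gate(approved: bool, confidence: float, threshold: float = 0.75) -> bool:
    """Continue to interest-rate prediction only when the Vertex AI
    classifier approves with confidence above the 75% threshold."""
    return approved and confidence > threshold
```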

    Test the endpoint

    Using the Google Cloud console, you can send a sample JSON payload:

    {
      "instances": [{
        "credit_score": 720,
        "annual_income": 85000,
        "loan_amount": 20000
      }]
    }

    The result returns a prediction of 0 or 1 and a confidence score. This outcome determines if the application continues to the next stage of the workflow.

    Moving to the edge with OpenShift AI and predictive modeling

    Many industries require sensitive financial logic, such as calculating specific interest rates, to run closer to the data or on specific on-premises infrastructure.

    We use Red Hat OpenShift AI to handle the second stage of the pipeline: a regression model that predicts the exact interest rate for approved loans. This approach helps ensure that sensitive data remains on premises by bringing the model to the data.

    The environment: OpenShift AI

    For this demo, we run OpenShift AI on a Google Cloud cluster to simulate a hybrid setup. Our development work takes place in Jupyter notebooks provided by OpenShift AI workbenches.

    Workflow: From cloud to container

    Building a predictive model on premises requires a pipeline that bridges cloud storage and your containerized environment. This process involves the following steps:

    1. Data sync: We pull the approved loan data from a Google Cloud Storage (GCS) bucket into the OpenShift environment. You can also use Apache Spark or pull data directly from the application that publishes the historical data.
    2. Feature engineering: Use pandas and scikit-learn in the notebook to refine the features for interest rate prediction.
    3. Training: Train a TensorFlow and Keras regression model. Unlike the classification model described earlier, this model outputs a continuous value for the interest rate.

    Standardize with ONNX

    To ensure the model is portable and optimized for inference, convert the TensorFlow model into the ONNX (Open Neural Network Exchange) format. This lets you serve the model with a variety of runtimes without being locked into framework-specific dependencies.

    Deploy with OpenVINO Model Server

    OpenShift AI makes deployment simple. We use the OpenVINO Model Server runtime.

    1. Upload the ONNX model to an S3-compatible bucket (or back to GCS).
    2. In the OpenShift AI dashboard, create a model server.
    3. Deploy the model to expose an inference endpoint using gRPC or REST.
    For example, query the inference endpoint with curl:

    curl -X POST -H "Content-Type: application/json" \
    -d '{"inputs": [{"name": "input:0", "shape": [1, 5], "datatype": "FP32", "data": [720, 95000, 35000, 60, 0.04]}]}' \
    https://interest-rate-loan-rate-model.apps.<domain>/v2/models/interest-rate/infer

    The server responds with:

    {
        "model_name": "interest-rate",
        "model_version": "5",
        "outputs": [{
                "name": "Identity:0",
                "shape": [1, 1],
                "datatype": "FP32",
                "data": [11.088425636291504]
            }]
    }
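    In the backend, the predicted rate can be pulled out of this v2 inference response with a few lines of Python (a sketch; the field layout follows the response shown above):

```python
def extract_rate(v2_response: dict) -> float:
    """Pull the scalar prediction out of a KServe v2 inference response."""
    return v2_response["outputs"][0]["data"][0]
```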

    As you can see, the request returns a predicted interest rate of approximately 11.09% for these inputs. We now have two endpoints:

    • Vertex AI: This endpoint decides if a loan is approved.
    • OpenShift AI: This endpoint determines the interest rate for the approved loans.

    Next, we combine these two endpoints in the frontend UI to route traffic between the models.

    Orchestration, LLMs, and the intelligent user experience

    We have two deterministic models running in a hybrid environment. However, displaying a raw number like 11.09% is not a helpful experience for the user. We need to orchestrate these models to present the data in a clear, useful way.

    A large language model (LLM) running on Red Hat OpenShift acts as a financial advisor. A Python backend and a React frontend connect the system components.

    Infrastructure spotlight: GPU provisioning

    To run high-performance inference workloads such as the chatbot, you need GPUs. OpenShift simplifies this using MachineSet resources.

    Define a MachineSet for an NVIDIA A100 node and verify that the NVIDIA GPU Operator is running to manage drivers and toolkit injection. Then, apply taints and tolerations to ensure only specific AI workloads land on these expensive GPU nodes. Adding GPU nodes in OpenShift is a straightforward process.
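    The scheduling piece looks roughly like the following fragments. This is an illustrative sketch, not the project's actual manifests; the taint key follows the common NVIDIA GPU convention.

```yaml
# In the MachineSet template: taint GPU nodes so that only workloads
# with a matching toleration can schedule there.
spec:
  template:
    spec:
      taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
---
# In the AI workload's pod spec: tolerate the taint to land on GPU nodes.
tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule
```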

    Hosting Llama 3 on OpenShift

    Once the cluster is GPU-enabled, it can host the LLM. We chose the Llama 3 8B Instruct model for this project. The model strikes a balance between performance and resource usage, fitting comfortably on a single NVIDIA A100 GPU.

    To serve this LLM, we use Red Hat AI Inference Server, a high-throughput and memory-efficient serving engine. We deploy vLLM as a custom serving runtime in OpenShift AI. This setup exposes an API compatible with standard chat interfaces.
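    Because vLLM exposes an OpenAI-compatible /v1/chat/completions endpoint, the backend only needs to build a standard chat request body. The helper below is a sketch; the model ID is the upstream Hugging Face name and is an assumption about this deployment.

```python
def chat_payload(system_context: str, user_message: str,
                 model: str = "meta-llama/Meta-Llama-3-8B-Instruct") -> dict:
    """Build a request body for vLLM's OpenAI-compatible chat endpoint.
    The model ID is an assumed default, not necessarily this project's."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_context},
            {"role": "user", "content": user_message},
        ],
    }
```

    The backend would POST this body to the serving route, for example https://<llm-route>/v1/chat/completions, with an ordinary HTTP client.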

    The backend: The orchestrator

    A Python backend serves as the core of this application, routing requests based on business logic. The infer.py script executes the following workflow:

    1. Receive request: The user submits data from the frontend.
    2. Call Vertex AI: The backend sends a request to the Google Cloud endpoint.
      • If the model rejects the request, the process stops.
      • If the model approves the request with more than 75% confidence, the backend proceeds.
    3. Call OpenShift AI: The backend calls the OpenVINO endpoint to determine the interest rate.
    4. Prompt engineering: The backend uses the approval status and interest rate to construct a prompt for the LLM: You are a helpful financial advisor. The customer's loan was approved at a rate of 10.5%. Explain this to them and offer financial advice.
    5. Call OpenShift AI: The backend sends the prompt to the vLLM endpoint, which returns a natural language response.
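    The steps above can be condensed into a routing function. This is a sketch of the logic, not the project's actual infer.py: the three callables are injected stand-ins for the real endpoint clients, so the names and return shapes are assumptions.

```python
def decide_and_advise(loan: dict, vertex_predict, openvino_predict, llm_complete) -> str:
    """Orchestration sketch: gatekeeper -> rate model -> LLM advisor."""
    approved, confidence = vertex_predict(loan)        # step 2: Vertex AI gatekeeper
    if not approved or confidence <= 0.75:             # reject or low confidence: stop
        return "We're sorry, your loan application was not approved."
    rate = openvino_predict(loan)                      # step 3: OpenVINO rate model
    prompt = (                                         # step 4: prompt engineering
        "You are a helpful financial advisor. The customer's loan was approved "
        f"at a rate of {rate:.1f}%. Explain this to them and offer financial advice."
    )
    return llm_complete(prompt)                        # step 5: vLLM endpoint
```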

    The frontend: Context-aware chat

    The React frontend offers two modes. In the Prediction form, users input their credit score and income. The Loan assistant chat provides a context-aware interface for further interaction.

    If a user gets rejected, they can ask the chatbot why they were denied. Because the backend passed the prediction context to the LLM, the chatbot can explain that a credit score of 500 is just below the threshold, for example. It can then suggest improvements, like lowering the requested amount or consolidating debt.

    Conclusion

    This project demonstrates that modern AI solutions rarely rely on just one model. Success comes from a combination of hybrid infrastructure and hybrid models, where Google Cloud and Red Hat OpenShift work together. This foundation allows traditional predictive AI, such as regression and classification, to combine with generative AI. By orchestrating these components, we create applications that are accurate, engaging, and helpful.
