Your LLM is too large: How I generate production-ready failure analysis on a toaster

Why pattern preprocessing makes small models mighty

September 2, 2025
Caleb Evans
Related topics: Artificial intelligence, Automation and management, DevOps, Kubernetes, Platform engineering
Related products: Red Hat AI, Red Hat OpenShift

    I'm running production-grade Kubernetes failure analysis on an edge computing device—a piece of hardware that costs less than what many teams spend on LLM API calls in just two to three months. The model is Llama 3.2:3B with 4-bit quantization, delivering comprehensive root cause analysis in 70 seconds that, for common production failures, matches the practical value of commercial models.

    Let me show you how pattern preprocessing fundamentally changes the economics and performance of production AI.

    The challenge with LLMs in production

    When you send 10,000 lines of raw logs to a state-of-the-art LLM, you're essentially paying it to rediscover what grep already knows. Detecting common patterns such as connection refused, out-of-memory errors, and permission failures has been a solved problem for decades. Yet we're burning tokens to teach sophisticated AI systems to recognize basic connection timeouts.

    Despite their compute power, large models can struggle with the signal-to-noise ratio in raw logs. They excel at complex reasoning but may miss obvious patterns hidden in thousands of lines of output.

    Enter pattern preprocessing: The architecture

    Instead of throwing raw logs at an LLM and hoping for brilliance, I built a multilayer system:

    Raw Logs (10,000 lines) 
        ↓
    Pattern Engine (deterministic regex matching)
        ↓
    Scored & Contextualized Matches (500 lines)
        ↓
    Small LLM
        ↓
    Human-Readable Analysis

    The pattern engine does the heavy lifting—identifying known failures, extracting relevant context, and scoring matches based on severity and proximity. The LLM then takes this pre-digested information and explains it like a senior engineer would.
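
To make the flow above concrete, here is a minimal Python sketch of the first stages: deterministic regex matching, a small context window around each match, and a condensed text block that a small model could receive. It is an illustration under stated assumptions, not the author's implementation. The pattern set, the `Match` class, the function names, and the sample log lines are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; not the author's actual rule set.
# Each entry maps a pattern id to (compiled regex, severity).
PATTERNS = {
    "connection_refused": (re.compile(r"Connection refused"), 0.9),
    "out_of_memory": (re.compile(r"OutOfMemoryError|OOMKilled"), 1.0),
}


@dataclass
class Match:
    pattern_id: str
    line_no: int        # 1-based line number in the raw log
    severity: float
    context: list[str]  # surrounding lines, including the matching line


def extract_matches(lines: list[str], window: int = 2) -> list[Match]:
    """Scan every line against every pattern and keep a small context window."""
    matches = []
    for i, line in enumerate(lines):
        for pattern_id, (regex, severity) in PATTERNS.items():
            if regex.search(line):
                start = max(0, i - window)
                end = min(len(lines), i + window + 1)
                matches.append(Match(pattern_id, i + 1, severity, lines[start:end]))
    return matches


def build_prompt(matches: list[Match]) -> str:
    """Turn matches into condensed text, most severe first."""
    parts = []
    for m in sorted(matches, key=lambda m: m.severity, reverse=True):
        parts.append(f"[{m.pattern_id}] line {m.line_no} severity={m.severity}")
        parts.extend(f"    {ctx}" for ctx in m.context)
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up log lines, loosely modeled on the failure discussed in this article.
    log = [
        "INFO  Application started in 2.234s",
        "INFO  Calling remote service",
        "ERROR ConnectException: Connection refused (Connection refused)",
        "WARN  Retrying request",
        "INFO  Health check ok",
    ]
    print(build_prompt(extract_matches(log)))
```

The model never sees lines that no pattern touched, which is where the reduction from thousands of lines to a few hundred would come from.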

    Real-world results: 270 lines to root cause in 70 seconds

    Let me show you actual output from analyzing a Kubernetes pod failure with database connection issues. The system processed 270+ lines of logs and delivered this analysis:

    Output

    Root Cause: The primary root cause of the pod failure is the **REST Client Connection Error** 
    with a score of 9.775666542449912. This error occurred due to the remote service being unreachable, 
    leading to multiple timeouts and ultimately causing the REST client to fail.
    Evidence: 
    * Log message from Line 13: `ConnectException: Connection refused (Connection refused)`
    * Context Before: The Quarkus application was started in 2.234s.
    * Context After: A `java.net.ConnectException` occurred with a connection refused error.
    Fix: 
    1. Verify that the remote service is running and accessible by checking its health endpoints or status pages.
    2. If the remote service is down, restart it or investigate any underlying issues causing the downtime.
    3. Once the remote service is up and running, update the Quarkus configuration to increase the timeout 
       values for the REST client.
    Example command:
    # Increase timeout values in configuration
    echo "quarkus.rest.client.timeout=30" >> application.properties
    Prevention: 
    1. Regularly monitor the health of remote services using health endpoints or status pages.
    2. Implement circuit breakers or retries to handle temporary connectivity issues.
    3. Consider using a load balancer or service discovery mechanism to ensure that requests are routed 
       to available instances of the remote service.
    Sequence of Events: 
    * Initial trigger: The REST client attempts to connect to the remote service, but it is unreachable 
      due to being down or unavailable (Line 13).
    * Cascade: The `ConnectException` causes multiple timeouts and ultimately leads to a failure of the 
      REST client (Lines 14-17, 30-32).
    * Final state: The pod fails due to the propagation of the errors from the REST client connection error.
    By addressing the root cause of the REST Client Connection Error and implementing preventative 
    measures, you can reduce the likelihood of similar failures occurring in the future.

    The performance comparison

    Traditional approach:

    • Input: All 270 lines sent to a large commercial LLM
    • Cost: $0.30-3.00 per analysis (depending on model/service)
    • Result: Often generic advice

    Preprocessed approach (above):

    • Input: Pattern engine identifies key events, sends contextualized lines to the model
    • Cost: <$0.001
    • Result: Specific root cause, evidence trail, and actionable remediation

    All from a model that fits in 2 GB of RAM.
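
As a quick check on what these per-analysis figures imply, the relative saving can be computed directly. The sketch below uses $0.001 as the preprocessed cost, which is the upper bound quoted above, so the real reduction would be at least this large. The function name is made up for illustration.

```python
def reduction_percent(baseline_cost: float, new_cost: float) -> float:
    """Percentage by which new_cost is lower than baseline_cost."""
    return (baseline_cost - new_cost) / baseline_cost * 100


# Low and high ends of the commercial range quoted above, in dollars per analysis.
for baseline in (0.30, 3.00):
    saving = reduction_percent(baseline, 0.001)
    print(f"${baseline:.2f} -> $0.001: {saving:.2f}% lower")

# Prints:
# $0.30 -> $0.001: 99.67% lower
# $3.00 -> $0.001: 99.97% lower
```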

    Why this changes everything

    • Accessible hardware: This approach runs on consumer-grade hardware rather than enterprise GPU clusters. I'm using an edge device originally designed for autonomous vehicles, but it works equally well on a decent laptop with a GPU.
    • Dramatic cost reduction: At the low end of the commercial range ($0.30 versus less than $0.001 per analysis), that is roughly a 99.7% reduction in inference costs. In practical terms, a traditional LLM approach costs more for a single analysis than this system costs for an entire day of operations.
    • Speed without sacrifice: Prefiltered context means the model sees exactly what matters. The system focuses on relevant error patterns rather than processing thousands of lines of startup logs and normal operations.
    • Community intelligence: These patterns represent community knowledge that can be shared and improved collectively, similar to how antivirus definitions work:

      
      patterns:
        - id: "quarkus_connection_pool_exhausted"
          primary_pattern:
            regex: "Connection pool.*exhausted|Unable to acquire connection"
            confidence: 0.95
          secondary_patterns:
            - regex: "timeout.*waiting for connection"
              weight: 0.7
          remediation:
            description: "Database connection pool is exhausted"
            common_causes:
              - "Spike in traffic"
              - "Connection leak in application"
              - "Database performance degradation"

      Every pattern is reviewable, versioned, and improvable through standard Git workflows. Your senior engineers' knowledge becomes codified, shareable, and composable.
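
One way a definition like this could be applied is sketched below in Python. The article does not say how `confidence` and `weight` are combined, so the rule here is an assumption: start from the primary confidence and add the weight of each secondary pattern found within a few lines of the primary match. The `score_pattern` function and the `radius` parameter are hypothetical, and the pattern is written as the dict a YAML loader would produce so the example needs only the standard library.

```python
import re

# The pattern shown above, as the dict a YAML loader would produce.
PATTERN = {
    "id": "quarkus_connection_pool_exhausted",
    "primary_pattern": {
        "regex": "Connection pool.*exhausted|Unable to acquire connection",
        "confidence": 0.95,
    },
    "secondary_patterns": [
        {"regex": "timeout.*waiting for connection", "weight": 0.7},
    ],
}


def score_pattern(pattern: dict, lines: list, radius: int = 5) -> list:
    """Return (line_number, score) for each primary match.

    Assumed scoring rule (not taken from the article): primary confidence
    plus the weight of every secondary pattern seen within `radius` lines.
    """
    primary = re.compile(pattern["primary_pattern"]["regex"])
    secondaries = [
        (re.compile(s["regex"]), s["weight"])
        for s in pattern.get("secondary_patterns", [])
    ]
    results = []
    for i, line in enumerate(lines):
        if not primary.search(line):
            continue
        score = pattern["primary_pattern"]["confidence"]
        nearby = lines[max(0, i - radius): i + radius + 1]
        for regex, weight in secondaries:
            if any(regex.search(candidate) for candidate in nearby):
                score += weight
        results.append((i + 1, round(score, 2)))
    return results
```

With this rule, a primary match that has supporting evidence nearby outranks an isolated one, which is the behavior the secondary weights seem intended to express.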

    Beyond logs: Expanding the pattern

    Once you have pattern preprocessing infrastructure, the same approach can apply to many domains:

    • Metrics anomalies: Patterns for CPU spikes, memory leaks, disk pressure
    • Security events: Known attack signatures, suspicious access patterns
    • Performance regressions: Response time degradations, throughput drops
    • User behavior: Error click patterns, rage-quit sequences

    The same architecture that makes log analysis efficient works for any structured data where domain expertise exists.

    Building your own pattern-augmented system

    The pattern is simple:

    1. Collect domain patterns: Start with your runbooks. Every "if you see X, do Y" is a pattern.
    2. Build a scoring engine: Patterns rarely appear in isolation. Score them by severity, proximity, and temporal relationships.
    3. Create context windows: Extract relevant surrounding information for each match.
    4. Choose a small model: Llama 3.2, Phi-3, or even Mistral 7B work brilliantly with good context.
    5. Iterate with feedback: Every false positive or negative improves the patterns.
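
Step 2 is the least obvious of these, so here is a minimal Python sketch of one possible scoring rule. It is an assumption rather than the method used in the article: each hit starts from its severity, gains a fixed bonus for every other hit within a given number of lines (proximity), and ties go to the earlier line (a crude stand-in for temporal ordering). The `Hit` class, `rank_hits`, and the default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pattern_id: str
    line_no: int
    severity: float  # assumed to come from the pattern definition


def rank_hits(hits, proximity=10, bonus=0.25):
    """Return (score, hit) pairs, highest score first.

    score = severity + bonus * (number of other hits within `proximity` lines).
    Ties are broken in favor of the earlier line, on the assumption that the
    first failure in a cluster is more likely to be the trigger.
    """
    scored = []
    for hit in hits:
        neighbours = sum(
            1
            for other in hits
            if other is not hit and abs(other.line_no - hit.line_no) <= proximity
        )
        scored.append((hit.severity + bonus * neighbours, hit))
    scored.sort(key=lambda pair: (-pair[0], pair[1].line_no))
    return scored


if __name__ == "__main__":
    hits = [
        Hit("connection_refused", 13, 0.9),
        Hit("request_timeout", 15, 0.5),
        Hit("deprecation_warning", 200, 0.95),
    ]
    for score, hit in rank_hits(hits):
        print(f"{score:.2f}  line {hit.line_no}  {hit.pattern_id}")
```

In this example the clustered connection error ranks above the isolated warning even though the warning has a higher raw severity.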

    The GitOps advantage for AI knowledge

    Managing patterns in Git provides unexpected benefits:

    • Code review for AI: Pattern changes go through standard review processes.
    • Accountability: Git blame shows who added or modified each pattern.
    • Collaborative improvement: Teams can iterate on patterns based on real incidents.
    • Versioned intelligence: Roll back patterns if they cause issues.

    This approach makes AI knowledge manageable and auditable by engineering teams.
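
Because patterns are plain data, a review pipeline can also check them automatically before a merge. The Python sketch below shows the kind of check a CI job might run. The field names follow the pattern example shown earlier in this article, but the rules themselves (unique ids, compilable regexes, confidence between 0 and 1) are assumptions for illustration, and `validate_patterns` is a hypothetical helper.

```python
import re


def validate_patterns(patterns):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    errors = []
    seen_ids = set()
    for pattern in patterns:
        pid = pattern.get("id", "<missing id>")
        if pid in seen_ids:
            errors.append(f"{pid}: duplicate id")
        seen_ids.add(pid)

        primary = pattern.get("primary_pattern", {})
        regex = primary.get("regex")
        if not regex:
            errors.append(f"{pid}: primary_pattern.regex is missing")
        else:
            try:
                re.compile(regex)
            except re.error as exc:
                errors.append(f"{pid}: invalid regex ({exc})")

        # Assumed convention: confidence is a number in the range 0 to 1.
        confidence = primary.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
            errors.append(f"{pid}: confidence should be a number between 0 and 1")
    return errors
```

A CI step could load the pattern files, call this function, and fail the build when the returned list is not empty.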

    Back to reality

    Let me be clear: Large language models are remarkable. On general knowledge tasks, creative writing, and complex reasoning, they're in a different league. But for production operations—where patterns are known, speed matters, and costs compound—pattern preprocessing with small models isn't just competitive; it's superior for the majority of common failure scenarios.

    The "toaster" in my title is a deliberate exaggeration—my edge device is considerably more capable. But the point stands: the future of production AI isn't necessarily bigger models. It's smarter engineering around smaller ones.

    Practical next steps

    Before investing in large-scale AI infrastructure, consider these questions:

    • What percentage of your problems are truly novel versus known patterns?
    • How much are you currently spending to analyze well-understood failures?
    • Can your team's expertise be codified into reviewable patterns?

    If you're interested in seeing this in action, I've open-sourced the entire stack. The Podmortem operator demonstrates pattern-augmented analysis for Kubernetes, but the principles apply everywhere.

    Note: Performance metrics are based on real-world testing with common Kubernetes failure patterns. Results may vary based on specific use cases and pattern coverage.

    A simple way to experiment: Podman AI Lab

    The idea of running powerful models on your laptop might seem complex, but tools are emerging to make it incredibly straightforward. If you want to experiment with the pattern-augmented approach I've described, one of the easiest ways to get started is with Podman AI Lab.

    Podman AI Lab lets you download and run popular, optimized models with just a few commands. It handles the environmental setup for you, so you can focus on development and experimentation with AI. For hands-on guides to getting a model running in minutes, check out these excellent articles:

    • Getting started: AI meets containers: My first step into Podman AI Lab
    • Building an application: Build your AI application with an AI Lab extension in Podman Desktop

    Conclusion: Intelligence is more than model size

    The industry has focused heavily on model size as the primary metric for capability. But effective intelligence isn't just about raw capability—it's about applying the right tool to the right problem. When you combine human expertise (patterns) with AI explanation (small LLMs), you get something powerful: production-ready AI that's fast, cheap, and reliable.
