Krkn-AI: A feedback-driven approach to chaos engineering

October 21, 2025
Rahul Shetty, Naga Ravi Chaitanya Elluri
Related topics: Artificial intelligence, Automation and management, DevOps, Kubernetes, Microservices
Related products: Red Hat OpenShift

    Chaos engineering is the practice of deliberately introducing controlled failures into a system to uncover weaknesses before they affect end users. By continuously running chaos experiments, teams can build greater confidence in their systems and identify real performance bottlenecks. However, applying chaos in real-world environments can be challenging because applications and infrastructure are complex and dynamic, especially on platforms like Kubernetes.

    In this article, we introduce Krkn-AI, a project that addresses these challenges by providing a framework for AI-assisted, objective-driven chaos testing.

    The challenge of reliability in modern systems

    Modern applications are no longer monolithic programs. They are distributed, cloud-native, and composed of dozens—sometimes hundreds—of microservices spanning clusters and regions. This architecture enables flexibility and scalability, but it also introduces new layers of complexity. A small glitch in a single service can ripple outward, causing system-wide disruptions. Even brief downtime can result in revenue loss and eroded user trust. Reliability has become more than an engineering goal—it is a business-critical requirement.

    The challenge lies in the unpredictability of real-world conditions. Sudden traffic surges, misconfigured pods, dependency failures, or hardware degradation can interact in ways that are nearly impossible to anticipate. Traditional testing, focused on correctness under normal conditions, cannot expose these unpredictable issues. Chaos engineering helps by simulating failures, but defining meaningful experiments and keeping them relevant as systems evolve is difficult. Today’s systems demand resilience strategies that adapt dynamically, learn intelligently, and continuously strengthen the system’s ability to withstand failures.

    Why traditional chaos engineering isn’t enough

    Chaos engineering has proven its value by exposing weaknesses through controlled failures—like shutting down pods, adding latency, or simulating resource exhaustion. But most practices still rely on manually defined experiments, where engineers decide what to break and then spend hours sifting through logs and metrics.

    Real-world systems, however, are dynamic. Kubernetes clusters scale, workloads shift, and dependencies evolve, making static experiments insufficient. To keep pace, chaos testing must be adaptive, automated, and capable of learning from the system itself. Instead of guessing scenarios, engineers need tools that intelligently target weak points, run experiments, and deliver actionable insights. That’s where Krkn-AI comes in.

    Our vision: Krkn-AI

    Krkn-AI isn’t just another chaos testing tool—it’s designed to make resilience testing smarter, more automated, and easier to adopt. By combining evolutionary algorithms with the Krkn project, it automates experiment discovery, execution, and analysis so that engineers can focus on insights instead of manual setup.

    Key highlights:

    • Auto-framework for chaos engineering eliminates the need to manually author chaos experiments.
    • Cluster-aware discoverability automatically scans your Kubernetes cluster to identify components and services.
    • Enhanced test coverage detects complex system-disrupting paths that manual testing or human analysis might overlook due to the vast fault space.
    • Objective-driven testing (service-level objective–aware) lets users align chaos testing directly with business and operational goals (for example, latency, error rates, availability).
    • Built-in health checks quickly surface which failures actually degrade performance or availability of your application by incorporating real-time monitoring.
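
    To make the objective-driven idea concrete, here is a minimal sketch of checking health-check samples against SLO thresholds. It is an illustration only, not Krkn-AI's implementation: the `Sample` and `Slo` types, the `slo_report` function, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Sample:
    """One health-check probe result (hypothetical shape)."""
    latency_ms: float
    ok: bool  # True if the probe received a successful response


@dataclass
class Slo:
    """Hypothetical SLO thresholds; real values depend on the service."""
    max_p95_latency_ms: float
    max_error_rate: float


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile, using integer math for the rank."""
    ordered = sorted(values)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def slo_report(samples: List[Sample], slo: Slo) -> Dict[str, Union[float, bool]]:
    """Summarize probes collected during one experiment against the SLO."""
    latency = p95([s.latency_ms for s in samples])
    error_rate = sum(1 for s in samples if not s.ok) / len(samples)
    return {
        "p95_latency_ms": latency,
        "error_rate": error_rate,
        "latency_breached": latency > slo.max_p95_latency_ms,
        "errors_breached": error_rate > slo.max_error_rate,
    }


if __name__ == "__main__":
    # 18 healthy probes and 2 slow, failed probes gathered while a fault ran.
    probes = [Sample(100.0, True)] * 18 + [Sample(900.0, False)] * 2
    print(slo_report(probes, Slo(max_p95_latency_ms=500.0, max_error_rate=0.05)))
```

    A report like this separates faults that merely happen from faults that actually push the application past its stated objectives.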

    How Krkn-AI works

    Krkn-AI applies a genetic algorithm—an evolutionary technique inspired by natural selection—to refine chaos experiments automatically. Instead of relying on static, pre-defined tests, it generates, evaluates, and evolves scenarios based on measurable system impact, guided by service-level objectives (SLOs) and real-time health signals.

    How it works:

    1. Generate scenarios: Krkn-AI can run a variety of experiments, from single-fault scenarios (killing pods, adding latency, exhausting resources) to multi-fault scenarios that combine several faults in parallel or in sequence.
    2. Evaluate impact: Each experiment is measured against SLOs (like response time thresholds) and application health checks (latency, error rates, availability).
    3. Score results: Experiments that expose more stress or performance degradation receive higher scores.
    4. Evolve tests: The algorithm carries forward the most impactful experiments, mutates them, and produces a new generation of scenarios.
    5. Refine continuously: This feedback loop iteratively converges on tests that uncover hidden bottlenecks and weak points, without engineers needing to guess what to break next.
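
    The five steps above can be sketched as a small genetic loop. This is a toy illustration under stated assumptions, not Krkn-AI's code: the fault and service names, the `SIMULATED_IMPACT` table, and every function here are hypothetical, and the fitness function is a stand-in for actually running an experiment and measuring its impact on SLOs.

```python
import random
from typing import Callable, List, Tuple

Fault = Tuple[str, str]        # (fault type, target service)
Scenario = Tuple[Fault, ...]   # one or more faults run together

# Hypothetical search space; a real tool would discover it from the cluster.
FAULT_TYPES = ["pod-kill", "network-latency", "cpu-hog", "memory-hog"]
SERVICES = ["cart", "checkout", "catalog", "payments"]

# Stand-in for real measurements: pretend these faults hurt the most.
SIMULATED_IMPACT = {
    ("network-latency", "checkout"): 0.6,
    ("cpu-hog", "payments"): 0.5,
    ("pod-kill", "cart"): 0.2,
}


def simulated_fitness(scenario: Scenario) -> float:
    """Steps 2-3: score a scenario (higher means more degradation)."""
    return sum(SIMULATED_IMPACT.get(f, 0.05) for f in sorted(set(scenario)))


def random_fault(rng: random.Random) -> Fault:
    return (rng.choice(FAULT_TYPES), rng.choice(SERVICES))


def random_scenario(rng: random.Random, max_faults: int = 2) -> Scenario:
    """Step 1: generate a single-fault or multi-fault scenario."""
    return tuple(random_fault(rng) for _ in range(rng.randint(1, max_faults)))


def mutate(scenario: Scenario, rng: random.Random) -> Scenario:
    """Replace one fault in the scenario with a random one."""
    faults = list(scenario)
    faults[rng.randrange(len(faults))] = random_fault(rng)
    return tuple(faults)


def crossover(a: Scenario, b: Scenario, rng: random.Random) -> Scenario:
    """Combine one fault from each parent into a two-fault child."""
    return (rng.choice(a), rng.choice(b))


def evolve(fitness: Callable[[Scenario], float], generations: int = 10,
           population_size: int = 8, survivors: int = 3,
           seed: int = 0) -> Tuple[Scenario, List[float]]:
    rng = random.Random(seed)
    population = [random_scenario(rng) for _ in range(population_size)]
    best_per_generation: List[float] = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[:survivors]  # Step 4: keep the most impactful
        children: List[Scenario] = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < 0.5:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children  # Step 5: next generation
    return max(population, key=fitness), best_per_generation


if __name__ == "__main__":
    best, history = evolve(simulated_fitness, generations=15, seed=1)
    print("best scenario:", best)
    print("best score per generation:", history)
```

    Because the top scenarios are carried into each new generation unchanged, the best score never decreases from one generation to the next. In a real run, the cost of each fitness evaluation (an actual chaos experiment) would dominate, so population size and generation count become practical budget decisions.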

    Try it out

    The Krkn-AI getting started guide shows you how to set up a microservice on Kubernetes or Red Hat OpenShift and run your first test.

    For installation details, configuration options, and advanced scenarios, check out the documentation.

    To learn more about the project, or to start using and contributing to it, visit the Krkn-AI GitHub repository.
