Detecting nondeterministic test cases with Bunsen

June 9, 2022
Serhei Makarov
Related topics: CI/CD, Python
Related products: Developer Tools

    Many open source projects have test suites that include nondeterministic test cases with unpredictable behavior. Tests might be nondeterministic because they launch several parallel processes or threads that interact in an unpredictable manner, or because they depend on some activity in the operating system that has nondeterministic behavior. The presence of these tests can interfere with automated regression checking in CI/CD pipelines. This article shows how to automate the discovery of nondeterministic test cases using a short Python script based on the Bunsen test suite analysis toolkit.

    The problem: Finding nondeterministic ("flaky") test cases

    Test cases in an open source project's test suite can have nondeterministic behavior and produce different outcomes when run repeatedly. Such test cases are commonly referred to as flaky, and their presence in a test suite tends to complicate the evaluation of test results. Without additional investigation, a PASS or FAIL outcome for a nondeterministic test case doesn't conclusively prove the presence or absence of a problem.

    Nondeterministic test cases persist in the test suites of projects such as SystemTap and the GNU Debugger (GDB) because they still provide value when the project's functionality is tested under ideal conditions. Rewriting these test suites to eliminate nondeterminism would be a large, low-priority task that would tie up scarce developer time. Therefore, it's worthwhile to develop tools that analyze a project's test results and identify nondeterministic test cases. A developer reading test results could use this analysis to recognize nondeterministic test cases and interpret their outcomes separately from the outcomes of reliable test cases.

    In a previous article, Automating the testing process for SystemTap, Part 2: Test result analysis with Bunsen, I described Bunsen, a tool that collects a set of test result log files from a project and stores them in a deduplicated Git repository together with an index in JSON format. Bunsen also provides a Python library for accessing the data in this repository. These capabilities can be used to implement a script to detect nondeterministic test cases.

    Developing the script

    The overall strategy of the script is to find test cases that have been run multiple times on the same system configuration with varying outcomes. Such test cases are likely to be nondeterministic.
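    Before walking through the Bunsen-based implementation, the strategy can be sketched on toy data, independent of Bunsen. The test runs, commit IDs, and test names below are invented for illustration:

```python
# Toy illustration of the detection strategy: group test runs by
# (commit, config) and flag test cases whose outcome varies within a group.
runs = [
    {"commit": "c1", "config": "x86_64", "outcomes": {"t1": "PASS", "t2": "PASS"}},
    {"commit": "c1", "config": "x86_64", "outcomes": {"t1": "FAIL", "t2": "PASS"}},
    {"commit": "c2", "config": "x86_64", "outcomes": {"t1": "PASS", "t2": "PASS"}},
]

# Group the outcome tables of runs that tested the same commit and config.
groups = {}
for run in runs:
    groups.setdefault((run["commit"], run["config"]), []).append(run["outcomes"])

# A test case is suspect if the same group contains more than one outcome for it.
flaky = set()
for group in groups.values():
    if len(group) <= 1:
        continue  # a single run cannot reveal nondeterminism
    for name in group[0]:
        if len({outcomes[name] for outcomes in group}) > 1:
            flaky.add(name)

print(flaky)  # t1 passed once and failed once on (c1, x86_64)
```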

    Basic setup

    The analysis script starts by importing and initializing the Bunsen library:

    1  #!/usr/bin/env python3

    2  info="""Detect nondeterministic testcases that yield different outcomes when tested
    3  multiple times on the same configuration."""

    4  from bunsen import Bunsen, BunsenOptions

    5  if __name__=='__main__':
    6      BunsenOptions.add_option('source_repo', group='source_repo',
    7          cmdline='source-repo', default=None,
    8          help_str="Use project commit history from Git repo <path>",
    9          help_cookie="<path>")

    10     BunsenOptions.add_option('branch', group='source_repo', default=None,
    11         help_str="Use project commit history from <branch> in source_repo",
    12         help_cookie="<branch>")

    13     BunsenOptions.add_option('project', group='filtering', default=None,
    14         help_str="Restrict the analysis to testruns in <projects>",
    15         help_cookie="<projects>")
    16
    17 import git
    18 import tqdm
    19 from common.utils import * # add_list, add_set

    20 if __name__=='__main__':
    21
    22     b, opts = Bunsen.from_cmdline(info=info)
    23     projects = opts.get_list('project', default=b.projects)
    24     repo = git.Repo(opts.source_repo)

           # for each testrun in projects:
           #     collect information from the testrun
           # print the results

    A Bunsen analysis script is a Python program that imports the bunsen module. Lines 5-15 of the preceding script define the following options:

    • source_repo identifies a Git repository containing up-to-date source code for the project. The commit history of this repository identifies the relative version order of test runs.
    • branch identifies a branch within source_repo.
    • project names a project within the Bunsen repository, and is present because the Bunsen repository can store test results from more than one project. Test results from separate projects are stored in separate branches, and the analysis script can be instructed to scan and compare test results from a single project or from a subset of the projects. If this option is omitted, all test runs in the Bunsen repository will be scanned.

    Options for an analysis script can be passed as command-line arguments or specified in the Bunsen repository's configuration file. For example, if the Bunsen repository is stored under /path/to/bunsen/.bunsen, the configuration file is located at /path/to/bunsen/.bunsen/config.
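    For illustration, options set in the configuration file mirror the names defined by the script. The fragment below is hypothetical; the exact section layout depends on your Bunsen version, so consult its documentation:

```ini
# /path/to/bunsen/.bunsen/config -- hypothetical example
[core]
source_repo = /path/to/project.git
branch = master
project = systemtap
```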

    The second part of the script (lines 20-24) instantiates the following objects:

    • b, an instance of the Bunsen class providing access to the Bunsen repository
    • opts, an instance of the BunsenOptions class providing access to the script's options
    • repo, an instance of the git.Repo class from the GitPython library, providing access to the version history of the project in the source_repo repository.

    Collecting the test results

    A test case is considered to be nondeterministic if it was tested more than once on the same source_repo commit and system configuration with varying outcomes—a PASS outcome in one test run and a FAIL outcome in another, for instance. To determine which test cases produce varying outcomes, the script gathers a list of test runs for each commit and configuration. The script then iterates through the test runs for each combination and compares the outcomes of each test case for different test runs. The script uses a dictionary named all_testruns to store the list of test runs corresponding to each commit and configuration:

    26 all_testruns = {} # maps (commit, config) -> list(Testrun)
    27
    28 for testrun in b.testruns(projects):
    29     commit, config = testrun.get_source_commit(), testrun.get_config()
    30     if commit is None: continue
    31     add_list(all_testruns, (commit,config), testrun)

    # for each (commit, config):
    #     iterate the set of testruns matching (commit, config),
    #     and compare the outcome of each testcase to detect nondeterminism
    # print the results

    An instance of the Testrun class in the Bunsen library represents a single test run. The instance provides access to the commit that was tested, the system configuration, and the outcomes of individual test cases. The all_testruns dictionary, defined on line 26, maps a (commit, config) pair to a list of Testrun instances.

    For each test run, the loop calls the utility function add_list (line 31) to add the test run to the dictionary. add_list simply appends a value to the list stored at a given key, creating the list if necessary:

    def add_list(d,k,v):
        if k not in d: d[k] = []
        d[k].append(v)
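    The import comment on line 19 also mentions add_set, which the script later uses to record the commits on which a flake was observed. Its definition is not shown in the article; by analogy with add_list, a plausible implementation (an assumption, not the article's verbatim code) is:

```python
def add_set(d, k, v):
    # Add value v to the set stored at key k, creating the set if needed.
    if k not in d: d[k] = set()
    d[k].add(v)

d = {}
add_set(d, "key", 1)
add_set(d, "key", 1)  # duplicates are absorbed by the set
add_set(d, "key", 2)
print(d)  # {'key': {1, 2}}
```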

    Identifying the nondeterministic test cases

    Next, the script iterates over the list of Testrun objects for each commit and configuration. To record the test cases that produced varying outcomes, the script uses a second dictionary named known_flakes, keyed by the tc_info tuples that describe failing outcomes:

    26 all_testruns = {} # maps (commit, config) -> list(Testrun)
    27
    28 for testrun in b.testruns(projects):
    29     commit, config = testrun.get_source_commit(), testrun.get_config()
    30     if commit is None: continue
    31     add_list(all_testruns, (commit,config), testrun)
    32
    33 known_flakes = {} # maps tc_info -> set(commit)
    34 # where tc_info is (name, subtest, outcome)
    35
    36 for commit, config in tqdm.tqdm(all_testruns, \
    37     desc="Scanning configurations", unit="configs"):

           if len(all_testruns[commit, config]) <= 1:
               continue # no possibility of flakes
           commit_testcases = {} # maps tc_info -> list(Testrun)
           for testrun in all_testruns[commit, config]:
               # gather the list of failing tc_info tuples appearing in testrun
           # for each tc_info tuple that appears in some testrun:
           #     check whether the failing tuple appears in all testruns;
           #     if not, mark the tuple as a flake

    # print the results

    The second loop, iterating over the commits and configurations, can take a long time, so the script uses the Python tqdm library to display a progress bar (lines 36-37).

    After the remaining code is filled in, the second loop appears as follows:

    …

    36 for commit, config in tqdm.tqdm(all_testruns, \
    37     desc="Scanning configurations", unit="configs"):
    38
    39     if len(all_testruns[commit, config]) <= 1:
    40         continue # no possibility of flakes
    41
    42     commit_testcases = {} # maps tc_info -> list(Testrun)
    43
    44     for testrun in all_testruns[commit, config]:
    45         for tc in testrun.testcases:
    46             if tc.is_pass(): continue
    47             tc_info = (tc.name, tc.subtest, tc.outcome)
    48             add_list(commit_testcases, tc_info, testrun)
    49
    50     n_testruns = len(all_testruns[commit, config])
    51     for tc_info in commit_testcases:
    52         if len(commit_testcases[tc_info]) < n_testruns:
    53             # XXX tc_info didn't appear in all runs
    54             add_set(known_flakes, tc_info, commit)

    …

    The second loop skips (commit, config) pairs for which only one test run was found (lines 39-40). For each of the other test runs, the loop iterates over its test case outcomes and gathers a list of test failures that appear in the test run. Test case outcomes are represented by instances of Bunsen's Testcase class. In accordance with the DejaGNU framework's test result model, a Testcase object has fields called 'name' (the name of the top-level Expect file defining the test case), 'outcome' (one of the standard POSIX outcome codes, such as PASS, FAIL, or UNTESTED), and 'subtest' (a string providing additional information about the outcome).

    A third dictionary named commit_testcases stores failing test case outcomes. The dictionary maps the (name, subtest, outcome) tuple describing a test failure to the list of test runs in which that tuple occurred. The script assembles commit_testcases on lines 44-48 and iterates over it on lines 51-54 to collect every tuple that appeared in some test runs but not all of them. Such a tuple fits our definition of a varying test outcome, so it is recorded in the known_flakes dictionary, which maps each such tuple to the set of commits on which it was found to produce varying outcomes.
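    The "appeared in some runs but not all" check can be exercised in isolation. The failure tuples and run names below are hypothetical, and add_list is the utility shown earlier:

```python
def add_list(d, k, v):
    # Append value v to the list stored at key k, creating the list if needed.
    if k not in d: d[k] = []
    d[k].append(v)

# Hypothetical failures observed across 3 runs of the same (commit, config):
commit_testcases = {}
add_list(commit_testcases, ("a.exp", "FAIL: step 1\n", "FAIL"), "run1")
add_list(commit_testcases, ("a.exp", "FAIL: step 1\n", "FAIL"), "run2")
add_list(commit_testcases, ("b.exp", "FAIL: step 2\n", "FAIL"), "run1")
add_list(commit_testcases, ("b.exp", "FAIL: step 2\n", "FAIL"), "run2")
add_list(commit_testcases, ("b.exp", "FAIL: step 2\n", "FAIL"), "run3")

n_testruns = 3
# A failure seen in fewer runs than were executed must have varied.
flakes = [tc for tc, runs in commit_testcases.items() if len(runs) < n_testruns]
print(flakes)  # a.exp failed in only 2 of 3 runs, so it is a suspected flake
```

    b.exp failed in every run, so (on this commit alone) it looks like a deterministic failure rather than a flake.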

    Reporting the nondeterministic test cases

    Having accumulated a list of suspected nondeterministic tests in the known_flakes dictionary, the script iterates through it and prints the nondeterministic tests' outcomes:

    56 sorted_tc = []
    57 for tc_info in known_flakes:
    58     sorted_tc.append((tc_info, known_flakes[tc_info]))
    59 sorted_tc.sort(reverse=True, key=lambda tup: len(tup[1]))
    60 for tc_info, commits in sorted_tc:
    61     print(len(commits),"commits have nondeterministic",tc_info)

    The script sorts the test outcomes (on lines 56-59) in decreasing order of frequency: test cases found to produce varying outcomes on a larger number of commits are printed first. An additional loop can be added to print the commits on which the test outcomes were found to be nondeterministic:
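    For instance, with a made-up known_flakes mapping (the test names and commit IDs are hypothetical), the sort puts the most frequently flaky tests first:

```python
# Hypothetical flake records: test case -> set of commits where it varied.
known_flakes = {
    "t_rare":   {"c1"},
    "t_common": {"c1", "c2", "c3"},
    "t_mid":    {"c1", "c2"},
}

# Sort by the number of affected commits, most frequent first.
sorted_tc = sorted(known_flakes.items(), key=lambda tup: len(tup[1]), reverse=True)
for tc_info, commits in sorted_tc:
    print(len(commits), "commits have nondeterministic", tc_info)
```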

    60 for tc_info, commits in sorted_tc:
    61     print(len(commits),"commits have nondeterministic",tc_info)
    62     for hexsha in commits:
    63         commit = repo.commit(hexsha)
    64         print("*",commit.hexsha[:7],commit.summary)

    Lines 63-64 use the GitPython library and the git.Repo object that was instantiated at the beginning of the script to retrieve a summary of the commit message.

    The completed analysis script is less than 100 lines of Python code. When tested on a modest laptop (2.3GHz i3-6100U), the script took approximately 42 seconds with a maximum resident memory size of 285MB to scan a Bunsen repository from the SystemTap project containing data from 4,158 test runs across 693 commits. Within that Bunsen repository, 368 (commit, config) pairs were tested by more than one test run and provided useful data for the analysis script. In practice, more complex analysis scripts that compare test case results over time (rather than within the same commit) will tend to have larger RAM requirements.

    When run, the analysis script produces output similar to the following:

    72 commits have nondeterministic ('systemtap.base/attach_detach.exp', 'FAIL: attach_detach (initial load) - EOF\n', 'FAIL')
    
    72 commits have nondeterministic ('systemtap.base/attach_detach.exp', 'FAIL: attach_detach (initial load) - begin seen (1 0)\n', 'FAIL')
    
    61 commits have nondeterministic ('systemtap.examples/check.exp', 'FAIL: systemtap.examples/io/ioblktime run\n', 'FAIL')
    
    51 commits have nondeterministic ('systemtap.base/utrace_p5.exp', 'FAIL: UTRACE_P5_07 unexpected output (after passing output)\n', 'FAIL')
    
    47 commits have nondeterministic ('systemtap.syscall/tp_syscall.exp', 'FAIL: 32-bit madvise tp_syscall\n', 'FAIL')
    
    40 commits have nondeterministic ('systemtap.base/abort.exp', 'FAIL: abort: TEST 6: abort() in timer.profile (using globals): stdout: string should be "fire 3!\\nfire 2!\\nfire 1!\\n", but got "fire 2!\n', 'FAIL')
    
    39 commits have nondeterministic ('systemtap.syscall/tp_syscall.exp', 'FAIL: 64-bit clock tp_syscall\n', 'FAIL')
    
    39 commits have nondeterministic ('systemtap.syscall/tp_syscall.exp', 'FAIL: 32-bit clock tp_syscall\n', 'FAIL')
    
    38 commits have nondeterministic ('systemtap.syscall/tp_syscall.exp', 'FAIL: 32-bit socket tp_syscall\n', 'FAIL')
    
    37 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_start_disabled_iter_5 (invalid output)\n', 'FAIL')
    
    37 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_timer_50ms (invalid output)\n', 'FAIL')
    
    36 commits have nondeterministic ('systemtap.syscall/tp_syscall.exp', 'FAIL: 64-bit madvise tp_syscall\n', 'FAIL')
    
    34 commits have nondeterministic ('systemtap.bpf/nonbpf.exp', 'FAIL: bigmap1.stp unexpected output\n', 'FAIL')
    
    33 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_timer_10ms (invalid output)\n', 'FAIL')
    
    33 commits have nondeterministic ('systemtap.bpf/bpf.exp', 'FAIL: timer2.stp incorrect result\n', 'FAIL')
    
    33 commits have nondeterministic ('systemtap.bpf/bpf.exp', 'KFAIL: bigmap1.stp unexpected output (PRMS: BPF)\n', 'KFAIL')
    
    33 commits have nondeterministic ('systemtap.bpf/bpf.exp', 'FAIL: stat3.stp incorrect result\n', 'FAIL')
    
    33 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_timer_100ms (invalid output)\n', 'FAIL')
    
    32 commits have nondeterministic ('systemtap.server/client.exp', 'FAIL: New trusted servers matches after reinstatement by ip address\n', 'FAIL')
    
    32 commits have nondeterministic ('systemtap.unprivileged/unprivileged_myproc.exp', 'FAIL: unprivileged myproc: --unprivileged process.thread.end\n', 'FAIL')
    
    32 commits have nondeterministic ('systemtap.base/procfs_bpf.exp', 'FAIL: PROCFS_BPF initial value: cat: /var/tmp/systemtap-root/PROCFS_BPF/command: No such file or directory\n', 'FAIL')
    
    32 commits have nondeterministic ('systemtap.base/abort.exp', 'FAIL: abort: TEST 6: abort() in timer.profile (using globals): stdout: string should be "fire 3!\\nfire 2!\\nfire 1!\\n", but got "fire 3!\n', 'FAIL')
    
    31 commits have nondeterministic ('systemtap.syscall/nd_syscall.exp', 'FAIL: 32-bit clock nd_syscall\n', 'FAIL')
    
    31 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_start_enabled_iter_4 (invalid output)\n', 'FAIL')
    
    31 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_start_enabled_iter_5 (invalid output)\n', 'FAIL')
    
    31 commits have nondeterministic ('systemtap.onthefly/kprobes_onthefly.exp', 'FAIL: kprobes_onthefly - otf_start_disabled_iter_3 (invalid output)\n', 'FAIL')
    
    30 commits have nondeterministic ('systemtap.syscall/syscall.exp', 'FAIL: 32-bit clock syscall\n', 'FAIL')

    Conclusion

    This article illustrated how Bunsen's Python library can be used to quickly develop analysis scripts that answer specific questions about a project's test results. More generally, the example demonstrates the value of keeping a long-term, queryable archive of a project's testing history.

    Last updated: February 5, 2024
