Static analysis updates in GCC 11

January 28, 2021
David Malcolm
Related topics: Security; C, C#, C++; Open source; Linux

    I work at Red Hat on the GNU Compiler Collection (GCC). In GCC 10, I added the new -fanalyzer option, a static analysis pass for identifying various problems at compile-time, rather than at runtime. The initial implementation was aimed at early adopters, who found a few bugs, including a security vulnerability: CVE-2020-1967. Bernd Edlinger, who discovered the issue, had to wade through many false positives accompanying the real issue. Other users also managed to get the analyzer to crash on their code.

    I've been rewriting the analyzer to address these issues in the next major release, GCC 11. In this article, I describe the steps I'm taking to reduce the number of false positives and make this static analysis tool more robust.

    Tracking program states

    I've been attempting to fix bugs in -fanalyzer as they are reported via GCC's Bugzilla instance. The analyzer's state-tracking component in GCC 10 had many crasher bugs. The more bugs I fixed, the more bugs turned up, with no apparent slowdown in the rate of discovery. This suggested to me that I needed to rewrite the component.

    I made at least two big mistakes in how I tracked program states in the original -fanalyzer implementation. These were in how I tracked symbolic values and regions. The GCC 10 implementation attempted to assign unique IDs to these symbolic entities and canonicalize them so that different states could be compared (equivalent entities ought to have the same ID between different states). Unfortunately, there was always one more canonicalization issue.

    In the new implementation, I've made these entities singletons. As a result, a unique object now represents the (symbolic) initial value of a particular parameter at a function call at the entry to the analysis. The change to singletons got rid of large amounts of fiddly canonicalization code, using simple pointers instead. The implementation is simpler, faster, and I've been able to fix all of the crasher bugs. (I'm not quite sure what benefit I saw in the original approach, but hindsight is 20/20, I guess.)
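
    The singleton idea can be illustrated with a small interning table. This is only a minimal sketch in C, not the analyzer's actual code; the names sym_value, sym_table, and intern_initial_value are hypothetical. Because each symbolic entity is created at most once, two states can compare entities with a plain pointer comparison and no canonicalization step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SYM_NAME_MAX 32

/* Hypothetical symbolic value: "the initial value of parameter NAME". */
struct sym_value {
    char name[SYM_NAME_MAX];
    struct sym_value *next;
};

/* Hypothetical manager that owns every symbolic value it hands out. */
struct sym_table {
    struct sym_value *head;
};

/* Return the one object that represents NAME, creating it on first use.
   Returns NULL if NAME is too long or allocation fails. */
struct sym_value *intern_initial_value(struct sym_table *table, const char *name)
{
    assert(table != NULL && name != NULL);
    if (strlen(name) >= SYM_NAME_MAX)
        return NULL;
    for (struct sym_value *v = table->head; v != NULL; v = v->next)
        if (strcmp(v->name, name) == 0)
            return v;               /* already interned: reuse it */

    struct sym_value *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    strcpy(v->name, name);
    v->next = table->head;
    table->head = v;
    return v;
}

/* Release every value owned by TABLE. */
void sym_table_free(struct sym_table *table)
{
    assert(table != NULL);
    while (table->head != NULL) {
        struct sym_value *next = table->head->next;
        free(table->head);
        table->head = next;
    }
}
```

    Asking the table twice for the same name yields the same pointer, which is the property that lets equality checks replace ID canonicalization in this sketch.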

    The second big change is in what the symbolic values and regions represent. Previously, I represented a mapping to symbolic values, where the keys were symbolic access paths of memory regions. In the new implementation, I've represented the state as mappings of clusters of bit-offsets within memory. These are sometimes concrete (for example, at a specific bit-offset) and sometimes symbolic (such as an array offset where the index is symbolic). This approach does a much better job of handling unions, pointer aliasing, and so forth. Additionally, lots of fiddly bugs "fixed themselves" when I switched to the new implementation, which reassured me that I was on the right track.
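
    To see why keying state on bit-offsets helps with unions, consider the following sketch. It is an illustration under my own assumptions rather than the analyzer's data structures: bit_range, word_or_bytes, and the helper functions are hypothetical names. Two members of a union have different access paths, yet their bit ranges within the same storage make the overlap explicit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key for a binding: a run of bits inside one base region. */
struct bit_range {
    size_t start_bit;
    size_t size_in_bits;
};

/* Two views of the same storage. */
union word_or_bytes {
    unsigned int word;
    unsigned char bytes[sizeof(unsigned int)];
};

/* Bits occupied by the `word` member. */
struct bit_range range_of_word(void)
{
    return (struct bit_range){
        offsetof(union word_or_bytes, word) * CHAR_BIT,
        sizeof(unsigned int) * CHAR_BIT
    };
}

/* Bits occupied by bytes[index]. */
struct bit_range range_of_byte(size_t index)
{
    assert(index < sizeof(unsigned int));
    return (struct bit_range){
        (offsetof(union word_or_bytes, bytes) + index) * CHAR_BIT,
        CHAR_BIT
    };
}

/* True if a write to A could change a value previously bound to B. */
bool bit_ranges_overlap(struct bit_range a, struct bit_range b)
{
    return a.start_bit < b.start_bit + b.size_in_bits
        && b.start_bit < a.start_bit + a.size_in_bits;
}
```

    With keys like these, a write to bytes[1] visibly overlaps a value bound to word, whereas two distinct bytes do not overlap each other.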

    Memory leak detection and non-determinism

    I had to rewrite memory leak detection for the new implementation completely. That said, the old implementation had many false positives, whereas the new one seems much less prone to them.
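
    As a reminder of the kind of code leak detection is aimed at, here is a small sketch with a leak on an error-handling path, followed by a corrected version. The function names are hypothetical, and I am not claiming any particular diagnostic output for them; the first function is intentionally buggy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Intentionally buggy: the early return below leaks `copy`. */
int store_label_leaky(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, src);
    if (strlen(copy) >= dst_size)
        return -1;              /* BUG: `copy` is never freed on this path */
    strcpy(dst, copy);
    free(copy);
    return 0;
}

/* Same logic, with the error path releasing the allocation. */
int store_label_fixed(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, src);
    if (strlen(copy) >= dst_size) {
        free(copy);             /* release before reporting the error */
        return -1;
    }
    strcpy(dst, copy);
    free(copy);
    return 0;
}
```

    Compiling a file like this with -fanalyzer is a reasonable way to try out -Wanalyzer-malloc-leak on the first function.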

    Another issue I ran into is non-determinism, where the analyzer's exact behavior would vary from invocation to invocation. At various places, the implementation would iterate through values, and the order of iteration would depend implicitly on precise pointer values due to hashing algorithms. The pointer values can differ due to address-space layout randomization, which led to different results. I've now fixed such logic in the code to ensure that the analyzer's behavior is repeatable from run to run.
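
    One common way to make such iteration repeatable is to order the entries by something stable, such as a creation-order ID, before visiting them. The sketch below shows that general technique; it is an assumption on my part that it resembles the actual fix, and the region type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entity with an ID assigned in creation order, so the ID is
   the same on every run even though the object's address is not. */
struct region {
    unsigned int id;
    const char *description;
};

/* qsort comparator: order pointers by the IDs they point to, never by the
   pointer values themselves. */
static int compare_by_id(const void *a, const void *b)
{
    const struct region *ra = *(const struct region *const *)a;
    const struct region *rb = *(const struct region *const *)b;
    return (ra->id > rb->id) - (ra->id < rb->id);
}

/* Put REGIONS into a run-independent order before iterating over them. */
void sort_regions_for_iteration(const struct region **regions, size_t count)
{
    assert(regions != NULL || count == 0);
    if (count > 1)
        qsort(regions, count, sizeof *regions, compare_by_id);
}
```

    Entries gathered from a pointer-keyed hash table can be copied into an array, sorted this way, and then visited in the same order on every run.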

    Four new warnings

    The GCC 10 implementation of -fanalyzer added 15 warnings:

    • Warnings relating to memory management:
      • -Wanalyzer-double-free
      • -Wanalyzer-use-after-free
      • -Wanalyzer-free-of-non-heap
      • -Wanalyzer-malloc-leak
    • Warnings relating to missing error-checking or misusing NULL pointers:
      • -Wanalyzer-possible-null-argument
      • -Wanalyzer-possible-null-dereference
      • -Wanalyzer-null-argument
      • -Wanalyzer-null-dereference
    • Warnings relating to stdio streams:
      • -Wanalyzer-double-fclose
      • -Wanalyzer-file-leak
    • Warnings relating to use-after-return from stack frames:
      • -Wanalyzer-stale-setjmp-buffer
      • -Wanalyzer-use-of-pointer-in-stale-stack-frame
    • Unsafe call warning:
      • -Wanalyzer-unsafe-call-within-signal-handler
    • Proof-of-concept warnings:
      • -Wanalyzer-tainted-array-index
      • -Wanalyzer-exposure-through-output-file

    For GCC 11, I've added four new warnings:

    • -Wanalyzer-write-to-const
    • -Wanalyzer-write-to-string-literal
    • -Wanalyzer-shift-count-negative
    • -Wanalyzer-shift-count-overflow

    Each of these corresponds to a pre-existing warning implemented in the C and C++ front ends, but with a "-Wanalyzer" prefix rather than "-W." As an example, -Wanalyzer-write-to-const corresponds to -Wwrite-to-const. It's important to note that the two implementations are slightly different: Whereas the existing warning merely walks the syntax tree of a particular expression, the analyzer variant does an interprocedural path-based analysis, looking for code paths that attempt to write to a const global.

    After discussing whether to reuse the existing command-line options for such warnings, I chose to create new options to make it explicit that the warnings are implemented differently. The -Wanalyzer-prefixed warnings will find more issues, but they are much more expensive at compile-time. (Though you've already paid that price by choosing -fanalyzer.)
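
    The difference between checking one expression and following a path can be sketched as follows. The function names are hypothetical, the second function is intentionally buggy when wide is nonzero, and I have not verified what any given GCC version reports for it. The shift expression on its own has a variable count, so only the path from the caller reveals that the count can exceed the width of the type.

```c
#include <assert.h>
#include <limits.h>

/* The shift amount is a plain variable here, so this expression in
   isolation says nothing about whether the count is in range. */
static unsigned int scale(unsigned int value, int count)
{
    assert(count >= 0);
    return value << count;
}

/* Intentionally buggy when `wide` is nonzero: the count passed to scale()
   is then larger than the width of unsigned int. */
unsigned int scale_by_mode(unsigned int value, int wide)
{
    int count = wide ? (int)(sizeof(unsigned int) * CHAR_BIT) + 8 : 4;
    return scale(value, count);
}
```

    Calling scale_by_mode with wide set to 0 is well defined; the problematic path is the one where wide is nonzero, which is the kind of case -Wanalyzer-shift-count-overflow is meant for.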

    In progress: Attributes for marking APIs

    GCC has long had __attribute__((malloc)) for marking an API entry point as being a memory allocator. In previous GCC releases, this was purely a hint to the optimizer's pointer-aliasing logic. The attribute let the optimizer "know" that the pointer returned from the function pointed to different memory than the other pointers being optimized. The optimizer could then eliminate reads from locations that had not been clobbered after a write through the returned pointer.
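
    A minimal sketch of the traditional, argument-free form of the attribute is shown below. The macro and function names are hypothetical. The comments describe what the optimizer is permitted to assume, not what any particular GCC version actually does with this code.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__GNUC__)
# define RETURNS_FRESH_MEMORY __attribute__((malloc))
#else
# define RETURNS_FRESH_MEMORY
#endif

/* Marked as an allocator: the returned pointer aliases nothing else. */
RETURNS_FRESH_MEMORY int *alloc_counter(void)
{
    int *counter = malloc(sizeof *counter);
    if (counter != NULL)
        *counter = 0;
    return counter;
}

int sum_with_fresh_counter(int *existing)
{
    assert(existing != NULL);
    int *fresh = alloc_counter();
    if (fresh == NULL)
        return -1;
    *existing = 1;
    *fresh = 2;                     /* cannot alias *existing, per the attribute */
    int sum = *existing + *fresh;   /* *existing may be assumed to still be 1 */
    free(fresh);
    return sum;
}
```

    Without the attribute, the compiler would have to allow for the write through fresh changing *existing and reload it before the addition.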

    In GCC 11, this attribute can now take an additional parameter marking which deallocator function should be called on the result. I'm working on generalizing -fanalyzer to warn about mismatches, leaks, and double-frees for APIs marked with this attribute. So far, however, it's unclear if the results will be useful without many additional attributes. For example, I attempted to use the following attribute to detect a leak in a Linux driver (CVE-2019-19078):

      extern struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags);
      extern void usb_free_urb(struct urb *urb);
    

    I added the attribute to mark the functions as an allocation/deallocation pair, where there is a leak of an urb on an error-handling path. Unfortunately, various other functions take struct urb *, and the analyzer conservatively assumes that an urb passed to them might or might not be freed. It thus stops tracking state for them and only reports the issue if I disable much of the intervening code. This feature needs additional work to be useful except in the simplest cases.
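
    For readers who have not seen the GCC 11 form of the attribute, here is a sketch of marking an allocation/deallocation pair. It uses a made-up widget API rather than the kernel's USB functions; widget_alloc, widget_free, and the RELEASED_BY macro are all hypothetical. The macro expands to nothing on compilers that do not accept a deallocator argument.

```c
#include <assert.h>
#include <stdlib.h>

struct widget {
    int id;
};

/* The deallocator argument to the malloc attribute is new in GCC 11. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
# define RELEASED_BY(deallocator) __attribute__((malloc (deallocator)))
#else
# define RELEASED_BY(deallocator)
#endif

/* The deallocator has to be declared before the attribute refers to it. */
void widget_free(struct widget *w);
RELEASED_BY(widget_free) struct widget *widget_alloc(int id);

struct widget *widget_alloc(int id)
{
    struct widget *w = malloc(sizeof *w);
    if (w != NULL)
        w->id = id;
    return w;
}

void widget_free(struct widget *w)
{
    free(w);
}

/* Correct usage: every successful widget_alloc is paired with widget_free. */
int widget_round_trip(int id, int fallback)
{
    struct widget *w = widget_alloc(id);
    if (w == NULL)
        return fallback;
    assert(w->id == id);
    int result = w->id;
    widget_free(w);
    return result;
}
```

    In this simple shape, nothing else takes a struct widget *, so there is no intervening call for the analyzer to be conservative about.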

    In progress: HTML output

    The analyzer's emitted control flow paths can be very verbose, so I've been experimenting with other forms of output. I have an implementation of HTML output, in which the path information is written out to a separate HTML file. Here are a few examples:

    • Double-free bug
    • Signal handler issue
    • Memory leak (due to longjmp past a free)

    The HTML path output shows stack frames and runs of events, using drop-shadows to give a 3D look. The idea is to highlight the stack of frames as if it were an actual stack of overlapping cards. I also added JavaScript to use j and k to move forward and back through control-flow events.

    Unfortunately, the HTML output doesn't capture the warnings themselves, just the paths. Fixing that would require deep changes to GCC's diagnostics subsystem, which I'm wary of doing at this point in the development cycle. So, I'm not sure I've found the best way to enable the HTML format as an option; it seems better to capture all of the diagnostics somehow as build artifacts, rather than just the paths of those diagnostics that have paths associated with them.

    What's next for GCC 11 and -fanalyzer

    We're in the bug-fixing phase of GCC 11 development, aiming for a release in the spring of 2021. The analyzer still needs a fair bit of bug-fixing, and we're working on scaling it up. I plan to focus on that for this first part of the new year. (These problems can be related, by the way: Bugs sometimes lead to loop-handling going awry. The analyzer will then attempt to effectively unroll a loop, which leads to hitting a safety limit and a slow, incomplete analysis.)

    I am still developing -fanalyzer only for C in GCC 11. I added partial support for C++'s new and delete, but there are enough missing features that it's not yet worth using on real C++ code. I plan to make the analyzer robust and scalable for C code in GCC 11 and defer C++ support to GCC 12.

    GCC 11 will be in Fedora 34, which should also be out in the spring of 2021. For simple code examples, you can play around with the new GCC online at godbolt.org. Select your GCC "trunk" and add -fanalyzer to the compiler options. Have fun!

    Last updated: February 8, 2021
