Skip to main content
Redhat Developers  Logo
  • Products

    Featured

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat OpenShift AI
      Red Hat OpenShift AI
    • Red Hat Enterprise Linux AI
      Linux icon inside of a brain
    • Image mode for Red Hat Enterprise Linux
      RHEL image mode
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • Red Hat Developer Hub
      Developer Hub
    • View All Red Hat Products
    • Linux

      • Red Hat Enterprise Linux
      • Image mode for Red Hat Enterprise Linux
      • Red Hat Universal Base Images (UBI)
    • Java runtimes & frameworks

      • JBoss Enterprise Application Platform
      • Red Hat build of OpenJDK
    • Kubernetes

      • Red Hat OpenShift
      • Microsoft Azure Red Hat OpenShift
      • Red Hat OpenShift Virtualization
      • Red Hat OpenShift Lightspeed
    • Integration & App Connectivity

      • Red Hat Build of Apache Camel
      • Red Hat Service Interconnect
      • Red Hat Connectivity Link
    • AI/ML

      • Red Hat OpenShift AI
      • Red Hat Enterprise Linux AI
    • Automation

      • Red Hat Ansible Automation Platform
      • Red Hat Ansible Lightspeed
    • Developer tools

      • Red Hat Trusted Software Supply Chain
      • Podman Desktop
      • Red Hat OpenShift Dev Spaces
    • Developer Sandbox

      Developer Sandbox
      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Secure Development & Architectures

      • Security
      • Secure coding
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
      • View All Technologies
    • Start exploring in the Developer Sandbox for free

      sandbox graphic
      Try Red Hat's products and technologies without setup or configuration.
    • Try at no cost
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • Java
      Java icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • API Catalog
    • Product Documentation
    • Legacy Documentation
    • Red Hat Learning

      Learning image
      Boost your technical skills to expert-level with the help of interactive lessons offered by various Red Hat Learning programs.
    • Explore Red Hat Learning
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

The state of static analysis in the GCC 12 compiler

April 12, 2022
David Malcolm
Related topics:
C, C#, C++CompilersLinux
Related products:
Red Hat Enterprise Linux

Share:

    Building a static analyzer into the C compiler offers several advantages over having a separate tool, because the analyzer can track what the compiler and assembler are doing intimately. As a Red Hat employee, I work on GCC, the GNU Compiler Collection. Our static analyzer is still experimental but is making big strides in interesting areas, including a taint mode and an understanding of assembly-language code.

    GNU logo

     

     

     

     

    My work on adding static analysis has spanned the past three releases of GCC (versions 10, 11, and 12). The static analysis is enabled through the -fanalyzer option, and works together with several other options. This article describes what we've accomplished on static analysis in the upcoming major release of GCC, GCC 12, which is in feature-freeze and which I hope will be released in April of 2022.

    Uncovering uninitialized values

    It's far too easy for C programmers to mess up by forgetting to initialize a value, whether on the stack or on the heap. C++ ameliorates this risk to some extent with constructors, but in both C and C++ there are plenty of ways to try to use uninitialized memory.

    For GCC 12, I've implemented a new -Wanalyzer-use-of-uninitialized-value warning. This option is similar to the existing -Wuninitialized and -Wmaybe-uninitialized warnings in GCC, but is based on the analyzer's interprocedural, path-sensitive exploration of the code. The upgraded version should therefore be able to detect more problems, return fewer false positives, and provide more details about how a problem can arise. I reworked how the analyzer simulates the contents of memory so that it can answer the question "Is this initialized?" at the per-bit level.

    Consider the sample code at CWE-457: Use of Uninitialized Variable. For examples 2 and 3 on that page, the analyzer now emits the following:

    uninit-CWE-457-examples.c: In function 'example_2_bad_code':
    uninit-CWE-457-examples.c:56:3: warning: use of uninitialized value 'bN' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
       56 |   repaint(aN, bN);
          |   ^~~~~~~~~~~~~~~
      'example_2_bad_code': events 1-4
        |
        |   34 |   int aN, bN;
        |      |           ^~
        |      |           |
        |      |           (1) region created on stack here
        |   35 |   switch (ctl) {
        |      |   ~~~~~~
        |      |   |
        |      |   (2) following 'default:' branch...
        |......
        |   51 |   default:
        |      |   ~~~~~~~
        |      |   |
        |      |   (3) ...to here
        |......
        |   56 |   repaint(aN, bN);
        |      |   ~~~~~~~~~~~~~~~
        |      |   |
        |      |   (4) use of uninitialized value 'bN' here
        |
    uninit-CWE-457-examples.c: In function 'example_3_bad_code':
    uninit-CWE-457-examples.c:95:3: warning: use of uninitialized value 'test_string' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
       95 |   printf("%s", test_string);
          |   ^~~~~~~~~~~~~~~~~~~~~~~~~
      'example_3_bad_code': events 1-4
        |
        |   90 |   char *test_string;
        |      |         ^~~~~~~~~~~
        |      |         |
        |      |         (1) region created on stack here
        |   91 |   if (i != err_val)
        |      |      ~
        |      |      |
        |      |      (2) following 'false' branch (when 'i == err_val')...
        |......
        |   95 |   printf("%s", test_string);
        |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
        |      |   |
        |      |   (3) ...to here
        |      |   (4) use of uninitialized value 'test_string' here
        |

    In both cases, the output shows the execution path through the code that leads to the attempted use of an uninitialized value.

    The analyzer also treats heap-allocated buffers as uninitialized, as in the following message:

    malloc-1.c:461:5: warning: use of uninitialized value '*p' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
      461 |   i = *p;
          |   ~~^~~~
      'test_40': events 1-2
        |
        |  460 |   int *p = (int*)malloc(sizeof(int));
        |      |                  ^~~~~~~~~~~~~~~~~~~
        |      |                  |
        |      |                  (1) region created on heap here
        |  461 |   i = *p;
        |      |   ~~~~~~
        |      |     |
        |      |     (2) use of uninitialized value '*p' here
        |

    In GCC 12, other developers have also added an -ftrivial-auto-var-init option that can mitigate against many uses of uninitialized variables. The analyzer still warns about the use of uninitialized memory if you use this option, because -ftrivial-auto-var-init is merely intended for the detection and mitigation of problems. It's good to have -ftrivial-auto-var-init as an extra layer of defense, but a prudent developer should still fix their code.

    A taint mode for C

    Some other languages, such as Perl, can track input and flag any variable that should not be trusted because it was read from an outside source such as a web form. Flagging variables in this manner is called tainting. After a program runs the variable through a check, the variable can be untainted, a process called sanitization.

    Our GCC analyzer's taint mode is activated by -fanalyzer-checker=taint (which should be specified in addition to -fanalyzer). Taint mode attempts to track attacker-controlled values entering the program and to warn if they are used without sanitization.

    The initial version of taint mode in GCC 10 was very much an experiment. The only source of tainting was buffers populated by fread calls, the only warning was -Wanalyzer-tainted-array-index, and the state tracking tended to explode, causing the analyzer to give up.

    For GCC 12, taint mode is closer to being useful, but is still experimental.

    To expand the sources of tainted data, I've added a new __attribute__ ((tainted_args)) attribute to the C and C++ front ends. The idea behind this change is to make it easy to annotate the attack surface of the program. The attribute can be used on functions, causing the analyzer's taint mode to treat all parameters and buffers pointed to by parameters of such functions as potentially under attacker control.

    The tainted_args attribute can also be used on fields in structs that are function pointer callbacks. When taint mode sees a function being used as an initializer for such a field, the static analyzer flags all arguments to that function as tainted. As an example, such flagging could help an operating system kernel mark the ioctl callback field of a struct device, so that all ioctl handlers can be treated as attacker-facing.

    I've added four new taint-based warnings:

    • -Wanalyzer-tainted-allocation-size to be used, for example, for attacker-controlled malloc and alloca
    • -Wanalyzer-tainted-divisor for detecting where an attacker can inject a divide-by-zero error
    • -Wanalyzer-tainted-offset for attacker-controlled pointer offsets
    • -Wanalyzer-tainted-size for attacker-controlled values being used as a size parameter to calls to memset or to functions marked with __attribute__ ((access, ....))

    I've improved the state-tracking problems mentioned earlier, but unfortunately the analyzer can still run into difficulty. So taint checking must still be turned on explicitly, and I still regard it as much more experimental than the rest of the analyzer.

    Scaling up the analyzer

    The GCC 10 release of -fanalyzer was very much just for early adopters. I made at least two major mistakes in how I tracked the program state along an execution path, requiring a big rewrite in GCC 11.

    I'm hearing anecdotally of more people trying out -fanalyzer on their C code and finding real bugs in their programs—but also false positives. Static analysis is not perfect, but I'm working on making it more useful.

    For GCC 12, I've been focusing on two things: getting -fanalyzer to run on the Linux kernel, and trying to drive down the number of false positives.

    The Linux kernel

    As a compiler developer, I like to think the single most important package in a Linux distribution is the compiler, but I suspect most people would pick the kernel. In addition to actually letting you use your computer, the kernel makes a great test case for the analyzer, seeing as it's big and complicated.

    Simply getting the kernel to build with -fanalyzer (and with Red Hat Enterprise Linux's kernel configuration and compilation flags) shook out various bugs in the analyzer, and working through the warnings revealed more. For example, I found that I had misunderstood how GCC represents switch statements internally, so I had to rewrite how the analyzer handles them.

    One issue I ran into is that the kernel makes heavy use of inline assembly. For example, I was seeing false positives from -Wanalyzer-null-dereference on code like the following (simplified from drivers/staging/wfx/sta.c):

    struct ieee80211_vif *test (struct wfx_vif *wvif) {
       if (wdev_to_wvif(wvif->wdev, 1))            /* (1) */
         return wdev_to_wvif(wvif->wdev, 1)->vif;  /* (2) */
       else
         return NULL;
    }

    The analyzer was considering the execution path where the evaluation of wdev_to_wvif(wvif->wdev, 1) at (1) returns non-NULL, but at (2) returns NULL, which then gets dereferenced.

    This programming flow is non-trivial to figure out, but the analyzer was almost managing it. The analyzer saw:

    /* Simplified from drivers/staging/wfx/wfx.h */
    
    static inline struct wfx_vif *wdev_to_wvif(struct wfx_dev *wdev, int vif_id) {
      vif_id = array_index_nospec(vif_id, ARRAY_SIZE(wdev->vif));
      if (!wdev->vif[vif_id]) {
        return NULL;
      }
      return (struct wfx_vif *)wdev->vif[vif_id]->drv_priv;
    }

    which uses the array_index_nospec macro:

    /* Simplified from include/linux/nospec.h */
    
    #define array_index_nospec(index, size)                                 \
    ({                                                                      \
            typeof(index) _i = (index);                                     \
            typeof(size) _s = (size);                                       \
            unsigned long _mask = array_index_mask_nospec(_i, _s);          \
            /* snip */                                                      \
            (typeof(_i)) (_i & _mask);                                      \
    })

    The array_index_mask_nospec macro uses inline assembly, and the analyzer was assuming an arbitrary result for the mask each time:

    /* Copied from arch/x86/include/asm/barrier.h */
    
    static inline unsigned long array_index_mask_nospec(unsigned long index,
                    unsigned long size)
    {
            unsigned long mask;
    
            asm volatile ("cmp %1,%2; sbb %0,%0;"
                            :"=r" (mask)
                            :"g"(size),"r" (index)
                            :"cc");
            return mask;
    }

    So for GCC 12, I've implemented enough support for inline assembly for the analyzer to handle cases like this. I don't attempt to actually parse and interpret the assembly code; I just needed to add enough heuristics to recognize that the result of the inline assembler isn't going to change between the two calls to array_index_mask_nospec. With this enhancement, the analyzer is able to figure out that the result from wdev_to_wvif is the same both times, fixing the false positive the analyzer had previously returned.

    Finally, I need to consider the speed (or lack thereof) of the analyzer. I noticed one particular kernel source file that the analyzer seemed to get stuck on; one I ended up killing the compilation after 15 hours. That perverse behavior involved a debug build of the compiler. With a release build, I got the run down to four minutes, and optimizations to the analyzer have reduced the run to 17 seconds, where the non-analyzer part of compiling that file takes 19 seconds. My goal for -fanalyzer is for it to "merely" double your compile time, so I declared victory at that point for that file. Unfortunately, the analyzer can still take ages to finish on other files.

    Fixing false positives

    For a static analyzer to be useful, it shouldn't spam the developer with so many messages about things that aren't real problems that they stop paying attention. The tool needs to be tested on real-world code, written in a variety of styles. Thanks to open source, there's no shortage of code to try the analyzer on.

    As part of our Red Hat Enterprise Linux production pipeline, Red Hat uses various static analysis tools on our source packages. We tried adding -fanalyzer to that bank of tests, marking it as "experimental." I've been working through the results of these runs for pre-releases of Red Hat Enterprise Linux 9, looking through the most frequent false positives and trying to fix them.

    For example, when building OpenBLAS, I saw thousands of warnings from -Wanalyzer-malloc-leak. In each case, it was flagging code of the following form:

       if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) {
                pt_t = (lapack_complex_float*)
                    LAPACKE_malloc( sizeof(lapack_complex_float) *
                                    ldpt_t * MAX(1,n) );
          ...snip...
       }
    
       /* [...snip lots of code...] */
    
       if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) {
                LAPACKE_free( pt_t );
       }

    The relevant line is where the code uses a dynamically allocated buffer guarded by a condition:

      LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' )

    LAPACKE_lsame is a case-insensitive comparison, so, in theory, there are only two possible paths through the code: One invoking a malloc/free pair, and the other skipping both malloc and free. However, the definition of LAPACKE_lsame is in its own source file. The static analyzer doesn't yet work well with LTO and therefore can't peer inside LAPACKE_lsame while evaluating its invocation. So the analyzer considers four execution paths through this code: the two valid ones, and two impossible ones. In particular, the false positive from -Wanalyzer-malloc-leak arises from the impossible execution path in which the guard condition first evaluates to true for the malloc branch, but then evaluates to false for the free branch.

    GCC supports marking function declarations with __attribute__((const)). For GCC 12, I've added smarts for this attribute to the analyzer, so that it will understand that such functions return the same value when given the same inputs (PR104434), and have no side effects (PR104576).

    Marking LAPACKE_lsame with __attribute__((const)) lets the analyzer figure out that the condition is the same both times, which fixes the false positives. OpenBLAS has now made this change upstream.

    Toward support for C++

    Unfortunately, I can't recommend running -fanalyzer on C++ code yet. We're slowly getting better: For Google Summer of Code (GSoC) 2021, I mentored a student named Ankur Saini who generalized the analyzer's interprocedural path exploration logic. Thanks to his work, in GCC 12 the analyzer is now able to track calls through function pointers and thus handle virtual function calls.

    That said, there are still enough C++ features (e.g., exception handling) missing from the analyzer's implementation that the results won't be useful on any real-world C++ project. So please still use -fanalyzer only for C code in GCC 12.

    Understanding the realloc function

    The analyzer now handles realloc calls by splitting the execution path into three parts, representing three possible outcomes for the call:

    • Failure, returning NULL
    • Success, growing the buffer in place without moving it
    • Success, by allocating a new buffer, copying the content of the old buffer to it, and freeing the old buffer

    This enhancement should help catch the PTR = realloc (PTR, sz) anti-pattern (which leaks memory on failure). The analyzer should also recognize when a buffer gets moved on reallocation but the code tries to use a pointer that's still pointing to the old location of the buffer.

    Trying it out

    There are lots of other improvements to the analyzer in GCC 12 that I haven't covered in this article. GCC 12 will be the system compiler in the soon-to-be-released Fedora 36.

    For simple code examples, you can play around with the new GCC online at the Compiler Explorer site. Select GCC "trunk" and add -fanalyzer to the compiler options to run static analysis.

    GCC 12 can also help prevent a particularly tricky kind of security exploit called a Trojan source attack. For more on how this works, check out my article on the subject from Red Hat Developer.

    Finally, if this whetted your appetite for looking at the insides of the toolchain in more detail, GCC is participating in GSoC 2022, so you might want to look at GCC's GSoC project ideas. I'm hoping to mentor a new GCC contributor this summer for the analyzer, but there are plenty of other interesting aspects of the compiler to work on.

    Have fun!

    Last updated: October 8, 2024

    Recent Posts

    • Container starting and termination order in a pod

    • More Essential AI tutorials for Node.js Developers

    • How to run a fraud detection AI model on RHEL CVMs

    • How we use software provenance at Red Hat

    • Alternatives to creating bootc images from scratch

    What’s up next?

    systemd Commands cheat sheet card image

    systemd is a system and service manager for Linux operating systems. This cheat sheet makes it easy to learn command-line executables for working with systemd to consolidate service configuration and application behavior.

    Get the systemd Commands cheat sheet
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue