Skip to main content
Redhat Developers  Logo
  • Products

    Featured

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat OpenShift AI
      Red Hat OpenShift AI
    • Red Hat Enterprise Linux AI
      Linux icon inside of a brain
    • Image mode for Red Hat Enterprise Linux
      RHEL image mode
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • Red Hat Developer Hub
      Developer Hub
    • View All Red Hat Products
    • Linux

      • Red Hat Enterprise Linux
      • Image mode for Red Hat Enterprise Linux
      • Red Hat Universal Base Images (UBI)
    • Java runtimes & frameworks

      • JBoss Enterprise Application Platform
      • Red Hat build of OpenJDK
    • Kubernetes

      • Red Hat OpenShift
      • Microsoft Azure Red Hat OpenShift
      • Red Hat OpenShift Virtualization
      • Red Hat OpenShift Lightspeed
    • Integration & App Connectivity

      • Red Hat Build of Apache Camel
      • Red Hat Service Interconnect
      • Red Hat Connectivity Link
    • AI/ML

      • Red Hat OpenShift AI
      • Red Hat Enterprise Linux AI
    • Automation

      • Red Hat Ansible Automation Platform
      • Red Hat Ansible Lightspeed
    • Developer tools

      • Red Hat Trusted Software Supply Chain
      • Podman Desktop
      • Red Hat OpenShift Dev Spaces
    • Developer Sandbox

      Developer Sandbox
      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Secure Development & Architectures

      • Security
      • Secure coding
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
      • View All Technologies
    • Start exploring in the Developer Sandbox for free

      sandbox graphic
      Try Red Hat's products and technologies without setup or configuration.
    • Try at no cost
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • Java
      Java icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • API Catalog
    • Product Documentation
    • Legacy Documentation
    • Red Hat Learning

      Learning image
      Boost your technical skills to expert-level with the help of interactive lessons offered by various Red Hat Learning programs.
    • Explore Red Hat Learning
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

Understanding GCC warnings, Part 2

March 13, 2019
Martin Sebor
Related topics:
Developer ToolsC, C#, C++
Related products:
Developer Tools

Share:

    In part 1, I shed light on trade-offs involved in the GCC implementation choices for various types of front-end warnings, such as preprocessor warnings, lexical warnings, type-safety warnings, and other warnings.

    As useful as front-end warnings are, those based on the flow of control or data through the program have rather inconvenient limitations. To overcome them, flow-based warnings have increasingly been implemented in what GCC calls the "middle end." Middle-end warnings are the focus of this article.

    Middle-end warnings

    For our purposes, the middle end starts with the Gimplifier (the part of GCC that converts GENERIC to GIMPLE) and includes all subsequent stages up to the expansion of GIMPLE to RTL (the low-level Register Transfer Language). Quite a number of GCC warnings fit this description. A representative sampling includes

    • -Warray-bounds, which detects out-of-bounds array indices
    • -Wformat-overflow (new in GCC 7), which looks for buffer overflows in calls to sprintf and related functions
    • -Wnonnull, which detects passing null pointers to functions that don't expect it
    • -Wstringop-overflow (new in GCC 7), which helps find buffer overflow in calls to string and raw memory functions
    • -Wuninitialized, which finds uses of uninitialized variables

    As GCC tries to detect more and more difficult bugs, the number of middle-end warnings has been growing.

    The middle end in GCC is designed as a series of over a hundred more-or-less independent modules called passes. Each pass traverses the intermediate representation of the source code (initially GENERIC, and then GIMPLE) in a predetermined order and transforms sequences of statements into others, performing various simplifications along the way. In the process, some passes expose interesting properties of the data the program operates on, such as the size of objects referenced by pointers or ranges of values that integer variables can take on in different branches of conditional statements, upper bounds on the number of loop iterations, and so on.

    Although the purpose of a handful of passes is specifically to look for bugs, the primary purpose of almost all passes is to make code efficient. Because these optimization passes are selected only when their optimization is turned on, the number and quality of the warnings will be lower if optimization is disabled with -O0. If somewhere along the way, a pass discovers a construct that is necessarily invalid, it may issue a warning, but with a few exceptions, doing so is secondary. The consequence is that implementing quality warnings in the middle end is challenging.

    Limitations of middle-end warnings

    The unavoidable challenge of all warnings implemented in the middle end is inherent in all flow-based warnings: because of the limits of computability, they are prone to both false negatives and false positives. But beyond that, GCC middle-end warnings are also imperfect because they depend on an infrastructure and representation that were designed primarily for optimization and with different priorities than static analyzers. Their number-one objective is to emit efficient object code for programs that follow the rules of the language. The basic assumption GCC optimizers have historically made, one that is enshrined in the standards for C and C++ and other languages, is that code they work with must be valid. Otherwise, all bets are off and their behavior is undefined. A design based on this view makes it difficult to detect even straightforward bugs and issue diagnostics for them. The same design also leads to inadvertently issuing warnings for correct constructs. Let's look at examples.

    False negatives due to assumptions about program validity

    As a simple example, consider the following trivially invalid program:

    const int primes[10] = { 1, 3, 5, 7, 11, 13, 17, 19 };
    int f (void)
    {
      return primes[17];   // index out-of-bounds 
    }

    One would expect that compiling it would trigger a -Warray-bounds warning for the read of the primes array, but GCC issues no such warning. That seems like a simple bug (it can be found in GCC Bugzilla: 78678 and 86691) but, as it turns out, it is actually a "feature" of the internal design based on the assumption that the input to the middle end is a valid program. There is even a comment in the function that computes the result of the array dereference that makes this explicit (GCC uses the term constructor to denote an aggregate initializer):

      /* Memory not explicitly mentioned in constructor is 0 (or
         the reference is out of range). */
      return type ? build_zero_cst (type) : NULL_TREE;
    }

    The code considers the value of any element of the primes array to be zero that has not been explicitly initialized, regardless of whether it is in or out of bounds. As a result, invalid indices into static constants are not diagnosed.

    The problem with early folding

    To emit code that is as efficient as possible, GCC aggressively performs transformations that simplify the internal representation of a program and make it easier to apply additional simplifications to. A simple example is an assignment such as

    n = strlen ("hello");

    where GCC can readily compute the result of the call and fold it into a constant, transforming the assignment to n = 5. A somewhat more involved example is one that involves an offset into a constant string:

    extern int i;
    n = strlen ("hello" + i);

    Here, the result of the call cannot be folded into a constant unless i's value is known; however, because the length of the string is known, and because the only values i can take on for the resulting pointer to be valid are between zero and 5, the result also must be between zero and 5. Therefore, the assignment can be folded into n = 5 - i. And that is, in fact, close to what happens: GCC replaces the call with the expression

    n = (size_t)i < 5 ? 5 - (size_t)i : 0;

    The conditional isn't strictly necessary because a valid program must guarantee that i is not out of the bounds of the string, but it ensures that the result isn't entirely meaningless (and excessively large) even for invalid values. Replacing the strlen call with the simple conditional makes the computation much faster and the code much more compact, so it is a useful optimization, but it comes at a price. Because GCC performs this transformation very early on, if i's value is determined by some subsequent optimization to be outside the valid range of zero to 5, it's too late to detect that the pointer addition in the strlen argument is invalid. The call along with its argument have been eliminated and replaced by the integer subtraction. The upshot is that the invalid code is not diagnosed. This can be seen in the very first GIMPLE dump of the example below. Notice that the strlen call is not present in the GIMPLE:

    size_t f (int i)
    {
      i = 17;
      return strlen ("hello" + i);
    }

    The dump, obtained by compiling the function with the -fdump-tree-gimple=/dev/stdout option, looks like this:

    f (int i)
    {
      long unsigned int D.1909;
      long unsigned int iftmp.0;
    
      i = 17;
      if (i <= 5) goto <D.1911>; else goto <D.1912>;
      <D.1911>:
      _2 = (sizetype) i;
      iftmp.0 = 5 - _2;
      goto <D.1913>;
      <D.1912>:
      iftmp.0 = 0;
      <D.1913>:
      D.1909 = iftmp.0;
      return D.1909;
    }

    There are many other transformations besides computing constant string lengths that GCC performs early on, including replacing calls to library functions such as strcpy with others, often memcpy, or even lower-level primitives like aggregate assignments (in GIMPLE, all aggregates, including arrays, can be assigned by using various forms of the MEM_REF expression). Most (but not all) of them make the transformed code easier to transform further or open up other interesting optimization opportunities. Unfortunately, most also result in the loss of detailed information that, although rarely important for optimization, may be essential to bug detection in later passes.

    Missing optimizations imply missing warnings

    All middle-end warnings depend to one extent or another on optimizations exposing information about the program or data that is not immediately apparent from the source code. GCC is a very good optimizing compiler but it has unavoidable limits, and so in cases when an optimization that a certain warning depends on has not been implemented, the warning is simply not going to be issued.

    Let's use the -Wnonnull warning as an example to illustrate this. Because the argument to the strchr function below is a global constant, GCC is able to evaluate it very early on and substitute its result, or null, for the second argument to strcpy. When the -Wnonnull checker sees the strcpy call, the null has been substituted for the second argument and the warning triggers:

    const char a[] = "123";
    
    void f (char *d)
    {
      char *p = strchr (a, '9');
      strcpy (d, p);
    }
    
    warning: argument 2 null where non-null expected [-Wnonnull]
    6 | strcpy (d, p);
      | ^~~~~~~~~~~~~

    The result of the optimization can be seen by compiling the function with the -fdump-tree-optimized=/dev/stdout option:

    ;; Function f (f, funcdef_no=0, decl_uid=1907, cgraph_uid=1, symbol_order=1)
    
    f (char * d)
    {
      <bb 2> [local count: 1073741824]:
      strcpy (d_2(D), 0B); [tail call]
      return;
    }

    But with the constant array defined as a local variable, GCC is no longer able to evaluate the strchr call at compile time and fold its result into a constant. This is because local variables are initialized dynamically even if they are declared const. As a result, the -Wnonnull warning, which has no additional smarts of its own, does not trigger. This is simply because the strchr optimization does not handle dynamically initialized variables.

    void g (char *d)
    {
      const char b[] = "234";
      char *p = strchr (b, '9');
      strcpy (d, p);
    }

    And indeed, the output of the optimizer confirms that the strchr call has not been folded in this case:

    ;; Function g (g, funcdef_no=1, decl_uid=1911, cgraph_uid=2, symbol_order=2)
    
    g (char * d)
    {
      char * p;
      const char b[4];
    
      <bb 2> [local count: 1073741824]:
      b = "234";
      p_3 = strchr (&b, 57);   // 57 == '9'
      strcpy (d_4(D), p_3);
      a ={v} {CLOBBER};
      return;
    }

    False positives in middle-end warnings

    As various optimizations pass over the internal representation of a program, they perform various transformations with the goal of either simplifying them or exposing simplification opportunities for later passes. From the perspective of false positive warnings, while most of these transformations are benign, sometimes they can introduce problems. The two biggest sources of problems are what's commonly referred to as early folding and an optimization known as jump threading.

    Jump threading

    The jump threading pass attempts to minimize conditional branches in code by merging existing conditionals or moving or even duplicating statements into the conditional blocks. To the untrained eye, the pass appears to introduce execution paths through the program where none exist in the source code. The duplication of statements in conditional blocks can then cause trouble when constants or ranges are subsequently propagated into those statements. A full explanation of the jump threading pass is beyond the scope of this article. For more background, read "A gentle introduction to jump threading optimizations" by Aldy Hernandez.

    As an example of the effects of jump threading, take the simple function below (reduced from GCC bug 88771). Compiling it with GCC 8 in ILP32 mode results in the two suspicious warnings below. The natural reaction of the author of the code is to exclaim, "Where did the huge numbers come from? I didn't write them!"

    int f (char *d, const char *s, size_t i)
    {
      size_t n = i + 1 ? i + 1 : i;
    
      strncpy (d, s, n);
    
      if (i + 1)
        return -1;
      return 0;
    }
    
    warning: ‘strncpy’ pointer overflow between offset 0 and size [4294967295, 2147483647] [-Warray-bounds]
       strncpy (d, s, n);
       ^~~~~~~~~~~~~~~~~
    warning: ‘strncpy’ specified size 4294967295 exceeds maximum object size 2147483647 [-Wstringop-overflow=]

    What triggers the warnings can be understood from the dump of the Value Range Propagation pass (VRP) obtained by compiling the function with the -fdump-tree-vrp option. The test for n + 1 being non-zero has resulted in the pass, in conjunction with the jump-threading optimization, introducing a whole new basic block (<bb 5>) into the function with an obviously invalid call to strncpy. The -Warray-bounds and -Wstringop-overflow warnings that run during a subsequent pass over the transformed code then notice the impossibly excessive size in the second strncpy call and, on the assumption that it comes from the source code, issue the (somewhat confusing) diagnostics.

    f (char * d, const char * s, size_t i)
    {
      int _2;
      size_t iftmp.0_4;
    
      <bb 2> [local count: 1073741825]:
      if (i_3(D) != 4294967295)    ;; 4294967295 == (size_t)-1
        goto <bb 3>; [66.00%]
      else
        goto <bb 5>; [34.00%]
    
      <bb 3> [local count: 708669604]:
      iftmp.0_4 = i_3(D) + 1;
      strncpy (d_6(D), s_7(D), iftmp.0_4);
    
      <bb 4> [local count: 1073741825]:
      # _2 = PHI <0(5), 1(3)>
      return _2;
    
      <bb 5> [local count: 365072224]:
      strncpy (d_6(D), s_7(D), 4294967295);
      goto <bb 4>; [100.00%]
    }

    Jump threading leads to other hard-to-deal-with warnings as well, such as -Wunintialized and -Wmaybe-uninitialized. A popular construct that seems to be particularly prone to its effects is the C++ class template std::optional. See GCC bug 80635 for a test case and a detailed discussion of the challenges it presents.

    Several solutions to this problem have been discussed and remediations have been put in place for some warnings, but none is optimal. One commonly advocated approach is discussed in the next section.

    Interaction with sanitizers

    Another source of unintended instances of middle-end warnings can be the interaction of optimizers with the sanitizers (such as the Address Sanitizer or the Undefined Behavior Sanitizer). The instrumentation inserted by the sanitizers sometimes interferes with the optimizers' ability to track control or data flow or to limit the optimizers' ability to determine properties like pointer nullness. Because warnings and sanitizers both serve the same purpose, ideally they would both be usable at the same time with the same efficacy. Unfortunately, despite efforts to minimize these undesirable interactions, it seems unlikely that they can be avoided altogether.

    Solutions to the challenges for middle-end warnings

    A number of solutions are being discussed by GCC developers to improve the signal-to-noise ratio in middle-end warnings. Implementing additional optimizations is a good way to make warnings more effective. In many cases, that's a win-win proposition because it also leads to more efficient object code; however, in some cases, especially when the cost of implementing the optimization is high and the coding pattern insufficiently common, the return on investment is less clear.

    Probably the hardest problem is avoiding the false negatives due to early folding because preserving the original detail after statements have been folded requires architectural changes to the representation of programs in the compiler.

    Avoiding false positives caused by jump threading is a similar problem, due to the fact that they result in sequences of statements being inserted by GCC that don't exist in the original source code. It may be possible to throttle jump threading if it results in invalid statements, but only if the validity can be determined at the time of making the initial threading decision. If the statements turn out to be invalid only after additional transformations, it may be too late to reverse the decision. In those cases, another strategy may need to be employed to avoid the warnings. One approach being considered is replacing such statements with calls to __builtin_trap() or __builtin_unreachable(), perhaps in conjunction with issuing a diagnostic very late in the process if the introduced path doesn't end up subsequently eliminated. The choice of the appropriate strategy will probably need to be controlled by a command line option as discussed in the video "Future directions for treatment of undefined behaviour in C - GNU Tools Cauldron 2018."

    More articles for C/C++ developers

    • How to install GCC 8 and Clang/LLVM 6 on Red Hat Enterprise Linux 7
    • Recommended compiler and linker flags for GCC—Improve warnings and code generation with the right flags.
    • Usability improvements in GCC 8
    • Getting started with Clang/LLVM
    • Detecting String Truncation with GCC 8
    • Implicit fall through detection with GCC 7—Detect missing break statements inside of a switch block. The warning is enabled with -Wimplicit-fallthrough. It is also one of the warnings that will be enabled if you use -Wextra.
    • Memory error detection using GCC 7
    • Diagnosing Function Pointer Security Flaws with a GCC plugin
    • Toward a Better Use of C11 Atomics – Part 1

    Recent Posts

    • How to run a fraud detection AI model on RHEL CVMs

    • How we use software provenance at Red Hat

    • Alternatives to creating bootc images from scratch

    • How to update OpenStack Services on OpenShift

    • How to integrate vLLM inference into your macOS and iOS apps

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue