Preparing Valgrind Memcheck for x86-64-v3

October 24, 2024
Mark Wielaard
Related topics:
C, C#, C++, Linux
Related products:
Red Hat Enterprise Linux

    Various distributions have been experimenting with generating x86-64-v3 instructions for all compiled code. Although Valgrind already supported emulation of those instructions, there was still work to do to make the Memcheck tool produce correct diagnostics for memory issues.

    What is Valgrind?

    Valgrind is an instrumentation framework for building dynamic analysis tools that check C and C++ programs for errors. Memcheck is the default tool Valgrind uses when you don't ask for another tool (using --tool=). Memcheck keeps track of the validity and addressability of all memory a program uses. This means Memcheck can warn about the use of unaddressable memory or when program execution depends on values that were never defined. It also reports on memory leaks and allocated memory which is never freed.

    How does Valgrind work?

    Valgrind works by using dynamic binary instrumentation. Valgrind translates all instructions in a program into an intermediate representation, called VEX. A tool like Memcheck then instruments this intermediate representation to track all memory operations. This transformed intermediate representation is then translated back into native instructions, which is what gets executed. Valgrind can be seen as a virtual machine with a just-in-time compiler which uses that native instruction stream as byte code.

    What is x86-64-v3?

    x86-64-v3 is a set of common x86-64 CPU features (instruction sets). It includes the AVX and AVX2 instructions, which add many new 256-bit vector operations. It also includes the fused multiply-add (FMA) instructions, which work on 256-bit vectors too and combine multiplication and addition into a single operation that computes the intermediate result with infinite precision (we will see below how Valgrind got this "wrong"). Finally, it includes the BMI1 and BMI2 instruction sets, which provide various bit manipulation instructions.

    For the next Red Hat Enterprise Linux (RHEL) 10 release (currently still in development) the GCC compiler will default to -march=x86-64-v3 which means programs will use all these instructions by default. See this article for more information. The current RHEL 9 uses x86-64-v2 as default.

    Although Valgrind already had support for all these x86-64-v3 instructions, we found several issues once the whole distribution was built with -march=x86-64-v3 as default. These have been fixed in the Valgrind 3.23.0 release.

    More accurate instruction emulation

    The fused multiply–add (FMA) instruction does the calculation of r = (a * b) + c in one instruction. This has two advantages. You get two operations for the price of one, instead of having to do a multiplication and an addition in separate instructions. And it enhances the accuracy of the whole operation by using only a single rounding of the end result, instead of having to round the result of the multiplication step and then the addition step separately.
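
    The effect of the single rounding can be sketched in plain C. This is an illustration only and does not execute a hardware FMA instruction: it emulates the fused result for float inputs by doing the arithmetic in double, where the product of two floats is exact. The function names and the input values are assumptions chosen for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Two roundings: the product is rounded to float before the addition.
   The volatile store keeps the compiler from fusing the two steps. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding */
    return product + c;               /* second rounding */
}

/* One rounding: the product of two floats is exact in double, so for
   the inputs below only the final conversion back to float can round.
   This stands in for a fused multiply-add in this sketch. */
float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Print both results for inputs where the two approaches differ. */
void show_rounding_difference(void)
{
    float a = 1.0f + 0x1p-12f;        /* 1 + 2^-12 */
    float c = -(1.0f + 0x1p-11f);     /* -(1 + 2^-11) */

    /* a * a is 1 + 2^-11 + 2^-24; the last term is lost when the
       product is rounded to float on its own. */
    assert(mul_then_add(a, a, c) != fused_like(a, a, c));
    printf("mul then add: %g\n", mul_then_add(a, a, c));
    printf("fused-like:   %g\n", fused_like(a, a, c));
}
```

    With these inputs the separately rounded version returns 0, while the single-rounding version keeps the 2^-24 term. A difference of this kind is what a program can observe when an FMA instruction is emulated as a multiplication followed by an addition.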

    When FMA instructions were originally introduced, AMD and Intel each introduced a slightly different variant. FMA4 allowed the result register to be separate from the three input registers, while in FMA3 the result register had to be one of the input registers. See this page for more information.

    In the end, the FMA3 variant became the one supported by both AMD and Intel and is part of x86-64-v3. Valgrind did support both FMA3 and FMA4, but didn't keep track of which variant the CPU it ran on supported. When translating either instruction it used a common generic implementation that did a multiplication and then an addition. This caused subtle rounding issues with floating point arithmetic which FMA was supposed to prevent.

    To make these FMA calculations precise, Valgrind now keeps track of the FMA3 or FMA4 flag in the CPUID. On FMA-capable hosts it now emits a VFMADD instruction. This makes the floating point operations as accurate as they are when the program is not run under Valgrind.
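
    As a rough sketch of what tracking these flags involves, the following C code decodes the two feature bits from raw CPUID register values. The bit positions are the ones documented by Intel and AMD; the enum and the helper functions are hypothetical names for this example and are not taken from the Valgrind sources.

```c
#include <assert.h>
#include <stdint.h>

/* FMA (FMA3) is reported in CPUID leaf 1, ECX bit 12.
   FMA4 is reported in CPUID leaf 0x80000001, ECX bit 16. */
#define CPUID_1_ECX_FMA          (UINT32_C(1) << 12)
#define CPUID_80000001_ECX_FMA4  (UINT32_C(1) << 16)

enum fma_kind { FMA_NONE, FMA3_ONLY, FMA4_ONLY, FMA3_AND_FMA4 };

/* Classify a host from the ECX values of the two CPUID leaves. */
enum fma_kind classify_fma(uint32_t leaf1_ecx, uint32_t ext_leaf_ecx)
{
    int fma3 = (leaf1_ecx & CPUID_1_ECX_FMA) != 0;
    int fma4 = (ext_leaf_ecx & CPUID_80000001_ECX_FMA4) != 0;

    if (fma3 && fma4) return FMA3_AND_FMA4;
    if (fma3)         return FMA3_ONLY;
    if (fma4)         return FMA4_ONLY;
    return FMA_NONE;
}

/* Usage example with hand-made register values. */
void classify_fma_examples(void)
{
    assert(classify_fma(0, 0) == FMA_NONE);
    assert(classify_fma(CPUID_1_ECX_FMA, 0) == FMA3_ONLY);
    assert(classify_fma(0, CPUID_80000001_ECX_FMA4) == FMA4_ONLY);
}
```

    On an x86-64 host built with GCC or Clang, the raw register values could be obtained with __get_cpuid from <cpuid.h>. The decoding is kept separate here so the sketch does not depend on the machine it runs on.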

    Reverse engineering GCC optimizations on large vectors

    Sometimes the compiler is really clever and Valgrind Memcheck has to work extra hard to make sure the generated code is correct. This is especially true when GCC uses large vector operations as introduced by AVX and AVX2 to optimize string operations. Using vector registers and operations is attractive since it allows comparing 16 (for 128-bit vectors) or 32 (for 256-bit vectors) characters at once.

    One such tricky optimization is when GCC sees the strcmp function with one argument being a static constant string which is as large as (or larger than) the vector size. GCC will generate code to compare (the start of) the strings by loading the strings into two vector registers and XORing them, using the VPXOR instruction, and then using the VPTEST instruction on the resulting vector, which does a bitwise AND and sets the ZF flag if the result is all zeros (which indicates the strings were equal). The generated code looks like this:

      VMOVDQU  (%str1), %ymm1        # load str1 from memory into register vec1
      VMOVDQU  (%str2), %ymm2        # load str2 from memory into register vec2
      VPXOR    %ymm1,   %ymm2, %ymm3 # vec3 = vec1 xor vec2
      VPTEST   %ymm3,   %ymm3        # set ZF if bitwise AND of vec3 with itself is all zeros
      JE       equal1f               # jump to equal1f if ZF is set

    Assuming str1 is at least as big as the vector register is wide, this is a very efficient way of comparing it with str2. And even if str2 is shorter than str1, it is a quick way to check that the strings aren't equal, as long as the compiler can prove that the memory after the end of str2 can be loaded directly into the vector register (for example, when the string is placed on the stack). This works because a shorter string will have a \0 (zero) character at the end. At least that character will produce a non-zero XOR result, so VPTEST will see at least one bit set in the result vector register. If at least one bit is set, the ZF flag will not be set (whatever the characters after this zero character are).
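
    The logic of this instruction sequence can be modelled in portable C. The sketch below is not what GCC emits; it replaces the 256-bit registers with byte arrays, and the function name and the test strings are made up for this example.

```c
#include <assert.h>
#include <string.h>

#define VEC_BYTES 32   /* width of a 256-bit vector register */

/* Model of the VMOVDQU / VPXOR / VPTEST sequence: returns 1 (ZF set)
   if the first VEC_BYTES bytes of both buffers are identical.
   Both buffers must have at least VEC_BYTES readable bytes. */
int first_vector_equal(const char *str1, const char *str2)
{
    unsigned char vec1[VEC_BYTES], vec2[VEC_BYTES];
    unsigned char any_bit = 0;

    memcpy(vec1, str1, VEC_BYTES);               /* VMOVDQU */
    memcpy(vec2, str2, VEC_BYTES);               /* VMOVDQU */
    for (int i = 0; i < VEC_BYTES; i++)
        any_bit |= (unsigned char)(vec1[i] ^ vec2[i]);  /* VPXOR */
    return any_bit == 0;                         /* VPTEST: ZF */
}

void first_vector_equal_examples(void)
{
    /* A 32-character constant, as wide as the vector. */
    static const char constant[40] = "0123456789abcdef0123456789abcdef";
    char same[40]    = "0123456789abcdef0123456789abcdef";
    char shorter[40] = "0123456789";   /* terminator at index 10 */

    assert(first_vector_equal(same, constant) == 1);
    assert(first_vector_equal(shorter, constant) == 0);

    /* The bytes after the terminator do not change the outcome,
       because the terminator itself already differs. */
    memset(shorter + 11, 'x', sizeof shorter - 11);
    assert(first_vector_equal(shorter, constant) == 0);
}
```

    The last assertion shows the property the optimization relies on: once the terminating zero byte differs from the constant string, the contents of the bytes behind it cannot make the comparison succeed.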

    Although the above check is logically correct, it does create some challenges for Valgrind Memcheck. In the case one of the strings is shorter than the other, we first hit the issue that the bytes right after the end of string zero character might not technically be addressable (Memcheck tracks this very precisely). So normally, it wants to produce a warning that (partially) unaddressable memory is loaded into a register.

    But the above optimization depends on being able to read a few more bytes than needed. So there is the option --partial-loads-ok=yes (which is now the default). This option makes it so that such loads do not produce an address error. Instead, loaded bytes originating from illegal addresses are marked as uninitialized, and those corresponding to legal addresses are handled in the normal way. So now for such loads, Memcheck will mark the bytes in the vector register after the string zero terminator as undefined.

    This brings us to the second tricky issue to get right. We are now operating on vector registers with partially defined values. Valgrind Memcheck needs to do exact instrumentation to make sure the result is properly tracked as (un)defined. This is fairly simple for the result of the XOR of the two vector registers. The result vector is defined up to the first undefined byte in one of the input registers.

    When setting the ZF flag for the VPTEST instruction, you need to check whether any bit is set in the defined part of the result vector register; that is enough to make the flag value defined (and not set). This is because the result depends on all bits being zero. Once you see any bit in the defined set being one, it doesn't really matter what the other (undefined) bits are. We also know this only matters for the strings being unequal because one of the strings is shorter. In that case there is at least the end of string zero terminating byte that is defined (and unequal to the byte in the longer string).
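
    The definedness rules described above can be written down as a small model in C. This is a simplified sketch, not Memcheck's implementation: it keeps one shadow byte per data byte (a 1 bit meaning "undefined"), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 32

/* A vector value plus its shadow: a 1 bit in undef[] marks the
   corresponding bit in value[] as undefined. */
typedef struct {
    unsigned char value[VEC_BYTES];
    unsigned char undef[VEC_BYTES];
} shadow_vec;

/* Load VEC_BYTES bytes of which only the first `addressable` are
   legal. The remaining bytes are not read and are marked undefined,
   as with --partial-loads-ok=yes. */
shadow_vec partial_load(const unsigned char *mem, size_t addressable)
{
    shadow_vec v;
    assert(addressable <= VEC_BYTES);
    memset(v.value, 0, VEC_BYTES);
    memset(v.undef, 0xff, VEC_BYTES);
    memcpy(v.value, mem, addressable);
    memset(v.undef, 0x00, addressable);
    return v;
}

/* XOR: a result bit is undefined if either input bit is undefined. */
shadow_vec shadow_xor(const shadow_vec *a, const shadow_vec *b)
{
    shadow_vec r;
    for (int i = 0; i < VEC_BYTES; i++) {
        r.value[i] = (unsigned char)(a->value[i] ^ b->value[i]);
        r.undef[i] = (unsigned char)(a->undef[i] | b->undef[i]);
    }
    return r;
}

/* ZF for "VPTEST v, v". Returns the flag value and stores in
   *zf_defined whether that value can be trusted. */
int shadow_ptest_zf(const shadow_vec *v, int *zf_defined)
{
    unsigned char defined_ones = 0, any_undef = 0;
    for (int i = 0; i < VEC_BYTES; i++) {
        defined_ones |= (unsigned char)(v->value[i] & ~v->undef[i]);
        any_undef |= v->undef[i];
    }
    if (defined_ones) {        /* a defined 1 bit decides the flag */
        *zf_defined = 1;
        return 0;
    }
    /* All defined bits are zero: the flag is only defined when
       there are no undefined bits left that could still be one. */
    *zf_defined = (any_undef == 0);
    return 1;
}
```

    In this model, comparing a full-width string with a shorter one whose terminator was loaded from addressable memory yields a defined, cleared ZF, because the terminator produces a defined non-zero byte in the XOR result. If the only differing bytes were undefined, the flag would be reported as undefined instead.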

    Intercepting glibc dynamic linker/loader string optimizations

    Valgrind Memcheck intercepts various glibc memory and string functions (e.g., strcpy, strcmp, strlen, memcpy or memmove). It does this partially because it is hard to prove that some of these functions, which are optimized hand-written assembly, are correct. And partially because Memcheck would like to check some pre-conditions on the functions, like whether memory arguments overlap.

    To do this, Valgrind sets the LD_PRELOAD environment variable when launching the program to load alternative, simple, instrumented versions of these string and memory functions. This loads the code before any library is loaded, which can then be intercepted when the program or a library uses any of those functions.
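
    To give an idea of what such a simple replacement might look like, here is a sketch of a byte-at-a-time memcpy with an overlap check. It is not Valgrind's code: the function names and the overlap_seen output parameter are assumptions made for this example, standing in for whatever error reporting the real tool performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [dst, dst+n) and [src, src+n) share at least one byte. */
int ranges_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    if (n == 0)
        return 0;
    return d < s ? (s - d < n) : (d - s < n);
}

/* Plain forward copy, one byte at a time, which is easy to instrument.
   Sets *overlap_seen when the memcpy precondition is violated. */
void *simple_memcpy(void *dst, const void *src, size_t n, int *overlap_seen)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    *overlap_seen = ranges_overlap(dst, src, n);
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

void simple_memcpy_examples(void)
{
    char buf[16] = "abcdefgh";
    int overlap;

    simple_memcpy(buf + 8, buf, 8, &overlap);   /* adjacent ranges */
    assert(!overlap);
    simple_memcpy(buf + 2, buf, 8, &overlap);   /* ranges share bytes */
    assert(overlap);
}
```

    A loop like this contains no vector tricks, so every load and store can be tracked byte by byte, and the precondition check runs before any data is copied.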

    This works for any such optimized string or memory function, whether using x86-64-v3 instructions or earlier vector instructions in glibc, except for functions that the dynamic loader (ld.so) uses itself when loading the LD_PRELOAD libraries. ld.so contains its own implementation of these functions (since it is responsible for loading glibc, it cannot use the glibc functions directly itself).

    The ld.so versions used to not be built with x86-64-v3 optimized instructions though, so Valgrind Memcheck could just interpret the simpler version of these functions directly.

    Since we cannot use the LD_PRELOAD trick to load the alternative code into the process, we needed to add a hardwire for this specific ld.so function. The hardwire is a simple implementation of that function that will need to be called instead of the original. The disadvantage of having to use a hardwire is that it is architecture-specific and that Valgrind has to look up the symbol addresses itself. Since these symbols are normally private to ld.so, that means Valgrind needs the full symbol table available. So ld.so cannot be stripped (to remove unnecessary/debug symbols). Luckily ld.so is fairly small, so not stripping the debug symbols doesn't make it much bigger than necessary.

    Using it all together

    When using a distribution that defaults to building all code for x86-64-v3, like the upcoming RHEL 10 beta, or when using -march=x86-64-v3 to build your own code, you will want to use the Valgrind 3.23.0 release, which the current Fedora 40 ships. Valgrind 3.23.0 will accurately execute the new vector code, even with GCC optimizations taking advantage of the new AVX and AVX2 instructions, and it will intercept tricky memory and string operations so Memcheck can track undefined values in your code.

    Last updated: October 25, 2024
