Using Valgrind's --trace-flags option

June 15, 2021
Alexandra Petlanova Hajkova
Related topics:
C, C#, C++, Linux
Related products:
Red Hat Enterprise Linux

    Valgrind is a great tool not only for finding errors related to memory management in a program, but also for memory consumption analysis, performance profiling, issues related to multithreading, and more. In this article, I introduce Valgrind's undocumented --trace-flags option and explain how we improved Valgrind's accuracy in one area related to the AArch64 processor from Arm.

    A case of rounding errors

    Valgrind works as an abstraction layer between the application and the operating system. It disassembles the application's code and adds instrumentation to it depending on which of the Valgrind tools are used. To execute and analyze memory or register manipulations, Valgrind parses instructions and translates them into an intermediate representation (IR) called VEX. For example, the front end of Valgrind translates an ADD instruction in the application's assembly-language code to the Iop_Add IR. The Valgrind tools (memcheck, helgrind, and so on) instrument the IR, after which the Valgrind back end re-assembles the code.

    Although Valgrind's translation process is very complex, it works well. But sometimes it does make mistakes. For example, the following simple C language code run under Valgrind showed imprecision in the rounding of some floating-point operations (documented in this bug report):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double x = 1004.3;
        double y = 2.0;
        double r = pow(x, y);

        printf("r = %.10f\n", r);
        return 0;
    }

    Compiled properly on the AArch64, this code should print the value of r as 1008618.4899999999. However, when run under Valgrind, it printed 1008618.4900000000. The reason was that Valgrind lacked correct support for the AArch64 fused multiply-add (FMADD) instruction, which the pow function uses. That gap caused the rounding issue.

    Fused multiply add

    Adding the product of two operands to a third is common enough to earn a dedicated instruction on many processors. FMADD stands for floating-point fused multiply-add. The basic operation is:

    D(destination) = A(accumulator) + N * M

    There are 32-bit (float) and 64-bit (double) variants of the instruction.

    The rounding issues arise because computing A + N * M in one fused step rounds the result only once, whereas computing N * M first and then adding A rounds twice. The two approaches can therefore give slightly different results.

    Different processors recognize the prevalence of this operation with a variety of instructions. The PowerPC ppc64 and IBM s390x, like the AArch64, have scalar FMADD. But most other architectures have only vector instructions that are similar to FMADD. For instance, the Intel x86 offers the VFMADD family of instructions, which is similar to the AArch64's FMADD but operates on vector (single instruction/multiple data, or SIMD) registers. Vector registers are large registers that store several numbers, allowing one operation to be performed on all of them at once. For example, Intel's AVX-512 extension uses 512-bit registers. Scalar registers are much smaller, usually 32 or 64 bits long, and contain one scalar value.

    Because Valgrind needed to support scalar fused multiply-add instructions for the ppc64 and s390x, it already defined an IR operation for them called Iop_MAddF32. This VEX IR operation represents a 32-bit float fused multiply-add instruction. But the arm64 front end and back end for Valgrind didn't yet implement it. A team I worked on created a patch that adds arm64 VEX front- and back-end support for Iop_MAdd/SubF32/64. Before this patch was added, FMADD was implemented in VEX as two IR operations, Iop_Add and Iop_Mul, to represent one actual instruction. That caused the rounding errors.

    The front-end part of the patch replaced the use of Iop_Add and Iop_Mul with one Iop_MAdd IR operation, which allows Valgrind to avoid the rounding error. The back end then turns the IR into the actual instructions again.

    To test our patch to Valgrind, we wrote assembly language code to make sure that Valgrind generates the Iop_MAdd IR operation when it is supposed to. If an architecture supports scalar FMA instructions, the compiler will hopefully turn something like x = a + (b * c) into an efficient FMADD instruction instead of a multiplication followed by an addition. But it is easier to use inline assembly directly:

    asm("fmadd %s0, %s1, %s2, %s3\n;" : "=w"(dst) : "w"(x), "w"(y), "w"(z));

    Here, the s in each operand selects the 32-bit view of a SIMD/FP register, used in this case as a floating-point (FP) register.

    Here is a minimal test that could be compiled with the command gcc -g -o tst test.c:

    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        float x = 55;
        float y = 0.69314718055994529;
        float z = 38.123094930796988;
        float dst;

        // 32-bit variant: dst = z + x * y
        asm("fmadd %s0, %s1, %s2, %s3\n;" : "=w"(dst) : "w"(x), "w"(y), "w"(z));
        printf("%f = %f + %f * %f\n", dst, z, x, y);

        return 0;
    }

    The --trace-flags option

    For figuring out precisely what Valgrind does, its --trace-flags option is very useful. This option helps you spot problematic places in the Valgrind code, and is also useful for expert users who want to know what exactly Valgrind is handling in an application.

    The --trace-flags option is not documented in the Valgrind manual page, nor is it displayed with valgrind --help. However, you can see the flags it accepts by running valgrind --help-debug. Table 1 shows the flags and their effects.

    Table 1: Flags in Valgrind's --trace-flags option.
    Flag Effect
    10000000 Show conversion into IR
    01000000 Show after initial opt
    00100000 Show after instrumentation
    00010000 Show after second opt
    00001000 Show after tree building
    00000100 Show selecting insns
    00000010 Show after reg-alloc
    00000001 Show final assembly
    00000000 (all bits cleared) Show summary profile only

    Note: To get full details from --trace-flags, you also need to specify --trace-notbelow or --trace-notabove.

    With these values, you can see all the transformations performed by Valgrind and related instrumentation tools. But here we are interested only in the first "disassembly" and the final "assembly" steps. We will explore these next.

    How --trace-flags works

    Here, I'll describe how to use the --trace-flags option step by step, using as an example a trace of the FMADD activity that concerns us.

    Valgrind's first step, which is the conversion into IR, is tool-independent. But for the next step, showing the final assembly, it helps to not have any tool do instrumentation so that the final assembly is clearer. In our case, we do not care too much about the instrumentation and the optimizations it makes. Therefore, I add the --tool=none option, so that no tool (memcheck, by default) adds its own instructions. The resulting command is:

    $ ./vg-in-place -q --tool=none --trace-flags=10000000 --trace-notbelow=999999 ./tst 2>&1 | less

    The command produces many blocks of code that do not interest us. The block relevant for us is the main function in the ./tst module. To find it, we re-run the previous command, replacing the arbitrary value in --trace-notbelow=999999 with the SB (superblock) number that the previous run displayed for main:

    SB 1237 (evchecks 6200) [tid 1] 0x400634 main /root/valgrind/tst+0x400634

    The SB number for main is 1237. We use this number to skip all superblocks before main. Therefore, our new command is:

    $ ./vg-in-place -q --tool=none --trace-flags=10000000 --trace-notbelow=1237 ./tst 2>&1 | less

    We want to look for the fmadd in the output from the block that's relevant for us. Before the patch, the fmadd-related block looked like this:

    (arm64) 0x400670: fmadd s0, s0, s1, s2
    
    ------ IMark(0x400670, 4, 0) ------
    t18 = Shr32(GET:I32(888),0x16:I8)
    t19 = Or32(And32(Shl32(t18,0x1:I8),0x2:I32),And32(Shr32(t18,0x1:I8),0x1:I32))
    t17 = AddF32(t19,GET:F32(352),MulF32(t19,GET:F32(320),GET:F32(336)))
    PUT(320) = V128{0x0000}
    PUT(320) = t17
    PUT(272) = 0x400674:I64

    With the FMADD support, the output changed to:

    (arm64) 0x400670: fmadd s0, s0, s1, s2
    ------ IMark(0x400670, 4, 0) ------
    t18 = Shr32(GET:I32(888),0x16:I8)
    t19 = Or32(And32(Shl32(t18,0x1:I8),0x2:I32),And32(Shr32(t18,0x1:I8),0x1:I32))
    t17 = MAddF32(t19,GET:F32(320),GET:F32(336),GET:F32(352))
    PUT(320) = V128{0x0000}
    PUT(320) = t17
    PUT(272) = 0x400674:I64
    

    Compare the actual addition instruction from before and after the application of the patch. Before, the addition was:

    t17 = AddF32(t19,GET:F32(352),MulF32(t19,GET:F32(320),GET:F32(336)))

    After applying the patch, the corresponding code is:

    t17 = MAddF32(t19,GET:F32(320),GET:F32(336),GET:F32(352))

    The trace shows us that MAddF32 is used instead of AddF32 and MulF32, as desired.

    Assembly code with the --trace-flags option

    As described earlier, the --trace-flags option is also useful for viewing the final assembly language code. For this task, we'll set four flags, which gives the value 10000111:

    $ ./vg-in-place --tool=none --trace-flags=10000111 --trace-notbelow=1237 -q ./tst 2>&1 | less

    Looking through the output for MAddF32, we can see the following assembly code:

    -- t79 = MAddF32(Or32(And32(Shl32(t72,0x1:I8),0x2:I32),And32(Shr32(t72,0x1:I8),0x1:I32)),GET:F32(320),GET:F32(336),GET:F32(352))
    ldr %vD128(S-reg), 320(x21)
    ldr %vD129(S-reg), 336(x21)
    ldr %vD130(S-reg), 352(x21)
    
    ...
    
    msr fpcr, %vR139
    ffmadd %vD131(S-reg), %vD128(S-reg), %vD129(S-reg), %vD130(S-reg)
    mov(d) %vD79, %vD131

    R refers to a general-purpose register and D refers to a SIMD/FP register. The next-to-last line of this snippet shows that registers D128 through D130, loaded earlier, are the source operands of the fmadd, and that D131 is its destination. We can look at the MAddF32 instruction we saw earlier in the VEX IR:

    t17 = MAddF32(t19,GET:F32(320),GET:F32(336),GET:F32(352))
    

    and compare it to the resulting assembly code:

    ldr %vD128(S-reg), 320(x21)
    ldr %vD129(S-reg), 336(x21)
    ldr %vD130(S-reg), 352(x21)
    ffmadd %vD131(S-reg), %vD128(S-reg), %vD129(S-reg), %vD130(S-reg)
    

    The comparison tells us, for instance, that the first data argument, GET:F32(320), was loaded into the D128 SIMD register and became the second operand of the fmadd. This was very helpful during our debugging because it revealed when the operands or their order were wrong. The example here demonstrates how informative and fine-grained the --trace-flags option is. We can look at the instruction selected without having to care about register allocation, or look afterward to see the actual assembly code generated.

    Conclusion

    I hope this article has helped you to understand better how Valgrind works, how developers are improving it, and how you can use --trace-flags to discover precisely what your program and Valgrind do at a low level.

    Last updated: August 15, 2022
