Valgrind Memcheck: Different ways to lose your memory

Valgrind is an instrumentation framework for building dynamic analysis tools that check C and C++ programs for errors. Memcheck is the default tool: Valgrind runs it when you don't ask for another one. Other useful tools, which you can select with valgrind --tool=toolname, are:

  • cachegrind and callgrind, to do cache and call-graph function profiling
  • helgrind and drd, to do thread error and data-race detection
  • massif and dhat, to do dynamic heap usage analysis

Each of these tools deserves an article of its own, but here we will concentrate on Memcheck.

Detecting memory leaks with Valgrind Memcheck

Memcheck tracks all memory reads, writes, allocations, and deallocations in a C or C++ program. The tool can detect many different memory errors. For instance, it detects reads or writes before or after allocated memory blocks. It warns about the use of (partially) undefined values in conditional code or passing such values to system calls. It will also notify you about bad or double deallocation of memory blocks. But for now, we will discuss memory leak detection with Memcheck.

Generating a leak summary

When you run Valgrind on your program without any additional arguments, it produces a summary of the different kinds of leaks it has detected. For example, valgrind ./myprog might produce the following summary:

    LEAK SUMMARY:
      definitely lost: 48 bytes in 1 blocks
      indirectly lost: 24 bytes in 3 blocks
        possibly lost: 0 bytes in 0 blocks
      still reachable: 14 bytes in 1 blocks
           suppressed: 0 bytes in 0 blocks

Memcheck reports leaks in five categories: definitely lost, indirectly lost, possibly lost, still reachable, and suppressed. The first four categories indicate different kinds of memory blocks that weren't freed before the program ended. If you aren't interested in specific blocks, you can tell Valgrind not to report them (you'll see how shortly). The summary also shows you the number of bytes lost and how many blocks they are in, which tells you whether you are losing lots of small allocations, or a few large ones.

The following sections explain each category.

Definitely lost

The first category, definitely lost, is generally the most urgent kind of leak to track down, because there is no way to use or recover that memory. Let's look at an example of a small program that simply calls output_report a couple of times. That function prints a small banner and a number each time. As we will see, the memory we reserve for the report banner will be definitely lost (multiple times) when the program finishes:

 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>

 char *
 create_banner ()
 {
   const char *user = getenv ("USER");
   size_t len = 1 + 2 * 4 + strlen (user) + 1; /* tab, "|** " and " **|" (4 chars each), user, NUL */
   char *b = malloc (len);
   sprintf (b, "\t|** %s **|", user);
   return b;
 }

 void
 output_report (int nr)
 {
   char *banner = create_banner ();
   puts (banner);
   printf ("Number: %d\n", nr);
   printf ("\n");
 }

 int
 main ()
 {
   for (int i = 1; i <= 3; i++)
     output_report (i);

   return 0;
 }

Compile this code with gcc -Wall -g -o definitely definitely.c and run it under Valgrind, asking for details with valgrind --leak-check=full ./definitely. Now, before the leak summary, Valgrind will show backtraces where the program allocated memory that was ultimately lost:

 42 bytes in 3 blocks are definitely lost in loss record 1 of 1
    at 0x4C29F33: malloc (vg_replace_malloc.c:309)
    by 0x4011C7: create_banner (definitely.c:10)
    by 0x401200: output_report (definitely.c:18)
    by 0x40124C: main (definitely.c:28)

Note that Memcheck found three leaks, which it reports as one loss record because they have identical backtraces. By default, it requires the whole backtrace to be the same in order to consider leaks similar enough to report together. If you want Memcheck to combine more leaks, you can use --leak-resolution=low or --leak-resolution=med to group leaks whose backtraces match in only the first two or the first four entries, respectively. This is useful if Memcheck reports lots of leaks with slightly different backtraces that you suspect are the same issue. You can then concentrate on the record with the highest number of bytes (or blocks) lost.
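To see how the resolution setting changes the grouping, consider a small sketch in which two different callers share one allocation helper. The names make_label, header_label, footer_label, and leak_labels are made up for this illustration. When built with gcc -g and no optimization, the two leaks have backtraces that agree on the first two entries (malloc and make_label) and differ at the third, so we would expect two loss records with the default setting and a single combined record with --leak-resolution=low:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the single allocation site shared by both callers. */
char *
make_label (const char *text)
{
  char *copy = malloc (strlen (text) + 1);
  if (copy != NULL)
    strcpy (copy, text);
  return copy;
}

/* Two different callers, so the backtraces differ at the third entry. */
char *
header_label (void)
{
  return make_label ("header");
}

char *
footer_label (void)
{
  return make_label ("footer");
}

/* Deliberately drops both pointers so Memcheck has something to report. */
void
leak_labels (void)
{
  header_label ();
  footer_label ();
}
```

Calling leak_labels () from main and running the result with valgrind --leak-check=full, once with and once without --leak-resolution=low, should show the difference. The option only changes how leaks are grouped in the report, not which leaks are found.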

Still reachable

In the previous example, it is clear we should free banner after use. We could do that at the end of the output_report function by adding free (banner). Then, when running under Valgrind again, it will happily say All heap blocks were freed -- no leaks are possible.
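A minimal sketch of that fix could look as follows. The fallback to "unknown" when USER is not set, the NULL checks, and the use of snprintf are additions for this sketch and not part of the original program:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same banner layout as before; the "unknown" fallback and the NULL
   checks are assumptions added for this sketch.  */
char *
create_banner (void)
{
  const char *user = getenv ("USER");
  if (user == NULL)
    user = "unknown";
  size_t len = 1 + 2 * 4 + strlen (user) + 1;
  char *b = malloc (len);
  if (b != NULL)
    snprintf (b, len, "\t|** %s **|", user);
  return b;
}

void
output_report (int nr)
{
  char *banner = create_banner ();
  if (banner == NULL)
    return;
  puts (banner);
  printf ("Number: %d\n", nr);
  printf ("\n");
  /* The missing call: release the block before the only pointer
     to it goes out of scope.  */
  free (banner);
}
```

Each call to output_report now allocates and frees its own banner, so no block outlives the function that created it.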

But we are clever and see that the code reuses the same banner for each report. So, we define banner as a top-level (global) variable in our code and move the create_banner call to the main function, so that create_banner is called only once:

char *banner;

void
output_report (int nr)
{
  puts (banner);
  printf ("Number: %d\n", nr);
  printf ("\n");
}

int
main ()
{
  banner = create_banner ();
  for (int i = 1; i <= 3; i++)
    output_report (i);

  return 0;
}

Note how we again forget to call free, this time at the end of main. Now, when running under Valgrind, Memcheck will report still reachable: 14 bytes in 1 blocks and zero bytes lost in any other category.

But the output offers no details of loss records with backtraces for memory blocks that are still reachable, even though we ran with --leak-check=full. This is because Memcheck thinks the error is not very serious. The memory is still reachable, so the program could still be using it. In theory, you could free it at the end of the program, but all memory is freed at the end of the program anyway.

Although still reachable memory isn't a real issue in theory, you might still want to look into it. You might want to see whether you could free a given block earlier, which might lower memory usage for longer running programs. Or you might simply like to see the statement All heap blocks were freed -- no leaks are possible. To get the details you need, add --show-leak-kinds=reachable or --show-leak-kinds=all to the Valgrind command line (together with --leak-check=full). Now you will also get backtraces showing where still reachable memory blocks were allocated in your program.
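If we do want the global banner cleaned up, one possible approach is a small release function that we call at the end of main. In the sketch below, init_banner is a hypothetical stand-in for create_banner with a fixed text, and release_banner is likewise a made-up name:

```c
#include <stdlib.h>
#include <string.h>

char *banner;   /* global, as in the example above */

/* Hypothetical stand-in for create_banner () with a fixed text. */
void
init_banner (void)
{
  const char *text = "\t|** report **|";
  banner = malloc (strlen (text) + 1);
  if (banner != NULL)
    strcpy (banner, text);
}

/* Free the block and clear the pointer so it cannot be used afterwards.
   Calling this twice is harmless because free (NULL) does nothing.  */
void
release_banner (void)
{
  free (banner);
  banner = NULL;
}
```

Calling release_banner () just before return 0 in main should make the still reachable block disappear from the summary. Registering it with atexit (release_banner) is an alternative when the program has several exit paths.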

Possibly lost

To explore the other categories of leaks, we change our program a little so that each report prints its own list of numbers. The complete data structure is allocated at the start of the program, with a separate block for each set of numbers. To keep things simple (too simple, as Memcheck will point out), we keep just one pointer to the current numbers struct to be printed. Although we create three sets of numbers, we output only two reports:

#include <stdlib.h>
#include <stdio.h>

struct numbers
{
  int n;
  int *nums;
};

int n;
struct numbers *numbers;

void
create_numbers (struct numbers **nrs, int *n)
{
  *n = 3;
  *nrs = malloc ((sizeof (struct numbers) * 3));
  struct numbers *nm = *nrs;
  for (int i = 0; i < 3; i++)
    {
      nm->n = i + 1;
      nm->nums = malloc (sizeof (int) * (i + 1));
      for (int j = 0; j < i + 1; j++)
        nm->nums[j] = i + j;
      nm++;
    }
}

void
output_report ()
{
  puts ("numbers");
  for (int i = 0; i < numbers->n; i++)
    printf ("Number: %d\n", numbers->nums[i]);
  printf ("\n");
}

int
main ()
{
  create_numbers (&numbers, &n);
  for (int i = 0; i < 2; i++)
    {
      output_report ();
      numbers++;
    }
  return 0;
}

When we compile this program with gcc -Wall -g -o possibly possibly.c and then run it under Valgrind with valgrind --leak-check=full ./possibly, Valgrind reports possibly lost: 72 bytes in 4 blocks. And because we ran with --leak-check=full, it also reports the backtraces:

 24 bytes in 3 blocks are possibly lost in loss record 1 of 2
    at 0x4C29F33: malloc (vg_replace_malloc.c:309)
    by 0x4011C3: create_numbers (possibly.c:22)
    by 0x40128F: main (possibly.c:41)
 
 48 bytes in 1 blocks are possibly lost in loss record 2 of 2
    at 0x4C29F33: malloc (vg_replace_malloc.c:309)
    by 0x401185: create_numbers (possibly.c:17)
    by 0x40128F: main (possibly.c:41)

Memcheck calls this memory possibly lost because it can still see a way to reach the blocks. After the loop, the numbers pointer points at the third struct numbers element, in the middle of the 48-byte block rather than at its start. If we kept some extra information, we could theoretically count backward to the beginning of this block of memory and access the rest of the information, or deallocate the whole block and the other memory it points to.

But Memcheck thinks this is most likely a mistake. And in our example, as in most such cases, Memcheck is right. When walking a data structure without keeping a reference to the structure itself, we can never reuse or free the structure. We should have used the numbers pointer as a base and used an (array) index to pass the current record as output_report (&numbers[i]). Then, Memcheck would have reported the data blocks as still reachable. (There is still a memory leak, but not a severe one, because there is a direct pointer to the memory and it could easily be freed.)
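The index-based variant could be sketched like this. Note that the signatures differ from the original program: here create_numbers returns the base pointer and takes a count, output_report takes the record to print, and output_reports is a hypothetical helper that does the walking. The allocation-failure handling is also an addition:

```c
#include <stdio.h>
#include <stdlib.h>

struct numbers
{
  int n;
  int *nums;
};

/* Same data layout as before, but the base pointer is returned to the
   caller.  Returns NULL if any allocation fails.  */
struct numbers *
create_numbers (int count)
{
  if (count <= 0)
    return NULL;
  struct numbers *nrs = calloc ((size_t) count, sizeof (struct numbers));
  if (nrs == NULL)
    return NULL;
  for (int i = 0; i < count; i++)
    {
      nrs[i].n = i + 1;
      nrs[i].nums = malloc (sizeof (int) * (i + 1));
      if (nrs[i].nums == NULL)
        {
          /* Undo the allocations made so far.  */
          for (int k = 0; k < i; k++)
            free (nrs[k].nums);
          free (nrs);
          return NULL;
        }
      for (int j = 0; j < i + 1; j++)
        nrs[i].nums[j] = i + j;
    }
  return nrs;
}

/* Prints one record; it no longer reads a global cursor.  */
void
output_report (const struct numbers *cur)
{
  puts ("numbers");
  for (int i = 0; i < cur->n; i++)
    printf ("Number: %d\n", cur->nums[i]);
  printf ("\n");
}

/* The base pointer is never advanced; records are reached by index.  */
void
output_reports (const struct numbers *base, int count)
{
  for (int i = 0; i < count; i++)
    output_report (&base[i]);
}
```

Because the base pointer always points at the start of the block, it can later be handed to free. If that pointer is kept in a global, as in the original program, Memcheck should classify the blocks as still reachable rather than possibly lost.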

Indirectly lost

In the previous example Memcheck reported a possibly lost block because the numbers pointer was still pointing inside an allocated block. We might be tempted to fix it by simply clearing the pointer after the output_report calls by doing numbers = NULL; to indicate that there is no current numbers list to report. But then we have also just lost the last pointer to our memory data blocks. We should have freed the memory first, but we can't do it now because we don't have a pointer to the start of the data structure anymore:

int
main ()
{
  create_numbers (&numbers, &n);
  for (int i = 0; i < 2; i++)
    {
      output_report ();
      numbers++;
    }
  numbers = NULL;
  return 0;
}

Now Memcheck will report the memory as definitely lost. And because the memory block contained pointers to other memory blocks, those blocks are reported as indirectly lost. If we run with --leak-check=full we see a backtrace for the main numbers memory block:

 72 (48 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
    at 0x4C29F33: malloc (vg_replace_malloc.c:309)
    by 0x401185: create_numbers (possibly.c:17)
    by 0x40128F: main (possibly.c:41)

 LEAK SUMMARY:
    definitely lost: 48 bytes in 1 blocks
    indirectly lost: 24 bytes in 3 blocks
      possibly lost: 0 bytes in 0 blocks
    still reachable: 0 bytes in 0 blocks
         suppressed: 0 bytes in 0 blocks

Note how there are no backtraces for the indirectly lost blocks. This is because Memcheck believes you will probably fix that when you fix the definitely lost block. If you do free the definitely lost block, but not the blocks of memory that were indirectly pointed to, next time you run your partially fixed program under Valgrind, Memcheck will report those indirectly lost blocks as definitely lost (and now with a backtrace). So by iteratively fixing the definitely lost memory leaks, you will eventually fix all indirectly lost memory leaks.
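A cleanup function that avoids this two-step discovery frees the inner blocks before the outer one. The following is a minimal sketch; free_numbers is a hypothetical name, and it assumes the caller still has the pointer to the start of the array (not an advanced cursor) and knows the element count:

```c
#include <stdlib.h>

struct numbers
{
  int n;
  int *nums;
};

/* Release the inner arrays first, then the outer block.  Freeing only
   the outer block would destroy the last pointers to the inner arrays,
   turning them into definitely lost blocks.  */
void
free_numbers (struct numbers *base, int count)
{
  if (base == NULL)
    return;
  for (int i = 0; i < count; i++)
    free (base[i].nums);
  free (base);
}
```

The order matters: once the outer block is freed, reading base[i].nums is no longer valid, so the inner pointers have to be released while the outer block still exists.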

If you cannot immediately find the definitely lost block that caused some indirectly lost blocks, it might be informative to see the backtraces for where the indirectly lost blocks were created. When using --leak-check=full you can do that by adding --show-leak-kinds=indirect or --show-leak-kinds=all to the valgrind command line.

Suppressed

By default, Memcheck counts definitely lost and possibly lost blocks as errors with --leak-check=full. It will also show where those blocks were allocated. It doesn't regard indirectly lost or still reachable blocks as errors by default. And it won't show backtraces for where those still reachable or indirectly lost blocks were allocated, unless explicitly asked to do so with --show-leak-kinds=all.

Indirectly lost blocks will disappear (or turn into definitely lost blocks) when you resolve the definitely lost issues. Without definitely lost blocks, there can be no indirectly lost blocks. For reachable blocks, it might still make sense to see whether you can deallocate them early, in order to lower memory usage of your program. Or explicitly free them at the end of your program to make sure all memory is really accounted for and cleaned up.

But there might be reasons for not fixing all memory leaks. They might occur in a library you are using that cannot easily be replaced. Or you might be convinced that a possibly lost block isn't really an error. If, in the original definitely lost example, you decide not to fix the issue and to keep the memory leak, you might want to generate a suppression so Memcheck won't complain about this particular block again. You can do this easily by running valgrind --leak-check=full --gen-suppressions=all ./definitely, which generates an example suppression:

{
   insert_a_suppression_name_here
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:create_banner
   fun:output_report
   fun:main
}

You can put that into a file (say, local.supp), replacing insert_a_suppression_name_here with something descriptive such as small leak in create_banner. Now, when you run valgrind --suppressions=./local.supp --leak-check=full ./definitely, the leak will be suppressed:

 LEAK SUMMARY:
    definitely lost: 0 bytes in 0 blocks
    indirectly lost: 0 bytes in 0 blocks
      possibly lost: 0 bytes in 0 blocks
    still reachable: 0 bytes in 0 blocks
         suppressed: 42 bytes in 3 blocks

There won't be any more output for any of the suppressed blocks. But if you want to see which suppressions were used, you can add --show-error-list=yes (or -s) to the valgrind command line. That option makes Valgrind show the suppression name, suppression file, line number, and how many bytes and blocks were suppressed by that suppression rule:

used_suppression:
  1 small leak in create_banner ./local.supp:2 suppressed: 42 bytes in 3 blocks

Test suite integration

When you have resolved all memory leak issues, or when you have suppressions for those you don't care about, you might want to integrate Valgrind into your test suite to catch any new memory leaks early. If you use --error-exitcode=<number>, Valgrind will change the program's exit code to the given number when an error (memory leak) is detected. You can also use --quiet (or -q) so that Valgrind prints only error output. That keeps it from interfering with the program's normal stdout and stderr, so you can compare the program output as usual.

Remember that by default Memcheck regards only definitely lost and possibly lost memory blocks as errors. You can change that by using --errors-for-leak-kinds=set. If you are interested in getting an error only for definitely lost blocks, you can use --errors-for-leak-kinds=definite. When your test programs always free all memory blocks, including still reachable blocks, you can use --errors-for-leak-kinds=definite,possible,reachable or --errors-for-leak-kinds=all. Note that --errors-for-leak-kinds=set, which works together with --error-exitcode=number, is independent of the --show-leak-kinds=set option mentioned above, which determines which backtraces to show. In general, though, you will want to give both the same set, so that you always get a backtrace for a memory error.

So, a good way to run your tests is valgrind -q --error-exitcode=99 --leak-check=full ./testprog. If you have any local suppressions, you can add --suppressions=local.supp. And if you really want all your test cases to be totally free from any kind of memory leak, add --show-leak-kinds=all --errors-for-leak-kinds=all.
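If your test harness is itself written in C, it could wrap that command line roughly as follows. This is only a sketch under several assumptions: build_valgrind_command and valgrind_found_errors are hypothetical names, the program path is pasted into a shell command unquoted (so it must be trusted), the system is POSIX (for system () and the sys/wait.h macros), and the tested program never exits with status 99 on its own:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define LEAK_EXIT_CODE 99   /* passed to --error-exitcode below */

/* Build the command line from the options discussed above.
   Returns 0 on success, -1 if the buffer is too small.  */
int
build_valgrind_command (char *buf, size_t size, const char *prog)
{
  int n = snprintf (buf, size,
                    "valgrind -q --error-exitcode=%d --leak-check=full %s",
                    LEAK_EXIT_CODE, prog);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Returns 1 if the exit status says Valgrind reported errors, 0 if it
   does not, and -1 if the command could not be run or did not exit
   normally.  */
int
valgrind_found_errors (const char *prog)
{
  char cmd[512];
  if (build_valgrind_command (cmd, sizeof cmd, prog) != 0)
    return -1;
  int status = system (cmd);
  if (status == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status) == LEAK_EXIT_CODE;
}
```

A test runner could then call valgrind_found_errors ("./testprog") and fail the test when it returns 1. Picking an exit code that the tested program never uses itself is what makes the result unambiguous.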

Last updated: April 22, 2021