Siddhesh Poyarekar

Recent Posts

Malloc systemtap probes: an example

gnu logoOne feedback I got from my blog post on Understanding malloc behavior using Systemtap userspace probes was that I should have included an example script to explain how this works. Well, better late than never, so here’s an example script. This script prints some diagnostic information during a program run and also logs some information to print out a summary at the end. I’ll go through the script a few related probes at a time.

global sbrk, waits, arenalist, mmap_threshold = 131072, heaplist

Continue reading “Malloc systemtap probes: an example”

Share

Improving math performance in glibc

Update: Vincent Lefèvre helpfully pointed out that I had linked to the incorrect Worst Cases paper. That link is now fixed.

Update 2: Dan Courcy pointed out that my equation in the “Multiplying zeroes” section had an error, which I have now fixed.

Mathematical function implementations usually have to trade off between speed of computation and the accuracy of the result. This is especially true for transcendentals (i.e. the exponential and trigonometric functions), where results often have to be computed to a fairly large precision to get last bit accuracy in a result that is to be stored in an IEEE-754 double variable.

gnu logoTranscendental functions in glibc are implemented as multiple phases. The first phases of computation use a lookup table of pre-computed values and a polynomial approximation, a combination of which gives an accurate result for a majority of inputs. If it is found that the lookup table may not give an accurate result the next, slower phase is employed. This phase uses a multiple precision representation to compute results to precisions of up to 768 bits before rounding the result to double. As expected, this kills performance; the slowest path for pow for example is a few thousand times slower than the table lookup phase.

I looked into this problem not very long ago and tried to improve performance of the multiple precision phase so that it is not as bad as it currently is. The result of that was an improvement of about 8 times in the performance of the slowest path of the pow function. Other transcendentals got similar improvements since the fixes were mostly in the generic multiple precision code. These improvements went into glibc-2.18 upstream. We’ll take a look at what these changes were but first, a quick look at the multiple precision number for context on the changes.

Continue reading “Improving math performance in glibc”

Share

Understanding malloc behavior using Systemtap userspace probes

The malloc family of functions are critical for almost every serious application program. Its performance characteristics often have a big impact on the performance of applications. Given that the default malloc implementation needs to have consistent performance for all general cases, it makes available a number of tunables that can help developers tweak its behavior to suit their programs.

About two years ago I had written an article on the Red Hat Customer Portal that described the high level design of the GNU C Library memory allocator and also introduced the reader to various magic environment variables that malloc understands to change its behavior. The behavior documented in that article and the tricks to tweak malloc behavior hold just as true for RHEL-7, which is based on upstream glibc 2.17 as they did for RHEL-6, which is based on upstream glibc 2.12.

Continue reading “Understanding malloc behavior using Systemtap userspace probes”

Share