Deploying debuginfod servers for your developers
In an earlier article, Aaron Merey introduced the new elfutils
debuginfo-server daemon. With this software now integrated and released into elfutils 0.178 and coming to distros near you, it’s time to consider why and how to set up such a service for yourself and your team.
debuginfod exists to distribute ELF or DWARF debugging information, plus associated source code, for a collection of binaries. If you need to run a debugger like
gdb, a trace or probe tool like
systemtap, binary analysis tools like
pahole, or binary rewriting libraries like
dyninst, you will eventually need
debuginfo that matches your binaries. The
debuginfod client support in these tools enables a fast, transparent way of fetching this data on the fly, without ever having to stop, change to root, run all of the right
yum debuginfo-install commands, and try again. Debuginfod lets you debug anywhere, anytime.
We hope this opening addresses the “why.” Now, onto the “how.”
Basic server operation
For clients to be able to download content, you need one or more
debuginfod servers, each with access to all of the potentially needed
debuginfo. Ideally, you should run
debuginfod servers as close as possible to the machines holding a copy of those build artifacts.
If you build your own software, then its build and source trees are in one location. To run a copy of
debuginfod on your build machines:
$ debuginfod -F /path/to/build/tree1 /path/to/build/tree2
debuginfod will periodically rescan all of these trees and make available all of the executables and debugging data there, plus the source files referenced from there. If you rebuild your code, the index will catch up soon (see the
If you build your own software all the way into RPMs, then run a copy of
debuginfod with the parent directories containing the RPM files:
$ debuginfod -R /path/to/rpm/tree1 /path/to/rpm/tree2
debuginfod will periodically rescan all of these trees and make available all of the executables and the debugging files inside the RPMs. This tool matches
-debugsource files automatically.
Naturally, you can do both with one
debuginfod process: Just add those arguments together.
If you need to debug software that’s a part of your Linux distribution, you have a bit of a quandary. Until distributions set up public
debuginfod servers, we have to fend for ourselves. Luckily, doing this is not too difficult. After all, you just need a machine where the distro’s relevant packages have been installed—or even just downloaded:
$ mkdir distro-rpms ; cd distro-rpms
$ debuginfod -R .
and repeat as needed:
$ yumdownloader PACKAGE-N-V-R
$ yumdownloader --debuginfo PACKAGE-N-V-R
with all of the wildcards and retention that your disk will permit.
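The download step can be wrapped in a small loop. This is only a sketch: the function name and the `RUN` switch are invented for illustration, and the package names you pass in are your own. By default it prints the commands instead of running them, so you can review the list first.

```shell
#!/bin/sh
# Sketch: fetch binary and debuginfo RPMs for each package named on the
# command line, into the current directory.
# Hypothetical convention: set RUN=1 to execute; otherwise commands are
# only printed.
fetch_with_debuginfo() {
    for pkg in "$@"; do
        for cmd in "yumdownloader $pkg" "yumdownloader --debuginfo $pkg"; do
            if [ "${RUN:-0}" = 1 ]; then
                # Keep going if one package cannot be downloaded.
                $cmd || echo "warning: failed: $cmd" >&2
            else
                echo "$cmd"
            fi
        done
    done
}
```

Run it inside the directory that debuginfod is scanning, for example `RUN=1 fetch_with_debuginfo PACKAGE-N-V-R`.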
If you are running a Red Hat Satellite server in-house, or an informally managed mirror of distro packages, you can run
debuginfod against those systems’ package archives in situ. There’s no need to install (
rpm -i), filter, or reorganize them in any artificial way. Just let a copy of
debuginfod scan the directories.
OK, you now have one or more servers running, and they can be scanning the same or different trees of
debuginfo material. How do we get the clients to talk to them? The simple and obvious solution is to enumerate all of the servers you know of:
$ export DEBUGINFOD_URLS="http://host1:8002/ http://host2:8002/ ...."
$ gdb ... etc.
For every lookup, the client will send a query to all of the servers at once, and the first one that reports back with the requested information will “win.”
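If the server list lives in a script, the URL string can be assembled from host names. The sketch below assumes every server listens on port 8002, as in the example above; the helper name and the host names are placeholders.

```shell
#!/bin/sh
# Sketch: build a space-separated DEBUGINFOD_URLS value from host names.
# Assumes each server listens on port 8002.
debuginfod_urls() {
    urls=""
    for host in "$@"; do
        # Add a separating space only after the first entry.
        urls="${urls}${urls:+ }http://${host}:8002/"
    done
    echo "$urls"
}

DEBUGINFOD_URLS=$(debuginfod_urls host1 host2)
export DEBUGINFOD_URLS
echo "$DEBUGINFOD_URLS"
# prints: http://host1:8002/ http://host2:8002/
```

With the variable exported, the `debuginfod-find` tool that ships with elfutils can be used to query the servers by hand for a given build ID, which is a quick way to confirm that the list works before starting a debugger.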
While this tactic works, there are a couple of downsides. First, one has to propagate this list of URLs to every client. Second, there is no opportunity to centrally cache content, so each client has to download content separately from the origin server (in HTTP terminology). There is a simple fix: federation.
A debuginfod server can also act as a client. If the server can’t answer a query from its local index and has been configured with a list of upstream
$DEBUGINFOD_URLS, it forwards the request to the upstream servers, caches any positive response, and relays it back. The next request for the same object is then served from the cache, subject to cache retention constraints.
This behavior lets you configure a federated hierarchy of
debuginfod servers. Doing so concentrates configuration in a few places and keeps caching local. Each of your per-build-system
debuginfod servers can then be configured with a list of its higher-level peers. You can even have
debuginfod servers that don’t scan any local directories at all, but function purely as upstream relays. Make sure the federation is a tree or directed acyclic graph. Cycles would be bad, since servers could forward a request around in a loop.
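One way to sanity-check a planned federation before deploying it is to write the forwarding relationships down and let the standard `tsort` utility look for loops. This is a sketch, not part of debuginfod: the "server upstream" input format and the server names are invented here.

```shell
#!/bin/sh
# Sketch: check a planned federation for cycles with tsort(1).
# Input on stdin: one "server upstream" pair per line (hypothetical
# format), meaning "server forwards misses to upstream".
federation_is_acyclic() {
    # tsort reports a loop on stderr; treat any diagnostic as failure.
    errors=$(tsort 2>&1 >/dev/null)
    [ -z "$errors" ]
}

if printf '%s\n' "build1 relay" "build2 relay" "relay distro" |
        federation_is_acyclic; then
    echo "federation is a DAG"
else
    echo "federation contains a cycle" >&2
fi
```

Checking stderr rather than the exit status is a deliberate choice, because not every `tsort` implementation returns a failure status when it finds a loop.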
You now have one or more servers running, and clients depending on them. What about keeping them running well? There are a couple of practical issues to worry about.
One is resource usage during and after indexing. Initial
debuginfod indexing is intense on the CPU and storage. It must momentarily stream-decompress RPMs and parse every ELF or DWARF file. The index database is a tightly formatted SQLite file, but it can grow to around 1% of the size of normal compressed RPMs. If that is not a problem for you, skip the next paragraph.
If indexing time and space for a very large set of archives is excessive, it can be helpful to run
debuginfod with file filters. Its
-I and -X options let you specify regular expressions for file names that it should include or exclude. Say that, for example, your archive has multiple and different intermingled architectures or different major distro versions of files, and you only want to track a subset. You can use these options to force
debuginfod to skip files whose names don’t match the patterns:
$ debuginfod -I '\.el\.x86_64' -X 'python' -R /path
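Before committing to a long indexing run, it can help to preview which files a pair of patterns would keep. The sketch below uses `grep -E` as a stand-in, on the assumption that the patterns behave like unanchored extended regular expressions applied to the file path. Check the man page for your version, and treat the output as an approximation. The function name and sample paths are made up.

```shell
#!/bin/sh
# Sketch: preview which paths an include/exclude pair would keep.
# Reads candidate paths on stdin. grep -E stands in for debuginfod's
# own matching, so the result is only an approximation.
preview_filter() {
    include=$1
    exclude=$2
    grep -E -e "$include" | grep -v -E -e "$exclude"
}

printf '%s\n' \
    /path/foo-1.el.x86_64.rpm \
    /path/python-foo-1.el.x86_64.rpm \
    /path/foo-1.el.aarch64.rpm |
    preview_filter '\.el\.x86_64' 'python'
# prints: /path/foo-1.el.x86_64.rpm
```

On a real archive, feed it from `find /path -name '*.rpm'` instead of the sample list.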
If your server has a lot of cores, consider splitting up the scan path into numerous subpaths, because
debuginfod starts one or two threads per path given on the command line. Actual concurrency is carefully managed, so you can be carefree when giving large path lists. So, use:
$ debuginfod -R /path/*
rather than:
$ debuginfod -R /path
If your ELF, DWARF, or RPM archive is very large, you might consider sharding the scanning task between multiple copies of
debuginfod, each running near the storage server. You can use wildcards plus include and exclude paths to give each
debuginfod process only a subset of the data. We discussed above how
debuginfod servers can federate. Use that facility to create a single front-end
debuginfod that scans nothing, but delegates queries to the entire stableful of shards.
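A sharded layout might be planned along the following lines. This sketch only prints the commands and runs nothing. The host names and `/archive` paths are placeholders, the `host:path` argument format is invented for the example, and it assumes every shard listens on port 8002, as in the client example earlier.

```shell
#!/bin/sh
# Sketch: print the commands for a sharded deployment. Nothing is executed.
# Each argument is "host:path" (hypothetical format): the shard host and
# the part of the archive that shard should scan.
shard_commands() {
    urls=""
    for shard in "$@"; do
        host=${shard%%:*}    # text before the first ':'
        path=${shard#*:}     # text after the first ':'
        echo "on $host: debuginfod -R $path"
        urls="${urls}${urls:+ }http://${host}:8002/"
    done
    # The front end is given no paths, so it only relays to the shards.
    echo "on frontend: DEBUGINFOD_URLS=\"$urls\" debuginfod"
}

shard_commands "shard1:/archive/a" "shard2:/archive/b"
```

Clients then need only the front end's URL in their own `DEBUGINFOD_URLS`.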
Running network servers in a shell by hand is a fine old-school method for playing around. For serious deployments, though, you will want your
debuginfod server to be managed by a supervisory system. Because
debuginfod runs so nicely in a plain shell, it runs just as nicely as a
systemd service or inside a container. A sample
systemd configuration comes with elfutils, and we plan to publish Dockerfiles or container images with which you can run debuginfod inside Red Hat OpenShift or another orchestration service.
Once the server is running, it’s good to monitor it to keep it running. Textual logs go to standard output and error streams, where tools like
systemd journal or OpenShift can collect the text. Add more
-v verbosity options to generate more detailed traces. In addition to this textual data,
debuginfod serves a
/metrics web API URL, which is a Prometheus export-formatted quantitative data source. This URL provides internal statistics about what the server’s threads are up to. It would not be hard to wire up alerting systems or other programs to detect various types of anomalies.
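As a starting point for such wiring, a few lines of shell can pull a single value out of the Prometheus text format. This is a sketch: the helper name is invented, and `example_metric` is a made-up name used only for the demonstration. Substitute whatever your server actually exports.

```shell
#!/bin/sh
# Sketch: print the value of one metric from Prometheus text-format
# input on stdin. Only matches metrics written without labels.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Demonstration on made-up input; "example_metric" is not a real
# debuginfod metric name.
printf '%s\n' '# TYPE example_metric gauge' 'example_metric 3' 'other_metric 7' |
    metric_value example_metric
# prints: 3
```

Against a live server, the input would come from something like `curl -s http://host1:8002/metrics`, and a cron job or monitoring agent could compare the value with a threshold.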
Security becomes a concern as soon as a
debuginfod service is provided across trust boundaries, such as on the internet and to the public. The man page cautions at length about the measures required to keep such a service safe for both its users and its operator. It’s not rocket science, but ordinary HTTP front-end protections such as TLS encryption and load control are a must, for example by placing an HAProxy installation in front of the server. It is also important to limit
debuginfod indexing to trustworthy (non-hostile) binaries.
What does the future hold? We’d like to support Debian format packages soon, so our friends in that ecosystem can also take full advantage. We would be delighted to assist Linux distributions in operating public
debuginfod services for their users and are already prototyping this service in Fedora koji. We also envision more manageability features, and perhaps integration with source version control systems. We also welcome suggestions from our early adopters—you!
We hope this article was helpful in motivating you and helping you set up your own
debuginfod services. Please contact us on the firstname.lastname@example.org mailing list.