In an earlier article, Aaron Merey introduced the new elfutils debuginfo-server
daemon. With this software now released in elfutils 0.178 and coming to a distro near you, it's time to consider why and how to set up such a service for yourself and your team.
Recall that debuginfod
exists to distribute ELF or DWARF debugging information, plus associated source code, for a collection of binaries. If you need to run a debugger like gdb
, a trace or probe tool like perf
or systemtap
, binary analysis tools like binutils
or pahole
, or binary rewriting libraries like dyninst
, you will eventually need debuginfo
that matches your binaries. The debuginfod
client support in these tools enables a fast, transparent way of fetching this data on the fly, without ever having to stop, change to root, run all of the right yum debuginfo-install
commands, and try again. Debuginfod lets you debug anywhere, anytime.
We hope this opening addresses the "why." Now, onto the "how."
Basic server operation
For clients to be able to download content, you need one or more debuginfod
servers, each with access to all of the potentially needed debuginfo
. Ideally, you should run debuginfod
servers as close as possible to the machines holding a copy of those build artifacts.
If you build your own software, then its build and source trees are in one location. To run a copy of debuginfod
on your build machines:
$ debuginfod -F /path/to/build/tree1 /path/to/build/tree2
Then, debuginfod
will periodically rescan all of these trees and make available all of the executables and debugging data there, plus the source files referenced from there. If you rebuild your code, the index will catch up soon (see the -t
parameter).
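To confirm that the index has picked up a particular binary, you can query the server with the debuginfod-find client that ships alongside it. Here is a minimal check, assuming the server runs locally on its default port; the binary path is a placeholder:
$ export DEBUGINFOD_URLS="http://localhost:8002/"
$ debuginfod-find debuginfo /path/to/build/tree1/some-program
On success, debuginfod-find prints the path of the freshly downloaded file in the local client cache.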
If you build your own software all the way into RPMs, then run a copy of debuginfod
with the parent directories containing the RPM files:
$ debuginfod -R /path/to/rpm/tree1 /path/to/rpm/tree2
Then, debuginfod
will periodically rescan all of these trees and make available all of the executables and the debugging files inside the RPMs. This tool matches -debuginfo
and -debugsource
files automatically.
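No special directory layout is required for this matching. A scanned tree might simply hold the related subpackages side by side (the package names here are hypothetical):
$ ls /path/to/rpm/tree1
hello-2.10-1.x86_64.rpm
hello-debuginfo-2.10-1.x86_64.rpm
hello-debugsource-2.10-1.x86_64.rpm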
Naturally, you can do both with one debuginfod
process: Just add those arguments together.
If you need to debug software that's a part of your Linux distribution, you have a bit of a quandary. Until distributions set up public debuginfod
servers, we have to fend for ourselves. Luckily, doing this is not too difficult. After all, you just need a machine where the distro's relevant packages have been installed—or even just downloaded:
$ mkdir distro-rpms ; cd distro-rpms
$ debuginfod -R .
and repeat as needed:
$ yumdownloader PACKAGE-N-V-R
$ yumdownloader --debuginfo PACKAGE-N-V-R
with all of the wildcards and retention that your disk will permit.
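If you keep a list of interesting packages, a small shell loop can fetch both halves of each one; the package names are placeholders:
$ for p in PACKAGE1-N-V-R PACKAGE2-N-V-R; do yumdownloader $p; yumdownloader --debuginfo $p; done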
If you are running a Red Hat Satellite server in-house, or an informally managed mirror of distro packages, you can run debuginfod
against those systems' package archives in situ. There's no need to install (rpm -i
), filter, or reorganize them in any artificial way. Just let a copy of debuginfod
scan the directories.
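For example, if your mirror keeps its packages under a web root (the paths here are hypothetical; substitute wherever your mirror actually stores its RPMs):
$ debuginfod -R /var/www/html/mirror/fedora /var/www/html/mirror/updates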
Client configuration
OK, you now have one or more servers running, and they can be scanning the same or different trees of debuginfo
material. How do we get the clients to talk to them? The simple and obvious solution is to enumerate all of the servers you know of:
$ export DEBUGINFOD_URLS="http://host1:8002/ http://host2:8002/ ...."
$ gdb ... etc.
For every lookup, the client will send a query to all of the servers at once, and the first one that reports back with the requested information will "win."
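To avoid retyping the list in every shell, one option is a system-wide profile snippet; the URL list here is a placeholder:
$ echo 'export DEBUGINFOD_URLS="http://host1:8002/ http://host2:8002/"' | sudo tee /etc/profile.d/debuginfod.sh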
While this tactic works, there are a couple of downsides. First, one has to propagate this list of URLs to every client. Second, there is no opportunity to centrally cache content, so each client has to download content separately from the origin server (in HTTP terminology). There is a simple fix: federation.
Each debuginfod
server can also act as a client. If the server can't answer a query from its local index and has been configured with a list of upstream $DEBUGINFOD_URLS, then it will forward the request to the upstream servers. It caches any positive response and relays it back. The next request for the same object will be served from the cache (subject to cache retention constraints) instead.
This behavior lets you configure a federated hierarchy of debuginfod
servers. Doing so allows the concentration of configuration files and localizes caching. Each of your per-build-system debuginfods
can then be configured with a list of its higher-level peers. You can even have debuginfod
servers that don't scan any local directories at all, but function purely as upstream relays. Make sure the federation is a tree or a directed acyclic graph. Cycles would be bad.
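As a sketch, a pure relay needs no scan paths at all; it takes its upstream list from the environment. The host names, database path, and port here are placeholders:
$ export DEBUGINFOD_URLS="http://buildhost1:8002/ http://buildhost2:8002/"
$ debuginfod -d /var/tmp/relay.sqlite -p 8002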
Server management
You now have one or more servers running, and clients depending on them. What about keeping them running well? There are a couple of practical issues to worry about.
One is resource usage during and after indexing. Initial debuginfod
indexing is intensive on both CPU and storage. It must momentarily stream-decompress RPMs and parse every ELF or DWARF file. The index database is a tightly formatted SQLite file, but it can still grow to around 1% of the size of the compressed RPMs it covers. If this aspect is not a problem for you, feel free to skip the next paragraph.
If indexing time and space for a very large set of archives is excessive, it can be helpful to run debuginfod
with file filters. Its -I
and -X
options let you specify regular expressions for file names that it should include or exclude. Say, for example, that your archive intermingles files from multiple architectures or from different major distro versions, and you only want to track a subset. You can use these options to force debuginfod
to skip files whose names don't match the patterns:
$ debuginfod -I '\.el[78]\.x86_64' -X 'python' -R /path
If your server has a lot of cores, consider splitting up the scan path into numerous subpaths, because debuginfod
starts one or two threads per path given on the command line. Actual concurrency is carefully managed, so you can be carefree when giving large path lists. So, use:
$ debuginfod -R /path/*
instead of
$ debuginfod -R /path
If your ELF, DWARF, or RPM archive is very large, you might consider sharding the scanning task among multiple copies of debuginfod,
each running near the storage server. You can use wildcards plus include and exclude paths to give each debuginfod
process only a subset of the data. We discussed above how debuginfod
servers can federate. Use that facility to create a single front-end debuginfod
that scans nothing itself, but delegates queries to the whole stable of shards.
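Here is an illustrative sketch, with placeholder ports and patterns, in which two shards split an RPM archive by architecture and a scan-less front end federates over both; each command is its own long-lived process:
$ debuginfod -p 8003 -I '\.x86_64\.rpm$' -R /path/to/rpms
$ debuginfod -p 8004 -I '\.aarch64\.rpm$' -R /path/to/rpms
$ DEBUGINFOD_URLS="http://localhost:8003/ http://localhost:8004/" debuginfod -p 8002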
Running network servers in a shell by hand is a fine old-school method for playing around. For serious deployments, though, you will want your debuginfod
server to be managed by a supervisory system. Because debuginfod
runs so nicely in a plain shell, it runs just as nicely as a systemd
service or inside a container. A sample systemd
configuration comes with elfutils, and we plan to publish Dockerfiles or container images with which you can run debuginfod inside Red Hat OpenShift or another orchestration service.
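On a distribution that packages this service file, bringing up a managed server can be as simple as the following; the unit name and its scan-path configuration depend on your distro's packaging:
$ sudo systemctl enable --now debuginfod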
Once the server is running, it's good to monitor it. Textual logs go to the standard output and error streams, where tools like the systemd journal or OpenShift can collect them. Add more -v
verbosity options to generate more detailed traces. In addition to this textual data, debuginfod
serves a /metrics
web API URL, which is a Prometheus export-formatted quantitative data source. This URL provides internal statistics about what the server's threads are up to. It would not be hard to wire up alerting systems or other programs to detect various types of anomalies.
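You can inspect this endpoint by hand with any HTTP client; assuming the server listens on its default port 8002:
$ curl -s http://localhost:8002/metrics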
Security becomes a concern as soon as a debuginfod
service is provided across trust boundaries, such as on the internet and to the public. The man page offers plenty of cautions about the measures required to keep such a service safe for its users as well as for the service operator. It's not rocket science, but ordinary HTTP front-end protections, such as TLS encryption and load control (for example, via an HAProxy installation), are a must. It is also important to limit debuginfod
indexing to trustworthy (non-hostile) binaries.
Looking ahead
What does the future hold? We'd like to support Debian format packages soon, so our friends in that ecosystem can also take full advantage. We would be delighted to assist Linux distributions in operating public debuginfod
services for their users, and are already prototyping this service in Fedora Koji. We also envision more manageability features and perhaps integration with source version control systems. And we welcome suggestions from our early adopters—you!
We hope this article has helped motivate you to set up and run your own debuginfod
services. Please contact us on the elfutils-devel@sourceware.org mailing list.