Software development can be a complex field, but it’s also a fulfilling one that offers tremendous opportunities for growth and collaboration. This blog post will take you through my journey as a software engineering intern on the Performance Tools team at Red Hat, where I delved into C++, Linux, ELF, and open-source development. The focal point of this experience was the development of a new tool: the srcfiles tool within the elfutils project. This venture not only taught me valuable technical skills but also helped me grow as a software engineer and appreciate the power of open-source development.
The Journey Begins:
One of my first tasks on the team was to develop a new standalone tool for the elfutils collection. The tool’s main purpose was to take an input file in ELF (Executable and Linkable Format) and list all the source files that were used to compile it. ELF is a widely used file format for executables and other binary files, so this tool would have many potential applications for developers.
To accomplish this task, I had to learn how file information is stored and accessed in the DWARF/ELF file formats and how the elfutils library can handle and modify that information. I did this by reading documentation, reading code, and asking my team questions. I communicated regularly with my team to ensure that we were on the same page about the feature’s goals and specifications. These conversations often cleared up confusion and became crucial moments of learning for me.
Once I had a comfortable grasp on these concepts and a firm understanding of the requirements of this tool, I set out to implement it. It can often be difficult to know where to start when developing something new, but I found a methodical approach to be ideal. I would write down the steps I thought were necessary to transform the input into the desired output, often through notes and diagrams. I would then research how to complete these steps, which included looking within the elfutils library to see whether a similar feature or function had been developed previously. These pieces of code served as a source of inspiration and helped ensure I stayed faithful to the coding style of elfutils. Breaking down the problem like this made it more manageable and easier to understand.
How it works:
The tool can take an executable, a core dump, a process, or even the running kernel as input. It parses the input as a DWFL object (an elfutils abstraction for easy manipulation) and then converts it to a DWARF object using elfutils functions. The DWARF object contains data about the binary, including the source files used to compile it. Without going into the details of DWARF, the tool extracts the names and paths of all the source files through further processing. In scenarios where the input binary lacks this information, the tool can retrieve it from a public server with the help of debuginfod. The names and paths are then stored in an ordered set, which is sent to the standard output as text. To test the feature, I created a script that ran the srcfiles tool on its own binary.
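To illustrate the last step, here is a minimal C++ sketch of collecting paths into an ordered set and printing them. It is not the actual srcfiles code: the helper names collect_sources and print_sources are hypothetical, and the input is a plain list of strings rather than anything read from DWARF.

```cpp
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: gather paths (possibly repeated, in discovery order)
// into an ordered set, which drops duplicates and keeps the paths sorted.
std::set<std::string> collect_sources(const std::vector<std::string>& discovered) {
    return std::set<std::string>(discovered.begin(), discovered.end());
}

// Hypothetical helper: write one path per line to the given stream.
void print_sources(const std::set<std::string>& sources, std::ostream& out) {
    for (const std::string& path : sources)
        out << path << '\n';
}
```

Calling print_sources(collect_sources(paths), std::cout) prints each distinct path once, in sorted order, no matter how many compilation units referenced it.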
Enhancements and Challenges:
As the srcfiles tool evolved, discussions on the public IRC channel continued alongside its development. Various team members suggested features that were then integrated to enhance the tool’s utility. This led to the introduction of new options such as --null to delimit the output with null characters instead of newlines for easier integration with other tools, --cu-only to show only direct compilation units, and --verbose to provide more detailed output. These options made the output format more flexible, so the tool could cater to diverse use cases.
The natural next step in the development of this feature was to allow users to download the source files of the binary they are analyzing. This would be possible by using debuginfod to obtain the files and libarchive to archive them in a zip file. Additionally, a fallback would look for the source files on the local system if debuginfod is inaccessible.
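The remote-first, local-fallback idea can be sketched in C++ as follows. This is only an illustration under stated assumptions: Fetcher and fetch_source are hypothetical names, the two fetchers stand in for a debuginfod lookup and a local file read, and no real debuginfod or libarchive calls are used.

```cpp
#include <functional>
#include <string>

// Hypothetical fetcher type: fills `contents` for `path`, returns true on success.
using Fetcher = std::function<bool(const std::string& path, std::string& contents)>;

// Try the remote fetcher first (standing in for debuginfod), then fall back
// to the local fetcher (standing in for reading the file from disk).
bool fetch_source(const std::string& path, const Fetcher& remote,
                  const Fetcher& local, std::string& contents) {
    if (remote && remote(path, contents))
        return true;
    contents.clear();  // discard any partial data from the failed attempt
    return local && local(path, contents);
}
```

Passing the fetchers as parameters keeps the fallback order in one place and makes it easy to exercise both paths with stub fetchers, without a network connection.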
However, the development process was not without its challenges. After I implemented the zip feature, an unexpected issue surfaced that caused all files in the zip archive to be empty. The team and I investigated and found that the problem was an undocumented issue in debuginfod itself, not in srcfiles: it failed to return the requested source files in the expected manner. This discovery marked a pivotal point in the development process, and it took a collaborative effort to understand and then resolve the issue.
Hidden bugs, which are often elusive because they trigger no explicit errors, also gradually revealed themselves during development via the test scripts and buildbots. One was that undesirable entries with empty or generic name fields were being included in the output. Solving this was relatively simple and just involved skipping the undesired strings. A more covert bug appeared when I noticed that some source files were missing from the zip archive, specifically ones with the same name but different directory paths and contents, such as /usr/include/bits/stdlib.h and /usr/include/stdlib.h. I solved it by adding the full absolute file paths to the set, instead of just the names, to differentiate between similarly named source files.
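Both fixes can be illustrated with a short C++ sketch. It is not the code from srcfiles: the helper names are hypothetical, and treating "<built-in>" as the generic name to skip is an assumption made only for this example.

```cpp
#include <set>
#include <string>
#include <vector>

// Assumption for illustration: empty names and the placeholder "<built-in>"
// count as undesired; the real tool's filter may differ.
bool is_undesired(const std::string& name) {
    return name.empty() || name == "<built-in>";
}

// Return the part of a path after the last '/'.
std::string base_name(const std::string& path) {
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Collect set keys, either by bare file name or by full path.
std::set<std::string> collect_keys(const std::vector<std::string>& paths,
                                   bool use_full_path) {
    std::set<std::string> keys;
    for (const std::string& path : paths) {
        if (is_undesired(base_name(path)))
            continue;  // skip empty or generic entries
        keys.insert(use_full_path ? path : base_name(path));
    }
    return keys;
}
```

With /usr/include/stdlib.h and /usr/include/bits/stdlib.h as input, keying by bare name leaves a single entry, stdlib.h, so one file silently disappears. Keying by full path keeps both.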
Significance of the feature:
Users are likely to appreciate this tool because it aids in dependency analysis and streamlines the identification of dependencies within binaries. Additionally, it facilitates efficient packaging, distribution, and containerization by providing a comprehensive list or archive of source files. The tool may also be useful in debugging and profiling activities, offering a quick overview of the source code hierarchy. It simplifies the setup of virtual environments and fosters open-source collaboration by automating the retrieval and sharing of source files associated with ELF binaries. One potential use, suggested in another open-source community, is to enhance the license review process in open-source projects by aiding the extraction and documentation of license information.
Lessons from Development:
Navigating through the challenges encountered in the development of this feature unveiled important lessons for me in software engineering.
- Deliberate Pace and Strategic Planning: The importance of taking a deliberate and measured approach to development was a recurrent theme. Breaking down the problem into manageable components allowed for a more systematic and effective resolution of challenges.
- Thorough Manual Testing: Before jumping into automated test scripts, manual testing should be prioritized. This hands-on approach helped me uncover hidden bugs, including ones that test scripts might not catch later, and make sure the code was strong and reliable. Taking the time to test things manually first sets a solid foundation for a more dependable development process.
- Debugging Tools Proficiency: Adeptness with tools such as strace to record system calls, GDB to trace the objects in the code, and valgrind to find memory leaks became pivotal, aiding in resolving challenges, improving error messages, and honing a critical eye for code logic.
- Questioning and Learning: The development journey emphasized the importance of maintaining consistent communication with my team. Asking questions proved foundational, highlighting the value of seeking help and extracting insights from every challenge faced. This mindset fostered a dynamic and adaptive development environment and ensured that issues were resolved quickly while also turning them into valuable learning opportunities.
Adaptability and Problem-Solving as Development Hallmarks:
I learned that the ability to adapt and problem-solve in the face of unexpected challenges was a hallmark of the development process, and it is what ultimately allowed this project to be completed. Being flexible and resourceful when confronted with unforeseen obstacles mattered as much as any single technical skill.
Usage and Invocation:
Install the elfutils collection using your operating system’s package manager, for example for RHEL/Fedora:
sudo yum install elfutils
Then srcfiles can be invoked as eu-srcfiles.
Examples of Usage:
List all source files for a binary:
$ eu-srcfiles -e /bin/ls
/usr/include/asm-generic/int-ll64.h
/usr/include/assert.h
/usr/include/bits/byteswap.h
/usr/include/bits/dirent.h
/usr/include/bits/getopt_core.h
...
List the names of all compilation units (CUs) for a given process (including shared libraries):
eu-srcfiles -c -p $$
List the source files of a binary based on its build ID, using debuginfod:
binary=$(debuginfod-find executable 9c22d8d9e42bd051ffdc1064fdfd456ba781c629)
eu-srcfiles -c -e "$binary"
List the source files of a kernel image:
eu-srcfiles -e /boot/vmlinuz-$(uname -r)
Zip all the source files for a binary:
eu-srcfiles -z -e /bin/ls > ls.zip
Final Thoughts:
Developing the srcfiles tool was a rewarding and challenging project that involved working with elfutils, debugging tools, and open-source communities. This blog post has given an account of the steps and obstacles that I encountered during the development process, as well as the skills and knowledge that I acquired and enhanced along the way. I hope that it serves as a useful resource and a source of motivation for other developers and interns in their software engineering careers.