Profile-guided optimization (PGO) is a now-common compiler technique for improving the compilation process. In PGO (sometimes pronounced “pogo”), an administrator uses the first version of the binary to collect a profile, through instrumentation or sampling, then uses that information to guide the compilation process.
Profile-guided optimization can help developers make better decisions, for instance, concerning inlining or block ordering. In some cases, it can also lead to using obsolete profile information to guide compilation. For reasons that I will explain, this feature can benefit large projects. It also puts the burden on the compiler implementation to detect and handle inconsistencies.
This article focuses on how the Clang compiler implements PGO, and specifically, how it instruments binaries. We will look at what happens when Clang instruments source code during the compilation step to collect profile information during execution. Then, I’ll introduce a real-world bug that demonstrates the pitfalls of the current approach to PGO.
Continue reading “Profile-guided optimization in Clang: Dealing with modified sources”
The Python interpreter shipped with Red Hat Enterprise Linux (RHEL) 8 is version 3.6, which was released in 2016. While Red Hat is committed to supporting the Python 3.6 interpreter for the lifetime of Red Hat Enterprise Linux 8, it is becoming a bit old for some use cases.
Continue reading Red Hat Enterprise Linux 8.2 brings faster Python 3.8 run speeds
In the previous article, I discussed the benefits of C and C++ language restrictions in optimized code. In this second half, I present a variety of programming language exemptions and compiler extensions that developers can use to get around aliasing restrictions more or less safely. I will also discuss the common pitfalls of aliasing, both resulting from the extensions as well as from misuses of standard language constructs, and illustrate common problems these pitfalls might cause.
Continue reading “The joys and perils of aliasing in C and C++, Part 2”
The developer experience in Red Hat OpenShift Container Platform 4.4’s web console now includes a metrics-based dashboard. With this dashboard, you can access insights into your application metrics instead of relying on external tools. This Tech Preview feature is available in the Developer perspective’s Monitoring section, providing access to the dashboard, metrics, and events. Monitoring information is also available on the resource side panel, accessible within the Topology and Workloads views.
Continue reading OpenShift 4.4: Getting insights into your application
Many front-end developers are discovering the benefits of contract-first development. With this approach, front- and back-end developers use OpenAPI to collaboratively design an API specification. Once the initial specification is done, front-end developers can use API definitions and sample data to develop discrete user interface (UI) components. Defining a single OpenAPI spec improves cross-team collaboration, and API definitions empower front-end developers to design our initial workflows without relying on the back end.
Continue reading Contract-first development: Create a mock back end for realistic data interactions with React
When examining Linux firewall performance, there is a second aspect to packet processing—namely, the cost of firewall setup manipulations. In a world of containers, distinct network nodes spawn quickly enough for firewall ruleset adjustment delay to become a significant factor. At the same time, rulesets tend to become huge given the number of containers even a moderately specced server might host.
In the past, considerable effort was put into legacy
iptables to speed up the handling of large rulesets. With the recent push upstream and downstream to establish
iptables-nft as the standard variant, a reassessment of this quality is in order. To see how bad things really are, I created a bunch of benchmarks to run with both variants and compare the results.
Continue reading “Optimizing iptables-nft large ruleset performance in user space”
Storage prices are decreasing, while business demands are growing, and companies are storing more data than ever before. Following this growth pattern, demand grows for monitoring and data protection involving software-defined storage. Downtimes have a high cost that can directly impact business continuity and cause irreversible damage to organizations. Aftereffects include loss of assets and information; interruption of services and operations; law, regulation, or contract violations; along with the financial impacts from losing customers and damaging a company’s reputation.
Gartner estimates that a minute of downtime costs enterprise organizations $5,600, and an hour costs over $300,000.
On the other hand, in a DevOps context, it’s essential to think about continuous monitoring, which is a proactive approach to monitoring throughout the full application’s life cycle and that of its components. This approach helps identify the root cause of possible problems and then quickly and proactively prevent performance issues or future outages. In this article, you will learn how to implement Ceph storage monitoring using the enterprise open source tool Zabbix.
Continue reading “Ceph storage monitoring with Zabbix”
Red Hat OpenShift offers horizontal pod autoscaling (HPA) primarily for CPUs, but it can also perform memory-based HPA, which is useful for applications that are more memory-intensive than CPU-intensive. In this article, I demonstrate how to use OpenShift’s memory-based horizontal pod autoscaling feature (tech preview) to autoscale your pods if the demands on memory increase. The test performed in this article might not necessarily reflect a real application. The tests only aim to demonstrate memory-based HPA in the simplest way possible.
Continue reading “Testing memory-based horizontal pod autoscaling on OpenShift”
Developers think of their programs as a serial sequence of operations running as written in the original source code. However, program source code is just a specification for computations. The compiler analyzes the source code and determines if changes to the specified operations will yield the same visible results but be more efficient. It will eliminate operations that are ultimately not visible, and rearrange operations to extract more parallelism and hide latency. These differences between the original program’s source code and the optimized binary that actually runs might be visible when inspecting the execution of the optimized binary via tools like GDB and SystemTap.
To aid with the debugging and instrumentation of binaries the compiler generates debug information to map between the source code and executable binary. The debug information includes which line of source code each machine instruction is associated with, where the variables are located, and how to unwind the stack to get a backtrace of function calls. However, even with the compiler generating this information, a number of non-intuitive effects might be observed when instrumenting a compiler-optimized binary:
Continue reading “Possible issues with debugging and inspecting compiler-optimized binaries”
The first part of this miniseries about Shenandoah GC in JDK 14 covered self-fixing barriers. This article discusses concurrent roots processing and concurrent class unloading, both of which aim to reduce GC pause time by moving GC work from the pause to a concurrent phase.
Continue reading Shenandoah GC in JDK 14, Part 2: Concurrent roots and class unloading