Performance

Customize the compilation process with Clang: Making compromises

Customize the compilation process with Clang: Making compromises

In this two-part series, we’re looking at the Clang compiler and various ways of customizing the compilation process. These articles are an expanded version of the presentation, called Merci le Compilo, which was given at CPPP in June.

In part one, we looked at specific options for customization. And, in this article, we’ll look at some examples of compromises and tradeoffs involved in different approaches.

Continue reading “Customize the compilation process with Clang: Making compromises”

Share
Customize the compilation process with Clang: Optimization options

Customize the compilation process with Clang: Optimization options

When using C++, developers generally aim to keep a high level of abstraction without sacrificing performance. That’s the famous motto “costless abstractions.” Yet the C++ language actually doesn’t give a lot of guarantees to developers in terms of performance. You can have the guarantee of copy-elision or compile-time evaluation, but key optimizations like inlining, unrolling, constant propagation or, dare I say, tail call elimination are subject to the goodwill of the standard’s best friend: the compiler.

This article focuses on the Clang compiler and the various flags it offers to customize the compilation process. I’ve tried to keep this from being a boring list, and it certainly is not an exhaustive one.

Continue reading “Customize the compilation process with Clang: Optimization options”

Share
Probing golang runtime using SystemTap

Probing golang runtime using SystemTap

I recently saw an article from Uber Engineering describing an issue they were having with an increase in latency. The Uber engineers suspected that their code was running out of stack space causing the golang runtime to issue a stack growth, which would introduce additional latency due to memory allocation and copying. The engineers ended up modifying the golang runtime with additional instrumentation to report these stack growths to confirm their suspicions. This situation is a perfect example of where SystemTap could have been used.

SystemTap is a tool that can be used to perform live analysis of a running program. It is able to interrupt normal control flow and execute code specified by a SystemTap script, which can allow users to temporarily modify a running program without having to change the source and recompile.

Continue reading “Probing golang runtime using SystemTap”

Share
Project Loom: Lightweight Java threads

Project Loom: Lightweight Java threads

Building responsiveness applications is a never-ending task. With the rise of powerful and multicore CPUs, more raw power is available for applications to consume. In Java, threads are used to make the application work on multiple tasks concurrently. A developer starts a Java thread in the program, and tasks are assigned to this thread to get processed. Threads can do a variety of tasks, such as read from a file, write to a database, take input from a user, and so on.

In this article, we’ll explain more about threads and introduce Project Loom, which supports high-throughput and lightweight concurrency in Java to help simplify writing scalable software.

Continue reading “Project Loom: Lightweight Java threads”

Share
2 tips to make your C++ projects compile 3 times faster

2 tips to make your C++ projects compile 3 times faster

In this article, I will demonstrate how to speed up your compilation times by distributing compilation load using a distcc server container.  Specifically, I’ll show how to set up and use containers running a distcc server to distribute the compilation load over a heterogeneous cluster of nodes (development laptop, old desktop PC, and a Mac). To improve the speed of recompilation, I will use ccache.

Continue reading “2 tips to make your C++ projects compile 3 times faster”

Share
Using Quiver with AMQ on Red Hat OpenShift Container Platform

Using Quiver with AMQ on Red Hat OpenShift Container Platform

As part of the Red Hat UKI Professional Services team, I have worked with several customers who are implementing AMQ Broker on Red Hat OpenShift Container Platform (OCP). One question customers typically ask is, “How do we validate that the AMQ configuration is correct for our scenario?” Previously, I would have suggested one of the following:

These tools can give you indicators around:

  • Is the broker up and running? That is, can it receive/publish messages for this configuration?
  • Can the broker handle a certain performance characteristic? That is, what is my minimum publish rate per second for this configuration?
  • And much more.

The problem with these tools is that you cannot choose the client technology. This could lead to real-world differences and limited technology choices, which in turn might lead you down the wrong technology path. In other words:

  • Do you get the same performance from JMeter versus the AMQ clients you would use in production? Are you comparing like for like? Apples with apples?

So, what do I think is the answer? Quiver [1]. In this article, I’ll provide an overview and demo of using Quiver with Red Hat AMQ on Red Hat OpenShift.  If you’re looking for more information on Red Hat AMQ and how it can help, check out this webinar.

Continue reading “Using Quiver with AMQ on Red Hat OpenShift Container Platform”

Share
How to use the Linux perf tool to count software events

How to use the Linux perf tool to count software events

The Linux perf tool was originally written to allow access to the performance monitoring hardware that counts hardware events, such as instructions executed, processor cycles, and cache misses. However, it can also be used to count software events, which can be useful in gauging how frequently some part of the system software is executed.

Recently someone at Red Hat asked whether there was a way to get a count of system calls being executed on the system. The kernel has a predefined software trace point, raw_syscalls:sys_enter, which collects that exact information; it counts each time a system call is made. To use the trace point events, the perf command needs to be run as root.

Continue reading “How to use the Linux perf tool to count software events”

Share
Speed up SystemTap scripts with statistical aggregates

Speed up SystemTap scripts with statistical aggregates

A common question that SystemTap can be used to answer involves how often particular events occur on the system. Some events, such as system calls, can happen frequently and the goal is to make the SystemTap script as efficient as possible.

Using the statistical aggregate in the place of regular integers is one way to improve the performance of SystemTap scripts. The statistical aggregates record data on a per-CPU basis to reduce the amount of coordination required between processors, allowing information to be recorded with less overhead. In this article, I’ll show an example of how to reduce overhead in SystemTap scripts.

Continue reading “Speed up SystemTap scripts with statistical aggregates”

Share