Performance

MIR: A lightweight JIT compiler project

MIR: A lightweight JIT compiler project

For the past three years, I’ve been participating in adding just-in-time compilation (JIT) to CRuby. Now, CRuby has the method-based just-in-time compiler (MJIT), which improves performance for non-input/output-bound programs.

The most popular approach to implementing a JIT is to use LLVM or GCC JIT interfaces, like ORC or LibGCCJIT. GCC and LLVM developers spend huge effort to implement the optimizations reliably, effectively, and to work on a lot of targets. Using LLVM or GCC to implement JIT, we can just utilize these optimizations for free. Using the existing compilers was the only way to get JIT for CRuby in the short time before the Ruby 3.0 release, which has the goal of improving CRuby performance by three times.

So, CRuby MJIT utilizes GCC or LLVM, but what is unique about this JIT?

Continue reading “MIR: A lightweight JIT compiler project”

Share
Deploying debuginfod servers for your developers

Deploying debuginfod servers for your developers

In an earlier article, Aaron Merey introduced the new elfutils debuginfo-server daemon. With this software now integrated and released into elfutils 0.178 and coming to distros near you, it’s time to consider why and how to set up such a service for yourself and your team.

Recall that debuginfod exists to distribute ELF or DWARF debugging information, plus associated source code, for a collection of binaries. If you need to run a debugger like gdb, a trace or probe tool like perf or systemtap, binary analysis tools like binutils or pahole, or binary rewriting libraries like dyninst, you will eventually need debuginfo that matches your binaries. The debuginfod client support in these tools enables a fast, transparent way of fetching this data on the fly, without ever having to stop, change to root, run all of the right yum debuginfo-install commands, and try again. Debuginfo lets you debug anywhere, anytime.

We hope this opening addresses the “why.” Now, onto the “how.”

Continue reading “Deploying debuginfod servers for your developers”

Share
An upside-down approach to GCC optimizations

An upside-down approach to GCC optimizations

Many traditional optimizations in the compiler work from a top-down approach, which starts at the beginning of the program and works toward the bottom. This allows the optimization to see the definition of something before any uses of it, which simplifies most evaluations. It’s also the natural way we process things. In this article, we’ll look at a different approach and a new project called Ranger, which attempts to turn this problem upside down.

Continue reading “An upside-down approach to GCC optimizations”

Share
Customize the compilation process with Clang: Making compromises

Customize the compilation process with Clang: Making compromises

In this two-part series, we’re looking at the Clang compiler and various ways of customizing the compilation process. These articles are an expanded version of the presentation, called Merci le Compilo, which was given at CPPP in June.

In part one, we looked at specific options for customization. And, in this article, we’ll look at some examples of compromises and tradeoffs involved in different approaches.

Continue reading “Customize the compilation process with Clang: Making compromises”

Share
Customize the compilation process with Clang: Optimization options

Customize the compilation process with Clang: Optimization options

When using C++, developers generally aim to keep a high level of abstraction without sacrificing performance. That’s the famous motto “costless abstractions.” Yet the C++ language actually doesn’t give a lot of guarantees to developers in terms of performance. You can have the guarantee of copy-elision or compile-time evaluation, but key optimizations like inlining, unrolling, constant propagation or, dare I say, tail call elimination are subject to the goodwill of the standard’s best friend: the compiler.

This article focuses on the Clang compiler and the various flags it offers to customize the compilation process. I’ve tried to keep this from being a boring list, and it certainly is not an exhaustive one.

Continue reading “Customize the compilation process with Clang: Optimization options”

Share
Probing golang runtime using SystemTap

Probing golang runtime using SystemTap

I recently saw an article from Uber Engineering describing an issue they were having with an increase in latency. The Uber engineers suspected that their code was running out of stack space causing the golang runtime to issue a stack growth, which would introduce additional latency due to memory allocation and copying. The engineers ended up modifying the golang runtime with additional instrumentation to report these stack growths to confirm their suspicions. This situation is a perfect example of where SystemTap could have been used.

SystemTap is a tool that can be used to perform live analysis of a running program. It is able to interrupt normal control flow and execute code specified by a SystemTap script, which can allow users to temporarily modify a running program without having to change the source and recompile.

Continue reading “Probing golang runtime using SystemTap”

Share
Project Loom: Lightweight Java threads

Project Loom: Lightweight Java threads

Building responsiveness applications is a never-ending task. With the rise of powerful and multicore CPUs, more raw power is available for applications to consume. In Java, threads are used to make the application work on multiple tasks concurrently. A developer starts a Java thread in the program, and tasks are assigned to this thread to get processed. Threads can do a variety of tasks, such as read from a file, write to a database, take input from a user, and so on.

In this article, we’ll explain more about threads and introduce Project Loom, which supports high-throughput and lightweight concurrency in Java to help simplify writing scalable software.

Continue reading “Project Loom: Lightweight Java threads”

Share
2 tips to make your C++ projects compile 3 times faster

2 tips to make your C++ projects compile 3 times faster

In this article, I will demonstrate how to speed up your compilation times by distributing compilation load using a distcc server container.  Specifically, I’ll show how to set up and use containers running a distcc server to distribute the compilation load over a heterogeneous cluster of nodes (development laptop, old desktop PC, and a Mac). To improve the speed of recompilation, I will use ccache.

Continue reading “2 tips to make your C++ projects compile 3 times faster”

Share