Create a scalable REST API with Falcon and RHSCL

APIs are critical to automation, integration, and developing cloud-native applications, and it’s vital they can be scaled to meet the demands of your user base. In this article, we’ll create a database-backed REST API based on the Python Falcon framework using Red Hat Software Collections (RHSCL), test how it performs, and scale it out in response to a growing user base.
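To make the teaser concrete, here is a minimal sketch of a Falcon resource and route. The resource name, URL path, and payload are illustrative rather than taken from the article, and it targets the Falcon 1.x API (falcon.API, resp.body) that was current for RHSCL at the time; any WSGI server, such as gunicorn, can serve it.

    import json

    import falcon


    class JobResource:

        def on_get(self, req, resp):
            # Return a hard-coded job list; the real API would query its database.
            resp.body = json.dumps({'jobs': []})
            resp.status = falcon.HTTP_200

        def on_post(self, req, resp):
            # Accept a new job description posted as JSON and echo it back.
            job = json.load(req.bounded_stream)
            resp.body = json.dumps(job)
            resp.status = falcon.HTTP_201


    api = falcon.API()
    api.add_route('/jobs', JobResource())  # e.g. run with: gunicorn app:api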

Continue reading “Create a scalable REST API with Falcon and RHSCL”

What are BPF Maps and how are they used in stapbpf

Compared to SystemTap’s default backend, one of stapbpf’s most distinguishing features is the absence of a kernel-module runtime; its runtime is instead handled mostly by the BPF machinery inside the kernel. This makes it essential for BPF to provide a way for state to be maintained across multiple invocations of a BPF program, and for user-space programs to communicate with BPF programs. That is exactly what BPF maps accomplish. In this blog post, I will introduce BPF maps and explain their role in stapbpf’s implementation.
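As a rough illustration of what a BPF map provides, here is a small sketch that uses the bcc toolkit rather than stapbpf (bcc is my choice for the example, not something the post uses): the in-kernel program keeps per-process counts in a hash map, so its state survives across invocations, and the user-space side reads that very same map.

    from time import sleep

    from bcc import BPF

    prog = r"""
    BPF_HASH(counts, u32, u64);      // a BPF map, shared with user space

    int count_exec(struct pt_regs *ctx) {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        counts.increment(pid);       // state persists across invocations
        return 0;
    }
    """

    b = BPF(text=prog)
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="count_exec")

    sleep(5)

    # User space communicates with the BPF program by reading the same map.
    for pid, count in b["counts"].items():
        print("pid %d called execve() %d times" % (pid.value, count.value))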

Continue reading “What are BPF Maps and how are they used in stapbpf”

Introducing stapbpf – SystemTap’s new BPF backend

SystemTap 3.2 includes an early prototype of SystemTap’s new BPF backend (stapbpf). It represents a first step towards leveraging powerful new tracing and performance analysis capabilities recently added to the Linux kernel. In this post, I will compare the translation process of stapbpf with that of the default backend (stap) and highlight some differences in functionality between the two backends.

Continue reading “Introducing stapbpf – SystemTap’s new BPF backend”

Profiling NodeJS applications with Linux Performance Tools

Using Linux Perf Tools

The Performance Analysis Tool for Linux (perf) is a powerful tool for profiling applications. It works by using a mix of hardware counters (which are fast) and software counters, all provided by the Linux Performance Counter (LPC) subsystem, which takes charge of the complex task of wrapping the CPU counters for the different types of CPUs. This gives you very efficient access to information about running processes, either through the C API or, in this case, through the convenient perf command.

This command gives you access to a great variety of system- and process-level events, but in this entry I will use it to investigate CPU-bound issues.
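As one concrete way to do that (a sketch, not the exact commands from the post): sample a running Node.js process with perf and print the report. The helper name, the PID, and the flag choices are illustrative, and for JavaScript frames to be symbolized the Node.js process should be started with the --perf-basic-prof option.

    import subprocess


    def profile_pid(pid, seconds=30):
        """Sample on-CPU call stacks of the given PID and return perf's text report."""
        # Record call-graph samples at 99 Hz for the given process.
        subprocess.run(
            ["perf", "record", "-F", "99", "-g", "-p", str(pid),
             "--", "sleep", str(seconds)],
            check=True,
        )
        # Summarize which functions consumed the most CPU time.
        report = subprocess.run(
            ["perf", "report", "--stdio"],
            check=True, capture_output=True, text=True,
        )
        return report.stdout


    print(profile_pid(12345))  # 12345 stands in for the Node.js process ID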

Continue reading “Profiling NodeJS applications with Linux Performance Tools”

The need for speed and the kernel datapath – recent improvements in UDP packets processing

Networking hardware is becoming crazily fast: 10 Gbps NICs are entry-level for server hardware, 100 Gbps cards are increasingly popular, and 200 Gbps cards are already surfacing. While the Linux kernel is striving to cope with such speeds using large packets and all kinds of aggregation, ISPs are requesting much more demanding workloads, with NFV and line-rate packet processing even for 64-byte packets.

Is everything lost, and are we all doomed to rely on some kernel-bypass solution? Possibly, but let’s first inspect the real current status of packet processing in the kernel data path, with a look at the relevant history and the recent improvements.

We will focus on UDP packet reception: a UDP flood is a common tool to stress the networking stack, since it allows arbitrarily small packets and defeats the packet aggregation (GRO) that is in place for other protocols.
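For illustration, the sending side of such a flood can be as simple as the following sketch (the target address, port, and payload size are made up): every send() is a separate tiny datagram, so the receiver pays the full per-packet cost with no aggregation to help it.

    import socket

    TARGET = ("192.0.2.10", 9999)   # hypothetical receiver under test
    PAYLOAD = b"\x00" * 18          # small payload, roughly a minimum-size frame on the wire

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(TARGET)            # a connected socket caches the route across send() calls

    while True:
        # One small datagram per send(): GRO cannot aggregate these,
        # so each packet takes the full per-packet receive path.
        sock.send(PAYLOAD)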

Continue reading “The need for speed and the kernel datapath – recent improvements in UDP packets processing”

Towards Faster Ruby Hash Tables

Hash tables are an important part of dynamic programming languages. They are widely used because of their flexibility, and their performance is important for the overall performance of numerous programs. Ruby is no exception. In brief, Ruby hash tables provide the following API (a short sketch of these operations in Ruby code follows the list):

  • insert an element with a given key if it is not yet in the table, or update the element’s value if it is
  • delete an element with a given key from the table
  • get the value of an element with a given key if it is in the table
  • the shift operation (remove the earliest element inserted into the table)
  • traverse elements in their insertion order, calling a given function and, depending on its return value, stopping the traversal or deleting the current element and continuing
  • get the first N or all keys or values of elements in the table as an array
  • copy the table
  • clear the table
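A short sketch of those operations in Ruby code (the method choices are mine, not the post’s):

    h = {}
    h[:a] = 1          # insert a new key, or update the value if :a is already present
    h[:b] = 2
    h.delete(:a)       # delete the element with the given key
    h[:b]              # => 2, get the value of an element that is in the table
    h[:c] = 3
    h.shift            # => [:b, 2], remove the earliest element inserted into the table
    h.each { |k, v| }  # traverse elements in their insertion order
    h.keys.first(1)    # => [:c], the first N keys as an array
    h.values           # all values of elements in the table as an array
    copy = h.dup       # copy the table
    h.clear            # clear the table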

Continue reading “Towards Faster Ruby Hash Tables”

Docker project: Can you have overlay2 speed and density with devicemapper? Yep.

It’s been a while since our last deep-dive into the Docker project graph driver performance.  Over two years, in fact!  In that time, Red Hat engineers have made major strides in improving container storage:

All of that, in the name of providing enterprise-class stability, security and supportability to our valued customers.

As discussed in our previous blog, there is a particular set of behaviors and attributes to take into account when choosing a graph driver. Included in those are page cache sharing, POSIX compliance, and SELinux support.

Reviewing the technical differences between a union filesystem and the devicemapper graph driver as they relate to performance, standards compliance, and density, a union filesystem such as overlay2 is fast because:

  • It traverses less kernel and devicemapper code on container creation (devicemapper-backed containers get a unique kernel device allocated at startup).
  • Containers sharing the same base image start up faster because of a warm page cache.
  • For the speed/density benefits, you trade POSIX compliance and SELinux support (well, not for long!).

There was no single graph driver that could give you all these attributes at the same time — until now.

How we can make devicemapper as fast as overlay2

With the industry move towards microservices, 12-factor guidelines, and dense multi-tenant platforms, many folks, both inside Red Hat and in the community, have been discussing read-only containers. In fact, there has been a --read-only option in both the Docker project and Kubernetes for a long time. What this option does is create the container’s mount point as usual, but mount it read-only as opposed to read-write. Read-only containers are an important security improvement as well, since they reduce the container’s attack surface. More details on this can be found in a blog post from Dan Walsh last year.

When a container is launched in this mode, it can no longer write to locations it may expect to (e.g. /var/log) and may throw errors because of this. As discussed in the Processes section of 12factor.net, re-architected applications should store stateful information (such as logs or web assets) in a stateful backing service. Attaching a persistent volume that is read-write fulfills this design aspect: the container can be restarted anywhere in the cluster, and its persistent volume can follow it.

In other words, for applications that are not completely stateless, an ideal deployment would be to couple read-only containers with read-write persistent volumes. This gets the container world to a place where the HPC (high-performance/scientific computing) world has been for decades: thousands of diskless, read-only, NFS-root-booted nodes that mount their necessary applications and storage over the network at boot time. If a node dies, boot another. If a container dies, start another.
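As a sketch of that pattern, here is roughly what a read-only container with a read-write persistent volume looks like using the Docker SDK for Python (the SDK, image name, volume name, and paths are all my own illustration; the equivalent CLI flags are --read-only, --tmpfs, and -v):

    import docker

    client = docker.from_env()

    container = client.containers.run(
        "registry.example.com/myapp:latest",      # hypothetical application image
        read_only=True,                           # root filesystem mounted read-only
        tmpfs={"/run": "", "/tmp": ""},           # scratch space the app may expect to write
        volumes={
            # Stateful data (logs, assets) lands on a read-write persistent volume
            # that can follow the container anywhere in the cluster.
            "myapp-data": {"bind": "/var/log/myapp", "mode": "rw"},
        },
        detach=True,
    )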

Continue reading “Docker project: Can you have overlay2 speed and density with devicemapper? Yep.”
