Introduction
Today I want to write about the options available to limit the resources used when running performance tests in a shared environment. A very powerful tool for this is cgroups [1] - a Linux kernel feature that allows limiting the resource usage (CPU, memory, disk I/O, etc.) of a collection of processes.
Nowadays it is easy, with virtual machines or container technologies like Docker (which uses cgroups under the hood, by the way), to compartmentalize applications and make sure that they are not eating resources that have not been allocated to them. But you may face, as I did, situations where these technologies are not available, and if your application happens to run on a Linux distribution like RHEL 6 or later you can rely directly on cgroups for this purpose.
I was visiting a customer who was in the process of migrating their servers to a virtualisation platform. At that point in time they had a virtualised production platform, while the integration platform where the tests were to be performed was based on physical hardware whose specs were not aligned with the production platform the application was running on. A server of the integration platform was indeed used to run two applications that were split into two different VMs in production. But I was neither able nor willing to create load for both applications: the performance tests were designed for only one of them. If I had run them just like that, the results would not have been representative of the production platform. The customer was indeed putting the virtualised environment into question, as they were seeing degraded performance compared to the physical environment used for the integration tests. I decided to use Linux kernel features to limit CPU access and make my tests an acceptable simulation of production. By doing this I helped the customer regain trust in their new virtualisation environment.
Another scenario where one may use cgroups is for evaluating the resource specifications required by an application. It is then easy, on the same server, to perform tests with one, two, four or sixteen CPUs (provided you have them available). You could tell me that the same can easily be achieved with the hypervisor of a virtualised environment, and you would be right. But what if you don't have access to it? The process of getting hold of the person who has the rights to apply your changes may take days rather than minutes.
So let's come to the point and look at how easy it is to configure cgroups for this purpose, after I have listed a few important caveats:
- You need sudo rights to change the cgroups configuration
- Be careful if you already have cgroups configured on your system. It is easy to destroy your existing configuration. Don't blame me for that. Know what you are doing.
If one of these points prevents you from using cgroups, taskset, described later in this blog entry, may be a good alternative. However, it only applies to CPU affinity, whereas cgroups can be used to limit other resource types as well.
Installation
Cgroups are already available out of the box on RHEL 7.
On RHEL 6 the libcgroup package is the easiest way to work with cgroups and its installation is straightforward: yum install libcgroup
However this package is deprecated on RHEL 7 since it can easily create conflicts with the default cgroup hierarchy.
Configuration
Cgroups are organized hierarchically, like processes, and child cgroups inherit some of the attributes of their parents. A subsystem (also called a resource controller) represents a single resource, such as CPU time or memory. Note that the configuration differs between RHEL 6 and 7. See the Red Hat Enterprise Linux 6 and 7 Resource Management Guides [2][2'] for more information.
RHEL 7
By default systemd automatically creates a hierarchy of slice, scope and service units, and it mounts hierarchies for the important resource controllers under the /sys/fs/cgroup/ directory. Slices organize a hierarchy in which scopes and services are placed. Processes are attached to services and scopes, not to slices. A service is a process or a group of processes started by systemd, whereas a scope is a process or a group of processes created externally (fork).
So let's start a new process. The command systemd-run allows starting a new process and attaching it to a specific transient unit. If the unit does not exist it gets created. In the same way a slice can be created with the option --slice. By default the command creates a new service; the option --scope allows the creation of a scope instead.
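For instance, something along these lines runs top in a new scope called fredunit.scope, placed in a new slice called fredslice.slice (the unit and slice names are the ones that appear in the outputs below):
# sudo systemd-run --slice=fredslice --unit=fredunit --scope /bin/top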
We now have a new slice and it is possible to display it:
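# systemctl status fredslice.slice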
fredslice.slice
Loaded: loaded
Active: active since Sat 2015-08-15 09:04:09 CEST; 9min ago
CGroup: /fredslice.slice
└─fredunit.scope
└─4277 /bin/top
In the same way it is possible to display the new scope:
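# systemctl status fredunit.scope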
fredunit.scope - /bin/top
Loaded: loaded (/run/systemd/system/fredunit.scope; static)
Drop-In: /run/systemd/system/fredunit.scope.d
└─90-Description.conf, 90-SendSIGHUP.conf, 90-Slice.conf
Active: active (running) since Sat 2015-08-15 09:04:09 CEST; 8min ago
CGroup: /fredslice.slice/fredunit.scope
└─4277 /bin/top
The transient unit files are stored in /run/systemd/system.
Let's look at our new process:
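Its cgroups can be read from /proc, 4277 being the PID of top shown above:
# cat /proc/4277/cgroup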
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/
3:cpuacct,cpu:/
2:cpuset:/
1:name=systemd:/fredslice.slice/fredunit.scope
Cgroups use subsystems (also called resource controllers) to represent a single resource, such as CPU time or memory. cpuset [3] provides a mechanism for assigning a set of CPUs and memory nodes to a set of tasks.
If the cpuset hierarchy is not already mounted, it can easily be done:
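Something like this should do, assuming the conventional mount point under /sys/fs/cgroup:
# sudo mkdir -p /sys/fs/cgroup/cpuset
# sudo mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset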
We can now navigate to the cpuset hierarchy and create a new cpuset called fred:
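# cd /sys/fs/cgroup/cpuset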
# sudo mkdir fred
It is possible to move to the newly created cpuset and to configure it, for instance to allocate the first CPU to it:
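# cd fred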
# echo 0 | sudo tee -a cpuset.cpus
We will need a memory node as well.
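On a non-NUMA machine there is a single memory node, node 0:
# echo 0 | sudo tee -a cpuset.mems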
And we can attach our process to the newly created cpuset by writing its PID (4277, the top process started above) into the tasks file:
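# echo 4277 | sudo tee -a tasks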
Let's look at the process configuration again. The cpuset fred is now attached to it:
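# cat /proc/4277/cgroup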
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/
3:cpuacct,cpu:/
2:cpuset:/fred
1:name=systemd:/fredslice.slice/fredunit.scope
For tests we may also want to limit the amount of memory available to the processes. For this we can use the unit's MemoryLimit setting:
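For instance, to cap it at 1 GB (the value that shows up in the drop-in file further down):
# sudo systemctl set-property --runtime fredunit.scope MemoryLimit=1G
# sudo systemctl daemon-reload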
And this now appears at the process level:
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/fredslice.slice/fredunit.scope
3:cpuacct,cpu:/
2:cpuset:/fred
1:name=systemd:/fredslice.slice/fredunit.scope
After reloading the systemd daemon there is now an additional file, 90-MemoryLimit.conf, among the unit's drop-ins:
# systemctl status fredunit.scope
fredunit.scope - /bin/top
Loaded: loaded
Drop-In: /run/systemd/system/fredunit.scope.d
└─90-Description.conf, 90-MemoryLimit.conf, 90-SendSIGHUP.conf, 90-Slice.conf
Active: active (running) since Sat 2015-08-15 12:07:08 CEST; 19min ago
CGroup: /fredslice.slice/fredunit.scope
└─14869 /bin/top
This file contains the memory limit:
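# cat /run/systemd/system/fredunit.scope.d/90-MemoryLimit.conf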
[Scope]
MemoryLimit=1073741824
Looks good!
We have achieved our aim of limiting a process to a single CPU and 1 GB of memory.
Using the exact same mechanisms it is possible to start a shell. The good thing is that every process started from this shell will then inherit the cgroup configuration.
To start the shell and to set its memory limit:
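Presumably with the same kind of systemd-run invocation as before, for instance:
# sudo systemd-run --slice=fredslice --unit=fredunit --scope /bin/sh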
# sudo systemctl set-property --runtime fredunit.scope MemoryLimit=1G
# sudo systemctl daemon-reload
We can control that the shell process has been bound to the cgroup and that the memory limitation has been applied:
# systemctl status fredunit.scope
fredunit.scope - /bin/sh
Loaded: loaded
Drop-In: /run/systemd/system/fredunit.scope.d
└─90-Description.conf, 90-MemoryLimit.conf, 90-SendSIGHUP.conf, 90-Slice.conf
Active: active (running) since Sat 2015-08-15 12:37:30 CEST; 3min 0s ago
CGroup: /fredslice.slice/fredunit.scope
└─16842 /bin/sh
Again we add the shell process to the tasks file of the fred cpuset we created earlier, and this gets reflected in the cgroups attached to the process:
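Assuming the mount point used above, and with 16842 being the PID of the shell:
# echo 16842 | sudo tee -a /sys/fs/cgroup/cpuset/fred/tasks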
# more /proc/16842/cgroup
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/fredslice.slice/fredunit.scope
3:cpuacct,cpu:/
2:cpuset:/fred
1:name=systemd:/fredslice.slice/fredunit.scope
From the newly created shell we can start an application:
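For instance top again, which is the process that appears in the output below:
# top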
And we can control in another shell that the new process has inherited the cgroup configuration:
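# systemctl status fredunit.scope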
fredunit.scope - /bin/sh
Loaded: loaded
Drop-In: /run/systemd/system/fredunit.scope.d
└─90-Description.conf, 90-MemoryLimit.conf, 90-SendSIGHUP.conf, 90-Slice.conf
Active: active (running) since Sat 2015-08-15 12:37:30 CEST; 5min ago
CGroup: /fredslice.slice/fredunit.scope
├─16842 /bin/sh
└─17163 top
# cat /proc/17163/cgroup
10:hugetlb:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/fredslice.slice/fredunit.scope
3:cpuacct,cpu:/
2:cpuset:/fred
1:name=systemd:/fredslice.slice/fredunit.scope
Awesome!
RHEL 6
I haven't tested the settings on RHEL 6, but with libcgroup, cgroups can be configured in /etc/cgconfig.conf.
The default /etc/cgconfig.conf file installed with the libcgroup package creates and mounts an individual hierarchy for each subsystem, and attaches the subsystems to these hierarchies. You can mount subsystems and define groups with different access to CPU and memory as follows:
mount {
    cpuset = /cgroup/cpuset;
    memory = /cgroup/memory;
}

# no limitation group
group nolimit {
    cpuset {
        # No alternate memory nodes if the system is not NUMA
        cpuset.mems="0";
        # Make my 2 CPU cores available to tasks
        cpuset.cpus="0,1";
    }
    memory {
        # Allocate my 4 GB of memory to tasks
        memory.limit_in_bytes="4G";
    }
}

# group with limitation
group limited {
    cpuset {
        # No alternate memory nodes if the system is not NUMA
        cpuset.mems="0";
        # Make only one of my CPU cores available to tasks
        cpuset.cpus="0";
    }
    memory {
        # Allocate at most 2 GB of memory to tasks
        memory.limit_in_bytes="2G";
    }
}
You must restart the cgconfig service for the changes in /etc/cgconfig.conf to take effect. Note that restarting this service causes the entire cgroup hierarchy to be rebuilt, which removes any previously existing cgroups.
To start a process in a control group: cgexec -g controllers:path_to_cgroup command arguments
It is also possible to add the --sticky option before the command to keep any child processes in the same cgroup.
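For instance, to run a (hypothetical) Java application, together with its child processes, inside the limited group defined above:
# cgexec -g cpuset,memory:limited --sticky java -jar myapp.jar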
Taskset as an alternative
If you don't have the rights to use cgroups, or you only want to limit the number of CPU cores used by your application, it is still possible to do so with taskset. You would then rely on the JVM settings for memory limitation if you want to test a Java application. This is obviously not the same, but it may be an acceptable approximation. As you can read in the "man" description, taskset [4] is used to set or retrieve the CPU affinity of a running process given its PID, or to launch a new COMMAND with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.
For instance, if you have a Java process running, taskset -p <pid> provides information on its current affinity mask. First find its PID:
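Something like ps will do:
# ps -ef | grep java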
frederic 971 1 99 17:56 pts/0 00:00:09 java...
# taskset -p 971
pid 971's current affinity mask: 3
With taskset -cp you can then get the list of processors that your process is bound to:
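# taskset -cp 971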
pid 971's current affinity list: 0,1
How to interpret the result of these two commands?
Let's look at what "man" says: "The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU."
My process is allocated two CPUs, corresponding to the bits 01 and 10, which together give the mask 11 in binary, i.e. 3 in hexadecimal.
Let's take another example that may explain it better. If you have 8 processors, each of them corresponds to one bit of the mask:
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
If your process has access to all 8 of them, the mask will be 11111111, which is 0x1 + 0x2 + 0x4 + 0x8 + 0x10 + 0x20 + 0x40 + 0x80 = 0xff in hexadecimal.
If your process only has access to the 1st and 2nd ones, as was the case for me, the mask is 11 in binary, or 3 in hexadecimal.
Now if the process only had access to the 1st and the 8th ones, the mask would be 10000001, which is 0x81 in hexadecimal.
Note that my process runs in a virtual machine to which I have allocated two CPUs. It is not that I should throw my laptop in the garbage. ;o)
Well, now that we know what we are doing, we can allocate the process to a single CPU, the first one for instance.
taskset -p <mask> <pid> is the command to use. To bind the process to the first processor:
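# taskset -p 1 971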
pid 971's current affinity mask: 3
pid 971's new affinity mask: 1
If you wanted to run a new command rather than operating on an existing process, which is actually taskset's default behaviour, it is no more complicated: taskset <mask> <command> [arguments]
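For instance, to start a (hypothetical) Java application on the first CPU only:
# taskset 0x1 java -jar myapp.jar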
Et voilà!
This blog post was originally published at: https://bricks4bridges.wordpress.com/2015/08/16/cgroups-for-perf-tests/
[1] https://en.wikipedia.org/wiki/Cgroups
[2] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu_and_memory-use_case.html
[2'] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/pdf/Resource_Management_Guide/Red_Hat_Enterprise_Linux-7-Resource_Management_Guide-en-US.pdf
[3] https://www.kernel.org/doc/Documentation/cgroups/cpusets.txt
[4] https://en.wikipedia.org/wiki/Processor_affinity
PS: I hurried up for writing this blog entry as it may get quickly deprecated with the rocket adoption of containers ;-)