It’s been a while since our last deep dive into graph driver performance in the Docker project. Over two years, in fact! In that time, Red Hat engineers have made major strides in improving container storage:
- Introduced the docker-storage-setup package to help make configuring devicemapper-based storage a snap.
- Introduced full support for overlay FS in RHEL 7.2+ when used with containers.
- Introduced overlay2 as a Tech Preview.
- Gotten SELinux support for both overlay and overlay2 merged into upstream kernel 4.9.
- Added a warning message for when folks are using loop-lvm.
All of that, in the name of providing enterprise-class stability, security and supportability to our valued customers.
As discussed in our previous blog, there is a particular set of behaviors and attributes to take into account when choosing a graph driver. These include page cache sharing, POSIX compliance and SELinux support.
Comparing a union filesystem with the devicemapper graph driver in terms of performance, standards compliance and density, a union filesystem such as overlay2 is fast because:
- It traverses less kernel and devicemapper code on container creation (devicemapper-backed containers get a unique kernel device allocated at startup).
- Containers sharing the same base image start up faster because of a warm page cache.
- The catch: for these speed/density benefits, you trade away POSIX compliance and SELinux support (well, not for long!).
There was no single graph driver that could give you all these attributes at the same time — until now.
How we can make devicemapper as fast as overlay2
With the industry move towards microservices, 12-factor guidelines and dense multi-tenant platforms, many folks both inside Red Hat and in the community have been discussing read-only containers. In fact, there’s been a --read-only option in both the Docker project and Kubernetes for a long time. What this does is create a mount point as usual for the container, but mount it read-only as opposed to read-write. Read-only containers are an important security improvement as well, because they reduce the container’s attack surface. More details on this can be found in a blog post from Dan Walsh last year.
When a container is launched in this mode, it can no longer write to locations it may expect to be writable (e.g. /var/log) and may throw errors because of this. As discussed in the Processes section of 12factor.net, re-architected applications should store stateful information (such as logs or web assets) in a stateful backing service. Attaching a persistent volume that is read-write fulfills this design aspect: the container can be restarted anywhere in the cluster, and its persistent volume can follow it.
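To make the failure mode concrete, here is a minimal Python sketch of how an application might probe for a writable location before it starts logging. It is not taken from any Red Hat tooling: the helper name `first_writable_dir` and the candidate paths are hypothetical, chosen only for illustration.

```python
import tempfile


def first_writable_dir(candidates):
    """Return the first directory in `candidates` that accepts a new file, or None."""
    for directory in candidates:
        try:
            # Creating (and automatically deleting) a temp file is a direct write probe.
            with tempfile.TemporaryFile(dir=directory):
                return directory
        except OSError:
            # Covers a read-only filesystem, a missing directory, or denied permission.
            continue
    return None


# Hypothetical paths: the application's usual log directory first,
# then a mount point where a read-write volume is assumed to be attached.
log_dir = first_writable_dir(["/var/log/myapp", "/mnt/volume/logs"])
print(log_dir)
```

In a read-only container the first candidate fails with an `OSError`, so the application falls through to the volume mount instead of crashing on its first write.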
In other words, for applications that are not completely stateless, an ideal deployment would be to couple read-only containers with read-write persistent volumes. This gets us to a place in the container world where the HPC (high performance/scientific computing) world has been for decades: thousands of diskless, read-only NFS-root booted nodes that mount their necessary applications and storage over the network at boot time. No matter if a node dies…boot another. No matter if a container dies…start another.
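As a rough illustration of that pairing, the sketch below assembles a `docker run` command line that combines the `--read-only` option with a single volume mount (volumes are read-write unless `:ro` is appended). The image name, volume name and mount path are placeholders, and the helper `read_only_run_command` is hypothetical. It only builds the arguments and does not start a container.

```python
import shlex


def read_only_run_command(image, volume_name, mount_path):
    """Build a `docker run` argument list: read-only root plus one read-write volume."""
    return [
        "docker", "run", "--rm",
        "--read-only",                         # mount the container's root filesystem read-only
        "-v", f"{volume_name}:{mount_path}",   # volume mounts are read-write by default
        image,
    ]


# Placeholder names for illustration only.
cmd = read_only_run_command("example/app:latest", "app-logs", "/var/log")
print(shlex.join(cmd))
```

Keeping the command as a list of arguments, rather than one string, avoids quoting mistakes if it is later passed to `subprocess.run`.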