February 2018 - A completely revised and updated version of this article has been published. See A Practical Introduction to Container Terminology. The update includes coverage of container technologies beyond docker, such as CRI-O, rkt, and lxc/lxd, as well as information on the Open Container Initiative (OCI).
January 13th, 2016
Background
When discussing an architecture for containerization, it’s important to have a solid grasp on the related vocabulary. One of the challenges people have is that many of the following terms are used interchangeably... often causing quite a bit of confusion for newcomers.
- Container
- Image
- Container Image
- Image Layer
- Index
- Registry
- Repository
- Tag
- Base Image
- Platform Image
- Layer
The goal of this article is to clarify these terms, so that we can speak the same language and develop solutions and architectures leveraging the value of containers. Note that I am going to assume that you know how to run basic docker commands, but if you need a primer, I recommend starting with: A Practical Introduction to Docker Containers.
Vocabulary
Repository
When using the Docker command, a repository is what is specified on the command line, not an image. In the following command, “rhel7” is the repository.
docker pull rhel7
This is actually expanded automatically to:
docker pull registry.access.redhat.com/rhel7:latest
This can be confusing, and many people refer to this as an image or a container image. In fact, the docker images sub-command is what is used to list the locally available repositories. Conceptually, these repositories can be thought of as container images, but it’s important to realize that these repositories are actually made up of layers.
docker images
REPOSITORY                             TAG     IMAGE ID      CREATED      VIRTUAL SIZE
registry.access.redhat.com/rhel7       latest  6883d5422f4e  4 weeks ago  201.7 MB
registry.access.redhat.com/rhel        latest  6883d5422f4e  4 weeks ago  201.7 MB
registry.access.redhat.com/rhel6       latest  05c3d56ba777  4 weeks ago  166.1 MB
registry.access.redhat.com/rhel6/rhel  latest  05c3d56ba777  4 weeks ago  166.1 MB
...
When you specify only the repository on the command line, the Docker daemon does some extra work for you. The Docker daemon (not the client tool) is configured with a list of servers to search. In our example above, the daemon will search for the “rhel7” repository on each of the configured servers.
In the above command, only the repository name was specified, but it’s also possible to specify a full URL with the Docker client. To highlight this, let’s start by dissecting a full URL.
You will often see the general form written as:
REGISTRY/NAMESPACE/REPOSITORY[:TAG]
The full URL is made up of a registry server name, a namespace, a repository, and optionally a tag. There are actually many permutations of how to specify a URL, and as you explore the Docker ecosystem, you will find that many pieces are optional. The following commands are all valid and all pull some permutation of the same repository:
docker pull registry.access.redhat.com/rhel7/rhel:latest
docker pull registry.access.redhat.com/rhel7/rhel
docker pull registry.access.redhat.com/rhel7
docker pull rhel7/rhel:latest
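To make the optional pieces concrete, here is a minimal Python sketch that splits a reference of the form REGISTRY/NAMESPACE/REPOSITORY[:TAG] into its parts. The names ImageRef and parse_ref are hypothetical and not part of Docker. The sketch assumes the first component is a registry server only when it looks like a host name, and it treats a lone name such as rhel7 as the repository, which is a simplification of what the daemon actually does.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]   # None: the daemon searches its configured registries
    namespace: Optional[str]  # None: no namespace was given
    repository: str
    tag: str


def parse_ref(ref: str) -> ImageRef:
    """Split REGISTRY/NAMESPACE/REPOSITORY[:TAG] into its parts (simplified)."""
    parts = ref.split("/")
    registry = None
    # Assumption: the first component is a registry server if it looks like
    # a host name (contains a dot or a port) or is "localhost".
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry = parts.pop(0)
    # The tag, if present, follows a colon in the last component.
    name, _, tag = parts[-1].partition(":")
    namespace = "/".join(parts[:-1]) or None
    return ImageRef(registry, namespace, name, tag or "latest")


for ref in (
    "registry.access.redhat.com/rhel7/rhel:latest",
    "registry.access.redhat.com/rhel7",
    "rhel7/rhel:latest",
):
    print(ref, "->", parse_ref(ref))
```

Missing pieces come back as None (registry, namespace) or as the default tag latest, which mirrors how the shorter commands above leave those pieces for Docker to fill in.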
Namespace
A namespace is a tool for separating groups of repositories. On the public DockerHub, the namespace is typically the username of the person sharing the image, but can also be a group name, or a logical name.
Red Hat uses the namespace to separate groups of repositories based on products listed on the Red Hat Federated Registry server. Here are some example results returned by registry.access.redhat.com. Notice that the last result is actually listed on another registry server. This is because Red Hat also works to list repositories on our partners’ registry servers:
registry.access.redhat.com/rhel7/rhel
registry.access.redhat.com/openshift3/mongodb-24-rhel7
registry.access.redhat.com/rhscl/mongodb-26-rhel7
registry.access.redhat.com/rhscl_beta/mongodb-26-rhel7
registry-mariadbcorp.rhcloud.com/rhel7/mariadb-enterprise-server:10.0
Notice that sometimes the full URL does not need to be specified. In this case, there is a default repository for a given namespace. If a user specifies only the fedora namespace, the latest tag from the default repository will be pulled to the local server.
docker pull fedora
Image Layer
Repositories are often referred to as images or container images, but actually they are made up of one or more layers. Image layers in a repository are connected together in a parent-child relationship. Each image layer represents changes between itself and the parent layer.
Below, we are going to inspect the layers of a repository on the local container host. First, let’s check out what image layers are available in the Red Hat Enterprise Linux 7 repository. Notice that each layer has a tag and a Universally Unique Identifier (UUID).
Since Docker 1.7, there is no native tooling to inspect image layers in a local cache, but with the help of a tool called dockviz, you can quickly inspect all of the layers in a local repository. The following command will return shortened versions of the UUID that are typically unique enough to work with on a single machine. If you need the full UUID, use the --no-trunc option.
docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t
├─2332d8973c93 Virtual Size: 187.7 MB
│ └─ea358092da77 Virtual Size: 187.9 MB
│   └─a467a7c6794f Virtual Size: 187.9 MB
│     └─ca4d7b1b9a51 Virtual Size: 187.9 MB
│       └─4084976dd96d Virtual Size: 384.2 MB
│         └─943128b20e28 Virtual Size: 386.7 MB
│           └─db20cc018f56 Virtual Size: 386.7 MB
│             └─45b3c59b9130 Virtual Size: 398.2 MB
│               └─91275de1a5d7 Virtual Size: 422.8 MB
│                 └─e7a97058d51f Virtual Size: 422.8 MB
│                   └─d5c963edfcb2 Virtual Size: 422.8 MB
│                     └─5cfc0ce98e02 Virtual Size: 422.8 MB
│                       └─7728f71a4bcd Virtual Size: 422.8 MB
│                         └─0542f67da01b Virtual Size: 422.8 MB Tags: docker.io/registry:latest
Notice that the “docker.io/registry” repository is actually made up of many image layers. More importantly, notice that a user could potentially “run” a container based off of any one of these layers. The following command is perfectly valid, though not guaranteed to have been tested or to work:
docker run -it 45b3c59b9130 bash
This is because when the image builder creates a new image, a new layer is created under certain conditions. First, if the image builder is building the image manually, each “commit” creates a new layer. If the image builder is building an image with a Dockerfile, each directive in the file creates a new layer. It is useful to have visibility into what has changed in a container repository between each layer.
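The parent-child relationship between layers can be modeled as a simple lookup table. The Python sketch below is only an illustration and not how Docker stores layers internally: the PARENT table and the layer_chain function are hypothetical, and the IDs are the first four layers from the dockviz output above.

```python
# Child -> parent mapping for the first four layers in the dockviz output.
# None marks a layer that has no parent.
PARENT = {
    "2332d8973c93": None,
    "ea358092da77": "2332d8973c93",
    "a467a7c6794f": "ea358092da77",
    "ca4d7b1b9a51": "a467a7c6794f",
}


def layer_chain(layer_id, parents):
    """Return every layer needed to run `layer_id`, from the bottom layer up."""
    chain = []
    current = layer_id
    while current is not None:
        chain.append(current)
        current = parents[current]  # step to the parent layer
    chain.reverse()
    return chain


print(layer_chain("a467a7c6794f", PARENT))
```

Running a container from a layer in the middle of the tree uses that layer plus all of its ancestors, which is why any layer ID can be handed to docker run.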
Base Image
Simply put, a base image is an image that has no parent layer. Typically, a base image contains a fresh copy of an operating system. Base images normally include the tools (yum, rpm, apt-get) necessary to install packages or update the image.
These special base images can be created yourself, but are typically produced and published by open source projects and vendors like Red Hat. Provenance and trust of these base images is critical.
The sole purpose of a base image is to provide a starting place for creating your derivative images. When using a Dockerfile, the choice of which base image you are using is explicit:
FROM rhel7
Tag
Even though a user can run a container from any of the image layers, they shouldn’t necessarily do that. When an image builder creates a new repository, they will typically label the best image layers to use. These are called tags and typically map to versions of software contained in the repository.
To remotely view the tags available in a repository, run the following command (the jq utility makes the output a lot more readable):
curl -s registry.access.redhat.com/v1/repositories/rhel7/tags | jq
{
  "7.0-21": "e1f5733f050b2488a17b7630cb038bfbea8b7bdfa9bdfb99e63a33117e28d02f",
  "7.0-23": "bef54b8f8a2fdd221734f1da404d4c0a7d07ee9169b1443a338ab54236c8c91a",
  "7.0-27": "8e6704f39a3d4a0c82ec7262ad683a9d1d9a281e3c1ebbb64c045b9af39b3940",
  "7.1-11": "d0a516b529ab1adda28429cae5985cab9db93bfd8d301b3a94d22299af72914b",
  "7.1-12": "275be1d3d0709a06ff1ae38d0d5402bc8f0eeac44812e5ec1df4a9e99214eb9a",
  "7.1-16": "82ad5fa11820c2889c60f7f748d67aab04400700c581843db0d1e68735327443",
  "7.1-24": "c4f590bbcbe329a77c00fea33a3a960063072041489012061ec3a134baba50d6",
  "7.1-4": "10acc31def5d6f249b548e01e8ffbaccfd61af0240c17315a7ad393d022c5ca2",
  "7.1-6": "65de4a13fc7cf28b4376e65efa31c5c3805e18da4eb01ad0c8b8801f4a10bc16",
  "7.1-9": "e3c92c6cff3543d19d0c9a24c72cd3840f8ba3ee00357f997b786e8939efef2f",
  "7.2": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
  "7.2-2": "58958c7fafb7e1a71650bc7bdbb9f5fd634f3545b00ec7d390b2075db511327d",
  "7.2-35": "6883d5422f4ec2810e1312c0e3e5a902142e2a8185cd3a1124b459a7c38dc55b",
  "7.2-38": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
  "latest": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e"
}
To pull all of the available tags to the local container host and then inspect them, run the following commands. Notice that each of the tags maps to a version of RHEL embedded in the particular layer. Understanding this can help you pull the desired layer to, for example, meet an OS requirement.
docker pull -a rhel7
docker images -a | grep rhel7
registry.access.redhat.com/rhel7  7.2     6c3a84d798dc  6 days ago     201.7 MB
registry.access.redhat.com/rhel7  7.2-38  6c3a84d798dc  6 days ago     201.7 MB
registry.access.redhat.com/rhel7  latest  6c3a84d798dc  6 days ago     201.7 MB
registry.access.redhat.com/rhel7  7.2-35  6883d5422f4e  4 weeks ago    201.7 MB
registry.access.redhat.com/rhel7  7.1-24  c4f590bbcbe3  5 weeks ago    158.2 MB
registry.access.redhat.com/rhel7  7.1-16  82ad5fa11820  12 weeks ago   158.3 MB
registry.access.redhat.com/rhel7  7.2-2   58958c7fafb7  3 months ago   201.6 MB
registry.access.redhat.com/rhel7  7.1-12  275be1d3d070  4 months ago   158.3 MB
registry.access.redhat.com/rhel7  7.1-11  d0a516b529ab  4 months ago   158.2 MB
registry.access.redhat.com/rhel7  7.1-9   e3c92c6cff35  5 months ago   158.2 MB
registry.access.redhat.com/rhel7  7.1-6   65de4a13fc7c  7 months ago   154.9 MB
registry.access.redhat.com/rhel7  7.0-27  8e6704f39a3d  10 months ago  145.1 MB
registry.access.redhat.com/rhel7  7.1-4   10acc31def5d  10 months ago  154.1 MB
registry.access.redhat.com/rhel7  7.0-23  bef54b8f8a2f  18 months ago  147 MB
registry.access.redhat.com/rhel7  7.0-21  e1f5733f050b  18 months ago  140.2 MB
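Several tags can point at the same layer, which is easy to miss in the raw listing. As a small illustration, the following Python sketch groups tags by the layer they reference, using a subset of the tag-to-UUID mapping returned by the registry query above. The function name tags_by_layer is hypothetical.

```python
from collections import defaultdict

# A subset of the tag -> layer UUID mapping returned by the registry above.
TAGS = {
    "7.2": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
    "7.2-35": "6883d5422f4ec2810e1312c0e3e5a902142e2a8185cd3a1124b459a7c38dc55b",
    "7.2-38": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
    "latest": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
}


def tags_by_layer(tags):
    """Group tag names by the shortened (12-character) layer UUID."""
    grouped = defaultdict(list)
    for tag, layer_id in tags.items():
        grouped[layer_id[:12]].append(tag)
    return {layer_id: sorted(names) for layer_id, names in grouped.items()}


for layer_id, names in tags_by_layer(TAGS).items():
    print(layer_id, names)
```

Here 7.2, 7.2-38, and latest all resolve to layer 6c3a84d798dc, matching the three rows with the same IMAGE ID in the docker images output, while 7.2-35 names a different layer.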
Registry Server
A registry server is essentially a fancy file server that is used to store Docker repositories. Typically, the registry server is specified as a normal DNS name and optionally a port number to connect to. Much of the value in the Docker ecosystem comes from the ability to push and pull repositories from registry servers.
When a Docker daemon does not have a locally cached copy of a repository, it will automatically pull it from a registry server. By default, Red Hat Enterprise Linux is configured to pull repositories from registry.access.redhat.com first, and then to try docker.io (Docker Hub).
It is important to stress that there is implicit trust in the registry server. You must determine how much you trust the content provided by the registry, and you may want to allow or block certain registries. In addition to security, there are other concerns such as users having access to licensed software and compliance issues. The simplicity with which Docker allows users to pull software makes it critical that you trust upstream content.
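The search order can be pictured as expanding a short name against each configured registry in turn. The Python sketch below is a rough model under stated assumptions, not the daemon's actual logic: SEARCH_REGISTRIES and candidate_refs are hypothetical names, the input is assumed to carry no registry of its own, and registry-specific naming rules are ignored.

```python
# Search order described above for Red Hat Enterprise Linux.
SEARCH_REGISTRIES = ["registry.access.redhat.com", "docker.io"]


def candidate_refs(name, registries=SEARCH_REGISTRIES):
    """List the full references to try, in order, for a short name.

    Assumption: `name` does not already include a registry server.
    """
    # Add the default tag when the last path component has none.
    if ":" not in name.rsplit("/", 1)[-1]:
        name += ":latest"
    return [f"{registry}/{name}" for registry in registries]


print(candidate_refs("rhel7"))
```

The first candidate for rhel7 is registry.access.redhat.com/rhel7:latest, the same expansion shown in the Repository section.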
In Red Hat Enterprise Linux, the default docker registry is configurable. Specific registry servers can be added or blocked in RHEL7 and RHEL7 Atomic by modifying the configuration file:
vi /etc/sysconfig/docker
In RHEL7 and RHEL7 Atomic, Red Hat’s registry server is configured out of the box:
ADD_REGISTRY='--add-registry registry.access.redhat.com'
As a matter of security, it may be useful to block public Docker registries such as DockerHub:
# BLOCK_REGISTRY='--block-registry'
Container Host
Once an image (aka repository) is pulled from a registry server to the local container host, it is said to be in the local cache.
You can determine which repositories are synchronized to the local cache with the following command:
[root@rhel7 ~]# docker images -a
REPOSITORY                        TAG     IMAGE ID      CREATED      VIRTUAL SIZE
registry.access.redhat.com/rhel7  latest  6883d5422f4e  3 weeks ago  201.7 MB
Graph Driver
Every time a container is created on a container host, all of the image layers it depends on are used together, read-only. Another read/write layer is then added on top so that processes in the container can write data as they normally would. The graph driver is the piece of software that maps the different image layers in the repository to the local storage. The local storage can be a filesystem or block storage, depending on the driver. Drivers include: aufs, devicemapper, btrfs, zfs, and overlayfs. You can determine which graph driver you are using with the docker info command:
[root@rhel7 ~]# docker info
...
Storage Driver: devicemapper
 Pool Name: docker-253:1-884266-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 3.037 GB
 Data Space Total: 107.4 GB
 Data Space Available: 2.56 GB
 Metadata Space Used: 2.707 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-10-14)
...
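The idea of read-only image layers with a read/write layer on top can be sketched in a few lines of Python. This is only a conceptual model and not how any graph driver is implemented: the file paths, their contents, and the start_container function are all made up for illustration, and each layer is modeled as a plain dictionary mapping a path to file content.

```python
from collections import ChainMap

# Hypothetical read-only image layers, lowest first (path -> content).
base_layer = {"/etc/os-release": "rhel 7.2", "/bin/bash": "<binary>"}
app_layer = {"/opt/app/run.sh": "echo hello"}


def start_container(*image_layers):
    """Stack the image layers under a fresh, empty read/write layer."""
    # ChainMap looks keys up left to right and sends every write to the
    # first mapping, so the image layers underneath are never modified.
    return ChainMap({}, *reversed(image_layers))


container = start_container(base_layer, app_layer)
container["/etc/os-release"] = "modified"   # lands in the read/write layer

print(container["/etc/os-release"])   # the container sees its own copy
print(base_layer["/etc/os-release"])  # the image layer is unchanged
```

Because writes never touch the image layers, a second container started from the same layers would still see the original files.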
Conclusion
People often use the words container, image, container image and repository interchangeably and the docker sub-commands don’t make a distinction between an image and a repository. The commands are quite easy to use, but once architecture discussions start, it’s important to understand that a repository is really the central data structure.
It’s also quite easy to misunderstand the difference between a namespace, repository, image layer, and tag. Each of these has an architectural purpose. While different vendors and users are using them for different purposes, they are tools in our toolbox.
The goal of this article is to leave you with the ability to command this nomenclature so that more sophisticated architectures can be created. For example, imagine that you have just been charged with building an infrastructure that limits, based on role and business rules, which namespaces, repositories, and even which image layers and tags can be pushed and pulled.
For further reading, check out the Architecting Containers series:
- Architecting Containers Part 1: Why Understanding User Space vs. Kernel Space Matters
- Architecting Containers Part 2: Why the User Space Matters
- Architecting Containers Part 3: How the User Space Affects Your Application
As always, if you have comments or questions, please leave a message below.
Last updated: February 22, 2024