A Practical Introduction to Docker Container Terminology

February 2018 - A completely revised and updated version of this article has been published. See A Practical Introduction to Container Terminology. The update includes coverage of container technologies beyond Docker, such as CRI-O, rkt, and lxc/lxd, as well as information on the Open Container Initiative (OCI).

January 13th, 2016

Background

When discussing an architecture for containerization, it's important to have a solid grasp of the related vocabulary. One of the challenges people have is that many of the following terms are used interchangeably, which often causes quite a bit of confusion for newcomers.

  • Container
  • Image
  • Container Image
  • Image Layer
  • Index
  • Registry
  • Repository
  • Tag
  • Base Image
  • Platform Image
  • Layer

The goal of this article is to clarify these terms, so that we can speak the same language and develop solutions and architectures leveraging the value of containers. Note that I am going to assume that you know how to run basic docker commands, but if you need a primer, I recommend starting with: A Practical Introduction to Docker Containers.

Vocabulary

Repository

When using the Docker command, a repository is what is specified on the command line, not an image. In the following command, “rhel7” is the repository.

docker pull rhel7

This is actually expanded automatically to:

docker pull registry.access.redhat.com/rhel7:latest

This can be confusing, and many people refer to this as an image or a container image. In fact, the docker images sub-command is what is used to list the locally available repositories. Conceptually, these repositories can be thought of as container images, but it's important to realize that these repositories are actually made up of layers.

docker images
REPOSITORY                                  TAG                     IMAGE ID                CREATED                 VIRTUAL SIZE
registry.access.redhat.com/rhel7            latest                  6883d5422f4e            4 weeks ago             201.7 MB
registry.access.redhat.com/rhel             latest                  6883d5422f4e            4 weeks ago             201.7 MB
registry.access.redhat.com/rhel6            latest                  05c3d56ba777            4 weeks ago             166.1 MB
registry.access.redhat.com/rhel6/rhel       latest                  05c3d56ba777            4 weeks ago             166.1 MB
...

When you specify the repository on the command line, the Docker daemon does some extra work for you. The Docker daemon (not the client tool) is configured with a list of servers to search. In our example above, the daemon will search for the "rhel7" repository on each of the configured servers.

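The search described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the daemon's actual code: `resolve_short_name`, `REGISTRIES`, and `CATALOG` are hypothetical names, and the in-memory catalog stands in for the network lookups a real daemon performs against each registry. The fallback to the `latest` tag mirrors how `rhel7` was expanded to `rhel7:latest` above.

```python
from typing import Dict, List, Optional, Set


def resolve_short_name(
    name: str,
    registries: List[str],
    catalog: Dict[str, Set[str]],
    default_tag: str = "latest",
) -> Optional[str]:
    """Return a fully qualified reference from the first registry serving `name`.

    `catalog` is a hypothetical stand-in for querying each registry over the
    network: it maps a registry host to the repository names it serves.
    """
    repository, _, tag = name.partition(":")
    tag = tag or default_tag  # no tag given: fall back to the default tag
    for registry in registries:  # configured order matters: first match wins
        if repository in catalog.get(registry, set()):
            return f"{registry}/{repository}:{tag}"
    return None  # no configured registry serves this repository


# Hypothetical configuration: Red Hat's registry first, then Docker Hub.
REGISTRIES = ["registry.access.redhat.com", "docker.io"]
CATALOG = {
    "registry.access.redhat.com": {"rhel7", "rhel7/rhel"},
    "docker.io": {"fedora"},
}

print(resolve_short_name("rhel7", REGISTRIES, CATALOG))
# registry.access.redhat.com/rhel7:latest
print(resolve_short_name("fedora", REGISTRIES, CATALOG))
# docker.io/fedora:latest
```

Because the first registry that holds the repository wins, the order of the configured list determines which content you get when the same name exists on more than one server.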
In the above command, only the repository name was specified, but it's also possible to specify a full URL with the Docker client. To highlight this, let's dissect a full URL, which you will often see written as:

REGISTRY/NAMESPACE/REPOSITORY[:TAG]

The full URL is made up of a standard server name (the registry), a namespace, a repository, and optionally a tag. There are actually many permutations of how to specify a URL, and as you explore the Docker ecosystem, you will find that many pieces are optional. The following commands are all valid, and all pull some permutation of the same repository:

docker pull registry.access.redhat.com/rhel7/rhel:latest
docker pull registry.access.redhat.com/rhel7/rhel
docker pull registry.access.redhat.com/rhel7
docker pull rhel7/rhel:latest

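To make the dissection concrete, here is a small Python parser for the REGISTRY/NAMESPACE/REPOSITORY[:TAG] form. It is a sketch, not Docker's own parser: `parse_reference` and `ImageReference` are hypothetical names, and the rule that a leading component is a registry only if it contains a dot or a colon, or is `localhost`, is a simplifying assumption that happens to distinguish `registry.access.redhat.com/rhel7` from `rhel7/rhel`.

```python
from typing import NamedTuple, Optional


class ImageReference(NamedTuple):
    registry: Optional[str]   # None when the registry is omitted
    namespace: Optional[str]  # None when the namespace is omitted
    repository: str
    tag: str


def looks_like_registry(component: str) -> bool:
    # Simplifying assumption: a registry host contains a dot or a port
    # separator, or is "localhost"; a plain namespace such as "rhel7" does not.
    return "." in component or ":" in component or component == "localhost"


def parse_reference(ref: str, default_tag: str = "latest") -> ImageReference:
    """Split REGISTRY/NAMESPACE/REPOSITORY[:TAG] into its parts."""
    # Look for ":" only after the last "/", so a registry port is not
    # mistaken for a tag.
    head, slash, last = ref.rpartition("/")
    name, colon, tag = last.partition(":")
    parts = (head.split("/") if slash else []) + [name]

    registry: Optional[str] = None
    if len(parts) > 1 and looks_like_registry(parts[0]):
        registry = parts.pop(0)
    repository = parts.pop()             # the final component
    namespace = "/".join(parts) or None  # whatever remains in between
    return ImageReference(registry, namespace, repository, tag if colon else default_tag)


for ref in [
    "registry.access.redhat.com/rhel7/rhel:latest",
    "registry.access.redhat.com/rhel7/rhel",
    "registry.access.redhat.com/rhel7",
    "rhel7/rhel:latest",
]:
    print(parse_reference(ref))
```

Splitting the tag off only after the last slash keeps a registry port, as in `localhost:5000/app`, from being mistaken for a tag.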

Namespace

A namespace is a tool for separating groups of repositories. On the public Docker Hub, the namespace is typically the username of the person sharing the image, but it can also be a group name or a logical name.

Red Hat uses the namespace to separate groups of repositories based on products listed on the Red Hat Federated Registry server. Here are some example results returned by registry.access.redhat.com. Notice that the last result is actually listed on another registry server. This is because Red Hat also works to list repositories on our partners' registry servers:

registry.access.redhat.com/rhel7/rhel
registry.access.redhat.com/openshift3/mongodb-24-rhel7
registry.access.redhat.com/rhscl/mongodb-26-rhel7
registry.access.redhat.com/rhscl_beta/mongodb-26-rhel7
registry-mariadbcorp.rhcloud.com/rhel7/mariadb-enterprise-server:10.0

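Because a namespace is just a grouping inside a registry, grouping the results above by registry and namespace is straightforward. This Python sketch is illustrative only: `group_by_namespace` is a hypothetical helper, and it assumes every reference has exactly three slash-separated components, which holds for the list above but not in general.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The repository URLs listed above, copied verbatim.
RESULTS = [
    "registry.access.redhat.com/rhel7/rhel",
    "registry.access.redhat.com/openshift3/mongodb-24-rhel7",
    "registry.access.redhat.com/rhscl/mongodb-26-rhel7",
    "registry.access.redhat.com/rhscl_beta/mongodb-26-rhel7",
    "registry-mariadbcorp.rhcloud.com/rhel7/mariadb-enterprise-server:10.0",
]


def group_by_namespace(refs: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group REGISTRY/NAMESPACE/REPOSITORY[:TAG] strings by (registry, namespace)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for ref in refs:
        # Assumes exactly three components, as in every result above;
        # real references can omit the registry or the namespace.
        registry, namespace, repo_and_tag = ref.split("/")
        repository = repo_and_tag.partition(":")[0]  # drop an optional tag
        groups[(registry, namespace)].append(repository)
    return dict(groups)


for (registry, namespace), repos in group_by_namespace(RESULTS).items():
    print(f"{registry} / {namespace}: {', '.join(repos)}")
```

Note that `rhel7` appears as a namespace on two different registries; the sketch keeps them as separate groups, since a namespace is only meaningful within the registry that hosts it.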

Notice that sometimes the full URL does not need to be specified. In this case, there is a default repository for a given namespace. If a user only specifies the fedora namespace, the latest tag from the default repository will be pulled to the local server.

docker pull fedora

Image Layer

Repositories are often referred to as images or container images, but actually they are made up of one or more layers. Image layers in a repository are connected together in a parent-child relationship. Each image layer represents changes between itself and the parent layer.

Below, we are going to inspect the layers of a repository on the local container host. First, let's check out what image layers are available in the docker.io/registry repository. Notice that each layer has a unique identifier (ID), and that a layer may also carry a tag.

Since Docker 1.7, there is no native tooling to inspect image layers in a local cache, but with the help of a tool called dockviz, you can quickly inspect all of the layers in a local repository. The following command will return shortened versions of the IDs, which are typically unique enough to work with on a single machine. If you need the full ID, use the --no-trunc option.

docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t
├─2332d8973c93 Virtual Size: 187.7 MB
│ └─ea358092da77 Virtual Size: 187.9 MB
│   └─a467a7c6794f Virtual Size: 187.9 MB
│         └─ca4d7b1b9a51 Virtual Size: 187.9 MB
│           └─4084976dd96d Virtual Size: 384.2 MB
│             └─943128b20e28 Virtual Size: 386.7 MB
│               └─db20cc018f56 Virtual Size: 386.7 MB
│                 └─45b3c59b9130 Virtual Size: 398.2 MB
│                   └─91275de1a5d7 Virtual Size: 422.8 MB
│                     └─e7a97058d51f Virtual Size: 422.8 MB
│                       └─d5c963edfcb2 Virtual Size: 422.8 MB
│                         └─5cfc0ce98e02 Virtual Size: 422.8 MB
│                           └─7728f71a4bcd Virtual Size: 422.8 MB
│                             └─0542f67da01b Virtual Size: 422.8 MB Tags: docker.io/registry:latest

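The parent-child relationship can be modeled as a table of parent links. This minimal Python sketch uses the shortened IDs from the dockviz output above; it assumes each layer has exactly one parent, which holds for this tree, and walks from any layer down to the layer with no parent. `CHAIN`, `PARENT`, and `ancestry` are hypothetical names.

```python
from typing import Dict, List, Optional

# Layer IDs from the dockviz tree above, listed from the bottom (no parent)
# to the top (tagged docker.io/registry:latest).
CHAIN = [
    "2332d8973c93", "ea358092da77", "a467a7c6794f", "ca4d7b1b9a51",
    "4084976dd96d", "943128b20e28", "db20cc018f56", "45b3c59b9130",
    "91275de1a5d7", "e7a97058d51f", "d5c963edfcb2", "5cfc0ce98e02",
    "7728f71a4bcd", "0542f67da01b",
]

# child ID -> parent ID; the first layer has no parent (None).
PARENT: Dict[str, Optional[str]] = {
    child: parent for parent, child in zip([None, *CHAIN[:-1]], CHAIN)
}


def ancestry(layer_id: str) -> List[str]:
    """Return the layer and all of its ancestors, topmost first."""
    chain: List[str] = []
    current: Optional[str] = layer_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]  # KeyError if the ID is unknown
    return chain


# Layers that would be stacked to run a container from 45b3c59b9130:
print(" -> ".join(ancestry("45b3c59b9130")))
print("base:", ancestry("0542f67da01b")[-1])  # base: 2332d8973c93
```

The layer at the end of every walk, the one with no parent, is what the Base Image section calls a base image.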

Notice that the "docker.io/registry" repository is actually made up of many image layers. More importantly, notice that a user could potentially "run" a container based off of any one of these layers. The following command is perfectly valid, though it is not guaranteed to have been tested or to work:

docker run -it 45b3c59b9130 bash

This is because when the image builder creates a new image, a new layer is created under certain conditions. If the image builder is building the image manually, each "commit" creates a new layer. If the image builder is building an image with a Dockerfile, each directive in the file creates a new layer. It is useful to have visibility into what has changed in a container repository between each layer.

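The layer-per-directive behavior can be modeled in Python as a chain of parent pointers. This is a hypothetical sketch: `Layer`, `build`, the three directives, and the ID scheme (a hash of the parent ID and the directive text) are illustrative assumptions rather than how Docker computes layer IDs. The starting parent, 6883d5422f4e, is the rhel7 image ID from the `docker images` output earlier.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    id: str
    parent: Optional[str]
    directive: str  # the instruction (or commit) that produced this layer


def build(directives: List[str], parent: Optional[str] = None) -> List[Layer]:
    """Create one new layer per directive, each a child of the previous layer."""
    layers: List[Layer] = []
    for directive in directives:
        # Hypothetical ID scheme: hash the parent ID together with the
        # directive text. Docker's real IDs are derived differently.
        digest = hashlib.sha256(f"{parent}|{directive}".encode()).hexdigest()
        layer = Layer(id=digest[:12], parent=parent, directive=directive)
        layers.append(layer)
        parent = layer.id  # the next directive builds on top of this layer
    return layers


steps = [
    "RUN yum -y install httpd",
    "COPY index.html /var/www/html/",
    'CMD ["httpd", "-DFOREGROUND"]',
]
for layer in build(steps, parent="6883d5422f4e"):
    print(layer.id, "parent:", layer.parent, "<-", layer.directive)
```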
Base Image

Simply put, a base image is an image that has no parent layer. Typically, a base image contains a fresh copy of an operating system. Base images normally include the tools (yum, rpm, apt-get) necessary to install packages or update the software included in them.

You can create these special base images yourself, but they are typically produced and published by open source projects and vendors like Red Hat. The provenance and trustworthiness of these base images are critical.

The sole purpose of a base image is to provide a starting place for creating your derivative images. When using a Dockerfile, the choice of which base image you are using is explicit:

FROM rhel7

Tag

Even though a user can run a container from any of the image layers, they shouldn’t necessarily do that. When an image builder creates a new repository, they will typically label the best image layers to use. These are called tags and typically map to versions of software contained in the repository.

To remotely view the tags available in a repository, run the following command (the jq utility makes the output a lot more readable):

curl -s registry.access.redhat.com/v1/repositories/rhel7/tags | jq .
{
  "7.0-21": "e1f5733f050b2488a17b7630cb038bfbea8b7bdfa9bdfb99e63a33117e28d02f",
  "7.0-23": "bef54b8f8a2fdd221734f1da404d4c0a7d07ee9169b1443a338ab54236c8c91a",
  "7.0-27": "8e6704f39a3d4a0c82ec7262ad683a9d1d9a281e3c1ebbb64c045b9af39b3940",
  "7.1-11": "d0a516b529ab1adda28429cae5985cab9db93bfd8d301b3a94d22299af72914b",
  "7.1-12": "275be1d3d0709a06ff1ae38d0d5402bc8f0eeac44812e5ec1df4a9e99214eb9a",
  "7.1-16": "82ad5fa11820c2889c60f7f748d67aab04400700c581843db0d1e68735327443",
  "7.1-24": "c4f590bbcbe329a77c00fea33a3a960063072041489012061ec3a134baba50d6",
  "7.1-4": "10acc31def5d6f249b548e01e8ffbaccfd61af0240c17315a7ad393d022c5ca2",
  "7.1-6": "65de4a13fc7cf28b4376e65efa31c5c3805e18da4eb01ad0c8b8801f4a10bc16",
  "7.1-9": "e3c92c6cff3543d19d0c9a24c72cd3840f8ba3ee00357f997b786e8939efef2f",
  "7.2": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
  "7.2-2": "58958c7fafb7e1a71650bc7bdbb9f5fd634f3545b00ec7d390b2075db511327d",
  "7.2-35": "6883d5422f4ec2810e1312c0e3e5a902142e2a8185cd3a1124b459a7c38dc55b",
  "7.2-38": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e",
  "latest": "6c3a84d798dc449313787502060b6d5b4694d7527d64a7c99ba199e3b2df834e"
}

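Since several tags can point at the same layer ID, inverting the mapping shows which tags are aliases of one another. This Python sketch uses a subset of the output above, with IDs shortened to 12 characters as `docker images` displays them; `TAGS` and `aliases` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

# A subset of the tag -> layer ID mapping returned above (IDs shortened).
TAGS: Dict[str, str] = {
    "7.1-24": "c4f590bbcbe3",
    "7.2": "6c3a84d798dc",
    "7.2-2": "58958c7fafb7",
    "7.2-35": "6883d5422f4e",
    "7.2-38": "6c3a84d798dc",
    "latest": "6c3a84d798dc",
}


def aliases(tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert tag -> ID so that each layer ID lists every tag pointing at it."""
    by_id: Dict[str, List[str]] = defaultdict(list)
    for tag, layer_id in sorted(tags.items()):
        by_id[layer_id].append(tag)
    return dict(by_id)


print(aliases(TAGS)[TAGS["latest"]])  # ['7.2', '7.2-38', 'latest']
```

The result matches the three rhel7 tags that share image ID 6c3a84d798dc in the `docker images -a` listing below.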

To pull all of the available tags to the local container host and then inspect them, run the following commands. Notice that each of the tags maps to a version of RHEL embedded in the particular layer. Understanding this can help you pull the desired layer, for example, to meet an OS requirement.

docker pull -a rhel7
docker images -a | grep rhel7
registry.access.redhat.com/rhel7            7.2                     6c3a84d798dc            6 days ago              201.7 MB
registry.access.redhat.com/rhel7            7.2-38                  6c3a84d798dc            6 days ago              201.7 MB
registry.access.redhat.com/rhel7            latest                  6c3a84d798dc            6 days ago              201.7 MB
registry.access.redhat.com/rhel7            7.2-35                  6883d5422f4e            4 weeks ago             201.7 MB
registry.access.redhat.com/rhel7            7.1-24                  c4f590bbcbe3            5 weeks ago             158.2 MB
registry.access.redhat.com/rhel7            7.1-16                  82ad5fa11820            12 weeks ago            158.3 MB
registry.access.redhat.com/rhel7            7.2-2                   58958c7fafb7            3 months ago            201.6 MB
registry.access.redhat.com/rhel7            7.1-12                  275be1d3d070            4 months ago            158.3 MB
registry.access.redhat.com/rhel7            7.1-11                  d0a516b529ab            4 months ago            158.2 MB
registry.access.redhat.com/rhel7            7.1-9                   e3c92c6cff35            5 months ago            158.2 MB
registry.access.redhat.com/rhel7            7.1-6                   65de4a13fc7c            7 months ago            154.9 MB
registry.access.redhat.com/rhel7            7.0-27                  8e6704f39a3d            10 months ago           145.1 MB
registry.access.redhat.com/rhel7            7.1-4                   10acc31def5d            10 months ago           154.1 MB
registry.access.redhat.com/rhel7            7.0-23                  bef54b8f8a2f            18 months ago           147 MB
registry.access.redhat.com/rhel7            7.0-21                  e1f5733f050b            18 months ago           140.2 MB

Registry Server

A registry server is essentially a fancy file server that is used to store Docker repositories. Typically, the registry server is specified as a normal DNS name and, optionally, a port number to connect to. Much of the value in the Docker ecosystem comes from the ability to push and pull repositories from registry servers.


When a Docker daemon does not have a locally cached copy of a repository, it will automatically pull it from a registry server. By default, Red Hat Enterprise Linux is configured to pull repositories from registry.access.redhat.com first, and then it will try docker.io (Docker Hub).

It is important to stress that there is implicit trust in the registry server. You must determine how much you trust the content provided by the registry, and you may want to allow or block certain registries. In addition to security, there are other concerns, such as users having access to licensed software and compliance issues. The simplicity with which Docker allows users to pull software makes it critical that you trust upstream content.

In Red Hat Enterprise Linux, the default docker registry is configurable. Specific registry servers can be added or blocked in RHEL7 and RHEL7 Atomic by modifying the configuration file:

vi /etc/sysconfig/docker

In RHEL7 and RHEL7 Atomic, Red Hat's registry server is configured out of the box:

ADD_REGISTRY='--add-registry registry.access.redhat.com'

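As a matter of security, it may be useful to block public registries such as Docker Hub. The configuration file ships with a commented-out BLOCK_REGISTRY line for this purpose; uncomment it and pass the registry to block, for example:

BLOCK_REGISTRY='--block-registry docker.io'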

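The way these two settings combine can be sketched in Python. This models the behavior described above (Red Hat's registry first, then Docker Hub) and is not the daemon's code: `parse_sysconfig`, `flag_values`, and `search_order` are hypothetical helpers, the parser only handles simple KEY='value' lines, and both the rule that blocked registries are dropped from the search order and the `--block-registry docker.io` usage in `HARDENED` are assumptions for illustration.

```python
import shlex
from typing import Dict, List


def parse_sysconfig(text: str) -> Dict[str, str]:
    """Parse simple shell-style KEY='value' lines, skipping blanks and comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # a commented-out setting is not in effect
        key, _, value = shlex.split(line)[0].partition("=")
        settings[key] = value
    return settings


def flag_values(options: str, flag: str) -> List[str]:
    """Return the argument that follows each occurrence of `flag`."""
    args = shlex.split(options)
    return [args[i + 1] for i, arg in enumerate(args[:-1]) if arg == flag]


def search_order(settings: Dict[str, str]) -> List[str]:
    """Added registries first, then docker.io, minus any blocked registry."""
    added = flag_values(settings.get("ADD_REGISTRY", ""), "--add-registry")
    blocked = set(flag_values(settings.get("BLOCK_REGISTRY", ""), "--block-registry"))
    ordered = list(dict.fromkeys(added + ["docker.io"]))  # de-duplicate, keep order
    return [registry for registry in ordered if registry not in blocked]


DEFAULTS = """
ADD_REGISTRY='--add-registry registry.access.redhat.com'
# BLOCK_REGISTRY='--block-registry'
"""
HARDENED = """
ADD_REGISTRY='--add-registry registry.access.redhat.com'
BLOCK_REGISTRY='--block-registry docker.io'
"""

print(search_order(parse_sysconfig(DEFAULTS)))  # ['registry.access.redhat.com', 'docker.io']
print(search_order(parse_sysconfig(HARDENED)))  # ['registry.access.redhat.com']
```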

Container Host

Once an image (aka repository) is pulled from a registry server to the local container host, it is said to be in the local cache.

You can determine which repositories are synchronized to the local cache with the following command:

[root@rhel7 ~]# docker images -a
REPOSITORY                             TAG                     IMAGE ID                CREATED                 VIRTUAL SIZE
registry.access.redhat.com/rhel7   latest                  6883d5422f4e            3 weeks ago             201.7 MB

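The cache behavior described in this section and under Registry Server (use a cached copy when present, otherwise pull) can be sketched as a small Python class. `LocalCache` and the `pull` callback are hypothetical: the callback stands in for contacting a registry, and the real daemon also tracks the individual layers, which this sketch omits.

```python
from typing import Callable, Dict, List


class LocalCache:
    """Toy model of a container host's local image cache (hypothetical)."""

    def __init__(self, pull: Callable[[str], str]) -> None:
        self._pull = pull                  # stand-in for fetching from a registry
        self._images: Dict[str, str] = {}  # reference -> image ID
        self.pulls: List[str] = []         # references fetched remotely, in order

    def get(self, reference: str) -> str:
        """Return the image ID, pulling only on a cache miss."""
        if reference not in self._images:
            self._images[reference] = self._pull(reference)
            self.pulls.append(reference)
        return self._images[reference]


# Hypothetical registry lookup that always answers with the rhel7 image ID above.
cache = LocalCache(pull=lambda reference: "6883d5422f4e")
cache.get("registry.access.redhat.com/rhel7:latest")  # miss: pulled
cache.get("registry.access.redhat.com/rhel7:latest")  # hit: served locally
print(cache.pulls)  # ['registry.access.redhat.com/rhel7:latest']
```

One consequence of this design is that a cached `latest` tag keeps pointing at whatever was pulled first; the host sees a newer `latest` only after an explicit `docker pull`.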

Graph Driver

Every time a container is created on a container host, all of the dependent image layers are stacked together as read-only layers. A read/write layer is then added on top so that the container can write data like a normal process. The graph driver is the piece of software that maps the different image layers in the repository to the local storage. The local storage can be a filesystem or block storage, depending on the driver. Drivers include aufs, devicemapper, btrfs, zfs, and overlay (OverlayFS). You can determine which graph driver you are using with the docker info command, which reports it as the Storage Driver:

[root@rhel7 ~]# docker info
...
Storage Driver: devicemapper
 Pool Name: docker-253:1-884266-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 3.037 GB
 Data Space Total: 107.4 GB
 Data Space Available: 2.56 GB
 Metadata Space Used: 2.707 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.107-RHEL7 (2015-10-14)

...

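The way a graph driver presents stacked read-only layers plus one read/write layer can be sketched as a file-level union in Python. This is a toy model with hypothetical names (`UnionView`, `Layer`), closest in spirit to union filesystems such as aufs and overlay; block-based drivers such as devicemapper implement the same idea with copy-on-write snapshots of block devices instead. Deletions are recorded as `None` in the writable layer rather than removing anything from the image layers.

```python
from typing import Dict, List, Optional

Layer = Dict[str, Optional[str]]  # path -> file content; None marks a deletion


class UnionView:
    """Toy union of read-only image layers plus one read/write container layer."""

    def __init__(self, image_layers: List[Layer]) -> None:
        self._read_only = tuple(image_layers)  # index 0 is the base layer
        self._writable: Layer = {}             # private to this container

    def read(self, path: str) -> Optional[str]:
        # Search from the top down; the first layer that mentions the path wins.
        for layer in (self._writable, *reversed(self._read_only)):
            if path in layer:
                return layer[path]  # None here means an upper layer deleted it
        return None                 # the path exists in no layer

    def write(self, path: str, content: str) -> None:
        self._writable[path] = content  # image layers are never modified

    def delete(self, path: str) -> None:
        self._writable[path] = None  # mask the file; lower layers keep it


base: Layer = {"/etc/motd": "base content"}
app: Layer = {"/var/www/html/index.html": "hello"}

container = UnionView([base, app])
container.write("/var/www/html/index.html", "changed")
container.delete("/etc/motd")
print(container.read("/var/www/html/index.html"))  # changed
print(container.read("/etc/motd"))                 # None

fresh = UnionView([base, app])  # a second container from the same image
print(fresh.read("/var/www/html/index.html"))      # hello
```

Because every write lands in the container's own top layer, many containers can share the same image layers on disk while each sees its own changes.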
Conclusion

People often use the words container, image, container image and repository interchangeably and the docker sub-commands don’t make a distinction between an image and a repository. The commands are quite easy to use, but once architecture discussions start, it’s important to understand that a repository is really the central data structure.

It's also quite easy to misunderstand the difference between a namespace, repository, image layer, and tag. Each of these has an architectural purpose. While different vendors and users use them for different purposes, they are all tools in our toolbox.


The goal of this article is to leave you with the ability to command this nomenclature so that more sophisticated architectures can be created. For example, imagine that you have just been charged with building an infrastructure that uses business rules to limit, based on role, which namespaces, repositories, and even which image layers and tags can be pushed and pulled…

For further reading, check out the Architecting Containers series:

As always, if you have comments or questions, please leave a message below.

Last updated: February 22, 2024