Containers are often the unit of deployment in modern applications. An application is built into one or more container images using Docker or Podman, and then those images are deployed into production.
This article dives into the discussions that went into creating the Building Good Containers section of the Node.js reference architecture. That section focuses on how the container is built, versus how to structure an application for deployment in a container. Other sections in the reference architecture, like Health Checks and Logging, cover how to structure an application for cloud-native deployments.
Read the series so far:
- Part 1: Overview of the Node.js reference architecture
- Part 2: Logging in Node.js
- Part 3: Code consistency in Node.js
- Part 4: GraphQL in Node.js
- Part 5: Building good containers
- Part 6: Choosing web frameworks
- Part 7: Code coverage
- Part 8: TypeScript
- Part 9: Securing Node.js applications
What makes a good production container?
Before we dive into the recommendations for building good containers, what do we mean by a "good" container in the first place? What this means to the Red Hat and IBM team members is that the container:
- Applies best practices for security.
- Is a reasonable size.
- Avoids common pitfalls with running a process in a container.
- Can take advantage of the resources provided to it.
- Includes what’s needed to debug production issues when they occur.
While the relative priority among these criteria can differ across teams, all of them proved important in our experience.
What base images to start with?
In most cases, teams build their containers based on a pre-existing image that includes at least the operating system (OS) and commonly also includes the runtime—in our case, Node.js. In order to build good containers, it is important to start on solid footing by choosing a base container that is well maintained, is scanned and updated when vulnerabilities are reported, keeps up with new versions of the runtime, and (if required by your organization) has commercial support. The reference architecture includes two sections that talk about containers: Container images and Commercially Supported Containers. Most of the teams within Red Hat and IBM are already using or moving toward using the Node.js Red Hat Universal Base Images (UBI) for Node.js deployments.
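As a sketch of what starting from such a base looks like (the image name, tag, and paths are illustrative; check the registry for currently supported tags), a Dockerfile based on a Node.js UBI image might begin like this:

```dockerfile
# Hypothetical example: build on a maintained Node.js UBI image.
# The tag shown is illustrative; pick one your organization supports.
FROM registry.access.redhat.com/ubi9/nodejs-18

# The UBI Node.js images preconfigure an application directory.
WORKDIR /opt/app-root/src

# Copy the dependency manifests first so the install layer is cached
# across builds that only change application source.
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .
CMD ["node", "server.js"]
```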
Apply security best practices
The first thing we talked about with respect to building good containers is making sure we applied security best practices. The two recommendations that came from these discussions were:
- Build containers so that your application runs as non-root.
- Avoid reserved (privileged) ports (1–1023) inside the container.
The reason for building containers so that your application runs as non-root is well-documented, and we found it was a common practice across the team members. For a good article that dives into the details, see Processes In Containers Should Not Run As Root.
Why should you avoid using reserved (privileged) ports (1–1023)? Docker or Kubernetes will just map the port to something different anyway, right? The problem is that applications not running as root normally cannot bind to ports 1–1023, and while it might be possible to allow this when the container is started, you generally want to avoid it. In addition, the Node.js runtime has some limitations that mean if you add the privileges needed to run on those ports when starting the container, you can no longer do things like set additional certificates in the environment. Since the ports will be mapped anyway, there is no good reason to use a reserved (privileged) port. Avoiding them can save you trouble in the future.
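A minimal sketch combining both recommendations (the base image, user ID, and port are illustrative assumptions; UID 1001 is the conventional non-root user in the UBI images):

```dockerfile
# Hypothetical example: run as non-root on an unprivileged port.
FROM registry.access.redhat.com/ubi9/nodejs-18

WORKDIR /opt/app-root/src
# Ensure the non-root user can read the application files.
COPY --chown=1001:0 . .
RUN npm ci --omit=dev

# Bind to a port above 1023; the platform maps it to 80/443 externally.
ENV PORT=8080
EXPOSE 8080

# Drop to a non-root user before the application starts.
USER 1001
CMD ["node", "server.js"]
```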
A real-world example: A complicated migration
Using reserved (privileged) ports inside a container led to a complicated migration process for one of our teams when they later wanted to move to a new base container that was designed to run applications as non-root.
The team had many microservices all using the same set of internal ports, and they wanted to be able to slowly update and deploy individual microservices without having to modify the configurations outside of the container. Using different ports internally would have meant they would have to maintain the knowledge of which microservices used which ports internally, and that would make the configuration more complex and harder to maintain. The problem was that with the new base image, the microservices could no longer bind to the internal privileged port they had been using before.
The team thought, "Okay, so let's just use iptables or some other way to redirect so that even when the application binds to a port above 1023, Kubernetes still sees the service as exposed on the original privileged port." Unfortunately, that's not something that developers are expected to do in containers, and base containers don't include the components for port forwarding!
Next, they said, "Okay, let's give the containers the privileges required so that a non-root user can connect to the privileged port." Unfortunately, due to the issue in Node.js, that led to not being able to set additional certificates that they needed. In the end, the team found a way to migrate, but it was a lot more complicated than if they had not been using privileged ports.
Keep containers to a reasonable size
A common question is, "Why does container size matter?" The expectation is that with good layering and caching, the total size of a container won't end up being an issue. While that can often be true, environments like Kubernetes make it easy for containers to spin up and down on different machines. Each time this happens on a new machine, you end up having to pull down all of the components. The same happens for new deployments if you update all of the layers starting at the OS (perhaps to address CVEs).
The net result is that while we've not seen complaints or had problems in our deployments with respect to the size on disk, the compressed size that might need to be transferred to a machine has led our teams to strive to minimize container size.
A common practice we discussed was multi-stage builds, where you build in a larger base container and then copy the application artifacts to a smaller deployment image. The document Use multi-stage builds provides a good overview of how to do that.
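A sketch of that pattern (the image names, paths, and the `npm run build` step are illustrative assumptions, not prescriptions):

```dockerfile
# Stage 1: build in a full-featured image that includes npm and any
# toolchain needed to compile dependencies.
FROM registry.access.redhat.com/ubi9/nodejs-18 AS build
WORKDIR /opt/app-root/src
COPY package*.json ./
RUN npm ci
COPY . .
# Hypothetical build step; then drop devDependencies from node_modules.
RUN npm run build && npm prune --omit=dev

# Stage 2: copy only the runtime artifacts into a smaller image.
# The -minimal variant omits build tools and package managers.
FROM registry.access.redhat.com/ubi9/nodejs-18-minimal
WORKDIR /opt/app-root/src
COPY --from=build /opt/app-root/src/dist ./dist
COPY --from=build /opt/app-root/src/node_modules ./node_modules
CMD ["node", "dist/server.js"]
```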
Support efficient iterative development
The discussions on keeping container sizes reasonable also resulted in a few additional recommendations from our experience that I was unaware of before. (The process of putting together the reference architecture has been a great learning experience all around.)
The first was to use a `.dockerignore` file. Once I thought about it, it made a lot of sense, as I'd run into one of the issues it addresses a number of times. If you test locally and do an `npm install`, you end up with a local `node_modules` directory. When you then run your Dockerfile, the build takes longer because that directory gets copied into the build context, even though it won't necessarily be used in the build step (and if it is, that could mess things up). Assuming you are using a multi-stage build, it won't affect your final image size, but it does affect the speed of development as you iterate.
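A sketch of a `.dockerignore` that addresses this (the entries are typical examples; adjust them for your project):

```
# Keep locally generated content out of the Docker build context so
# builds stay fast and the image doesn't inherit local state.
node_modules
npm-debug.log
coverage
.git
```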
The second recommendation was to use a dependency image. For many applications, the build time is dominated by the time it takes to build the dependencies. If you break out your pipeline so that you build a dependency image and then layer your application into that image, the process of updating and testing the application can be much faster. This is because, for most of the iterations, you will not have updated the dependencies and can skip the slower rebuild of the dependency layer.
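One way to sketch the dependency-image pattern (the registry, image names, and file layout are hypothetical) is a pipeline with two Dockerfiles, where only the second is rebuilt on each application change:

```dockerfile
# Dockerfile.deps -- rebuilt only when package*.json changes, then
# pushed as, e.g., quay.io/myteam/myapp-deps:latest (hypothetical name).
FROM registry.access.redhat.com/ubi9/nodejs-18
WORKDIR /opt/app-root/src
COPY package*.json ./
RUN npm ci --omit=dev

# Dockerfile.app -- rebuilt on every code change; it layers the
# application source on top of the prebuilt dependency image:
#   FROM quay.io/myteam/myapp-deps:latest
#   COPY . .
#   CMD ["node", "server.js"]
```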
Build containers that can take advantage of the resources provided
The nice thing about using containers is that it decouples the application, microservice, etc., from the physical resources on which it will be deployed. It also means that the resources available to the container might change. Kubernetes, Docker, and Podman all provide ways to change the available resources when a container is started. If you don't plan or think about this in advance, you can end up with a container that overuses or underuses the resources available to it, resulting in poorer performance than expected.
In our discussions, we found that teams had developed patterns to start Node.js applications within containers such that they could leverage the amount of memory made available when the container was deployed. The reference architecture shares this pattern as good practice so that your application leverages the available amount of resources. Since Node.js is "approximately" single-threaded, we had not found the need to pass through available CPU resources to the same extent.
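One way to sketch such a pattern for memory (the paths, the 75% headroom figure, and the entrypoint shape are illustrative assumptions; the cgroup v1 path is shown, while cgroup v2 exposes the limit at /sys/fs/cgroup/memory.max) is an entrypoint that derives V8's heap size from the container's memory limit:

```shell
#!/bin/sh
# Hypothetical entrypoint: size the V8 old space from the container's
# memory limit rather than the host's physical memory.
LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes
if [ -r "$LIMIT_FILE" ]; then
  LIMIT_BYTES=$(cat "$LIMIT_FILE")
  # Give the heap ~75% of the limit, leaving headroom for stacks,
  # buffers, and other native memory.
  HEAP_MB=$(( LIMIT_BYTES / 1048576 * 75 / 100 ))
  export NODE_OPTIONS="--max-old-space-size=${HEAP_MB} ${NODE_OPTIONS}"
fi
exec node server.js
```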
Be ready to debug production issues when they occur
When things go wrong in production, you often need additional tools to help investigate what is going on. While we did not have a common set of tools to recommend from across our teams at this point, there was consensus that it is best practice to include key tools that you might need for problem investigation. This is one reason why we've been working in the Node.js project to pull some diagnostic tools into core (such as `node-report`, the ability to generate heap dumps, and the sampling heap profiler).
Avoid common pitfalls when running a process in a container
Running a Node.js process in a container is different from running on a full operating system. This results in a few common pitfalls related to signals, child processes, and zombie processes. Our teams ran into a number of these challenges, which resulted in the recommendations to use a process manager and to avoid the use of `npm start`. There is not much to add here (the reference architecture provides helpful resources for further reading), other than to say these are real-world issues that one or more of our teams have run into.
Building good containers can result in both faster development cycles and better deployments with fewer problems. In this article, we've shared some of the discussion and background that resulted in the recommendations in the Building Good Containers section of the Node.js reference architecture.
We hope you find these recommendations useful. While you wait for the next installment in the Introduction to the Node.js reference architecture series, you can check out the GitHub project to explore sections that might be covered in future articles.
If you want to learn more about what Red Hat is up to on the Node.js front, you can also explore the Node.js topic page.

Last updated: September 12, 2022