Load balancing, threading, and scaling in Node.js

Introduction to the Node.js reference architecture, Part 16

Many applications require more computation than a single thread, CPU, process, or machine can handle. This installment of the ongoing Node.js reference architecture series covers the team's experience with meeting the need for more computational resources in your Node.js application.

Follow the series:

Part 1: Overview of the Node.js reference architecture
Part 2: Logging in Node.js
Part 3: Code consistency in Node.js
Part 4: GraphQL in Node.js
Part 5: Building good containers
Part 6: Choosing web frameworks
Part 7: Code coverage
Part 8: TypeScript
Part 9: Securing Node.js applications
Part 10: Accessibility
Part 11: Typical development workflows
Part 12: npm development
Part 13: Problem determination
Part 14: Testing
Part 15: Transaction handling
Part 16: Load balancing, threading, and scaling
Part 17: CI/CD best practices in Node.js
Part 18: Wrapping up

But Node.js is single-threaded?

Node.js is often described as single-threaded. That is not entirely true, but it reflects the fact that most work is done on a single thread running the event loop. The asynchronous nature of JavaScript means that Node.js can handle a large number of concurrent requests on that single thread. If that is the case, why are we even talking about threading?

Although a Node.js process operates in a single-threaded model by default, current versions of Node.js support worker threads that you can use to start additional threads of execution, each with its own event loop.
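As a rough illustration of worker threads, the sketch below moves a CPU-bound loop off the main thread using the built-in `node:worker_threads` module. The helper name `sumInWorker` and the summation task are hypothetical and chosen only for this example. The worker source is inlined with the `eval: true` option so that the sketch fits in one file; a real application would more likely keep the worker code in its own module.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source, inlined so the sketch is self-contained (assumption:
// a real project would normally place this in a separate file).
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');

  // CPU-bound work: sum the integers from 1 to workerData.limit.
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) {
    sum += i;
  }
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the summation on a worker thread and
// resolves with the value the worker posts back.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { limit },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

// The main thread's event loop stays free while the worker computes.
sumInWorker(1_000_000).then((sum) => {
  console.log(`Sum computed on a worker thread: ${sum}`);
});
```

The worker runs on its own thread with its own event loop, and the result comes back to the main thread as a message.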

In addition, Node.js applications are often made up of multiple different microservices and multiple copies of each microservice, allowing the overall solution to leverage many concurrent threads available in a single computer or across a group of computers.

The reality is that applications based on Node.js can and do leverage multiple threads of execution over one or more computers. How to balance this work across threads, processes, and computers and scale it in times of increased demand is an important topic for most deployments.

Keep it simple

The team's experience is that, when possible, applications should be designed so that a request to a microservice running in a container needs no more than a single thread of execution to complete in a reasonable time. If that is not possible, worker threads are recommended over multiple processes running in a single container, because they involve lower complexity and less overhead when communicating between threads of execution.

Worker threads are also likely appropriate for desktop-type applications where it is known that you cannot scale beyond the resources of a single machine, and it is preferable to have the application show up as a single process instead of many individual processes.

Long-running requests

The team had a very interesting discussion around longer-running requests. Sometimes, you need to do computation that will take a while to complete, and you cannot break up that work.

The discussion centered on the following question: If we have a separate microservice that handles longer-running requests and it's okay for all requests of that type to be handled sequentially, can we just run those on the main thread's event loop?

Most often, the answer turns out to be no. Even in that case, you typically have other APIs, such as health and readiness APIs, that need to respond in a reasonable amount of time while the microservice is running. If any request is going to take a substantial amount of time, rather than completing quickly or waiting asynchronously so that other work can execute on the main thread, you will need to use worker threads.
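A minimal sketch of this situation follows, assuming a hypothetical service with two endpoints: `/compute` for the long-running request and `/healthz` for the health check. Neither name comes from the reference architecture. The long computation is simulated with a busy loop that runs on a worker thread, so the main thread can keep answering health checks in the meantime.

```javascript
const http = require('node:http');
const { Worker } = require('node:worker_threads');

// Inlined worker source standing in for a long, indivisible computation.
// The busy loop deliberately blocks the worker's thread, not the main one.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const end = Date.now() + workerData.busyMs;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  parentPort.postMessage({ iterations });
`;

// Hypothetical helper: runs the simulated long task on a worker thread.
function runLongTask(busyMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { busyMs },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    // Answered on the main thread, which is never blocked by /compute.
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok' }));
  } else if (req.url === '/compute') {
    runLongTask(500).then(
      (result) => {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(result));
      },
      (err) => {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
        res.end(err.message);
      }
    );
  } else {
    res.writeHead(404);
    res.end();
  }
});

// Port 0 asks the operating system for any free port when PORT is unset.
server.listen(Number(process.env.PORT) || 0, () => {
  console.log(`Listening on port ${server.address().port}`);
});
```

If the busy loop ran directly inside the `/compute` handler, a `/healthz` request arriving during the computation would have to wait until the loop finished.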

Load balancing and scaling

Even when requests complete in a timely manner, you might still need more CPU cycles than a single thread can provide in order to keep up with a larger number of requests. API request handlers in Node.js are most often designed to hold no internal state, so multiple copies can execute simultaneously. Node.js has long supported running multiple processes through the Cluster API to allow concurrent execution of requests.

As you have likely read in other parts of the Node.js reference architecture, most modern applications run in containers, and often, those containers are managed through tools like Kubernetes. In this context, the team recommends delegating load balancing and scaling to the highest layer possible instead of using the Cluster API. For example, if you deploy the application to Kubernetes, use the load balancing and scaling built into Kubernetes. In our experience, this has been just as efficient or more efficient than trying to manage it at a lower level through tools like the Cluster API.

Threads versus processes

A common question is whether it is better to scale using threads or processes. The multiple CPU threads available on a single machine can typically be exploited either within a single process or by starting multiple processes. Processes provide better isolation, but they offer fewer opportunities to share resources and make communication between threads of execution more costly. Using multiple threads within a single process might scale more efficiently, but it has a hard limit: it can only scale to the resources provided by a single machine.
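One way to picture the resource sharing available to threads within a single process is the sketch below, in which several worker threads increment the same counter through a `SharedArrayBuffer`. The helper name `incrementInWorkers` and the counter task are hypothetical. Separate processes cannot share a JavaScript buffer like this and would exchange messages instead, which is part of the communication cost mentioned above.

```javascript
const { Worker } = require('node:worker_threads');

// Inlined worker source: each worker increments the shared counter.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    // Atomic add, so concurrent updates from other workers are not lost.
    Atomics.add(counter, 0, 1);
  }
`;

// Hypothetical helper: starts `threads` workers that each add
// `increments` to one shared counter, then resolves with the total.
function incrementInWorkers(threads, increments) {
  // A SharedArrayBuffer passed through workerData is shared, not copied.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const finished = Array.from({ length: threads }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      worker.once('error', reject);
      worker.once('exit', (code) => {
        if (code === 0) {
          resolve();
        } else {
          reject(new Error(`Worker stopped with exit code ${code}`));
        }
      });
    })
  );

  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

incrementInWorkers(4, 10_000).then((total) => {
  console.log(`Final counter value: ${total}`);
});
```

All of the workers read and write the same memory, so no data is copied between them. That sharing stops at the boundary of the process, and therefore of the machine.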

As described in earlier sections, the team's experience is that using worker threads when needed but otherwise leaving load balancing and scaling to management layers outside of the application itself (for example, Kubernetes) results in the right balance between the use of threads and processes across the application.

Learn more about Node.js reference architecture

I hope that this quick overview of the load balancing, scaling and multithreading part of the Node.js reference architecture, along with the team discussions that led to that content, has been helpful, and that the information shared in the architecture helps you in your future implementations.

We cover new topics regularly for the Node.js reference architecture series. In the next installment, we look at continuous integration/continuous delivery (CI/CD) in the Node.js landscape and the guidelines recommended by the Node.js reference architecture team.

We invite you to visit the Node.js reference architecture repository on GitHub, where you will see the work we have done and future topics. To learn more about what Red Hat is up to on the Node.js front, check out our Node.js page.

Last updated: January 9, 2024
