Many applications require more computation than can be handled by a single thread, CPU, process, or machine. This installment of the ongoing Node.js reference architecture series shares the team's experience with meeting the need for more computational resources in your Node.js application.
Follow the series:
- Part 1: Overview of the Node.js reference architecture
- Part 2: Logging in Node.js
- Part 3: Code consistency in Node.js
- Part 4: GraphQL in Node.js
- Part 5: Building good containers
- Part 6: Choosing web frameworks
- Part 7: Code coverage
- Part 8: TypeScript
- Part 9: Securing Node.js applications
- Part 10: Accessibility
- Part 11: Typical development workflows
- Part 12: npm development
- Part 13: Problem determination
- Part 14: Testing
- Part 15: Transaction handling
- Part 16: Load balancing, threading, and scaling
But Node.js is single-threaded?
While a Node.js process operates in a single-threaded model by default, current versions of Node.js support worker threads that you can use to start additional threads of execution, each with its own event loop.
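As a minimal sketch of what starting an additional thread of execution looks like, the following uses the built-in worker_threads module. The greetFromWorker helper and the inline worker source (loaded with eval: true only to keep the example in a single file) are illustrative assumptions, not part of the reference architecture.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source kept inline (eval: true) so the sketch fits in one file.
// A real application would normally keep this in its own module.
const workerSource = `
  const { parentPort, threadId } = require('node:worker_threads');
  // This callback runs on the worker's own event loop.
  parentPort.once('message', (name) => {
    parentPort.postMessage({ greeting: 'hello ' + name, threadId });
  });
`;

// Hypothetical helper: starts a worker, sends it a name, resolves with its reply.
function greetFromWorker(name) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (reply) => {
      resolve(reply);
      worker.terminate();
    });
    worker.once('error', reject);
    worker.postMessage(name);
  });
}
```

Calling greetFromWorker('reader') resolves with an object whose greeting is 'hello reader' and whose threadId identifies the worker rather than the main thread.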
In addition, Node.js applications are often made up of multiple different microservices and multiple copies of each microservice, allowing the overall solution to leverage many concurrent threads available in a single computer or across a group of computers.
The reality is that applications based on Node.js can and do leverage multiple threads of execution over one or more computers. How to balance this work across threads, processes, and computers and scale it in times of increased demand is an important topic for most deployments.
Keep it simple
The team's experience is that, when possible, applications should be designed so that a request to a microservice running in a container needs no more than a single thread of execution to complete in a reasonable time. If that is not possible, the team recommends worker threads over multiple processes running in a single container, because communication between threads of execution is less complex and carries less overhead than communication between processes.
Worker threads are also likely appropriate for desktop-type applications where it is known that you cannot scale beyond the resources of a single machine, and it is preferable to have the application show up as a single process instead of many individual processes.
The team had a very interesting discussion around longer-running requests. Sometimes, you need to do computation that will take a while to complete, and you cannot break up that work.
The discussion centered around the following question: If we have a separate microservice that handles longer-running requests and it's okay for all requests of that type to be handled sequentially, can we just run those on the main thread?
Most often, the answer turns out to be no. Even in that case, the microservice typically exposes other APIs, such as health and readiness endpoints, that must respond in a reasonable amount of time while it is running. If any request will occupy the main thread for a substantial amount of time, rather than completing quickly or waiting asynchronously so that other work can execute, you will need to use worker threads.
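The following sketch illustrates that situation under some assumptions: the /healthz and /sum routes, the sumInWorker helper, and the summing loop that stands in for a long computation are all hypothetical. The long-running work is handed to a worker thread so that the main thread stays free to answer the health check.

```javascript
const http = require('node:http');
const { Worker } = require('node:worker_threads');

// Hypothetical CPU-bound job: a plain summing loop standing in for real work.
// Kept inline (eval: true) only so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let total = 0;
  for (let i = 1; i <= workerData.limit; i++) total += i;
  parentPort.postMessage(total);
`;

// Runs the loop on a worker thread and resolves with the result.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Builds the server without starting it; route names are assumptions.
function createServer() {
  return http.createServer((req, res) => {
    if (req.url === '/healthz') {
      // Answered on the main thread, which is never blocked by the sum.
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
      return;
    }
    if (req.url === '/sum') {
      sumInWorker(1e9)
        .then((total) => {
          res.writeHead(200, { 'Content-Type': 'text/plain' });
          res.end(String(total));
        })
        .catch((err) => {
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end(err.message);
        });
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

Starting it with createServer().listen(8080) (the port is arbitrary) and requesting /sum should leave /healthz responsive while the worker is busy. Running the same loop directly in the request handler would stall every other request until it finished.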
Load balancing and scaling
For requests that complete in a timely manner, you might still need more CPU cycles than a single thread can provide in order to keep up with a larger number of requests. API request handlers in Node.js are most often designed to have no internal state, so multiple copies can execute simultaneously. Through the Cluster API, Node.js has long supported running multiple processes to allow concurrent execution of requests.
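For context, a minimal sketch of the Cluster API might look like the following. The startClusteredServer function name, the port parameter, and the choice of one worker process per CPU are assumptions made for illustration. The sketch uses cluster.isPrimary, which assumes a reasonably recent Node.js release.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical entry point: the primary process forks worker processes,
// and each worker process runs its own copy of the stateless HTTP handler.
function startClusteredServer(port, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker) => {
      console.log(`worker process ${worker.process.pid} exited`);
    });
    return;
  }

  // Worker processes share the listening port; incoming connections
  // are distributed among them by the cluster module.
  http
    .createServer((req, res) => {
      res.end(`handled by process ${process.pid}\n`);
    })
    .listen(port);
}
```

Each forked process re-runs the same script, so the script would call startClusteredServer(8080) (an arbitrary port) at the top level in both the primary and the workers.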
As you have likely read in other parts of the Node.js reference architecture, most modern applications run in containers, and often, those containers are managed through tools like Kubernetes. In this context, the team recommends delegating load balancing and scaling to the highest layer possible instead of using the Cluster API. For example, if you deploy the application to Kubernetes, use the load balancing and scaling built into Kubernetes. In our experience, this has been just as efficient or more efficient than trying to manage it at a lower level through tools like the Cluster API.
Threads versus processes
A common question is whether it is better to scale using threads or processes. The multiple threads of execution available within a single machine can typically be exploited either within a single process or by starting multiple processes. Processes provide better isolation, but they also offer fewer opportunities to share resources and make communication between threads of execution more costly. Using multiple threads within a single process might scale more efficiently, but it has the hard limit of only being able to scale to the resources provided by a single machine.
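One concrete example of the resource sharing that threads allow is memory. In the sketch below, several worker threads update a single counter held in a SharedArrayBuffer without copying it, whereas separate processes would have to exchange serialized messages instead. The countInWorkers helper and its parameters are hypothetical names chosen for this illustration.

```javascript
const { Worker } = require('node:worker_threads');

// Each worker adds to the same Int32 counter using atomic operations.
// Kept inline (eval: true) only so the sketch fits in one file.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1);
  }
`;

// Hypothetical helper: starts workerCount threads that share one buffer and
// resolves with the final counter value once every worker has exited.
function countInWorkers(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const finished = [];
  for (let i = 0; i < workerCount; i++) {
    finished.push(
      new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, {
          eval: true,
          // The SharedArrayBuffer is shared with the worker, not copied.
          workerData: { shared, increments },
        });
        worker.once('error', reject);
        worker.once('exit', resolve);
      })
    );
  }
  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}
```

With four workers each performing 1,000 increments, countInWorkers(4, 1000) resolves with 4000, because every thread writes to the same memory.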
As described in earlier sections, the team's experience is that using worker threads when needed but otherwise leaving load balancing and scaling to management layers outside of the application itself (for example, Kubernetes) results in the right balance between the use of threads and processes across the application.
Learn more about Node.js reference architecture
I hope that this quick overview of the load balancing, scaling, and multithreading part of the Node.js reference architecture, along with the team discussions that led to that content, has been helpful, and that the information shared in the architecture helps you in your future implementations.
We plan to cover new topics regularly for the Node.js reference architecture series. Until the next installment, we invite you to visit the Node.js reference architecture repository on GitHub, where you will see the work we have done and future topics.
To learn more about what Red Hat is up to on the Node.js front, check out our Node.js page.