Don't block the Event Loop - Part 1

If you are writing any non-trivial application, this article will help you make it faster and more secure.

In this article, we will learn how Node.js handles workloads and which kinds of code run on the Event Loop versus the Worker Pool.

Summary

Node.js runs JavaScript code in the Event Loop and provides a Worker Pool to handle expensive tasks such as file I/O. Node.js scales well, sometimes better than more heavyweight approaches like the Apache server. The key to this scalability is that Node.js uses a small number of threads to handle many clients. With fewer threads, Node.js can spend more of the system's time and memory serving clients rather than paying the overhead of threads (memory and context switching). However, because Node.js has only a few threads, we need to structure our applications to use them wisely.

As a rule of thumb, Node.js performs well when the work associated with each client at any given time is "small". In other words, each piece of code should finish as quickly as possible.

This applies both to callbacks running on the Event Loop and to tasks running on the Worker Pool.

Why should I avoid blocking the Event Loop and the Worker Pool?

Node.js uses a small number of threads to handle many client connections. In Node.js, there are two types of threads: one Event Loop (also known as the main loop, main thread, event thread, etc.) and a pool of n Workers in the Worker Pool.

When a thread takes a long time to execute a callback (on the Event Loop) or a task (on a Worker), we say it is "blocked". While a thread is blocked working for one client, it cannot handle requests from any other client. This gives two reasons to avoid blocking either kind of thread:

Performance: If heavy operations are frequently performed on either type of thread, the server's throughput (requests per second) will suffer.
Security: If certain inputs can cause a thread to block, a malicious client can submit such inputs to tie up your threads and keep the server from working on other clients. This is known as a Denial of Service (DoS) attack.
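To make "blocking" concrete, here is a minimal sketch that measures how late a timer fires when synchronous code hogs the Event Loop. The helper names busyWait and measureTimerDelay are hypothetical and exist only for this illustration; busyWait stands in for any CPU-heavy synchronous work.

```javascript
// Spin on the CPU for roughly `ms` milliseconds without yielding.
// Hypothetical stand-in for any heavy synchronous computation.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: the thread is busy and cannot do anything else
  }
}

// Schedule a 0 ms timer, then block for `blockMs`.
// Resolves with how long the timer actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    // The timer is due immediately, but its callback can only run
    // after the current synchronous code returns to the Event Loop.
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`timer fired after about ${delay} ms`);
});
```

In a real server the delayed timer would be another client's request: it waits for as long as the blocking code runs.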

A quick overview of Node.js

Node.js uses an Event-Driven architecture: it has an Event Loop for orchestration and a Worker Pool for expensive tasks.

Which code runs on the Event Loop?

When a Node.js application starts, it first completes an initialization phase, importing modules and "registering" callback functions for events. The application then enters the Event Loop, responding to incoming client requests by executing the appropriate callback. Each callback executes synchronously and may "register" asynchronous requests to continue processing after it completes. The callbacks for these asynchronous requests are also executed on the Event Loop.

The Event Loop also fulfills the non-blocking asynchronous requests made by its callbacks, such as network I/O.

In summary, the Event Loop executes the JavaScript callback functions "registered" for events and handles non-blocking asynchronous requests like network I/O.
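The sequence described above can be sketched as follows. This is an illustration only: the function name runDemo is hypothetical, the setTimeout call stands in for "a client request arrived", and fs.stat is just one convenient asynchronous request to register from inside a callback.

```javascript
const fs = require('fs');

// Records the order in which each piece of code runs.
function runDemo() {
  const order = [];
  return new Promise((resolve) => {
    // Initialization phase: runs synchronously, top to bottom.
    order.push('init: start');

    // "Register" a callback; it runs later, on the Event Loop.
    setTimeout(() => {
      order.push('request callback: start');

      // The callback registers a further asynchronous request.
      // Its callback also runs on the Event Loop, after this one returns.
      fs.stat('.', () => {
        order.push('follow-up callback');
        resolve(order);
      });

      order.push('request callback: end');
    }, 0);

    order.push('init: end');
  });
}

runDemo().then((order) => console.log(order));
// [ 'init: start', 'init: end', 'request callback: start',
//   'request callback: end', 'follow-up callback' ]
```

Each callback runs to completion before the next one starts, which is why a single slow callback delays everything queued behind it.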

Which code runs on the Worker Pool?

The Worker Pool in Node.js is implemented in libuv.

Node.js uses the Worker Pool to handle "expensive" tasks. This includes I/O for which the operating system does not provide a non-blocking version, as well as particularly CPU-intensive tasks.

Here are the Node.js module APIs that use the Worker Pool:

I/O-intensive:
- DNS: dns.lookup(), dns.lookupService().
- File system: All file system APIs, except fs.FSWatcher() and those that are explicitly synchronous, use libuv's thread pool.
CPU-intensive:
- Crypto: crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair().
- Zlib: All zlib APIs, except those that are explicitly synchronous, use libuv's thread pool.

In many Node.js applications, these APIs are the only source of tasks for the Worker Pool. Applications and modules that use C++ add-ons can submit other tasks to the Worker Pool.
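As a small illustration of one of the APIs listed above, the sketch below submits several crypto.pbkdf2() calls at once. The helper names deriveKey and deriveMany are hypothetical, and the salt, iteration count, and key length are arbitrary demo values, not recommendations.

```javascript
const crypto = require('crypto');

// Derive a key with the asynchronous crypto.pbkdf2().
// The hashing runs on a Worker Pool thread, not on the Event Loop.
function deriveKey(password) {
  return new Promise((resolve, reject) => {
    // 'demo-salt', 100000 iterations and 32 bytes are illustrative values.
    crypto.pbkdf2(password, 'demo-salt', 100000, 32, 'sha256', (err, key) => {
      if (err) reject(err);
      else resolve(key);
    });
  });
}

// Submit every derivation immediately; the Worker Pool works through them.
function deriveMany(passwords) {
  return Promise.all(passwords.map(deriveKey));
}

const started = Date.now();
deriveMany(['a', 'b', 'c', 'd']).then((keys) => {
  console.log(`${keys.length} keys derived in ${Date.now() - started} ms`);
});

// The Event Loop is not occupied by the hashing, so it reaches this line
// right after submitting the tasks.
console.log('tasks submitted; Event Loop is free for other callbacks');
```

libuv's thread pool has four threads by default, and the UV_THREADPOOL_SIZE environment variable changes that number. Submitting more tasks than there are threads means the extra tasks wait their turn.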

How does Node.js decide which code to run next?

Abstractly, the Event Loop and the Worker Pool maintain queues for pending events and pending tasks, respectively.

In reality, the Event Loop does not actually maintain a queue. Instead, it has a set of file descriptors that it asks the operating system to monitor, using a mechanism like epoll (Linux), kqueue (macOS), event ports (Solaris), or IOCP (Windows). These file descriptors correspond to network sockets, any files it is watching, and so on. When the operating system reports that one of these file descriptors is ready, the Event Loop translates it into the appropriate event and invokes the callbacks associated with that event.

In contrast, the Worker Pool uses a real queue whose entries are tasks waiting to be processed. A Worker takes a task from this queue and works on it until completion. When a Worker finishes a task, it raises an "At least one task has completed" event for the Event Loop.
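The Worker Pool queue described above can be modelled with a toy simulation. This is not how libuv is implemented; it is a sketch that assumes every task is already queued at time 0 and that each task's duration is known. The name simulateWorkerPool is hypothetical.

```javascript
// Toy model: `workerCount` workers drain a FIFO queue of tasks.
// `durations[i]` is how long task i takes; the result is when each task finishes.
function simulateWorkerPool(durations, workerCount) {
  // freeAt[w] is the time at which worker w becomes idle.
  const freeAt = new Array(workerCount).fill(0);
  const finishTimes = [];

  for (const duration of durations) {
    // The task at the head of the queue goes to whichever worker is free first.
    let w = 0;
    for (let i = 1; i < workerCount; i++) {
      if (freeAt[i] < freeAt[w]) w = i;
    }
    // The worker processes the task until completion, without interruption.
    freeAt[w] += duration;
    finishTimes.push(freeAt[w]);
  }
  return finishTimes;
}

console.log(simulateWorkerPool([5, 1, 1, 1], 2)); // [ 5, 1, 2, 3 ]
console.log(simulateWorkerPool([5, 1, 1, 1], 1)); // [ 5, 6, 7, 8 ]
```

With two workers, the long task occupies one worker while the other handles the short tasks. With one worker, every short task waits behind the long one.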

What does this mean for application design?

In a one-thread-per-client system like Apache, each pending client is assigned its own thread. If a thread handling one client takes a long time, the operating system interrupts it and gives another client a turn. Thus, the operating system ensures that clients requesting a small amount of work are not penalized by clients requesting more work.

Because Node.js handles many clients with few threads, if a thread blocks while handling one client's request, other pending requests may not get a turn until that thread finishes its callback or task. Therefore, treating clients fairly is the responsibility of your application. This means that you should not do too much work for any one client in a single callback or task.

This is part of the reason why Node.js can scale well, but it also means that you are responsible for ensuring fairness.
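The difference between the two models can be sketched with a small simulation. It is a deliberate simplification: it assumes a single CPU, requests that all arrive at time 0, and workloads expressed in abstract time units. The function names runToCompletion and timeSliced are hypothetical.

```javascript
// One thread runs each request to completion, in arrival order
// (what happens when a Node.js callback or task does all its work at once).
function runToCompletion(workloads) {
  let clock = 0;
  return workloads.map((work) => (clock += work));
}

// A scheduler gives each unfinished request `quantum` units in turn
// (a rough model of the operating system preempting one thread per client).
function timeSliced(workloads, quantum = 1) {
  const remaining = workloads.slice();
  const finishedAt = new Array(workloads.length).fill(0);
  let pending = remaining.filter((work) => work > 0).length;
  let clock = 0;

  while (pending > 0) {
    for (let i = 0; i < remaining.length; i++) {
      if (remaining[i] <= 0) continue; // this request is already done
      const step = Math.min(quantum, remaining[i]);
      remaining[i] -= step;
      clock += step;
      if (remaining[i] <= 0) {
        finishedAt[i] = clock;
        pending -= 1;
      }
    }
  }
  return finishedAt;
}

const workloads = [100, 1, 1]; // one heavy request ahead of two light ones
console.log(runToCompletion(workloads)); // [ 100, 101, 102 ]
console.log(timeSliced(workloads));      // [ 102, 2, 3 ]
```

In the run-to-completion case the two light requests wait about 100 units behind the heavy one, which is the unfairness the application itself has to prevent in Node.js.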

Summary

In this article, we discussed how Node.js divides work between the Event Loop and the Worker Pool, which code each of them runs, and why we should not block either of them. In the next article, we will explore what happens when the Event Loop is blocked and which kinds of code can block it.
