Node.js Architecture - Single thread, call stack, synchronous and asynchronous I/O in Node.js

Issue

Most programming languages execute code synchronously by default. This means that the code is executed sequentially, from top to bottom, and each statement finishes before the next one starts.

An example in Go: a simple program that prints the words "Hello" and "World" with a 2-second delay between them:

package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Printf("Hello")
    time.Sleep(2 * time.Second)
    fmt.Printf("World")
}

However, in JavaScript (and therefore Node.js), a function can be either synchronous or asynchronous. An asynchronous function does not return its result immediately; the result becomes available at some later point, and in the meantime Node.js continues running the code that follows.

Here is a Node.js program that looks as if it should print "Hello" and then "World", but actually prints "World" first and "Hello" second:

setTimeout(function() {
  console.log("Hello");
}, 0);

console.log("World");

Because setTimeout is an asynchronous function, Node.js "takes note" of the callback to run later and moves on. The callback runs only after the current synchronous code has finished, so "Hello" is printed after "World", even with a delay of 0.
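To make that ordering easier to see, here is a minimal sketch that uses only the built-in timer. The order array is a hypothetical helper I added just to record what runs when; it is not part of the original example.

```javascript
// Record the order in which each piece of code actually runs.
const order = [];

setTimeout(function () {
  // Runs only after all of the synchronous code below has finished.
  order.push("Hello");
  console.log(order.join(" "));
}, 0);

order.push("World");

// Even a long synchronous loop is not interrupted by the 0 ms timer.
let sum = 0;
for (let i = 0; i < 1e6; i++) {
  sum += i;
}
order.push("loop done");
```

When the timer callback finally runs, it prints "World loop done Hello": everything synchronous comes first, no matter how small the delay is.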

So how does this "asynchronous" behavior affect Node.js? Let's find out in this article!

Single Thread

You may have heard that JavaScript or Node.js is single-threaded. This means that your JavaScript code runs in a single thread. If that's the case, then tasks such as file I/O, making external HTTP requests, etc. would have to be done sequentially, which would result in slow response times!

Imagine you are writing an API server with an endpoint that contains JavaScript code, and each request takes an average of 5 seconds to complete. When the second request comes, it has to wait for 5 seconds before it can continue processing!?

That is true only when all of the code in the handler is synchronous. When asynchronous functions are involved, the second request does not have to wait the full 5 seconds. This is because Node.js handles asynchronous functions using the Event Loop, with the help of the Event Queue and the Thread Pool. To avoid overwhelming you with too many concepts, let me introduce them one by one.

In Node.js, there are synchronous and asynchronous functions. Basic statements such as if-else, switch-case, and loops, as well as functions like JSON.parse, are executed synchronously. Node.js also ships built-in synchronous functions, such as readFileSync and gzipSync. Asynchronous functions include readFile, gzip, HTTP requests made through the http module, and most third-party libraries that interact with files, databases, and so on.
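As a rough illustration of the two flavors, the sketch below compresses the same string with gzipSync and with gzip from the built-in zlib module. The input string is just a placeholder I chose for the demonstration.

```javascript
const zlib = require("zlib");

const input = "hello hello hello hello";

// Synchronous variant: the compressed Buffer is the return value,
// and nothing else runs until it is ready.
const compressedSync = zlib.gzipSync(input);
const roundTrip = zlib.gunzipSync(compressedSync).toString();
console.log("sync round trip ok:", roundTrip === input);

// Asynchronous variant: the work is handed off and the result
// arrives later through the callback.
zlib.gzip(input, function (err, compressed) {
  if (err) throw err;
  console.log("async result arrived, bytes:", compressed.length);
});

console.log("this line runs before the async callback");
```

The last console.log is reached before the gzip callback fires, which is the same "take note and move on" behavior we saw with setTimeout.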

Here is a general diagram of the components in Node.js:

You can see that Node.js consists of three main components: Chrome's V8, the Node.js Standard Library, and libuv. V8 is where JavaScript code is executed. The Node.js Standard Library provides the functionality that V8 alone does not offer, such as file operations and HTTP requests. The third component, libuv, provides the Event Loop and the Thread Pool that Node.js uses for asynchronous I/O.

Call Stack

The call stack is where the functions currently being executed are tracked. Each function call is pushed onto the stack, and the stack determines the order of execution. At any given time, only one piece of code is being processed: the function on top of the stack.

To illustrate this further, let's take a look at an example that converts Celsius to Fahrenheit:

const add = (a, b) => a + b;
const multiply = (a, b) => a * b;

const addCofficient = (val) => multiply(val, 1.8);
const addConst = (val) => add(val, 32);

const convertCtoF = (val) => {
  let result = val;
  result = addCofficient(result);
  result = addConst(result);
  return result;
};

convertCtoF(100);

In the above example, we call the convertCtoF function, which calls addCofficient and addConst. These two functions in turn call multiply and add, respectively.

The execution order of these functions in the call stack is described in the following diagram:

We can see that convertCtoF is pushed onto the call stack first, followed by addCofficient. Since addCofficient calls multiply, multiply is pushed on top of it. When multiply returns, it is popped off the stack, then addCofficient is popped, and the same push-and-pop sequence repeats for addConst and add. The function pushed most recently is always the first to finish. This is known as Last In, First Out (LIFO), or equivalently First In, Last Out (FILO), which is why this structure is called a stack.
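The push and pop order can also be observed in code. In the sketch below, stack, trace, and the traced wrapper are hypothetical helpers that only imitate the engine's real call stack; convertCtoF is condensed to a single expression, but the calls happen in the same order as in the example above.

```javascript
// `stack` is a hypothetical stand-in for the engine's call stack;
// `trace` records every push and pop so the order can be inspected.
const stack = [];
const trace = [];

// Wrap a function so that entering it pushes its name and leaving it pops it.
const traced = (name, fn) => (...args) => {
  stack.push(name);
  trace.push("push " + name);
  const result = fn(...args);
  stack.pop();
  trace.push("pop " + name);
  return result;
};

const add = traced("add", (a, b) => a + b);
const multiply = traced("multiply", (a, b) => a * b);
const addCofficient = traced("addCofficient", (val) => multiply(val, 1.8));
const addConst = traced("addConst", (val) => add(val, 32));
const convertCtoF = traced("convertCtoF", (val) => addConst(addCofficient(val)));

const fahrenheit = convertCtoF(100);
console.log(fahrenheit);
console.log(trace.join("\n"));
```

In the printed trace, every "pop" matches the most recent "push" that has not been popped yet, and convertCtoF, the first function in, is the last one out.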

If an error or exception occurs during execution, the error output includes a stack trace that shows where the error happened. Because functions are pushed onto the call stack in order, the stack trace can list exactly which calls led to the failing line.

For example, let's modify the addConst function by changing the second parameter in the add function to a variable that doesn't exist in the program:

const addConst = (val) => add(val, number);

When running the program, an error will be thrown, including the cause and the location of the error:

ReferenceError: number is not defined
   at addConst:5:32
   at convertCtoF:10:12
   at eval:14:1

=> This means that number is not defined. The error occurred at line 5, column 32, inside addConst, which was called from convertCtoF at line 10, column 12, which in turn was called from the top level of the program at line 14, column 1.
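If you want to inspect the trace from inside the program, you can catch the error and read its stack property. This is a minimal sketch based on the conversion example; the caught variable is a hypothetical helper, and the exact file names and positions in the printed trace depend on how you run the code.

```javascript
const add = (a, b) => a + b;
const multiply = (a, b) => a * b;

const addCofficient = (val) => multiply(val, 1.8);
// `number` is deliberately left undefined to trigger the error.
const addConst = (val) => add(val, number);

const convertCtoF = (val) => addConst(addCofficient(val));

let caught = null;
try {
  convertCtoF(100);
} catch (err) {
  caught = err;
  console.log(err.name + ": " + err.message);
  // err.stack lists the active calls, innermost call first.
  console.log(err.stack);
}
```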

You may have heard that JavaScript runs on a single thread, but if that's the case, wouldn't it be slow? Or some may say that Node.js is fundamentally multi-threaded!? It sounds contradictory, doesn't it? On one hand, we say that JavaScript is single-threaded, and on the other hand, we say that Node.js is based on JavaScript and yet it is multi-threaded. So what is the truth? Is Node.js single-threaded or not?

The answer is yes: your JavaScript code runs on a single thread in Node.js. However, Node.js cleverly hands time-consuming tasks over to another place (libuv), and that place handles them in a multi-threaded manner!
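One way to get a feel for this is with crypto.pbkdf2, a built-in function whose asynchronous form runs on libuv's thread pool. The sketch below is only an illustration: the password, salt, and iteration count are arbitrary values I picked, and the timings you see will depend on your machine.

```javascript
const crypto = require("crypto");

const start = Date.now();
let completed = 0;

// Start four CPU-heavy hashes. Each call returns immediately; the hashing
// itself is carried out by libuv worker threads.
for (let i = 1; i <= 4; i++) {
  crypto.pbkdf2("password", "salt", 100000, 64, "sha512", function (err) {
    if (err) throw err;
    completed += 1;
    console.log("hash " + i + " done after " + (Date.now() - start) + " ms");
  });
}

// Still on the main thread: none of the callbacks can have run yet.
console.log("scheduled 4 hashes, completed so far:", completed);
```

On a multi-core machine the four completion times are usually close to each other instead of growing step by step, which suggests that the hashes ran side by side while the single JavaScript thread stayed free.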

I/O Tasks

I/O tasks in Node.js typically refer to operations such as file read/write or network-related activities like making HTTP requests. In a real-world server program, I/O tasks are common and you probably use them frequently. These tasks take a considerable amount of time to process because they are related to factors such as file size, network bandwidth, or server processing speed.

In Node.js, I/O consists of two types: synchronous and asynchronous.

Synchronous I/O

Let's consider an example of reading files:

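const fs = require("fs");

// readFileSync blocks until the whole file is read, then returns a Buffer.
const pdf = fs.readFileSync("file.pdf");
console.log("pdf size", pdf.length);
const doc = fs.readFileSync("file.doc");
console.log("doc size", doc.length);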

Reading a file is a time-consuming task. readFileSync is a synchronous function, which means that file.pdf is read completely and its size is printed before Node.js even starts reading file.doc. The processing time for the two tasks is described in the following diagram:

We can see that the time to read file.pdf is 3ms, file.doc is 3ms, and the total time we have to wait for all the tasks to complete is 6ms.

6ms may be fast in this example, but imagine if the file sizes increase, resulting in 30 seconds of reading time for each file? Then the call stack would be blocked, which means that during the time of reading the file, no other code would be processed. The code would run in the order: Read file -> Print -> Read file -> Print sequentially.
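The blocking effect can be sketched without any large files. In the code below, blockFor is a hypothetical stand-in for a slow synchronous call such as readFileSync on a big file; it simply keeps the call stack busy for a while.

```javascript
const start = Date.now();
let timerFiredAt = null;

// Ask for a callback as soon as possible.
setTimeout(function () {
  timerFiredAt = Date.now() - start;
  console.log("timer fired after about " + timerFiredAt + " ms");
}, 0);

// Hypothetical stand-in for a slow synchronous call: keep the call stack
// busy for roughly `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait
  }
}

blockFor(100);
const blockedFor = Date.now() - start;
console.log("synchronous work took about " + blockedFor + " ms");
```

Although the timer was requested with a delay of 0, its callback cannot run until the blocking call has returned, so it fires roughly 100 ms late.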

Asynchronous I/O

Now let's modify the above code a bit by replacing the readFileSync function with readFile:

const fs = require("fs").promises;

// The promise-based readFile returns a pending Promise, not the file contents.
const pdf = fs.readFile("file.pdf");
console.log("pdf size", pdf.length);
const doc = fs.readFile("file.doc");
console.log("doc size", doc.length);

readFile is an asynchronous function (here, the promise-based version from fs.promises). An asynchronous function does not return its result immediately; the result becomes available at some later point. If you run the code above, you will see output like this:

pdf size undefined
doc size undefined

That's because pdf and doc are still pending Promises when console.log runs, not the file contents, so reading their length property yields undefined.

To get the result of an asynchronous function, we use a callback. In simple terms, a callback is a function that is called once the asynchronous function has a result.

It may be challenging to imagine, but let me give you an example that is easy to understand. I will modify the above code a bit:

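const fs = require("fs").promises;

// then registers a callback that runs once the file has been read.
fs.readFile("file.pdf")
  .then(pdf => console.log("pdf size", pdf.length));

fs.readFile("file.doc")
  .then(doc => console.log("doc size", doc.length));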

then is used to register a callback on the Promise returned by the asynchronous function. Alternatively, with the callback-based fs module, the callback can be passed as the last argument to readFile, like this:

const fs = require("fs");

fs.readFile("file.pdf", function (err, pdf) {
  if (err) throw err;
  console.log("pdf size", pdf.length);
});

By replacing readFileSync with readFile, the total processing time is reduced significantly, because the two files are read almost in parallel in a place called the Thread Pool. Please refer to the following diagram for a clearer understanding:

These are the benefits that asynchronous I/O brings to Node.js. Time-consuming I/O tasks are offloaded to the Thread Pool, so they do not occupy the call stack for long and do not cause congestion in the program.
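Here is a self-contained sketch of two reads in flight at the same time. It creates two small temporary files so that it can run anywhere; the file contents, the temporary directory prefix, and the sizes variable are placeholders I introduced for the demonstration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create two small temporary files so the sketch is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "async-io-"));
const pdfPath = path.join(dir, "file.pdf");
const docPath = path.join(dir, "file.doc");
fs.writeFileSync(pdfPath, "fake pdf content");
fs.writeFileSync(docPath, "fake doc");

// Both reads are started before either one has finished.
const pdfRead = fs.promises.readFile(pdfPath);
const docRead = fs.promises.readFile(docPath);

// Promise.all resolves once both reads are complete.
const sizes = Promise.all([pdfRead, docRead]).then(function ([pdf, doc]) {
  console.log("pdf size", pdf.length);
  console.log("doc size", doc.length);
  // Clean up the temporary directory.
  fs.rmSync(dir, { recursive: true, force: true });
  return [pdf.length, doc.length];
});
```

Neither read waits for the other to finish, so the total waiting time is roughly that of the slower read instead of the sum of both.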

The difference is easy to notice if you have worked with languages such as PHP or Go, where a database query typically blocks until its result is returned. In Node.js, database queries are asynchronous, and you have to use callbacks or Promises to receive the result at some later point.

// `mysql` is assumed to be an already-open connection object.
mysql.query("select * from user where id = 1", function (err, result) {
  if (err) throw err;
  console.log("user", result);
});
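To compare the callback style and the Promise style side by side without a real database, here is a small sketch. fakeQuery is a hypothetical stand-in for a driver's query function, and the user record it returns is made up; util.promisify is the built-in Node.js helper that turns a callback-style function into one that returns a Promise.

```javascript
const util = require("util");

// Hypothetical stand-in for a database driver's query function.
// It "answers" asynchronously via a Node-style callback (err, result).
function fakeQuery(sql, callback) {
  setTimeout(function () {
    callback(null, [{ id: 1, name: "Alice" }]);
  }, 10);
}

// Callback style: the result is only available inside the callback.
fakeQuery("select * from user where id = 1", function (err, result) {
  if (err) throw err;
  console.log("callback user:", result[0].name);
});

// Promise style: util.promisify wraps the same function so it returns a Promise.
const queryAsync = util.promisify(fakeQuery);
const pending = queryAsync("select * from user where id = 1");
pending.then(function (result) {
  console.log("promise user:", result[0].name);
});
```

In both styles, the code after the query call keeps running immediately, and the result shows up later in the function you supplied.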

Conclusion

Node.js is a single-threaded environment, which means that at any given time, only one piece of code is being executed. However, this doesn't make Node.js slow, as it uses asynchronous functions with the help of the Event Loop.

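Node.js is made up of three main components: V8, which handles JavaScript code execution; the Node.js Standard Library, which provides features such as file operations and HTTP requests; and libuv, which provides the Event Loop for handling asynchronous operations.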

So how does Node.js handle asynchronous tasks? I will explain this in more detail in the next article.