Resource Limitations for Services Using PM2

The Problem

Hello readers of 2coffee.dev, it's been a while since we last met. A week or two ago, I ran into an interesting problem while deploying a system. At first I didn't plan to write about it, but then I realized someone else might face the same situation, so I wrote it down, both as a record for myself and to share with everyone.

The system I am in charge of includes a rather old service that is deployed with pm2 on a GCP VM. I call it old because it has been running for a long time, since before I took over, and it hasn't been updated since. Its features are stable and it is only kept running for a certain user base. That would not be a problem, except that the number of users has recently increased, or perhaps some users trigger particularly heavy logic, and the system occasionally becomes overloaded. CPU spikes, RAM climbs to a certain level... Boom! The server crashes.

This VM has only modest resources: 1 CPU and 2GB RAM. When CPU or RAM suddenly spikes, the machine freezes and I can't even SSH into it. Realizing the issue, I immediately set out to find a solution. The obvious first option was to upgrade the server, but in practice that did not help; the server still "hung" at unpredictable times. I couldn't fix the root cause right away either, because our resources are limited and many other tasks had higher priority. At that point, the most feasible thing I could think of was to limit the resource usage of this service.

Limiting Memory

Fortunately, pm2 has a built-in memory limit. With the limit set, pm2 automatically restarts the process once its memory usage reaches the limit, which frees the memory. Running out of memory is very dangerous on a VM because the server freezes, which makes any operation very difficult, including SSHing in to troubleshoot.

The setup is very simple. Just run a command.

pm2 start api.js --max-memory-restart 300M

Here --max-memory-restart sets the memory limit. pm2 checks memory usage every 30 seconds and restarts the service if it exceeds the limit, so the restart may not happen the instant the threshold is crossed.

I thought that limiting memory would solve everything, but upon further monitoring, another problem arose: CPU also spiked.

Limiting CPU

pm2 does not have a feature for limiting the CPU usage of a service. To impose a limit, you need another tool. With Docker, for example, resource limits are available as built-in configuration, which is very convenient. After some searching, I found cpulimit, a standalone tool that limits the CPU usage of a process.

Each service in pm2 runs in a process. When you type pm2 ls, you will see a column titled PID, corresponding to the Process ID of that service. When using the command ps -fp PID, you will see detailed information of the process.

Using cpulimit is relatively simple. After installing it, run:

$ cpulimit -p PID -l 80 -b

Here PID is the ID of the process to limit, -l is the maximum CPU usage as a percentage, and -b runs cpulimit itself in the background. cpulimit keeps the process's CPU usage from exceeding that limit, so during peak hours the service may respond more slowly than usual.

I thought that after setting both limits, I could sleep well, but no, a new problem arose.

New Problem

Each time pm2 restarts the service, the PID of the process changes. Ideally one would pin the PID, but that is impossible because the operating system assigns a new one on every start. Besides a PID, cpulimit can also target a process by other criteria such as the executable file path; however, none of my attempts with those worked. Just when I thought I was stuck, I remembered that pm2 has an advanced feature called the PM2 API.

The PM2 API lets you control this process manager programmatically. One of its capabilities is listening to events from the processes that pm2 runs; simply put, it works like a hook. Each emitted event can be listened to and used to trigger related tasks. Applied to this case: each time the service restarts, catch the event and rerun the cpulimit command to set the limit again.

The implementation is straightforward; readers can refer to the js file I wrote as follows.

const pm2 = require("pm2");
const { spawn } = require("node:child_process");
const fs = require("node:fs");

// pm_id of each service to limit, with its CPU cap in percent
const PM_CONFIGURATIONS = [{ pm_id: 1, cpu_limit: "80" }];

pm2.connect((err) => {
  if (err) {
    console.error("PM2 connect error:", err);
    process.exit(2);
  }

  pm2.launchBus((err, bus) => {
    if (err) {
      console.error("PM2 launchBus error:", err);
      process.exit(2);
    }
    console.log("PM2 launchBus");

    bus.on("process:event", (data) => {
      // Only react to events fired when a process starts, restarts, or comes online
      if (!["start", "restart", "online"].includes(data.event)) return;

      const { pm_id, name } = data.process;
      let pid = data.process.pid;
      if (!pid) {
        // Fall back to the pid file that pm2 writes for the process
        try {
          pid = fs.readFileSync(data.process.pm_pid_path, "utf8").trim();
        } catch (readErr) {
          console.error(`Cannot read pid file for pm_id=${pm_id}:`, readErr.message);
          return;
        }
      }
      console.log(`Event=${data.event} name=${name} pm_id=${pm_id} pid=${pid}`);

      // Find corresponding configuration
      const config = PM_CONFIGURATIONS.find((c) => c.pm_id === pm_id);

      if (config) {
        // Apply cpulimit if configuration found; pass every argument as a string
        console.log(`→ Applying cpulimit ${config.cpu_limit}% for PID=${pid}`);
        const child = spawn("cpulimit", ["-p", String(pid), "-l", config.cpu_limit, "-b"]);
        // Without this handler, a missing cpulimit binary would crash the listener
        child.on("error", (spawnErr) => console.error("cpulimit failed to start:", spawnErr.message));
      } else {
        // Do nothing if configuration not found
        console.log(`→ Skipping pm_id=${pm_id}`);
      }
    });
  });
});

PM_CONFIGURATIONS lists the services to watch, identified by pm_id, together with the CPU limit for each. Whenever one of them starts or restarts, the script picks up its newly assigned PID and runs cpulimit against it again.

Conclusion

In this article, I have shared how to control the resource usage of services running on pm2 in a resource-limited environment. First, limiting memory with the --max-memory-restart parameter reduces the risk of running out of memory and crashing the server, because the service restarts automatically when it reaches the limit. For high CPU usage, the additional solution is the cpulimit tool, which limits CPU usage for a specific process. However, the PID changes each time the service restarts, which presents a new challenge.

To overcome this, I utilized the PM2 API to automatically listen for events such as service start or restart, thus updating the PID and reassigning cpulimit automatically. This is not only a practical approach but also a useful suggestion for those facing similar issues. I hope this article will help you in managing resource-related issues on pm2.
