Resource Limitations for Services Using PM2

The Problem

Hello readers of 2coffee.dev, it's been a while since we last met. A week or two ago, I ran into quite an interesting problem while deploying a system. At first I did not plan to write about it, but someone else may well face the same situation, so I decided to write it down, both as a record for myself and to share with everyone.

The system I am in charge of includes a rather old service that runs under pm2 on a GCP VM. I call it old because it has been running for a long time, since before I took over, and it has not been updated since. All of its features are stable, and it is only maintained for a fixed group of users. That would not be a problem, except that recently, either because the number of users suddenly grew or because some users trigger complex logic, the system occasionally becomes overloaded. The CPU spikes, RAM climbs to a certain level... Boom! The server crashes.

This VM has only modest resources: 1 CPU and 2GB RAM. So when CPU or RAM usage suddenly spikes, it freezes and I cannot even SSH into it. My first thought was to upgrade the server, but in practice that did not help; the server still "hung" at unpredictable times. I could not fix the root cause right away either, because our team's resources are limited and many other tasks have higher priority. At that point, the most feasible option I could think of was to limit the resources this service is allowed to use.

Limiting Memory

Fortunately, pm2 has a built-in memory limit. When the limit is set, pm2 automatically restarts the process once its memory usage reaches the limit, which frees the memory. Running out of memory is very dangerous on a VM because it freezes the server and makes any operation very difficult, including SSHing into the server to troubleshoot.

The setup is very simple. Just run a command.

pm2 start api.js --max-memory-restart 300M  

Here --max-memory-restart is the memory limit. pm2 checks memory usage every 30 seconds and restarts the service if needed, so the restart may not happen the instant the limit is crossed.

I thought that limiting memory would solve everything, but upon further monitoring, another problem arose: CPU also spiked.

Limiting CPU

PM2 has no feature for limiting the CPU usage of a service. To impose a limit, you need a separate tool. With Docker, for example, resource limits are available as built-in configuration settings, which is very convenient. After some searching, I found cpulimit, a standalone tool that limits the CPU usage of a process.

Each service in pm2 runs as a process. When you type pm2 ls, you will see a pid column showing the process ID of each service. Running ps -fp PID then shows detailed information about that process.
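The PID can also be read from a script instead of by eye, because pm2 jlist prints the process list as JSON. Below is a small sketch of that lookup. The function name findPidByName and the sample data are made up for illustration, and real pm2 jlist output contains many more fields than shown.

```javascript
// Return the PID of the pm2 process called `name`, or null if none is running.
// `jlistOutput` is the raw JSON text printed by `pm2 jlist`.
function findPidByName(jlistOutput, name) {
  const processes = JSON.parse(jlistOutput);
  const match = processes.find((proc) => proc.name === name);
  // A stopped process has no usable PID (pm2 shows 0), so treat that as "not found"
  return match && match.pid ? match.pid : null;
}

// Illustrative sample shaped like `pm2 jlist` output (hypothetical values)
const sampleJlist = JSON.stringify([
  { pm_id: 0, name: "worker", pid: 0 },
  { pm_id: 1, name: "api", pid: 12345 },
]);

console.log(findPidByName(sampleJlist, "api"));
```

In real use, the text would come from running the command itself, for example with execFileSync("pm2", ["jlist"], { encoding: "utf8" }) from node:child_process.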

Using cpulimit is relatively simple. After installation, use the command.

$ cpulimit -p PID -l 80 -b  

Here PID is the ID of the process to limit, -l is the maximum CPU usage as a percentage, and -b runs cpulimit itself in the background so that the terminal is freed. cpulimit keeps the process's CPU usage from exceeding that limit, so during peak hours the service may respond more slowly than usual.
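When cpulimit is launched from a script, it helps to validate the values before building the command. The percentage passed to -l is usually between 1 and 100, but it can go higher on multi-core machines, so on the 1-CPU VM described here the sensible range is 1 to 100. The helper below is a sketch under that assumption; buildCpulimitArgs is a hypothetical name, not part of cpulimit or pm2.

```javascript
const os = require("node:os");

// Build the argument list for `cpulimit -p PID -l LIMIT -b`.
// `cores` defaults to the number of logical CPUs on this machine.
function buildCpulimitArgs(pid, percent, cores = Math.max(os.cpus().length, 1)) {
  const numericPid = Number(pid);
  if (!Number.isInteger(numericPid) || numericPid <= 0) {
    throw new Error(`Invalid PID: ${pid}`);
  }
  const numericPercent = Number(percent);
  if (!Number.isFinite(numericPercent)) {
    throw new Error(`Invalid CPU limit: ${percent}`);
  }
  // Clamp to 1..(cores * 100), the range assumed for the -l option
  const limit = Math.min(Math.max(Math.round(numericPercent), 1), cores * 100);
  // spawn() expects string arguments, so convert everything explicitly
  return ["-p", String(numericPid), "-l", String(limit), "-b"];
}

console.log(buildCpulimitArgs(12345, 80, 1).join(" "));
```

The returned array can be passed directly as the second argument of spawn("cpulimit", args).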

I thought that after setting both limits, I could sleep well, but no, a new problem arose.

New Problem

Each time pm2 restarts the service, the PID of the process changes. The natural idea is to pin the PID, but that is impossible because the operating system assigns it. Besides a PID, cpulimit can also target a process by a few other criteria, such as the executable file path; however, none of my attempts with those were successful. Just when I thought I was at an impasse, I remembered that pm2 has an advanced feature called the PM2 API.

The PM2 API is a programmatic interface that lets your own code interact with pm2. One of its capabilities is listening to events from the processes that pm2 manages. Simply put, it works like a hook: each emitted event can be listened to and used to trigger related tasks. Applied to this case, the script listens for each restart of the service and reruns the cpulimit command to set the limit again.

The implementation is straightforward; readers can refer to the js file I wrote as follows.

const pm2 = require("pm2");
const { spawn } = require("node:child_process");
const fs = require("node:fs");

// pm_id of each service to limit (see pm2 ls) and its CPU limit in percent
const PM_CONFIGURATIONS = [{ pm_id: 1, cpu_limit: "80" }];

pm2.connect((err) => {
  if (err) {
    console.error("PM2 connect error:", err);
    process.exit(2);
  }

  pm2.launchBus((err, bus) => {
    if (err) {
      console.error("PM2 launchBus error:", err);
      process.exit(2);
    }
    console.log("PM2 launchBus");

    bus.on("process:event", (data) => {
      // Only consider the events emitted when a process starts, restarts, or comes online
      if (!["start", "restart", "online"].includes(data.event)) return;

      const { pm_id, name, pm_pid_path } = data.process;
      let pid = data.process.pid;
      if (!pid) {
        // Fall back to the pid file that pm2 writes (pm_pid_path)
        try {
          pid = fs.readFileSync(pm_pid_path, "utf8").trim();
        } catch (readErr) {
          console.error(`Cannot read pid file for pm_id=${pm_id}:`, readErr.message);
          return;
        }
      }
      console.log(`Event=${data.event} name=${name} pm_id=${pm_id} pid=${pid}`);

      // Find corresponding configuration
      const config = PM_CONFIGURATIONS.find((config) => config.pm_id === pm_id);

      if (config) {
        // Apply cpulimit if configuration found
        console.log(`→ Applying cpulimit ${config.cpu_limit}% for PID=${pid}`);
        const child = spawn("cpulimit", ["-p", String(pid), "-l", config.cpu_limit, "-b"]);
        // Without this handler, a missing cpulimit binary would crash the watcher
        child.on("error", (spawnErr) => console.error("cpulimit error:", spawnErr.message));
      } else {
        // Do nothing if configuration not found
        console.log(`→ Skipping pm_id=${pm_id}`);
      }
    });
  });
});

PM_CONFIGURATIONS lists the services to watch. Each time one of them restarts, the script looks up its newly assigned PID and runs cpulimit against that PID to limit the CPU again.
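Because the script reacts to three event names, a single restart might trigger it more than once for the same PID. That is an assumption worth checking in the log output, not something confirmed here. If it does happen, a small guard like the sketch below can ensure cpulimit is attached only once per PID. The names createPidTracker, shouldLimit, and forget are hypothetical.

```javascript
// Remember which PIDs already have a cpulimit attached.
function createPidTracker() {
  const seen = new Set();
  return {
    // Returns true the first time a PID is seen, false on repeats
    shouldLimit(pid) {
      const key = String(pid).trim();
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
    // Drop a PID once its process has exited, since the OS may reuse the number
    forget(pid) {
      seen.delete(String(pid).trim());
    },
  };
}

const tracker = createPidTracker();
console.log(tracker.shouldLimit(12345)); // first event for this PID
console.log(tracker.shouldLimit("12345\n")); // repeat event, same PID read from a pid file
```

Inside the event handler, the spawn call would then run only when shouldLimit(pid) returns true.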

Conclusion

In this article, I have shared how to control the resource usage of services running on PM2 in a resource-limited environment. First, limiting memory with the --max-memory-restart parameter reduces the risk of running out of memory and crashing the server, because the service restarts automatically when it reaches the limit. For high CPU usage, the additional solution is the cpulimit tool, which limits CPU usage for a specific process. However, the PID changes each time the service restarts, which presents a new challenge.

To overcome this, I used the PM2 API to listen for events such as a service start or restart, then pick up the new PID and reapply cpulimit automatically. This is a practical approach and, I hope, a useful suggestion for anyone facing similar resource issues on pm2.
