Resource Limitations for Services Using PM2

The Problem

Hello readers of 2coffee.dev, it's been a while since we last met. A week or two ago, I ran into quite an interesting problem while deploying a system. At first I didn't plan to write about it, but then I realized that someone else might face the same situation, so I wrote it up, both as a record for myself and to share with everyone.

The system I am in charge of includes a rather old service deployed with pm2 on a GCP VM. I call it old because it has been running for a long time, since before I took over, and it has not been updated since. Its features are stable and it is only maintained for a certain user base. That would be fine, except that the number of users has suddenly increased recently, or perhaps some users with complex logic occasionally overload the system. CPU spikes, RAM climbs to a certain level... Boom! The server crashes.

This VM has only modest resources: 1 CPU and 2GB RAM. When CPU or RAM suddenly spikes, it freezes and I cannot even SSH into it. Realizing the issue, I immediately set out to find a solution. The obvious option was to upgrade the server, but in practice that did not help; the server still "hung" at unpredictable times. I couldn't fix the underlying error right away either, because our resources are limited and many other tasks have higher priority. At this point, the most feasible thing I could think of was to limit the resource usage of this service.

Limiting Memory

Fortunately, pm2 has a built-in memory limit. When it is set, pm2 automatically restarts the process once its memory usage exceeds the limit, which frees the memory. Running out of memory is very dangerous on a VM because the server freezes, making any operation very difficult, including SSH-ing into the server to troubleshoot.

The setup is very simple. Just run a command.

pm2 start api.js --max-memory-restart 300M  

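Here --max-memory-restart sets the memory limit. pm2 checks memory usage roughly every 30 seconds and restarts the service if the limit has been exceeded, so the restart may not happen the instant the threshold is crossed.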

I thought that limiting memory would solve everything, but upon further monitoring, another problem arose: CPU also spiked.

Limiting CPU

PM2 has no feature for limiting the CPU usage of a service. If you want to impose a limit, you need another tool. With Docker, for example, resource limits are already part of the configuration, which is very convenient. After a while of searching, I found cpulimit, a standalone tool that limits the CPU usage of a process.

Each service in pm2 runs as a process. When you type pm2 ls, you will see a pid column holding the process ID of each service. Running ps -fp PID shows detailed information about that process.

Using cpulimit is relatively simple. After installing it, run the following command.
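If the PID is needed in a script rather than read from the table, pm2 jlist prints the process list as JSON. A minimal sketch, assuming that output is passed in as a string; the helper name findPidByName and the sample data are made up for illustration:

```javascript
// Returns the PID of the pm2 process with the given name, or null.
// `jlistOutput` is assumed to be the JSON text printed by `pm2 jlist`.
function findPidByName(jlistOutput, name) {
  const processes = JSON.parse(jlistOutput);
  const match = processes.find((proc) => proc.name === name);
  // A falsy pid (0 or missing) means there is no running process to limit.
  return match && match.pid ? match.pid : null;
}

// Sample data shaped like a heavily trimmed `pm2 jlist` result.
const sample = JSON.stringify([
  { pm_id: 0, name: "worker", pid: 0 },
  { pm_id: 1, name: "api", pid: 4321 },
]);

console.log(findPidByName(sample, "api")); // prints 4321
```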

$ cpulimit -p PID -l 80 -b  

Here PID is the ID of the process to limit, -l is the maximum CPU percentage, and -b makes cpulimit itself run in the background. cpulimit keeps the CPU usage of the process from exceeding that limit, so during peak hours the server may respond more slowly than usual.
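The value given to -l is a percentage of a single core, so the upper bound depends on the number of cores (100 on this 1-CPU VM). A small sketch that validates the limit and builds the argument list for the command above; buildCpulimitArgs is a hypothetical helper, not part of cpulimit or pm2:

```javascript
const os = require("node:os");

// Builds the arguments for `cpulimit -p PID -l LIMIT -b`.
// Assumption: the limit is a percentage of one core, so the maximum
// is 100 multiplied by the number of cores.
function buildCpulimitArgs(pid, limitPercent, cores = os.cpus().length) {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new RangeError(`Invalid PID: ${pid}`);
  }
  const max = 100 * cores;
  if (!(limitPercent >= 1 && limitPercent <= max)) {
    throw new RangeError(`Limit must be between 1 and ${max}, got ${limitPercent}`);
  }
  // Arguments are converted to strings so they can be handed to spawn()
  return ["-p", String(pid), "-l", String(limitPercent), "-b"];
}

console.log(buildCpulimitArgs(4321, 80, 1).join(" "));
// prints: -p 4321 -l 80 -b
```

Rejecting an out-of-range value up front gives a clearer error than letting the cpulimit call fail later.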

I thought that after setting both limits, I could sleep well, but no, a new problem arose.

New Problem

Each time pm2 restarts the service, the PID of the process changes. The natural idea would be to pin the PID, but that is impossible because the operating system assigns a new one on every start. Besides a PID, cpulimit can also target a process by other criteria such as the executable file path; however, none of my attempts with those worked. Just when I thought I had hit a dead end, I remembered that pm2 has an advanced feature called the PM2 API.

The PM2 API is a programmatic interface that lets a script interact with pm2 itself. One of its capabilities is listening to the events of the processes pm2 manages, so it can be treated as a kind of hook: each emitted event can trigger a related task. Applied to this case, the script listens for the service restarting and reruns the cpulimit command against the new PID.

The implementation is straightforward; readers can refer to the js file I wrote as follows.

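const pm2 = require("pm2");
const { spawn } = require("node:child_process");
const fs = require("node:fs");

// Services to limit, matched by their pm2 id
const PM_CONFIGURATIONS = [{ pm_id: 1, cpu_limit: "80" }];

// Last PID limited per service, in case several of the events below
// arrive for the same restart (avoids duplicate cpulimit processes)
const limitedPids = new Map();

pm2.connect((err) => {
  if (err) {
    console.error("PM2 connect error:", err);
    process.exit(2);
  }

  pm2.launchBus((err, bus) => {
    if (err) {
      console.error("PM2 launchBus error:", err);
      process.exit(2);
    }
    console.log("PM2 launchBus");

    bus.on("process:event", (data) => {
      // Only consider events that signal a started or restarted process
      if (!["start", "restart", "online"].includes(data.event)) return;

      const { pm_id, name, pm_pid_path } = data.process;

      // Prefer the PID from the event; fall back to the PID file written by pm2
      let pid = Number(data.process.pid);
      if (!pid) {
        try {
          pid = Number.parseInt(fs.readFileSync(pm_pid_path, "utf8").trim(), 10);
        } catch (readErr) {
          console.error(`Cannot read PID file for ${name}:`, readErr.message);
          return;
        }
      }
      console.log(`Event=${data.event} name=${name} pm_id=${pm_id} pid=${pid}`);

      // Find corresponding configuration
      const config = PM_CONFIGURATIONS.find((config) => config.pm_id === pm_id);
      if (!config) {
        // Do nothing if configuration not found
        console.log(`→ Skipping pm_id=${pm_id}`);
        return;
      }

      // Skip invalid PIDs and PIDs that already have a limit applied
      if (!Number.isInteger(pid) || pid <= 0 || limitedPids.get(pm_id) === pid) return;
      limitedPids.set(pm_id, pid);

      // Apply cpulimit; arguments are passed as strings
      console.log(`→ Applying cpulimit ${config.cpu_limit}% for PID=${pid}`);
      const child = spawn("cpulimit", ["-p", String(pid), "-l", config.cpu_limit, "-b"]);
      // Without this handler, a missing cpulimit binary would crash the script
      child.on("error", (spawnErr) => {
        console.error("cpulimit failed to start:", spawnErr.message);
      });
    });
  });
});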

PM_CONFIGURATIONS lists the services to watch and the CPU limit for each. Every time one of them starts or restarts, the script picks up its newly assigned PID and runs cpulimit against it.

Conclusion

In this article, I have shared how to control the resource usage of services running on PM2 in a resource-limited environment. First, limiting memory with the --max-memory-restart parameter reduces the risk of the server running out of memory and crashing, since the service restarts automatically once it exceeds the limit. When high CPU usage is the issue, the cpulimit tool can additionally cap the CPU usage of a specific process. However, the PID changing on every restart presents a new challenge.

To overcome this, I utilized the PM2 API to automatically listen for events such as service start or restart, thus updating the PID and reassigning cpulimit automatically. This is not only a practical approach but also a useful suggestion for those facing similar issues. I hope this article will help you in managing resource-related issues on pm2.
