🔥 Practical Concurrent Control for Node.js Servers: Keep Your Server from Being Overwhelmed by Traffic!

💡 Why Do We Need to Limit Concurrent Connections?

First, we need to face a harsh reality: server resources are limited! Just like a small restaurant, if too many customers rush in at once, service quality inevitably declines and the whole place may descend into chaos. The same applies to web servers:

  • 🎯 Limited memory resources
  • 🎯 Limited CPU processing power
  • 🎯 Limited network bandwidth
  • 🎯 Limited database connections

If we don't limit the number of concurrent connections, it may lead to:

  • 😱 Slower server response
  • 😱 Memory overflow
  • 😱 Complete service downtime
  • 😱 Other users unable to access

Let's take an example of a service we wrote:

  • Allocated 2GB memory
  • No request limit
  • Each request consumes some memory

Put these three conditions together and, when too many requests arrive at once, memory exceeds the limit and the service crashes outright. Let's simulate this:

const http = require('http');
const { promisify } = require('util');
const fs = require('fs');

const readFileAsync = promisify(fs.readFile);

async function loadLargeFileIntoMemory() {
    try {
        const data = await readFileAsync('./largefile');
        return data;
    } catch (error) {
        console.error('Error reading large file:', error);
        return null;
    }
}

const server = http.createServer(async (req, res) => {
    // Every request loads the whole file into memory, so N concurrent
    // requests hold N copies of the file at once
    const largeFileData = await loadLargeFileIntoMemory();
    if (largeFileData) {
        res.writeHead(200, { 'Content-Type': 'text/plain' });
        res.end('Request processed');
    } else {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
        res.end('Internal Server Error');
    }
});

const PORT = process.env.PORT || 3000;
server.listen(PORT, () => {
    console.log(`Server listening on port ${PORT}`);
});

Before sending requests, our program only occupied 35.1 MB of memory.


We then simulate 200 concurrent connections using ab.
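A command along these lines generates that load (the total request count here is an assumption; -c sets the concurrency):

ab -n 1000 -c 200 http://127.0.0.1:3000/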


Now our Node.js program's memory usage has hit its peak: 27 GB!


Let's limit the memory and make it crash.

node --max-old-space-size=2048 server.js

Does this actually cap the Node.js process's memory usage? Not entirely. Let's briefly look at the difference between V8 heap memory and total process memory: --max-old-space-size only limits the V8 engine's heap, while a Node.js process's total memory also includes:

  • Off-heap memory (Buffer, thread pool, etc.)
  • System memory (system calls, file operations, etc.)
  • Native memory (C++ level memory allocation)

For example, Buffers are allocated outside the V8 heap:

const crypto = require('crypto');

// crypto.randomBytes() returns a Buffer, which lives outside the V8 heap
const randomString = crypto.randomBytes(1024).toString('hex');

The Buffer created by crypto.randomBytes() is therefore not constrained by --max-old-space-size.

The fs.readFile call in our program works the same way: it asynchronously reads the entire contents of a file into a Buffer, so that memory is not subject to the V8 heap limit either.
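A quick way to see this yourself is process.memoryUsage(), which breaks the numbers down. A minimal sketch, reusing the ./largefile from the example above:

const fs = require('fs');

const before = process.memoryUsage();
const data = fs.readFileSync('./largefile'); // the Buffer is allocated off-heap

const after = process.memoryUsage();
// external grows with the Buffer, while heapUsed barely moves
console.log('heapUsed delta (MB):', ((after.heapUsed - before.heapUsed) / 1048576).toFixed(1));
console.log('external delta (MB):', ((after.external - before.external) / 1048576).toFixed(1));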

So how should we limit it?

  • System-level memory limits (ulimit or Docker)
  • Process management tools (PM2)
  • Code-level memory monitoring and control (a sketch follows below)
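For the first two options, Docker's --memory flag (e.g. docker run -m 2g ...) and PM2's --max-memory-restart option cap or recycle the process at a given size. For the third, here is a minimal sketch of code-level monitoring; the 1.5 GB threshold and the 5-second interval are arbitrary assumptions:

// Minimal sketch: watch resident memory and shed load above a threshold.
// THRESHOLD_BYTES and the check interval are arbitrary assumptions.
const THRESHOLD_BYTES = 1.5 * 1024 * 1024 * 1024; // 1.5 GB

setInterval(() => {
  const { rss } = process.memoryUsage(); // rss covers heap AND off-heap memory
  if (rss > THRESHOLD_BYTES) {
    console.warn(`High memory usage: ${(rss / 1048576).toFixed(1)} MB`);
    // React here: stop accepting new connections, return 503s,
    // or exit and let a process manager restart the process
  }
}, 5000);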

Interested readers can try these out themselves. The example above shows that leaving request concurrency uncontrolled can be disastrous for our program, so once development is done, we need to estimate the expected traffic and provision deployment resources accordingly.

🛠️ Implementation Solutions

Let's first implement basic concurrency limiting ourselves.

1. Using Queue to Control Concurrency

Let's look at a simple but practical implementation solution using a queue to control concurrent requests:

const express = require('express');
const app = express();

// Create a simple queue class
class RequestQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.currentRequests = 0;
    this.queue = [];
  }

  // Add request to queue
  enqueue(req, res, next) {
    if (this.currentRequests < this.maxConcurrent) {
      this.currentRequests++;
      next();
    } else {
      this.queue.push({ req, res, next });
    }
  }

  // Release resources after processing
  dequeue() {
    this.currentRequests--;
    if (this.queue.length > 0) {
      const { req, res, next } = this.queue.shift();
      this.currentRequests++;
      next();
    }
  }
}

// Create queue instance, max concurrency set to 10
const requestQueue = new RequestQueue(10);

// Middleware: limit concurrency
const limitConcurrent = (req, res, next) => {
  requestQueue.enqueue(req, res, next);
};

// Use middleware
app.use(limitConcurrent);

// Release the slot when the response ends
// (note: 'finish' never fires for aborted requests; see the caveat below)
app.use((req, res, next) => {
  res.on('finish', () => {
    requestQueue.dequeue();
  });
  next();
});

// Example route
app.get('/api/test', async (req, res) => {
  // Simulate time-consuming operation
  await new Promise(resolve => setTimeout(resolve, 1000));
  res.json({ message: 'Request processed successfully!' });
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});
Enter fullscreen mode Exit fullscreen mode

We limit concurrent request handling to 10 and test it with ab:

ab -n 100 -c 20 http://127.0.0.1:3000/api/test

The limit works as expected: each request takes about one second and at most 10 run concurrently, so ab reports roughly 9 requests per second.


Let's walk through the code briefly. First, we create a queue to hold waiting requests. If the number of in-flight requests has already reached the limit (10 in our case), new requests are pushed onto the queue:

if (this.currentRequests < this.maxConcurrent) {
  this.currentRequests++;
  next();
} else {
  this.queue.push({ req, res, next });
}

When an earlier request finishes, we pull the next queued request off the queue and continue processing it:

  res.on('finish', () => {
    requestQueue.dequeue();
  });
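One caveat: if a client disconnects before the response completes, 'finish' never fires and the slot would leak. A more defensive release middleware also listens for 'close' and guards against releasing twice; a sketch:

// Release the slot exactly once, whether the response finished normally
// or the client aborted ('close' fires in both cases)
app.use((req, res, next) => {
  let released = false;
  const release = () => {
    if (!released) {
      released = true;
      requestQueue.dequeue();
    }
  };
  res.on('finish', release);
  res.on('close', release);
  next();
});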

2. Implementation Using Third-Party Libraries

With the basic principles covered, we can look at how a mature library like bottleneck implements concurrency limiting. The approach is quite similar.

const Bottleneck = require('bottleneck');

// Create a limiter instance
const limiter = new Bottleneck({
  maxConcurrent: 100, // maximum number of jobs running at once
  minTime: 100        // minimum time (ms) between job starts
});

// A Bottleneck instance is not Express middleware itself, so we wrap
// request handling in limiter.schedule(); each job holds a slot until
// the response finishes
app.use((req, res, next) => {
  limiter
    .schedule(() => new Promise(resolve => {
      res.on('finish', resolve); // release the slot when the response ends
      next();                    // continue to the route handler
    }))
    .catch(next);
});

🎨 Optimizing Our Program's Concurrency Control

1. Implementing Graceful Degradation 🎯

Our current implementation makes users wait in line, which consumes server resources while they wait. An alternative is service degradation: when the concurrency limit is reached, return immediately instead of queueing. Users aren't stuck waiting, which makes for a better experience and reduces server load.

// In the limitConcurrent middleware, before enqueueing:
// return a friendly prompt when the concurrency limit is reached
if (requestQueue.currentRequests >= requestQueue.maxConcurrent) {
  return res.status(503).json({
    message: 'The server is busy. Please try again later.'
  });
}
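Putting it together, the middleware might look like this (a sketch reusing the RequestQueue instance from earlier; its internal queue is simply never used in this mode):

// Middleware: reject with 503 instead of queueing once the limit is hit
const limitConcurrent = (req, res, next) => {
  if (requestQueue.currentRequests >= requestQueue.maxConcurrent) {
    return res.status(503).json({
      message: 'The server is busy. Please try again later.'
    });
  }
  requestQueue.currentRequests++;
  next();
};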

After this change, the program's concurrent processing capacity improves: it handles 50 requests per second. Only 10 requests at a time actually run the business logic; the rest return a 503 immediately.


2. Monitoring and Alert Mechanism 📊

After the program is deployed, we need a complete monitoring and alerting mechanism that lets us, for example:

  • Look back at concurrency data for the past couple of days.
  • Get notified when concurrency exceeds a certain threshold.

To achieve this, we need to:

  • Expose the program's current connection counts.
  • Collect that data and define alert rules on top of it.

In the Node.js ecosystem, prom-client is a commonly used library for creating and exposing monitoring metrics. It works well with monitoring systems such as Prometheus, making it easy to collect and display the application's various metrics.

const prometheus = require('prom-client');

// Use a Gauge rather than a Counter: the value must go both up and down
const gauge = new prometheus.Gauge({
  name: 'concurrent_requests',
  help: 'Current number of concurrent requests'
});

// Track the number of in-flight requests
app.use((req, res, next) => {
  gauge.inc();
  res.on('finish', () => gauge.dec());
  next();
});

// Expose the metrics for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

Through this integration, we have exposed the program's connection count at the /metrics path. The next step is to configure a collector to scrape the data, then observe it and define alert rules on the Prometheus side.

3. Dynamically Adjusting Concurrency Limits 🔄

The concurrency limits we set earlier, such as 10 or 100, are not very intelligent. We can adjust them through environment variables and deployment resources to cope with unknown traffic volumes, but is there a smarter way to decide what the limit should be? Yes, there is!

Dynamic concurrency limiting can intelligently adjust the maximum number of concurrent requests based on the real-time load of the system. When the system load is light, appropriately increase the concurrency limit to make full use of idle resources and improve the processing capacity of the application.

When the system load is heavy, timely reduce the concurrency limit to avoid excessive competition and depletion of resources and ensure the basic response ability and stability of the service.

To obtain the system's current load, we can use the built-in os module. The os.loadavg() method returns an array with the 1-, 5-, and 15-minute load averages:

[ 3.521484375, 3.57373046875, 3.6845703125 ]

These values are relative to the number of CPU cores: on a single-core system, a value of 1 means fully loaded (on multi-core machines, divide by os.cpus().length). Taking a single-core system as the example, we can dynamically adjust our program's concurrency limit:

const os = require('os');

// Tune the limit of the RequestQueue instance from earlier based on load.
// Thresholds assume a single core; divide the load by os.cpus().length
// on multi-core machines.
function startMonitoring() {
  setInterval(() => {
    const load = os.loadavg()[0]; // 1-minute load average
    if (load > 0.7) {
      // Heavy load: shrink the limit, keeping a floor of 50
      requestQueue.maxConcurrent = Math.max(50, requestQueue.maxConcurrent - 10);
    } else if (load < 0.3) {
      // Light load: grow the limit, capped at 200
      requestQueue.maxConcurrent = Math.min(200, requestQueue.maxConcurrent + 10);
    }
  }, 60000); // re-evaluate once a minute
}

app.listen(3000, () => {
  startMonitoring();
  console.log('Server listening on port 3000');
});

By monitoring the system load in real time and adjusting the maximum number of concurrent requests accordingly, we can avoid both wasted resources and service failures. However, implementing this dynamic mechanism requires thorough testing against your actual workload. In this example, loadavg gives us a useful reference, but it comes with a caveat:

For example, a Node.js application may perform frequent disk I/O (reading or writing files) or network I/O (sending HTTP requests and waiting for responses). These operations can put the process into a waiting state, which counts toward the system load. So even when CPU usage is low, a large number of processes waiting on I/O can keep loadavg high.
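Because loadavg can be skewed by I/O wait, the event-loop delay is a useful Node-specific complement. A minimal sketch using the built-in perf_hooks module; the 100 ms threshold and both intervals are arbitrary assumptions:

const { monitorEventLoopDelay } = require('perf_hooks');

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  const p99Ms = histogram.percentile(99) / 1e6; // nanoseconds to milliseconds
  if (p99Ms > 100) {
    // The event loop itself is lagging: this process is saturated,
    // whatever the system-wide load average says
    console.warn(`Event loop p99 delay: ${p99Ms.toFixed(1)} ms`);
  }
  histogram.reset();
}, 10000);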

💝 Practical Suggestions

  1. 🎯 Set a reasonable concurrency limit based on the server configuration and have a clear understanding of your program's capabilities.
  2. 🎯 Consider the characteristics of the business: different APIs may warrant different limits (see the sketch after this list). This article only sets a global maximum concurrency and does not consider fairness.
  3. 🎯 Regularly monitor and analyze performance data to avoid being unaware of the situation online.
  4. 🎯 Establish a comprehensive degradation plan.
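For point 2, a minimal per-route sketch could reuse the RequestQueue class from earlier (the routes and limits here are hypothetical, and this replaces the single global limiter and release middleware):

// Hypothetical per-route limits: expensive endpoints get a smaller budget
const uploadQueue = new RequestQueue(5);
const apiQueue = new RequestQueue(100);

function limitWith(queue) {
  return (req, res, next) => {
    res.on('finish', () => queue.dequeue()); // release this queue's slot
    queue.enqueue(req, res, next);
  };
}

app.use('/upload', limitWith(uploadQueue));
app.use('/api', limitWith(apiQueue));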

🌈 Summary

Through reasonable concurrency control, we can:

  • 🌟 Protect the server from being overloaded.
  • 🌟 Provide more stable services.
  • 🌟 Optimize resource utilization.
  • 🌟 Improve the user experience.

Rather than waiting for the server to be overwhelmed and then trying to rescue it, it's better to take preventive measures in advance! I hope this article is helpful to everyone! If you find it useful, don't forget to like and follow me! 💖
