Leapcell

Posted on Jan 31

Scaling Node.js: Multi-threading You Need to Know

#webdev #javascript #node #programming

Node.js runs JavaScript on a single main thread and keeps that thread responsive by using non-blocking I/O. For CPU-intensive tasks, however, relying on that one thread can create a performance bottleneck. Fortunately, Node.js provides several ways to create and manage additional processes and threads, allowing applications to take advantage of multi-core CPUs.

Why Enable Subthreads?

The main reasons for enabling subthreads in Node.js are to handle concurrent tasks and improve application performance. (In this article, "subthread" is used loosely to cover both child processes and worker threads.) Node.js is built on a single-threaded event-loop model in which I/O operations (such as file reading/writing and network requests) are normally non-blocking. CPU-intensive tasks (such as large-scale computations), however, run on the main thread and block the event loop until they finish, so every other pending callback has to wait.
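To make the blocking effect concrete, here is a minimal sketch that schedules a 0 ms timer and then keeps the main thread busy with a synchronous loop. The helper name measureTimerDelay and the 200 ms figure are assumptions chosen for illustration; the point is only that the timer cannot fire until the loop ends.

```javascript
// Measure how late a 0 ms timer fires while a synchronous loop holds the main thread.
// measureTimerDelay is a hypothetical helper written for this illustration.
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    // Ask for the callback "as soon as possible".
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);

    // Simulate CPU-bound work: nothing else can run until this loop ends.
    const end = Date.now() + busyMs;
    while (Date.now() < end) {
      // busy-wait
    }
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`A 0 ms timer fired after ${delay} ms`);
});
```

The reported delay should be at least as long as the busy loop, which is the kind of stall that moving work off the main thread is meant to avoid.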

Enabling subthreads helps solve the following issues:

Non-blocking operations: The design philosophy of Node.js revolves around non-blocking I/O. However, if external commands are executed directly in the main thread, the execution process may block the main thread, impacting the application's responsiveness. By executing these commands in subthreads, the main thread retains its non-blocking characteristics, ensuring other concurrent operations are not affected.
Efficient use of system resources: By using child processes or worker threads, a Node.js application can better utilize the computing power of multi-core CPUs. This is especially useful for executing CPU-intensive external commands, as they can run on separate CPU cores without affecting Node.js’s main event loop.
Isolation and security: Running external commands in subthreads adds an extra layer of security to the application. If an external command fails or crashes, this isolation helps protect the main Node.js process from being affected, thereby improving application stability.
Flexible data processing and communication: With subthreads, external command outputs can be processed flexibly before being passed back to the main process. Node.js offers multiple ways to implement inter-process communication (IPC), making data exchange seamless.

Methods to Enable Subthreads

Next, we will explore different ways to enable subthreads in Node.js.

Child Processes

Node.js's child_process module allows running system commands or other programs by creating child processes that can communicate with the main process. This is useful for executing CPU-intensive tasks or running other applications.

spawn()

The spawn() method in the child_process module creates a new child process that executes a specified command. It returns a ChildProcess object whose stdout and stderr properties are streams, allowing interaction with the child process. This method is ideal for long-running processes that generate large amounts of output, since it delivers data as a stream rather than buffering it all at once.

The basic syntax of the spawn() function is:

const { spawn } = require('child_process');
const child = spawn(command, [args], [options]);

command: A string representing the command to be executed.
args: An array of strings listing all command-line arguments.
options: An optional object that configures how the child process is created. Common options include:
- cwd: The working directory of the child process.
- env: An object containing environment variables.
- stdio: Configures the standard input/output of the child process, often used for piping operations or file redirection.
- shell: If true, runs the command in a shell. The default shell is /bin/sh on Unix and cmd.exe on Windows.
- detached: If true, the child process runs independently of the parent process and can continue running after the parent exits.

Here is a simple example using spawn():

const { spawn } = require('child_process');
const path = require('path');

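// Use the 'touch' command to create a file named 'moment.txt'.
// Assumes a Unix-like system and that the 'm' directory already exists.
const touch = spawn('touch', ['moment.txt'], {
  cwd: path.join(process.cwd(), './m'),
});

// 'error' fires when the process cannot be started (for example, a missing cwd or command)
touch.on('error', (err) => {
  console.error(`Failed to start process: ${err.message}`);
});

touch.on('close', (code) => {
  if (code === 0) {
    console.log('File created successfully');
  } else {
    console.error(`Error creating file, exit code: ${code}`);
  }
});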

The purpose of this code is to create an empty file named moment.txt in the m subdirectory of the current working directory. If successful, a success message is printed; otherwise, an error message is displayed.
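The streaming behavior mentioned earlier can be sketched as follows. This example assumes nothing beyond Node.js itself: it spawns a second Node.js process (via process.execPath) that prints many lines, and counts them chunk by chunk as they arrive. The helper name countLines is hypothetical.

```javascript
const { spawn } = require('child_process');

// Spawn a child Node.js process that prints `count` lines and count them
// as chunks arrive, instead of buffering the whole output in memory.
function countLines(count) {
  return new Promise((resolve, reject) => {
    const script = `for (let i = 0; i < ${count}; i++) console.log('line ' + i);`;
    const child = spawn(process.execPath, ['-e', script]);

    let lines = 0;
    let tail = ''; // holds a partial line that was split across two chunks

    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => {
      const parts = (tail + chunk).split('\n');
      tail = parts.pop(); // last element is either '' or an incomplete line
      lines += parts.length;
    });

    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(lines);
      else reject(new Error(`Child exited with code ${code}`));
    });
  });
}

const lineCount = countLines(10000);
lineCount.then((n) => console.log(`Received ${n} lines from the child process`));
```

Because each chunk is handled and discarded, memory use stays roughly constant no matter how much the child prints.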

exec()

The exec() method in the child_process module spawns a shell, runs the given command in it, and buffers any output produced. Unlike spawn(), exec() is better suited to commands with small output, because it holds the child process's stdout and stderr in memory and passes them to a callback when the command finishes.

The basic syntax of exec() is:

const { exec } = require('child_process');

exec(command, [options], callback);

command: The command to be executed as a string.
options: Optional parameters to customize the execution environment.
callback: A callback function that receives (error, stdout, stderr) as arguments.

The options object can include:

cwd: Sets the working directory of the child process.
env: Specifies an environment variables object.
encoding: The character encoding.
shell: Specifies the shell used for execution (/bin/sh on Unix, cmd.exe on Windows).
timeout: Sets a timeout in milliseconds; the child process will be killed if execution exceeds this time.
maxBuffer: Sets the maximum buffer size for stdout and stderr (default: 1024 * 1024 or 1MB).
killSignal: Defines the signal used to terminate the process (default: 'SIGTERM').

The callback function receives:

error: An Error object if the command execution fails or returns a non-zero exit code; otherwise, null.
stdout: The standard output of the command.
stderr: The standard error output.

Here is an example using exec():

const { exec } = require('child_process');
const path = require('path');

// Define the command to execute, including the file path
const command = `touch ${path.join('./m', 'moment.txt')}`;

exec(command, { cwd: process.cwd() }, (error, stdout, stderr) => {
  if (error) {
    console.error(`Error executing command: ${error}`);
    return;
  }
  if (stderr) {
    console.error(`Standard error output: ${stderr}`);
    return;
  }
  console.log('File created successfully');
});

Provided the m directory exists, running this code creates the file and prints the success message; otherwise, the error branch reports the failure.
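If you prefer async/await over callbacks, exec() can be wrapped with util.promisify. The sketch below is one possible shape, not the only one: runCommand is a hypothetical helper, and the commands simply invoke the current Node.js binary so the example does not depend on any other program being installed.

```javascript
const { exec } = require('child_process');
const { promisify } = require('util');

// util.promisify(exec) resolves with { stdout, stderr } and rejects on failure.
const execAsync = promisify(exec);

// Quote the path to the current Node.js binary so the shell treats it as one word.
const node = `"${process.execPath}"`;

// Hypothetical helper: never throws, always reports an exit code.
async function runCommand(command) {
  try {
    const { stdout, stderr } = await execAsync(command, { timeout: 10000 });
    return { exitCode: 0, stdout, stderr };
  } catch (error) {
    // For a non-zero exit, error.code holds the exit code and the
    // captured output is attached to the error object.
    return { exitCode: error.code, stdout: error.stdout, stderr: error.stderr };
  }
}

async function main() {
  console.log(await runCommand(`${node} -e "console.log(6 * 7)"`));
  console.log(await runCommand(`${node} -e "process.exit(3)"`));
}

main();
```

Since exec() passes the string to a shell, avoid interpolating untrusted input into the command.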

fork()

The fork() method in the child_process module is a specialized way to create a new Node.js process that communicates with the parent process via an inter-process communication (IPC) channel. fork() is particularly useful when running Node.js modules separately and is beneficial for parallel execution on multi-core CPUs.

The basic syntax of fork() is:

const { fork } = require('child_process');

const child = fork(modulePath, [args], [options]);

modulePath: A string representing the path of the module to run in the child process.
args: An array of strings containing arguments to pass to the module.
options: An optional object to configure the child process.

The options object can include:

cwd: The working directory of the child process.
env: An object containing environment variables.
execPath: The path to the Node.js executable used to create the child process.
execArgv: A list of arguments passed to the Node.js executable but not to the module itself.
silent: If true, redirects the child's stdin, stdout, and stderr to the parent process; otherwise, they inherit from the parent.
stdio: Configures the standard input/output streams. When given as an array, it must contain exactly one 'ipc' entry, because fork() relies on that entry to create the IPC channel between the parent and child processes.

A child process created using fork() automatically establishes an IPC channel, allowing message passing between the parent and child processes. The parent can send messages using child.send(message), and the child process can listen for these messages using process.on('message', callback). Similarly, the child process can send messages to the parent using process.send(message).

Here is an example demonstrating how to use fork() to create a child process and communicate via IPC:

`index.js` (Parent Process)

const { fork } = require('child_process');

const child = fork('./child.js');

child.on('message', (message) => {
  console.log('Message from child process:', message);
});

child.send({ hello: 'world' });

setInterval(() => {
  child.send({ hello: 'world' });
}, 1000);

`child.js` (Child Process)

process.on('message', (message) => {
  console.log('Message from parent process:', message);
});

process.send({ foo: 'bar' });

setInterval(() => {
  process.send({ hello: 'world' });
}, 1000);

In this example, the parent process (index.js) creates a child process that runs child.js. Each side logs the messages it receives, sends one message immediately, and then sends another every second. Because both timers keep running, the two processes keep exchanging messages until you stop the parent (for example, with Ctrl+C).

Using fork(), each child process runs as a separate Node.js instance with its own V8 engine and event loop. This means that creating too many child processes may lead to high resource consumption.
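A more typical use of fork() is to hand one CPU-heavy job to a child and let it exit when done. The following is a minimal sketch under a few assumptions: the child script is written to a temporary file only so that the example fits in a single file, and sumInChild and the message shapes ({ n } and { sum }) are hypothetical names chosen for illustration.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child script: do the heavy work, reply, then close the IPC channel so it can exit.
const childSource = `
  process.on('message', ({ n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    // Disconnect only after the message has been handed off.
    process.send({ sum }, () => process.disconnect());
  });
`;

// Hypothetical helper: computes 1 + 2 + ... + n in a forked process.
function sumInChild(n) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-sketch-'));
  const childFile = path.join(dir, 'child.js');
  fs.writeFileSync(childFile, childSource);

  return new Promise((resolve, reject) => {
    const child = fork(childFile);
    child.once('message', (message) => resolve(message.sum));
    child.once('error', reject);
    child.once('exit', (code) => {
      fs.rmSync(dir, { recursive: true, force: true }); // clean up the temp script
      if (code !== 0) reject(new Error(`Child exited with code ${code}`));
    });
    child.send({ n });
  });
}

const total = sumInChild(1000000);
total.then((sum) => console.log(`Sum computed in child process: ${sum}`));
```

While the child is busy, the parent's event loop stays free, and the child is gone as soon as the job finishes, so its memory cost is only paid for the duration of the task.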

Worker Threads

The worker_threads module in Node.js provides a mechanism for running multiple JavaScript tasks in parallel within a single process. This lets an application spread CPU-intensive work across multiple cores without spawning additional processes, while the main thread's event loop stays free to handle I/O.

Key Concepts of Worker Threads:

Worker: An independent thread that executes JavaScript code. Each worker has its own V8 instance, event loop, and variables, so it can operate independently of the main thread and other workers.
Main Thread: The thread that initiates a worker. In a typical Node.js application, the initial JavaScript execution environment (the event loop) runs on the main thread.
Communication: The main thread and workers communicate by passing messages. They can send JavaScript values, including ArrayBuffer and other transferable objects, allowing efficient data transfer.

Here is a basic example demonstrating how to create a worker and communicate between the main thread and the worker:

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Main thread
  const worker = new Worker(__filename);
  worker.on('message', (message) => {
    console.log('Message from Worker:', message);
  });
  worker.postMessage('Hello Worker!');
} else {
  // Worker thread
  parentPort.on('message', (message) => {
    console.log('Message from main thread:', message);
    parentPort.postMessage('Hello Main Thread!');
  });
}

In this example, a single file serves as both the main thread entry point and the worker script. By checking isMainThread, the script determines whether it is running in the main thread or as a worker. The main thread creates a worker that executes the same script, then sends a message to the worker. The worker replies via parentPort.postMessage(). Because the worker keeps listening for messages, the process stays alive until you stop it or call worker.terminate().
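For CPU-bound work, a common pattern is to pass the input through workerData and wrap the worker in a Promise. The sketch below assumes a deliberately slow recursive Fibonacci as the workload, and keeps the worker source inline (using the eval: true option) only so that the example is a single file; fibInWorker is a hypothetical helper name.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline so the sketch is self-contained.
const fibWorkerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData.n));
`;

// Hypothetical helper: computes fib(n) on a worker thread.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(fibWorkerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// Two computations run on separate threads while the main thread stays free.
const fibResults = Promise.all([fibInWorker(25), fibInWorker(26)]);
fibResults.then((values) => console.log('Results from workers:', values));
```

Each worker exits on its own after posting its result, because nothing else keeps its event loop alive.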

Differences Between `worker_threads` and `fork()`

Concept:

worker_threads: Uses worker threads to execute JavaScript code in parallel within the same process.
fork(): Spawns a separate Node.js process, each with its own V8 instance and event loop.

Communication:

worker_threads: Uses MessagePort to pass JavaScript values; ArrayBuffer and MessagePort objects can be transferred rather than copied.
fork(): Uses IPC (inter-process communication) via process.send() and message events.

Memory Usage:

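worker_threads: Workers run in the same process and can share memory explicitly through SharedArrayBuffer, which avoids redundant data copies. Other values are still copied or transferred between threads.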
fork(): Each forked process has a separate memory space and its own V8 instance, leading to higher memory usage.

Best Use Cases:

worker_threads: Suitable for CPU-intensive computations and parallel processing.
fork(): Suitable for running independent Node.js applications or isolated services.

Overall, whether to use worker_threads or fork() depends on your application's needs. If you require strict process isolation, fork() may be the better option. However, if you need efficient parallel computation and data processing, worker_threads offers better performance and resource utilization.
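To illustrate the memory point in the comparison above: workers can operate on the same memory when it is backed by a SharedArrayBuffer. The following is a minimal sketch, assuming a single shared 32-bit counter that several workers increment with Atomics; countWithWorkers and the workerData field names are hypothetical.

```javascript
const { Worker } = require('worker_threads');

// Each worker increments the same shared counter `increments` times.
const counterWorkerSource = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic, so concurrent updates are not lost
  }
`;

// Hypothetical helper: resolves with the final counter value once all workers exit.
function countWithWorkers(workerCount, increments) {
  const shared = new SharedArrayBuffer(4); // room for one Int32
  const counter = new Int32Array(shared);

  const finished = [];
  for (let i = 0; i < workerCount; i++) {
    finished.push(new Promise((resolve, reject) => {
      const worker = new Worker(counterWorkerSource, {
        eval: true,
        workerData: { shared, increments }, // the buffer is shared, not copied
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }

  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

const sharedTotal = countWithWorkers(4, 100000);
sharedTotal.then((value) => console.log(`Final shared counter value: ${value}`));
```

A forked process cannot do this; its data has to be serialized and sent over the IPC channel instead.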

Cluster (Clustering)

The cluster module in Node.js allows the creation of child processes that share the same server port. This enables a Node.js application to run across multiple CPU cores, improving performance and throughput. Since Node.js is single-threaded, its non-blocking I/O operations work well for handling many concurrent connections. However, for CPU-intensive tasks or when distributing the workload across multiple cores, using the cluster module is particularly useful.

The cluster module works by letting a primary process (called the "master" before Node.js 16, and in the example below) create multiple worker processes, each of which runs the same application code. The master process manages these workers and distributes incoming network connections among them.

Internally, the cluster module uses child_process.fork() to create worker processes. Because they are forked, each worker also has an IPC (inter-process communication) channel to the master process.

Here is a simple example using the cluster module:

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

// Note: cluster.isMaster is deprecated since Node.js 16 in favor of cluster.isPrimary
if (cluster.isMaster) {
  console.log(`Master process ${process.pid} is running`);

  // Fork worker processes
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`Worker process ${worker.process.pid} exited`);
  });
} else {
  // Worker processes can share any TCP connection
  // In this example, they create an HTTP server
  http
    .createServer((req, res) => {
      res.writeHead(200);
      res.end('hello world\n');
    })
    .listen(8000);

  console.log(`Worker process ${process.pid} started`);
}

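When you run this script, each worker logs its own process ID at startup, showing that several worker processes are listening on the same port. The request handler itself does not log anything, so to see which worker served a given request you would need to include process.pid in the response or in a log line.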

In this example, the master process creates worker processes equal to the number of CPU cores. Each worker process runs independently, handling incoming HTTP requests. If a worker process exits, the master process is notified via the exit event.
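To observe which worker actually serves each request, the worker can return its own PID in the response. The sketch below is one way to do that under several assumptions: it targets Node.js 16 or later (cluster.setupPrimary), writes the worker script to a temporary file only to stay single-file, listens on port 0 so the workers share one random free port, and uses the hypothetical helpers get and probeCluster.

```javascript
const cluster = require('cluster');
const fs = require('fs');
const http = require('http');
const os = require('os');
const path = require('path');

// Worker script: replies with its own PID. With port 0, all workers share one random port.
const clusterWorkerSource = `
  const http = require('http');
  http.createServer((req, res) => res.end(String(process.pid))).listen(0, '127.0.0.1');
`;

// Request the server on a fresh connection and resolve with the response body.
function get(port) {
  return new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Start workers, send requests once all are listening, then shut everything down.
function probeCluster(workerCount, requestCount) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cluster-sketch-'));
  const workerFile = path.join(dir, 'worker.js');
  fs.writeFileSync(workerFile, clusterWorkerSource);
  cluster.setupPrimary({ exec: workerFile }); // Node.js 16+

  return new Promise((resolve, reject) => {
    const workerPids = [];
    let listening = 0;

    cluster.on('listening', async (worker, address) => {
      listening += 1;
      if (listening !== workerCount) return; // wait until every worker is ready
      try {
        const servedBy = [];
        for (let i = 0; i < requestCount; i++) {
          servedBy.push(Number(await get(address.port)));
        }
        cluster.disconnect(() => {
          fs.rmSync(dir, { recursive: true, force: true });
          resolve({ workerPids, servedBy });
        });
      } catch (err) {
        reject(err);
      }
    });

    for (let i = 0; i < workerCount; i++) {
      workerPids.push(cluster.fork().process.pid);
    }
  });
}

const probe = probeCluster(2, 4);
probe.then(({ workerPids, servedBy }) => {
  console.log('Worker PIDs:', workerPids);
  console.log('Requests were served by:', servedBy);
});
```

How requests are spread across workers depends on the platform's scheduling policy, so the exact sequence of PIDs may differ between runs and operating systems.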

While the cluster module improves performance and reliability, it also adds complexity, such as managing worker lifecycles and handling inter-process communication. In some cases, alternative solutions like using a process manager (e.g., pm2) may be more suitable.

However, the cluster module is not necessary for all applications. For non-CPU-intensive applications, a single Node.js instance may be sufficient to handle all workloads.

Summary

Child processes allow Node.js applications to execute operating system commands or run independent Node.js modules, improving concurrency handling. Using APIs like exec(), spawn(), and fork(), developers can flexibly create and manage child processes, enabling complex asynchronous and non-blocking operations. This allows applications to fully utilize system resources and multi-core CPU advantages without interfering with the main event loop.

By choosing the appropriate approach (child processes, worker threads, or clustering), you can optimize your Node.js application for both performance and scalability.
