Why Cron Jobs Can Cause Problems in a Multi-Instance Environment
Cron jobs are useful for running background tasks at fixed intervals. However, when an application is scaled horizontally, every instance runs its own copy of each cron job, which leads to duplicate executions and data inconsistencies. For example, several instances might update the same record at the same time or repeat a task that should happen only once, causing serious issues. To solve this, we can use a locking mechanism that ensures each scheduled task is executed by only one instance.
Working with Cron Jobs in NestJS
NestJS provides the @nestjs/schedule module for cron jobs, which offers decorators that make scheduling tasks easy. First, let's install the module:
npm install @nestjs/schedule
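After installing it, add ScheduleModule.forRoot() to the imports of your root module (for example AppModule) to activate the scheduler, and register the service that contains your jobs as a provider. We can then define tasks using decorators like @Cron, @Interval, and @Timeout. Below is an example of a simple cron job definition: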
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class TasksService {
  // Define a cron job that runs every minute
  @Cron(CronExpression.EVERY_MINUTE)
  handleCron() {
    console.log('Cron job running: ', new Date().toISOString());
  }
}
In the example above, @Cron(CronExpression.EVERY_MINUTE) specifies a cron job that runs every minute. The @Cron decorator accepts either one of the predefined CronExpression values or a custom cron string, so you can choose whatever schedule the task needs. However, if the application runs as multiple instances, every instance will execute this job. To ensure each cron job runs only once in a multi-instance environment, you need to implement a locking mechanism.
Locking Mechanisms: Solutions to Multi-Instance Issues
Locking mechanisms ensure that a specific task is executed by only one instance, preventing duplicate operations and data inconsistencies. Here are four common approaches:
Database-Based Pessimistic Lock
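A database-based pessimistic lock is a common choice with SQL databases. Before a task starts, the instance locks a row in the database, and the lock is released when the transaction ends. For example, in PostgreSQL or MySQL you can lock a row with SELECT ... FOR UPDATE. Adding NOWAIT or SKIP LOCKED (supported by PostgreSQL and MySQL 8.0+) is important here: without it, the other instances simply wait for the lock and then run the same task again once it is released. This method is reliable, but it can cause performance issues because each cron execution holds a database connection and an open transaction for the entire duration of the job.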
Advantages:
- Reliable and easy to implement if the application already uses a relational database.
- Provides centralized locking due to integration with database transactions.
Disadvantages:
- Can be costly in terms of database connections under heavy load.
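The row-lock flow can be sketched as follows, assuming PostgreSQL and a hypothetical cron_locks table that already contains one row per job name (for example, seeded with INSERT INTO cron_locks (job_name) VALUES ('nightly-report') ON CONFLICT DO NOTHING). SqlClient is a minimal stand-in for a single dedicated connection, such as a client checked out from a node-postgres pool; BEGIN, the SELECT and COMMIT must all run on that same connection. runWithRowLock is an illustrative helper, not a drop-in implementation.

```typescript
// Minimal stand-in for one dedicated database connection (hypothetical interface)
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<{ rows: unknown[] }>;
}

/**
 * Runs `job` only if this instance can lock the job's row.
 * Resolves to true if the job ran, false if another instance holds the lock.
 */
async function runWithRowLock(
  client: SqlClient,
  jobName: string,
  job: () => Promise<void>,
): Promise<boolean> {
  await client.query('BEGIN');
  try {
    // SKIP LOCKED: if another transaction holds the row, return no rows instead of waiting
    const { rows } = await client.query(
      'SELECT job_name FROM cron_locks WHERE job_name = $1 FOR UPDATE SKIP LOCKED',
      [jobName],
    );
    if (rows.length === 0) {
      await client.query('ROLLBACK');
      return false;
    }
    await job(); // The row stays locked until the transaction ends, i.e. for the whole job
    await client.query('COMMIT'); // Committing releases the row lock
    return true;
  } catch (error) {
    await client.query('ROLLBACK'); // Rolling back also releases the lock
    throw error;
  }
}
```

Because the lock lives as long as the transaction, the connection stays checked out for the whole job, which is exactly the cost listed under the disadvantages above.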
Lock Record with Redis
Redis offers fast, lightweight data structures, which makes it well suited to locking. Before a task starts, the instance creates a lock by atomically setting a key only if it does not already exist (the NX option of SET), and it removes the key once the task completes. Giving the key an expiry (TTL) prevents a lock from staying in place indefinitely if an instance crashes before releasing it.
Advantages:
- Redis is fast and has low latency.
- Provides a centralized locking mechanism and is easily scalable.
Disadvantages:
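- If there are issues with the Redis connection, locks might remain open (though a TTL can mitigate this).
- If a job runs longer than the TTL, the lock expires while the job is still running, and another instance can start the same job.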
- Requires a separate Redis service.
Monitoring with API Calls
In some projects, tasks are coordinated through a central API. Before starting, each cron job asks the API to mark the task as “running”, and it updates this status once the job completes. Because the API grants the “running” status to only one caller at a time, its response decides which instance runs the task, so the task runs only once. This approach is suitable for projects requiring complex, centralized monitoring.
Advantages:
- Centralized control.
- Makes task monitoring across systems or microservices easier.
Disadvantages:
- Requires additional development (e.g., with a third-party app or shell scripts to make API calls).
- The cron jobs depend on the API: its latency adds to every run, and if the API is unavailable, no instance can start the task.
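A minimal sketch of this flow, assuming a hypothetical central API with two endpoints: POST /jobs/{name}/claim, which responds with 409 Conflict when the job is already marked as running, and POST /jobs/{name}/complete, which clears that status. The endpoints and runWithApiClaim are illustrative only. The fetchFn parameter accepts the global fetch available in Node 18+, and passing it in keeps the function easy to test.

```typescript
// Subset of the fetch API used here; the global fetch in Node 18+ satisfies it
type FetchLike = (url: string, init: { method: string }) => Promise<{ status: number; ok: boolean }>;

/**
 * Asks a hypothetical central API to mark `jobName` as running, runs the job if granted,
 * then reports completion. Resolves to true if this instance ran the job.
 */
async function runWithApiClaim(
  baseUrl: string,
  jobName: string,
  job: () => Promise<void>,
  fetchFn: FetchLike,
): Promise<boolean> {
  const name = encodeURIComponent(jobName);
  const claim = await fetchFn(`${baseUrl}/jobs/${name}/claim`, { method: 'POST' });
  if (claim.status === 409) return false; // Another instance already holds the "running" status
  if (!claim.ok) throw new Error(`Claim failed with HTTP ${claim.status}`);
  try {
    await job();
  } finally {
    // Clear the "running" status even if the job threw
    await fetchFn(`${baseUrl}/jobs/${name}/complete`, { method: 'POST' });
  }
  return true;
}
```

Clearing the status in a finally block matters: if a failed job never reports completion, no instance can run the task again unless the API also expires stale claims.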
Task Management Using Queues
Queue-based approaches are highly effective for processing tasks in sequence and avoiding duplicate operations. With BullMQ in NestJS, each cron run can be added to a queue as a job, and each job is picked up by a single worker instance. Queues are one of the most effective solutions for task management in multi-instance environments.
Advantages:
- Tasks are processed in sequence.
- Compatible with horizontal scaling.
Disadvantages:
- Requires additional dependencies (Redis, BullMQ) and configuration.
Adding Jobs to Queue in Multi-Instance Scenarios
In a multi-instance environment where each instance tries to add the same cron job to a queue, unique job IDs can prevent duplicates. BullMQ does not add a job whose jobId matches an existing job, so even if every instance tries to queue the same job, it only gets added once.
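As a sketch of that idea, assuming each instance enqueues the run from its own @Cron handler, the job ID below is derived from the scheduled minute in UTC, so every instance computes the same ID for the same run. JobQueue mirrors the arguments of BullMQ's Queue.add(name, data, opts) so the example runs without Redis; in an application you would pass a real BullMQ Queue. scheduledJobId, enqueueOnce and the report-<minute> format are hypothetical; the colon in the time is replaced to keep ':' (which BullMQ uses as a separator in its Redis keys) out of the ID.

```typescript
// Shape of the queue method used here; a BullMQ Queue accepts these arguments in add()
interface JobQueue {
  add(name: string, data: unknown, opts: { jobId: string }): Promise<unknown>;
}

/** Builds a job ID shared by all instances for the same scheduled minute (UTC). */
function scheduledJobId(prefix: string, firedAt: Date): string {
  // "2024-01-01T10:05:37.123Z" -> "2024-01-01T10-05"
  const minute = firedAt.toISOString().slice(0, 16).replace(':', '-');
  return `${prefix}-${minute}`;
}

/** Enqueues this run; if another instance already added the same jobId, only one job exists. */
function enqueueOnce(queue: JobQueue, prefix: string, data: unknown, firedAt: Date): Promise<unknown> {
  return queue.add(prefix, data, { jobId: scheduledJobId(prefix, firedAt) });
}
```

Note that the duplicate is only ignored while a job with that ID still exists in Redis. If completed jobs are removed immediately (for example with removeOnComplete: true), an instance that enqueues late could add the same ID again, so it is worth keeping completed jobs for at least one interval.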
Code Example - Redis Lock
Step 1: Create a Redis Client
To set up Redis, first install the ioredis package and create a Redis client:
npm install ioredis
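// redis.service.ts
import { Injectable, OnModuleDestroy } from '@nestjs/common';
// ioredis v5 ships its own types and exports the client class as the default export
import Redis from 'ioredis';

@Injectable()
export class RedisService implements OnModuleDestroy {
  private readonly client: Redis;

  constructor() {
    this.client = new Redis({
      host: 'localhost', // Redis server host
      port: 6379, // Redis server port
    });
  }

  getClient(): Redis {
    return this.client;
  }

  // Close the connection cleanly when the Nest application shuts down
  async onModuleDestroy(): Promise<void> {
    await this.client.quit();
  }
}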
This service lets other providers access the Redis client by injecting RedisService (remember to register it as a provider in your module).
Step 2: Using Redis Lock Mechanism for Unique Cron Job Execution
Now, let's write a cron job function that implements locking using the Redis client. In this example, a lock is created in Redis before the task runs; if the lock is acquired, the job executes. Once the job is complete, the lock is released.
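// task.service.ts
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';
import { randomUUID } from 'crypto';
import { RedisService } from './redis.service';

// Lua script: delete the key only if it still holds our token (atomic compare-and-delete)
const RELEASE_LOCK_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
end
return 0
`;

@Injectable()
export class TaskService {
  private readonly lockKey = 'cron-job-lock'; // Lock key shared by all instances
  private readonly lockTTL = 60; // Lock TTL in seconds; should exceed the job's worst-case duration

  constructor(private readonly redisService: RedisService) {}

  @Cron(CronExpression.EVERY_MINUTE)
  async handleCronJob() {
    const client = this.redisService.getClient();
    const token = randomUUID(); // Identifies the lock owned by this run
    let acquired = false;
    try {
      // EX sets the expiry in seconds; NX only sets the key if it does not exist yet
      acquired = (await client.set(this.lockKey, token, 'EX', this.lockTTL, 'NX')) === 'OK';
      if (!acquired) {
        console.log('This cron job is already running on another instance.');
        return; // The lock belongs to another instance, so we must not release it
      }
      // Start the job
      console.log('Cron job started: ', new Date().toISOString());
      // Place your cron job logic here
      await this.executeJob();
      console.log('Cron job completed.');
    } catch (error) {
      console.error('Error occurred during cron job:', error);
    } finally {
      if (acquired) {
        // Release only if the lock still holds our token (it may have expired and been taken)
        await client.eval(RELEASE_LOCK_SCRIPT, 1, this.lockKey, token);
        console.log('Lock released.');
      }
    }
  }

  private async executeJob(): Promise<void> {
    // Example: 5-second delay
    await new Promise<void>((resolve) => setTimeout(resolve, 5000));
  }
}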
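Releasing the lock as soon as a 5-second job finishes leaves a gap: an instance whose clock runs a few seconds behind fires its own EVERY_MINUTE tick after the release, finds the lock free, and runs the same minute's job a second time. One possible mitigation, sketched here, is to include the scheduled minute in the lock key and let the TTL expire instead of deleting the key. tickLockKey is a hypothetical helper, not part of ioredis or @nestjs/schedule.

```typescript
/**
 * Builds a lock key unique to one scheduled run, e.g. "cron-job-lock:2024-01-01T10:05".
 * toISOString() is always UTC, so instances in different time zones derive the same key.
 */
function tickLockKey(baseKey: string, firedAt: Date): string {
  return `${baseKey}:${firedAt.toISOString().slice(0, 16)}`;
}
```

In handleCronJob you would pass tickLockKey(this.lockKey, new Date()) to client.set and drop the release in the finally block; the 60-second TTL then removes the key. The trade-off is that a run longer than one interval can overlap with the next minute's run, since each minute uses a different key.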
Bonus: PM2
PM2 is a popular process manager for running Node.js applications in production. It can launch applications with multiple instances, utilizing CPU cores efficiently. PM2’s cluster mode allows multiple instances of the same application to run and load balances requests across instances.
Managing Multiple Instances with PM2
Using the -i parameter, PM2 can start multiple instances. For example, to use all available CPU cores, run:
pm2 start app.js -i max
This command starts an instance on each CPU core. Each instance receives a unique ID in the NODE_APP_INSTANCE environment variable, so tasks can be tagged with the ID of the instance that ran them.
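For example, a common pattern is to let only the instance with ID 0 run scheduled work. The sketch below assumes the job should also run when the app is started without PM2, where NODE_APP_INSTANCE is not set; shouldRunCron is a hypothetical helper.

```typescript
/**
 * Returns true if this process should run scheduled jobs.
 * PM2 sets NODE_APP_INSTANCE to "0", "1", ... for each instance; when the variable
 * is missing (e.g. started with plain `node`), we assume a single instance.
 */
function shouldRunCron(env: Record<string, string | undefined>): boolean {
  const instance = env.NODE_APP_INSTANCE;
  return instance === undefined || instance === '0';
}
```

At the top of a cron handler, if (!shouldRunCron(process.env)) return; skips the job on every other instance. This only deduplicates processes managed by a single PM2 daemon on one machine.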
Drawbacks of Multiple Instance Management with PM2
A significant drawback of PM2 is that it does not coordinate work between instances: if a job must be handled by only one instance, PM2 has no built-in way to enforce this. PM2 only manages processes on the machine where it runs, so in dynamic environments where instances are added across several machines or containers, an instance-ID check cannot guarantee that a job runs only once.
PM2 provides a basic solution for multi-instance management but lacks support for single-instance cron jobs. For more reliable duplicate prevention, consider Redis-based locks or BullMQ, especially in Kubernetes-managed environments.
Conclusion
This guide covered a common issue with multi-instance environments and presented solutions based on different locking mechanisms. While the @nestjs/schedule library offers basic cron job support, as of this writing it has no built-in coordination between instances, so it doesn't fully meet my expectations for multi-instance management. There may be more alternatives, and I welcome comments suggesting other options along with their pros and cons.
Good luck 😊