Processing 1 Million Records in Node.js and MySQL Efficiently

Handling large datasets in Node.js with MySQL can be challenging due to memory constraints and performance bottlenecks. Processing 1 million records efficiently requires optimizing queries, using streaming, and ensuring proper indexing. In this article, we'll go through best practices and code examples for handling large datasets efficiently.

πŸš€ Challenges in Processing Large Data in Node.js

  1. Memory Consumption -- Fetching all records at once can overload memory.
  2. Query Performance -- Large dataset queries can slow down if not optimized.
  3. Concurrency & Bottlenecks -- Processing data in batches is necessary to avoid blocking the event loop.

πŸ› οΈ Solutions to Process 1 Million Records

1. Use Pagination or Batching

Instead of retrieving all records at once, process them in smaller chunks.

Example: Fetching Data in Batches

const mysql = require('mysql2/promise');

async function processLargeDataset() {
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'test_db'
    });

    const batchSize = 10000; // Process 10K records at a time
    let offset = 0;
    let rows;

    do {
        [rows] = await connection.execute(
            `SELECT * FROM large_table ORDER BY id LIMIT ?, ?`,
            [offset, batchSize]
        );

        if (rows.length) {
            console.log(`Processing ${rows.length} records...`);
            await processData(rows);
        }

        offset += batchSize;
    } while (rows.length > 0);

    await connection.end();
}

async function processData(records) {
    for (const record of records) {
        // Perform operations like transformation, writing to another table, etc.
    }
}

processLargeDataset().catch(console.error);

Why is this effective?

  • Uses LIMIT with an offset (LIMIT ?, ? in the query above) to fetch records in chunks.
  • Prevents memory overload by processing a limited set of records at a time.
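
One caveat: with LIMIT ... OFFSET, MySQL still has to read and skip over all previously fetched rows on every iteration, so later batches get progressively slower. If that becomes a problem, keyset pagination (seeking by id instead of by offset) is a common alternative. The sketch below is a minimal variant of the batching example above; it assumes large_table has an indexed, monotonically increasing id column and reuses the processData helper.

const mysql = require('mysql2/promise');

async function processLargeDatasetKeyset() {
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'test_db'
    });

    const batchSize = 10000;
    let lastId = 0; // highest id processed so far
    let rows;

    do {
        // WHERE id > ? walks the primary key index, so each batch starts
        // where the previous one ended instead of re-scanning skipped rows.
        [rows] = await connection.query(
            `SELECT * FROM large_table WHERE id > ? ORDER BY id LIMIT ?`,
            [lastId, batchSize]
        );

        if (rows.length) {
            console.log(`Processing ${rows.length} records...`);
            await processData(rows); // same helper as in the batching example
            lastId = rows[rows.length - 1].id;
        }
    } while (rows.length > 0);

    await connection.end();
}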

2. Use MySQL Streaming for Large Data

Instead of loading everything in memory, use MySQL's streaming capability.

Example: Using MySQL Streams

const mysql = require('mysql2');

const connection = mysql.createConnection({
    host: 'localhost',
    user: 'root',
    password: 'password',
    database: 'test_db'
});

const query = connection.query('SELECT * FROM large_table');

query
  .stream()
  .on('data', (row) => {
      console.log('Processing row:', row);
      // Perform processing on each row
  })
  .on('error', (err) => {
      console.error('Stream error:', err);
      connection.end();
  })
  .on('end', () => {
      console.log('All rows processed.');
      connection.end();
  });

Why is this better?

  • Uses streaming, so only a few rows are held in memory at any time.
  • Avoids re-running a paginated query for every chunk; the server sends rows as one continuous result set, which is typically faster than LIMIT/OFFSET batching for full-table scans.
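
A related option: the object returned by .stream() is a regular Node.js readable stream, so it can also be consumed with for await...of. That gives built-in backpressure when the per-row work is asynchronous, because the stream stays paused while each await is pending. A minimal sketch, assuming the same connection settings as above and a hypothetical async handleRow() helper:

const mysql = require('mysql2');

const connection = mysql.createConnection({
    host: 'localhost',
    user: 'root',
    password: 'password',
    database: 'test_db'
});

async function processWithStream() {
    const stream = connection.query('SELECT * FROM large_table').stream();

    // for await...of pulls one row at a time and keeps the stream paused
    // while the async work for the current row is still pending.
    for await (const row of stream) {
        await handleRow(row); // hypothetical async per-row handler
    }

    console.log('All rows processed.');
    connection.end();
}

processWithStream().catch((err) => {
    console.error('Stream processing failed:', err);
    connection.end();
});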

3. Optimize MySQL Queries

If the dataset is too large, make sure queries are optimized (a short sketch follows this list):

  • Use Indexing: Ensure that columns used in WHERE, ORDER BY, and JOIN clauses are indexed.
  • Avoid SELECT *: Fetch only the required columns to reduce memory usage.
  • Use MySQL Partitioning: If applicable, partition large tables for better performance.
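
As a concrete illustration of the first two points, here is a minimal sketch. The status column, the index name, and the selected columns are hypothetical placeholders for your own schema, and the CREATE INDEX is a one-time setup step (e.g. part of a migration) rather than something to run on every job:

const mysql = require('mysql2/promise');

async function fetchOnlyWhatYouNeed() {
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'test_db'
    });

    // One-time setup: index the column used in the WHERE clause.
    await connection.query(
        `CREATE INDEX idx_large_table_status ON large_table (status)`
    );

    // Fetch only the required columns instead of SELECT *.
    const [rows] = await connection.query(
        `SELECT id, name, value FROM large_table WHERE status = ?`,
        ['pending']
    );

    console.log(`Fetched ${rows.length} rows`);
    await connection.end();
}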

4. Bulk Insert for Faster Processing

If the goal is to transfer or update large datasets, use bulk inserts instead of inserting records one by one.

Example: Bulk Insert

const mysql = require('mysql2/promise');

async function bulkInsert(records) {
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'test_db'
    });

    const values = records.map(record => [record.id, record.name, record.value]);

    await connection.query(
        `INSERT INTO new_table (id, name, value) VALUES ?`,
        [values]
    );

    await connection.end();
}

Why is this better?

  • A single query inserts multiple records, reducing query overhead.
  • Improves performance when handling large data migrations.
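
One practical caveat: a single multi-row INSERT covering hundreds of thousands of records can exceed MySQL's max_allowed_packet limit. A common workaround is to split the data into moderately sized chunks and issue one bulk insert per chunk on a single connection. A minimal sketch (the 5,000 chunk size is an arbitrary example, not a tuned recommendation):

const mysql = require('mysql2/promise');

async function bulkInsertInChunks(records, chunkSize = 5000) {
    const connection = await mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'test_db'
    });

    for (let i = 0; i < records.length; i += chunkSize) {
        const chunk = records.slice(i, i + chunkSize);
        const values = chunk.map(record => [record.id, record.name, record.value]);

        // One multi-row INSERT per chunk keeps each statement well below
        // the server's max_allowed_packet limit.
        await connection.query(
            `INSERT INTO new_table (id, name, value) VALUES ?`,
            [values]
        );
    }

    await connection.end();
}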

βœ… Conclusion

Processing 1 million records in Node.js with MySQL requires batch processing, streaming, query optimization, and bulk operations. Using the right approach ensures better performance and prevents memory crashes.

Key Takeaways:

βœ” Use batch processing (LIMIT with an offset) for handling records in chunks.
βœ” Use MySQL streaming to avoid loading all records into memory.
βœ” Optimize queries with indexes and selective column fetching.
βœ” Use bulk inserts to speed up data migration or updates.

By following these best practices, you can efficiently handle large datasets in Node.js without running into memory issues or slow query performance. πŸš€
