DEV Community

Jing for Chat2DB

Posted on

MySQL Master-Slave Replication Delay Optimization

What is MySQL Master-Slave Replication?

Master-slave replication refers to creating an identical database environment to the master database (called the slave) and synchronizing the operations performed on the master database to the slave. To ensure data consistency, DDL and DML operations on the master database are synchronized to the slave through binary logs (Binlog). The slave then reads these logs and applies the operations to keep the data consistent.

Why Use Master-Slave Replication?

  1. Improved Performance: In complex business operations, certain actions can cause row or even table locking. If reading and writing are not decoupled, it could severely impact business operations. With master-slave replication, the master database handles writes and the slave handles reads, which makes the responsibilities clearer and improves performance. Even if the master database encounters table locks, the business can continue by reading from the slave.

  2. Hot Backup: In case the master database goes down, a slave can quickly take over as the new master, ensuring business continuity.

  3. Scalable Architecture: As business volume grows, the frequency of I/O operations increases, making a single machine unable to handle the load. Master-slave replication enables a multi-database setup that reduces disk I/O and enhances performance.

  4. Separation of Concerns: Master-slave replication and read-write splitting help in load balancing by distributing the workload.

  5. Read-Write Ratio: The ratio of reads to writes affects the distribution of workload between master and slave databases. A higher read-to-write ratio would require more slaves to balance the load, as shown in the table below:

Read/Write Ratio (Approx.) Master Slaves
50:50 1 1
66.6:33.3 1 2
80:20 1 4

Why Does Master-Slave Replication Lag?

When replication is initiated on the slave, it creates an I/O thread that connects to the master. The master creates a Binlog Dump thread that reads the database events and sends them to the I/O thread. The I/O thread then updates the events to the slave’s relay log. The SQL thread on the slave reads the relay log and applies the changes. Here's an illustration of this process:

Image description

Breakdown of the Process:

  1. The master records data changes (INSERT, DELETE, UPDATE) as events in the binary log (binlog) when a transaction is committed.
  2. A worker thread, the binlog dump thread, sends the binlog content to the slave's relay log.
  3. The slave replays the changes from the relay log to maintain consistency between the master and slave.
  4. MySQL uses three threads to handle replication: the binlog dump thread on the master and the I/O and SQL threads on the slave. For each connected slave, the master creates a binlog dump thread.

Analyzing the Causes of Replication Delay

What is Master-Slave Replication Lag?

Master-slave replication lag refers to the delay that occurs when a slave server receives and applies changes from the master. This delay is the time taken for the data changes on the master to propagate and be applied on the slave. The consequence is that the data queried from the slave may be outdated or inconsistent with the master.

Replication lag can become significant under high concurrency or when there is a large volume of data changes. The core issue arises because the slave’s SQL thread may not be able to handle the volume of DML operations generated by the master, which reduces efficiency.

Image description

Other Contributing Factors:

  • High Load on the Master: If the master has a heavy load and generates many changes, the transmission speed of the logs may slow down, increasing lag.
  • High Load on the Slave: If the slave is under heavy load, the process of applying changes can be delayed, leading to lag.
  • Network Latency: Unstable network connections or insufficient bandwidth between the master and slave can also slow down data transmission, causing delays.
  • Hardware Performance Disparities: Differences in CPU, memory, and disk performance between the master and slave can affect replication speed.
  • Misconfigured MySQL Settings: For example, large binary logs on the master or poorly configured relay logs on the slave can slow down replication.
  • Lock Waits on Large Queries: Long-running or resource-intensive queries on the slave may result in locks, causing delays.

Master-Slave Replication Delay Optimization Solutions

Optimal System Configuration

Optimizing system settings (system-level, connection layer, storage engine layer) ensures that the database runs at its best. Adjustments should include maximum connections, error limits, timeout settings, pool sizes, and log sizes to guarantee the system can scale properly.

For MySQL on Linux, certain kernel parameters can help optimize performance:

# TIME_WAIT timeout, default is 60 seconds
net.ipv4.tcp_fin_timeout = 30
# Increase TCP backlog queue size to handle more waiting connections
net.ipv4.tcp_max_syn_backlog = 65535
# Reduce resource recycling after connection closure
net.ipv4.tcp_max_tw_buckets = 8000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10
# Open file limits
*soft nofile 65535
*hard nofile 65535
Enter fullscreen mode Exit fullscreen mode

In MySQL 5.5+, the default storage engine is InnoDB, and here are some MySQL and InnoDB parameters that can be adjusted to improve performance:

# MySQL Parameters
max_connections = 151  # Adjust based on workload, typically 80% of the maximum
sort_buffer_size = 16M # Increase buffer size for ORDER BY and GROUP BY
open_files_limit = 1024 # Ensure this is sufficient for open files

# InnoDB Parameters
innodb_buffer_pool_size = 1G # 70% of system memory, if dedicated to MySQL
innodb_buffer_pool_instances = 4 # Number of buffer pool instances
innodb_flush_log_at_trx_commit = 1 # Ensure high data durability, set to 2 for better performance
sync_binlog = 1
innodb_file_per_table = ON
Enter fullscreen mode Exit fullscreen mode

Database Partitioning

Database partitioning is essential for managing replication delays. A frequent cause of lag is a heavily used single database that overburdens the SQL thread. Splitting the database by function or load can help distribute pressure.

Ensure Data Sync Before Acknowledging Writes

If business requirements permit, ensure that data is synchronized to all slaves before returning a success response after writing to the master. However, this solution can significantly impact performance and should be used with caution, particularly in high-throughput systems.

Syncing Data

Use Caching to Mitigate Delay

In scenarios where replication delay is an issue, you can store frequently queried data in Redis or other NoSQL databases. When writing data, also write it to Redis. When reading data, first check Redis; if the data is available, retrieve it directly from Redis. Once the data is synced to the database, remove the cache entry.

Caching

A few important considerations:

  1. Caching helps alleviate delay but may not be ideal for high concurrency due to frequent cache invalidations.
  2. In high-concurrency situations, if the slave has not yet synchronized, the cache may be updated, leading to inconsistencies when the cache is invalidated.

Image description

As shown in the diagram above, if the values are updated sequentially as 1, 2, and 3, the master-slave synchronization will occur in the same order. After the update to 1 is synchronized, the cache will be updated to 3. At this point, if the cache is deleted, read requests will be directed to the slave server, which will return the value 1, causing a temporary inconsistency in the data.

Therefore, it is important to consider this situation. One approach is to also save the unique key (such as the primary key) and perform a check before deletion to prevent accidental removal. Alternatively, you could avoid real-time cache deletion and handle it during off-peak hours.

Multi-Threaded Relay Log Replay

MySQL uses a single thread to replay the relay log, which can cause a bottleneck. A potential solution is to use multiple threads to replay the logs in parallel, but this approach requires careful handling to maintain data consistency.

To achieve parallel processing, you can split the relay log across multiple threads, ensuring that write operations on the same table are serialized and different tables can be processed concurrently.

1 UPDATE t_score SET score = 721 WHERE stu_code = 374532;
2 UPDATE t_score SET score = 806 WHERE stu_code = 374532;
3 UPDATE t_score SET score = 899 WHERE stu_code = 374532;
Enter fullscreen mode Exit fullscreen mode

Image description

By using hashing, you can assign each table’s operations to specific threads for parallel processing, improving performance.

Read Directly from the Master for Low-Traffic Scenarios

For certain low-traffic scenarios, you can bypass the slave and read directly from the master. This reduces reliance on replication and ensures real-time consistency. However, it adds complexity to the application and should be used only when necessary.

Throttling and Downgrading

All systems have throughput limitations, and no solution can handle unlimited traffic. By estimating the system’s capacity, you can apply caching, throttling, and downgrading strategies once the system reaches its limit.

Multi-Threaded Replication Support in MySQL (Version 5.6 and Above)

MySQL 5.6 introduced support for multi-threaded replication (also known as parallel replication), and MySQL 5.7 further enhanced this feature by enabling GTID-based parallel replication. In MySQL 5.6, replication is single-threaded by default, but you can enable multi-threaded replication by configuring the slave_parallel_workers parameter.

To enable multi-threaded replication, follow these steps:

1. Ensure your MySQL version is 5.6 or higher

2. Modify the multi-threaded replication configuration

Edit the MySQL my.cnf (or my.ini) configuration file on the slave server, and set the slave_parallel_workers parameter to the desired number of worker threads, for example:

[mysqld]
slave_parallel_workers = 8
Enter fullscreen mode Exit fullscreen mode

3. Restart the MySQL service to apply the changes.

4. Verify that multi-threaded replication is enabled:

SHOW VARIABLES LIKE 'slave_parallel_workers';
Enter fullscreen mode Exit fullscreen mode

If the returned value is greater than 0, it indicates that multi-threaded replication is enabled, and the specified number of threads will be used to apply log events.

Conclusion

The various solutions mentioned above each have their pros and cons, and the choice of solution should be based on the specific use case and requirements.

For those looking to streamline database management, boost efficiency, and integrate AI-driven features into your MySQL workflow, Chat2DB can be an essential tool. Chat2DB offers intelligent SQL generation, visual data management, and powerful query optimization, helping you take control of your database performance.


Community

Go to Chat2DB website
🙋 Join the Chat2DB Community
🐦 Follow us on X
📝 Find us on Discord

Top comments (0)