Thoughts on Applying Distributed Locks in High-Concurrency Scenarios

Performance Overhead of Distributed Locks
Distributed locks inherently introduce performance overhead. When the thread holding a lock is delayed for any reason, every other task that needs the lock blocks, waiting for it to be released.
How can we keep threads from blocking on lock release, for example when a large batch of complaint tickets is being created, while keeping the delay invisible to the user?
One approach is to use RocketMQ's delayed queue: instead of blocking on the lock, the request re-enqueues itself as a delayed message and retries later, leaving the thread free to do other work. On the front end, a progress bar or spinner masks the wait, so the customer perceives the complaint ticket creation as already completed even under high concurrency. A minimal sketch of this pattern follows.
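The sketch below assumes an already configured RedissonClient and a started RocketMQ DefaultMQProducer; the topic name TICKET_RETRY_TOPIC, the delay level, and the payload handling are illustrative, not taken from the original system:

```java
import java.util.concurrent.TimeUnit;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

public class TicketCreationService {

    private final RedissonClient redisson;
    private final DefaultMQProducer producer; // assumed started elsewhere

    public TicketCreationService(RedissonClient redisson, DefaultMQProducer producer) {
        this.redisson = redisson;
        this.producer = producer;
    }

    public void createTicket(byte[] ticketPayload) throws Exception {
        RLock lock = redisson.getLock("ticket:create:lock");
        // Try once without waiting, holding a 10-second lease on success.
        if (lock.tryLock(0, 10, TimeUnit.SECONDS)) {
            try {
                // ... create the complaint ticket here ...
            } finally {
                lock.unlock();
            }
        } else {
            // Lock busy: re-enqueue the work with a delay instead of blocking.
            // Level 3 is 10 seconds under RocketMQ's default delay levels.
            Message retry = new Message("TICKET_RETRY_TOPIC", ticketPayload);
            retry.setDelayTimeLevel(3);
            producer.send(retry);
            // A consumer on TICKET_RETRY_TOPIC calls createTicket again later,
            // while the front end shows a progress indicator to the user.
        }
    }
}
```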
Scenario: How Do We Keep Other Threads from Blocking, Without the User Noticing?
Problem Background
When a complaint ticket is created, the system queries the risk control department for the driver's credit level. If the level is poor, a high-priority ticket is created for manual review by a dedicated customer service representative. The risk control API requires a token for authentication, but this token changes frequently and cannot be fetched synchronously; instead, it is delivered and cached through a callback API.
The problem occurs when the token expires: multiple requests simultaneously attempt to acquire a new token, overloading the risk control API until it triggers flow control, rejects requests in bulk, and blocks the creation of complaint tickets.
Problem Analysis
The root cause is that, under high concurrency, many requests try to initialize a new token at the same time, putting excessive pressure on the risk control API. Even when a new token is obtained, mishandled concurrent requests may still fail to re-fetch the driver's credit score, which in turn blocks complaint ticket creation.
Solution
To resolve this, we adopted Redisson's distributed lock. When the token expires, a lock is acquired so that only one request at a time initializes a new token, preventing a flood of concurrent requests from overwhelming the risk control API.
Implementation Details
1. Distributed lock implementation: using Redisson's distributed lock, the lock key is a fixed prefix concatenated with the current (stale) token, so every request that saw the same expired token contends on the same lock.
2. Lock logic: inside the lock, a while loop checks whether the token in the cache still equals the token the request arrived with. If it does, the API is called to initialize a new token and the loop sleeps for one second; once it no longer matches, the loop exits and the request queries the risk-level API with the latest token.
3. Double-check: re-reading the cache inside the lock provides the double-check that keeps the token consistent and avoids redundant refresh calls, as shown in the sketch below.
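The steps above might look like the following minimal sketch. TokenCache and RiskControlApi are hypothetical stand-ins for the real cache wrapper and risk control client, which the original does not name:

```java
import java.util.concurrent.TimeUnit;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;

// Hypothetical collaborators, named here for illustration only.
interface TokenCache { String get(); }               // reads the cached token
interface RiskControlApi { void requestNewToken(); } // triggers the async token callback

public class TokenRefresher {

    private final RedissonClient redisson;
    private final TokenCache tokenCache;
    private final RiskControlApi riskControl;

    public TokenRefresher(RedissonClient redisson, TokenCache cache, RiskControlApi api) {
        this.redisson = redisson;
        this.tokenCache = cache;
        this.riskControl = api;
    }

    /** Returns a fresh token, refreshing at most once across all nodes. */
    public String refreshIfExpired(String staleToken) throws InterruptedException {
        // Lock key = fixed prefix + the stale token, so every request that saw
        // the same expired token contends on the same lock.
        RLock lock = redisson.getLock("token:refresh:" + staleToken);
        lock.lock(30, TimeUnit.SECONDS); // lease time guards against a dead holder
        try {
            // Double-check inside the lock: refresh only while the cache still
            // holds the stale token. A production version should bound this loop.
            while (staleToken.equals(tokenCache.get())) {
                riskControl.requestNewToken();  // new token arrives via callback
                TimeUnit.SECONDS.sleep(1);      // wait for the callback to land
            }
            return tokenCache.get(); // refreshed by us or by another request
        } finally {
            lock.unlock();
        }
    }
}
```

The key detail is that the cache is re-read inside the lock: whichever request wins the lock triggers the refresh, and everyone queued behind it sees the updated cache, exits the loop immediately, and skips the redundant API call.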
Advantages
- Handles mass token expiry: the distributed lock prevents a flood of concurrent requests from overwhelming the risk control API when many users' requests hit an expired token at once.
- Prevents mass token-fetch requests from triggering the risk control department's protective measures: by serializing the refresh, the system keeps the token consistent and stable and avoids unnecessary API calls.
Further Optimizations
In addition to using distributed locks, the following optimization strategies can be considered:
1. Coordinate with the big data department: rotate the token during off-peak hours to avoid adding pressure at peak times.
2. Event-driven mechanism: run a service that listens for token updates from the big data department and refreshes the system's cache on each update, giving a smooth token transition (see the sketch after this list).
3. MQ broadcast: alternatively, the big data department can broadcast token updates over MQ to every microservice instance.
4. Trade-off between eventual and strong consistency: during traffic peaks, it may be necessary to sacrifice some performance and use distributed locks to guarantee strong consistency.
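As one possible shape for options 2 and 3, the sketch below subscribes to token updates through a Redisson pub/sub topic. This is a stand-in for whichever channel the big data department actually exposes (a RocketMQ broadcast consumer would work the same way); the topic name and cache key are assumptions, not part of the original design:

```java
import org.redisson.api.RTopic;
import org.redisson.api.RedissonClient;

public class TokenUpdateListener {

    public static void subscribe(RedissonClient redisson) {
        // Topic name is illustrative; in practice this could equally be a
        // RocketMQ broadcast topic consumed by every microservice instance.
        RTopic topic = redisson.getTopic("risk-control:token-updates");
        topic.addListener(String.class, (channel, newToken) -> {
            // Refresh the cached token as soon as the update is published,
            // so requests never observe an expired token during rotation.
            redisson.getBucket("risk-control:token").set(newToken);
        });
    }
}
```

With a listener like this in place, the distributed lock becomes a fallback for the rare race during rotation rather than the main refresh path.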
Together, these measures allow high-concurrency operations to be handled without compromising system stability or user experience.