The Rate Limiting API debuted in .NET 7. It implements several popular algorithms for limiting the number of requests to a shared resource. The API is usually presented as part of the built-in ASP.NET Core rate-limiting middleware, but it doesn't depend on ASP.NET itself and has a broader scope. It was also written recently, so it should reflect the current state of the art in .NET concurrency. Finally, it's a production-ready library, not a textbook example of a semaphore in a loop with a sleep. So let's take a look inside and see what we can learn.
Rate Limiting API overview
Disclaimer: In a mature system, and for the sake of horizontal scaling, rate limiting is often implemented at the infrastructure layer (e.g. Rate Limiting with NGINX), except for some tricky quota partitioning driven by domain logic.
Middleware
As mentioned above, there is a built-in RateLimitingMiddleware in ASP.NET Core. Its basic usage is extensively covered in Microsoft Learn and community blogs, so allow me to skip it. There is not much inside: the middleware basically does two things:
- Choosing a limiter based on the endpoint's metadata and global settings.
- Trying to acquire the protected resource (i.e. the endpoint).
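In isolation, those two steps boil down to something like the following sketch. This is hypothetical code, not the middleware's actual implementation; GetLimiterForEndpoint is an assumed stand-in for the real metadata/global-options resolution:

```csharp
using System.Threading.RateLimiting;

// Step 1 (hypothetical): resolve a limiter for the endpoint.
// In the real middleware this is driven by endpoint metadata
// and the globally configured RateLimiterOptions.
RateLimiter GetLimiterForEndpoint(string endpoint) =>
    new ConcurrencyLimiter(new ConcurrencyLimiterOptions
    {
        PermitLimit = 2,
        QueueLimit = 0
    });

var limiter = GetLimiterForEndpoint("/api/orders");

// Step 2: try to acquire the protected resource (the endpoint).
using RateLimitLease lease = limiter.AttemptAcquire(permitCount: 1);
Console.WriteLine(lease.IsAcquired ? "200 OK" : "429 Too Many Requests");
// prints "200 OK" (fresh limiter, permits available)
```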
Things get more interesting at the rate limiter level.
Limiting algorithms
The System.Threading.RateLimiting namespace contains limiters that implement the following algorithms:
- ConcurrencyLimiter: concurrency limiting
- FixedWindowRateLimiter: fixed window
- SlidingWindowRateLimiter: sliding window
- TokenBucketRateLimiter: token bucket
Each limiter, except ConcurrencyLimiter, caps the number of requests in a time period. ConcurrencyLimiter caps only the number of concurrent requests. All limiters are derived from the abstract RateLimiter class and implement the following public API:
// Fast synchronous attempt to acquire permits.
RateLimitLease AttemptAcquire(int permitCount);
// Await until the requested permits are available
// or can no longer be acquired.
ValueTask<RateLimitLease> AcquireAsync(int permitCount);
We can see how both methods are used at the middleware level. First, a fast sync attempt is made. If it fails, then the async method is called, possibly hitting a wait queue (I discuss this queue in the next section).
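The same fast-path/slow-path pattern can be sketched outside the middleware. The token bucket configuration below is purely illustrative:

```csharp
using System.Threading.RateLimiting;

// Illustrative sketch: a fast synchronous attempt first,
// then the awaitable path that may sit in the wait queue.
var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 1,
    TokensPerPeriod = 1,
    ReplenishmentPeriod = TimeSpan.FromMilliseconds(50),
    QueueLimit = 4,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    AutoReplenishment = true
});

using RateLimitLease fast = limiter.AttemptAcquire(permitCount: 1);
if (!fast.IsAcquired)
{
    // Queues the request; the task completes when a token is
    // replenished (or fails if it can no longer be satisfied).
    using RateLimitLease slow = await limiter.AcquireAsync(permitCount: 1);
    Console.WriteLine($"Acquired after waiting: {slow.IsAcquired}");
}
```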
Partitioning and chaining
Another API worth mentioning is PartitionedRateLimiter<TResource>, which is responsible for:
- Partitioning: selecting a limiter at runtime by the TResource value.
- Chaining: combining limiters into a sequence.
An example of partitioning:
using var limiter = PartitionedRateLimiter.Create<string, int>(
    resource =>
    {
        if (resource == "foo_resource_id")
        {
            return RateLimitPartition.GetFixedWindowLimiter(
                partitionKey: 1,
                _ => new FixedWindowRateLimiterOptions {...});
        }
        else
        {
            return RateLimitPartition.GetConcurrencyLimiter(
                partitionKey: 2,
                _ => new ConcurrencyLimiterOptions {...});
        }
    });

var lease = limiter.AttemptAcquire("foo_resource_id");
An example of chaining:
using var limiter1 = PartitionedRateLimiter.Create<string, int>(
    resource =>
    {
        return RateLimitPartition.Get(partitionKey: 1,
            key => concurrencyLimiter1);
    });

using var limiter2 = PartitionedRateLimiter.Create<string, int>(
    resource =>
    {
        return RateLimitPartition.Get(partitionKey: 2,
            key => concurrencyLimiter2);
    });

using var chainedLimiter = PartitionedRateLimiter
    .CreateChained<string>(limiter1, limiter2);
Note that chaining is only supported for partitioned limiters. Non-partitioned limiters must be wrapped in partitioned ones. The wrapping doesn't prevent or otherwise affect direct use of the original limiters; the same instance can be used as part of a chain and independently at the same time.
What is inside a limiter?
Let's discuss the main components of a limiter. It may seem obvious, but to implement the public API shown in Limiting algorithms, we need a permit counter and a wait queue. For time-based limiters, we also need a timer.
Counter
Each limiter must track the number of available permits. There is no fancy data structure behind this: the _permitCount counter is just a private integer field. The real challenge is to correctly update that cute little field in a multi-threaded environment. For example, the inner logic of ConcurrencyLimiter's AttemptAcquire method looks pretty simple...
RateLimitLease AttemptAcquire(int permitCount)
{
    if (_permitCount >= permitCount)
    {
        lock (Lock)
        {
            // it tries to reduce _permitCount inside:
            // _permitCount -= permitCount;
            if (TryLeaseUnsynchronized(permitCount,
                out RateLimitLease? lease))
            {
                return lease;
            }
        }
    }
    // (failure and queueing paths omitted)
}
And that's ignoring the fact that I've omitted 99% of the supplementary code, which covers thread synchronisation and various edge cases. A LOT of edge cases. Ensuring safe write access to shared data is inherently complex, and the limiter's complete code is no joke. And there is even more to discuss.
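The essence of that synchronization can be reduced to a few lines. This is a toy counter illustrating the cheap-precheck-then-lock pattern, not the BCL code:

```csharp
// Toy sketch of a lock-protected permit counter, not the actual
// ConcurrencyLimiter internals.
public sealed class PermitCounter
{
    private readonly object _lock = new();
    private int _permitCount;

    public PermitCounter(int permitLimit) => _permitCount = permitLimit;

    public bool TryAcquire(int permitCount)
    {
        // Cheap unsynchronized pre-check, as in the snippet above...
        if (_permitCount < permitCount) return false;

        // ...followed by the real check-and-decrement under the lock.
        lock (_lock)
        {
            if (_permitCount < permitCount) return false;
            _permitCount -= permitCount;
            return true;
        }
    }

    public void Release(int permitCount)
    {
        lock (_lock)
        {
            _permitCount += permitCount;
        }
    }
}
```

The unsynchronized pre-check is a fast exit for the common "no permits left" case; only callers that might succeed pay for the lock.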
Wait queue
Every limiter above supports queuing permit requests, which can be dequeued (FIFO or LIFO) at some point in the future when permits become available. A suitable job for a doubly-linked list or a double-ended queue! The developers chose the latter and built the wait queue on top of the internal Deque collection. There is a long-running discussion about adding a public version of Deque, but unfortunately it is nowhere near a done deal.
This double-ended queue has much in common with well-known collections such as Stack<T> or Queue<T>:
- It is built on top of... a simple T[] array with some index math around it.
- It grows by allocating a new array of twice the size when it reaches the current array's capacity.
- It is not thread-safe.
The last point is the most important: again, lock statements are heavily used to ensure exclusive access to the shared Deque instance.
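A stripped-down sketch of the same idea: an array, head/tail index math, and x2 growth. This is my toy illustration, not the runtime's internal Deque:

```csharp
// Minimal array-backed deque sketch: index math plus x2 growth.
// Like the internal Deque it imitates, it is NOT thread-safe.
public sealed class MiniDeque<T>
{
    private T[] _array = new T[4];
    private int _head; // index of the first element
    private int _size;

    public int Count => _size;

    public void EnqueueTail(T item)
    {
        if (_size == _array.Length) Grow();
        _array[(_head + _size) % _array.Length] = item;
        _size++;
    }

    // FIFO order: oldest request first.
    public T DequeueHead()
    {
        if (_size == 0) throw new InvalidOperationException("Deque is empty.");
        T item = _array[_head];
        _head = (_head + 1) % _array.Length;
        _size--;
        return item;
    }

    // LIFO order: newest request first.
    public T DequeueTail()
    {
        if (_size == 0) throw new InvalidOperationException("Deque is empty.");
        _size--;
        return _array[(_head + _size) % _array.Length];
    }

    private void Grow()
    {
        // Allocate a new array of twice the size and unwrap the elements.
        var bigger = new T[_array.Length * 2];
        for (int i = 0; i < _size; i++)
            bigger[i] = _array[(_head + i) % _array.Length];
        _array = bigger;
        _head = 0;
    }
}
```

Dequeuing from the head gives the FIFO queue processing order, from the tail the LIFO one, which is exactly why a deque fits the wait queue better than a plain Queue<T>.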
Timer
Another important building block is a timer. Indeed, except for ConcurrencyLimiter, each of the rate-limiting algorithms considered has to repeat an action over time (replenish the bucket, move the window, and so on). This is where System.Threading.Timer comes in handy: it's a lightweight class that can execute a callback method on a thread-pool thread at regular intervals.
For TokenBucketRateLimiter, implementing bucket replenishment is pretty straightforward:
_renewTimer = new Timer(
    Replenish,
    this,
    _options.ReplenishmentPeriod,  // due time
    _options.ReplenishmentPeriod); // period
Things get more interesting in DefaultPartitionedRateLimiter<TResource, TKey>, where TimerAwaitable is introduced as an internal awaitable wrapper over the callback-based System.Threading.Timer API:
_timer = new TimerAwaitable(timerInterval, timerInterval);
_timerTask = RunTimer();
private async Task RunTimer()
{
    _timer.Start();
    while (await _timer)
    {
        try
        {
            // handle a cache of created limiters
            await Heartbeat().ConfigureAwait(false);
        }
        catch { }
    }
    _timer.Dispose();
}
TimerAwaitable is a good reminder that not every awaitable expression has to be a Task. In fact, the compiler's requirements for await are quite loose.
Let's have a look at what's inside TimerAwaitable:
private void Tick()
{
    Interlocked.Exchange(ref _callback, _callbackCompleted);
}

public TimerAwaitable GetAwaiter() => this;

public bool IsCompleted => ReferenceEquals(_callback, _callbackCompleted);

public bool GetResult()
{
    _callback = null;
    return _running;
}
Here, a System.Threading.Timer instance has only one job: on every "tick", set the _callback field to its "completed" value. This field, despite its name, is mostly used to determine whether this quasi-task is completed or not. The value is later reset to null in the GetResult call. These methods, plus an implementation of the ICriticalNotifyCompletion interface, make it possible to await the timer instance itself, without allocating a new task!
Usage Example
You might have expected an example of how to use the Rate Limiting API at the end of this post. I was going to provide one, but then I had a better idea. In this post I've mostly shown code written by someone else (probably by a better software engineer than me), so let's continue that tradition.
This API is extensively covered by tests, like this one. Those tests are a better example than I could ever provide. They are isolated and run locally, assuming you have cloned and built the dotnet/runtime repo on your machine. I strongly recommend giving it a try. Building the entire .NET runtime from source may seem intimidating at first, and it will take some time. But if you follow the Workflow Guide, this mega-project should compile without a single warning.
Summary
In this post I discussed the internals of the built-in .NET Rate Limiting API. I showed how limiter behaviour can be tailored to your application's needs using techniques such as partitioning and chaining. I examined the source code in the System.Threading.RateLimiting namespace and discussed the key components of a limiter: a counter, a wait queue, and a timer.
The most popular examples of using the Rate Limiting API are also the most controversial. It's natural to enable rate limiting for inbound HTTP requests at the infrastructure layer, while outbound limiting is easily implemented within an application using Polly. I would love to hear about other real-world uses of the Rate Limiting API, so please let me know if you have one.
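For the outbound case, one pattern worth sketching is a DelegatingHandler that guards every outgoing request with a limiter. The class below is my own illustration of the idea, not library code:

```csharp
using System.Net;
using System.Threading.RateLimiting;

// Sketch of a client-side rate-limited handler for outbound requests.
public sealed class ClientSideRateLimitedHandler : DelegatingHandler
{
    private readonly RateLimiter _limiter;

    public ClientSideRateLimitedHandler(RateLimiter limiter)
        : base(new HttpClientHandler()) => _limiter = limiter;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        using RateLimitLease lease =
            await _limiter.AcquireAsync(1, cancellationToken);

        if (lease.IsAcquired)
            return await base.SendAsync(request, cancellationToken);

        // Out of permits and the queue is full: fail fast with 429
        // without ever touching the network.
        return new HttpResponseMessage(HttpStatusCode.TooManyRequests);
    }
}
```

Plugging it in is one line: new HttpClient(new ClientSideRateLimitedHandler(limiter)), with any limiter from the namespace.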