DEV Community

Cover image for Rate limiting using Python and Redis
Andrea Stagi
Andrea Stagi

Posted on

Rate limiting using Python and Redis

Rate limiting is a mechanism that many developers may have to deal with at some point in their life. It’s useful for a variety of purposes like sharing access to limited resources or limit the number of requests made to an API endpoint and respond with a 429 status code.

Here we’ll explore some rate limiting algorithms using Python and Redis, starting from a naive approach and culminate at an advanced one called Generic Cell Rate Algorithm (GCRA).

In the following article we'll use redis-py to communicate with Redis (pip install redis). I suggest you to clone my repository from GitHub to make some experiments with rate limiting.

Time Bucketed

A first approach for rate limiting a number of requests within a period is to use a time-bucketed algorithm, where a counter (initially set to limit) and expiry time (period) are stored for each rate-key (something unique like username or IP address) and decremented on subsequent requests.

Using Python and Redis we can implement time-bucket logic in this way:

  • check if the rate-key key exists
  • if key doesn't exist, initialize it to the limit value (Redis SETNX) and an expiration time period (Redis EXPIRE)
  • decrement this value on each subsequent requests (Redis DECRBY)
  • the request is limited only when key value drops below zero
  • after the given period the key will automatically be deleted
from datetime import timedelta
from redis import Redis


def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    if r.setnx(key, limit):
        r.expire(key, int(period.total_seconds()))
    bucket_val = r.get(key)
    if bucket_val and int(bucket_val) > 0:
        r.decrby(key, 1)
        return False
    return True
Enter fullscreen mode Exit fullscreen mode

You can see this code in action simulating requests with a limit of 20 requests / 30 seconds (to keep things clear I wrapped the function in a module as you can see in my repository).

import redis
from datetime import timedelta
from ratelimit import time_bucketed


r = redis.Redis(host='localhost', port=6379, db=0)
requests = 25

for i in range(requests):
    if time_bucketed.request_is_limited(r, 'admin', 20, timedelta(seconds=30)):
        print ('🛑 Request is limited')
    else:
        print ('✅ Request is allowed')
Enter fullscreen mode Exit fullscreen mode

Only the first 20 requests won't be limited, you have to wait 30 seconds before being allowed to make a new request again.

The downside to this approach is it actually enforces a limit to the number of requests a user can do within a period rather than a rate. This means that the entire quota can be exhausted immediately, resetting only after the period has expired.

Leaky bucket

There's an algorithm that can take care of rate limiting and produces a very smooth rate limiting effect: the leaky-bucket approach. The algorithm works similarly to an actual leaky bucket: you can pour water (requests) in the bucket up to a maximum capacity while some amount of water is escaping at a constant rate (usually implemented using a background process). If the amount of water entering the bucket is greater than the amount leaving through the leak, the bucket starts to fill and when the bucket is full new requests are limited.

Watch the video

Video: Leaky Bucket Traffic Shaping

We can skip over this for a more elegant approach that doesn't require a separate process to simulate a leak: the Generic Cell Rate Algorithm.

Generic Cell Rate Algorithm

Generic Cell Rate Algorithm (GCRA) comes from a communications technology called Asynchronous Transfer Mode (ATM) and was used in ATM network’s schedulers to delay or drop cells, small fixed size packets of data, that came in over their rate limit.

GCRA works by tracking remaining limit through a time called the Theoretical Arrival Time (TAT) on each request

tat = current_time + period
Enter fullscreen mode Exit fullscreen mode

and limit the next request if the arrival time is less than the stored TAT. This works fine if rate = 1 request / period where each request is separated by period, anyway in the real world we usually have rate = limit / period. For example with rate = 10 requests / 60 seconds a user would be allowed to make 10 requests in the first 6 seconds, with rate = 1 request / 6 seconds the user would have to wait 6 seconds between requests.

To be able to bunch requests in a short period of time and support limit requests within period with limit > 1, each request should be separated by period / limit and the next Theoretical Arrival Time (new_tat) calculated in a different way than before. Indicating with t the time of the request arrival

  • new_tat = tat + period / limit if requests bunch (t <= tat)
  • new_tat = t + period / limit if requests don’t bunch (t > tat)

hence

new_tat = max(tat, t) + period / limit.
Enter fullscreen mode Exit fullscreen mode

The request is limited if new_tat exceeds the current time plus the period, new_tat > t + period. With new_tat = tat + period / limit we have

tat + period / limit > t + period
Enter fullscreen mode Exit fullscreen mode

Hence, we should limit requests only when

tat  t > period  period / limit
Enter fullscreen mode Exit fullscreen mode
       period  period / limit
      <----------------------->
--|----------------------------|--->
  t                           TAT
Enter fullscreen mode Exit fullscreen mode

In this case TAT should not be updated, because limited requests don’t have a theoretical arrival time.

Now it's time to unveil the final code!

from datetime import timedelta
from redis import Redis


def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    period_in_seconds = int(period.total_seconds())
    t = r.time()[0]
    separation = round(period_in_seconds / limit)
    r.setnx(key, 0)
    tat = max(int(r.get(key)), t)
    if tat - t <= period_in_seconds - separation:
        new_tat = max(tat, t) + separation
        r.set(key, new_tat)
        return False
    return True
Enter fullscreen mode Exit fullscreen mode

In this implementation we use Redis TIME because GCRA is dependent on time and it’s crucial to make sure that the current time is consistent between multiple deployments (clock drift between machines could lead to false positives).

In the following script you can see GCRA in action with rate = 10 requests / 60 seconds

import redis
from datetime import timedelta
from ratelimit import gcra


r = redis.Redis(host='localhost', port=6379, db=0)
requests = 10

for i in range(requests):
    if gcra.request_is_limited(r, 'admin', 10, timedelta(minutes=1)):
        print ('🛑 Request is limited')
    else:
        print ('✅ Request is allowed')
Enter fullscreen mode Exit fullscreen mode

The first 10 requests are allowed by GCRA, but you need to wait at least 6 seconds to make another request. Try to run the script after some time to see how it works and try to change limit and period (e.g. limit=1 and period=timedelta(seconds=6) 😉).

To keep GCRA implementation clear, I avoided to add a lock between get and set calls, but it's needed in case of multiple requests with the same key at the same time. Using Lua-based lock we can simply add a lock as a context manager.

def request_is_limited(r: Redis, key: str, limit: int, period: timedelta):
    period_in_seconds = int(period.total_seconds())
    t = r.time()[0]
    separation = round(period_in_seconds / limit)
    r.setnx(key, 0)
    try:
        with r.lock('lock:' + key, blocking_timeout=5) as lock:
            tat = max(int(r.get(key)), t)
            if tat - t <= period_in_seconds - separation:
                new_tat = max(tat, t) + separation
                r.set(key, new_tat)
                return False
            return True
    except LockError:
        return True
Enter fullscreen mode Exit fullscreen mode

Check the complete code on GitHub.


Cover photo by Scott Granneman (CC BY-SA 2.0)

Top comments (2)

Collapse
 
omarcham96 profile image
Omar Chammaa

Thank you for the article.

I'm still a bit confused by GRCA, as I don't understand when it should be used. If I specify a rate of 5 requests per minute, I can make up to 10 requests per minute, since I can make 5 requests all together, filling the bucket, then every 12 seconds I can make another request. So it's not really enforcing the limit of 5 requests / minute, so I'm not really sure why it would be used.

Collapse
 
astagi profile image
Andrea Stagi

Thanks Omar :) yes, you're right, when I discovered GCRA this thing confused me too. GCRA allows you to bunch requests and when you reach the limit (5 requests in this case) you can make a new request after 12 seconds and not after a minute, because in opposite to time-bucketed "it actually enforces a rate rather than a limit to the number of requests a user can do within a period". GCRA has the same behaviour as Leaky Bucket, you can fill the bucket immediately and make a new request only when the background process, that runs every N seconds, releases a slot. I hope I've clarified your doubts!