4 Rate Limit Algorithms Every Developer Should Know | by Jason Ngan | Jul, 2022

A beginner’s guide to implementing rate limit


If you have dealt with any backend services before, you have probably heard the term rate limit.

Without this crucial toolkit, clients can make as many requests as they want to your service at any point in time. This can lead to sudden traffic spikes that overwhelm your servers.

In this blog post, let’s return to the fundamentals and discuss the four commonly used rate-limiting algorithms.

Let’s get started!


The token bucket is the easiest of the four to implement.

Let’s take a look at the simplified steps.

  • Imagine having a bucket containing N tokens
  • When a request arrives, we take one token out of the bucket and process the request
  • If there is no token left, we discard the request without processing it
  • At every fixed interval, we replenish the bucket back to N tokens

We can implement the algorithm using a simple hash map.

Suppose each user can only trigger four requests every minute.

userID = 123
usersBucket[userID] = 4

When a request is received, we reduce the counter in usersBucket[userID] by 1. If the tokens run out, we discard the request. Every minute, we reset the counter in the hash map back to 4.
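The hash-map approach above can be sketched as follows. This is a minimal illustration, not a production implementation; the class and method names are my own, and a real limiter would also need locking and an eviction policy for stale users.

```python
import time

class FixedResetBucket:
    """Hash-map token bucket: each user gets `capacity` tokens,
    and all buckets reset at every fixed interval."""

    def __init__(self, capacity=4, interval=60):
        self.capacity = capacity            # tokens granted per interval
        self.interval = interval            # reset period in seconds
        self.users_bucket = {}              # user_id -> remaining tokens
        self.last_reset = time.monotonic()

    def allow(self, user_id):
        now = time.monotonic()
        # At every fixed interval, replenish all buckets back to capacity.
        if now - self.last_reset >= self.interval:
            self.users_bucket.clear()
            self.last_reset = now
        tokens = self.users_bucket.get(user_id, self.capacity)
        if tokens <= 0:
            return False                    # no token left: discard the request
        self.users_bucket[user_id] = tokens - 1
        return True
```

A user's first four calls to `allow` within a minute succeed; the fifth is rejected until the next reset.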

Here’s a tricky scenario:

  • 12:00:59: User triggers four requests
  • 12:01:00: We reset the tokens back to 4
  • 12:01:01: User triggers four requests again

In the scenario above, the user triggered eight requests within three seconds!

To ensure smoother traffic, the token refill rate should be decoupled from the rate-limit interval.

Say our rate limit is four requests per minute. Instead of replenishing four tokens per minute, we can top up one token per 15 seconds. This prevents the sudden burst of traffic at the reset boundary.
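The gradual-refill variant might look like the sketch below. Again, the names are illustrative; the key difference from the previous version is that tokens trickle back one at a time, capped at the bucket's capacity.

```python
import time

class RefillBucket:
    """Token bucket that tops up one token per refill_period
    instead of resetting everything at the minute boundary."""

    def __init__(self, capacity=4, refill_period=15):
        self.capacity = capacity
        self.refill_period = refill_period  # seconds per token
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Add one token for every full refill_period elapsed,
        # capped at the bucket's capacity.
        elapsed = now - self.last_refill
        new_tokens = int(elapsed // self.refill_period)
        if new_tokens:
            self.tokens = min(self.capacity, self.tokens + new_tokens)
            self.last_refill += new_tokens * self.refill_period
        if self.tokens <= 0:
            return False
        self.tokens -= 1
        return True
```

Advancing `last_refill` by whole periods (rather than setting it to `now`) avoids silently discarding fractional progress toward the next token.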


In the previous algorithm, even though we replenish one token every 15 seconds, an idle user can still save up four tokens and fire four requests at once, which again produces a sudden spike of traffic.

The leaky bucket comes in handy in ensuring a smoother traffic distribution.

Here are the simplified steps:

  • Imagine having a bucket with a hole at the bottom
  • When a request comes in, we fill the request into the bucket
  • At every fixed interval, a request “leaks” out from the bottom and gets processed

The leaky bucket follows the FIFO concept. We can implement it using a queue.

Every incoming request is not dumped but enqueued into the bucket. At every fixed interval, the request that came in first is dequeued and processed.

Using the leaky bucket, requests can come in at different rates while the server processes them at a constant and predictable rate.
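The queue-based description above can be sketched like this. The class and method names are my own; in practice `leak` would be driven by a timer or worker loop, which is omitted here.

```python
from collections import deque

class LeakyBucket:
    """FIFO leaky bucket: requests are enqueued as they arrive
    and leak out at a constant rate."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.queue = deque()

    def enqueue(self, request):
        if len(self.queue) >= self.capacity:
            return False          # bucket is full: discard the request
        self.queue.append(request)
        return True

    def leak(self):
        """Called at every fixed interval (e.g. by a timer):
        dequeue and return the oldest request for processing."""
        if self.queue:
            return self.queue.popleft()
        return None
```

Note that the bucket itself needs a capacity: without one, a sustained burst would grow the queue without bound while the leak rate stays fixed.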

As the saying goes, “There are no best solutions, only trade-offs.” A constant processing rate means requests may sit in the queue during bursts, so clients can experience slower response times.


The fixed window algorithm is quite similar to the token bucket, in that both can experience a sudden burst of traffic.

As always, let’s simplify the steps. Say we are implementing a rate limit of four requests/minute:

  • The timeline is split into fixed one-minute windows
  • Each window holds a counter starting at 4
  • When a request arrives, we allocate it to a window based on the FLOOR of its timestamp
  • For example, a request arriving at 12:01:45 is allocated to the window starting at 12:01:00
  • If the counter is larger than 0, we decrement it and process the request
  • Otherwise, we discard the request

A traffic burst happens when all requests come in at the window boundary.
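The steps above can be sketched as a counter keyed by window start time. The `now` parameter is there only so the example is easy to test; names are illustrative.

```python
import time

class FixedWindow:
    """Fixed window counter: at most `limit` requests per window."""

    def __init__(self, limit=4, window=60):
        self.limit = limit
        self.window = window      # window length in seconds
        self.counters = {}        # window start timestamp -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Allocate the request to the FLOOR of its timestamp:
        # e.g. 12:01:45 falls into the window starting at 12:01:00.
        window_start = int(now // self.window) * self.window
        count = self.counters.get(window_start, 0)
        if count >= self.limit:
            return False          # counter exhausted: discard the request
        self.counters[window_start] = count + 1
        return True
```

A real implementation would also drop counters for windows that have already passed, so the map does not grow forever.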

The biggest shortcoming of the fixed window algorithm is the following:

  • It will potentially lead to a sudden burst of traffic near the boundary of the window
  • If the counter runs out at the start of the window, all clients will need to wait for a long reset window

The sliding window algorithm is very much akin to the fixed window algorithm, except that it tackles the disadvantages mentioned above.

Say we are implementing a rate limit of four requests/minute:

  • An array, aka a window, stores processed requests together with their timestamps
  • When a request arrives, we loop through the array and count the requests processed within the last minute
  • If fewer than four requests were processed within the last minute, we add the new request to the array and process it
  • As we loop through the array, we pop off requests that were processed more than a minute ago

For example, a fifth request is processed and added to the array if only three requests were processed in the last minute, while the oldest request is popped off.
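The steps above amount to a sliding window log, sketched below. A deque keeps the timestamps ordered, so expired entries can be popped from the front; as before, `now` is a parameter only to make the example testable.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: store the timestamps of processed requests
    and count how many fall within the last window."""

    def __init__(self, limit=4, window=60):
        self.limit = limit
        self.window = window      # window length in seconds
        self.log = deque()        # timestamps of processed requests

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Pop requests that were processed before the last window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False          # limit reached within the last window
        self.log.append(now)
        return True
```

Because the window slides with every request, no client can exceed the limit in any one-minute span, regardless of where that span falls.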

As you can tell, the sliding window algorithm is the ideal candidate to ensure that the rate limit is strictly adhered to at any given point in time.

Since we store every request from the past minute and loop through the array each time a request arrives, the algorithm consumes much more memory and CPU than the alternatives.


Each of these algorithms comes with its own advantages and limitations.

Hence, knowing how they work is crucial when deciding which trade-offs your application can bear.

I hope this read helps, and I will see you in the next one!

