In early 2023, every API request at Databricks had to wait for a rate limit check that took 10 to 20 milliseconds. Two network hops. One Redis call. For a platform handling billions of requests, those milliseconds added up to an unacceptable tax on every single user.

The team rewrote their rate limiter from scratch. The result: a 10x reduction in tail latency, zero network calls on the critical path, and a system that handles orders of magnitude more traffic with fewer servers.

The catch? They had to give up something most engineers consider non negotiable: accuracy.

Welcome to Grind Engineer, your guide to becoming a better software engineer! No fluff. Pure engineering insights.

The Old Architecture: Two Hops Too Many

The original setup was textbook. A request hit the Envoy ingress gateway, which forwarded it to a Ratelimit Service, which then queried Redis to check the counter. If the counter was below the threshold, the request passed. If not, it got rejected.

This meant every single API call required two network hops before it could proceed: Envoy to Ratelimit Service, then Ratelimit Service to Redis.

Metric

Old System

Network hops per check

2 (Envoy → Service → Redis)

P99 latency per hop

10 to 20ms

Total rate limit overhead

20 to 40ms per request

Redis

Single instance, single point of failure

Scaling approach

Add more Ratelimit Service instances

The team tried optimizing within this architecture. They configured Envoy with consistent hashing so that requests with the same rate limit key always landed on the same service instance, enabling local counting. It helped, but hit three walls:

  1. Non Envoy services could not participate, fragmenting the rate limit view

  2. Service restarts and scaling events churned hash assignments, forcing syncs back to Redis

  3. Hot keys saturated individual machines while neighbors sat idle

Adding more machines stopped helping. The architecture itself was the ceiling.

Subscribe to keep reading

This content is free, but you must be subscribed to Grind Engineer to continue reading.

Already a subscriber?Sign in.Not now

Reply

Avatar

or to participate

Keep Reading