Token Bucket Implementation in Guava
Li Wei
Title: Implementation Mechanism of Token Bucket in Guava
Common Rate‑Limiting Algorithms and Comparison
In high‑concurrency systems, rate‑limiting algorithms are essential for keeping services stable, improving user experience, and protecting downstream resources. The most common algorithms include Fixed Window, Sliding Window, Leaky Bucket, and Token Bucket. The table below compares the mainstream approaches:
| Algorithm | Brief Principle | Advantages | Disadvantages | Typical Use Cases |
|---|---|---|---|---|
| Fixed Window | Count requests per time window; reject when the limit is exceeded | Simple to implement | Burst at window boundaries; only coarse QPS control | Simple rate control |
| Sliding Window | Record timestamps of each request and slide the counting window | High accuracy, smooth throttling | Higher memory consumption, more complex | Fine‑grained traffic control |
| Leaky Bucket | Requests enter a queue and are dequeued at a constant rate | Smooth output, protects downstream | Does not allow bursts | Downstream‑sensitive protection |
| Token Bucket | Tokens are generated at a constant rate; each request consumes tokens | Supports bursts, flexible | Slightly more complex to implement | High concurrency, burst‑friendly scenarios |
Token Bucket Algorithm Basics
Core Idea
The token bucket algorithm uses an abstract “token” to regulate request rates. Tokens are added to the bucket at a steady rate up to a maximum capacity. When a request arrives, it tries to consume one or more tokens. If enough tokens are available, the request proceeds immediately; otherwise, depending on the implementation, it may wait, be queued, or be rejected.
Key Parameters
- Token generation rate (
rate): Number of tokens added per second; determines the long‑term average processing rate. - Bucket capacity (
capacity): Maximum number of tokens the bucket can hold; determines the maximum burst handling capability. - Current token count (
storedPermits): Tokens presently available in the bucket. - Token generation interval (
stableInterval): Time interval required to generate a single token. - Next available time (
nextFreeTicketMicros): Used in credit/queue scenarios; records when the next token will become available.
Workflow
- Token generation: The system periodically creates tokens and puts them into the bucket; excess tokens are dropped when the bucket is full.
- Request arrival: A request attempts to consume tokens; if sufficient, it passes instantly.
- Burst handling: When traffic has been low, tokens accumulate, allowing a sudden surge to consume the stored tokens at once.
- Wait/queue or reject: If tokens are insufficient, the implementation may let the request wait for new tokens (credit), or reject it outright.
Token Bucket Implementation in Guava
Guava’s RateLimiter is a widely used rate‑limiting component. Its implementation extends the basic token bucket with engineering‑grade features that support various business scenarios. Guava mainly adds three mechanisms: Warm‑up Mode, Burst Mode, and Credit Mechanism. The following sections explain each in detail, referencing the source‑code structure and class diagram.
Warm‑up Mode
In Warm‑up Mode, the limiter gradually ramps up its rate according to incoming traffic, allowing the service to transition smoothly from idle to high load. Tokens in the bucket have a “logical” meaning that only indicates the limiter’s cold or hot state; all tokens must be freshly generated, and the generation rate varies during the warm‑up period.
Guava’s source comments illustrate the warm‑up period using the areas of a trapezoid and a rectangle:
- Horizontal axis: Number of tokens in the bucket
- Vertical axis: Time interval between token generations
- When the bucket holds many tokens, traffic is low and the limiter is in a cold state; token generation is slow (e.g., three times the
stableInterval). - As tokens are consumed, the limiter warms up, the generation rate accelerates, and eventually returns to the stable rate (
stableInterval).
The trapezoid area represents the “slow” token‑generation phase, while the rectangle area corresponds to the stable‑rate phase. By integrating (calculating the area), one can intuitively see that the time needed to issue m tokens equals the area over the interval [n‑m, n].
Practical significance of Warm‑up Mode
- Suitable for downstream services that are cold‑started or resource‑sensitive, preventing a sudden traffic shock.
- Dynamically adjusts token generation speed to achieve a smooth service ramp‑up.
Burst Mode
In Burst Mode, tokens in Guava’s bucket have a validity period, giving the limiter a certain “elasticity.” This allows the limiter to temporarily exceed the steady rate when the system has been idle, smoothing out sudden traffic spikes.
Token issuance in Burst Mode: The model is straightforward—tokens can be issued directly without extra computation. As long as the bucket contains tokens, a request passes immediately without waiting.
Code sketch
// Pseudocode
if (storedPermits > 0) {
storedPermits--;
// request proceeds instantly
}
Thus, burst traffic can be handled instantly as long as tokens are available.
Credit Mechanism
The credit mechanism lets a request “reserve” tokens that will be generated in the future. Even when the current token count is insufficient, the system does not reject the request immediately; instead, it queues the request and processes it once enough tokens have been produced. Guava’s RateLimiter acquire() method employs this strategy.
Comparison with non‑credit (classic token bucket)
- No credit: If a request arrives and the bucket lacks enough tokens, the request is rejected outright. This guarantees the system never overloads but leads to lost requests and a poor user experience.
- With credit: When tokens are insufficient, the required wait time is calculated, the request is queued, and it is automatically processed when tokens become available. This ensures the request eventually succeeds, improving user experience, though some requests may experience longer waits.
How it works
- Upon arrival, a request first consumes any existing tokens.
- If tokens are still lacking, the system computes how many new tokens must be generated and translates that into a wait time.
- The next token‑availability timestamp (
nextFreeTicketMicros) is recorded; subsequent requests are queued based on this value. - The caller can choose to block synchronously, receive an asynchronous notification, or simply get the estimated wait time and decide how to handle it.
Pseudocode & flow
long now = readCurrentTime();
if (storedPermits >= required) {
storedPermits -= required;
return 0; // no wait
}
long waitMicros = computeWaitTime(required - storedPermits);
nextFreeTicketMicros = now + waitMicros;
return waitMicros;
Typical scenarios: batch job processing, APIs that prioritize user experience, message‑queue consumption, etc.
Guava RateLimiter Source Structure and Class Diagram
(Insert class diagram and package overview here)
Conclusion
The token bucket algorithm’s flexibility, smoothness, and burst support make it the go‑to choice for rate limiting in modern high‑concurrency systems. Guava’s RateLimiter enriches the basic algorithm with credit, burst, and warm‑up mechanisms, allowing it to meet diverse business requirements. In practice, combine these features with a solid understanding of your workload, system architecture, and downstream capacity; tune the limiter parameters, set up proper monitoring and alerts, and enable dynamic adjustments or graceful degradation. This ensures the system remains stable while delivering a high‑quality experience to users.
Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.