High-Concurrency Ranking System Design and Implementation

Li Wei

December 26, 2025 · 7 min read

Background

On the platform, the ranking feature is one of the core business scenarios. Every user interaction during a live stream or event (gifts, likes, shares, orders, etc.) must be reflected in real time on the corresponding leaderboard, whether it ranks anchors, audiences, rooms, or other business entities. These leaderboards must support multiple time dimensions (hourly, daily, weekly, monthly, yearly, all-time) and stay up to date under high concurrency.

From a business perspective, the ranking system must support:

  • Multiple time dimensions: parallel maintenance of hourly, daily, weekly, monthly, yearly, and all‑time leaderboards
  • Multiple role types: within the same activity, separate leaderboards for anchors, audiences, rooms, etc.
  • High‑frequency updates: in a live‑stream scenario, thousands of score changes may occur each second

Pain Points

Business issues

  • Real‑time requirement – e.g., in a gifting scenario, users expect to see their rank change immediately after sending a gift.
  • Scoring flow – how to distinguish hot users from long‑tail users, and how a long‑tail user can become hot.

Technical issues

  • Data separation – hot users and long‑tail users must be distinguished; otherwise a huge Redis key will be created, degrading overall performance.
  • Fallback (back-to-source) strategy – which storage media to use, how to use each of them, and in what order to fall back when several media are involved.
  • High‑concurrency contention – when many users boost scores simultaneously, Redis operations can race, potentially causing incorrect score calculations.

Design Proposal

Data‑Structure Design

Based on Redis characteristics and business needs, we adopt a dual‑cache structure that separates hot and cold data: a ZSET stores hot data, while a String holds the full dataset. The key design is as follows:

Main Ranking Storage (Redis Sorted Set)

Key design highlights

  • Environment – differentiate development, testing, and production via an env parameter.
  • Ranking list – rankId guarantees that leaderboards of different activities are independent.
  • Role – rankRole supports anchor, audience, room, etc.
  • Time – each time dimension uses its own time identifier to ensure data accuracy.
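
As an illustration of how these four parts might combine into a single ZSET key, here is a minimal sketch. The `rank:` prefix, the colon separator, the field order, and the sample identifiers are assumptions made for illustration; the original does not show the actual key format.

```python
def build_rank_key(env: str, rank_id: str, rank_role: str, time_id: str) -> str:
    """Compose a leaderboard ZSET key from its four parts (hypothetical layout)."""
    parts = [env, rank_id, rank_role, time_id]
    if any(not p or ":" in p for p in parts):
        raise ValueError("key parts must be non-empty and must not contain ':'")
    return "rank:" + ":".join(parts)


# Same activity and role, different time dimensions, so the keys differ.
daily_key = build_rank_key("prod", "act1001", "anchor", "d20251226")
hourly_key = build_rank_key("prod", "act1001", "anchor", "h2025122614")
```

Keeping the environment first means one prefix scan can list or clean up everything belonging to a test environment without touching production keys.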

Personal Score Cache (Redis String)

(details omitted in the original)

Scoring Flow Design

As described above, scoring touches two data structures: the leaderboard ZSET and the per-member cumulative score String. We update the ZSET first and the String second, because the leaderboard must be immediately visible to users.

For concurrency issues during leaderboard scoring, the following solutions were considered:

Option 1: XX→NX→XX Three‑Step Retry

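Core idea – a progressive handling strategy built on Redis atomic commands, where each of the three steps covers a different concurrency scenario. We avoid calling ZINCRBY directly: the ZSET is truncated so that it does not grow into a huge key, and ZINCRBY on a member that has been truncated away would re-create it with only the increment as its score, dropping the member's accumulated total.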

Example scenario

## pseudo‑code placeholder
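
The original pseudo-code is not reproduced here. The following is one possible reading of the three steps, not the author's implementation: step 1 tries `ZADD ... XX INCR`, step 2 falls back to `ZADD ... NX` seeded with the cached personal score, and step 3 retries `XX INCR` if another writer inserted the member in between. `FakeRedis` is a hypothetical in-memory stand-in so the sketch runs without a server; with redis-py the two ZSET calls would correspond to `zadd(key, {member: delta}, xx=True, incr=True)` and `zadd(key, {member: score}, nx=True)`.

```python
from typing import Dict, Optional


class FakeRedis:
    """In-memory stand-in exposing only the operations this sketch needs."""

    def __init__(self) -> None:
        self.zsets: Dict[str, Dict[str, float]] = {}
        self.strings: Dict[str, float] = {}

    def zadd_xx_incr(self, key: str, member: str, delta: float) -> Optional[float]:
        # Mimics ZADD key XX INCR delta member: increment only if present.
        zset = self.zsets.setdefault(key, {})
        if member not in zset:
            return None
        zset[member] += delta
        return zset[member]

    def zadd_nx(self, key: str, member: str, score: float) -> bool:
        # Mimics ZADD key NX score member: add only if absent.
        zset = self.zsets.setdefault(key, {})
        if member in zset:
            return False
        zset[member] = score
        return True

    def get_score(self, key: str) -> float:
        return self.strings.get(key, 0.0)

    def incr_score(self, key: str, delta: float) -> float:
        self.strings[key] = self.strings.get(key, 0.0) + delta
        return self.strings[key]


def add_score(r: FakeRedis, zset_key: str, member_key: str,
              member: str, delta: float) -> Optional[float]:
    # Step 1 (XX): the member is already on the board, one command suffices.
    new_score = r.zadd_xx_incr(zset_key, member, delta)
    if new_score is None:
        # Step 2 (NX): the member is absent, so seed it with history + delta.
        seeded = r.get_score(member_key) + delta
        if r.zadd_nx(zset_key, member, seeded):
            new_score = seeded
        else:
            # Step 3 (XX): another writer added the member first; apply only our delta.
            new_score = r.zadd_xx_incr(zset_key, member, delta)
    # ZSET first, personal cumulative score second, as described in the text.
    r.incr_score(member_key, delta)
    return new_score
```

A hot member always succeeds at step 1, which is why the common path costs a single ZSET command. The gap between the ZSET write and the String write is where the "not strictly atomic" caveat below comes from.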

Advantages

  • Strong performance: a hot room only needs a single ZSET operation to get on the leaderboard.

Disadvantages

  • Not strictly atomic; the process involves multiple Redis commands, especially when touching both ZSET and String.
  • For non‑hot rooms, members may be repeatedly added and removed from the ZSET.

Option 2: Lua-Script Atomicity

Core idea – wrap all operations in a Lua script, leveraging Redis’s single‑threaded nature to guarantee full atomicity.

## pseudo‑code placeholder
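
The original script is not shown. As a rough sketch of the idea, the script below increments the personal score and writes the new total into the ZSET in a single server-side call. The script body, the key order, the assumption of integer scores, and the function name `add_score_atomic` are all illustrative; the only client API assumed is an `eval(script, numkeys, *keys_and_args)` method such as the one redis-py provides.

```python
from typing import Any

# Lua source executed by Redis without interleaving from other commands.
# KEYS[1] = leaderboard ZSET, KEYS[2] = personal-score String,
# ARGV[1] = member, ARGV[2] = integer delta.
SCORE_SCRIPT = """
local total = redis.call('INCRBY', KEYS[2], ARGV[2])
redis.call('ZADD', KEYS[1], total, ARGV[1])
return total
"""


def add_score_atomic(client: Any, zset_key: str, member_key: str,
                     member: str, delta: int) -> int:
    """Send the script in one round trip and return the member's new total."""
    return int(client.eval(SCORE_SCRIPT, 2, zset_key, member_key, member, delta))
```

Because the String is treated as the source of truth here, the ZSET entry is always overwritten with the full total, so a member that was previously truncated from the leaderboard re-enters with the correct score.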

Advantages

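  • Full atomicity – the script runs as a single unit and no other command can interleave with it. Note that Redis does not roll back writes already made if a script errors midway, so the script should validate its inputs before writing.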
  • Low network overhead – only one round‑trip to Redis.
  • Strong data consistency – no intermediate states.

Disadvantages

  • Added complexity: Lua scripts are hard to debug and raise maintenance cost.
  • Poor flexibility: any logic change requires updating and redeploying the script.
  • Redis server load: complex logic runs on the Redis side, adding pressure.

Option 3: Distributed Lock + Traditional Operations

Core idea – use a Redis distributed lock to ensure mutual exclusion; the leaderboard update runs under lock protection.

## pseudo‑code placeholder
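
The original pseudo-code is omitted. A minimal sketch of the lock-operate-release pattern might look like the following. `FakeRedis` is a hypothetical in-memory stand-in whose `set(..., nx=True, px=...)` mirrors the shape of redis-py's `set`; it ignores the TTL, and the lock key and timeout are illustrative values.

```python
import uuid
from typing import Callable, Dict, Optional


class FakeRedis:
    """In-memory stand-in; the px TTL is accepted but not enforced."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str, nx: bool = False,
            px: Optional[int] = None) -> Optional[bool]:
        if nx and key in self.data:
            return None  # like SET ... NX when the key already exists
        self.data[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, key: str) -> int:
        return 1 if self.data.pop(key, None) is not None else 0


def with_lock(r: FakeRedis, lock_key: str, ttl_ms: int,
              critical_section: Callable[[], None]) -> bool:
    """Run critical_section under the lock; return False if the lock is taken."""
    token = uuid.uuid4().hex  # identifies this holder
    if not r.set(lock_key, token, nx=True, px=ttl_ms):
        return False  # someone else holds the lock; the caller may retry
    try:
        critical_section()
        return True
    finally:
        # Release only our own lock. Against a real Redis this check-and-delete
        # should itself be atomic (for example a small script); here it is two steps.
        if r.get(lock_key) == token:
            r.delete(lock_key)
```

The token check on release matters once TTLs are real: if the lock expires mid-operation and another client acquires it, an unconditional delete would remove that client's lock.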

Advantages

  • Simple logic: the classic lock‑operate‑release pattern is easy to understand.
  • Strong consistency: the lock serializes operations.
  • Good fault tolerance: lock timeout prevents deadlocks.

Disadvantages

  • Significant performance loss: acquiring and releasing locks adds overhead.
  • Low concurrency: only one thread can operate at a time.
  • Higher complexity: must handle lock timeouts, deadlock edge cases, etc.

Trade‑off Decision

The system ultimately chooses Option 1, mainly because:

  • Concurrency performance – we prefer a lock‑free approach, so Option 3 is ruled out.
  • Complexity balance – it avoids the maintenance burden of Lua scripts while still being feasible.

Code Implementation

Overall Architecture

The system follows the Template Method pattern in an object‑oriented design. An abstract base class defines the core workflow, and subclasses implement business‑specific variations.
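
To make the pattern concrete, here is a small sketch. The class names, method names, and event fields are hypothetical and are not taken from the original code base; the update is rendered as a string only to keep the example free of Redis.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseRankHandler(ABC):
    """Hypothetical base class: fixes the workflow, leaves details to subclasses."""

    def handle(self, event: dict) -> List[str]:
        # Template method: every leaderboard runs the same steps in the same order.
        if not self.accepts(event):
            return []
        score = self.compute_score(event)
        return [f"{key} += {score}" for key in self.target_keys(event)]

    def accepts(self, event: dict) -> bool:
        # Optional hook; subclasses may filter out irrelevant events.
        return True

    @abstractmethod
    def compute_score(self, event: dict) -> int:
        """Translate a business event into a score delta."""

    @abstractmethod
    def target_keys(self, event: dict) -> List[str]:
        """Return the leaderboard members this event should credit."""


class GiftAnchorRank(BaseRankHandler):
    """Example subclass: credits the anchor with the gift's total value."""

    def accepts(self, event: dict) -> bool:
        return event.get("type") == "gift"

    def compute_score(self, event: dict) -> int:
        return event["price"] * event["count"]

    def target_keys(self, event: dict) -> List[str]:
        return [f"anchor:{event['anchor_id']}"]
```

Adding a new leaderboard then means writing one subclass, while the shared steps (filtering, scoring, writing) stay in a single place.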

Time‑Dimension Handling

The Redis key design accounts for both business complexity and operational requirements.

Key design logic

  • Differentiated time granularity – each time dimension uses a distinct identifier (hourly keys include the hour, weekly keys use the week number), ensuring accuracy while keeping keys readable.
  • TTL gradient – hourly leaderboards expire after 2 hours, daily after 2 days, weekly after 14 days, matching business‑driven data‑retention periods.
  • Environment & business isolation – env and rankId guarantee complete separation of data across environments and activities.
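
The time identifiers and the TTL gradient can be sketched as follows. The identifier formats (`h`, `d`, `w` prefixes and ISO week numbering) are assumptions; only the three TTL values come from the text, and the other dimensions are left out because their retention periods are not stated.

```python
from datetime import datetime, timedelta

# TTLs stated in the article for the three dimensions it mentions.
TTL_BY_DIMENSION = {
    "hourly": timedelta(hours=2),
    "daily": timedelta(days=2),
    "weekly": timedelta(days=14),
}


def time_id(dimension: str, when: datetime) -> str:
    """Return a readable identifier for the period containing `when` (assumed format)."""
    if dimension == "hourly":
        return when.strftime("h%Y%m%d%H")  # includes the hour
    if dimension == "daily":
        return when.strftime("d%Y%m%d")
    if dimension == "weekly":
        iso_year, iso_week, _ = when.isocalendar()  # ISO week number
        return f"w{iso_year}{iso_week:02d}"
    raise ValueError(f"unsupported dimension: {dimension}")
```

Each TTL is longer than the period it covers, so a leaderboard stays readable for a while after its period closes and then expires without any explicit cleanup job.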

Core Insertion Logic

(implementation details omitted in the original)

FAQ

Q1: In high‑concurrency scenarios, what problems can arise from this non‑strictly‑atomic scoring process, and how can they be solved?

Potential issue: Under extreme conditions (Redis under heavy load, network jitter, timeouts, etc.) scores may be lost, leading to inconsistency between the ZSET and the memberKey data.

Explanation

  • Hot anchors are almost always on the leaderboard, so memberKey inconsistencies are less concerning.
  • For non-hot anchors, occasional scoring failures under massive traffic are tolerable: whether or not they eventually make the leaderboard, sporadic loss is acceptable.

Solution: Introduce a Lua script to guarantee atomicity.

Q2: The personal-score cache is given twice the leaderboard's TTL. Does this waste memory, and is there a better approach?

A: Cost‑benefit analysis:

Memory cost: (details omitted)

Alternative options

  • Option 1: Database backup – high query latency makes it unsuitable for millisecond‑level leaderboard reconstruction.
  • Option 2: Message‑queue reconstruction – retain user‑action messages and replay them when the leaderboard expires; however, rebuilding is slow and the queue itself consumes storage.

Advantages of the current approach

  • Fast recovery – a GET operation retrieves historical scores in milliseconds.
  • Simplicity – no extra storage systems or synchronization mechanisms are required.
  • Strong consistency – uses the same Redis instance as the leaderboard, avoiding cross‑system consistency issues.

Q3: If a Redis instance crashes, what is the recovery strategy? Will data be lost?

A: The actual data‑protection mechanisms are:

Application‑level fault tolerance: (details omitted)

Dual‑layer data protection

  • Personal‑score cache as backup
  • ClickHouse data lake as the ultimate safeguard

ClickHouse synchronization details: (details omitted)

Q4: Under what circumstances could this solution become a bottleneck?

A: Bottleneck analysis:

  • Redis network I/O: (details omitted)
  • CPU-intensive operations: (details omitted)

Scaling strategies

  • Horizontal sharding – shard by activity ID or user ID across multiple Redis instances.
  • Read‑write separation – route leaderboard reads to replicas, writes to the master.
  • Cache layering – keep hot leaderboard data also in application memory.
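
Horizontal sharding by activity ID could be as simple as the sketch below. The function name and the choice of CRC32 with modulo are assumptions; the article does not say which routing scheme is used.

```python
import zlib


def pick_shard(rank_id: str, shard_count: int) -> int:
    """Map an activity's rankId to a shard index using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(rank_id.encode("utf-8")) % shard_count
```

Sharding by activity keeps each leaderboard on one instance, so a single ZSET read still serves a whole ranking. The trade-off of plain modulo routing is that changing `shard_count` remaps most activities to different instances.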

Q5: Why is the default rankLimit set to 200? How was this number determined?

A: Product requirements specify the top 100; the system stores double that amount to allow headroom.
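
The effect of the limit can be illustrated with a pure-Python model of the trim step. How the real system trims is not shown in the article; against Redis, one common way to keep only the top N members is `ZREMRANGEBYRANK key 0 -(N+1)`, which is what this sketch imitates.

```python
from typing import Dict


def trim_to_limit(scores: Dict[str, float], rank_limit: int = 200) -> Dict[str, float]:
    """Keep only the `rank_limit` highest-scoring members."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:rank_limit])


# 250 members with scores 0..249; only the top 200 survive the trim.
board = {f"u{i}": float(i) for i in range(250)}
kept = trim_to_limit(board)
```

With a displayed top 100 and a stored top 200, a member ranked just outside the visible list can climb into it without first having to be reloaded from the personal-score cache.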

Configurable design: (details omitted)

Q6: Could the XX→NX→XX scheme cause livelocks under extreme concurrency? How can they be prevented?

A: Theoretically, livelocks are possible: (details omitted)

Q7: Scoring requests are sent via MQ. How is idempotent consumption ensured?

We use Redis SETNX + UUID as a distributed‑lock mechanism to guarantee idempotency; the UUID is a unique identifier generated at the very start of the request chain.

Idempotency key composition: (details omitted)

Code example: (details omitted)
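
Since the original example is omitted, the following is only a sketch of the general SET-if-absent pattern, not the author's code. The `idem:` key prefix, the 24-hour TTL, and the choice to release the marker when the handler fails are assumptions; `FakeRedis` is a hypothetical in-memory stand-in whose `set(..., nx=True, ex=...)` mirrors the shape of redis-py's `set` and does not enforce the TTL.

```python
from typing import Callable, Dict, Optional


class FakeRedis:
    """In-memory stand-in; the ex TTL is accepted but not enforced."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        if nx and key in self.data:
            return None  # key already present: this is a duplicate
        self.data[key] = value
        return True

    def delete(self, key: str) -> int:
        return 1 if self.data.pop(key, None) is not None else 0


def consume_once(r: FakeRedis, request_uuid: str, handler: Callable[[], None],
                 ttl_seconds: int = 86400) -> bool:
    """Run handler only for the first delivery of a given request UUID."""
    key = f"idem:{request_uuid}"  # hypothetical key layout
    if not r.set(key, "1", nx=True, ex=ttl_seconds):
        return False  # duplicate delivery, skip scoring
    try:
        handler()
    except Exception:
        # Release the marker so a redelivery can retry the failed message.
        r.delete(key)
        raise
    return True
```

Because the UUID is generated at the start of the request chain, every redelivery of the same MQ message carries the same value and is rejected by the `nx` check.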

Q8: What is the logic for retrieving a leaderboard?

(details omitted)

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.