MQ Pitfall Avoidance Guide

Li Wei

September 11, 2025 · 3 min read

Message Backlog

Problem Category: Message Backlog
Related Description: During MQ usage, various issues can cause messages to be consumed late, leading to a large backlog.
Root Causes

  • Consumer process gets stuck.
  • Message consumption takes too long.
  • Consumer‑group client fails to start.
  • Too few consumer threads, insufficient processing capacity.
  • The consumer returns CONSUME_FAILURE after a failed consumption; if the underlying error never clears, the same message is retried indefinitely.

Best Practices

  • Keep business logic in the consumer short; if there is long‑running work, handle it asynchronously.
  • Minimize interactions with external services to avoid their problems throttling your consumption rate.
  • Classify exceptions in consumer threads and handle them appropriately; do not let a simple exception terminate the consumer or deregister the node.
  • When a backlog occurs, use a shovel (message forwarding) for emergency handling to prevent loss.
  • For single‑partition consumption, enable parallel processing when ordering is not required.
  • Detect issues early and scale out partitions and consumer machines as needed.
  • Optimize the consumption logic: move anything that can be processed asynchronously off the consumption path.
  • On consumption failure, avoid returning CONSUME_FAILURE; return RECONSUME_LATER and implement proper fallback logic for messages that still cannot be processed.
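
The exception-handling advice above can be sketched as follows. This is a minimal illustration, not any client's real API: `ConsumeStatus`, `TransientError`, `PermanentError`, `Message.retry_count`, `MAX_RETRIES`, and the in-memory `dead_letters` list are all hypothetical names chosen for the example. The idea is that a known-bad message is parked immediately, a possibly recoverable one is retried a bounded number of times, and no exception escapes to kill the consumer thread.

```python
from dataclasses import dataclass
from enum import Enum


class ConsumeStatus(Enum):
    SUCCESS = "CONSUME_SUCCESS"
    RECONSUME_LATER = "RECONSUME_LATER"


class TransientError(Exception):
    """Hypothetical: a failure that may succeed on retry (timeout, lock contention)."""


class PermanentError(Exception):
    """Hypothetical: a failure that retrying cannot fix (malformed payload)."""


MAX_RETRIES = 3  # assumed retry budget; tune per business requirement


@dataclass
class Message:
    body: str
    retry_count: int = 0  # assumed to be maintained by the MQ client


# Stand-in for a real fallback store (database table, dead-letter topic, ...).
dead_letters = []


def consume(message, handler):
    """Run handler and map every outcome to a status; never raise."""
    try:
        handler(message)
        return ConsumeStatus.SUCCESS
    except PermanentError:
        # Retrying cannot help: park the message and move on.
        dead_letters.append(message)
        return ConsumeStatus.SUCCESS
    except Exception:
        # TransientError and anything unclassified: retry within the budget.
        if message.retry_count >= MAX_RETRIES:
            dead_letters.append(message)
            return ConsumeStatus.SUCCESS
        return ConsumeStatus.RECONSUME_LATER
```

Acknowledging a parked message as consumed is a deliberate trade-off in this sketch: the backlog keeps draining, and the fallback store becomes the place where failed messages are inspected and replayed.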

Message Loss

Problem Category: Message Loss
Related Description: Message loss can occur due to MQ system failures or misuse.
Root Causes

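  • Kafka partition leader elections that lose messages, for example when a replica that has not fully caught up becomes the leader.
  • Producer reliability level not set to acks=-1 (equivalent to acks=all).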
  • During a machine restart, asynchronous sends have not completed before the client is destroyed.
  • Oversized messages cause send failures.
  • Send failures are not monitored promptly.
  • Large‑scale cluster outages.
  • Some business logic discards messages after a timeout.
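
To make the asynchronous-send cause above concrete, here is a small sketch of a sender that drains in-flight sends before it is closed. `AsyncSender` and its `send_fn` parameter are hypothetical stand-ins for a real producer client, and the thread pool only simulates asynchronous delivery.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class AsyncSender:
    """Hypothetical wrapper: send_fn stands in for a real client's send call."""

    def __init__(self, send_fn, workers=2):
        self._send_fn = send_fn
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # Kept simple for the sketch; a real client would prune completed sends.
        self._pending = []

    def send_async(self, message):
        future = self._pool.submit(self._send_fn, message)
        self._pending.append(future)
        return future

    def close(self, timeout=10.0):
        """Wait for in-flight sends, then return (unfinished, failed) counts."""
        done, not_done = wait(self._pending, timeout=timeout)
        self._pool.shutdown(wait=False)
        failed = [f for f in done if f.exception() is not None]
        return len(not_done), len(failed)
```

Calling `close()` from the application's shutdown hook, and alerting when either count is non-zero, is one way to avoid silently dropping messages during a restart.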

Best Practices

  • Do not acknowledge consumption unless the business processing has succeeded.
  • Gracefully shut down consumers and producers before the application exits.
  • If zero tolerance for loss is required, set the producer client to acks=-1 (equivalent to acks=all).
  • Implement robust cluster disaster‑recovery; for Kafka, strive for an even distribution of partitions across all brokers.
  • Avoid sending messages larger than 1 MB.
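
The first practice above, acknowledging only after the business processing succeeds, might look like the following sketch. `process_batch` and its `(offset, payload)` tuples are assumptions made for illustration; the sketch follows the Kafka convention that the committed offset is the next offset to read.

```python
def process_batch(messages, handler, committed_offset):
    """Return the new committed offset after handling a batch.

    messages is an ordered list of (offset, payload) pairs. The offset only
    advances past messages whose handler call succeeded, and processing stops
    at the first failure so that message is delivered again later.
    """
    for offset, payload in messages:
        try:
            handler(payload)
        except Exception:
            # Do not acknowledge this message or anything after it.
            break
        committed_offset = offset + 1
    return committed_offset
```

Because a failed message is redelivered, this approach trades loss for possible duplicates, so it relies on the consumer being idempotent.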

Duplicate Consumption

Problem Category: Duplicate Consumption
Related Description: Duplicate consumption is a common issue in MQ usage; if not handled correctly it can cause production problems.
Root Causes

  • Most message middleware cannot guarantee exactly‑once delivery.
  • Producers may publish the same message multiple times.

Best Practices

  • Enforce strict idempotency in consumption. There are many ways to achieve idempotency, such as using distributed locks to serialize parallel processing, leveraging database transactions, or employing a state‑machine that tracks record status in the database.
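
As one illustration of the database-transaction approach, the sketch below records each message ID in the same transaction as the business update, so a redelivered message is rejected by the primary-key constraint. The table names, the `apply_credit` function, and the use of an in-memory SQLite database are assumptions for the example, not a prescribed schema.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a dedupe table and a business table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def apply_credit(conn, message_id, account, amount):
    """Apply a credit at most once; return False if message_id was seen before."""
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            # Fails with IntegrityError if this message was already processed.
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            conn.execute(
                "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
                (account,),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Keeping the dedupe insert and the business update in one transaction means a crash between the two cannot leave a message marked as processed without its effect applied, or the reverse.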

Message Send Failure

Problem Category: Message Send Failure
Related Description: Improper usage or system faults can lead to failed message sends.
Root Causes

  • Improper client usage, such as repeatedly creating client instances, which consumes excessive system resources.
  • System anomalies are not monitored, so traffic surges arrive with no throttling or fallback plan in place.
  • Send results are ignored.

Best Practices

  • Create clients according to best practices, e.g., configure them as Spring beans to ensure a single instance per consumer group or producer.
  • Check every send result and handle failures instead of ignoring them.
  • Build effective traffic monitoring and emergency response plans.
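
The first two practices above can be sketched together. Everything here is hypothetical: `Producer` is a stand-in for a real MQ client, `SendResult` is an invented result type, and the 1 MB limit simply mirrors the size guideline given earlier in this guide. The sketch shows one shared producer instance and a send wrapper that records failures instead of dropping them.

```python
import threading
from dataclasses import dataclass

MAX_BODY_BYTES = 1024 * 1024  # assumed limit, mirroring the 1 MB guideline


@dataclass
class SendResult:
    ok: bool
    error: str = ""


class Producer:
    """Hypothetical stand-in for a real MQ producer client."""

    instances_created = 0

    def __init__(self):
        Producer.instances_created += 1

    def send(self, topic, body):
        if len(body) > MAX_BODY_BYTES:
            return SendResult(False, "message too large")
        return SendResult(True)


_producer = None
_producer_lock = threading.Lock()


def get_producer():
    """Return the single shared producer, creating it on first use."""
    global _producer
    with _producer_lock:
        if _producer is None:
            _producer = Producer()
        return _producer


# Stand-in for metrics or alerting; a real system would export these.
failed_sends = []


def send_checked(topic, body):
    """Send through the shared producer and record any failure."""
    result = get_producer().send(topic, body)
    if not result.ok:
        failed_sends.append((topic, result.error))
    return result.ok
```

In a Spring application the same effect comes from declaring the producer as a singleton bean; the lock-guarded accessor here plays that role for the sketch.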

Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.
