A Dead Letter Queue (DLQ) is a secondary queue that stores messages the consumers of the main queue could not process successfully.
In distributed systems, message queues help decouple services and ensure reliable communication. But sometimes, messages can’t be processed (due to errors, timeouts, or malformed data). DLQs capture these failures so you don’t lose data and can investigate or retry them later.
Use a DLQ when you want to safely handle failed messages without blocking the main message queue or losing data.
You need to know
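Redrive Policy: A redrive policy defines how many times a message can be delivered for processing before it is moved to the DLQ. In Amazon SQS, for example, it is set on the source queue (not on the DLQ itself) as `maxReceiveCount`.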
Troubleshooting & Recovery: DLQs allow developers to inspect failed messages, fix the root cause, and optionally reprocess them after resolution.
Message Ordering Impact: In FIFO systems, moving a failed message to a DLQ lets later messages be processed ahead of it, which breaks the original order, so use DLQs carefully if order is critical.
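The three points above can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions, not any broker's API: `QueueWithDLQ`, `max_receive_count`, `consume`, and `redrive` are hypothetical names, and failed messages are simply re-appended to the back of the main queue instead of reappearing after a timeout.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    receive_count: int = 0  # times a consumer has received this message


@dataclass
class QueueWithDLQ:
    max_receive_count: int  # hypothetical redrive setting
    main: deque = field(default_factory=deque)
    dlq: deque = field(default_factory=deque)

    def send(self, body):
        self.main.append(Message(body))

    def consume(self, handler):
        """Drain the main queue; return bodies in the order they succeeded."""
        processed = []
        while self.main:
            msg = self.main.popleft()
            msg.receive_count += 1
            try:
                handler(msg.body)
                processed.append(msg.body)
            except Exception:
                if msg.receive_count >= self.max_receive_count:
                    self.dlq.append(msg)   # retries exhausted: park it in the DLQ
                else:
                    self.main.append(msg)  # simplification: retry at the back
        return processed

    def redrive(self):
        """Move every DLQ message back to the main queue for reprocessing."""
        moved = 0
        while self.dlq:
            msg = self.dlq.popleft()
            msg.receive_count = 0
            self.main.append(msg)
            moved += 1
        return moved


def strict_handler(body):
    if body == "bad":
        raise ValueError("malformed message")


queue = QueueWithDLQ(max_receive_count=3)
for body in ["a", "bad", "b"]:
    queue.send(body)

print(queue.consume(strict_handler))         # ['a', 'b']
print([m.body for m in queue.dlq])           # ['bad']

# After the root cause is fixed, redrive and reprocess.
queue.redrive()
print(queue.consume(lambda body: None))      # ['bad']
```

Note that "bad" was sent before "b" but ends up processed after it. That is the ordering impact in miniature: the main queue never blocked, at the cost of the original sequence.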
Popular technologies
Amazon SQS DLQ – Built-in support for dead letter queues in AWS’s managed queueing service.
RabbitMQ DLX (Dead Letter Exchange) – Republishes messages that are rejected without requeue, expire, or overflow a queue's length limit to a designated exchange.
Apache Kafka + Dead Letter Topics – Uses separate topics for dead letters, typically handled at the consumer level.
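As a concrete example for Amazon SQS, the redrive policy is a JSON string stored in the source queue's `RedrivePolicy` attribute. Below is a minimal sketch of building it. The helper `build_redrive_policy` and the ARN are made up for illustration, and the boto3 call is shown only as a comment because it needs AWS credentials and a real queue URL.

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count):
    """Return SQS queue attributes that attach a DLQ to a source queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,              # ARN of the DLQ
        "maxReceiveCount": str(max_receive_count),   # receives before the move
    }
    # SQS expects the policy as a JSON-encoded string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN, for illustration only.
attributes = build_redrive_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq", 5
)
print(attributes)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl=source_queue_url,  # hypothetical variable
#       Attributes=attributes,
#   )
```

RabbitMQ and Kafka configure dead-lettering differently (queue arguments or policies, and consumer-side producers to a separate topic, respectively), so treat this as specific to SQS.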