Queues & Messaging
Three broker shapes, three delivery semantics, and the operational problems they all share. Pick the shape that matches your workload, not the product you saw on a blog.
The senior framing
Every async system is at-least-once in practice. "Exactly-once" is a
marketing claim — what you actually deliver is "at-least-once + idempotent
consumer." If an interviewer hears you say "exactly-once" unguarded, they
know you have not shipped one.
Delivery semantics
| Semantic | Means | Achieve with | Risk |
|---|---|---|---|
| At-most-once | Message is delivered zero or one time. Never duplicated. | Fire-and-forget producer; consumer does not ack; no retries. | Lost messages on any failure. Only acceptable for telemetry / metrics. |
| At-least-once | Message is delivered one or more times. Never lost. | Producer retries until ack; consumer acks after processing; broker resends on missed ack. | Duplicates — consumer MUST be idempotent. This is the sane default. |
| Exactly-once (the lie) | Message is processed exactly once end-to-end. | At-least-once delivery + idempotent consumer with dedupe store. Transactional producers (Kafka EOS) only cover producer→broker, not side effects. | Marketing shorthand. In practice you are implementing "effectively once" via dedupe keys. |
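What "at-least-once + idempotent consumer" looks like in miniature. A hedged sketch: the dedupe store here is an in-memory set standing in for something durable, and every name is illustrative.

```python
# Sketch: at-least-once delivery made "effectively once" by deduping on a
# message id. `processed` stands in for a durable dedupe store.
processed: set[str] = set()
side_effects: list[str] = []  # downstream state we must not duplicate

def handle(msg_id: str, payload: str) -> None:
    if msg_id in processed:       # retry of an already-committed message
        return                    # no-op: idempotent by construction
    side_effects.append(payload)  # the real work
    processed.add(msg_id)         # record the key AFTER the work commits

# The broker re-delivers "a" after a missed ack:
for msg_id, payload in [("a", "charge $5"), ("b", "ship box"), ("a", "charge $5")]:
    handle(msg_id, payload)

assert side_effects == ["charge $5", "ship box"]  # duplicate absorbed
```

Order matters: record the key only after the side effect commits. Flip the two lines and a crash between them silently drops the message, which is at-most-once wearing a disguise.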
Broker shapes
Anchor on shape, not product. Products bolt features on; the shape is the stable decision.
| Shape | Delivery | Ordering | Examples | Best for |
|---|---|---|---|---|
| Append-only log | Consumers track their own offset; messages retained for days. | Per-partition total order. Across partitions, unordered. | Kafka, Kinesis, Pulsar, Redpanda | Event sourcing, analytics fan-out, replay, change-data-capture. |
| Work queue | Each message goes to one consumer; ack/nack with visibility timeout. | Generally unordered (SQS standard). FIFO queues trade throughput for order. | SQS, RabbitMQ (classic), Beanstalkd, Celery broker | Task queues, background jobs, uneven consumer pools. |
| Pub/sub broadcast | Every subscriber gets every message; no durable offset by default. | Best-effort; subscribers that are offline miss messages. | Redis Pub/Sub, SNS, NATS core | Live notifications, cache invalidation fan-out. |
| Stream (hybrid) | Log-like durability + consumer-group semantics; both replay and work-queue patterns. | Per-partition. | Kafka consumer groups, Redis Streams, NATS JetStream | When you want log durability but also work-queue consumption per consumer group. |
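The per-partition-ordering and hot-key behavior above both fall out of one mechanism: key → hash → partition. A sketch with assumed names (`partition_for`, `NUM_PARTITIONS` are illustrative, not any broker's API):

```python
# Sketch: how a log-shaped broker maps a message key to a partition.
# Same key -> same partition -> total order for that key. Which is also
# why one hot key becomes one hot partition and one bottlenecked consumer.
import hashlib
from collections import Counter

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stable hash: Python's built-in hash() is salted per process,
    # so use a real digest for anything that must be deterministic.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# 90 events from one hot customer, 10 from everyone else:
events = [("customer-42", i) for i in range(90)] + \
         [(f"customer-{n}", 0) for n in range(10)]

load = Counter(partition_for(key) for key, _ in events)
# All 90 hot-key events land on a single partition: that is the skew row
# in the operational-problems table, reproduced in ten lines.
```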
Operational problems
| Problem | Cause | Mitigation |
|---|---|---|
| Dead-letter queue (DLQ) fills up silently | Poison message retried N times, auto-moved to DLQ; nobody reads the DLQ. | Alert on DLQ depth > 0. Every DLQ needs a human runbook — not just storage. |
| Consumer lag | Consumers slower than producers; log grows unbounded. | Scale the consumer group horizontally (capped at the partition count). Apply backpressure upstream. |
| Partition-key skew | One key (hot customer) maps to one partition → one consumer bottleneck. | Re-key with a salt, accept out-of-order for hot keys, or shard the hot key at app layer. |
| Duplicate processing | At-least-once delivery + non-idempotent consumer. | Idempotency key per message stored for retention window; `INSERT ... ON CONFLICT DO NOTHING`. |
| Head-of-line blocking | One slow message per partition blocks everything behind it. | Bound per-message processing time: on timeout, push to DLQ and advance. Process within a partition concurrently only if per-key order does not matter. |
| Reprocessing after a bug fix | Consumer bug corrupted downstream state; need to reprocess last N days. | Log-shaped broker (Kafka) allows offset rewind. Work queues cannot replay — design for it. |
| Consumer crashes mid-batch | Batch ack pattern commits after the whole batch; crash mid-batch re-delivers the whole batch. | Design consumers idempotent AND make batches small. Commit offsets only after durable side-effect. |
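The poison-message path from the DLQ row, sketched minimally. `MAX_ATTEMPTS` and the in-memory `dlq` list are illustrative stand-ins for a broker's redrive policy and a real dead-letter queue:

```python
# Sketch: retry a failing message a bounded number of times, then park it
# in the DLQ instead of retrying forever. All names are illustrative.
MAX_ATTEMPTS = 3
dlq: list[dict] = []

def consume(msg: dict, process) -> None:
    try:
        process(msg)
    except Exception as exc:
        msg["attempts"] = msg.get("attempts", 0) + 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            msg["error"] = repr(exc)
            dlq.append(msg)        # park it; a human runbook takes over
        else:
            consume(msg, process)  # naive immediate retry; use backoff in practice

def poison(msg):
    raise ValueError("cannot parse payload")  # always fails

consume({"id": "m1"}, poison)
assert len(dlq) == 1  # the depth > 0 alert from the table fires here
```

The sketch stops where the table's real advice starts: a DLQ entry is only useful if the depth alert pages someone who has a runbook for it.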
Idempotency rule
Assume at-least-once. Build consumers to be safely re-entrant. The idempotency key is the contract: the first delivery commits, and every retry is a no-op. Store the key for at least the broker's retention window plus clock skew.
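The contract above, sketched with the `INSERT ... ON CONFLICT DO NOTHING` pattern from the operational-problems table. SQLite 3.24+ syntax shown (Postgres takes the same shape); `handle_once` and the `processed` table are illustrative names, not a library API.

```python
# Sketch: SQL-backed idempotency key. The unique constraint is the dedupe;
# a retry's insert hits the conflict and the work is skipped.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idem_key TEXT PRIMARY KEY, done_at TEXT)")

def handle_once(idem_key: str, do_work) -> bool:
    cur = db.execute(
        "INSERT INTO processed (idem_key, done_at) VALUES (?, datetime('now')) "
        "ON CONFLICT (idem_key) DO NOTHING",
        (idem_key,),
    )
    if cur.rowcount == 0:  # key already present: this is a retry
        return False
    do_work()              # first delivery does the real work
    db.commit()            # key and work become durable together
    return True
```

If `do_work` writes to the same database, the key insert and the side effect share one transaction, so a crash before `commit` rolls both back and the retry starts clean. If the side effect lives elsewhere (an HTTP call, another store), that atomicity is gone and you are back to designing the side effect itself to be idempotent.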