# Queues & Messaging

Three broker shapes, three delivery semantics, and the operational problems they all share. Pick the shape that matches your workload, not the product you saw on a blog.

## The senior framing

Every async system is at-least-once in practice. "Exactly-once" is a marketing claim — what you actually deliver is "at-least-once + idempotent consumer." If an interviewer hears you say "exactly-once" unguarded, they know you have not shipped one.

## Delivery semantics

| Semantic | Means | Achieve with | Risk |
|---|---|---|---|
| At-most-once | Message is delivered zero or one time; never duplicated. | Fire-and-forget producer; consumer does not ack; no retries. | Lost messages on any failure. Only acceptable for telemetry / metrics. |
| At-least-once | Message is delivered one or more times; never lost. | Producer retries until ack; consumer acks after processing (see the sketch below); broker resends on missed ack. | Duplicates: consumer MUST be idempotent. This is the sane default. |
| Exactly-once (the lie) | Message is processed exactly once end-to-end. | At-least-once delivery + idempotent consumer with dedupe store. Transactional producers (Kafka EOS) only cover producer→broker, not side effects. | Marketing shorthand. In practice you are implementing "effectively once" via dedupe keys. |
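
Which semantic you get usually comes down to one line of consumer code: whether the ack happens before or after the side effect. A minimal sketch with a toy in-memory broker (`ToyBroker` and its redelivery bookkeeping are illustrative, not any real client's API):

```python
import queue

class ToyBroker:
    """Illustrative stand-in: tracks unacked messages so a real broker could redeliver them."""
    def __init__(self):
        self._q = queue.Queue()
        self._unacked = {}            # msg_id -> body; redelivered on ack timeout

    def publish(self, msg_id, body):
        self._q.put((msg_id, body))

    def receive(self):
        msg_id, body = self._q.get()
        self._unacked[msg_id] = body  # stays visible until acked
        return msg_id, body

    def ack(self, msg_id):
        self._unacked.pop(msg_id, None)

def consume_at_most_once(broker, handle):
    msg_id, body = broker.receive()
    broker.ack(msg_id)   # ack BEFORE processing: a crash here loses the message
    handle(body)

def consume_at_least_once(broker, handle):
    msg_id, body = broker.receive()
    handle(body)         # a crash here triggers redelivery: duplicates possible
    broker.ack(msg_id)   # ack AFTER processing: never lost, sometimes repeated

broker = ToyBroker()
broker.publish("m1", "hello")
consume_at_least_once(broker, print)   # prints "hello", then acks
```

Everything below assumes you chose the second loop and now have to make `handle` safe to run twice.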

## Broker shapes

Anchor on shape, not product. Products bolt features on; the shape is the stable decision.

| Shape | Delivery | Ordering | Examples | Best for |
|---|---|---|---|---|
| Append-only log | Consumers track their own offset (sketch after this table); messages retained for days. | Per-partition total order; unordered across partitions. | Kafka, Kinesis, Pulsar, Redpanda | Event sourcing, analytics fan-out, replay, change-data-capture. |
| Work queue | Each message goes to one consumer; ack/nack with visibility timeout. | Generally unordered (SQS standard); FIFO queues trade throughput for order. | SQS, RabbitMQ (classic), Beanstalkd, Celery broker | Task queues, background jobs, uneven consumer pools. |
| Pub/sub broadcast | Every subscriber gets every message; no durable offset by default. | Best-effort; subscribers that are offline miss messages. | Redis Pub/Sub, SNS, NATS core | Live notifications, cache invalidation fan-out. |
| Stream (hybrid) | Log-like durability + consumer-group semantics; supports both replay and work-queue patterns. | Per-partition. | Kafka consumer groups, Redis Streams, NATS JetStream | When you want log durability but also work-queue consumption per consumer group. |
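
The log shape is easiest to internalize as "the broker stores, the consumer remembers." A minimal single-partition sketch (`LogPartition` is illustrative, not any product's API) showing why the same retained records support independent fan-out and replay:

```python
class LogPartition:
    """One partition of an append-only log: reads never delete records."""
    def __init__(self):
        self.records = []        # retained for the whole retention window
        self.offsets = {}        # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        return start, self.records[start:start + max_records]

    def commit(self, group, next_offset):
        self.offsets[group] = next_offset   # the group's only durable state

    def rewind(self, group, offset=0):
        self.offsets[group] = offset        # replay is just moving a pointer

log = LogPartition()
for i in range(3):
    log.append(f"event-{i}")

start, batch = log.poll("analytics")        # analytics group reads all three
log.commit("analytics", start + len(batch))
_, same_batch = log.poll("billing")         # billing independently reads the same three
log.rewind("analytics")                     # bug fix? reprocess from offset 0
```

A work queue deletes on ack, so it can do neither independent fan-out nor replay; that asymmetry drives most of the decisions in the table above.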

## Operational problems

| Problem | Cause | Mitigation |
|---|---|---|
| Dead-letter queue (DLQ) fills up silently | Poison message retried N times, then auto-moved to the DLQ; nobody reads the DLQ. | Alert on DLQ depth > 0. Every DLQ needs a human runbook, not just storage. |
| Consumer lag | Consumers slower than producers; the backlog grows unbounded. | Scale the consumer group horizontally (up to the partition count); apply backpressure upstream. |
| Partition-key skew | One key (hot customer) maps to one partition, so one consumer becomes the bottleneck. | Re-key with a salt, accept out-of-order for hot keys, or shard the hot key at the app layer. |
| Duplicate processing | At-least-once delivery + non-idempotent consumer. | Store an idempotency key per message for the retention window; `INSERT ... ON CONFLICT DO NOTHING`. |
| Head-of-line blocking | One slow message per partition blocks everything behind it. | Process concurrently within the partition (sacrificing strict order); time out and push the slow message to the DLQ (see the sketch after this table). |
| Reprocessing after a bug fix | A consumer bug corrupted downstream state; you need to reprocess the last N days. | A log-shaped broker (Kafka) allows offset rewind. Work queues cannot replay; design for this up front. |
| Consumer crashes mid-batch | The batch-ack pattern commits after the whole batch; a crash mid-batch re-delivers the whole batch. | Make consumers idempotent AND keep batches small. Commit offsets only after durable side effects. |
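
The DLQ and head-of-line rows share one consumer skeleton: bound the retries, then get the poison message out of the processing path with enough context for the runbook. A self-contained sketch (the deque stands in for the broker's redelivery; in practice the attempt count arrives as message metadata):

```python
import collections

MAX_ATTEMPTS = 3   # illustrative; tune to your visibility-timeout budget

def consume_with_dlq(pending, dlq, handle, max_attempts=MAX_ATTEMPTS):
    """Drain (body, attempts) pairs; park poison messages instead of spinning."""
    while pending:
        body, attempts = pending.popleft()
        try:
            handle(body)                    # success == ack (not re-queued)
        except Exception as exc:
            if attempts + 1 >= max_attempts:
                # Poison message: unblock the queue, keep context for a human.
                dlq.append({"body": body, "attempts": attempts + 1,
                            "error": repr(exc)})
            else:
                pending.append((body, attempts + 1))   # nack == redelivery

def handle(body):
    if body == "poison":
        raise ValueError("unparseable payload")

pending = collections.deque([("ok-1", 0), ("poison", 0), ("ok-2", 0)])
dlq = []
consume_with_dlq(pending, dlq, handle)
assert [m["body"] for m in dlq] == ["poison"]   # ok-1 and ok-2 processed fine
```

Per the first row of the table: the job is not done until `len(dlq) > 0` pages someone who has a runbook.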

## Idempotency rule

Assume at-least-once. Build consumers to be safely re-entrant. The idempotency key is the contract: the first delivery commits, every retry is a no-op. Store the key for at least the broker's retention window plus clock skew.
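
A minimal sketch of that contract using SQLite's upsert syntax (the table names, in-memory DB, and balance example are illustrative; the point that matters is that the dedupe insert and the side effect commit in one transaction):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY, done_at TEXT)")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER)")
db.execute("INSERT INTO balances VALUES ('acct-1', 0)")

def handle(message_id, account, amount_cents):
    """Re-entrant consumer: key claim and side effect succeed or fail together."""
    with db:  # one transaction: both rows commit, or neither does
        claimed = db.execute(
            "INSERT INTO processed VALUES (?, datetime('now')) "
            "ON CONFLICT(message_id) DO NOTHING",
            (message_id,),
        )
        if claimed.rowcount == 0:
            return  # duplicate delivery: this retry is a no-op
        db.execute("UPDATE balances SET cents = cents + ? WHERE account = ?",
                   (amount_cents, account))

handle("msg-42", "acct-1", 500)
handle("msg-42", "acct-1", 500)   # redelivery
assert db.execute("SELECT cents FROM balances").fetchone()[0] == 500
```

If the dedupe store and the side effect live in different systems, the shared transaction disappears and you are back to choosing which failure mode you can tolerate.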