# Queues & Messaging

Three broker shapes, three delivery semantics, and the operational problems they all share. Pick the shape that matches your workload, not the product you saw on a blog.

## The senior framing

Every async system is at-least-once in practice. "Exactly-once" is a marketing claim — what you actually deliver is "at-least-once + idempotent consumer." If an interviewer hears you say "exactly-once" unguarded, they know you have not shipped one.

## Delivery semantics

| Semantic | Means | Achieve with | Risk |
|---|---|---|---|
| At-most-once | Message is delivered zero or one time; never duplicated. | Fire-and-forget producer; consumer does not ack; no retries. | Lost messages on any failure. Only acceptable for telemetry / metrics. |
| At-least-once | Message is delivered one or more times; never lost. | Producer retries until ack; consumer acks after processing (see the sketch below); broker resends on missed ack. | Duplicates: consumer MUST be idempotent. This is the sane default. |
| Exactly-once (the lie) | Message is processed exactly once end-to-end. | At-least-once delivery + idempotent consumer with dedupe store. Transactional producers (Kafka EOS) only cover producer→broker, not side effects. | Marketing shorthand. In practice you are implementing "effectively once" via dedupe keys. |
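
Which semantic you get usually comes down to one line of consumer code: whether the ack happens before or after the side effect. A minimal sketch with a toy in-memory broker (`ToyBroker` and its redelivery bookkeeping are illustrative, not any real client's API):

```python
import queue

class ToyBroker:
    """Illustrative stand-in: tracks unacked messages so a real broker could redeliver them."""
    def __init__(self):
        self._q = queue.Queue()
        self._unacked = {}            # msg_id -> body; redelivered on ack timeout

    def publish(self, msg_id, body):
        self._q.put((msg_id, body))

    def receive(self):
        msg_id, body = self._q.get()
        self._unacked[msg_id] = body  # stays visible until acked
        return msg_id, body

    def ack(self, msg_id):
        self._unacked.pop(msg_id, None)

def consume_at_most_once(broker, handle):
    msg_id, body = broker.receive()
    broker.ack(msg_id)   # ack BEFORE processing: a crash here loses the message
    handle(body)

def consume_at_least_once(broker, handle):
    msg_id, body = broker.receive()
    handle(body)         # a crash here triggers redelivery: duplicates possible
    broker.ack(msg_id)   # ack AFTER processing: never lost, sometimes repeated

broker = ToyBroker()
broker.publish("m1", "hello")
consume_at_least_once(broker, print)   # prints "hello", then acks
```

Everything below assumes you chose the second loop and now have to make `handle` safe to run twice.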

## Broker shapes

Anchor on shape, not product. Products bolt features on; the shape is the stable decision.

| Shape | Delivery | Ordering | Examples | Best for |
|---|---|---|---|---|
| Append-only log | Consumers track their own offset (sketch after this table); messages retained for days. | Per-partition total order; unordered across partitions. | Kafka, Kinesis, Pulsar, Redpanda | Event sourcing, analytics fan-out, replay, change-data-capture. |
| Work queue | Each message goes to one consumer; ack/nack with visibility timeout. | Generally unordered (SQS standard); FIFO queues trade throughput for order. | SQS, RabbitMQ (classic), Beanstalkd, Celery broker | Task queues, background jobs, uneven consumer pools. |
| Pub/sub broadcast | Every subscriber gets every message; no durable offset by default. | Best-effort; subscribers that are offline miss messages. | Redis Pub/Sub, SNS, NATS core | Live notifications, cache invalidation fan-out. |
| Stream (hybrid) | Log-like durability + consumer-group semantics; supports both replay and work-queue patterns. | Per-partition. | Kafka consumer groups, Redis Streams, NATS JetStream | When you want log durability but also work-queue consumption per consumer group. |
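
The log shape is easiest to internalize as "the broker stores, the consumer remembers." A minimal single-partition sketch (`LogPartition` is illustrative, not any product's API) showing why the same retained records support independent fan-out and replay:

```python
class LogPartition:
    """One partition of an append-only log: reads never delete records."""
    def __init__(self):
        self.records = []        # retained for the whole retention window
        self.offsets = {}        # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        return start, self.records[start:start + max_records]

    def commit(self, group, next_offset):
        self.offsets[group] = next_offset   # the group's only durable state

    def rewind(self, group, offset=0):
        self.offsets[group] = offset        # replay is just moving a pointer

log = LogPartition()
for i in range(3):
    log.append(f"event-{i}")

start, batch = log.poll("analytics")        # analytics group reads all three
log.commit("analytics", start + len(batch))
_, same_batch = log.poll("billing")         # billing independently reads the same three
log.rewind("analytics")                     # bug fix? reprocess from offset 0
```

A work queue deletes on ack, so it can do neither independent fan-out nor replay; that asymmetry drives most of the decisions in the table above.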

## Operational problems

| Problem | Cause | Mitigation |
|---|---|---|
| Dead-letter queue (DLQ) fills up silently | Poison message retried N times, then auto-moved to the DLQ; nobody reads the DLQ. | Alert on DLQ depth > 0. Every DLQ needs a human runbook, not just storage. |
| Consumer lag | Consumers slower than producers; the backlog grows unbounded. | Scale the consumer group horizontally (up to the partition count); apply backpressure upstream. |
| Partition-key skew | One key (hot customer) maps to one partition, so one consumer becomes the bottleneck. | Re-key with a salt, accept out-of-order for hot keys, or shard the hot key at the app layer. |
| Duplicate processing | At-least-once delivery + non-idempotent consumer. | Store an idempotency key per message for the retention window; `INSERT ... ON CONFLICT DO NOTHING`. |
| Head-of-line blocking | One slow message per partition blocks everything behind it. | Process concurrently within the partition (sacrificing strict order); time out and push the slow message to the DLQ (see the sketch after this table). |
| Reprocessing after a bug fix | A consumer bug corrupted downstream state; you need to reprocess the last N days. | A log-shaped broker (Kafka) allows offset rewind. Work queues cannot replay; design for this up front. |
| Consumer crashes mid-batch | The batch-ack pattern commits after the whole batch; a crash mid-batch re-delivers the whole batch. | Make consumers idempotent AND keep batches small. Commit offsets only after durable side effects. |
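
The DLQ and head-of-line rows share one consumer skeleton: bound the retries, then get the poison message out of the processing path with enough context for the runbook. A self-contained sketch (the deque stands in for the broker's redelivery; in practice the attempt count arrives as message metadata):

```python
import collections

MAX_ATTEMPTS = 3   # illustrative; tune to your visibility-timeout budget

def consume_with_dlq(pending, dlq, handle, max_attempts=MAX_ATTEMPTS):
    """Drain (body, attempts) pairs; park poison messages instead of spinning."""
    while pending:
        body, attempts = pending.popleft()
        try:
            handle(body)                    # success == ack (not re-queued)
        except Exception as exc:
            if attempts + 1 >= max_attempts:
                # Poison message: unblock the queue, keep context for a human.
                dlq.append({"body": body, "attempts": attempts + 1,
                            "error": repr(exc)})
            else:
                pending.append((body, attempts + 1))   # nack == redelivery

def handle(body):
    if body == "poison":
        raise ValueError("unparseable payload")

pending = collections.deque([("ok-1", 0), ("poison", 0), ("ok-2", 0)])
dlq = []
consume_with_dlq(pending, dlq, handle)
assert [m["body"] for m in dlq] == ["poison"]   # ok-1 and ok-2 processed fine
```

Per the first row of the table: the job is not done until `len(dlq) > 0` pages someone who has a runbook.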

## Idempotency rule

Assume at-least-once. Build consumers to be safely re-entrant. The idempotency key is the contract: the first delivery commits, every retry is a no-op. Store the key for at least the broker's retention window plus clock skew.
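
A minimal sketch of that contract using SQLite's upsert syntax (the table names, in-memory DB, and balance example are illustrative; the point that matters is that the dedupe insert and the side effect commit in one transaction):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY, done_at TEXT)")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER)")
db.execute("INSERT INTO balances VALUES ('acct-1', 0)")

def handle(message_id, account, amount_cents):
    """Re-entrant consumer: key claim and side effect succeed or fail together."""
    with db:  # one transaction: both rows commit, or neither does
        claimed = db.execute(
            "INSERT INTO processed VALUES (?, datetime('now')) "
            "ON CONFLICT(message_id) DO NOTHING",
            (message_id,),
        )
        if claimed.rowcount == 0:
            return  # duplicate delivery: this retry is a no-op
        db.execute("UPDATE balances SET cents = cents + ? WHERE account = ?",
                   (amount_cents, account))

handle("msg-42", "acct-1", 500)
handle("msg-42", "acct-1", 500)   # redelivery
assert db.execute("SELECT cents FROM balances").fetchone()[0] == 500
```

If the dedupe store and the side effect live in different systems, the shared transaction disappears and you are back to choosing which failure mode you can tolerate.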