The Problem with Synchronous Architectures
Tightly coupled services create cascading failures. When Service A calls Service B synchronously, and B is slow or down, A fails too. As you add more services, the dependency graph becomes a liability—one slow database query can bring down your entire system.
Event-driven architecture breaks these dependencies. Services communicate through events rather than direct calls, allowing them to scale independently and fail gracefully.
Core Concepts
Events vs. Commands
Events describe something that happened ("OrderPlaced"). Commands request an action ("ProcessPayment"). Events are broadcast to interested subscribers; commands are sent to specific handlers.
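The distinction can be sketched with a tiny in-memory bus. Everything here (`Bus`, `OrderPlaced`, `ProcessPayment`, the method names) is hypothetical and only illustrates the routing rule: an event goes to every subscriber, a command goes to exactly one handler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class OrderPlaced:
    """Event: past tense, states a fact that already happened."""
    order_id: str


@dataclass(frozen=True)
class ProcessPayment:
    """Command: imperative, asks one specific handler to act."""
    order_id: str
    amount_cents: int


class Bus:
    """Illustrative in-memory bus, not a real messaging client."""

    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Callable[[Any], None]]] = {}
        self._handlers: Dict[type, Callable[[Any], Any]] = {}

    def subscribe(self, event_type: type, fn: Callable[[Any], None]) -> None:
        # Any number of subscribers may listen to the same event type.
        self._subscribers.setdefault(event_type, []).append(fn)

    def register(self, command_type: type, fn: Callable[[Any], Any]) -> None:
        # A command type has exactly one handler.
        if command_type in self._handlers:
            raise ValueError(f"{command_type.__name__} already has a handler")
        self._handlers[command_type] = fn

    def publish(self, event: Any) -> int:
        """Broadcast an event; returns how many subscribers received it."""
        subscribers = self._subscribers.get(type(event), [])
        for fn in subscribers:
            fn(event)
        return len(subscribers)

    def send(self, command: Any) -> Any:
        """Route a command to its single handler and return the result."""
        return self._handlers[type(command)](command)
```

Publishing an event with no subscribers is valid and returns 0, whereas sending a command with no handler raises `KeyError`. That asymmetry is the practical difference between the two.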
Eventual Consistency
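In event-driven systems, data isn't immediately consistent everywhere, because subscribers process events asynchronously. Design for this: it enables scale, but an operation that spans several services can no longer rely on a single database transaction. Consumers must tolerate briefly stale data, and a failure partway through needs a compensating action rather than a rollback.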
Idempotency
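Events can be delivered more than once. Your handlers must produce the same result whether they process an event once or ten times. Use idempotency keys, and record them with an atomic conditional write ("insert only if absent") rather than a separate read followed by a write, because two concurrent deliveries can both pass a plain check.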
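A minimal sketch of an idempotent handler, assuming each event carries an `idempotency_key` field (a hypothetical name) and using an in-memory dict as a stand-in for a durable store:

```python
from typing import Dict, List


class PaymentHandler:
    """Illustrative handler that applies each idempotency key at most once."""

    def __init__(self) -> None:
        # Stand-in for a durable store keyed by idempotency key.
        self.processed: Dict[str, str] = {}
        # Side effects we must not repeat (here, recorded charges).
        self.charges: List[int] = []

    def handle(self, event: dict) -> str:
        key = event["idempotency_key"]
        # Duplicate delivery: return the stored result, skip the side effect.
        if key in self.processed:
            return self.processed[key]
        self.charges.append(event["amount_cents"])
        result = f"charged:{event['order_id']}"
        self.processed[key] = result
        return result
```

This version is only safe single-threaded. With concurrent consumers, the lookup and the write would need to be one atomic operation in the backing store.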
Dead Letter Queues
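When event processing fails repeatedly, messages go to a dead letter queue (DLQ) for investigation. Always configure DLQs. Without one, a failing message is either retried until its retention period expires or dropped once retries are exhausted, and in both cases it is lost without a trace.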
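For SQS, a DLQ is attached through the source queue's `RedrivePolicy` attribute, a JSON string with `deadLetterTargetArn` and `maxReceiveCount`. The sketch below builds that attribute and simulates the resulting behavior in memory. The function names and the ARN are placeholders, and the simulation is a simplification of what SQS actually does.

```python
import json
from typing import Callable, Dict, List


def redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> Dict[str, str]:
    """Build the queue attribute that routes repeatedly failing messages to a DLQ."""
    return {
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        )
    }


def process_with_dlq(
    messages: List[str],
    handler: Callable[[str], None],
    max_receive_count: int = 3,
) -> List[str]:
    """Simulate redrive: try each message up to max_receive_count times.

    Returns the messages that never succeeded (the simulated DLQ contents).
    """
    dead_letters: List[str] = []
    for message in messages:
        for _attempt in range(max_receive_count):
            try:
                handler(message)
                break  # success: the message would be deleted from the queue
            except Exception:
                continue  # failure: the message becomes visible again
        else:
            # Every attempt failed, so the message moves to the DLQ.
            dead_letters.append(message)
    return dead_letters
```

The dict returned by `redrive_policy` has the shape expected for the `Attributes` argument when creating a queue or setting queue attributes.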
Choosing the Right AWS Service
AWS offers multiple messaging services, each with different strengths:
| Service | Best For | Key Characteristics |
|---|---|---|
| EventBridge | Event routing, cross-account events | Schema registry, content filtering, 35+ AWS service integrations |
| SQS | Work queues, buffering | At-least-once delivery, FIFO option, long polling |
| SNS | Fan-out, notifications | Push to multiple subscribers, filtering, mobile push |
| Kinesis | High-volume streaming | Ordered within shard, replay capability, real-time analytics |
Common Patterns
SNS + SQS Fan-Out
Publish events to SNS, subscribe multiple SQS queues. Each subscriber gets its own copy of the event and processes at its own pace. Failed processing in one subscriber doesn't affect others.
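The isolation between subscribers can be illustrated with an in-memory stand-in, where `Topic` plays the role of SNS and each `deque` plays the role of a subscribed SQS queue. All names are hypothetical and no AWS calls are made.

```python
from collections import deque
from typing import Deque, List


class Topic:
    """Illustrative fan-out: every subscribed queue receives its own copy."""

    def __init__(self) -> None:
        self._queues: List[Deque[dict]] = []

    def subscribe(self, queue: Deque[dict]) -> None:
        self._queues.append(queue)

    def publish(self, message: dict) -> None:
        for queue in self._queues:
            # Copy so one consumer mutating its message cannot affect another.
            queue.append(dict(message))


def drain(queue: Deque[dict]) -> List[dict]:
    """Consume everything currently in one queue, leaving other queues untouched."""
    consumed: List[dict] = []
    while queue:
        consumed.append(queue.popleft())
    return consumed
```

A slow or failing consumer simply leaves messages in its own queue. The other queues drain at their own rate.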
EventBridge for Cross-Service Communication
Use EventBridge as your central event bus. Services publish events without knowing who consumes them. Rules route events to the right targets based on content. Great for microservices that need loose coupling.
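Content-based routing is driven by event patterns: JSON documents in which each leaf is a list of accepted values. The matcher below is a deliberately simplified sketch that handles only exact-value matching on nested fields. The real service supports more (prefix, numeric, and exists operators, plus array-valued fields), and the `orders.service` source and `OrderPlaced` detail type are made-up examples.

```python
from typing import Any, Dict

# Example pattern: match OrderPlaced events from a hypothetical orders service,
# but only for orders shipped to DE or FR.
ORDER_PATTERN: Dict[str, Any] = {
    "source": ["orders.service"],
    "detail-type": ["OrderPlaced"],
    "detail": {"country": ["DE", "FR"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if every field named in the pattern matches the event.

    Fields present in the event but absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding sub-object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event's value must be one of the listed values.
            return False
    return True
```

Because unlisted fields are ignored, producers can add fields to an event without breaking existing rules.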
SQS for Work Distribution
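Queue work items in SQS and process them with Lambda or ECS workers. A message that is received but not deleted becomes visible again after its visibility timeout and is redelivered. SQS itself does not apply exponential backoff between redeliveries, so add it in the consumer if you need it. The visibility timeout hides an in-flight message from other consumers, which reduces duplicate processing but does not eliminate it. Scale workers independently of producers.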
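If growing delays between attempts are wanted, one option is for the consumer to read the message's `ApproximateReceiveCount` attribute and, on failure, extend the visibility timeout (the `ChangeMessageVisibility` API) before giving up on the attempt. The helper below only computes the delay. Its name, base, and cap are assumptions chosen for illustration.

```python
# SQS allows a visibility timeout of at most 12 hours.
MAX_VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60


def backoff_seconds(receive_count: int, base: int = 30, cap: int = 900) -> int:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.

    `receive_count` is how many times the message has been received so far
    (1 on the first delivery).
    """
    if receive_count < 1:
        raise ValueError("receive_count must be >= 1")
    delay = base * 2 ** (receive_count - 1)
    return min(delay, cap, MAX_VISIBILITY_TIMEOUT_SECONDS)
```

With the defaults, the first four failures yield delays of 30, 60, 120, and 240 seconds, and later attempts level off at 900 seconds. In practice some random jitter is often added so that failed messages do not all return at once.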
Step Functions for Orchestration
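When you need coordinated workflows across multiple services, Step Functions provides visual orchestration with built-in error handling, retries, and state management. This is usually more robust than having Lambda functions invoke each other directly, where retry logic and workflow state end up scattered across the functions.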
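A workflow is declared in Amazon States Language, a JSON document. The sketch below expresses a small hypothetical order workflow as a Python dict, with a retry policy and a catch-all fallback on the first step. State names and Lambda ARNs are placeholders. A small helper checks that every transition points at a state that exists.

```python
import json
from typing import Any, Dict, Set

ACCOUNT = "arn:aws:lambda:us-east-1:123456789012:function"  # placeholder

DEFINITION: Dict[str, Any] = {
    "Comment": "Illustrative order workflow",
    "StartAt": "ChargePayment",
    "States": {
        "ChargePayment": {
            "Type": "Task",
            "Resource": f"{ACCOUNT}:charge-payment",
            # Retry failed tasks up to 3 times, waiting 2s, 4s, then 8s.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # If retries are exhausted (or any other error), take the fallback.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "ShipOrder",
        },
        "ShipOrder": {"Type": "Task", "Resource": f"{ACCOUNT}:ship-order", "End": True},
        "NotifyFailure": {"Type": "Task", "Resource": f"{ACCOUNT}:notify-failure", "End": True},
    },
}


def undefined_targets(definition: Dict[str, Any]) -> Set[str]:
    """Return transition targets (StartAt, Next, Catch.Next) with no matching state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return targets - set(states)


definition_json = json.dumps(DEFINITION, indent=2)
```

The retry and error-routing rules live in the definition rather than in function code, which is the main advantage over chaining functions by hand.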
Common Pitfalls
- Forgetting idempotency: SQS and EventBridge can deliver messages more than once. If your handler isn't idempotent, a duplicate delivery repeats its side effects, such as charging a customer twice.
- No DLQ monitoring: Setting up DLQs isn't enough. You need alerts when messages land there, and a process to investigate and replay them.
- Oversized events: Keep events small. Include IDs and let consumers fetch details if needed. Large events increase costs and latency.
- Missing correlation IDs: Without a correlation ID flowing through all events, debugging distributed transactions becomes nearly impossible.
- Ignoring ordering requirements: Standard SQS doesn't guarantee order. If order matters, use FIFO queues or Kinesis, but understand the throughput trade-offs: FIFO queues have lower throughput limits than standard queues, and Kinesis preserves order only within a shard.
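Two of these pitfalls, oversized events and missing correlation IDs, can be addressed with a small event envelope. The sketch below is one possible shape, not a standard: the `Envelope` class and its field names are assumptions. The payload carries only identifiers, and every follow-up event inherits the correlation ID of the event that caused it.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    event_type: str
    payload: dict            # keep small: IDs, not full records
    correlation_id: str      # shared by every event in one business flow
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    causation_id: Optional[str] = None  # event_id of the direct parent, if any


def new_event(event_type: str, payload: dict) -> Envelope:
    """Start a new flow with a fresh correlation ID."""
    return Envelope(event_type, payload, correlation_id=str(uuid.uuid4()))


def follow_up(parent: Envelope, event_type: str, payload: dict) -> Envelope:
    """Create an event caused by `parent`, keeping the same correlation ID."""
    return Envelope(
        event_type,
        payload,
        correlation_id=parent.correlation_id,
        causation_id=parent.event_id,
    )
```

Logging `correlation_id` in every handler makes it possible to pull the full trail of one order out of the logs of several services, and `event_id` doubles as a natural idempotency key.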
When Event-Driven Isn't the Answer
Event-driven architecture adds complexity. For simple CRUD applications or when you need immediate consistency (like checking inventory before accepting an order), synchronous calls are often simpler and sufficient. Don't add messaging infrastructure just because it's trendy—add it when you have a specific scaling, resilience, or decoupling problem to solve.