AWS Lambda

What Is Lambda?

Lambda is AWS's serverless compute service. You upload code, define a trigger (API request, queue message, file upload, schedule), and AWS runs it. No servers to provision, patch, or scale. You pay per invocation and per millisecond of execution time.

Lambda is the default compute choice for event-driven architectures on AWS. But it's not always the right choice. Understanding its scaling model, pricing, and limitations helps you make that call.
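To make "upload code, define a trigger" concrete, here is a minimal sketch of a Python handler. The `name` field in the payload is a hypothetical example; real event shapes depend on the trigger that invokes the function.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls once per invocation.

    `event` carries the trigger payload (its shape depends on the trigger);
    `context` exposes runtime metadata such as the remaining execution time.
    """
    # Hypothetical payload field: real events differ per trigger type.
    name = event.get("name", "world")
    return {"message": f"Hello, {name}!"}


if __name__ == "__main__":
    # Local smoke test; on Lambda the service supplies event and context.
    print(json.dumps(handler({"name": "Lambda"}, None)))
```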

When to Use Lambda

Good fit:

API endpoints (behind API Gateway or ALB)
Event processing (S3 uploads, DynamoDB streams, SQS messages, EventBridge events)
Scheduled tasks (cron-like via EventBridge Scheduler)
Data transformation (ETL, file processing)
Microservice backends with variable traffic

Consider Lambda MicroVMs when:

You need per-user or per-session isolated sandboxes for untrusted code
Your workload is interactive and stateful (AI coding assistants, notebooks, sandboxes)
You need VM-level isolation without managing infrastructure
Your sessions run for up to 8 hours and benefit from idle suspend/resume

Consider containers (ECS/Fargate) instead when:

Your workload runs longer than 15 minutes (or 8 hours for MicroVMs)
You need more than 10GB of memory (or 32GB for MicroVMs)
You have steady, predictable traffic where per-invocation pricing is expensive
You need WebSocket connections or long-lived TCP connections
Your startup time exceeds what's acceptable (very large ML models, heavy frameworks)

How Lambda Scales

Lambda's scaling is automatic but has rules:

One invocation per instance. Each concurrent request runs in its own execution environment.
Burst scaling. Lambda can start 500-3000 concurrent instances immediately (region-dependent), then adds 500 more per minute.
Account-level concurrency limit. Default is 1,000 concurrent executions across all functions in a region. Request an increase for production.
Reserved concurrency. Guarantees that a function always has capacity, but also caps that function's concurrency at the reserved value.
Provisioned concurrency. Pre-warm instances to eliminate cold starts. You pay for them whether they're used or not.
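The interaction between the burst, the per-minute ramp, and the account limit can be sketched as a small model. This is an illustration of the rules as stated above, not AWS's actual scheduler; the default values (a 3,000 burst and a raised 10,000 account limit) are assumptions chosen for the example.

```python
def available_concurrency(minutes_elapsed, initial_burst=3000,
                          ramp_per_minute=500, account_limit=10_000):
    """Upper bound on concurrent instances during a sustained traffic spike.

    Models the rule described above: an immediate burst, then a linear
    ramp, never exceeding the account-level limit.
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    ramped = initial_burst + ramp_per_minute * int(minutes_elapsed)
    return min(ramped, account_limit)


if __name__ == "__main__":
    for minute in (0, 4, 20):
        print(minute, available_concurrency(minute))
```

With the default 1,000 account limit, `available_concurrency(0, account_limit=1000)` is capped at 1,000 regardless of the burst size, which is why a limit increase matters for production.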

The cold start problem

When Lambda creates a new execution environment, it goes through four steps:

1. Download your deployment package
2. Start the runtime (JVM, .NET CLR, Node.js, etc.)
3. Run your initialization code (static constructors, DI setup)
4. Execute the handler

Steps 1-3 are the "cold start." For Node.js and Python it is typically 100-300ms. For Java or .NET on the managed runtime it is typically 1-3 seconds. For .NET with Native AOT it is typically 200-400ms.

Warm invocations skip steps 1-3 and reuse the existing environment.
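Because warm invocations reuse the environment, work done at module scope is paid for once per cold start rather than once per request. The sketch below illustrates this; `_load_config` and its return value are hypothetical stand-ins for real setup such as creating SDK clients.

```python
import time

INIT_COUNT = 0


def _load_config():
    """Stand-in for expensive setup (SDK clients, DI container, model load)."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.05)  # simulate slow initialization
    return {"table": "orders"}  # hypothetical configuration


# Module scope runs once per execution environment, during the cold start.
CONFIG = _load_config()


def handler(event, context):
    # Warm invocations reach this point directly and reuse CONFIG.
    return {"table": CONFIG["table"], "init_count": INIT_COUNT}
```

Calling `handler` repeatedly in the same process leaves `init_count` at 1, which mirrors what happens across warm invocations in one environment.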

Pricing Model

Lambda pricing has two dimensions:

Requests: $0.20 per million invocations
Duration: $0.0000166667 per GB-second (charged per 1ms)

Example: A function with 256MB memory running for 200ms, invoked 1 million times/month:

Requests: 1M × $0.20/M = $0.20
Duration: 1M × 0.2s × 0.25GB × $0.0000166667 = $0.83
Total: ~$1.03/month
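The arithmetic above generalizes to a small calculator. This sketch uses the two prices quoted above and ignores any free tier; the function and constant names are illustrative.

```python
REQUEST_PRICE_PER_MILLION = 0.20             # USD, from the pricing above
DURATION_PRICE_PER_GB_SECOND = 0.0000166667  # USD, from the pricing above


def monthly_cost(invocations, duration_ms, memory_mb):
    """Return (request_cost, duration_cost, total) in USD, ignoring any free tier."""
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * DURATION_PRICE_PER_GB_SECOND
    return request_cost, duration_cost, request_cost + duration_cost


if __name__ == "__main__":
    requests, duration, total = monthly_cost(1_000_000, 200, 256)
    print(f"requests=${requests:.2f} duration=${duration:.2f} total=${total:.2f}")
```

For the example workload it reproduces the figures above: $0.20 for requests, $0.83 for duration, and about $1.03 in total.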

That same workload on a t3.small EC2 instance (always on): ~$15/month. Lambda wins when utilization is low or traffic is spiky. EC2 or Fargate wins under sustained high throughput.
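One way to see where the crossover sits is to compute the invocation count at which Lambda's bill matches a fixed monthly cost. This is a rough sketch using the prices quoted above; it ignores the free tier and does not check whether a single instance could actually serve that load.

```python
def break_even_invocations(fixed_monthly_usd, duration_ms, memory_mb,
                           request_price_per_million=0.20,
                           price_per_gb_second=0.0000166667):
    """Monthly invocation count at which Lambda's bill equals a fixed monthly cost."""
    per_invocation = (
        request_price_per_million / 1_000_000
        + (duration_ms / 1000) * (memory_mb / 1024) * price_per_gb_second
    )
    return fixed_monthly_usd / per_invocation


if __name__ == "__main__":
    print(f"{break_even_invocations(15, 200, 256):,.0f} invocations/month")
```

For the 256MB, 200ms function against a ~$15/month instance, the break-even lands at roughly 14.5 million invocations per month, or a little under 6 requests per second on average.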

Graviton (arm64) pricing

arm64 Lambda functions are 20% cheaper per GB-second and often faster. Unless you have a dependency that requires x86, default to arm64.

Concurrency Patterns

Unreserved (default)

All functions share your account's concurrency pool. One runaway function can starve others.

Reserved concurrency

Set a cap on one function (e.g., 100). That function can never exceed 100 concurrent instances, but it's also guaranteed 100 from the pool, which leaves that much less for every other function. Use this to:

Protect downstream systems (databases, APIs) from being overwhelmed
Ensure critical functions always have capacity
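Setting reserved concurrency programmatically might look like the sketch below. It assumes `lambda_client` is a boto3 Lambda client (for example `boto3.client("lambda")`) and calls its `put_function_concurrency` operation; the client is passed in so the example can run without AWS credentials, and `RecordingClient` is a hypothetical test double, not part of any SDK.

```python
def cap_function_concurrency(lambda_client, function_name, limit):
    """Reserve (and cap) `limit` concurrent executions for one function.

    `lambda_client` is expected to behave like a boto3 Lambda client.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


class RecordingClient:
    """Test double that records the call instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_function_concurrency(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


if __name__ == "__main__":
    client = RecordingClient()
    print(cap_function_concurrency(client, "orders-writer", 100))
```

The function name `orders-writer` is made up for the example.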

Provisioned concurrency

Pre-warm N instances. Eliminates cold starts for latency-sensitive workloads. Costs ~$0.015 per GB-hour whether instances are used or not. Use for:

Synchronous APIs where P99 latency matters
Functions with expensive initialization (large ML models, heavy DI graphs)
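To judge whether the latency benefit is worth it, it helps to estimate what the warm instances cost while idle. The sketch below uses the approximate per-GB-hour figure quoted above and assumes a 730-hour month; it covers only the provisioning charge, not requests or execution time.

```python
PROVISIONED_PRICE_PER_GB_HOUR = 0.015  # USD, the approximate figure quoted above


def provisioned_idle_cost(instances, memory_mb, hours=730):
    """Approximate USD cost of keeping `instances` warm for `hours` hours."""
    return instances * (memory_mb / 1024) * hours * PROVISIONED_PRICE_PER_GB_HOUR


if __name__ == "__main__":
    print(f"${provisioned_idle_cost(10, 1024):.2f} per month")
```

Ten 1GB instances kept warm all month come to about $109.50 before a single request is served.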

Architecture Patterns

API backend

API Gateway → Lambda → DynamoDB/RDS. The classic serverless pattern. Works great for CRUD APIs with variable traffic.
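A minimal sketch of the Lambda side of this pattern, assuming an API Gateway proxy integration and a hypothetical `GET /items/{id}` route. An in-memory dictionary stands in for DynamoDB or RDS so the example runs locally.

```python
import json

# In-memory stand-in for DynamoDB/RDS so the sketch runs locally.
_ITEMS = {"42": {"id": "42", "name": "example"}}


def _response(status, payload):
    # Proxy integrations expect statusCode, headers, and a string body.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Handle GET /items/{id} from an API Gateway proxy integration."""
    # pathParameters can be missing or None when the route has no parameters.
    item_id = (event.get("pathParameters") or {}).get("id")
    item = _ITEMS.get(item_id)
    if item is None:
        return _response(404, {"error": "not found"})
    return _response(200, item)
```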

Event processing fan-out

EventBridge/SNS → Multiple Lambda functions. One event triggers multiple independent processors. Each scales independently.

Queue consumer

SQS → Lambda. Lambda polls SQS and processes batches. Built-in retry, DLQ support, and automatic scaling based on queue depth.
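A batch consumer might be sketched as follows. It assumes the event source mapping has partial batch responses (`ReportBatchItemFailures`) enabled, so that only the failed messages are retried; `process` and the `order_id` field are hypothetical business logic.

```python
import json


def process(body):
    """Hypothetical business logic; raises on messages it cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message so the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, raising from the handler causes the whole batch to be retried, so processing should be idempotent either way.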

Streaming

Kinesis/DynamoDB Streams → Lambda. Ordered processing within a shard. Good for real-time analytics, change-data-capture, and replication.
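For the Kinesis case, a handler sketch could look like this. Kinesis delivers each record's payload base64-encoded; the assumption that payloads are JSON, and the choice to count records per partition key, are illustrative only.

```python
import base64
import json
from collections import defaultdict


def handler(event, context):
    """Decode a batch of Kinesis records and count payloads per partition key."""
    by_key = defaultdict(list)
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Kinesis delivers each payload base64-encoded; JSON content is assumed.
        payload = json.loads(base64.b64decode(kinesis["data"]))
        by_key[kinesis["partitionKey"]].append(payload)
    return {key: len(items) for key, items in by_key.items()}
```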

Step Functions orchestration

Step Functions → Lambda. Each step invokes a function. The state machine handles retry, error handling, branching, and long-running workflows.
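As a rough illustration of retry and error handling living in the state machine rather than in function code, the sketch below builds a small Amazon States Language definition as a Python dictionary. The state names and the function ARN are hypothetical placeholders.

```python
import json

# Hypothetical ARN; substitute a real function ARN before deploying.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"

state_machine = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": VALIDATE_ARN,
            # Retry failed invocations with exponential backoff.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # After retries are exhausted, branch to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "Failed"}],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
        "Failed": {"Type": "Fail", "Error": "ValidationFailed"},
    },
}

definition_json = json.dumps(state_machine, indent=2)

if __name__ == "__main__":
    print(definition_json)
```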

Limitations to Know

Limit	Value
Timeout	15 minutes max
Memory	128MB – 10,240MB
Deployment package	50MB zipped, 250MB unzipped
/tmp storage	512MB (configurable to 10GB)
Concurrent executions	1,000 default (soft limit)
Burst concurrency	500-3000 (region-dependent)
Environment variables	4KB total
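A few of these limits can be checked before deployment with a small validator. This is a sketch: the upper bounds come from the table above, the 1-second minimum timeout is an added assumption, and the configuration keys are hypothetical names rather than any tool's schema.

```python
LIMITS = {
    "timeout_seconds": (1, 900),             # 15 minutes max; 1s minimum assumed
    "memory_mb": (128, 10_240),
    "ephemeral_storage_mb": (512, 10_240),   # /tmp
}


def validate_config(config):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    for key, (low, high) in LIMITS.items():
        value = config.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}-{high}")
    return problems


if __name__ == "__main__":
    print(validate_config({"timeout_seconds": 1200, "memory_mb": 256}))
```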

Cost Optimization

Right-size memory. Lambda allocates CPU proportionally to memory. A function at 128MB gets a fraction of a vCPU. Sometimes doubling memory halves execution time and costs the same.
Use Graviton. 20% cheaper, same or better performance.
Minimize deployment package size. Smaller packages download and initialize faster, which shortens cold starts.
Avoid provisioned concurrency unless you need it. It's expensive for idle capacity.
Use Compute Savings Plans. They apply to Lambda duration charges (up to 17% discount).
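The right-sizing point can be checked with a quick calculation. The sketch below uses the duration price quoted earlier and hypothetical measurements for a CPU-bound function whose run time halves when memory doubles; real functions need to be measured.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # USD, the duration price quoted earlier


def duration_cost(memory_mb, duration_ms, invocations=1_000_000):
    """Duration charge in USD for a given memory size and average run time."""
    return invocations * (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND


# Hypothetical measurements for a CPU-bound function.
small = duration_cost(memory_mb=128, duration_ms=400)
large = duration_cost(memory_mb=256, duration_ms=200)

if __name__ == "__main__":
    print(f"128MB @ 400ms: ${small:.2f}  256MB @ 200ms: ${large:.2f}")
```

Both configurations consume the same GB-seconds, so the duration charge is identical while the larger one responds in half the time. If doubling memory cuts run time by less than half, the larger setting costs more.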