What Is Lambda?
Lambda is AWS's serverless compute service. You upload code, define a trigger (API request, queue message, file upload, schedule), and AWS runs it. No servers to provision, patch, or scale. You pay per invocation and per millisecond of execution time.
Lambda is the default compute choice for event-driven architectures on AWS. But it's not always the right choice. Understanding its scaling model, pricing, and limitations helps you make that call.
When to Use Lambda
Good fit:
- API endpoints (behind API Gateway or ALB)
- Event processing (S3 uploads, DynamoDB streams, SQS messages, EventBridge events)
- Scheduled tasks (cron-like via EventBridge Scheduler)
- Data transformation (ETL, file processing)
- Microservice backends with variable traffic
Consider containers (ECS/Fargate) instead when:
- Your workload runs longer than 15 minutes
- You need more than 10GB of memory
- You have steady, predictable traffic where per-invocation pricing costs more than always-on capacity
- You need WebSocket connections or long-lived TCP connections
- Your startup time exceeds what's acceptable (very large ML models, heavy frameworks)
How Lambda Scales
Lambda's scaling is automatic but has rules:
- One invocation per instance. Each concurrent request runs in its own execution environment.
- Burst scaling. New accounts get 500-3000 concurrent instances immediately (region-dependent), then 500 additional per minute.
- Account-level concurrency limit. Default is 1,000 concurrent executions across all functions in a region. Request an increase for production.
- Reserved concurrency. Guarantees a function a fixed share of the account's concurrency, and also caps the function at that number.
- Provisioned concurrency. Pre-warm instances to eliminate cold starts. You pay for them whether they're used or not.
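To reason about these limits, it helps to estimate how much concurrency a workload needs. A common rule of thumb is average requests per second multiplied by average duration in seconds. The sketch below applies it against the 1,000 default mentioned above; the function names are illustrative, not part of any AWS SDK.

```python
import math

DEFAULT_ACCOUNT_LIMIT = 1_000  # default regional concurrency limit cited above


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Estimate steady-state concurrent executions: requests/s x duration in seconds."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def fits_account_limit(requests_per_second: float, avg_duration_ms: float,
                       limit: int = DEFAULT_ACCOUNT_LIMIT) -> bool:
    """True if the estimated concurrency stays within the given limit."""
    return required_concurrency(requests_per_second, avg_duration_ms) <= limit
```

For instance, 500 requests per second at 200ms each needs about 100 concurrent instances, while 6,000 requests per second at 250ms needs about 1,500 and would exceed the default limit.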
The cold start problem
When Lambda creates a new execution environment:
1. Download your deployment package
2. Start the runtime (JVM, .NET CLR, Node.js, etc.)
3. Run your initialization code (static constructors, DI setup)
4. Execute the handler
Steps 1-3 are the "cold start." Typical figures are 100-300ms for Node.js/Python, 1-3 seconds for Java and the managed .NET runtime, and 200-400ms for .NET with Native AoT.
Warm invocations skip steps 1-3 and reuse the existing environment.
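The split between initialization and handler code can be seen in a minimal Python handler. This is a sketch: `CONFIG` is a hypothetical stand-in for expensive setup (SDK clients, DI containers, model loading), and the counter exists only to make the reuse visible.

```python
# Module-level code runs once per execution environment, during the cold start.
CONFIG = {"table": "orders"}  # hypothetical stand-in for expensive initialization
_invocations = 0


def handler(event, context):
    """Runs on every invocation and reuses the module-level state above."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,  # only the first call in this environment
        "table": CONFIG["table"],
    }
```

Anything placed at module level is paid for once per environment, so moving setup out of the handler keeps warm invocations fast.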
Pricing Model
Lambda pricing has two dimensions:
- Requests: $0.20 per million invocations
- Duration: $0.0000166667 per GB-second (charged per 1ms)
Example: A function with 256MB memory running for 200ms, invoked 1 million times/month:
- Requests: 1M × $0.20/M = $0.20
- Duration: 1M × 0.2s × 0.25GB × $0.0000166667 = $0.83
- Total: ~$1.03/month
That same workload on a t3.small EC2 instance (always on): ~$15/month. Lambda wins when utilization is low or traffic is spiky. EC2/Fargate wins under sustained high throughput.
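The arithmetic above can be wrapped in a small calculator. This is a sketch using only the two rates quoted in this section; it ignores the free tier, data transfer, and other services, and the function names are illustrative.

```python
REQUEST_PRICE_PER_MILLION = 0.20    # USD, from the pricing above
PRICE_PER_GB_SECOND = 0.0000166667  # USD, rate quoted above


def monthly_lambda_cost(invocations: int, duration_ms: float, memory_mb: int) -> float:
    """Request charge plus duration charge for one month."""
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND


def break_even_invocations(fixed_monthly_cost: float, duration_ms: float,
                           memory_mb: int) -> int:
    """Invocations per month at which Lambda costs as much as a fixed-price server."""
    per_invocation = monthly_lambda_cost(1, duration_ms, memory_mb)
    return int(fixed_monthly_cost / per_invocation)
```

With the example's 256MB and 200ms, `monthly_lambda_cost(1_000_000, 200, 256)` comes to about $1.03, and the break-even against a $15/month instance lands at roughly 14.5 million invocations per month, assuming a single instance could actually serve that load.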
Graviton (arm64) pricing
arm64 Lambda functions are 20% cheaper per GB-second and often faster. Default to arm64 unless a dependency requires x86.
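A quick way to see what the discount means for a given function is to apply it to the duration charge. The sketch below derives the arm64 rate from the 20% figure quoted above rather than from a price list, and assumes the function runs for the same time on both architectures.

```python
X86_PRICE_PER_GB_SECOND = 0.0000166667  # USD, rate from the pricing section
ARM64_DISCOUNT = 0.20                   # the 20% figure quoted above


def duration_cost(invocations: int, duration_ms: float, memory_mb: int,
                  arm64: bool = False) -> float:
    """Duration charge only; the per-request charge is the same on both architectures."""
    rate = X86_PRICE_PER_GB_SECOND * ((1 - ARM64_DISCOUNT) if arm64 else 1)
    return invocations * (duration_ms / 1000) * (memory_mb / 1024) * rate
```

For the earlier example (1 million invocations, 200ms, 256MB), the duration charge drops from about $0.83 to about $0.67.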
Concurrency Patterns
Unreserved (default)
All functions share your account's concurrency pool. One runaway function can starve others.
Reserved concurrency
Set a cap on one function (e.g., 100). That function can never exceed 100 concurrent instances, but it's also guaranteed 100 from the pool. Use this to:
- Protect downstream systems (databases, APIs) from being overwhelmed
- Ensure critical functions always have capacity
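With boto3, reserved concurrency is set through the Lambda client's `put_function_concurrency` call. The sketch below takes the client as a parameter so it has no hard dependency on boto3; the function name `orders-writer` in the usage note is hypothetical.

```python
def set_reserved_concurrency(lambda_client, function_name: str, limit: int) -> dict:
    """Reserve (and thereby cap) concurrency for a single function.

    `lambda_client` is expected to be a boto3 Lambda client,
    e.g. boto3.client("lambda").
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
```

Calling `set_reserved_concurrency(boto3.client("lambda"), "orders-writer", 100)` would apply the cap of 100 described above.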
Provisioned concurrency
Pre-warm N instances. Eliminates cold starts for latency-sensitive workloads. Costs ~$0.015 per GB-hour whether instances are used or not. Use for:
- Synchronous APIs where P99 latency matters
- Functions with expensive initialization (large ML models, heavy DI graphs)
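Because the charge applies whether or not the instances serve traffic, it is worth estimating the idle cost before enabling it. This sketch uses the approximate rate quoted above and assumes a 730-hour month; requests and execution time are billed on top.

```python
PROVISIONED_PRICE_PER_GB_HOUR = 0.015  # approximate USD rate quoted above
HOURS_PER_MONTH = 730                  # assumption: average month length


def provisioned_monthly_cost(instances: int, memory_mb: int) -> float:
    """Monthly charge for keeping `instances` environments warm around the clock."""
    gb = memory_mb / 1024
    return instances * gb * HOURS_PER_MONTH * PROVISIONED_PRICE_PER_GB_HOUR
```

Five warm instances at 1,024MB work out to about $54.75 per month before any request is served.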
Architecture Patterns
API backend
API Gateway → Lambda → DynamoDB/RDS. The classic serverless pattern. Works great for CRUD APIs with variable traffic.
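On the Lambda side, this pattern reduces to a handler that reads the request from the event and returns a status code, headers, and a string body. A minimal sketch, assuming an API Gateway proxy integration; the `name` query parameter is made up for illustration and the database call is omitted.

```python
import json


def handler(event, context):
    """Minimal handler for an API Gateway proxy integration."""
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```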
Event processing fan-out
EventBridge/SNS → Multiple Lambda functions. One event triggers multiple independent processors. Each scales independently.
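As a sketch of what fan-out looks like in code, the two handlers below would be deployed as separate functions, each targeted by its own EventBridge rule for the same event. The event shape (`order_id`, `items`) and both handler names are hypothetical; only the `detail` envelope field comes from EventBridge itself.

```python
def send_receipt(event, context):
    """First subscriber: builds a receipt line for the order."""
    detail = event["detail"]
    return f"receipt for order {detail['order_id']}"


def update_inventory(event, context):
    """Second subscriber: computes stock adjustments from the same event."""
    detail = event["detail"]
    return {sku: -qty for sku, qty in detail["items"].items()}
```

Neither function knows about the other, so a failure or slowdown in one does not block the other.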
Queue consumer
SQS → Lambda. Lambda polls SQS and processes batches. Built-in retry, DLQ support, and automatic scaling based on queue depth.
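One detail worth handling in a batch consumer is partial failure: without it, one bad message causes the whole batch to be retried. The sketch below returns the `batchItemFailures` response shape, which only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping. The `process` function and the `order_id` field are hypothetical.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic; raises on input it cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process each SQS record and report only the ones that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message is returned to the queue for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```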
Streaming
Kinesis/DynamoDB Streams → Lambda. Ordered processing within a shard. Good for real-time analytics, change-data-capture, and replication.
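For Kinesis, each record's payload arrives base64-encoded under `kinesis.data`. A minimal sketch of a consumer, assuming producers write JSON payloads (that part is an assumption about the stream, not something Kinesis enforces):

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the order it appears in the batch."""
    results = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append({
            "partition_key": record["kinesis"]["partitionKey"],
            "sequence_number": record["kinesis"]["sequenceNumber"],
            "data": json.loads(payload),  # assumes JSON payloads
        })
    return results
```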
Step Functions orchestration
Step Functions → Lambda. Each step invokes a function. The state machine handles retry, error handling, branching, and long-running workflows.
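To illustrate how retry and error handling move out of the function and into the state machine, the sketch below builds a one-task definition in Amazon States Language as a Python dict. The ARN, state names, and retry values are hypothetical placeholders.

```python
import json

# Hypothetical ARN, for illustration only.
PROCESS_ORDER_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-order"

state_machine = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": PROCESS_ORDER_ARN,
            # Retry failed tasks with exponential backoff: 2s, 4s, 8s.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # Anything still failing after the retries goes to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "End": True,
        },
        "OrderFailed": {
            "Type": "Fail",
            "Error": "OrderFailed",
            "Cause": "Retries exhausted",
        },
    },
}

definition = json.dumps(state_machine, indent=2)
```

The resulting `definition` string is what would be supplied when creating the state machine; the Lambda function itself contains no retry logic.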
Limitations to Know
| Limit | Value |
|---|---|
| Timeout | 15 minutes max |
| Memory | 128MB to 10,240MB |
| Deployment package | 50MB zipped, 250MB unzipped |
| /tmp storage | 512MB (configurable to 10GB) |
| Concurrent executions | 1,000 default (soft limit) |
| Burst concurrency | 500-3000 (region-dependent) |
| Environment variables | 4KB total |
Cost Optimization
- Right-size memory. Lambda allocates CPU proportionally to memory. A function at 128MB gets a fraction of a vCPU. Sometimes doubling memory halves execution time and costs the same.
- Use Graviton. 20% cheaper, same or better performance.
- Minimize deployment package size. Smaller packages download and initialize faster, which shortens cold starts.
- Avoid provisioned concurrency unless you need it. It's expensive for idle capacity.
- Use Compute Savings Plans. They apply to Lambda duration charges (up to 17% discount).
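The memory trade-off in the first point follows directly from the GB-second formula. The sketch below compares two hypothetical measurements of the same function; the timings are invented to show the break-even case, and real numbers should come from profiling.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # USD, rate from the pricing section


def duration_cost(memory_mb: int, duration_ms: float,
                  invocations: int = 1_000_000) -> float:
    """Duration charge for a given memory setting and measured run time."""
    return invocations * (duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND


# Hypothetical measurements: doubling memory halves the run time.
small = duration_cost(memory_mb=128, duration_ms=400)
large = duration_cost(memory_mb=256, duration_ms=200)
```

Here `small` and `large` come out the same (about $0.83 per million invocations), so the larger setting delivers half the latency at no extra cost. If doubling memory cuts the run time by less than half, the larger setting costs more.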
Further Reading
- Lambda Developer Guide
- Lambda scaling behavior
- Lambda pricing
- Lambda quotas
- Using Lambda with .NET: handlers, DI, Native AoT, and deployment patterns for .NET