The Cost Problem
Most teams overpay for AWS by 30-50%. Not because AWS is expensive, but because the defaults are expensive and optimization requires knowledge that's separate from building features.
The three categories of savings, in order of effort:
- Commitment discounts (Savings Plans, Reserved Instances). Commit to steady usage up front, save roughly 20-72% depending on term and flexibility
- Architecture changes (Graviton, right-sizing, spot). Change how you run things, save 20-60%
- Waste elimination (idle resources, over-provisioned storage, avoidable NAT traffic). Find and remove what you're paying for but not using
Savings Plans
Savings Plans replaced Reserved Instances for most use cases. You commit to a dollar amount of compute usage per hour for 1 or 3 years.
Compute Savings Plans
- Apply to EC2, Fargate, and Lambda across all regions and instance types
- 1-year no upfront: ~20% discount
- 3-year all upfront: ~50-60% discount
- Most flexible. Change instance types, regions, even services and the savings still apply
EC2 Instance Savings Plans
- Locked to a specific instance family in a specific region
- Deeper discount (~35% for 1-year, ~72% for 3-year all upfront)
- Less flexible. Only use if you're certain about your instance family
How to size them
Look at your minimum sustained usage over the past 3 months and treat that as your floor. Commit at or slightly below that floor, and cover bursts with on-demand.
Example: If your compute baseline is $1,000/hour and peaks at $1,500/hour, commit to $800/hour (leaving headroom in case the baseline drops) and pay on-demand for the rest.
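As a rough sketch of that sizing rule, the function below takes a list of hourly compute spend figures and returns the observed floor scaled by a headroom factor. The function name, the 0.8 default (taken from the $800-on-$1,000 example), and the sample numbers are illustrative assumptions, not an AWS API.

```python
def recommend_commitment(hourly_spend, headroom=0.8):
    """Suggest an hourly commitment: the lowest observed hourly spend
    scaled down by a headroom factor (0.8 keeps 20% slack)."""
    if not hourly_spend:
        raise ValueError("need at least one hourly data point")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    floor = min(hourly_spend)  # minimum sustained usage in the window
    return round(floor * headroom, 2)


# Hypothetical sample: baseline of $1,000/hour with peaks at $1,500/hour.
sample = [1000.0, 1200.0, 1500.0, 1100.0, 1000.0]
print(recommend_commitment(sample))
```

A single outage hour can drag the minimum down, so in practice you may prefer to drop obvious outliers before taking the floor.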
Graviton (ARM64)
Graviton processors are roughly 20% cheaper than their x86 equivalents, with equal or better performance for most workloads (exact savings vary by service and instance family). This is the highest-impact, lowest-effort optimization for most workloads:
- EC2: c7g/m7g/r7g vs c6i/m6i/r6i, 20% cheaper
- Lambda: arm64 functions, 20% cheaper per GB-second
- RDS/Aurora: graviton instances, 20% cheaper
- ElastiCache: graviton nodes, 20% cheaper
- ECS/Fargate: ARM tasks, 20% cheaper
What needs to change: Recompile your code for arm64 (trivial for most languages). Check that all dependencies have ARM builds. Most Docker base images now support multi-arch.
Right-Sizing
Most instances are over-provisioned. A t3.xlarge running at 5% CPU should be a t3.medium.
How to right-size
- Enable AWS Compute Optimizer (free)
- Review recommendations (it analyzes 14 days of CloudWatch metrics)
- Look for instances with <20% average CPU or <40% memory utilization
- Downsize one tier at a time, monitor for a week, repeat
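The utilization thresholds above can be sketched as a simple filter. This is an illustration only: `InstanceStats` and `downsizing_candidates` are hypothetical names, and the averages are assumed to have been pulled from your metrics already (memory may be missing when no memory metrics are collected).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceStats:
    instance_id: str
    avg_cpu_pct: float
    avg_mem_pct: Optional[float] = None  # None when memory metrics are unavailable


def downsizing_candidates(stats: List[InstanceStats],
                          cpu_threshold: float = 20.0,
                          mem_threshold: float = 40.0) -> List[str]:
    """Return IDs of instances below the CPU or the memory threshold."""
    candidates = []
    for s in stats:
        low_cpu = s.avg_cpu_pct < cpu_threshold
        low_mem = s.avg_mem_pct is not None and s.avg_mem_pct < mem_threshold
        if low_cpu or low_mem:
            candidates.append(s.instance_id)
    return candidates


# Hypothetical fleet
fleet = [
    InstanceStats("i-web", avg_cpu_pct=5.0, avg_mem_pct=30.0),
    InstanceStats("i-db", avg_cpu_pct=65.0, avg_mem_pct=80.0),
    InstanceStats("i-batch", avg_cpu_pct=45.0),  # no memory data
]
print(downsizing_candidates(fleet))
```

Treat the output as a review list rather than an action list: the one-tier-at-a-time rule still applies.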
Common over-provisioning
- RDS instances: db.r6g.2xlarge running at 15% CPU → db.r6g.large
- ElastiCache nodes: cache.r6g.xlarge using 2GB of 26GB → cache.r6g.large
- ECS tasks: 4 vCPU / 8GB tasks at 10% CPU → 1 vCPU / 2GB
Spot Instances
Spare EC2 capacity at up to a 90% discount. The trade-off: AWS can reclaim an instance with only 2 minutes' notice.
Good for:
- Batch processing, CI/CD builds, data pipelines
- ECS/EKS worker nodes (with graceful shutdown handling)
- Dev/test environments
- Any workload that can handle interruption
Not for:
- Single-instance production databases
- Workloads that can't recover from sudden termination
Spot in practice
Use a mix of instance types to reduce interruption frequency. If you only request c5.xlarge, you compete with everyone else wanting that type. Request c5.xlarge OR c5a.xlarge OR c6i.xlarge OR m5.xlarge and AWS picks from available capacity.
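One way to express that mix is an Auto Scaling group's mixed instances policy. The sketch below only builds the policy dictionary, in the shape boto3's `create_auto_scaling_group` accepts for its `MixedInstancesPolicy` parameter. The launch template name is a placeholder, and the all-Spot distribution and `price-capacity-optimized` strategy are assumptions you may want to change.

```python
def mixed_instances_policy(launch_template_name, instance_types):
    """Build a MixedInstancesPolicy dict that lets the group draw Spot
    capacity from any of the listed instance types."""
    if not instance_types:
        raise ValueError("list at least one instance type")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per acceptable instance type
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # 0 means everything above the base capacity runs on Spot
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = mixed_instances_policy(
    "my-worker-template",  # placeholder name
    ["c5.xlarge", "c5a.xlarge", "c6i.xlarge", "m5.xlarge"],
)
print([o["InstanceType"] for o in policy["LaunchTemplate"]["Overrides"]])
```

Keeping this as a plain function makes the list of types easy to review and test before it reaches a real API call.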
Storage Optimization
S3 lifecycle policies
Most data is accessed frequently for a short time, then rarely:
- S3 Standard → Infrequent Access after 30 days (40% cheaper)
- Infrequent Access → Glacier Instant Retrieval after 90 days (68% cheaper)
- Glacier → Deep Archive after 1 year (95% cheaper)
S3 Intelligent-Tiering does this automatically for $0.0025/1000 objects/month.
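If you'd rather manage the transitions yourself, the 30/90/365-day schedule above could look roughly like this as a lifecycle configuration. The dictionary follows the shape boto3's `put_bucket_lifecycle_configuration` takes for `LifecycleConfiguration`, and the storage class identifiers are the S3 API's own. The function name, rule ID, and prefix handling are illustrative assumptions.

```python
def lifecycle_configuration(prefix=""):
    """Build a lifecycle configuration that tiers objects down as they age.
    An empty prefix applies the rule to the whole bucket."""
    return {
        "Rules": [
            {
                "ID": "tier-down-aging-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = lifecycle_configuration(prefix="logs/")
for t in config["Rules"][0]["Transitions"]:
    print(t["Days"], t["StorageClass"])
```

Scoping the rule to a prefix such as `logs/` is a cautious starting point, since colder tiers add retrieval fees and minimum storage durations.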
EBS volume types
- gp3 is cheaper than gp2 for the same performance (and you can increase IOPS/throughput independently)
- If you're still on gp2, switch to gp3. Same performance, 20% cheaper baseline
- Delete unattached EBS volumes (common waste; instances terminated but volumes retained)
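To find unattached volumes, a small filter over the output of EC2's `describe_volumes` call is usually enough. The sketch below works on a response-shaped dictionary, so it runs without AWS credentials. The function name and the sample data are made up for illustration.

```python
def unattached_volumes(describe_volumes_response):
    """Return (volume_id, size_gib) pairs for volumes in the 'available'
    state, meaning they are not attached to any instance."""
    return [
        (v["VolumeId"], v["Size"])
        for v in describe_volumes_response.get("Volumes", [])
        if v.get("State") == "available"
    ]


# Hypothetical response in the shape describe_volumes returns
response = {
    "Volumes": [
        {"VolumeId": "vol-aaa", "Size": 100, "State": "in-use", "VolumeType": "gp3"},
        {"VolumeId": "vol-bbb", "Size": 500, "State": "available", "VolumeType": "gp2"},
    ]
}
for volume_id, size in unattached_volumes(response):
    print(volume_id, size)
```

Snapshot anything you might still need before deleting it. A snapshot is typically cheaper to keep than an idle volume.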
RDS storage
- Aurora I/O-Optimized removes per-I/O charges for I/O-heavy workloads
- Check for unused RDS snapshots (they accumulate)
Network Cost Traps
NAT Gateway
$0.045/hour + $0.045/GB. A NAT Gateway processing 500GB/month costs about $55/month: roughly $33 in hourly charges plus $22.50 in data processing. Solutions:
- S3/DynamoDB gateway endpoints (free)
- Interface endpoints for other AWS services ($7.20/month/AZ, with no per-GB NAT charge; the endpoint has its own data processing fee of about $0.01/GB)
- Reduce cross-AZ traffic (keep producer and consumer in the same AZ where possible)
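To see where the NAT Gateway bill comes from, it helps to split the 500GB/month example into its hourly and per-GB parts. This sketch uses the rates quoted above and assumes a 730-hour month. The function and constant names are illustrative.

```python
NAT_HOURLY_RATE = 0.045   # $/hour, as quoted above
NAT_PER_GB_RATE = 0.045   # $/GB processed, as quoted above
HOURS_PER_MONTH = 730     # assumption: average month length in hours


def nat_gateway_monthly_cost(gb_processed, hours=HOURS_PER_MONTH):
    """Return the hourly, data-processing, and total monthly cost in dollars."""
    hourly = NAT_HOURLY_RATE * hours
    data = NAT_PER_GB_RATE * gb_processed
    return {
        "hourly": round(hourly, 2),
        "data": round(data, 2),
        "total": round(hourly + data, 2),
    }


print(nat_gateway_monthly_cost(500))
```

At 500GB the fixed hourly charge is the larger part, and the per-GB part overtakes it once traffic passes the 730GB mark. Gateway endpoints for S3 and DynamoDB only reduce the per-GB part.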
Data transfer
- Same AZ: free
- Cross-AZ: $0.01/GB each direction
- Cross-region: $0.02/GB
- To internet: $0.09/GB (first 10TB)
Cross-AZ charges are invisible until your bill arrives. ECS tasks talking to RDS in a different AZ, Lambda fetching from ElastiCache across AZs. It adds up.
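A quick estimate makes the cross-AZ charge visible before the bill does. The sketch below applies the $0.01/GB each-direction rate listed above, so every GB that crosses an AZ boundary is billed once on the way out and once on the way in. The function name and traffic figure are hypothetical.

```python
CROSS_AZ_RATE_PER_GB = 0.01  # $/GB, charged in each direction (rate quoted above)


def cross_az_monthly_cost(gb_transferred_per_month):
    """Estimate the monthly cross-AZ charge: each GB is billed on the
    sending side and again on the receiving side."""
    if gb_transferred_per_month < 0:
        raise ValueError("traffic cannot be negative")
    return round(gb_transferred_per_month * CROSS_AZ_RATE_PER_GB * 2, 2)


# Hypothetical: a service pushing 10 TB/month to a database in another AZ
print(cross_az_monthly_cost(10_000))
```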
Further Reading
- AWS Cost Optimization Pillar
- Savings Plans documentation
- Compute Optimizer
- Cost Explorer
- AWS Cost Optimization for SaaS (guide): the full framework for reducing SaaS AWS spend