Most AWS environments have significant waste—oversized instances, unused resources, suboptimal pricing models, and traffic patterns that cost more than they should. The good news: most of it is fixable without major architectural changes.
Here's where I typically find savings, roughly ordered from quick wins to longer-term optimizations.
Quick Wins: The Low-Hanging Fruit
Unused Elastic IPs
AWS charges for every public IPv4 address, so an EIP that isn't attached to a running instance is pure waste. At ~$3.65/month each, a handful of forgotten EIPs adds up.
Check: EC2 → Elastic IPs → look for "not associated"
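The console check can also be scripted. Here's a minimal sketch, assuming boto3 is installed and credentials are configured. The helper names are hypothetical, and the monthly rate is just the ~$3.65 figure quoted above.

```python
"""List Elastic IPs that are allocated but not associated with anything."""

EIP_MONTHLY_USD = 3.65  # assumption: the approximate per-address rate quoted above


def unassociated_eips(addresses):
    """Return entries from a describe_addresses response that lack an AssociationId."""
    return [a for a in addresses if "AssociationId" not in a]


def fetch_addresses(region_name):
    """Fetch all Elastic IPs in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers work without it

    ec2 = boto3.client("ec2", region_name=region_name)
    return ec2.describe_addresses()["Addresses"]


def report(addresses):
    """Print idle EIPs and return their estimated monthly cost in USD."""
    idle = unassociated_eips(addresses)
    for a in idle:
        print(a.get("PublicIp"), a.get("AllocationId"))
    return round(len(idle) * EIP_MONTHLY_USD, 2)
```

Calling `report(fetch_addresses("us-east-1"))` would cover one region; EIPs are regional, so repeat for every region you use.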
Unattached EBS Volumes
Volumes left behind after instance termination. You're paying for storage nobody's using.
Check: EC2 → Volumes → filter by "available" state
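To see how much storage is sitting idle rather than eyeballing the list, something like the following could work. It's a sketch that assumes boto3 and configured credentials; the function names are made up for illustration.

```python
from collections import defaultdict


def available_gib_by_type(volumes):
    """Sum the size (GiB) of volumes in the 'available' state, grouped by volume type."""
    totals = defaultdict(int)
    for v in volumes:
        if v.get("State") == "available":
            totals[v.get("VolumeType", "unknown")] += v["Size"]
    return dict(totals)


def fetch_available_volumes(region_name):
    """Fetch unattached volumes in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
    return [v for page in pages for v in page["Volumes"]]
```

Multiply each total by your region's per-GiB rate for that volume type to get a monthly figure. Snapshot anything you might need before deleting.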
Old EBS Snapshots
Snapshots accumulate over time. Many are from instances that no longer exist or backups that exceed retention needs.
Check: EC2 → Snapshots → sort by age, review anything older than your retention policy
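Sorting by age is easy to automate. A rough sketch, assuming boto3 and a retention period you pick yourself (the 90 days in the usage note is only a placeholder, not a recommendation):

```python
from datetime import datetime, timedelta, timezone


def snapshots_older_than(snapshots, retention_days, now=None):
    """Return snapshots whose StartTime is past the retention window, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    old = [s for s in snapshots if s["StartTime"] < cutoff]
    return sorted(old, key=lambda s: s["StartTime"])


def fetch_own_snapshots(region_name):
    """Fetch snapshots owned by this account (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_snapshots")
    pages = paginator.paginate(OwnerIds=["self"])
    return [s for page in pages for s in page["Snapshots"]]
```

For example, `snapshots_older_than(fetch_own_snapshots("us-east-1"), 90)` lists review candidates. Snapshots that back a registered AMI can't be deleted until the AMI is deregistered.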
Idle Load Balancers
ALBs and NLBs have hourly charges whether they're handling traffic or not. Dev/test environments often have load balancers that should be torn down.
Check: CloudWatch metrics showing RequestCount = 0 (ALB) or ActiveFlowCount = 0 (NLB) over extended periods
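For ALBs, the metric check might look like this. It's a sketch assuming boto3; the 14-day window and the function names are my own choices. ALBs only report RequestCount when requests are flowing, so an empty result is treated as idle here.

```python
from datetime import datetime, timedelta, timezone


def total_requests(datapoints):
    """Add up the 'Sum' statistic across CloudWatch datapoints."""
    return sum(dp.get("Sum", 0) for dp in datapoints)


def is_idle(datapoints):
    """True when no requests were recorded (including when nothing was reported)."""
    return total_requests(datapoints) == 0


def fetch_request_datapoints(lb_dimension, days=14, region_name=None):
    """Fetch daily RequestCount sums for one ALB (needs boto3 and AWS credentials).

    lb_dimension is the LoadBalancer dimension value, e.g. 'app/<name>/<id>'.
    """
    import boto3  # imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancer", "Value": lb_dimension}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```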
Right-Sizing Compute
Most instances run at 10-30% CPU utilization. That's money left on the table.
EC2 Right-Sizing
- Use AWS Compute Optimizer recommendations
- Review CloudWatch CPU/memory metrics (EC2 memory metrics require the CloudWatch agent)
- Consider Graviton (ARM) instances, often 20% cheaper
- Match instance family to workload (compute vs memory optimized)
ECS/Fargate Right-Sizing
- Review task CPU/memory reservations vs actual usage
- Container Insights shows actual utilization
- Over-provisioned tasks are common; start smaller
- Fargate Spot for fault-tolerant workloads
The Catch
Right-sizing requires understanding your workload patterns. A server that's 10% utilized most of the time but spikes to 80% during batch jobs needs headroom. Look at P95/P99 metrics, not just averages.
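To make the averages-versus-percentiles point concrete, here's a small sketch using a nearest-rank percentile over hourly CPU samples. The sample data is invented to mirror the batch-job example: 22 quiet hours at 10% and 2 busy hours at 80%.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(cpu_samples):
    """Average, P95, P99, and max for a list of CPU utilization samples (percent)."""
    return {
        "avg": sum(cpu_samples) / len(cpu_samples),
        "p95": percentile(cpu_samples, 95),
        "p99": percentile(cpu_samples, 99),
        "max": max(cpu_samples),
    }


# Hypothetical day: idle most of the time, with a two-hour batch window
samples = [10] * 22 + [80] * 2
stats = summarize(samples)
```

Here the average comes out under 16% while P95 is 80%, so sizing to the average would starve the batch job.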
Commitment Discounts
Once you've right-sized, commit to what you're actually using. AWS rewards commitment with significant discounts.
Compute Savings Plans
Commit to a $/hour spend on compute (EC2, Fargate, Lambda). Flexible across instance types, regions, and OS. Up to 66% savings.
Best for: Most workloads. Start with your baseline steady-state usage.
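One way to pick a starting commitment is to take a low percentile of your hourly on-demand compute spend as the steady-state floor. This is a sketch of that arithmetic, not an AWS-prescribed method; the 10th percentile and the 30% discount are placeholder assumptions you'd replace with your own data and the rates AWS quotes you.

```python
import math


def baseline_hourly_spend(hourly_spend, pct=10):
    """Nearest-rank low percentile of hourly on-demand spend, used as the steady-state floor."""
    if not hourly_spend:
        raise ValueError("hourly_spend must not be empty")
    ordered = sorted(hourly_spend)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hourly_commitment(on_demand_baseline, discount):
    """Commitments are expressed at discounted rates, so scale the on-demand baseline down."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_baseline * (1 - discount)


# Hypothetical hourly on-demand spend in USD, with a few bursty hours
spend = [4.0, 4.2, 4.1, 9.5, 12.0, 4.0, 4.3, 8.0]
floor = baseline_hourly_spend(spend)          # steady-state on-demand $/hour
commit = hourly_commitment(floor, 0.30)       # $/hour to commit at an assumed 30% discount
```

Committing to the floor rather than the average leaves the bursty hours on on-demand pricing, which is the conservative place to start.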
EC2 Instance Savings Plans
Commit to a specific instance family in a region. Less flexible, but deeper discounts than Compute Savings Plans (up to 72%).
Best for: Stable workloads where you know the instance family won't change.
Reserved Instances
The original commitment model. Still useful for RDS, ElastiCache, OpenSearch, and Redshift where Savings Plans don't apply.
Best for: Databases and caches with predictable, steady usage.
ElastiCache Reserved Nodes
If you're running ElastiCache (Redis/Valkey) 24/7, reserved nodes can save 30-50% over on-demand.
Best for: Production caches that run continuously.
CloudFront Security Savings Bundle
Commit to CloudFront spend and get AWS WAF included at a discount. Good if you're using both anyway.
Best for: Workloads already using CloudFront + WAF together.
Network Cost Optimization
Data transfer charges are often the surprise line item on AWS bills. The key principle: keep traffic inside AWS, and ideally inside the same region.
Data Transfer Cost Hierarchy
VPC Endpoints
Traffic to S3, DynamoDB, and other AWS services can go through VPC endpoints instead of a NAT gateway or the public internet. That avoids NAT data processing charges (gateway endpoints for S3 and DynamoDB are free) and improves security.
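Creating a gateway endpoint for S3 or DynamoDB is a single API call. A minimal sketch, assuming boto3; the helper names and the IDs in the usage note are placeholders.

```python
GATEWAY_SERVICES = {"s3", "dynamodb"}  # gateway-type endpoints exist only for these two


def gateway_endpoint_params(vpc_id, region, service, route_table_ids):
    """Build the arguments for ec2.create_vpc_endpoint for a gateway endpoint."""
    if service not in GATEWAY_SERVICES:
        raise ValueError(f"gateway endpoints support only {sorted(GATEWAY_SERVICES)}")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": list(route_table_ids),
    }


def create_gateway_endpoint(vpc_id, region, service, route_table_ids):
    """Create the endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region)
    params = gateway_endpoint_params(vpc_id, region, service, route_table_ids)
    return ec2.create_vpc_endpoint(**params)
```

For example, `create_gateway_endpoint("vpc-0123", "us-east-1", "s3", ["rtb-0abc"])` would route S3 traffic from subnets using that route table through the endpoint. Other services use interface endpoints, which carry their own hourly and per-GB charges.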
Same-Region Architecture
Keep services that talk to each other in the same region. Cross-region replication has its place, but don't do it by accident.
CloudFront for Egress
CloudFront data transfer is cheaper than direct EC2/S3 egress. For high-traffic APIs, putting CloudFront in front can reduce costs even without caching.
S3 Storage Tiering
S3 Standard is the default, but most data doesn't need instant access forever.
| Tier | Use Case | Approx. Savings vs. Standard |
|---|---|---|
| Intelligent-Tiering | Unknown access patterns | Automatic, ~40% |
| Infrequent Access | Accessed < 1x/month | ~45% |
| Glacier Instant | Rarely accessed, need instant retrieval | ~68% |
| Glacier Flexible | Archives, minutes-hours retrieval OK | ~90% |
| Glacier Deep Archive | Compliance archives, 12-hour retrieval | ~95% |
Lifecycle Policies
Set up lifecycle policies to automatically transition objects to cheaper tiers based on age. Most logs, for example, can move to IA after 30 days and Glacier after 90.
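The log example could be expressed as a lifecycle configuration like this. It's a sketch assuming boto3; the rule ID, the `logs/` prefix in the usage note, and the function names are illustrative.

```python
def log_lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration that moves objects to IA, then to Glacier, by age."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-by-age",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_to_bucket(bucket, configuration):
    """Apply the configuration (needs boto3 and AWS credentials).

    This call replaces any lifecycle configuration already on the bucket.
    """
    import boto3  # imported lazily so the builder above works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

Usage would be along the lines of `apply_to_bucket("my-log-bucket", log_lifecycle_configuration("logs/"))`. Since the call overwrites existing rules, merge with whatever is already configured before applying.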