AWS Cost Optimization Patterns

The Cost Problem

Many teams overpay for AWS, often by 30-50%. Not because AWS is expensive, but because the defaults are expensive and optimization requires knowledge that's separate from building features.

The three categories of savings, in order of effort:

Commitment discounts (Savings Plans, Reserved Instances). Buy in bulk, save 30-72%
Architecture changes (Graviton, right-sizing, spot). Change how you run things, save 20-60%
Waste elimination (idle resources, over-provisioned storage, avoidable NAT Gateway traffic). Find and remove what you're paying for but not using

Savings Plans

Savings Plans have largely replaced Reserved Instances for compute. You commit to a dollar amount of compute usage per hour for 1 or 3 years.

Compute Savings Plans

Apply to EC2, Fargate, and Lambda across all regions and instance types
1-year no upfront: ~20% discount
3-year all upfront: ~50-60% discount
Most flexible. Change instance types, regions, even services, and the savings still apply

EC2 Instance Savings Plans

Locked to a specific instance family in a specific region
Deeper discount (~35% for 1-year, up to 72% for 3-year all upfront)
Less flexible. Only use if you're certain about your instance family and region for the whole term

How to size them

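Look at your minimum sustained usage over the past 3 months. Commit to that floor, or a little below it, and cover bursts with on-demand. The commitment is measured in Savings Plans (discounted) dollars per hour, not on-demand dollars, so convert your usage at Savings Plans rates before sizing.

Example: If your compute baseline, priced at Savings Plans rates, is $1,000/hour and peaks at $1,500/hour, commit to $800/hour (leaving headroom) and pay on-demand for the rest.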
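A minimal sketch of this sizing rule, assuming a single flat discount (real Savings Plans rates differ by instance family, region, and service). `suggest_commitment` and `hourly_cost` are hypothetical helpers, not an AWS API; the inputs are hourly spend at on-demand prices. The subtlety they encode is that the commitment is billed in Savings Plans dollars, so $1 of commitment covers more than $1 of on-demand usage.

```python
def suggest_commitment(hourly_on_demand, discount, headroom=0.2, percentile=0.05):
    """Suggest an hourly commitment in Savings Plans dollars.

    hourly_on_demand: hourly compute spend at on-demand prices (e.g. 3 months of hours)
    discount: assumed flat Savings Plans discount, e.g. 0.2 for ~20%
    headroom: fraction to stay below the floor
    percentile: low percentile used as the floor, so one anomalous quiet
        hour doesn't drag the commitment down
    """
    ordered = sorted(hourly_on_demand)
    floor_on_demand = ordered[int(percentile * (len(ordered) - 1))]
    # Convert the on-demand floor to Savings Plans dollars, then leave headroom.
    return floor_on_demand * (1 - discount) * (1 - headroom)


def hourly_cost(on_demand_usage, commitment, discount):
    """Total cost of one hour of usage under a given hourly commitment."""
    # The commitment is billed every hour, used or not, and covers this much
    # on-demand-equivalent usage.
    covered = commitment / (1 - discount)
    # Usage above the covered amount is billed at on-demand prices.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow
```

For example, with a 25% discount a $750/hour commitment covers $1,000/hour of on-demand usage; across three hours of $1,000, $1,200, and $1,500 it bills $2,950 instead of $3,700. Using a low percentile instead of the strict minimum is a judgment call that keeps a single outage hour from shrinking the commitment.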

Graviton (ARM64)

Graviton instances are priced roughly 20% below comparable x86 instances on EC2, Lambda, and Fargate, with equal or better performance for many workloads. This is often the highest-impact, lowest-effort optimization:

EC2: c7g/m7g/r7g vs c6i/m6i/r6i, about 20% cheaper
Lambda: arm64 functions, 20% cheaper per GB-second
RDS/Aurora: Graviton instances, lower hourly price (typically a smaller gap than on EC2)
ElastiCache: Graviton nodes, lower hourly price (typically a smaller gap than on EC2)
ECS/Fargate: ARM tasks, 20% cheaper

What needs to change: Rebuild compiled code and container images for arm64 (interpreted and JVM languages usually need no code changes). Check that all dependencies, especially native extensions, have arm64 builds. Most official Docker base images now support multi-arch.
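For Python services, one quick dependency check is the platform tag at the end of each wheel filename. The sketch below is a hypothetical helper: it treats a wheel as Graviton-ready if it is pure Python (`any`) or carries a Linux `aarch64` tag. Packages that ship only source distributions would have to compile on arm64 and aren't covered here.

```python
def wheel_supports_linux_arm64(wheel_filename: str) -> bool:
    """True if a wheel can install on Linux arm64 (aarch64), as on Graviton."""
    # Wheel names end in {python tag}-{abi tag}-{platform tag}.whl, and the
    # platform tag may be several compressed tags joined by dots.
    stem = wheel_filename.removesuffix(".whl")  # Python 3.9+
    platform_tag = stem.split("-")[-1]
    tags = platform_tag.split(".")
    # Pure-Python wheels ("any") run anywhere; otherwise require a Linux aarch64 tag.
    # macOS "arm64" wheels deliberately don't count: Graviton runs Linux.
    return any(t == "any" or (t.endswith("_aarch64") and "linux" in t) for t in tags)
```

Run it over the wheels your build resolves; any `False` marks a dependency that needs an arm64 build before the switch.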

Right-Sizing

Many instances are over-provisioned. A t3.xlarge (4 vCPU, 16 GiB) averaging 5% CPU, with memory to spare, is probably a t3.medium (2 vCPU, 4 GiB).

How to right-size

Enable AWS Compute Optimizer (free)
Review recommendations (it analyzes 14 days of CloudWatch metrics)
Look for instances with <20% average CPU or <40% memory utilization (EC2 memory metrics require the CloudWatch agent)
Downsize one tier at a time, monitor for a week, repeat
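The review and one-tier-down steps can be sketched as a filter over utilization averages you've already pulled from Compute Optimizer or CloudWatch. This is a hypothetical helper, not an AWS API: `Utilization`, `is_candidate`, and `one_size_down` are illustrative names, and the size ladder is an assumption because not every family offers every size.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed size ladder; confirm the target size exists in the family before resizing.
SIZES = ["nano", "micro", "small", "medium", "large", "xlarge",
         "2xlarge", "4xlarge", "8xlarge", "12xlarge", "16xlarge", "24xlarge"]


@dataclass
class Utilization:
    instance_type: str            # e.g. "t3.xlarge" or "db.r6g.2xlarge"
    avg_cpu_pct: float
    avg_mem_pct: Optional[float]  # None when no memory metric is published


def is_candidate(u: Utilization, cpu_limit: float = 20.0, mem_limit: float = 40.0) -> bool:
    """Flag for review using the <20% CPU or <40% memory thresholds."""
    low_cpu = u.avg_cpu_pct < cpu_limit
    low_mem = u.avg_mem_pct is not None and u.avg_mem_pct < mem_limit
    return low_cpu or low_mem


def one_size_down(instance_type: str) -> Optional[str]:
    """Step down exactly one size in the same family; None if there's no smaller size."""
    # rpartition keeps prefixes like "db." and "cache." in the family part.
    family, _, size = instance_type.rpartition(".")
    if size not in SIZES or SIZES.index(size) == 0:
        return None
    return f"{family}.{SIZES[SIZES.index(size) - 1]}"
```

Stepping down one size usually halves memory and often halves vCPUs, so an instance flagged for low CPU still needs a look at its memory (and vice versa) before you resize it.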

Common over-provisioning

RDS instances: db.r6g.2xlarge running at 15% CPU → db.r6g.large
ElastiCache nodes: cache.r6g.xlarge using 2GB of 26GB → cache.r6g.large
ECS tasks: 4 vCPU / 8GB tasks at 10% CPU → 1 vCPU / 2GB

Spot Instances

Spare EC2 capacity at up to 90% off on-demand prices. The trade-off: AWS can reclaim them with a two-minute warning.

Good for:

Batch processing, CI/CD builds, data pipelines
ECS/EKS worker nodes (with graceful shutdown handling)
Dev/test environments
Any workload that can handle interruption

Not for:

Single-instance production databases
Workloads that can't recover from sudden termination
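For self-managed workers, graceful shutdown starts with noticing the two-minute warning. A minimal sketch, assuming the code runs on the instance itself with IMDSv2: it polls the instance metadata path `spot/instance-action`, which returns 404 until an interruption is scheduled and then a small JSON document with the action and time. `watch` and the `on_interruption` callback are hypothetical names. On ECS the agent can drain spot instances itself (`ECS_ENABLE_SPOT_INSTANCE_DRAINING`), and EKS clusters commonly use AWS Node Termination Handler.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Instance metadata service (IMDS); only reachable from inside an EC2 instance.
IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds: int = 60) -> str:
    """Fetch an IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def parse_instance_action(body: str) -> Optional[dict]:
    """Return the notice if it schedules an interruption, else None."""
    notice = json.loads(body)
    if notice.get("action") in ("terminate", "stop", "hibernate"):
        return notice
    return None


def pending_interruption() -> Optional[dict]:
    """Check IMDS once; None means no interruption is scheduled."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the path only exists once a notice is issued
            return None
        raise


def watch(on_interruption: Callable[[dict], None], poll_seconds: float = 5.0) -> None:
    """Poll until a notice appears, then hand it to the shutdown callback."""
    while True:
        notice = pending_interruption()
        if notice is not None:
            on_interruption(notice)  # e.g. stop taking work, checkpoint, drain
            return
        time.sleep(poll_seconds)
```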

Spot in practice

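Use a mix of instance types to reduce interruption frequency. If you only request c5.xlarge, you compete with everyone else wanting that type. Request c5.xlarge, c5a.xlarge, c6i.xlarge, or m5.xlarge and let AWS pick from whichever pool has capacity; in an Auto Scaling group or EC2 Fleet, the price-capacity-optimized allocation strategy makes that choice for you.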
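In an Auto Scaling group, the instance mix is expressed as a `MixedInstancesPolicy`. The sketch below builds that structure as a plain dict in the shape boto3's `create_auto_scaling_group` accepts: all spot, using the `price-capacity-optimized` allocation strategy, which weighs both spare capacity and price when choosing pools. The function name and the `ci-runners` launch template are hypothetical.

```python
def spot_mixed_instances_policy(launch_template_name: str, instance_types: list) -> dict:
    """Build an all-spot MixedInstancesPolicy spread over several instance types."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Each override adds another capacity pool spot can draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,  # 0% on-demand: all spot
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = spot_mixed_instances_policy(
    "ci-runners", ["c5.xlarge", "c5a.xlarge", "c6i.xlarge", "m5.xlarge"]
)
```

Pass it as `boto3.client("autoscaling").create_auto_scaling_group(AutoScalingGroupName=..., MinSize=..., MaxSize=..., VPCZoneIdentifier="subnet-a,subnet-b", MixedInstancesPolicy=policy)`; listing subnets in several AZs widens the set of pools further.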

Storage Optimization

S3 lifecycle policies

Most data is accessed frequently for a short time, then rarely:

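S3 Standard → Standard-Infrequent Access after 30 days (~45% cheaper storage than Standard)
Standard-IA → Glacier Instant Retrieval after 90 days (~83% cheaper than Standard)
Glacier Instant Retrieval → Glacier Deep Archive after 1 year (~95% cheaper than Standard)

These are us-east-1 storage prices; the colder classes add retrieval fees and minimum storage durations (30, 90, and 180 days respectively), so only transition data you rarely read.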

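S3 Intelligent-Tiering does much of this automatically for a monitoring fee of $0.0025 per 1,000 objects per month (objects under 128 KB aren't monitored and stay in the frequent-access tier).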
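The three transitions map onto a single S3 lifecycle rule. This sketch returns the rule as a dict in the shape boto3's `put_bucket_lifecycle_configuration` expects; the rule ID and the optional prefix are placeholders to adapt.

```python
def tiering_lifecycle_rule(prefix: str = "") -> dict:
    """Standard -> Standard-IA (30d) -> Glacier IR (90d) -> Deep Archive (365d)."""
    return {
        "ID": "age-out-data",            # placeholder rule name
        "Filter": {"Prefix": prefix},    # "" applies the rule to the whole bucket
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER_IR"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
```

Apply it with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration={"Rules": [rule]})`. That call replaces the bucket's entire lifecycle configuration, so merge in any existing rules first.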

EBS volume types

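gp3 costs about 20% less per GB than gp2, includes 3,000 IOPS and 125 MB/s at any size, and lets you raise IOPS and throughput independently of capacity
If you're still on gp2, switch to gp3 (an in-place volume modification, no detach needed). Large gp2 volumes that rely on more than 3,000 IOPS or 125 MB/s need extra provisioned performance on gp3 to match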
Delete unattached EBS volumes (common waste; instances terminated but volumes retained)
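Both EBS checks can run over the output of EC2's `describe_volumes`. A minimal sketch: `audit_volumes` is a hypothetical helper that reads each volume's `VolumeId`, `State`, `VolumeType`, and `Size` (GiB). The default per-GB prices are assumptions based on us-east-1 list prices, and the savings estimate ignores any extra IOPS or throughput you'd provision on gp3.

```python
def audit_volumes(volumes, gp2_per_gb=0.10, gp3_per_gb=0.08):
    """Split describe_volumes() output into cleanup and migration lists."""
    # "available" means the volume isn't attached to any instance.
    unattached = [v["VolumeId"] for v in volumes if v["State"] == "available"]
    # Attached gp2 volumes are the gp3 migration candidates.
    gp2_in_use = [v for v in volumes if v["VolumeType"] == "gp2" and v["State"] == "in-use"]
    monthly_savings = sum(v["Size"] for v in gp2_in_use) * (gp2_per_gb - gp3_per_gb)
    return {
        "unattached": unattached,
        "gp2_to_gp3": [v["VolumeId"] for v in gp2_in_use],
        "est_monthly_savings": round(monthly_savings, 2),
    }
```

Collect volumes with `ec2.get_paginator("describe_volumes").paginate()` and migrate a candidate with `ec2.modify_volume(VolumeId=..., VolumeType="gp3")`. Snapshot unattached volumes before deleting them if you aren't sure they're garbage.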

RDS storage

Aurora I/O-Optimized removes per-I/O charges in exchange for higher instance and storage prices; AWS suggests it when I/O exceeds about 25% of your Aurora spend
Check for unused RDS snapshots (they accumulate)
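To decide between Aurora Standard and I/O-Optimized, price last month's usage under both models. A rough sketch: `compare_aurora_configs` is a hypothetical helper, and the 1.30x instance and 2.25x storage multipliers are assumptions approximating us-east-1 pricing, so check current rates for your engine and region.

```python
def compare_aurora_configs(instance_cost: float, storage_cost: float, io_cost: float,
                           instance_uplift: float = 1.30,
                           storage_uplift: float = 2.25) -> dict:
    """Monthly cost of the same workload under Aurora Standard vs I/O-Optimized.

    instance_uplift / storage_uplift are assumed I/O-Optimized price multipliers;
    I/O-Optimized has no per-request I/O charge, so io_cost drops out.
    """
    standard = instance_cost + storage_cost + io_cost
    io_optimized = instance_cost * instance_uplift + storage_cost * storage_uplift
    return {
        "standard": standard,
        "io_optimized": io_optimized,
        # Share of the Standard bill that is I/O (compare with the ~25% guideline).
        "io_share": io_cost / standard,
        "cheaper": "io_optimized" if io_optimized < standard else "standard",
    }
```

With $1,000 of instances, $100 of storage, and $600 of I/O, Standard costs $1,700 and I/O-Optimized about $1,525, so the switch pays; with $50 of I/O it would not.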

Network Cost Traps

NAT Gateway

$0.045/hour + $0.045/GB processed. A single NAT Gateway processing 500 GB/month costs about $55: roughly $33 for the hours and $22.50 in data processing. Solutions:

S3/DynamoDB gateway endpoints (free)
Interface endpoints for other AWS services (about $7.30/month per AZ plus $0.01/GB, instead of $0.045/GB through NAT)
Reduce cross-AZ traffic (keep producer and consumer in the same AZ where possible)
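A quick way to sanity-check the NAT math is to price it out. This sketch uses the NAT rates quoted above plus assumed interface endpoint rates ($0.01 per AZ-hour and $0.01/GB) and 730 hours per month; the helper names are hypothetical. `endpoint_break_even_gb` assumes the NAT Gateway stays for other traffic, so moving one service's traffic to an endpoint saves only the per-GB NAT charge.

```python
HOURS_PER_MONTH = 730  # approximation used for monthly estimates


def nat_gateway_monthly(gb_processed: float, gateways: int = 1,
                        hourly: float = 0.045, per_gb: float = 0.045) -> float:
    """Hourly charge for each gateway plus per-GB processing."""
    return gateways * hourly * HOURS_PER_MONTH + per_gb * gb_processed


def interface_endpoint_monthly(gb_processed: float, azs: int,
                               hourly: float = 0.01, per_gb: float = 0.01) -> float:
    """Interface endpoints bill per AZ-hour plus their own, smaller per-GB fee."""
    return azs * hourly * HOURS_PER_MONTH + per_gb * gb_processed


def endpoint_break_even_gb(azs: int, nat_per_gb: float = 0.045,
                           hourly: float = 0.01, per_gb: float = 0.01) -> float:
    """GB/month above which an endpoint beats sending that traffic through NAT."""
    return azs * hourly * HOURS_PER_MONTH / (nat_per_gb - per_gb)
```

With endpoints in 2 AZs, the break-even is about 417 GB/month of traffic to that service, and `nat_gateway_monthly(500)` comes to about $55.35.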

Data transfer

Same AZ: free
Cross-AZ: $0.01/GB each direction
Cross-region: typically $0.02/GB (varies by region pair)
To internet: $0.09/GB (first 10 TB/month)

Cross-AZ charges are invisible until your bill arrives: ECS tasks talking to RDS in a different AZ, Lambda functions fetching from ElastiCache across AZs. It adds up.
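To see how it adds up, estimate cross-AZ volume from request rate and payload size. A back-of-the-envelope sketch: the function is hypothetical, it uses 10^9 bytes per GB and a 30-day month, and `cross_az_fraction` is your own estimate of how many calls cross AZs (with callers spread evenly over three AZs and a single-AZ database primary, about two in three do).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month
BYTES_PER_GB = 10**9                # decimal GB for a rough estimate


def cross_az_monthly_cost(requests_per_sec: float, bytes_per_request: float,
                          cross_az_fraction: float, rate_per_gb: float = 0.01) -> float:
    """Estimate monthly cross-AZ charges for one request/response flow.

    bytes_per_request should include both the request and the response.
    """
    gb = requests_per_sec * bytes_per_request * SECONDS_PER_MONTH / BYTES_PER_GB
    cross_gb = gb * cross_az_fraction
    # Billed once leaving the source AZ and once entering the destination AZ.
    return cross_gb * rate_per_gb * 2
```

For example, 500 requests/s at 20 KB each with two-thirds crossing AZs is about 17,280 GB/month, or roughly $346/month in cross-AZ charges alone.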

Quick Wins Checklist

How many of these have you done?

Enable Compute Savings Plans for your sustained baseline
Switch to Graviton (arm64) for EC2, Lambda, RDS, and ElastiCache
Add S3 lifecycle policies to all buckets with aging data
Replace NAT Gateway traffic to AWS services with VPC endpoints
Switch gp2 EBS volumes to gp3
Delete unattached EBS volumes and unused snapshots
Enable Compute Optimizer and review its recommendations
Right-size RDS instances (they're often over-provisioned by 2-4x)
Use spot for CI/CD and batch workloads
Set up AWS Cost Anomaly Detection alerts
