What Is S3?
S3 (Simple Storage Service) is AWS's object storage service. It stores files (objects) in containers (buckets) with effectively unlimited capacity. It is designed for 11 nines of durability (99.999999999%), so you are far more likely to lose your own hardware than to have S3 lose your data.
Nearly every AWS architecture uses S3 somewhere: static assets, backups, data lake storage, log archives, deployment artifacts, user uploads.
Storage Classes
S3 has multiple storage classes optimized for different access patterns:
| Class | Use case | Cost (per GB/month) | Retrieval |
|---|---|---|---|
| Standard | Frequently accessed data | $0.023 | Instant |
| Intelligent-Tiering | Unknown/changing access patterns | $0.023 + monitoring fee | Instant |
| Infrequent Access (IA) | Once a month or less | $0.0125 | Instant, per-retrieval fee |
| One Zone-IA | Reproducible, infrequent data | $0.010 | Instant, single AZ only |
| Glacier Instant | Archive, but need millisecond access | $0.004 | Instant |
| Glacier Flexible | Archive, can wait minutes to hours | $0.0036 | 1-5 min (expedited) up to 12 hrs (bulk) |
| Glacier Deep Archive | Long-term archive, rare access | $0.00099 | 12-48 hours |
The optimization: Most data is "hot" for days or weeks, then rarely accessed again. Without lifecycle policies, everything stays in Standard forever, at roughly 2-20x the cost it needs to be.
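To make the cost gap concrete, here is a small sketch that computes how many times more Standard costs than each colder class. The prices are copied from the table above and are assumptions for illustration: they vary by region and change over time. The names `pricePerGbMonth` and `standardCostMultiple` are made up for this example.

```typescript
// Per-GB monthly prices copied from the table above (assumed; check current pricing).
const pricePerGbMonth = {
  "Standard": 0.023,
  "Infrequent Access": 0.0125,
  "One Zone-IA": 0.010,
  "Glacier Instant": 0.004,
  "Glacier Flexible": 0.0036,
  "Glacier Deep Archive": 0.00099,
} as const;

type StorageClassName = keyof typeof pricePerGbMonth;

// How many times more expensive Standard is than the given class (storage only).
function standardCostMultiple(storageClass: StorageClassName): number {
  return pricePerGbMonth["Standard"] / pricePerGbMonth[storageClass];
}

for (const name of Object.keys(pricePerGbMonth) as StorageClassName[]) {
  console.log(`${name}: Standard costs ${standardCostMultiple(name).toFixed(1)}x as much`);
}
```

With these prices the multiple runs from a little under 2x for Infrequent Access to a little over 23x for Glacier Deep Archive. Retrieval and request fees are not included.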
Lifecycle Policies
Automatically transition objects between storage classes based on age:
{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 }
  }]
}
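This saves roughly 40-95% on storage for aging data with no application changes, as long as the application does not read objects after they reach Deep Archive: those need a restore request before they can be read.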
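As a rough check on that savings range, the sketch below walks one object through the rule above day by day and compares the storage cost with leaving it in Standard. It assumes the per-GB prices from the storage class table and a 30-day billing month. It ignores transition request fees, retrieval fees, and minimum storage duration charges, so treat the result as an upper bound. `tierAtAge` and `lifetimeCostPerGb` are illustrative names, not AWS APIs.

```typescript
interface Tier {
  days: number;            // object age at which this tier starts
  storageClass: string;
  pricePerGbMonth: number; // assumed price, taken from the table above
}

// Mirrors the lifecycle rule: Standard, then IA at 30 days, and so on.
const tiers: Tier[] = [
  { days: 0, storageClass: "STANDARD", pricePerGbMonth: 0.023 },
  { days: 30, storageClass: "STANDARD_IA", pricePerGbMonth: 0.0125 },
  { days: 90, storageClass: "GLACIER_IR", pricePerGbMonth: 0.004 },
  { days: 365, storageClass: "DEEP_ARCHIVE", pricePerGbMonth: 0.00099 },
];
const expirationDays = 2555;

// Returns the tier an object of the given age sits in, or undefined once expired.
function tierAtAge(ageDays: number): Tier | undefined {
  if (ageDays < 0 || ageDays >= expirationDays) return undefined;
  let current: Tier | undefined;
  for (const tier of tiers) {
    if (ageDays >= tier.days) current = tier;
  }
  return current;
}

// Sums the daily storage cost of 1 GB from creation until expiration.
function lifetimeCostPerGb(priceAt: (ageDays: number) => number): number {
  let total = 0;
  for (let day = 0; day < expirationDays; day++) {
    total += priceAt(day) / 30; // approximate a month as 30 days
  }
  return total;
}

const withLifecycle = lifetimeCostPerGb((d) => tierAtAge(d)?.pricePerGbMonth ?? 0);
const standardOnly = lifetimeCostPerGb(() => 0.023);
const savingsFraction = 1 - withLifecycle / standardOnly;

console.log(`Lifecycle: $${withLifecycle.toFixed(3)} per GB over the object's life`);
console.log(`Standard only: $${standardOnly.toFixed(3)} per GB`);
console.log(`Savings: ${(savingsFraction * 100).toFixed(0)}%`);
```

Under these assumptions the seven-year savings land near the top of the quoted range, because the object spends most of its life in Deep Archive.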
Intelligent-Tiering
If you can't predict access patterns, Intelligent-Tiering monitors access and moves objects between tiers automatically. The monitoring fee is $0.0025 per 1,000 objects per month, so it pays off when enough data goes cold for the storage savings to exceed that fee.
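Whether the monitoring fee is worth it depends on object count and size. The sketch below estimates the monthly net effect under two assumptions that are not from AWS documentation quoted here: data that goes cold is billed at the Infrequent Access price from the table ($0.0125), and all objects are the same size. Function names are illustrative.

```typescript
const monitoringFeePer1000Objects = 0.0025; // from the text above

// Monthly monitoring fee for a given number of objects.
function monthlyMonitoringFee(objectCount: number): number {
  return (objectCount / 1000) * monitoringFeePer1000Objects;
}

// Storage savings from cold data minus the monitoring fee.
// A positive result means Intelligent-Tiering comes out cheaper than Standard.
function monthlyNetSavings(
  objectCount: number,
  avgObjectSizeGb: number,
  coldFraction: number,   // share of data that ends up in the cheaper tier (0..1)
  standardPrice = 0.023,  // assumed, per GB-month
  coldPrice = 0.0125,     // assumed, per GB-month
): number {
  const totalGb = objectCount * avgObjectSizeGb;
  const storageSavings = totalGb * coldFraction * (standardPrice - coldPrice);
  return storageSavings - monthlyMonitoringFee(objectCount);
}

// One million 1 MB objects, half of which go cold.
console.log(monthlyNetSavings(1_000_000, 0.001, 0.5).toFixed(2));
// The same object count at 10 KB each: the fee outweighs the savings.
console.log(monthlyNetSavings(1_000_000, 0.00001, 0.5).toFixed(2));
```

The second call shows the failure mode: with many tiny objects, the per-object fee can exceed what tiering saves.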
Event Notifications
S3 can trigger actions when objects are created, deleted, or restored:
Targets:
- Lambda functions (most common: process uploads, generate thumbnails, run moderation)
- SQS queues (buffer for batch processing)
- SNS topics (fan-out to multiple consumers)
- EventBridge (content-based routing, more flexibility)
Common patterns:
- Image upload → Lambda → resize/compress → store processed version
- CSV upload → Lambda → parse → write to DynamoDB
- Log file delivery → SQS → batch processor → data warehouse
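The first step in any of these patterns is pulling the bucket and key out of the notification. Here is a minimal sketch of that step for a Lambda target. The interfaces cover only the fields used and are hand-written for this example rather than imported from a types package. `createdObjects` and `handler` are illustrative names. Object keys arrive URL-encoded in the event, with spaces as `+`, so they need decoding before use.

```typescript
// Hand-written subset of the S3 event notification shape (only the fields used here).
interface S3EventRecord {
  eventName: string; // e.g. "ObjectCreated:Put"
  s3: {
    bucket: { name: string };
    object: { key: string; size?: number };
  };
}

interface S3Event {
  Records: S3EventRecord[];
}

interface ObjectRef {
  bucket: string;
  key: string;
}

// Returns the decoded bucket/key pairs for every "object created" record.
function createdObjects(event: S3Event): ObjectRef[] {
  return event.Records
    .filter((record) => record.eventName.startsWith("ObjectCreated:"))
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // Keys are URL-encoded in the event; "+" stands for a space.
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }));
}

// Hypothetical Lambda entry point: real processing (resize, parse, ...) would go here.
async function handler(event: S3Event): Promise<void> {
  for (const obj of createdObjects(event)) {
    console.log(`Would process s3://${obj.bucket}/${obj.key}`);
  }
}
```

Writing the processed result to a different bucket or prefix than the one that triggers the function avoids the function re-triggering itself.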
Security
Block public access
Enable on every bucket. S3 Block Public Access prevents accidental public exposure at the account or bucket level. There's almost never a reason to make a bucket public (use CloudFront for static websites).
Bucket policy vs IAM policy
- IAM policy: Controls what a principal (role/user) can do. "This Lambda can read from this bucket."
- Bucket policy: Controls who can access the bucket. "Only these roles can access this bucket." Also enables cross-account access.
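To illustrate the bucket policy side, the sketch below builds a policy document that lets a list of roles read objects, which is the shape used for cross-account access. The function name, bucket name, account ID, and role name are all placeholders. Note that an Allow statement like this grants access to the listed roles but does not by itself lock out other principals in the bucket's own account that hold IAM permissions, and a role in another account still needs a matching IAM policy on its side.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: string[];
}

interface BucketPolicy {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Builds a bucket policy that allows the given role ARNs to read objects.
function readOnlyBucketPolicy(bucketName: string, roleArns: string[]): BucketPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowReadFromListedRoles",
        Effect: "Allow",
        Principal: { AWS: roleArns },
        Action: ["s3:GetObject"],
        // Object-level actions apply to the objects, hence the "/*" suffix.
        Resource: [`arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// Placeholder bucket and role, for illustration only.
const policy = readOnlyBucketPolicy("my-app-data", [
  "arn:aws:iam::111122223333:role/ReportReader",
]);
console.log(JSON.stringify(policy, null, 2));
```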
Encryption
- SSE-S3 (default): AWS manages keys. Free. No configuration needed.
- SSE-KMS: You control the key in KMS. Audit access via CloudTrail. Required for some compliance frameworks.
- Client-side: You encrypt before upload. AWS never sees plaintext. Most complex.
Default (SSE-S3) is fine for most workloads. Use SSE-KMS when you need key access auditing or key rotation control.
Presigned URLs
Grant temporary access to specific objects without making them public:
- Upload: Client PUTs directly to S3 (avoids proxying through your API)
- Download: Client GETs with a time-limited URL
Performance
Request rate
S3 supports at least 5,500 GET/HEAD and 3,500 PUT/COPY/POST/DELETE requests per second per prefix. For most workloads this is plenty. If you need more:
- Distribute objects across multiple prefixes
- Use random prefixes instead of date-based (e.g., hash prefix)
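One way to get a hash prefix is to derive a short, stable prefix from the key itself, so the same key always maps to the same location. The sketch below uses a 32-bit FNV-1a hash only because it needs no imports; any stable hash (MD5, SHA-256) works the same way. `hashedKey` is an illustrative name, and the two-character prefix length is an arbitrary choice that yields 256 prefixes.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units; returns 8 hex characters.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Prepends a short hash-derived prefix so keys spread across many prefixes.
function hashedKey(originalKey: string, prefixLength = 2): string {
  return `${fnv1aHex(originalKey).slice(0, prefixLength)}/${originalKey}`;
}

console.log(hashedKey("logs/2024-01-15/app-001.log"));
console.log(hashedKey("logs/2024-01-15/app-002.log"));
```

The trade-off is that you can no longer list a whole day with a single prefix, so this fits workloads that look keys up directly rather than scanning by date.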
Transfer acceleration
Use S3 Transfer Acceleration for uploads over long distances. It routes traffic through CloudFront edge locations, which adds cost but significantly improves upload speed from far-away clients.
Multipart upload
For objects over 100 MB, use multipart upload:
- Faster (parallel part uploads)
- Resilient (retry individual parts)
- Required for objects over 5 GB
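Before starting a multipart upload you need a part size that respects S3's limits: at most 10,000 parts, each between 5 MiB and 5 GiB (the last part may be smaller). The sketch below picks one. `planParts` and the 16 MiB default are illustrative choices, not values from the SDK.

```typescript
const MiB = 1024 * 1024;
const GiB = 1024 * MiB;

const MIN_PART_SIZE = 5 * MiB; // applies to every part except the last
const MAX_PART_SIZE = 5 * GiB;
const MAX_PARTS = 10000;

interface PartPlan {
  partSize: number;  // bytes per part (the last part may be smaller)
  partCount: number;
}

// Chooses a part size that keeps the upload within S3's multipart limits.
function planParts(objectSize: number, preferredPartSize = 16 * MiB): PartPlan {
  if (objectSize <= 0) throw new Error("objectSize must be positive");
  // Smallest part size that still fits the object into 10,000 parts.
  const minSizeForPartLimit = Math.ceil(objectSize / MAX_PARTS);
  const partSize = Math.max(MIN_PART_SIZE, preferredPartSize, minSizeForPartLimit);
  if (partSize > MAX_PART_SIZE) throw new Error("object is too large for multipart upload");
  return { partSize, partCount: Math.ceil(objectSize / partSize) };
}

console.log(planParts(100 * MiB));   // small object: the preferred size is used
console.log(planParts(1024 * GiB));  // 1 TiB: the part size grows to stay under 10,000 parts
```

Larger parts mean fewer requests but more data to resend when a part fails, so the preferred size is a tuning knob rather than a fixed rule.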
CDK Example
import { Duration, RemovalPolicy } from 'aws-cdk-lib';
import { Bucket, BucketEncryption, BlockPublicAccess, StorageClass } from 'aws-cdk-lib/aws-s3';

// Place this inside a Stack or Construct, where `this` is the construct scope.
const dataBucket = new Bucket(this, 'DataBucket', {
  bucketName: 'my-app-data', // bucket names are globally unique; replace with your own
  encryption: BucketEncryption.S3_MANAGED, // SSE-S3
  blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
  versioned: true,
  lifecycleRules: [{
    transitions: [
      { storageClass: StorageClass.INFREQUENT_ACCESS, transitionAfter: Duration.days(30) },
      { storageClass: StorageClass.GLACIER_INSTANT_RETRIEVAL, transitionAfter: Duration.days(90) },
    ],
    expiration: Duration.days(365 * 7), // 7 years
  }],
  removalPolicy: RemovalPolicy.RETAIN, // keep the bucket and its data if the stack is deleted
});
Cost Optimization
- Lifecycle policies are the biggest lever. Moving 1TB from Standard to Glacier saves ~$19/month.
- Delete incomplete multipart uploads. They accumulate silently. Add a lifecycle rule: AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }.
- Use S3 Storage Lens to analyze access patterns across buckets.
- Avoid unnecessary requests. ListObjects calls cost $0.005 per 1,000 requests. If you're listing millions of objects, store metadata in DynamoDB instead.
- Watch for cross-region transfer. Accessing S3 from a different region costs $0.02/GB.
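The arithmetic behind two of these figures can be sketched as follows. The prices are the ones quoted in this article and are assumptions that vary by region. The 1,000 keys per list request and the 1,024 GB per TB conversion are further assumptions of this example, and the function names are illustrative.

```typescript
// Monthly storage savings from moving data to a cheaper class.
function monthlySavings(gb: number, fromPricePerGb: number, toPricePerGb: number): number {
  return gb * (fromPricePerGb - toPricePerGb);
}

// Cost of listing every object once, assuming up to 1,000 keys per request.
function fullListingCost(
  objectCount: number,
  keysPerRequest = 1000,
  pricePer1000Requests = 0.005,
): number {
  const requests = Math.ceil(objectCount / keysPerRequest);
  return (requests / 1000) * pricePer1000Requests;
}

// 1 TB from Standard ($0.023) to Glacier Flexible ($0.0036).
console.log(`Savings: $${monthlySavings(1024, 0.023, 0.0036).toFixed(2)} per month`);

// One full listing of 100 million objects, then the same listing repeated hourly for 30 days.
const oneListing = fullListingCost(100_000_000);
console.log(`One listing: $${oneListing.toFixed(2)}`);
console.log(`Hourly for a month: $${(oneListing * 24 * 30).toFixed(2)}`);
```

A single full listing is cheap. The cost comes from repeating it on a schedule, which is where a metadata index starts to pay off.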