Amazon S3

What Is S3?

S3 (Simple Storage Service) is AWS's object storage. It stores files (objects) in containers (buckets) with effectively unlimited capacity. It is designed for 11 nines of durability (99.999999999%), so you are far more likely to lose data to an accidental delete or a bad deploy than to S3 itself.

Nearly every AWS architecture uses S3 somewhere: static assets, backups, data lake storage, log archives, deployment artifacts, user uploads.

Storage Classes

S3 has multiple storage classes optimized for different access patterns:

Class	Use case	Cost (per GB/month)	Retrieval
Standard	Frequently accessed data	$0.023	Instant
Intelligent-Tiering	Unknown/changing access patterns	$0.023 + monitoring fee	Instant
Infrequent Access (IA)	Once a month or less	$0.0125	Instant, per-retrieval fee
One Zone-IA	Reproducible, infrequent data	$0.010	Instant, single AZ only
Glacier Instant	Archive, but need millisecond access	$0.004	Instant
Glacier Flexible	Archive, can wait minutes	$0.0036	1-5 min (expedited) to 12 hrs
Glacier Deep Archive	Long-term archive, rare access	$0.00099	12-48 hours

The optimization: Most data is "hot" for days or weeks, then rarely accessed again. Without lifecycle policies, everything stays in Standard forever, at roughly 2-20x the storage cost it needs to be.
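To make that multiple concrete, the sketch below applies the per-GB prices from the table to a given data size. The prices are simply the table's figures (they vary by region and change over time), the names `PRICE_PER_GB_MONTH`, `monthlyStorageCost`, and `standardCostMultiple` are illustrative, and retrieval, request, and minimum-duration charges are deliberately left out.

```typescript
// Per-GB monthly storage prices copied from the table above (assumed; check current regional pricing).
const PRICE_PER_GB_MONTH = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  ONEZONE_IA: 0.01,
  GLACIER_IR: 0.004,
  GLACIER: 0.0036,
  DEEP_ARCHIVE: 0.00099,
} as const;

type StorageClassName = keyof typeof PRICE_PER_GB_MONTH;

// Storage-only cost; retrieval, request, and minimum-duration charges are not modeled.
function monthlyStorageCost(sizeGb: number, storageClass: StorageClassName): number {
  return sizeGb * PRICE_PER_GB_MONTH[storageClass];
}

// How many times more Standard costs than the given class for the same data.
function standardCostMultiple(storageClass: StorageClassName): number {
  return PRICE_PER_GB_MONTH.STANDARD / PRICE_PER_GB_MONTH[storageClass];
}

// Print the monthly bill for 10 TB in each class, plus the Standard multiple.
const tenTbInGb = 10 * 1024;
for (const cls of Object.keys(PRICE_PER_GB_MONTH) as StorageClassName[]) {
  const cost = monthlyStorageCost(tenTbInGb, cls).toFixed(2);
  const multiple = standardCostMultiple(cls).toFixed(1);
  console.log(`${cls}: $${cost}/month (Standard costs ${multiple}x this)`);
}
```

With these figures the multiple runs from a little under 2x (Standard-IA) to a little over 20x (Deep Archive).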

Lifecycle Policies

Automatically transition objects between storage classes based on age:

{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 }
  }]
}

This saves 40-95% on storage for aging data with zero application changes.
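One way to reason about the rule is to ask which class an object occupies at a given age. The following is a minimal sketch that mirrors the transition and expiration days from the JSON above; `classAtAge` is a hypothetical helper, and because S3 applies lifecycle actions asynchronously, the real timing is approximate.

```typescript
type Transition = { days: number; storageClass: string };

// Mirrors the "ArchiveOldData" rule above; must be sorted by ascending days.
const transitions: Transition[] = [
  { days: 30, storageClass: "STANDARD_IA" },
  { days: 90, storageClass: "GLACIER_IR" },
  { days: 365, storageClass: "DEEP_ARCHIVE" },
];
const expirationDays = 2555; // roughly 7 years

// Returns the class an object of the given age would be in, or null once it has expired.
function classAtAge(ageDays: number): string | null {
  if (ageDays >= expirationDays) return null;
  let current = "STANDARD";
  for (const t of transitions) {
    if (ageDays >= t.days) current = t.storageClass;
  }
  return current;
}

console.log(classAtAge(10));   // STANDARD
console.log(classAtAge(45));   // STANDARD_IA
console.log(classAtAge(400));  // DEEP_ARCHIVE
console.log(classAtAge(3000)); // null (expired)
```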

Intelligent-Tiering

If you can't predict access patterns, Intelligent-Tiering monitors access and moves objects between tiers automatically. Monitoring costs $0.0025 per 1,000 objects per month, so it pays off when enough data ends up in the cheaper tiers to outweigh that fee.
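Because the monitoring fee scales with object count rather than size, it is worth a quick estimate before enabling it on buckets with many small objects. The sketch below compares the fee with the storage saved if some fraction of the data settles into a cheaper tier. The infrequent-tier price is an assumption (set equal to the Standard-IA figure from the table), `fractionInfrequent` is a guess you supply, and `estimateIntelligentTiering` is a hypothetical helper.

```typescript
const MONITORING_FEE_PER_1000_OBJECTS = 0.0025; // from the text above
const STANDARD_PER_GB = 0.023;                  // from the storage class table
const INFREQUENT_TIER_PER_GB = 0.0125;          // assumption: same as Standard-IA

interface TieringEstimate {
  monitoringFee: number;  // USD per month
  storageSavings: number; // USD per month
  netSavings: number;     // positive means Intelligent-Tiering comes out ahead
}

function estimateIntelligentTiering(
  objectCount: number,
  totalGb: number,
  fractionInfrequent: number, // 0..1, share of data assumed to go cold
): TieringEstimate {
  const monitoringFee = (objectCount / 1000) * MONITORING_FEE_PER_1000_OBJECTS;
  const storageSavings =
    totalGb * fractionInfrequent * (STANDARD_PER_GB - INFREQUENT_TIER_PER_GB);
  return { monitoringFee, storageSavings, netSavings: storageSavings - monitoringFee };
}

// Same 1,000 GB, half of it cold: a few large objects versus a great many small ones.
console.log(estimateIntelligentTiering(1_000_000, 1000, 0.5));
console.log(estimateIntelligentTiering(100_000_000, 1000, 0.5));
```

Under these assumptions the first bucket comes out ahead, while in the second the monitoring fee far exceeds the storage saved.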

Event Notifications

S3 can trigger actions when objects are created, deleted, or restored:

Targets:

Lambda functions (most common: process uploads, generate thumbnails, run moderation)
SQS queues (buffer for batch processing)
SNS topics (fan-out to multiple consumers)
EventBridge (content-based routing, more flexibility)

Common patterns:

Image upload → Lambda → resize/compress → store processed version
CSV upload → Lambda → parse → write to DynamoDB
Log file delivery → SQS → batch processor → data warehouse
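For the Lambda patterns, the function receives a batch of records, each naming a bucket and a URL-encoded object key. The sketch below shows the parsing step only, with small hand-written types standing in for the ones a types package would normally provide; `extractObjects` is a hypothetical helper and the actual processing (resize, parse, and so on) is omitted.

```typescript
// Minimal shape of the parts of an S3 event notification used here.
interface S3EventRecord {
  eventName: string; // e.g. "ObjectCreated:Put"
  s3: {
    bucket: { name: string };
    object: { key: string; size?: number };
  };
}
interface S3Event {
  Records: S3EventRecord[];
}

interface ObjectRef {
  bucket: string;
  key: string;
  eventName: string;
}

// Object keys arrive URL-encoded, with spaces encoded as "+", so decode before use.
function extractObjects(event: S3Event): ObjectRef[] {
  return event.Records.map((r) => ({
    bucket: r.s3.bucket.name,
    key: decodeURIComponent(r.s3.object.key.replace(/\+/g, " ")),
    eventName: r.eventName,
  }));
}

// Lambda entry point (would be exported in a deployed function).
// Handles creation events only and returns how many objects it saw.
async function handler(event: S3Event): Promise<number> {
  const created = extractObjects(event).filter((o) =>
    o.eventName.startsWith("ObjectCreated:"),
  );
  for (const obj of created) {
    console.log(`would process s3://${obj.bucket}/${obj.key}`);
  }
  return created.length;
}
```

If the function writes its output back to the same bucket, scope the notification to a prefix (or write to a different bucket) so the function does not trigger itself.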

Security

Block public access

Enable on every bucket. S3 Block Public Access prevents accidental public exposure at the account or bucket level. There's almost never a reason to make a bucket public (use CloudFront for static websites).

Bucket policy vs IAM policy

IAM policy: Controls what a principal (role/user) can do. "This Lambda can read from this bucket."
Bucket policy: Controls who can access the bucket. "Only these roles can access this bucket." Also enables cross-account access.
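As an illustration of the cross-account case, the sketch below builds a bucket policy document that lets roles in another account read objects. The bucket name, account ID, and role name are placeholders, and `crossAccountReadPolicy` is a hypothetical helper; the roles would still need an IAM policy in their own account allowing the same action.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: string[];
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Grants read-only object access to the given role ARNs.
function crossAccountReadPolicy(bucketName: string, roleArns: string[]): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "CrossAccountRead",
        Effect: "Allow",
        Principal: { AWS: roleArns },
        Action: ["s3:GetObject"],
        // Object-level actions apply to keys inside the bucket, hence the "/*".
        Resource: [`arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// Placeholder bucket name and role ARN, for illustration only.
const policy = crossAccountReadPolicy("my-app-data", [
  "arn:aws:iam::111122223333:role/ReportReader",
]);
console.log(JSON.stringify(policy, null, 2));
```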

Encryption

SSE-S3 (default): AWS manages keys. Free. No configuration needed.
SSE-KMS: You control the key in KMS. Audit access via CloudTrail. Required for some compliance frameworks.
Client-side: You encrypt before upload. AWS never sees plaintext. Most complex.

Default (SSE-S3) is fine for most workloads. Use SSE-KMS when you need key access auditing or key rotation control.

Presigned URLs

Grant temporary access to specific objects without making them public:

Upload: Client PUTs directly to S3 (avoids proxying through your API)
Download: Client GETs with a time-limited URL

Performance

Request rate

S3 supports at least 5,500 GET/HEAD and 3,500 PUT/COPY/POST/DELETE requests per second per prefix. For most workloads this is plenty. If you need more:

Distribute objects across multiple prefixes
Use hashed prefixes instead of a single date-based prefix when one prefix becomes a hotspot (e.g., a short hash prefix in front of the key)
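A hash prefix can be derived from the key itself, so readers can recompute it without a lookup. The sketch below uses a small FNV-1a-style hash to stay dependency-free; any stable hash would do. `withHashPrefix` and `prefixesNeeded` are hypothetical helpers, and the per-prefix limits are the figures quoted above.

```typescript
const GET_RPS_PER_PREFIX = 5500; // from the text above
const PUT_RPS_PER_PREFIX = 3500; // from the text above

// FNV-1a-style 32-bit hash over the key's UTF-16 code units, as 8 hex characters.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Prepends the first few hex characters of the hash, e.g. "<hash>/logs/2024-01-01/app.log".
function withHashPrefix(key: string, prefixChars: number = 2): string {
  return `${fnv1aHex(key).slice(0, prefixChars)}/${key}`;
}

// Rough number of prefixes needed to sustain a target request rate.
function prefixesNeeded(targetRps: number, perPrefixRps: number): number {
  return Math.ceil(targetRps / perPrefixRps);
}

console.log(withHashPrefix("logs/2024-01-01/app.log"));
console.log(prefixesNeeded(20000, GET_RPS_PER_PREFIX)); // 4
console.log(prefixesNeeded(20000, PUT_RPS_PER_PREFIX)); // 6
```

The trade-off is that hashed prefixes make it harder to list objects by date, so this is only worth doing when a single prefix is actually a bottleneck.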

Transfer acceleration

Use S3 Transfer Acceleration for transfers over long distances. It routes traffic through CloudFront edge locations, which adds cost but can significantly improve upload speed from far-away clients.

Multipart upload

For objects over 100MB, use multipart upload:

Faster (parallel part uploads)
Resilient (retry individual parts)
Required for objects over 5GB
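The splitting step can be sketched without any SDK. The helper below plans byte ranges for parts, respecting two S3 limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. `planParts` and the 16 MiB default part size are illustrative assumptions; in practice an SDK's higher-level upload helper usually does this for you.

```typescript
const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB; // S3 minimum for every part except the last
const MAX_PARTS = 10000;       // S3 maximum parts per multipart upload

interface Part {
  partNumber: number; // 1-based, as S3 expects
  start: number;      // inclusive byte offset
  end: number;        // exclusive byte offset
}

function planParts(objectSize: number, desiredPartSize: number = 16 * MIB): Part[] {
  if (objectSize <= 0) throw new Error("objectSize must be positive");
  // Grow the part size when needed so the part count stays within the limit.
  const partSize = Math.max(
    desiredPartSize,
    MIN_PART_SIZE,
    Math.ceil(objectSize / MAX_PARTS),
  );
  const parts: Part[] = [];
  let partNumber = 1;
  for (let start = 0; start < objectSize; start += partSize) {
    parts.push({ partNumber, start, end: Math.min(start + partSize, objectSize) });
    partNumber++;
  }
  return parts;
}

// A 100 MiB object with 16 MiB parts: six full parts plus a 4 MiB remainder.
const plan = planParts(100 * MIB);
console.log(plan.length); // 7
```

Each planned range can then be uploaded in parallel and retried on its own, which is where the speed and resilience benefits come from.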

CDK Example

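import { Duration, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import { Bucket, BucketEncryption, BlockPublicAccess, StorageClass } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// Constructs need a scope, so the bucket is defined inside a stack.
export class DataStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const dataBucket = new Bucket(this, 'DataBucket', {
      bucketName: 'my-app-data', // must be globally unique; omit to get a generated name
      encryption: BucketEncryption.S3_MANAGED, // SSE-S3
      blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
      versioned: true,
      lifecycleRules: [{
        transitions: [
          { storageClass: StorageClass.INFREQUENT_ACCESS, transitionAfter: Duration.days(30) },
          { storageClass: StorageClass.GLACIER_INSTANT_RETRIEVAL, transitionAfter: Duration.days(90) },
        ],
        expiration: Duration.days(365 * 7), // 7 years
      }],
      removalPolicy: RemovalPolicy.RETAIN, // keep the bucket and its data if the stack is deleted
    });
  }
}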

Cost Optimization

Lifecycle policies are the biggest lever. Moving 1TB from Standard to Glacier Instant Retrieval saves ~$19/month in storage.
Delete incomplete multipart uploads. They accumulate silently. Add a lifecycle rule: AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }.
Use S3 Storage Lens to analyze access patterns across buckets.
Avoid unnecessary requests. ListObjects calls cost $0.005 per 1,000 requests. If you're listing millions of objects, store metadata in DynamoDB instead.
Watch for cross-region transfer. Accessing S3 from a different region costs $0.02/GB.
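The last two items are easy to put numbers on. The sketch below uses the prices quoted above and assumes each list request returns up to 1,000 keys (the maximum page size); `fullListingCost` and `crossRegionTransferCost` are hypothetical helpers, and actual prices vary by region.

```typescript
const LIST_PRICE_PER_1000_REQUESTS = 0.005; // from the text above
const KEYS_PER_LIST_PAGE = 1000;            // maximum keys returned per list request
const CROSS_REGION_PER_GB = 0.02;           // from the text above

// Cost of listing every object in a bucket once.
function fullListingCost(objectCount: number): number {
  const requests = Math.ceil(objectCount / KEYS_PER_LIST_PAGE);
  return (requests / 1000) * LIST_PRICE_PER_1000_REQUESTS;
}

// Cost of reading the given volume from a bucket in another region.
function crossRegionTransferCost(gb: number): number {
  return gb * CROSS_REGION_PER_GB;
}

// A 10-million-object bucket listed in full once an hour for 30 days.
const perListing = fullListingCost(10_000_000);
console.log(perListing.toFixed(2), (perListing * 24 * 30).toFixed(2));

// 5 TB read across regions in a month.
console.log(crossRegionTransferCost(5 * 1024).toFixed(2));
```

A single full listing is cheap; the cost comes from repeating it on a schedule, which is the case where a separate metadata index pays off.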