
Amazon S3

Object storage on AWS: storage classes, lifecycle policies, event notifications, security, and cost optimization.

What Is S3?

S3 (Simple Storage Service) is AWS's object storage service. It stores files (objects) in containers (buckets) with effectively unlimited capacity and is designed for 11 nines of durability (99.999999999%), so losing data to a hardware failure is not a practical concern.

Every AWS architecture uses S3 somewhere: static assets, backups, data lake storage, log archives, deployment artifacts, user uploads.

Storage Classes

S3 has multiple storage classes optimized for different access patterns:

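| Class | Use case | Storage cost (USD per GB-month, us-east-1) | Retrieval |
|---|---|---|---|
| Standard | Frequently accessed data | $0.023 | Instant |
| Intelligent-Tiering | Unknown or changing access patterns | $0.023 (frequent tier) + monitoring fee | Instant |
| Standard-Infrequent Access (Standard-IA) | Accessed once a month or less | $0.0125 | Instant, per-GB retrieval fee |
| One Zone-IA | Reproducible, infrequently accessed data | $0.010 | Instant, per-GB retrieval fee; stored in a single AZ |
| Glacier Instant Retrieval | Archive that still needs millisecond access | $0.004 | Instant, per-GB retrieval fee |
| Glacier Flexible Retrieval | Archive that can wait minutes to hours | $0.0036 | 1-5 min (expedited), 3-5 hrs (standard), 5-12 hrs (bulk) |
| Glacier Deep Archive | Long-term archive, rarely accessed | $0.00099 | 12 hrs (standard) to 48 hrs (bulk) |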

The optimization: most data is "hot" for days or weeks, then rarely accessed again. Without lifecycle policies, everything stays in Standard indefinitely, at 2-20x the cost it needs to be.
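
The ratio behind that claim can be computed directly from the table. The sketch below assumes the table's us-east-1 prices and counts storage only; request, retrieval, transition, and minimum-duration charges are ignored.

```typescript
// Assumed per-GB-month list prices, copied from the table above (us-east-1).
// Intelligent-Tiering is left out because its price depends on the tier each object is in.
const PRICE_PER_GB_MONTH = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  ONEZONE_IA: 0.01,
  GLACIER_IR: 0.004,
  GLACIER: 0.0036,
  DEEP_ARCHIVE: 0.00099,
} as const;

type StorageClassName = keyof typeof PRICE_PER_GB_MONTH;

// Storage charge only: ignores request, retrieval, transition, and minimum-duration fees.
function monthlyStorageCost(gb: number, storageClass: StorageClassName): number {
  return gb * PRICE_PER_GB_MONTH[storageClass];
}

// How many times more Standard costs than the given class for the same data.
function standardMultiple(storageClass: StorageClassName): number {
  return PRICE_PER_GB_MONTH.STANDARD / PRICE_PER_GB_MONTH[storageClass];
}

// Example: 1 TB (1,024 GB) kept in each class for one month.
for (const cls of Object.keys(PRICE_PER_GB_MONTH) as StorageClassName[]) {
  console.log(
    `${cls}: $${monthlyStorageCost(1024, cls).toFixed(2)}/month (Standard costs ${standardMultiple(cls).toFixed(1)}x as much)`,
  );
}
```

The multiple runs from under 2x (Standard-IA) to over 20x (Deep Archive), which is where the gap between "everything in Standard" and a tiered layout comes from.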

Lifecycle Policies

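Lifecycle rules transition objects between storage classes automatically based on their age. The configuration below uses the JSON format accepted by aws s3api put-bucket-lifecycle-configuration: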

{
  "Rules": [{
    "ID": "ArchiveOldData",
    "Filter": { "Prefix": "" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 }
  }]
}

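Depending on how long data is retained, this saves 40-95% on storage for aging data. Reads need no application changes while objects sit in Standard-IA or Glacier Instant Retrieval; objects that reach Deep Archive must be restored with a RestoreObject request (12-48 hours) before they can be read.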
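
To check the savings figure for this particular rule, the sketch below replays the "ArchiveOldData" rule day by day for 1 GB and compares it with leaving the data in Standard. It assumes the table's us-east-1 prices and 30-day months, and it ignores transition request fees and minimum-duration charges.

```typescript
// Assumed us-east-1 prices from the storage class table ($ per GB-month).
const PRICE: Record<string, number> = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  GLACIER_IR: 0.004,
  DEEP_ARCHIVE: 0.00099,
};

// Mirrors the "ArchiveOldData" rule; must stay sorted by days.
const TRANSITIONS = [
  { days: 30, storageClass: "STANDARD_IA" },
  { days: 90, storageClass: "GLACIER_IR" },
  { days: 365, storageClass: "DEEP_ARCHIVE" },
];
const EXPIRATION_DAYS = 2555;

// Storage class an object should be in at a given age, or "EXPIRED".
// S3 applies lifecycle actions asynchronously, so real transitions can lag these day counts.
function storageClassAtAge(ageDays: number): string {
  if (ageDays >= EXPIRATION_DAYS) return "EXPIRED";
  let current = "STANDARD"; // assumes objects are uploaded to Standard
  for (const t of TRANSITIONS) {
    if (ageDays >= t.days) current = t.storageClass;
  }
  return current;
}

// Storage cost of keeping 1 GB until it expires, with or without the rule.
function lifetimeCostPerGB(withLifecycle: boolean): number {
  let total = 0;
  for (let day = 0; day < EXPIRATION_DAYS; day++) {
    const cls = withLifecycle ? storageClassAtAge(day) : "STANDARD";
    total += PRICE[cls] / 30; // daily share of the monthly price
  }
  return total;
}

const savings = 1 - lifetimeCostPerGB(true) / lifetimeCostPerGB(false);
console.log(`Lifecycle rule saves ${(savings * 100).toFixed(0)}% over seven years`);
```

For data kept seven years, this rule lands near the top of the 40-95% range because most of the object's life is spent in Deep Archive; shorter retention shifts it lower.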

Intelligent-Tiering

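If you can't predict access patterns, Intelligent-Tiering monitors access and moves each object between tiers automatically, with no retrieval fees. Monitoring costs $0.0025 per 1,000 objects per month (objects under 128 KB are not monitored and always stay in the frequent tier), so it pays off when objects are reasonably large and their access patterns are genuinely unpredictable.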
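
Because the monitoring fee is charged per object while the savings scale with size, there is a break-even object size. The sketch below estimates it under stated assumptions that go beyond the text: the frequent tier is priced like Standard ($0.023/GB-month), objects untouched for 30 days move to an infrequent tier priced like Standard-IA ($0.0125/GB-month), and the deeper archive tiers are ignored.

```typescript
// Assumptions (not from the table): frequent tier priced like Standard, infrequent tier like Standard-IA.
const MONITORING_FEE_PER_OBJECT = 0.0025 / 1000; // $ per monitored object per month
const FREQUENT_PER_GB = 0.023;
const INFREQUENT_PER_GB = 0.0125;
const KB_PER_GB = 1024 * 1024;

// Average object size (KB) at which a cold object's storage saving equals its monitoring fee.
function breakEvenObjectSizeKB(): number {
  const savingPerGB = FREQUENT_PER_GB - INFREQUENT_PER_GB;
  return (MONITORING_FEE_PER_OBJECT / savingPerGB) * KB_PER_GB;
}

// Net monthly saving for `objectCount` cold objects of `avgSizeKB` each (negative means net cost).
function netMonthlySaving(objectCount: number, avgSizeKB: number): number {
  const totalGB = (objectCount * avgSizeKB) / KB_PER_GB;
  const storageSaving = totalGB * (FREQUENT_PER_GB - INFREQUENT_PER_GB);
  const monitoringFee = objectCount * MONITORING_FEE_PER_OBJECT;
  return storageSaving - monitoringFee;
}

console.log(`break-even object size: ~${breakEvenObjectSizeKB().toFixed(0)} KB`);
console.log(`1M cold 1 MB objects: $${netMonthlySaving(1_000_000, 1024).toFixed(2)}/month`);
```

Under these assumptions, cold objects averaging a few hundred KB roughly break even, while multi-megabyte objects clearly come out ahead; objects that later reach the archive tiers would lower the break-even further.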

Event Notifications

S3 can trigger actions when objects are created, deleted, or restored:

Targets:

  • Lambda functions (most common: process uploads, generate thumbnails, run moderation)
  • SQS queues (buffer for batch processing)
  • SNS topics (fan-out to multiple consumers)
  • EventBridge (content-based routing, more flexibility)

Common patterns:

  • Image upload → Lambda → resize/compress → store processed version
  • CSV upload → Lambda → parse → write to DynamoDB
  • Log file delivery → SQS → batch processor → data warehouse
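
The Lambda-based patterns above all start by pulling the bucket and key out of the notification. A common pitfall is that object keys in S3 event notifications arrive URL-encoded (spaces as "+"). The sketch below declares a minimal, hypothetical subset of the event shape so it runs without dependencies; in a real project, the full type is S3Event from the @types/aws-lambda package.

```typescript
// Hypothetical minimal subset of the S3 event notification payload delivered to Lambda.
interface S3EventRecord {
  eventName: string; // e.g. "ObjectCreated:Put" or "ObjectRemoved:Delete"
  s3: { bucket: { name: string }; object: { key: string; size?: number } };
}
interface S3Event {
  Records: S3EventRecord[];
}

// Keys in event notifications are URL-encoded with "+" for spaces; decode before using them.
function decodeKey(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Bucket/key pairs for newly created objects; other event types are skipped.
function createdObjects(event: S3Event): { bucket: string; key: string }[] {
  return event.Records
    .filter((r) => r.eventName.startsWith("ObjectCreated:"))
    .map((r) => ({ bucket: r.s3.bucket.name, key: decodeKey(r.s3.object.key) }));
}

// Handler skeleton: a real resize or CSV-parse step would fetch each object with the AWS SDK here.
async function handler(event: S3Event): Promise<void> {
  for (const { bucket, key } of createdObjects(event)) {
    console.log(`processing s3://${bucket}/${key}`);
  }
}
```

In a deployed function, handler would be exported and set as the Lambda entry point. When the function writes processed output, writing to a different bucket or prefix than the one that triggers it avoids an invocation loop.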

Security

Block public access

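Enable it on every bucket. S3 Block Public Access prevents accidental public exposure at the account or bucket level, and new buckets have it turned on by default. There's almost never a reason to make a bucket public: serve static websites through CloudFront with origin access control (OAC) so the bucket itself stays private.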
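
For buckets created before the default changed, or created by tooling that turned it off, the four settings can be applied with the AWS SDK for JavaScript v3. This sketch only builds the command; sending it needs credentials with the s3:PutBucketPublicAccessBlock permission, and "my-app-data" is a placeholder bucket name.

```typescript
import { PutPublicAccessBlockCommand } from "@aws-sdk/client-s3";

// Build a request that turns on all four Block Public Access settings for one bucket.
function blockAllPublicAccess(bucket: string): PutPublicAccessBlockCommand {
  return new PutPublicAccessBlockCommand({
    Bucket: bucket,
    PublicAccessBlockConfiguration: {
      BlockPublicAcls: true, // reject requests that add public ACLs
      IgnorePublicAcls: true, // ignore any public ACLs that already exist
      BlockPublicPolicy: true, // reject bucket policies that grant public access
      RestrictPublicBuckets: true, // if a public policy exists, only AWS services and this account can use it
    },
  });
}

// Usage (needs credentials): await new S3Client({}).send(blockAllPublicAccess("my-app-data"));
```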

Bucket policy vs IAM policy

  • IAM policy: Controls what a principal (role/user) can do. "This Lambda can read from this bucket."
  • Bucket policy: Controls who can access the bucket. "Only these roles can access this bucket." Also enables cross-account access.
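
As an illustration of the cross-account case, the sketch below builds a bucket policy that lets another AWS account list the bucket and read its objects. The helper name, bucket name, and account ID are hypothetical. The other account must still grant its own roles s3:ListBucket and s3:GetObject in their IAM policies, because cross-account access needs both sides to allow it.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: { AWS: string };
  Action: string[];
  Resource: string[];
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: bucket policy granting read-only access to another account.
function crossAccountReadPolicy(bucket: string, accountId: string): PolicyDocument {
  const principal = { AWS: `arn:aws:iam::${accountId}:root` }; // the whole account; its IAM decides which roles
  return {
    Version: "2012-10-17",
    Statement: [
      // ListBucket applies to the bucket ARN itself...
      { Sid: "AllowList", Effect: "Allow", Principal: principal, Action: ["s3:ListBucket"], Resource: [`arn:aws:s3:::${bucket}`] },
      // ...while GetObject applies to the objects inside it.
      { Sid: "AllowRead", Effect: "Allow", Principal: principal, Action: ["s3:GetObject"], Resource: [`arn:aws:s3:::${bucket}/*`] },
    ],
  };
}

console.log(JSON.stringify(crossAccountReadPolicy("my-app-data", "111122223333"), null, 2));
```

The printed JSON can be attached with aws s3api put-bucket-policy. The split between the bucket ARN and the /* object ARN is a frequent source of AccessDenied errors when only one of the two is listed.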

Encryption

  • SSE-S3 (default): AWS manages keys. Free. No configuration needed.
  • SSE-KMS: You control the key in KMS. Audit access via CloudTrail. Required for some compliance frameworks.
  • Client-side: You encrypt before upload. AWS never sees plaintext. Most complex.

Default (SSE-S3) is fine for most workloads. Use SSE-KMS when you need key access auditing or key rotation control.
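
With SSE-KMS, the key is chosen per request (or as the bucket default). The sketch below uses the AWS SDK for JavaScript v3 to build, but not send, a PutObject request that encrypts with a customer-managed key. The key ARN is a placeholder; the uploader also needs kms:GenerateDataKey on that key, and readers need kms:Decrypt.

```typescript
import { PutObjectCommand } from "@aws-sdk/client-s3";

// Build a PutObject request encrypted with a customer-managed KMS key.
function kmsEncryptedPut(bucket: string, key: string, body: string, kmsKeyArn: string): PutObjectCommand {
  return new PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: body,
    ServerSideEncryption: "aws:kms", // SSE-KMS instead of the SSE-S3 default
    SSEKMSKeyId: kmsKeyArn,
    BucketKeyEnabled: true, // S3 Bucket Keys reduce the number of (billed) KMS requests
  });
}

// Placeholder key ARN for illustration only.
const cmd = kmsEncryptedPut("my-app-data", "reports/q1.csv", "id,total\n1,42\n",
  "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID");
// Usage (needs credentials): await new S3Client({}).send(cmd);
```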

Presigned URLs

Grant temporary access to specific objects without making them public:

  • Upload: Client PUTs directly to S3 (avoids proxying through your API)
  • Download: Client GETs with a time-limited URL
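
Both flows can be sketched with getSignedUrl from @aws-sdk/s3-request-presigner. Signing happens locally with the caller's credentials, so no request reaches S3 until the client uses the URL. The expiry is in seconds; SigV4 presigned URLs last at most 7 days, and a URL stops working earlier if the signing credentials (for example, a Lambda role's temporary credentials) expire first.

```typescript
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

// Presigned PUT: the browser or mobile client uploads the file body straight to S3.
function presignUpload(client: S3Client, bucket: string, key: string, expiresIn = 300): Promise<string> {
  return getSignedUrl(client, new PutObjectCommand({ Bucket: bucket, Key: key }), { expiresIn });
}

// Presigned GET: a time-limited download link for a private object.
function presignDownload(client: S3Client, bucket: string, key: string, expiresIn = 300): Promise<string> {
  return getSignedUrl(client, new GetObjectCommand({ Bucket: bucket, Key: key }), { expiresIn });
}

// Usage: const url = await presignUpload(new S3Client({ region: "us-east-1" }), "my-app-data", "uploads/photo.jpg");
// The client then sends: fetch(url, { method: "PUT", body: file })
```

Keeping the expiry short and generating one URL per object key limits what a leaked URL can do.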

Performance

Request rate

S3 supports at least 5,500 GET/HEAD and 3,500 PUT/COPY/POST/DELETE requests per second per prefix, and a bucket can have any number of prefixes. For most workloads this is plenty. If you need more:

  • Distribute objects across multiple prefixes
  • Use hashed prefixes instead of purely date-based ones, which send all of today's writes to a single prefix
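
A hash prefix can be derived from the key itself, so readers can recompute it and no lookup table is needed. The sketch below uses FNV-1a (32-bit, over UTF-16 code units) purely as a small, dependency-free stable hash; any stable hash would do. The shard count and the key layout are hypothetical choices.

```typescript
// FNV-1a 32-bit over UTF-16 code units: stable and cheap, not cryptographic.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
  }
  return hash >>> 0;
}

// Prepend a two-hex-digit shard so writes spread over `shards` prefixes
// instead of all landing under the current date.
function shardedKey(naturalKey: string, shards = 16): string {
  const shard = (fnv1a32(naturalKey) % shards).toString(16).padStart(2, "0");
  return `${shard}/${naturalKey}`;
}

console.log(shardedKey("logs/2024-06-01/event-42.json"));
```

The trade-off is that listing "everything from one day" now takes one LIST per shard, so the shard count should stay as small as the request rate allows.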

Transfer acceleration

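Use S3 Transfer Acceleration for uploads from distant clients. Once it is enabled on the bucket, clients send data to the nearest CloudFront edge location, which forwards it to S3 over the AWS network. It adds a per-GB charge, and the speedup depends on the client's distance and network, so it is worth measuring before relying on it.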

Multipart upload

For objects over 100MB, use multipart upload:

  • Faster (parallel part uploads)
  • Resilient (retry individual parts)
  • Required for objects over 5GB
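
Choosing a part size is simple arithmetic against S3's limits: every part except the last must be at least 5 MiB, a part can be at most 5 GiB, and an upload can have at most 10,000 parts. The sketch below starts from a hypothetical preferred part size of 16 MiB and grows it only when the object would otherwise need more than 10,000 parts.

```typescript
const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB; // every part except the last must be at least 5 MiB
const MAX_PART_SIZE = 5 * 1024 * MiB; // 5 GiB per part
const MAX_PARTS = 10_000;

// Pick a part size (rounded to whole MiB) that keeps the upload within 10,000 parts.
function planMultipart(objectBytes: number, preferredPartBytes = 16 * MiB): { partSize: number; partCount: number } {
  let partSize = Math.max(preferredPartBytes, MIN_PART_SIZE);
  const minForPartLimit = Math.ceil(objectBytes / MAX_PARTS);
  if (partSize < minForPartLimit) {
    partSize = Math.ceil(minForPartLimit / MiB) * MiB; // grow, rounding up to a whole MiB
  }
  if (partSize > MAX_PART_SIZE) {
    throw new Error("object too large for a single multipart upload");
  }
  return { partSize, partCount: Math.max(1, Math.ceil(objectBytes / partSize)) };
}

console.log(planMultipart(100 * MiB)); // a 100 MiB object
console.log(planMultipart(1024 ** 4)); // a 1 TiB object
```

In application code, the Upload class from @aws-sdk/lib-storage handles splitting, parallel part uploads, and retries; a plan like this is mainly useful for choosing its partSize option.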

CDK Example

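import { Duration, RemovalPolicy, Stack, StackProps } from 'aws-cdk-lib';
import { Bucket, BucketEncryption, BlockPublicAccess, StorageClass } from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

// Constructs are defined inside a Stack constructor, where `this` is the scope.
export class DataStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const dataBucket = new Bucket(this, 'DataBucket', {
      bucketName: 'my-app-data', // bucket names are globally unique
      encryption: BucketEncryption.S3_MANAGED,
      blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
      versioned: true,
      lifecycleRules: [{
        transitions: [
          { storageClass: StorageClass.INFREQUENT_ACCESS, transitionAfter: Duration.days(30) },
          { storageClass: StorageClass.GLACIER_INSTANT_RETRIEVAL, transitionAfter: Duration.days(90) },
        ],
        expiration: Duration.days(365 * 7), // 7 years
        noncurrentVersionExpiration: Duration.days(30), // versioned bucket: also remove old versions
        abortIncompleteMultipartUploadAfter: Duration.days(7),
      }],
      removalPolicy: RemovalPolicy.RETAIN, // keep the bucket if the stack is deleted
    });
  }
}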

Cost Optimization

  • Lifecycle policies are the biggest lever. Moving 1 TB (1,024 GB) from Standard ($0.023/GB) to Glacier Instant Retrieval ($0.004/GB) saves about $19/month.
  • Delete incomplete multipart uploads. They accumulate silently. Add a lifecycle rule: AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }.
  • Use S3 Storage Lens to analyze access patterns across buckets.
  • Avoid unnecessary requests. LIST requests cost $0.005 per 1,000, and each call returns at most 1,000 keys. If you repeatedly list millions of objects, store object metadata in DynamoDB instead.
  • Watch for cross-region transfer. Reading S3 data from a different region is billed as inter-region data transfer, around $0.02/GB for many region pairs.
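
The listing guidance is easiest to see with numbers: one full pass over a large bucket is cheap, but the cost scales with how often it runs. This sketch assumes the $0.005 per 1,000 LIST requests figure above, the 1,000-keys-per-call limit of ListObjectsV2, and 30-day months.

```typescript
const LIST_PRICE_PER_1000 = 0.005; // $ per 1,000 LIST requests (S3 Standard)
const MAX_KEYS_PER_PAGE = 1000; // ListObjectsV2 returns at most 1,000 keys per call

// Request cost of one full listing pass over `objectCount` keys.
function listingPassCost(objectCount: number): number {
  const requests = Math.max(1, Math.ceil(objectCount / MAX_KEYS_PER_PAGE));
  return (requests / 1000) * LIST_PRICE_PER_1000;
}

// Monthly cost if that full pass runs every `intervalMinutes`.
function monthlyListingCost(objectCount: number, intervalMinutes: number): number {
  const passesPerMonth = (30 * 24 * 60) / intervalMinutes;
  return listingPassCost(objectCount) * passesPerMonth;
}

console.log(`10M objects, one pass: $${listingPassCost(10_000_000).toFixed(2)}`);
console.log(`10M objects, every minute: $${monthlyListingCost(10_000_000, 1).toFixed(0)}/month`);
```

A single pass over 10 million keys is about 10,000 requests, a few cents, but repeating it every minute adds up to thousands of dollars a month, before counting the latency of paging through the results. That repeated-scan pattern is what a metadata index avoids.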
