.NET Lambda functions have a reputation for slow cold starts. It's deserved. A typical .NET Lambda function running JIT compilation on x86 will take somewhere between 800ms and 1.2 seconds to cold start at 1024MB of memory. That's fine for background processing, but it's noticeable in API responses.
Two changes can cut that roughly in half: compiling with Native AoT and running on Graviton (ARM64) processors. At Oproto, we've deployed both across our Lambda fleet and the results have been consistent. Cold starts at 1024MB dropped from the 0.9-1.2 second range down to around 500ms.
This post walks through how to set up both, what tradeoffs to expect, and the build pipeline details that the documentation glosses over.
## Why Cold Starts Happen
When a Lambda function receives a request and no warm execution environment is available, AWS has to spin one up. For .NET, that means downloading the deployment package, starting the runtime, JIT-compiling the code paths needed for the first request, and running your initialization logic (dependency injection, client creation, etc.).
The JIT compilation step is the expensive part. The .NET runtime compiles your IL code to native machine code on demand, method by method, as execution flows through your application. That first request pays the cost of compiling every method it touches.
## What Native AoT Changes
Native AoT (Ahead-of-Time) compilation does the machine code generation at build time instead of runtime. The output is a single native binary with no dependency on the .NET runtime. When Lambda loads it, there's no JIT step. The code is already compiled to native instructions for the target architecture.
This eliminates the largest contributor to cold start latency. The tradeoff is that AoT compilation doesn't benefit from 20+ years of JIT optimization. The JIT compiler can make runtime decisions based on actual execution patterns, branch prediction data, and CPU-specific features. AoT compilation makes those decisions once, at build time, with less information.
In practice, this means AoT cold starts are significantly faster, but sustained execution of compute-heavy workloads can be slightly slower than JIT. For most API workloads where requests are short-lived and cold start latency matters, AoT is the clear winner. For long-running Lambda functions doing heavy computation, JIT may still be the better choice.
## What Graviton Changes
Graviton processors are AWS's ARM64-based chips. Lambda functions running on Graviton are priced 20% lower per GB-second than x86 equivalents. AWS reports up to 19% better performance on top of that, though real-world results vary by workload.
The cost savings alone make Graviton worth considering. The performance improvement on top is a bonus. For .NET Lambda functions specifically, the combination of AoT + Graviton produces the best cold start numbers we've seen.
## Cold Start Numbers
Here are representative cold start times from our production Lambda functions at 1024MB memory. These are P50 values from CloudWatch, not synthetic benchmarks:
| Configuration | Cold Start (1024MB, P50) |
|---|---|
| JIT, x86 | ~1.1s |
| Native AoT, Graviton (ARM64) | ~0.4s |
The jump from JIT/x86 to AoT/Graviton cuts cold starts roughly in half. Graviton and AoT each contribute to the improvement independently, but the combination is where the real gains show up. At higher memory allocations the absolute numbers come down further, but the relative improvement stays consistent.
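If you want to pull comparable numbers from your own functions, a CloudWatch Logs Insights query over Lambda's `REPORT` log lines works (a sketch; `@initDuration` is only present on cold start invocations, so filtering on it isolates cold starts):

```
filter @type = "REPORT" and ispresent(@initDuration)
| stats pct(@initDuration, 50) as p50InitDurationMs,
        count(*) as coldStarts
```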
One thing worth understanding about memory configuration: Lambda allocates CPU proportionally to memory, and at 1769MB you cross the threshold to a full vCPU. Above that, you start getting access to additional cores.

JIT benefits significantly from the extra CPU because the .NET runtime can JIT-compile methods on background threads while the main thread runs your initialization code. More CPU means more of that compilation work happens in parallel. AoT doesn't benefit nearly as much from the extra cores during cold start because there's no compilation step to parallelize.

The practical implication is that AoT's advantage over JIT is most pronounced at lower memory configurations, where CPU is scarce and JIT is starved for resources. The gap narrows as you increase memory, but AoT still wins at every level.
## Setting Up Native AoT
### Project Configuration
Add two properties to your `.csproj`, alongside your existing `TargetFramework`:

```xml
<PropertyGroup>
  <TargetFramework>net10.0</TargetFramework>
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```
`InvariantGlobalization` is required because the `provided.al2023` runtime (which AoT uses) doesn't include ICU libraries. Without it, your function will fail to start with a globalization-related error.
### Serialization Changes
This is where most people hit their first wall. JIT .NET can use reflection-based serialization. AoT cannot. All JSON serialization must use source generators.
With JIT, you might get away with:
```csharp
var result = JsonSerializer.Deserialize<MyRequest>(json);
```
With AoT, you need a `JsonSerializerContext` that tells the compiler exactly which types to generate serialization code for:
```csharp
[JsonSerializable(typeof(MyRequest))]
[JsonSerializable(typeof(MyResponse))]
[JsonSerializable(typeof(APIGatewayProxyRequest))]
[JsonSerializable(typeof(APIGatewayProxyResponse))]
public partial class LambdaJsonSerializerContext : JsonSerializerContext
{
}
```
Then configure your Lambda to use it:
```csharp
[assembly: LambdaSerializer(
    typeof(SourceGeneratorLambdaJsonSerializer<LambdaJsonSerializerContext>))]
```
Every type that passes through JSON serialization or deserialization needs to be registered in the context. Miss one and you'll get a runtime error, not a compile-time error. This is the most common source of "it works in JIT but breaks in AoT" issues.
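In handler code, the registered types are used through the context's generated `Default` instance, which exposes one `JsonTypeInfo` property per registered type. A minimal sketch (variable names are illustrative, and `MyRequest`/`MyResponse` are the types registered on the context above):

```csharp
using System.Text.Json;

// Serialize and deserialize through the source-generated metadata
// instead of reflection — safe under AoT because no runtime type
// discovery is involved.
var request = JsonSerializer.Deserialize(
    requestJson, LambdaJsonSerializerContext.Default.MyRequest);

var responseJson = JsonSerializer.Serialize(
    response, LambdaJsonSerializerContext.Default.MyResponse);
```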
### The Bootstrap Binary
When you publish a JIT Lambda function, dotnet publish produces a .dll (or .exe on Windows) that the .NET runtime loads. The Lambda handler points to your assembly name, and the managed runtime takes care of the rest.
AoT publish produces a single native binary named after your project. If your project is `MyService.Api`, the output is a file called `MyService.Api` with no extension. But Lambda's `provided.al2023` custom runtime expects the binary to be named `bootstrap`.
You need to rename it. In a CI/CD pipeline, this is a simple step after publish:
```bash
# After dotnet publish
mv ./publish/MyService.Api ./publish/bootstrap
```
If you forget this step, Lambda will fail to start with a `Runtime.InvalidEntrypoint` error.
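In a pipeline script, the rename and packaging can be wrapped in one small step. This is a sketch: `package_bootstrap` is a hypothetical helper, and the publish output is simulated here so the snippet is self-contained.

```bash
#!/bin/sh
set -e

# Rename the AoT publish output to the entrypoint name that the
# provided.al2023 runtime requires, and make sure it is executable.
package_bootstrap() {
  publish_dir="$1"; project_name="$2"
  mv "$publish_dir/$project_name" "$publish_dir/bootstrap"
  chmod +x "$publish_dir/bootstrap"
}

# Simulated publish output (in a real pipeline, `dotnet publish`
# produces this file):
PUBLISH_DIR="$(mktemp -d)"
printf 'native-binary' > "$PUBLISH_DIR/MyService.Api"

package_bootstrap "$PUBLISH_DIR" "MyService.Api"
```

After this step, zipping the contents of the publish directory gives you the deployment package.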
### Lambda Configuration
When running AoT, the Lambda runtime and handler change:
| Setting | JIT Value | AoT Value |
|---|---|---|
| Runtime | `dotnet10` | `provided.al2023` |
| Handler | `MyService.Api` (assembly name) | `bootstrap` |
| Architecture | `x86_64` or `arm64` | `x86_64` or `arm64` |
In CDK or CloudFormation, set the runtime to `PROVIDED_AL2023` and the handler to `bootstrap` when AoT is enabled.
A useful pattern is to keep the JIT values in your infrastructure code and override them conditionally:
```csharp
// CDK example
var runtime = useNativeAot
    ? Runtime.PROVIDED_AL2023
    : Runtime.DOTNET_10;

var handler = useNativeAot
    ? "bootstrap"
    : "MyService.Api";
```
This makes it easy to toggle back to JIT for debugging without changing multiple files.
## Setting Up Graviton
Switching to Graviton is simpler than AoT. In most cases it's a single configuration change:
```csharp
// CDK
Architecture = Architecture.ARM_64
```
Or in CloudFormation/SAM:
```yaml
Architectures:
  - arm64
```
For JIT functions, that's it. The .NET runtime handles the architecture difference transparently.
For AoT functions, the architecture matters at build time. This is where it gets interesting.
## The Cross-Compilation Problem
Native AoT compiles your .NET code into native machine instructions for a specific architecture. If you build on an x86 machine, you get an x86 binary. If you want an ARM64 binary for Graviton, you need to build on an ARM64 machine.
Cross-compilation from x86 to ARM64 on Linux is technically possible. Microsoft documents a process involving cross-linkers, target-architecture object files, and distribution-specific package configurations. In practice, it's fragile and not well-supported across Linux distributions.
The practical solution is to build on an ARM64 build server. In AWS CodeBuild, this means using a Graviton-based build environment:
```yaml
# CodeBuild environment (project or pipeline configuration)
environment:
  type: ARM_CONTAINER
  compute-type: BUILD_GENERAL1_LARGE
  image: aws/codebuild/amazonlinux2-aarch64-standard:3.0
```
The build server's architecture must match your target Lambda architecture. If you're deploying to Graviton Lambda functions, build on Graviton CodeBuild instances. The pipeline detects the host architecture and uses the correct .NET runtime identifier (linux-arm64) automatically.
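The host-architecture detection can be a small shell step in the build. A sketch, assuming `dotnet publish -r` is invoked later with the result; `detect_rid` is a hypothetical helper, not part of any AWS or .NET tooling:

```bash
#!/bin/sh
# Map the build host's architecture (uname -m) to the .NET runtime
# identifier (RID) passed to `dotnet publish -r`.
detect_rid() {
  case "$1" in
    x86_64)        echo "linux-x64" ;;
    aarch64|arm64) echo "linux-arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

RID="$(detect_rid "$(uname -m)")"
echo "Publishing with runtime identifier: $RID"
# dotnet publish -c Release -r "$RID" -o ./publish
```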
This also means you can't easily test AoT ARM64 builds on your local x86 development machine. Local development and testing should use JIT mode, with AoT reserved for CI/CD and production deployments.
## The Stack Trace Problem
There's one debugging tradeoff with AoT that's worth knowing about before you commit.
In JIT mode, exceptions produce stack traces with method names, file paths, and line numbers. In AoT mode, you still get method names and namespaces, but file paths and line numbers are replaced with memory offsets. A stack trace that would normally read:
```
at MyService.OrderService.CreateOrderAsync(CreateOrderRequest request)
   in OrderService.cs:line 42
```
Instead reads:
```
at MyService.OrderService.<CreateOrderAsync>d__7.MoveNext() + 0x19c
```
You can still identify which method threw the exception and trace the call chain. What you lose is the ability to jump directly to the exact line. For methods with complex logic, that `+ 0x19c` offset means you're reading through the method to find the failure point rather than clicking a line number.
.NET 10 has this limitation, as did .NET 8 before it. The .NET team has indicated that improved AoT stack traces are planned for a future release, but as of today it remains a real operational cost.
For most API workloads, structured logging with correlation IDs and request context gives you enough information to diagnose issues without relying on exact line numbers. But if your debugging workflow depends heavily on stack trace precision, factor this into your decision.
## When to Use AoT vs JIT
AoT is the right choice when:
- Cold start latency matters (API endpoints, synchronous workflows)
- Your function uses standard serialization patterns
- You can build on the target architecture in CI/CD
- Your team can work with less readable stack traces
JIT is still the better choice when:
- The function does heavy computation where runtime optimization matters
- You need reflection-based libraries that don't support AoT
- Debugging with full stack traces is critical
- Cold start latency isn't a concern (async processing, scheduled tasks)
Both can coexist in the same platform. We run AoT on all API-facing Lambda functions and keep JIT for a handful of compute-heavy background processors.
## Useful .NET Lambda Tooling
Two libraries worth mentioning for .NET Lambda development, both fully AoT-compatible:
FluentDynamoDB is a source-generated DynamoDB client for .NET. It uses compile-time code generation instead of reflection for entity mapping, which makes it naturally AoT-safe. No runtime reflection means no trimming warnings and no surprises when you switch from JIT to AoT.
LambdaOpenApi generates OpenAPI documentation from Lambda Annotations using source generators. Like FluentDynamoDB, it avoids reflection entirely, so it works identically in both JIT and AoT modes.
Both are open source and available on NuGet.
## Putting It All Together
The full setup for an AoT + Graviton Lambda function involves:
- Add `<PublishAot>true</PublishAot>` and `<InvariantGlobalization>true</InvariantGlobalization>` to your `.csproj` (targeting `net10.0`)
- Switch all JSON serialization to source-generated `JsonSerializerContext`
- Set the Lambda architecture to `arm64`
- Set the runtime to `provided.al2023` and the handler to `bootstrap`
- Build on an ARM64 (Graviton) build server in your CI/CD pipeline
- Rename the publish output to `bootstrap` before packaging
The result is a Lambda function that cold starts in roughly half the time of a JIT/x86 equivalent, costs 20% less per invocation, and produces a single native binary with no runtime dependencies.
The tradeoffs are real: source-generated serialization is more verbose, cross-compilation doesn't work reliably, and stack traces lose readability. But for API workloads where cold start latency directly affects user experience, the math works out clearly in favor of AoT + Graviton.