Bedrock from .NET
Amazon Bedrock gives you access to foundation models (Claude, Llama, Titan, Mistral, and more) through a unified API. No model hosting, no GPU management, no container orchestration. Call an API, get a response.
From .NET, Bedrock is fairly simple: the SDK provides typed request/response objects, and the Converse API gives you a single interface that works across all models. The hard part isn't calling the API. It's designing your application to handle non-deterministic responses, managing token costs, implementing RAG, and building agents that do useful work.
The Converse API (Recommended)
The Converse API is the unified way to call any Bedrock model. Instead of model-specific payload formats, you use a consistent interface:
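using Amazon.BedrockRuntime;
using Amazon.BedrockRuntime.Model;

// Credentials and region come from the SDK's default provider chain.
var client = new AmazonBedrockRuntimeClient();

var response = await client.ConverseAsync(new ConverseRequest
{
    // Bedrock model IDs carry a version suffix. Some newer models are only callable
    // on demand through an inference profile ID (e.g. the same ID prefixed with "us.").
    ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
    Messages = new List<Message>
    {
        new()
        {
            Role = ConversationRole.User,
            Content = new List<ContentBlock>
            {
                new() { Text = "Explain the single-table design pattern in DynamoDB in 3 sentences." },
            },
        },
    },
    InferenceConfig = new InferenceConfiguration
    {
        MaxTokens = 500,
        Temperature = 0.3f,
    },
});

// The reply is a list of content blocks; a plain text answer is the first block.
var reply = response.Output.Message.Content[0].Text;
Console.WriteLine(reply);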
Streaming Responses
For user-facing applications, stream the response token-by-token instead of waiting for the full response:
var request = new ConverseStreamRequest
{
    ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
    Messages = new List<Message>
    {
        new()
        {
            Role = ConversationRole.User,
            Content = new List<ContentBlock>
            {
                new() { Text = prompt },
            },
        },
    },
    InferenceConfig = new InferenceConfiguration
    {
        MaxTokens = 2048,
        Temperature = 0.7f,
    },
};

var response = await client.ConverseStreamAsync(request);

await foreach (var item in response.Stream.AsAsyncEnumerable())
{
    if (item is ContentBlockDeltaEvent deltaEvent)
    {
        // Each delta carries the next fragment of generated text
        Console.Write(deltaEvent.Delta.Text);
    }
    else if (item is MessageStopEvent)
    {
        Console.WriteLine(); // end of response
    }
}
Streaming to an HTTP response (ASP.NET Core)
[HttpPost("chat")]
public async Task Chat([FromBody] ChatRequest request)
{
    Response.ContentType = "text/event-stream";

    var bedrockRequest = new ConverseStreamRequest
    {
        ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
        Messages = BuildMessages(request),
        InferenceConfig = new InferenceConfiguration { MaxTokens = 2048 },
    };

    // Passing RequestAborted stops the Bedrock call if the client disconnects.
    var response = await _bedrockClient.ConverseStreamAsync(bedrockRequest, HttpContext.RequestAborted);

    await foreach (var item in response.Stream.AsAsyncEnumerable())
    {
        if (item is ContentBlockDeltaEvent delta)
        {
            // JSON-encode each chunk (System.Text.Json): a raw newline in the model
            // output would otherwise end the SSE event early.
            await Response.WriteAsync($"data: {JsonSerializer.Serialize(delta.Delta.Text)}\n\n");
            await Response.Body.FlushAsync();
        }
    }

    await Response.WriteAsync("data: [DONE]\n\n");
}
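Model output regularly contains newlines, and in Server-Sent Events a blank line terminates an event, so whatever follows `data:` must not carry raw line breaks. One option is to encode each chunk (as JSON, say). Another, sketched here, follows the SSE framing rule of one `data:` line per payload line. `FormatSseEvent` is a hypothetical helper, not part of ASP.NET Core or the AWS SDK.

```csharp
using System;
using System.Text;

// Formats one Server-Sent Events message. Every payload line gets its own
// "data: " prefix, and a blank line terminates the event.
static string FormatSseEvent(string payload)
{
    var sb = new StringBuilder();
    // Normalize CRLF and lone CR so every line break is a single '\n'.
    var lines = payload.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');
    foreach (var line in lines)
    {
        sb.Append("data: ").Append(line).Append('\n');
    }
    sb.Append('\n'); // blank line = end of event
    return sb.ToString();
}

Console.Write(FormatSseEvent("first line\nsecond line"));
```

A browser `EventSource` joins the `data:` lines of one event back together with newlines, so the client sees the chunk as the model produced it.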
System Prompts and Conversation History
var request = new ConverseRequest
{
    ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
    // The system prompt is passed separately from the conversation turns
    System = new List<SystemContentBlock>
    {
        new() { Text = "You are a helpful assistant for an e-commerce platform. Be concise. Only answer questions about orders, products, and shipping." },
    },
    // The API is stateless: send the full history, alternating user and assistant turns
    Messages = new List<Message>
    {
        new() { Role = ConversationRole.User, Content = new List<ContentBlock> { new() { Text = "Where's my order?" } } },
        new() { Role = ConversationRole.Assistant, Content = new List<ContentBlock> { new() { Text = "I'd be happy to help. Could you provide your order number?" } } },
        new() { Role = ConversationRole.User, Content = new List<ContentBlock> { new() { Text = "ORD-12345" } } },
    },
    InferenceConfig = new InferenceConfiguration { MaxTokens = 500 },
};
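Because the full history is resent on every call, input tokens grow with each turn. A minimal sketch of one way to cap that: keep only the newest turns that fit a budget. The names `EstimateTokens` and `TrimHistory` are hypothetical, turns are modeled as plain tuples rather than SDK `Message` objects, and the four-characters-per-token figure is a rough assumption, not a real tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Rough heuristic, not a real tokenizer: assume about 4 characters per token.
static int EstimateTokens(string text) => (text.Length + 3) / 4;

// Keeps the newest turns whose estimated size fits the budget, in original order.
static List<(string Role, string Text)> TrimHistory(
    IReadOnlyList<(string Role, string Text)> history, int maxTokens)
{
    var kept = new List<(string Role, string Text)>();
    var used = 0;
    for (var i = history.Count - 1; i >= 0; i--)
    {
        var cost = EstimateTokens(history[i].Text);
        if (used + cost > maxTokens) break;
        used += cost;
        kept.Add(history[i]);
    }
    kept.Reverse();
    // A Converse conversation has to start with a user turn, so drop any
    // assistant turn left dangling at the front.
    while (kept.Count > 0 && kept[0].Role != "user") kept.RemoveAt(0);
    return kept;
}

var history = new List<(string Role, string Text)>
{
    ("user", "Where's my order?"),
    ("assistant", "I'd be happy to help. Could you provide your order number?"),
    ("user", "ORD-12345"),
};
foreach (var turn in TrimHistory(history, maxTokens: 20))
{
    Console.WriteLine($"{turn.Role}: {turn.Text}");
}
```

Dropping old turns loses context, so longer conversations may be better served by summarizing the older turns instead; that choice depends on the application.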
Tool Use (Function Calling)
Let the model call your functions to retrieve data or take actions:
// Requires: using Amazon.Runtime.Documents;
var tools = new List<Tool>
{
    new()
    {
        ToolSpec = new ToolSpecification
        {
            Name = "get_order_status",
            Description = "Look up the current status of a customer order",
            InputSchema = new ToolInputSchema
            {
                // ToolInputSchema.Json is an Amazon.Runtime.Documents.Document, not a
                // JsonDocument, so the JSON Schema is built with Document.FromObject
                Json = Document.FromObject(new
                {
                    type = "object",
                    properties = new
                    {
                        order_id = new
                        {
                            type = "string",
                            description = "The order ID (e.g., ORD-12345)",
                        },
                    },
                    required = new[] { "order_id" },
                }),
            },
        },
    },
};

var response = await client.ConverseAsync(new ConverseRequest
{
    ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
    Messages = messages,
    ToolConfig = new ToolConfiguration { Tools = tools },
});
// Check if the model wants to call a tool
if (response.StopReason == StopReason.ToolUse)
{
    var toolUse = response.Output.Message.Content
        .First(c => c.ToolUse != null).ToolUse;

    if (toolUse.Name == "get_order_status")
    {
        // ToolUseBlock.Input is an Amazon.Runtime.Documents.Document, not a JsonDocument
        var orderId = toolUse.Input.AsDictionary()["order_id"].AsString();
        var status = await _orderService.GetStatusAsync(orderId);

        // Send the tool result back: first the assistant turn that requested the tool,
        // then a user turn carrying the result
        messages.Add(response.Output.Message);
        messages.Add(new Message
        {
            Role = ConversationRole.User,
            Content = new List<ContentBlock>
            {
                new()
                {
                    ToolResult = new ToolResultBlock
                    {
                        // Must match the ID from the model's tool request
                        ToolUseId = toolUse.ToolUseId,
                        Content = new List<ToolResultContentBlock>
                        {
                            new() { Text = JsonSerializer.Serialize(status) },
                        },
                    },
                },
            },
        });

        // Get the final response
        var finalResponse = await client.ConverseAsync(new ConverseRequest
        {
            ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
            Messages = messages,
            ToolConfig = new ToolConfiguration { Tools = tools },
        });
    }
}
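With more than one tool, the `if (toolUse.Name == ...)` check tends to grow into a long chain. One possible alternative is a lookup table from tool name to handler, sketched below without the AWS SDK so it runs on its own. The `toolHandlers` table, `DispatchToolAsync`, the string-dictionary input shape, and the canned "shipped" status are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical handler table: tool name -> function from tool input to a JSON result.
var toolHandlers = new Dictionary<string, Func<IReadOnlyDictionary<string, string>, Task<string>>>
{
    // Stand-in for a real order service lookup.
    ["get_order_status"] = input =>
        Task.FromResult(JsonSerializer.Serialize(new { orderId = input["order_id"], status = "shipped" })),
};

// Runs the named tool. An unknown name yields an error payload the model can read,
// rather than an exception that aborts the conversation.
async Task<string> DispatchToolAsync(string name, IReadOnlyDictionary<string, string> input)
{
    if (!toolHandlers.TryGetValue(name, out var handler))
    {
        return JsonSerializer.Serialize(new { error = $"Unknown tool: {name}" });
    }
    return await handler(input);
}

var result = await DispatchToolAsync(
    "get_order_status",
    new Dictionary<string, string> { ["order_id"] = "ORD-12345" });
Console.WriteLine(result);
```

The returned string would go into the `ToolResultBlock` text, so adding a tool means adding one table entry plus its `ToolSpecification`.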
RAG Pattern (Retrieval-Augmented Generation)
The most common production pattern: retrieve relevant documents, stuff them into context, then ask the model to answer based on that context.
public async Task<string> AnswerWithContextAsync(string question, string userId)
{
    // 1. Retrieve relevant documents (from OpenSearch, Kendra, or your own vector store)
    var relevantDocs = await _searchService.SearchAsync(question, maxResults: 5);

    // 2. Build context from retrieved documents
    var context = string.Join("\n---\n", relevantDocs.Select(d => d.Content));

    // 3. Ask the model with context
    var response = await _bedrockClient.ConverseAsync(new ConverseRequest
    {
        ModelId = "anthropic.claude-sonnet-4-20250514-v1:0",
        System = new List<SystemContentBlock>
        {
            new() { Text = "Answer the user's question based only on the provided context. If the context doesn't contain the answer, say so." },
        },
        Messages = new List<Message>
        {
            new()
            {
                Role = ConversationRole.User,
                Content = new List<ContentBlock>
                {
                    new() { Text = $"Context:\n{context}\n\nQuestion: {question}" },
                },
            },
        },
        // Low temperature keeps the answer close to the retrieved text
        InferenceConfig = new InferenceConfiguration { MaxTokens = 1000, Temperature = 0.2f },
    });

    return response.Output.Message.Content[0].Text;
}
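Joining every retrieved document into the prompt makes the input size, and therefore the cost, depend on whatever the search returns. A minimal sketch of bounding it: pack documents in relevance order until a character budget is reached, and label each one so the model can refer to its source. `BuildContext` and the character budget are assumptions for illustration; a token-based budget would be more precise.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Packs documents (assumed sorted by relevance) into one context string,
// stopping before the character budget would be exceeded.
static string BuildContext(IEnumerable<string> docs, int maxChars)
{
    const string separator = "\n---\n";
    var sb = new StringBuilder();
    var index = 1;
    foreach (var doc in docs)
    {
        var entry = $"[Document {index}]\n{doc}";
        var extra = sb.Length == 0 ? entry.Length : separator.Length + entry.Length;
        if (sb.Length + extra > maxChars) break;
        if (sb.Length > 0) sb.Append(separator);
        sb.Append(entry);
        index++;
    }
    return sb.ToString();
}

var context = BuildContext(
    new[] { "Orders ship within 2 business days.", "Returns are accepted for 30 days." },
    maxChars: 200);
Console.WriteLine(context);
```

An empty result means nothing fit the budget, which is worth handling explicitly rather than sending the model an empty context.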
CDK Setup (C#)
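using Amazon.CDK;
using Amazon.CDK.AWS.IAM;
using Amazon.CDK.AWS.Bedrock;

// Grant Lambda access to invoke Bedrock models.
// Converse is authorized by bedrock:InvokeModel, ConverseStream by
// bedrock:InvokeModelWithResponseStream.
lambdaFunction.AddToRolePolicy(new PolicyStatement(new PolicyStatementProps
{
    Actions = new[]
    {
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
    },
    // Verify model IDs against the Bedrock model catalog for your region;
    // they normally carry a version suffix such as -v1:0.
    Resources = new[]
    {
        $"arn:aws:bedrock:{Aws.REGION}::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
        $"arn:aws:bedrock:{Aws.REGION}::foundation-model/anthropic.claude-haiku-4-20250514",
    },
}));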
Cost Management
Bedrock charges per token (input and output). In a production application, you need guardrails:
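public class BedrockCostGuard
{
    private const int MaxInputTokensPerRequest = 4000;
    private const decimal MaxDailyCostPerUser = 1.00m; // $1/user/day

    // IUsageTracker is a placeholder for your own per-user usage store.
    private readonly IUsageTracker _usageTracker;

    public BedrockCostGuard(IUsageTracker usageTracker) => _usageTracker = usageTracker;

    public async Task<bool> CanInvokeAsync(string userId)
    {
        var todayUsage = await _usageTracker.GetDailyUsageAsync(userId);
        return todayUsage.EstimatedCost < MaxDailyCostPerUser;
    }
}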
Rough pricing (Claude Sonnet):
- Input: ~$3 per million tokens
- Output: ~$15 per million tokens
A typical question-answer interaction (500 token prompt + 300 token response) costs ~$0.006. RAG with large context windows gets expensive fast. 4000 tokens of context + response is ~$0.02 per query.
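The arithmetic behind those figures is simple enough to encode. The sketch below hard-codes the rough Sonnet prices quoted above as an assumption (check current pricing before relying on them); `EstimateCost` is a hypothetical helper.

```csharp
using System;

// Estimates the USD cost of one call from its token counts.
static decimal EstimateCost(int inputTokens, int outputTokens)
{
    // Rough prices in USD per million tokens, as quoted in this article.
    const decimal inputPricePerMillion = 3m;
    const decimal outputPricePerMillion = 15m;
    return (inputTokens * inputPricePerMillion + outputTokens * outputPricePerMillion) / 1_000_000m;
}

Console.WriteLine(EstimateCost(500, 300));   // the question-answer example
Console.WriteLine(EstimateCost(4000, 500));  // a RAG query with a 500-token answer
```

The first call works out to 500 × $3/M + 300 × $15/M = $0.0015 + $0.0045 = $0.006. The second is $0.012 + $0.0075 = $0.0195, in line with the ~$0.02 figure. Output tokens cost five times as much as input tokens, so capping `MaxTokens` matters as much as trimming context.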
Tips
- Use the Converse API, not the legacy InvokeModel API. Converse works across all models with the same interface and supports tool use natively.
- Stream for user-facing responses. Waiting 3-5 seconds for a full response feels broken. Streaming the first token in 200ms feels responsive.
- Temperature matters. Use low temperature (0.1-0.3) for factual/structured tasks, higher (0.7-1.0) for creative tasks.
- Implement retry with backoff. Bedrock throttles at high concurrency. Use exponential backoff with jitter.
- Log everything. Token counts, latency, model responses. You need this for cost tracking and debugging hallucinations.
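- Bedrock Guardrails can filter harmful content, PII, and off-topic responses without custom code. Guardrails are standalone resources, not per-model settings: create one in the Bedrock console, then attach it to each request through the Converse API's GuardrailConfig (guardrail ID and version).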
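The retry tip can be sketched as follows. The AWS SDK for .NET already retries some throttling errors on its own, so treat this as an application-level layer for cases where that is not enough. `BackoffDelay` and `RetryAsync` are hypothetical helpers, and the 200 ms base and 10 s cap are arbitrary starting values.

```csharp
using System;
using System.Threading.Tasks;

// "Full jitter": a random delay between 0 and min(cap, base * 2^attempt).
static TimeSpan BackoffDelay(int attempt, Random rng, double baseMs = 200, double capMs = 10_000)
{
    var ceiling = Math.Min(capMs, baseMs * Math.Pow(2, attempt));
    return TimeSpan.FromMilliseconds(rng.NextDouble() * ceiling);
}

// Retries the operation while shouldRetry(exception) is true, up to maxAttempts total tries.
// When the attempts run out, the last exception propagates to the caller.
static async Task<T> RetryAsync<T>(
    Func<Task<T>> operation, Func<Exception, bool> shouldRetry, int maxAttempts = 5)
{
    var rng = new Random();
    for (var attempt = 0; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (Exception ex) when (attempt < maxAttempts - 1 && shouldRetry(ex))
        {
            await Task.Delay(BackoffDelay(attempt, rng));
        }
    }
}

// Simulated flaky call: fails twice, then succeeds.
var tries = 0;
var answer = await RetryAsync(
    () => ++tries < 3
        ? Task.FromException<string>(new TimeoutException("throttled"))
        : Task.FromResult("ok"),
    ex => ex is TimeoutException);
Console.WriteLine($"{answer} after {tries} tries");
```

Against Bedrock, the predicate would typically match the SDK's `ThrottlingException` rather than `TimeoutException`. The jitter spreads out retries so that many throttled clients do not all come back at the same instant.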