Using Amazon Bedrock with .NET

Bedrock from .NET

Amazon Bedrock gives you access to foundation models (Claude, Llama, Titan, Mistral, and more) through a unified API. No model hosting, no GPU management, no container orchestration. Call an API, get a response.

From .NET, Bedrock is fairly simple: the SDK (the AWSSDK.BedrockRuntime NuGet package) provides typed request/response objects, and the Converse API gives you a single interface that works across most chat-capable models. The hard part isn't calling the API. It's designing your application to handle non-deterministic responses, manage token costs, implement RAG, and build agents that do useful work.

The Converse API (Recommended)

The Converse API is the unified way to call Bedrock's chat-capable models. Instead of model-specific payload formats, you use one consistent request and response shape:

using Amazon.BedrockRuntime;
using Amazon.BedrockRuntime.Model;

var client = new AmazonBedrockRuntimeClient(); // credentials and Region come from the default AWS configuration chain

var response = await client.ConverseAsync(new ConverseRequest
{
    ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0", // cross-region inference profile; the prefix (us., eu., apac.) follows your Region
    Messages = new List<Message>
    {
        new()
        {
            Role = ConversationRole.User,
            Content = new List<ContentBlock>
            {
                new() { Text = "Explain the single-table design pattern in DynamoDB in 3 sentences." },
            },
        },
    },
    InferenceConfig = new InferenceConfiguration
    {
        MaxTokens = 500,
        Temperature = 0.3f,
    },
});

var reply = response.Output.Message.Content[0].Text;
Console.WriteLine(reply);

Streaming Responses

For user-facing applications, stream the response token-by-token instead of waiting for the full response:

var request = new ConverseStreamRequest
{
    ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
    Messages = new List<Message>
    {
        new()
        {
            Role = ConversationRole.User,
            Content = new List<ContentBlock>
            {
                new() { Text = prompt },
            },
        },
    },
    InferenceConfig = new InferenceConfiguration
    {
        MaxTokens = 2048,
        Temperature = 0.7f,
    },
};

var response = await client.ConverseStreamAsync(request);

await foreach (var item in response.Stream.AsAsyncEnumerable())
{
    if (item is ContentBlockDeltaEvent deltaEvent)
    {
        Console.Write(deltaEvent.Delta.Text);
    }
    else if (item is MessageStopEvent)
    {
        Console.WriteLine(); // end of response
    }
}

Streaming to an HTTP response (ASP.NET Core)

[HttpPost("chat")]
public async Task Chat([FromBody] ChatRequest request)
{
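    Response.ContentType = "text/event-stream";
    Response.Headers["Cache-Control"] = "no-cache"; // tell proxies and browsers not to cache the stream

    var bedrockRequest = new ConverseStreamRequest
    {
        ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
        Messages = BuildMessages(request),
        InferenceConfig = new InferenceConfiguration { MaxTokens = 2048 },
    };

    // Stop work when the browser disconnects
    var ct = HttpContext.RequestAborted;
    var response = await _bedrockClient.ConverseStreamAsync(bedrockRequest, ct);

    await foreach (var item in response.Stream.AsAsyncEnumerable())
    {
        if (item is ContentBlockDeltaEvent delta && delta.Delta.Text is { } text)
        {
            // JSON-encode each chunk: a raw newline in the text would end the SSE "data:" line early.
            // The client JSON-parses every data payload except the final [DONE] marker.
            await Response.WriteAsync($"data: {System.Text.Json.JsonSerializer.Serialize(text)}\n\n", ct);
            await Response.Body.FlushAsync(ct);
        }
    }

    await Response.WriteAsync("data: [DONE]\n\n", ct);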
}
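Model output often contains newlines, and in the SSE format a newline inside a `data:` line ends that field, so writing `data: {text}` verbatim silently drops whatever follows the newline. Per the server-sent events format in the HTML specification, a multi-line payload is sent as one `data:` line per line of text, and the browser's `EventSource` rejoins them with `\n`. A minimal sketch of that framing as a pure function; `SseFormatter` is a hypothetical helper, not an ASP.NET Core API.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of ASP.NET Core): frames a payload as one SSE event.
public static class SseFormatter
{
    public static string Format(string payload)
    {
        var sb = new StringBuilder();

        // Normalize CRLF and lone CR to LF, then emit one "data:" field per line
        var normalized = payload.Replace("\r\n", "\n").Replace('\r', '\n');
        foreach (var line in normalized.Split('\n'))
        {
            sb.Append("data: ").Append(line).Append('\n');
        }

        // A blank line terminates the event
        sb.Append('\n');
        return sb.ToString();
    }
}
```

Inside the streaming loop this would be used as `await Response.WriteAsync(SseFormatter.Format(text));`. The client then receives plain text rather than JSON, which keeps the browser side simple at the cost of not being able to attach structured metadata to each chunk.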

System Prompts and Conversation History

var request = new ConverseRequest
{
    ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
    System = new List<SystemContentBlock>
    {
        new() { Text = "You are a helpful assistant for an e-commerce platform. Be concise. Only answer questions about orders, products, and shipping." },
    },
    Messages = new List<Message>
    {
        new() { Role = ConversationRole.User, Content = new List<ContentBlock> { new() { Text = "Where's my order?" } } },
        new() { Role = ConversationRole.Assistant, Content = new List<ContentBlock> { new() { Text = "I'd be happy to help. Could you provide your order number?" } } },
        new() { Role = ConversationRole.User, Content = new List<ContentBlock> { new() { Text = "ORD-12345" } } },
    },
    InferenceConfig = new InferenceConfiguration { MaxTokens = 500 },
};
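The Converse API keeps no server-side session: each call must resend the whole history, so input tokens (and cost) grow with every turn. A minimal sketch of a sliding window that keeps only the newest turns, assuming a hypothetical `ChatTurn` record held in your own session store and mapped to `Message` objects before each call. After cutting, it drops any leading assistant turns, since Anthropic models, for example, expect the first message to come from the user.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical app-side record for one turn; Role is "user" or "assistant"
public sealed record ChatTurn(string Role, string Text);

public static class HistoryWindow
{
    // Keep at most maxTurns of the most recent turns, starting on a user turn
    public static List<ChatTurn> Trim(IReadOnlyList<ChatTurn> history, int maxTurns)
    {
        if (maxTurns <= 0) throw new ArgumentOutOfRangeException(nameof(maxTurns));

        var window = history.Skip(Math.Max(0, history.Count - maxTurns)).ToList();

        // The first message sent must come from the user, so drop leading assistant turns
        while (window.Count > 0 && window[0].Role != "user")
        {
            window.RemoveAt(0);
        }
        return window;
    }
}
```

Counting turns is a crude proxy for size; trimming against an estimated token budget, or summarizing older turns into the system prompt, keeps more useful context for the same cost.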

Tool Use (Function Calling)

Let the model call your functions to retrieve data or take actions:

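using System.Text.Json;
using Amazon.Runtime.Documents; // Document: the SDK's JSON-like type for tool schemas and inputs

var tools = new List<Tool>
{
    new()
    {
        ToolSpec = new ToolSpecification
        {
            Name = "get_order_status",
            Description = "Look up the current status of a customer order",
            InputSchema = new ToolInputSchema
            {
                // Json is an Amazon.Runtime.Documents.Document, not a System.Text.Json JsonDocument
                Json = Document.FromObject(new
                {
                    type = "object",
                    properties = new
                    {
                        order_id = new
                        {
                            type = "string",
                            description = "The order ID (e.g., ORD-12345)",
                        },
                    },
                    required = new[] { "order_id" },
                }),
            },
        },
    },
};

var messages = new List<Message>
{
    new()
    {
        Role = ConversationRole.User,
        Content = new List<ContentBlock> { new() { Text = "What's the status of ORD-12345?" } },
    },
};

var response = await client.ConverseAsync(new ConverseRequest
{
    ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
    Messages = messages,
    ToolConfig = new ToolConfiguration { Tools = tools },
});

// Check if the model wants to call a tool
if (response.StopReason == StopReason.Tool_use)
{
    // Keep the assistant turn (with its toolUse blocks) in the history
    messages.Add(response.Output.Message);

    // The model may request several tools in one turn; every toolUse block needs a toolResult
    var toolResults = new List<ContentBlock>();
    foreach (var toolUse in response.Output.Message.Content
                 .Where(c => c.ToolUse != null)
                 .Select(c => c.ToolUse))
    {
        string resultText;
        if (toolUse.Name == "get_order_status")
        {
            var orderId = toolUse.Input.AsDictionary()["order_id"].AsString();
            var status = await _orderService.GetStatusAsync(orderId);
            resultText = JsonSerializer.Serialize(status);
        }
        else
        {
            resultText = $"Error: unknown tool '{toolUse.Name}'";
        }

        toolResults.Add(new ContentBlock
        {
            ToolResult = new ToolResultBlock
            {
                ToolUseId = toolUse.ToolUseId,
                Content = new List<ToolResultContentBlock> { new() { Text = resultText } },
            },
        });
    }

    // Send all tool results back in a single user message
    messages.Add(new Message { Role = ConversationRole.User, Content = toolResults });

    // Get the final response (production code loops while StopReason is still tool_use)
    var finalResponse = await client.ConverseAsync(new ConverseRequest
    {
        ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
        Messages = messages,
        ToolConfig = new ToolConfiguration { Tools = tools },
    });
    Console.WriteLine(finalResponse.Output.Message.Content[0].Text);
}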
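With more than one tool, an `if (toolUse.Name == ...)` chain grows quickly. One common alternative, sketched here with hypothetical names, is a registry that maps tool names to async handlers. It assumes string-valued arguments, which the caller would extract from `toolUse.Input.AsDictionary()`. Unknown names and handler failures come back as error text rather than exceptions, so every toolUse block still gets a result the model can react to.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical registry: tool name -> async handler over string arguments
public sealed class ToolRegistry
{
    private readonly Dictionary<string, Func<IReadOnlyDictionary<string, string>, Task<string>>> _handlers = new();

    public ToolRegistry Register(string name, Func<IReadOnlyDictionary<string, string>, Task<string>> handler)
    {
        _handlers[name] = handler;
        return this; // allow chained registration
    }

    // Returns the tool's result text, or an error message the model can read
    public async Task<string> InvokeAsync(string name, IReadOnlyDictionary<string, string> args)
    {
        if (!_handlers.TryGetValue(name, out var handler))
            return $"Error: unknown tool '{name}'";

        try
        {
            return await handler(args);
        }
        catch (Exception ex)
        {
            return $"Error: {ex.Message}";
        }
    }
}
```

Returning errors as text lets the model apologize or retry with corrected arguments. The Converse API also lets you mark a tool result as an error, which some models use as an extra signal.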

RAG Pattern (Retrieval-Augmented Generation)

The most common production pattern: retrieve relevant documents, stuff them into context, then ask the model to answer based on that context.

public async Task<string> AnswerWithContextAsync(string question, string userId)
{
    // 1. Retrieve relevant documents (from OpenSearch, Kendra, or your own vector store)
    var relevantDocs = await _searchService.SearchAsync(question, maxResults: 5);
    
    // 2. Build context from retrieved documents
    var context = string.Join("\n---\n", relevantDocs.Select(d => d.Content));
    
    // 3. Ask the model with context
    var response = await _bedrockClient.ConverseAsync(new ConverseRequest
    {
        ModelId = "us.anthropic.claude-sonnet-4-20250514-v1:0",
        System = new List<SystemContentBlock>
        {
            new() { Text = "Answer the user's question based only on the provided context. If the context doesn't contain the answer, say so." },
        },
        Messages = new List<Message>
        {
            new()
            {
                Role = ConversationRole.User,
                Content = new List<ContentBlock>
                {
                    new() { Text = $"Context:\n{context}\n\nQuestion: {question}" },
                },
            },
        },
        InferenceConfig = new InferenceConfiguration { MaxTokens = 1000, Temperature = 0.2f },
    });
    
    return response.Output.Message.Content[0].Text;
}
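Joining every retrieved document can overflow the context window and inflates input-token cost on every query. A minimal sketch that packs documents in relevance order until an approximate token budget is spent, wrapping each in an XML-style tag with its source so the model can tell documents apart and cite them. `RetrievedDoc` is a hypothetical stand-in for whatever your search service returns, and the four-characters-per-token estimate is a rough heuristic (an assumption); exact counts only arrive afterwards in the response's `Usage` property.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical search hit, assumed already sorted by relevance
public sealed record RetrievedDoc(string Source, string Content);

public static class ContextBuilder
{
    // Rough heuristic (assumption): about 4 characters per token for English text
    public static int EstimateTokens(string text) => (text.Length + 3) / 4;

    // Append documents in order until the next one would exceed the token budget
    public static string Build(IEnumerable<RetrievedDoc> docs, int maxTokens)
    {
        var sb = new StringBuilder();
        var used = 0;
        foreach (var doc in docs)
        {
            var block = $"<document source=\"{doc.Source}\">\n{doc.Content}\n</document>\n";
            var cost = EstimateTokens(block);
            if (used + cost > maxTokens) break;
            sb.Append(block);
            used += cost;
        }
        return sb.ToString();
    }
}
```

Stopping at the first document that does not fit preserves relevance order; skipping it and trying smaller, less relevant documents would fill the budget more tightly but can crowd out better evidence.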

CDK Setup (C#)

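using Amazon.CDK;
using Amazon.CDK.AWS.IAM;

// Grant Lambda access to invoke Bedrock models.
// Converse needs bedrock:InvokeModel; ConverseStream needs bedrock:InvokeModelWithResponseStream.
lambdaFunction.AddToRolePolicy(new PolicyStatement(new PolicyStatementProps
{
    Actions = new[]
    {
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
    },
    Resources = new[]
    {
        // The cross-region inference profile the application calls...
        $"arn:aws:bedrock:{Aws.REGION}:{Aws.ACCOUNT_ID}:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0",
        // ...and the underlying model in every Region the profile can route to
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
        // A cheaper model for lightweight tasks
        $"arn:aws:bedrock:{Aws.REGION}::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0",
    },
}));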

Cost Management

Bedrock charges per token (input and output). In a production application, you need spending limits:

public class BedrockCostGuard
{
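    private const int MaxInputTokensPerRequest = 4000;
    private const decimal MaxDailyCostPerUser = 1.00m; // $1/user/day

    private readonly IUsageTracker _usageTracker; // your own per-user usage store

    public BedrockCostGuard(IUsageTracker usageTracker) => _usageTracker = usageTracker;

    public async Task<bool> CanInvokeAsync(string userId, int estimatedInputTokens)
    {
        // Reject oversized prompts before they cost anything
        if (estimatedInputTokens > MaxInputTokensPerRequest)
            return false;

        var todayUsage = await _usageTracker.GetDailyUsageAsync(userId);
        return todayUsage.EstimatedCost < MaxDailyCostPerUser;
    }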
}
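The guard relies on some per-user usage store behind `_usageTracker`. A minimal in-memory stand-in might look like this; it suits tests or a single process only, since multiple Lambda instances or servers would need a shared store such as DynamoDB. The class and method names are hypothetical, and it is synchronous, unlike the async `GetDailyUsageAsync` the guard calls.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory usage store: accumulates estimated USD cost per user per day.
// The caller chooses the day (e.g. the current UTC date). Valid within one process only.
public sealed class InMemoryUsageTracker
{
    private readonly ConcurrentDictionary<(string UserId, DateOnly Day), decimal> _costs = new();

    // Add the cost of a completed call to the user's running total for that day
    public void Record(string userId, DateOnly day, decimal cost) =>
        _costs.AddOrUpdate((userId, day), cost, (_, total) => total + cost);

    public decimal GetDailyCost(string userId, DateOnly day) =>
        _costs.TryGetValue((userId, day), out var total) ? total : 0m;

    // Same rule as the guard: allow calls while spend is below the cap
    public bool IsUnderCap(string userId, DateOnly day, decimal cap) =>
        GetDailyCost(userId, day) < cap;
}
```

Because the check happens before a call and the cost is recorded after it, concurrent requests can push a user slightly over the cap. That is usually acceptable for a soft budget; a hard limit would need to reserve an estimated cost up front.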

Rough on-demand pricing (Claude Sonnet 4; check the Bedrock pricing page for current rates in your Region):

Input: ~$3 per million tokens
Output: ~$15 per million tokens

A typical question-and-answer exchange (500-token prompt + 300-token response) costs about $0.006. RAG gets expensive fast because retrieved context is billed as input on every query: 4,000 tokens of context plus a 500-token response comes to about $0.02 per query.
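The arithmetic above as a small function, handy for logging a per-request estimate. The default prices are the rough Claude Sonnet rates quoted above and are parameters because rates differ by model and Region; in practice you would pass the token counts that `ConverseResponse` reports in its `Usage` property.

```csharp
using System;

public static class BedrockCost
{
    // Estimated USD cost of one call. Defaults assume the rough rates quoted in the text:
    // $3 per million input tokens and $15 per million output tokens.
    public static decimal Estimate(
        int inputTokens,
        int outputTokens,
        decimal inputPricePerMillion = 3m,
        decimal outputPricePerMillion = 15m)
    {
        if (inputTokens < 0 || outputTokens < 0)
            throw new ArgumentOutOfRangeException(nameof(inputTokens), "Token counts must be non-negative.");

        return (inputTokens * inputPricePerMillion + outputTokens * outputPricePerMillion) / 1_000_000m;
    }
}
```

`decimal` keeps the sub-cent amounts exact, so `Estimate(500, 300)` yields 0.006 and `Estimate(4000, 500)` yields 0.0195, matching the figures above.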

Tips

Use the Converse API, not the model-specific InvokeModel API. Converse gives you the same request shape across models and supports tool use natively for models that offer it.
Stream for user-facing responses. Waiting several seconds for a full response feels broken; seeing the first tokens almost immediately feels responsive.
Temperature matters. Use low temperature (0.1-0.3) for factual/structured tasks, higher (0.7-1.0) for creative tasks.
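Implement retry with backoff. Bedrock throttles at high concurrency (ThrottlingException). The SDK already retries throttled calls, and you can tune that with MaxErrorRetry and RetryMode on AmazonBedrockRuntimeConfig; for longer workflows, add your own exponential backoff with jitter.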
Log everything. Token counts, latency, model responses. You need this for cost tracking and debugging hallucinations.
Bedrock Guardrails can filter harmful content, PII, and off-topic responses without custom code. Create a guardrail in the Bedrock console, then apply it to requests through the GuardrailConfig property of ConverseRequest.
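For the retry tip, a minimal sketch of exponential backoff with "full jitter": before retry n, wait a random time between zero and min(cap, base × 2^n). The `isRetryable` predicate is where you would match Bedrock's `ThrottlingException`. The method name and default delays are assumptions, and `Random.Shared` requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Retry
{
    // Retries operation on retryable exceptions, up to maxAttempts total attempts.
    // Delay before retry n is uniform in [0, min(maxDelay, baseDelay * 2^n)).
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isRetryable,
        int maxAttempts = 5,
        TimeSpan? baseDelay = null,
        TimeSpan? maxDelay = null,
        CancellationToken ct = default)
    {
        var baseMs = (baseDelay ?? TimeSpan.FromMilliseconds(200)).TotalMilliseconds;
        var capMs = (maxDelay ?? TimeSpan.FromSeconds(10)).TotalMilliseconds;

        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            // On the last attempt, or for non-retryable errors, the exception propagates
            catch (Exception ex) when (attempt < maxAttempts - 1 && isRetryable(ex))
            {
                var ceilingMs = Math.Min(capMs, baseMs * Math.Pow(2, attempt));
                var delayMs = Random.Shared.NextDouble() * ceilingMs;
                await Task.Delay(TimeSpan.FromMilliseconds(delayMs), ct);
            }
        }
    }
}
```

Full jitter spreads retries from many concurrent callers across the whole window, so they do not hit Bedrock again in lockstep. A call would look like `Retry.WithBackoffAsync(() => client.ConverseAsync(request), ex => ex is ThrottlingException)`.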