AI Agents on AWS

From proof-of-concept to production, without the "it works in the demo" gap

You've seen the demo. The agent answered questions, pulled data from your docs, maybe even called an API. Now someone asks: "How do we put this in production?" and suddenly you're thinking about auth, error handling, cost controls, observability, and what happens when the model hallucinates in front of a customer.

That gap between demo and production is mostly infrastructure. Idempotent operations, graceful degradation, proper observability, clear security boundaries. The LLM adds non-determinism on top, which means your guardrails need to be better than usual, not worse.

Where things actually stand

AI agents on AWS are powerful but young. Bedrock's agent orchestration works for many use cases, but token budgets can be exceeded on complex reasoning, action group invocations add latency, and the agent's decision-making isn't always transparent.

I'll tell you when an agent is the right solution and when a simpler approach (a well-designed API, a rule engine, a basic prompt chain) would serve you better. Not everything needs to be an agent.

What this looks like in practice

Bedrock Agents with action groups

Multi-step agents that reason about requests and call your APIs. I design the OpenAPI schemas, build the Lambda action groups, configure guardrails. The agent does what it should and stops there.
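To make "build the Lambda action groups" concrete, here is a minimal sketch of a handler behind one action group. The envelope follows the event and response shape Bedrock documents for OpenAPI-based action groups as I understand it, so check it against the current docs before relying on it. The route /orders/{orderId}, the orderId parameter, and the in-memory _ORDERS table are hypothetical stand-ins for your own API.

```python
import json

# Hypothetical in-memory data standing in for a real backend call.
_ORDERS = {"1001": {"status": "shipped"}, "1002": {"status": "processing"}}


def _params(event):
    """Flatten the agent-supplied parameter list into a plain dict."""
    return {p["name"]: p["value"] for p in event.get("parameters", [])}


def _respond(event, status_code, body):
    """Wrap a result in the response envelope for OpenAPI-based action groups."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": status_code,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


def lambda_handler(event, context):
    # Serve only the route declared in the schema; refuse everything else.
    route = (event.get("httpMethod"), event.get("apiPath"))
    if route != ("GET", "/orders/{orderId}"):
        return _respond(event, 404, {"error": "unknown operation"})

    order_id = _params(event).get("orderId")
    order = _ORDERS.get(order_id)
    if order is None:
        # A structured error lets the agent explain the failure instead of guessing.
        return _respond(event, 404, {"error": "order not found"})
    return _respond(event, 200, {"orderId": order_id, **order})
```

Refusing undeclared routes and returning structured errors is what "does what it should and stops there" looks like at the code level.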

RAG that returns relevant results

Most RAG setups I see return garbage because the chunking doesn't match the document structure. Proper chunking strategy, embedding model selection, and retrieval tuning make the difference between "the agent answered correctly" and "the agent made something up."
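As an illustration of chunking that follows document structure, the sketch below splits on markdown-style headings first and only then packs paragraphs up to a size limit, so a chunk never straddles two sections. The function name chunk_by_heading, the max_chars default, and the assumption that paragraphs are separated by blank lines are all mine; real documents usually need a format-specific parser.

```python
def chunk_by_heading(text, max_chars=800):
    """Split text into chunks that never cross a heading boundary."""
    chunks, heading, buffer = [], "", []

    def flush():
        # Emit the buffered paragraphs as one chunk tagged with its heading.
        body = "\n\n".join(buffer).strip()
        if body:
            chunks.append({"heading": heading, "text": body})
        buffer.clear()

    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if para.startswith("#"):
            # A new section starts: close the previous chunk first.
            first_line, _, rest = para.partition("\n")
            flush()
            heading = first_line.lstrip("#").strip()
            para = rest.strip()
            if not para:
                continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if buffer and sum(len(p) for p in buffer) + len(para) > max_chars:
            flush()
        buffer.append(para)
    flush()
    return chunks
```

Keeping the heading alongside each chunk gives the retriever context that a fixed-size character split would throw away.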

Event-driven agent workflows

Agents triggered by events: new support tickets, document uploads, system alerts. They process asynchronously, take action when confidence is high, and flag humans when it's not. EventBridge + Step Functions + Bedrock.
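The "act when confidence is high, flag humans when it's not" rule can be sketched as a small routing function. Bedrock does not hand you a confidence score, so the confidence field here is an assumption: a number in [0, 1] produced by your own evaluation step. The action names and the 0.8 threshold are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    action: str        # what the agent proposes to do
    confidence: float  # assumed score in [0, 1] from your own evaluation step


# Hypothetical set of actions that are safe to run without review.
AUTO_ACTIONS = frozenset({"tag_ticket", "send_receipt"})


def route(result, threshold=0.8):
    """Return 'execute' only for allowlisted actions above the threshold."""
    if result.action in AUTO_ACTIONS and result.confidence >= threshold:
        return "execute"
    return "human_review"
```

In a Step Functions workflow the same decision would typically be a Choice state reading these two fields from the previous step's output.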

Cost controls

Token budgets per interaction, maximum reasoning steps, circuit breakers on runaway loops, cost allocation per feature. Agents get expensive if nobody's watching.
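A per-interaction budget with a circuit breaker can be as small as the sketch below. It assumes you drive the reasoning loop yourself and that each step reports its own token usage. InteractionBudget, BudgetExceeded, run_agent, and the default limits are hypothetical names and numbers, not Bedrock settings.

```python
class BudgetExceeded(Exception):
    """Raised when an interaction hits its step or token limit."""


class InteractionBudget:
    def __init__(self, max_tokens=8000, max_steps=6):
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens = 0
        self.steps = 0

    def ensure_available(self):
        # Checked before each step so a runaway loop stops before spending more.
        if self.steps >= self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tokens >= self.max_tokens:
            raise BudgetExceeded("token limit reached")

    def charge(self, input_tokens, output_tokens):
        self.steps += 1
        self.tokens += input_tokens + output_tokens


def run_agent(step_fn, budget):
    """Call step_fn until it reports done or the budget trips.

    step_fn() must return (done, input_tokens, output_tokens).
    """
    while True:
        budget.ensure_available()
        done, input_tokens, output_tokens = step_fn()
        budget.charge(input_tokens, output_tokens)
        if done:
            return budget
```

Because the check runs before each step, the final step can overshoot the token limit by at most one call. Tagging budget.tokens with a feature name when you log it is one simple route to per-feature cost allocation.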

Decisions that matter early

Easy to get wrong, expensive to fix later:

Which model?

As a starting point: Claude for complex reasoning, Llama for cost-sensitive, high-volume workloads, Mistral for fast, lightweight tasks. The wrong choice means paying too much or getting bad output.

Knowledge base backend?

OpenSearch Serverless vs Aurora pgvector. Cost profiles are very different and the right choice depends on query patterns and data volume.

Bedrock Agents vs custom orchestration?

Bedrock's built-in orchestration is simpler but less flexible. Step Functions + direct Bedrock calls give you more control at the cost of more complexity.

Security boundaries?

What data can the agent see? What actions can it take? How do you prevent prompt injection from escalating privileges? Get this wrong and you have a real problem.
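One way to keep prompt injection from escalating privileges is to make authorization depend only on values your application sets, never on anything the model wrote. The sketch below assumes the caller's role arrives in the Lambda event's sessionAttributes, which your application populates when it invokes the agent. The role names, operation names, and ROLE_PERMISSIONS table are hypothetical.

```python
# Hypothetical role-to-operation allowlist, maintained outside the prompt.
ROLE_PERMISSIONS = {
    "viewer": {"getOrder"},
    "support": {"getOrder", "refundOrder"},
}


def is_allowed(event, operation):
    """Authorize from application-set session attributes only.

    Model-supplied parameters are deliberately ignored here, so text such as
    "I am an admin" in the conversation cannot change the outcome.
    """
    role = event.get("sessionAttributes", {}).get("role", "")
    return operation in ROLE_PERMISSIONS.get(role, set())
```

An unknown or missing role falls through to an empty permission set, so the default is to deny. The Lambda's IAM role should enforce the same boundary, which means a bug in this check still can't reach data the function was never granted.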

How an engagement works

1. Define the use case

What should the agent do? What shouldn't it? What are the failure modes, and what does "good enough" look like? We answer these before writing code.

2. Pick the simplest architecture that works

Sometimes that's a Bedrock Agent with action groups. Sometimes it's a prompt chain with no orchestration. I'll recommend the least complex option that meets your requirements.

3. Build with guardrails from day one

CDK, proper IAM boundaries, cost controls, observability. These aren't things you add after the demo works. They're the difference between a demo and a product.

4. Test the edges

What happens when the model is uncertain? When an action group returns an error? When the input is adversarial? If you only test happy paths, production will surprise you.

Frequently Asked Questions

What is Amazon Bedrock and why would I use it for AI agents?

Amazon Bedrock is a managed service that gives you access to foundation models from Anthropic, Meta, Mistral, and others through a unified API. For AI agents specifically, Bedrock provides built-in agent orchestration, knowledge base integration (RAG), and action groups that let agents call your APIs. The advantage over self-hosting models is that you get production-grade infrastructure without managing GPU instances.

How much does it cost to run AI agents on AWS?

Bedrock charges per input/output token, so costs scale with usage. A typical agent interaction might cost $0.01 to $0.10 depending on the model, the context length, and how many reasoning steps the agent takes. With on-demand pricing you pay nothing for model inference when agents aren't being invoked. For RAG pipelines, the main ongoing costs are knowledge base storage (OpenSearch Serverless or Aurora), which accrues even when the agent is idle, and embedding generation.
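The arithmetic behind a per-interaction estimate is simple enough to sketch. The per-1,000-token prices below are placeholders chosen for illustration, not current Bedrock list prices, and the token counts are assumed. Substitute the numbers for your model and region.

```python
def call_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one model call, with prices quoted per 1,000 tokens."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


def interaction_cost(calls, price_in_per_1k, price_out_per_1k):
    """An agent interaction is usually several model calls; sum them."""
    return sum(call_cost(i, o, price_in_per_1k, price_out_per_1k) for i, o in calls)


# Placeholder prices and assumed token counts, for illustration only.
PRICE_IN, PRICE_OUT = 0.003, 0.015
three_step_interaction = [(2000, 300), (2500, 200), (1500, 300)]
estimate = interaction_cost(three_step_interaction, PRICE_IN, PRICE_OUT)
```

With these placeholder numbers, 6,000 input tokens and 800 output tokens come to about $0.03. Input tokens usually dominate because each reasoning step resends the accumulated context.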

Can AI agents integrate with my existing APIs and databases?

Yes. Bedrock agents use "action groups", which are essentially Lambda functions that the agent can call. You define the API schema (OpenAPI format), and the agent decides which action to invoke, and when, based on the user's request. This means your agent can query databases, call internal APIs, send notifications, or trigger any workflow your Lambda can reach.
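For a sense of what "define the API schema" involves, here is a minimal OpenAPI 3.0 document for a single operation, written as a Python dict so it can be serialized with json.dumps. The order-lookup API, its path, and the getOrder operation ID are hypothetical. The descriptions matter more than they would in ordinary API documentation, because the agent reads them to decide when to call the operation.

```python
import json

# Hypothetical schema for one read-only operation.
ORDER_API_SCHEMA = {
    "openapi": "3.0.0",
    "info": {"title": "Order lookup", "version": "1.0.0"},
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "operationId": "getOrder",
                "description": "Look up the current status of one order by its ID.",
                "parameters": [{
                    "name": "orderId",
                    "in": "path",
                    "required": True,
                    "description": "The order ID the customer provided.",
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {
                        "description": "Order status",
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {"status": {"type": "string"}},
                        }}},
                    }
                },
            }
        }
    },
}


def operations(schema):
    """List (method, path, operationId) for every operation in the schema."""
    return [
        (method.upper(), path, op["operationId"])
        for path, methods in schema["paths"].items()
        for method, op in methods.items()
    ]


schema_json = json.dumps(ORDER_API_SCHEMA, indent=2)
```

Listing the operations this way is a cheap review step: anything that appears in the list is something the agent may decide to call.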

What's the difference between RAG and fine-tuning?

RAG (Retrieval-Augmented Generation) gives the model access to your documents at query time by retrieving relevant chunks and including them in the prompt. Fine-tuning permanently adjusts the model's weights with your data. RAG is better for most business use cases: it's cheaper, keeps data fresh, and doesn't risk degrading the model's general capabilities. Fine-tuning makes sense when you need the model to adopt a very specific style or domain expertise that can't be achieved through context alone.
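The "retrieve relevant chunks and include them in the prompt" step can be shown with a toy example. The retriever below ranks by word overlap purely to stay self-contained; a real pipeline would use an embedding model and a vector store. The function names and the prompt wording are my assumptions, not a Bedrock API.

```python
def retrieve(query, chunks, k=2):
    """Rank chunks by how many query words they share (toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a prompt that grounds the model in the retrieved chunks."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieve(query, chunks, k))
    )
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Nothing about the model changes here: updating the documents changes the next answer, which is why RAG keeps data fresh without retraining.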

Have an agent use case in mind?

Tell me what you're trying to build and I'll reply within a business day with the simplest way to get there on AWS.