AI Will Not Replace Your APIs

There's a take circulating in architecture circles that goes something like this: traditional APIs are dead. Instead of building REST endpoints, we should expose a single natural language interface and let an LLM figure out what the caller wants. Send a prompt, get data back. No more endpoint design, no more versioning, no more documentation.

CIO.com declared "the age of the static API is ending". Mastra.ai argues that "every API needs a natural language endpoint". Medium posts proclaim that the "API Gateway is Dead" and should be replaced by an "AI Gateway". Satya Nadella has framed natural language as "a universal interface to any computer".

This sounds visionary right up until you do the math.

The Math Doesn't Work

Google Cloud's own research on LLM-powered queries found that LLM invocations add 10-100x to overall query latency and roughly 1000x the cost compared to traditional query execution. A medium-sized analytical query on tens of millions of rows consumes a token volume that is "prohibitively expensive for some applications."

An arXiv paper comparing LLMs to relational databases for query processing concluded bluntly: "We advise against replacing relational databases with LLMs due to their substantial resource utilization."

Your REST API returns a response in 20-50ms. An LLM inference call takes 500ms-3s on a good day, longer if the context window is large. For a single internal dashboard query, maybe that's tolerable. For an API handling 10,000 requests per second from mobile clients, it's not even in the same universe.
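To put rough numbers on that gap: Little's law says the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below plugs in the figures from the paragraph above. The function name is illustrative, and it assumes a steady arrival rate with no queuing, batching, or retries.

```python
def in_flight_requests(requests_per_second: int, latency_ms: int) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_second * latency_ms / 1000


RPS = 10_000  # request rate from the example above

for label, latency_ms in [
    ("REST API, 50 ms", 50),
    ("LLM call, 500 ms", 500),
    ("LLM call, 3 s", 3_000),
]:
    print(f"{label}: {in_flight_requests(RPS, latency_ms):,.0f} requests in flight")
```

At the same request rate, the service has to hold 10 to 60 times as many requests open at once, and that is before paying for a single token.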

This isn't a temporary limitation that will be solved by faster hardware, because faster hardware speeds up index lookups too. LLM inference is fundamentally more expensive than executing a pre-compiled query plan against an index. It always will be. You're comparing "look up a value in a B-tree" to "run a neural network with billions of parameters." These are not the same class of operation.

You Still Need the API Layer

The "replace APIs with prompts" crowd seems to miss something fundamental: you don't actually eliminate the API layer. You just hide it.

The LLM could connect directly to your database. Text-to-SQL agents already do this. But that means the model is now responsible for access control, query validation, connection pooling, and transaction safety. Either you build those concerns into the agent's tooling (congratulations, you've rebuilt an API) or you skip them and accept the security risk of an unpredictable system generating arbitrary queries against your production data.

Either way, the request path gets worse.

Best case, the architecture becomes:

Client → LLM → API → Database → API → LLM → Client

You've added two inference hops to what was previously:

Client → API → Database → API → Client

Worst case: you skip the API, the LLM hits the database directly, and you've traded a few milliseconds of structured validation for uncontrolled query generation against production data.

What did you gain? The caller can use natural language instead of a structured request. What did you lose? Predictable latency, deterministic behavior, type safety, cacheability, and a massive chunk of your operating budget.
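A back-of-the-envelope latency budget for the best-case path makes the trade concrete. The constants below are assumptions taken from the low end of the ranges quoted earlier (20 ms for the API round trip including its database call, 500 ms per inference call). They are not measurements, and network overhead between hops is ignored.

```python
API_ROUND_TRIP_MS = 20   # assumed: API plus database, low end of 20-50 ms
LLM_INFERENCE_MS = 500   # assumed: one inference call, low end of 500 ms-3 s


def path_latency_ms(inference_hops: int) -> int:
    """Total latency for a request path with the given number of LLM hops."""
    return API_ROUND_TRIP_MS + inference_hops * LLM_INFERENCE_MS


direct = path_latency_ms(0)   # Client -> API -> Database -> API -> Client
via_llm = path_latency_ms(2)  # one hop to interpret the prompt, one to phrase the reply

print(f"direct: {direct} ms, via LLM: {via_llm} ms, ratio: {via_llm / direct:.0f}x")
# prints "direct: 20 ms, via LLM: 1020 ms, ratio: 51x"
```

Even with the most generous inputs, the prompt-driven path is about 50 times slower, and the API call it wraps has not gone anywhere.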

We Already Solved Flexible Queries

The argument for natural language APIs usually boils down to one of two claims: "Developers shouldn't have to learn a rigid endpoint structure. They should just ask for what they want." Or the more sweeping version: LLMs and agents will simply replace all software, APIs included.

The first is a real ergonomics problem with a real solution. The second is hype dressed up as vision. Software exists because we need deterministic, repeatable, auditable operations running at scale for fractions of a cent per request. LLMs are none of those things. They're useful for tasks that benefit from flexibility and interpretation. They're terrible as a general replacement for compiled logic that already works.

As for the ergonomics problem, we solved it years ago. Twice.

OData gives you a standardized query language over REST. Filter, sort, paginate, expand related entities, select specific fields. All deterministic, all cacheable, all type-safe. It has been around since 2007 and has been an OASIS standard since 2014.

GraphQL lets callers request exactly the data they need in exactly the shape they want it. No over-fetching, no under-fetching. The schema is self-documenting. Introspection is built in. It's been in production at Facebook's scale since 2012 and public since 2015.

Both of these give you flexible, caller-defined queries without the latency, cost, or non-determinism of routing through an LLM. If your complaint about REST is that endpoints are too rigid, the answer is GraphQL or OData. Not "add a billion-parameter neural network to the request path."
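For a sense of what "caller-defined" looks like in practice, here is a minimal sketch of the same request in both styles. The service URL, entity name, fields, and GraphQL schema are all hypothetical. Only the OData system query options ($filter, $orderby, $top, $select) and the JSON body shape with "query" and "variables" come from the respective conventions.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical service root; the entity and field names are illustrative.
BASE_URL = "https://api.example.com/odata/Products"


def odata_url(base_url: str, **options: object) -> str:
    """Build an OData URL from system query options (filter, orderby, top, select)."""
    params = {f"${name}": value for name, value in options.items()}
    # Keep "$" and "," literal and encode spaces as %20 for readability.
    return f"{base_url}?{urlencode(params, safe='$,', quote_via=quote)}"


url = odata_url(
    BASE_URL,
    filter="Price gt 20",
    orderby="Price desc",
    top=5,
    select="Name,Price",
)

# The same request as a GraphQL query against a hypothetical schema.
GRAPHQL_QUERY = """
query ExpensiveProducts($minPrice: Float!) {
  products(minPrice: $minPrice, first: 5) {
    name
    price
  }
}
"""
graphql_body = json.dumps({"query": GRAPHQL_QUERY, "variables": {"minPrice": 20}})

print(url)
```

Both requests are plain strings that a server can parse, validate against a schema, and cache. No model sits anywhere in the path.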

Non-Determinism Is Not a Feature

When you call a traditional API with the same parameters, you get the same response (assuming the underlying data hasn't changed). This is a feature, not a limitation. You can cache responses. You can write tests. You can debug production issues by replaying requests. Your monitoring can detect anomalies because it knows what "normal" looks like.

LLMs are probabilistic. The same prompt can produce different outputs on different calls. For a chatbot, that's fine. For an API that returns financial data, inventory counts, or medical records, it's disqualifying. You cannot build reliable systems on non-deterministic data access.

"But you can set temperature to zero!" Sure. And you still get different outputs across model versions, context window variations, and prompt formatting changes. The surface area for unexpected behavior is vastly larger than a deterministic endpoint.

Where AI Actually Fits

AI has a role in the API lifecycle. It's just not the role these hot takes are proposing.

Building and maintaining APIs. This is where the real value is. Not the interface layer. We already have deterministic tooling for generating servers and clients from OpenAPI specs, and those tools are faster, cheaper, and more reliable than an LLM for that job. The value is in the business logic behind the interface: writing the validation rules, the data transformations, the workflow orchestration, the edge case handling. AI accelerates the human work of building and evolving that logic. The API itself stays deterministic and fast at runtime.

Query translation for internal tools. A business analyst who needs data but doesn't know SQL can benefit from an LLM that translates natural language into a structured query, which then executes against your existing API or database. But notice: the LLM is translating to a structured query, not replacing it. The actual data access is still deterministic. And once you have that generated query, you can cache and reuse it for future requests asking for the same data without ever hitting the LLM again. The inference cost becomes a one-time expense, not a per-request tax.

Discovery and documentation. Helping developers find the right endpoint, understand the schema, and generate example requests. That's AI improving the developer experience around APIs, not replacing the APIs themselves.

Routing and orchestration. An agent that reads a user's intent and decides which of your existing APIs to call, in what order, with what parameters. That's a reasonable use of AI. The APIs still exist. The AI is a smart client, not a replacement for the server.
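Here is a rough sketch of the smart-client shape. The two API functions, the tool registry, and the plan format are all hypothetical, and the plan is hard-coded where a real agent would have a model produce it. The point is that the model only proposes steps, and every step resolves to an existing, deterministic API call.

```python
from typing import Any, Callable


# Hypothetical existing API clients; in practice these would make HTTP calls.
def get_customer(customer_id: str) -> dict[str, Any]:
    return {"id": customer_id, "name": "Example Customer"}


def list_orders(customer_id: str, limit: int = 10) -> list[dict[str, Any]]:
    return [{"order_id": "o-1", "customer_id": customer_id}][:limit]


TOOLS: dict[str, Callable[..., Any]] = {
    "get_customer": get_customer,
    "list_orders": list_orders,
}


def run_plan(plan: list[dict[str, Any]]) -> list[Any]:
    """Execute model-proposed steps, rejecting any tool that is not registered."""
    results = []
    for step in plan:
        tool = TOOLS.get(step["tool"])
        if tool is None:
            raise ValueError(f"unknown tool: {step['tool']}")
        results.append(tool(**step["args"]))
    return results


# A plan as an LLM planner might emit it (hard-coded here; no model is called).
plan = [
    {"tool": "get_customer", "args": {"customer_id": "c-7"}},
    {"tool": "list_orders", "args": {"customer_id": "c-7", "limit": 5}},
]
results = run_plan(plan)
```

Anything the model proposes outside the registry is rejected before it reaches a server, which keeps access control and validation where they already live: in the APIs.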

The High-Volume Reality

The people proposing "just send a prompt" as an API strategy are usually thinking about low-volume, internal, exploratory use cases. And for those, fine. If you have 50 analysts making 200 queries a day against a data warehouse, a natural language layer is a reasonable UX improvement.

But APIs serve production traffic. Thousands or millions of requests per hour from other services, mobile apps, web frontends, and third-party integrations. At that scale, every millisecond of latency matters, every dollar of compute cost matters, and every non-deterministic response is a potential bug. The economics of LLM inference make it a poor replacement for the request-response patterns that power modern software.

The Real Motivation

So why does this take keep appearing? Partly because it sounds futuristic and gets engagement. Partly because people conflate "AI can help with X" with "AI should replace X." And partly because the companies selling AI inference have every incentive to convince you that more of your request volume should flow through their models.

If every API call becomes a token-consuming inference request, that's a lot of tokens. And someone is selling those tokens.

The pattern is the same one playing out with tokenmaxxing: the people who profit from higher consumption are the ones telling you that higher consumption is the future. Consider the source.

Build With AI. Don't Replace Your Architecture With It.

AI coding tools are transforming how fast we can build and iterate on APIs. I use them constantly for that. But the APIs themselves need to stay deterministic, fast, and cheap to operate.

The future isn't "send a prompt, get data." The future is APIs that are built faster, documented better, tested more thoroughly, and evolved more safely because AI is part of the development process. The runtime stays structured. The intelligence moves to build time.

That's less catchy than "APIs are dead." But it's what actually works at scale.