LGTM

Most “AI Agents” Are Just Chatbots with Extra Steps

88% of AI pilots never reach production.
The #1 cause isn't bad models. It's wrong architecture.

By Saheb Singh · Enterprise AI, American Express. Ex-Google. CMU CS.

The Architecture Spectrum

Four things that get called “AI” — only one is an agent.

Gartner, 2025: AI Agents at Peak of Inflated Expectations. GenAI already in the Trough of Disillusionment.

The Trap

Most 'AI Agents' Aren't Agents

Input (fixed) → Step 1 (fixed logic) → LLM (single pass) → Step 2 (fixed logic) → Output (fixed)

No feedback loop · No planning · Not an agent

An LLM in a pipeline ≠ an agent. Gartner: only ~130 of thousands of “agent” vendors are real.

Let's start with the uncomfortable truth: most 'AI agents' in production today are agentic workflows — deterministic pipelines with an LLM at one or two steps. Gartner estimates only ~130 of thousands of 'agentic AI' vendors are real. The rest are agent-washing.

They look like agents in demos. They're marketed as agents. But they can't adapt to novel situations, don't maintain state across sessions, and follow pre-defined orchestration paths. Deloitte found only 11% of enterprises have agents in production — and of those, Menlo Ventures found only 16% are truly agentic.

This isn't pedantic. The distinction determines your architecture, your risk model, your governance requirements, and your cost structure. Get it wrong, and you'll build chatbot-level guardrails for agent-level autonomy — or agent-level overhead for a problem a simple pipeline could solve.
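To make the distinction concrete, here is a minimal sketch of an “agentic workflow” of the kind described above: a deterministic pipeline with an LLM at one step. The function names (`call_llm`, `triage`, `handle_ticket`) and the stubbed LLM are hypothetical illustrations, not any vendor's API.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a single-pass LLM call (no memory, no tools)."""
    return f"draft response for: {prompt}"

def triage(ticket: str) -> str:
    """Step 1: fixed logic -- routing by keyword, not by model reasoning."""
    return "billing" if "refund" in ticket.lower() else "general"

def handle_ticket(ticket: str) -> str:
    queue = triage(ticket)                   # Step 1: fixed logic
    draft = call_llm(f"[{queue}] {ticket}")  # single LLM pass
    return draft.strip()                     # Step 2: fixed post-processing
```

The control flow never changes based on the LLM's output: no feedback loop, no planning, no state. Whatever the marketing says, this is a workflow, not an agent.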

Chatbot

A Stateless Function

Input (string) → LLM (single pass) → Output (string)

f(prompt) → response

Stateless · No tools · ~$0.001/query

A chatbot is a pure function: input in, output out. No memory between calls. No tools. No planning. Gmail's Smart Reply, customer service FAQ bots, most 'AI-powered' support widgets — all chatbots. Used by billions.

This isn't a criticism. The question is whether your problem requires more than a stateless text transformation. If not, you've found the cheapest, lowest-risk architecture. At ~$0.001/query, chatbots are 100-1000x cheaper than agents.
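The pure-function framing can be sketched in a few lines. A lookup table stands in for the single-pass LLM call here (a hypothetical stub, not a real API); the point is the shape, f(prompt) → response, with nothing remembered between calls.

```python
def chatbot(prompt: str) -> str:
    """A chatbot as a pure function: f(prompt) -> response.
    The canned table is a stand-in for a single-pass LLM call."""
    canned = {
        "what are your hours?": "We're open 9-5, Monday through Friday.",
        "how do i reset my password?": "Use the 'Forgot password' link on the login page.",
    }
    return canned.get(prompt.lower(), "Sorry, I can only answer common questions.")
```

Same input, same output, every time. If that property fits your problem, you've found the cheapest architecture on the spectrum.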

Copilot

Human-in-the-Loop

Context Window (code, docs, data) + User Input → LLM (context-aware) → Suggestion

Human decides · ~$0.01/query

Add a context window and a human checkpoint. A copilot suggests, a human decides. GitHub Copilot doesn't push code — a developer hits Tab or Escape. The blast radius of a bad suggestion is zero until a human approves it.

McKinsey reports ~70% of Fortune 500 use Microsoft 365 Copilot. But here's the Gen AI Paradox: horizontal copilots deliver diffuse, hard-to-measure gains. The real value is in vertical, domain-specific copilots — and 90% of those are stuck in pilot mode.
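The suggest-then-approve pattern above reduces to one gate. In this sketch, `suggest` is a hypothetical stub for a context-aware LLM call, and `accept` is the human checkpoint (Tab/Escape); neither name comes from a real product's API.

```python
from typing import Callable, Optional

def suggest(context: str, user_input: str) -> str:
    """Stand-in for a context-aware LLM suggestion."""
    return f"completion for '{user_input}' using {len(context)} chars of context"

def copilot_step(context: str, user_input: str,
                 accept: Callable[[str], bool]) -> Optional[str]:
    """The copilot only suggests. Nothing is applied until the
    human checkpoint approves -- the blast radius stays zero."""
    suggestion = suggest(context, user_input)
    return suggestion if accept(suggestion) else None
```

The design choice is that `accept` sits between the model and the world: the suggestion either passes the human gate or is discarded, never silently applied.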

Agent

Autonomous Reasoning Loop

Goal (human-defined) → ReAct Loop: Reason (think) → Act (tool call) → Observe (result) → repeat until goal met

Tools / MCP · Memory · ~$0.10-1.00/task

Remove the human from the loop. Give the system persistent memory, tool access via MCP (Model Context Protocol), and multi-step planning. It reasons about a goal, acts, observes results, adjusts — a ReAct loop.
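The loop itself is small; the consequences aren't. Below is a minimal ReAct sketch under heavy assumptions: `reason` is a hypothetical stub for the LLM call that picks the next action, and `TOOLS` stands in for tools exposed via MCP or similar.

```python
from typing import Optional

def reason(goal: str, observations: list) -> dict:
    """Stub for the LLM 'Reason' step: choose the next action
    from the goal and everything observed so far."""
    if any("42" in o for o in observations):
        return {"action": "finish", "answer": observations[-1]}
    return {"action": "lookup", "arg": goal}

TOOLS = {"lookup": lambda arg: f"result for {arg}: 42"}  # stand-in tool registry

def run_agent(goal: str, max_steps: int = 5) -> Optional[str]:
    observations: list = []
    for _ in range(max_steps):               # each iteration = one LLM call: costs multiply
        step = reason(goal, observations)    # Reason
        if step["action"] == "finish":
            return step["answer"]            # goal met -> exit loop
        result = TOOLS[step["action"]](step["arg"])  # Act: real-world side effect
        observations.append(result)                  # Observe
    return None  # budget exhausted without meeting the goal
```

Note what changed versus the pipeline: the model's output now determines the control flow, and each `Act` can touch the real world. That's the property that demands agent-level governance.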

This is where things get powerful and dangerous. ASAPP found agents fail on multi-step tasks ~70% of the time. Inference costs multiply (5-50 LLM calls per task). And autonomous systems that can take real-world actions require fundamentally different governance than suggestion engines.

The companies deploying agents successfully started with copilots, learned where humans add value and where they don't, and gradually widened the autonomy boundary. They earned the right to automate.

What happens when you get it wrong

War Story

The $2.1M “Agent” That Was Really a Chatbot

A real scenario. Anonymized details, real architecture decisions, real consequences. This is what happens when you deploy agent-level autonomy with chatbot-level governance.

Week 1 · The Build

Engineering team at a Series C fintech builds an 'AI agent' for customer support escalation. The system reads tickets, queries internal docs, and drafts responses. Leadership calls it their 'autonomous support agent' on the earnings call.


Next issue

The Agentic AI Quality Crisis

57% of teams have agents in production. Only 37% evaluate whether their outputs are correct. The quality gap nobody talks about.

Next issue: March 3 · Free · Unsubscribe anytime


So what should you actually build?

Decision Framework

What should you actually build?

The answer isn't always “agents.” Walk through these questions. Be honest — the right architecture is the simplest one that solves your actual problem.

What does your AI system need to do?

Start with the problem, not the technology. The right architecture follows from the requirements.
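The walk-through above can be compressed into a first-pass heuristic. This is a sketch of the spectrum in this article, not an official framework; the function name and the three yes/no questions are my own simplification.

```python
def choose_architecture(needs_tools: bool, needs_memory: bool,
                        human_reviews_every_output: bool) -> str:
    """Pick the simplest architecture that fits the requirements."""
    if needs_tools and needs_memory and not human_reviews_every_output:
        return "agent"    # autonomous loop: highest cost, heaviest governance
    if human_reviews_every_output:
        return "copilot"  # suggestions gated by a human checkpoint
    return "chatbot"      # stateless transformation: cheapest, lowest risk
```

If the honest answers land you on "chatbot", that's not a failure; it's 100-1000x cheaper and the lowest-risk box on the spectrum.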

Governance Audit

7 questions before you deploy an agent.

Gartner expects 80%+ of unauthorized AI transactions to be internal violations — not external attacks. The risk is already inside the building. If you can't answer all seven, you're not ready.

The uncomfortable truth: Most organizations jumping to agents don't have the governance infrastructure to handle autonomous AI. The companies that deploy agents successfully started with copilots, learned where humans add value and where they don't, and gradually widened the autonomy boundary. They earned the right to automate. McKinsey calls it the “Gen AI Paradox” — 80% of companies with agents see no EBIT impact. The ones that do spent 70% of their effort on people and process, not model selection.