Most “AI Agents”
Are Just Chatbots
with Extra Steps
88% of AI pilots never reach production.
The #1 cause isn't bad models. It's wrong architecture.
By Saheb Singh · Enterprise AI, American Express. Ex-Google. CMU CS.
The Architecture Spectrum
Four things that get called “AI” — only one is an agent.
Gartner, 2025: AI Agents at Peak of Inflated Expectations. GenAI already in the Trough of Disillusionment.
Most 'AI Agents' Aren't Agents
An LLM in a pipeline ≠ an agent. Gartner: only ~130 of thousands of “agent” vendors are real.
Let's start with the uncomfortable truth: most 'AI agents' in production today are agentic workflows — deterministic pipelines with an LLM at one or two steps. Gartner estimates only ~130 of thousands of 'agentic AI' vendors are real. The rest are agent-washing.
They look like agents in demos. They're marketed as agents. But they can't adapt to novel situations, don't maintain state across sessions, and follow pre-defined orchestration paths. Deloitte found only 11% of enterprises have agents in production — and of those, Menlo Ventures found only 16% are truly agentic.
This isn't pedantic. The distinction determines your architecture, your risk model, your governance requirements, and your cost structure. Get it wrong, and you'll build chatbot-level guardrails for agent-level autonomy — or agent-level overhead for a problem a simple pipeline could solve.
A Stateless Function
f(prompt) → response
A chatbot is a pure function: input in, output out. No memory between calls. No tools. No planning. Gmail's Smart Reply, customer service FAQ bots, most 'AI-powered' support widgets — all chatbots. Used by billions.
This isn't a criticism. The question is whether your problem requires more than a stateless text transformation. If not, you've found the cheapest, lowest-risk architecture. At ~$0.001/query, chatbots are 100-1000x cheaper than agents.
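As a sketch, the whole architecture fits in one pure function. The `call_llm` stub below is an illustrative stand-in for any completion API (so the example runs offline); the point is that nothing persists between calls:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real model call -- a canned echo keeps it runnable.
    return f"echo: {prompt}"

def chatbot(prompt: str) -> str:
    # f(prompt) -> response. No memory, no tools, no planning.
    # Calling it twice with the same input gives the same output.
    return call_llm(prompt)

print(chatbot("How do I reset my password?"))
```

Because the function is stateless, it scales trivially and its failure modes are bounded to a single bad response.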
Human-in-the-Loop
Add a context window and a human checkpoint. A copilot suggests, a human decides. GitHub Copilot doesn't push code — a developer hits Tab or Escape. The blast radius of a bad suggestion is zero until a human approves it.
McKinsey reports ~70% of Fortune 500 use Microsoft 365 Copilot. But here's the Gen AI Paradox: horizontal copilots deliver diffuse, hard-to-measure gains. The real value is in vertical, domain-specific copilots — and 90% of those are stuck in pilot mode.
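The checkpoint pattern can be sketched in a few lines. The `suggest` stub and the `approve` callback are illustrative stand-ins for a real model call and a real accept/reject UI:

```python
from typing import Callable, Optional

def suggest(context: str) -> str:
    # Placeholder for a real completion call.
    return f"suggested reply for: {context}"

def copilot(context: str, approve: Callable[[str], bool]) -> Optional[str]:
    # The model suggests; a human decides. Nothing ships without
    # approval, so a bad suggestion has zero blast radius.
    suggestion = suggest(context)
    return suggestion if approve(suggestion) else None

# Simulate a human accepting (Tab) and rejecting (Escape) a suggestion.
print(copilot("ticket #42", approve=lambda s: True))
print(copilot("ticket #43", approve=lambda s: False))
```

The design choice that matters is the return type: the system can only ever produce an approved suggestion or nothing at all.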
Autonomous Reasoning Loop
Remove the human from the loop. Give the system persistent memory, tool access via MCP (Model Context Protocol), and multi-step planning. It reasons about a goal, acts, observes results, adjusts — a ReAct loop.
This is where things get powerful and dangerous. ASAPP found agents fail on multi-step tasks ~70% of the time. Inference costs multiply (5-50 LLM calls per task). And autonomous systems that can take real-world actions require fundamentally different governance than suggestion engines.
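A minimal ReAct loop looks like this. The `plan` policy and the single `lookup` tool are toy stand-ins for real LLM and MCP calls, but the shape is the same: reason, act, observe, repeat until done or the step budget runs out:

```python
def plan(goal: str, memory: dict):
    # Stand-in for an LLM planning call: returns (thought, action, arg).
    if "result" in memory:
        return ("have an answer", "finish", memory["result"])
    return ("need data", "lookup", goal)

TOOLS = {"lookup": lambda q: f"docs about {q}"}  # illustrative tool registry

def agent(goal: str, max_steps: int = 5):
    memory: dict = {}  # state persists across steps, unlike a chatbot
    for _ in range(max_steps):
        thought, action, arg = plan(goal, memory)  # reason
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # act
        memory["result"] = observation    # observe, update state
    return None  # budget exhausted -- multi-step failure is the common case

print(agent("refund policy"))
```

Note that every iteration is another model call: the 5-50x cost multiplier falls directly out of this loop structure.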
The companies deploying agents successfully started with copilots, learned where humans add value and where they don't, and gradually widened the autonomy boundary. They earned the right to automate.
What happens when you get it wrong
War Story
The $2.1M “Agent” That Was Really a Chatbot
A real scenario. Anonymized details, real architecture decisions, real consequences. This is what happens when you deploy agent-level autonomy with chatbot-level governance.
An engineering team at a Series C fintech builds an 'AI agent' for customer support escalation. The system reads tickets, queries internal docs, and drafts responses. Leadership calls it their 'autonomous support agent' on the earnings call.
Next issue
The Agentic AI Quality Crisis
57% of teams have agents in production. Only 37% evaluate whether their outputs are correct. The quality gap nobody talks about.
Next issue: March 3 · Free · Unsubscribe anytime
So what should you actually build?
Decision Framework
What should you actually build?
The answer isn't always “agents.” Walk through these questions. Be honest — the right architecture is the simplest one that solves your actual problem.
What does your AI system need to do?
Start with the problem, not the technology. The right architecture follows from the requirements.
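As an illustration, the spectrum above collapses into a toy decision helper. The three inputs (persistent memory, tool access, unsupervised action) mirror the capabilities that separate chatbot, copilot, and agent earlier in this piece; the mapping is a sketch, not a prescriptive rubric:

```python
def pick_architecture(needs_memory: bool, needs_tools: bool,
                      acts_unsupervised: bool) -> str:
    # Pick the SIMPLEST architecture that satisfies the requirements.
    if needs_memory and needs_tools and acts_unsupervised:
        return "agent"    # autonomous loop; agent-level governance required
    if needs_memory or needs_tools:
        return "copilot"  # human-in-the-loop checkpoint
    return "chatbot"      # stateless f(prompt) -> response

print(pick_architecture(False, False, False))  # a FAQ bot
print(pick_architecture(True, True, False))    # a drafting assistant
print(pick_architecture(True, True, True))     # full autonomy
```

The order of the branches is the point: you only earn the "agent" answer after every cheaper option has been ruled out.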
Governance Audit
7 questions before you deploy an agent.
Gartner expects 80%+ of unauthorized AI transactions to be internal violations — not external attacks. The risk is already inside the building. If you can't answer all seven, you're not ready.
The uncomfortable truth: most organizations jumping to agents don't have the governance infrastructure to handle autonomous AI. McKinsey calls it the “Gen AI Paradox” — 80% of companies with agents see no EBIT impact. The ones that do spent 70% of their effort on people and process, not model selection.