For the past two years, a quiet assumption has shaped how companies build with AI: that the compute behind it was effectively free. Send the entire document to the model. Let the assistant try again if it fails. Use the most expensive model for every task, just to be safe. Generate ten options and keep the best one.
That assumption was reasonable, because the costs were hidden. Flat subscriptions absorbed the spending. Investor money subsidized the rest. Most product teams never saw a real bill.
That era is closing. Companies are now generating tens of billions of AI requests every month, and the share of companies running more than 100 billion requests per month is on track to triple between 2025 and 2028.[01] Average AI spending per company jumped from $63,000 a month in 2024 to $85,500 a month in 2025: a 36 percent increase in a single year. Nearly half of organizations now spend more than $100,000 a month on AI.[02]
The conversation is shifting from access to AI — can we get it, can we use it — to economics. Which products are worth what they cost. Which workflows pay for themselves. Which features quietly drain the P&L.
Why AI agents broke the math
A chatbot that answers a question is one cost profile. An AI agent — the kind that plans a task, looks things up, calls tools, fails, retries, and loops until it gets to an answer — is a fundamentally different one.
Agents make three to ten times more model calls than a simple chatbot for a single user request.[03] Each call processes the full conversation history, so input volume compounds as the history grows. A single customer interaction that costs fourteen cents in a demo can add up to thousands of dollars per day at scale, because the agent re-reads everything every time it thinks.
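A rough sketch of the arithmetic makes the compounding visible. The price and token counts below are placeholders, not any provider's real rates; what matters is the shape of the curve.

```python
# Back-of-the-envelope: why an agent loop compounds input cost.
# The price and token counts are placeholders, not any provider's real rates.
PRICE_PER_1K_INPUT = 0.01  # hypothetical dollars per 1K input tokens
SYSTEM_AND_TOOLS = 4_000   # tokens re-sent on every call (prompt, tool specs)
TOKENS_PER_STEP = 1_500    # new tokens each step appends to the history

def cost_of_run(steps: int) -> float:
    """Total input cost when every step re-reads the full history."""
    total_tokens = 0
    history = SYSTEM_AND_TOOLS
    for _ in range(steps):
        total_tokens += history     # the model re-reads everything so far...
        history += TOKENS_PER_STEP  # ...and the history keeps growing
    return total_tokens / 1000 * PRICE_PER_1K_INPUT

print(f"chatbot, 1 call:  ${cost_of_run(1):.2f}")   # $0.04
print(f"agent, 10 steps:  ${cost_of_run(10):.2f}")  # ~$1.08, about 27x the single call
print(f"agent, 25 steps:  ${cost_of_run(25):.2f}")  # $5.50, growth is quadratic
```

Ten steps does not cost ten times one call; it costs closer to thirty times, because every step pays for all the steps before it.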
This is why so many teams ship a beautiful pilot, scale to production, and watch the bill grow tenfold overnight. The demo costs were never the real costs. They were sample costs: a controlled setting, a small set of users behaving politely.
The 10× gap is real, and it's documented
This is not theoretical. Engineering teams are publishing exactly what they spent and exactly what they saved: ProjectDiscovery cut its LLM bill by 59 percent with prompt caching,[05] and an independent developer took a $720-a-month workload down to $72.[06]
The pattern across every published case is consistent: an undisciplined AI workload costs roughly ten times what a disciplined one costs to deliver the same outcome. The same product. The same quality. The same users. Just better economics underneath.
This is the most important number in the AI industry that most leadership teams haven't internalized. Two companies with identical AI features can have radically different unit economics — one survives, the other doesn't, and the difference has nothing to do with the customer experience.
What separates the disciplined from the careless
You don't need to write any code to understand what's going on. The teams winning on AI economics are doing five things. The ones losing are doing none of them.
01. They can answer "what did this feature cost yesterday?"
Most teams cannot. They get a single bill at the end of the month, broken down by model, not by product or feature or customer. So when costs spike, no one knows which feature did it, and no one can decide what to cut.
The teams winning have what amounts to a Stripe Dashboard for their AI spending — every request tagged with which feature triggered it, which customer it served, and what outcome it produced. Open-source tools like Langfuse and LiteLLM make this a one-week project, not a one-quarter project.
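For the engineers reading along, here is roughly what that tagging looks like with LiteLLM's Python SDK. The feature and customer names are invented, and the Langfuse callback assumes the standard Langfuse API keys are already set in the environment.

```python
# Minimal per-request cost tagging with LiteLLM (pip install litellm).
# Feature/customer names are illustrative; Langfuse credentials are assumed
# to be set via the LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY env vars.
import litellm

litellm.success_callback = ["langfuse"]  # ship every call's cost and tags to Langfuse

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    metadata={
        "feature": "ticket-summarizer",  # which product feature triggered the call
        "customer_id": "cust_4821",      # which customer it served
    },
)
# Aggregated in Langfuse, these tags answer "what did this feature cost yesterday?"
```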
If your team cannot tell you, by tomorrow afternoon, what your three most expensive AI features were yesterday, you are flying blind. Everything else flows from fixing this first.
02. They reuse work instead of redoing it
Every time an AI assistant reads a long document, a system prompt, or a knowledge base, the model does the same processing work over and over. The major AI providers now offer a discount, sometimes a steep one, when you let them remember that work between requests: Anthropic serves cached content at 10 percent of the normal input price, and OpenAI discounts it by up to 50 percent.[10]
Turning this on is, for many products, a single configuration setting. ProjectDiscovery's 59 percent reduction came primarily from making sure its reused content was structured correctly to take advantage of caching.[05] The independent developer who went from $720 to $72 did effectively nothing else.[06]
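With Anthropic's API, for instance, the change really is one field: mark the large, stable part of the prompt as cacheable. A minimal sketch, with a placeholder for your own reused content (check the current docs for model names and cache pricing):

```python
# Prompt caching with the Anthropic SDK (pip install anthropic).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stand-in for the large, stable content your product re-sends on every request.
LARGE_KNOWLEDGE_BASE = open("knowledge_base.md").read()

response = client.messages.create(
    model="claude-sonnet-4-5",  # verify current model names in the docs
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LARGE_KNOWLEDGE_BASE,
            "cache_control": {"type": "ephemeral"},  # cache this prefix between requests
        },
    ],
    messages=[{"role": "user", "content": "What does section 4 say about refunds?"}],
)
# Requests that reuse the same prefix now read it from cache at a steep discount.
```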
If your engineering team has not enabled this for your largest AI workflows, that is the cheapest decision you will make this quarter.
03. They send the right work to the right model
Most teams pick one AI model and use it for everything — usually the most powerful one, just in case. This is the equivalent of flying every executive first class for every flight, including the one to the office across town.
The cost difference between models is enormous, even within the same family.
Routine tasks such as categorizing an email, extracting fields from a document, or summarizing a transcript work beautifully on the cheap tier. Hard reasoning and code generation often need the expensive one. A workflow that runs 80 percent on the cheap model and 20 percent on the expensive one costs roughly one-fifth of running everything on the expensive one, with no quality loss on the routine 80 percent (see the sketch below).
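The arithmetic behind that one-fifth claim is worth seeing once. The 25-to-1 price ratio below is an illustrative assumption, not any provider's rate card, and the model names and task types are placeholders.

```python
# Why an 80/20 split costs roughly one-fifth of running everything premium.
# The 25x price ratio is an illustrative assumption, not a real rate card.
CHEAP = 1.0     # relative cost per task on the small model
PREMIUM = 25.0  # relative cost per task on the frontier model

all_premium = PREMIUM                    # every task on the big model
blended = 0.80 * CHEAP + 0.20 * PREMIUM  # 0.8 + 5.0 = 5.8
print(f"blended / all-premium = {blended / all_premium:.2f}")  # 0.23, about one-fifth

# Routing rule: the task types below are assumptions about your own workload.
ROUTINE = {"classify", "extract", "summarize"}

def pick_model(task_type: str) -> str:
    """Send routine work to the cheap tier; reserve the premium model for the rest."""
    return "cheap-model" if task_type in ROUTINE else "premium-model"  # placeholder names

print(pick_model("extract"))  # cheap-model
print(pick_model("codegen"))  # premium-model
```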
OpenAI's GPT-5 already does this kind of routing for you internally.[03] The teams winning at AI economics are doing it explicitly across every workflow.
04. They put hard limits on every AI agent
An autonomous agent without limits is not automation. It is financial risk wearing a UI.
The teams running agents in production give every agent a budget — a maximum number of steps before it must stop and report back, a maximum cost per task, a maximum runtime. When the limits are hit, the agent fails loudly instead of silently spending another two hundred dollars trying to recover.
If you operate AI agents today and cannot tell me the cost cap on a single task, you have a problem that has not yet revealed itself. It will. Probably on a weekend, when the agent gets stuck in a loop and no one notices for thirty hours.
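For concreteness, here is a sketch of what a hard budget looks like in a generic agent loop. The caps are made-up defaults, and run_one_step stands in for whatever your agent framework actually calls.

```python
# A hard budget for an agent loop: cap steps, spend, and wall-clock time.
# The limits are made-up defaults; run_one_step stands in for your framework's step.
import time

MAX_STEPS = 15
MAX_COST_USD = 2.00
MAX_RUNTIME_SEC = 120

class BudgetExceeded(RuntimeError):
    """Raised so the agent fails loudly instead of quietly spending more."""

def run_agent(task, run_one_step):
    spent = 0.0
    start = time.monotonic()
    state = task
    for step in range(MAX_STEPS):
        if spent >= MAX_COST_USD:
            raise BudgetExceeded(f"cost cap hit after {step} steps (${spent:.2f})")
        if time.monotonic() - start >= MAX_RUNTIME_SEC:
            raise BudgetExceeded(f"runtime cap hit after {step} steps")
        state, step_cost, done = run_one_step(state)  # returns (new_state, cost, done)
        spent += step_cost
        if done:
            return state
    raise BudgetExceeded(f"step cap hit; ${spent:.2f} spent without finishing")
```

The important design choice is the exception: a capped agent that stops and reports is a known cost; one that silently retries is not.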
05. They measure cost per outcome, not cost per token
Token spending is a vanity metric. The real metric is cost per resolved customer ticket, cost per accepted code review, cost per qualified sales lead, cost per processed invoice. The thing the AI was supposed to do.
The CEO of a major cloud company recently observed that his most expensive token spenders are also his most productive employees: a $10,000 day of AI usage that ships a million dollars of work is a great trade.[01] The reverse is also true: a $200-a-day agent that resolves nothing should be killed regardless of how cheap each individual call looks.
Without a cost-per-outcome view, every AI investment looks the same. With one, the winners and losers separate immediately.
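Once requests are tagged by feature (point 01), cost per outcome is a small calculation, not a data-science project. A toy sketch, with invented numbers standing in for your tagged spend logs and outcome counts:

```python
# Cost per outcome from tagged spend data. All numbers are invented.
daily_spend_usd = {"ticket-bot": 180.0, "lead-qualifier": 95.0}  # from tagged request logs
daily_outcomes = {"ticket-bot": 240, "lead-qualifier": 12}       # resolved tickets, qualified leads

for feature, spend in daily_spend_usd.items():
    outcomes = daily_outcomes.get(feature, 0)
    if outcomes == 0:
        print(f"{feature}: ${spend:.2f} spent, zero outcomes -> candidate to kill")
    else:
        print(f"{feature}: ${spend / outcomes:.2f} per outcome")
```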
A 30-day plan, in plain English
If you lead a product, run a team, or are responsible for an AI line in your budget — here is the order of operations.
| Week | What changes | What it gets you |
|---|---|---|
| Week 1 | Get visibility. Ask engineering: can we see, by feature and by customer, what we spent on AI yesterday? If the answer is no, fix that first. | The precondition for every other decision |
| Week 2 | Turn on caching for the largest reused prompts and documents in your product. This is a configuration change, not a project. | 50 to 90 percent off the affected workflows |
| Week 3 | Identify the three workflows burning the most. Move the simple ones to a cheaper model. Verify the quality holds. | 60 to 80 percent reduction on those flows |
| Week 4 | Put cost and step limits on every agent. Define one outcome metric — per ticket, per deal, per task — that you'll track from now on. | Eliminates runaway spend; aligns engineering with the business |
None of this is exotic. None of this requires a special team. ProjectDiscovery's 59 percent reduction came from one engineering sprint. The independent developer's 90 percent reduction was one parameter.
The real obstacle is not difficulty. It is that nobody is looking yet.
The free-lunch phase let everyone experiment. That phase did its job — it pulled millions of teams into AI and proved out hundreds of use cases. But the next phase will reward something different. Not the teams that use AI the most. The teams that use it with the most discipline.
The companies that get this right will have AI products with software-grade margins. The companies that don't will have AI products that look impressive in the demo and bleed money in production.
The gap between those two outcomes, today, is roughly one engineering sprint and four good decisions.
References & Further Reading
- [01] Deloitte, AI Token Economics for CFOs, January 2026, deloitte.com
- [02] Bitskingdom, The True Cost of AI in 2025, August 2025, bitskingdom.com
- [03] Zylos Research, AI Agent Cost Optimization: Token Economics and FinOps in Production, February 2026, zylos.ai
- [04] AgentiveAIQ / Azilen, AI Agent Cost Per Month 2025, agentiveaiq.com
- [05] ProjectDiscovery Engineering, How We Cut LLM Costs by 59% With Prompt Caching, 2026, projectdiscovery.io
- [06] Du'An Lightfoot, Prompt Caching is a Must: $720 → $72 Monthly, September 2025, Medium
- [07] Anthropic, Prompt Caching Documentation, 2026, docs.claude.com
- [08] MetaCTO, Claude API Pricing 2026: Full Anthropic Cost Breakdown, March 2026, metacto.com
- [09] BerriAI, LiteLLM (open-source AI gateway, Apache 2.0), github.com/BerriAI/litellm
- [10] Requesty, How Prompt Caching Cuts Costs by Up to 90%, requesty.ai
- [11] Anthropic, Pricing, official documentation, 2026, docs.claude.com