The "unlimited AI" era is already over. So, what’s the alternative?
Remember unlimited cell phone minutes?
Before that, every plan had a bucket and a hard stop. You watched the clock, because going over cost real money. Then unlimited plans showed up and an entire generation stopped thinking about the meter.
AI access followed the same arc in reverse. It started subsidized, close enough to unlimited that nobody questioned it. Teams used it like the meter wasn’t running. For a while, it basically wasn’t.
It is now.
The pitch for AI adoption has always been some version of: it’s cheaper than people.
Nobody finished the sentence. Cheaper doing what? Compared to what?
For a narrow set of tasks (repetitive, well-defined, low-judgment), that holds up fine. The problem is that the pitch got generalized into something much bigger: hand AI as much of the work as possible, move fast, and figure out the unit economics later. Few companies sat down and calculated the real cost of letting AI do everything. Most just assumed it would obviously come out cheaper than a person, and scaled usage on that assumption.
That assumption is getting expensive to hold onto.
It’s not an isolated case. The industry already has a name for the pattern: tokenmaxxing. Meta and Amazon both built internal leaderboards that tracked the heaviest AI users and rewarded them for it, treating token volume as a stand-in for productivity. Both companies walked the practice back once it became clear that more tokens didn’t mean more value.2
A Deloitte study of a large healthcare enterprise found a quieter version of the same problem. Token usage grew 8 to 10 percent a month, compounding without anyone watching until it hit a trillion tokens over six months. That added up to more than $6 million in unplanned annual cost, discovered only after the fact, because nobody had set up the visibility to catch it while it was happening.3
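To see how quietly a monthly growth rate in that range compounds, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the only inputs taken from the figures above are the 8 and 10 percent monthly rates; it does not attempt to reproduce the study's token or dollar totals.

```python
def growth_multiple(monthly_rate: float, months: int) -> float:
    """Return how many times larger usage is after `months` of compounding.

    monthly_rate is a fraction, e.g. 0.08 for 8 percent a month.
    """
    return (1 + monthly_rate) ** months


# Compare the low and high end of the 8 to 10 percent range over one year.
low = growth_multiple(0.08, 12)
high = growth_multiple(0.10, 12)
print(f"8% a month for a year:  {low:.2f}x the starting usage")
print(f"10% a month for a year: {high:.2f}x the starting usage")
```

At those rates, usage ends a year at roughly 2.5 to 3.1 times where it started, which is why a number that looks small on a monthly report can turn into a budget problem nobody planned for.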
Then there’s the number that should make any engineering leader stop scrolling. When GitHub changed its billing structure in June 2026, one developer’s monthly bill jumped from $29 to $750. Another went from $50 to $3,000. One company, running 80 developers, calculated that its new monthly AI spend would equal a full engineer’s annual salary.4
A team’s AI bill matching the cost of hiring one more person. Except the bill shows up every month, and there’s no new hire to show for it.
This isn’t only happening on the customer side. The companies selling AI access moved in the same direction.
Starting in May 2026, Google switched Gemini to a compute-based usage model. Quotas now factor in prompt complexity, model choice, and conversation length, resetting on a five-hour window with a weekly ceiling layered on top. It’s the same structure Anthropic had already built for Claude.5
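A two-tier quota of that general shape can be sketched in a few lines. This is an illustration only, not either provider's actual implementation: the `UsageMeter` class, the abstract "compute units," the limits, and the choice of a rolling (rather than fixed) five-hour window are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 3600          # seconds
ONE_WEEK = 7 * 24 * 3600       # seconds


@dataclass
class UsageMeter:
    """Hypothetical meter: a five-hour allowance with a weekly ceiling on top."""
    window_limit: float        # assumed compute units allowed per five hours
    weekly_limit: float        # assumed compute units allowed per seven days
    events: deque = field(default_factory=deque)  # (timestamp, cost), oldest first

    def _used_since(self, cutoff: float) -> float:
        # Sum the cost of every recorded request newer than the cutoff.
        return sum(cost for ts, cost in self.events if ts > cutoff)

    def try_spend(self, cost: float, now: float) -> bool:
        """Record the request and return True only if both limits allow it."""
        # Forget anything older than a week; it no longer counts anywhere.
        while self.events and self.events[0][0] <= now - ONE_WEEK:
            self.events.popleft()
        if self._used_since(now - FIVE_HOURS) + cost > self.window_limit:
            return False       # short-window allowance exhausted
        if self._used_since(now - ONE_WEEK) + cost > self.weekly_limit:
            return False       # weekly ceiling reached
        self.events.append((now, cost))
        return True


meter = UsageMeter(window_limit=10, weekly_limit=25)
print(meter.try_spend(8, now=0))      # fits in both limits
print(meter.try_spend(5, now=100))    # would exceed the five-hour allowance
```

In a real system the `cost` of a request would be derived from factors like prompt complexity, model choice, and conversation length; here it is simply passed in as a number.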
Two of the largest AI providers in the world reached the same conclusion independently: flat, predictable access doesn’t survive contact with real usage at scale.
When the people selling the resource start rationing it, that tells you something about where this was always heading.
It’s tempting to read all of this as “AI got too expensive.” That’s not quite the issue.
The real failure is that most organizations scaled AI usage without ever building the habit of asking whether a task needed a model at all, or just needed someone to think it through for five minutes. This applies as much to AI agents for business automation as it does to everyday coding assistants. Cisco’s chief product officer, Jeetu Patel, has described the discipline this requires as layered: tracking cost and usage at the infrastructure layer, the model layer, the application layer, and hardest of all, the agent layer, where autonomous systems can burn through a budget without anyone noticing until it’s gone.6
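As a rough illustration of what layered tracking could look like in practice, the sketch below tags each usage record with the layer it came from and flags any layer that has overspent. The record format, the function names, the budget figures, and the rule that a layer with no budget counts as over budget are all assumptions for the example, not a description of any company's tooling.

```python
from collections import defaultdict


def spend_by_layer(records):
    """Total the cost of (layer, cost) records per layer."""
    totals = defaultdict(float)
    for layer, cost in records:
        totals[layer] += cost
    return dict(totals)


def over_budget(totals, budgets):
    """Return the layers whose spend exceeds their budget, sorted by name.

    A layer with no budget entry is treated as having a budget of zero,
    so unmetered spend is surfaced instead of silently ignored.
    """
    return sorted(
        layer for layer, spent in totals.items()
        if spent > budgets.get(layer, 0.0)
    )


# Hypothetical usage records and budgets, in dollars.
records = [("agent", 120.0), ("model", 40.0), ("agent", 30.0), ("application", 10.0)]
budgets = {"agent": 100.0, "model": 50.0}

totals = spend_by_layer(records)
print(totals)
print(over_budget(totals, budgets))
```

Treating a missing budget as zero is a deliberate choice here: the point of the exercise is that spend nobody is watching should be the first thing the report shows.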
It’s the same failure we’ve seen play out elsewhere with AI. When teams accept output without scrutiny, understanding erodes. The token version of that habit is delegation without scrutiny: handing AI the whole task because deciding which part actually needed a human is more work than just letting the model run. The expensive habit was never using AI. It was using it without asking first.
Uber didn’t ban AI tools. It put a meter where one hadn’t existed before. That’s the model worth copying, broken into pieces that hold up in practice.
The unlimited AI era ran on the same assumption unlimited cell phone plans eventually broke: that usage without a meter is usage without a cost. It wasn’t true then. It isn’t true now. The companies that handle this well aren’t the ones cutting AI access. They’re the ones who put the meter back where it belongs, before the bill writes itself.