The Tokenator Is Coming For Your Budget.

The "unlimited AI" era is already over. So, what’s the alternative?

Ensolvers

Blog Edition


AI & Engineering

The “unlimited AI” era is already over. Almost nobody noticed until the bill arrived.

Remember unlimited cell phone minutes?

Before that, every plan had a bucket and a hard stop. You watched the clock, because going over cost real money. Then unlimited plans showed up and an entire generation stopped thinking about the meter.

AI access followed the same arc, just backward. It started subsidized, close enough to unlimited that nobody questioned it. Teams used it like the meter wasn’t running. For a while, it basically wasn’t.

It is now.

The pitch nobody finished

The pitch for AI adoption has always been some version of: it’s cheaper than people.

Nobody finished the sentence. Cheaper doing what, compared to what?

For a narrow set of tasks (repetitive, well-defined, low-judgment), that holds up fine. The problem is that the pitch got generalized into something much bigger: hand AI as much of the work as possible, move fast, and figure out the unit economics later. Few companies sat down and calculated the real cost of letting AI do everything. Most just assumed it would obviously come out cheaper than a person, and scaled usage on that assumption.

That assumption is getting expensive to hold onto.

Four companies. Same lesson.

Uber spent its entire 2026 AI tools budget by April, four months into the year. The fix was a hard cap: $1,500 per employee, per month, per coding tool.¹

It’s not an isolated case. The industry already has a name for the pattern: tokenmaxxing. Meta and Amazon both built internal leaderboards that tracked the heaviest AI users and rewarded them for it, treating token volume as a stand-in for productivity. Both companies walked the practice back once it became clear that more tokens didn’t mean more value.²

$6M+: unplanned annual cost from token growth nobody tracked until it was too late (Deloitte, 2026).³

$29 → $750: one developer’s monthly AI bill after a single billing change (GetDX.com, June 2026).⁴

A Deloitte study of a large healthcare enterprise found a quieter version of the same problem. Token usage grew 8 to 10 percent a month, compounding without anyone watching until it hit a trillion tokens over six months. That added up to more than $6 million in unplanned annual cost, discovered only after the fact, because nobody had set up the visibility to catch it while it was happening.³
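To see why growth of 8 to 10 percent a month is easy to miss, it helps to run the compounding. The sketch below only computes the usage multiplier implied by those two rates. The function name is ours, and no dollar figures are assumed beyond what the study reports.

```python
def compound_growth(monthly_rate: float, months: int) -> float:
    """Return the usage multiplier after `months` of compounding growth."""
    return (1 + monthly_rate) ** months


# The two monthly rates come from the Deloitte figures quoted above.
for rate in (0.08, 0.10):
    six = compound_growth(rate, 6)
    twelve = compound_growth(rate, 12)
    print(f"{rate:.0%}/month: x{six:.2f} after 6 months, x{twelve:.2f} after 12")

# 8%/month: x1.59 after 6 months, x2.52 after 12
# 10%/month: x1.77 after 6 months, x3.14 after 12
```

A month-over-month change under 10 percent rarely looks alarming on its own. Left alone for a year, it means paying for roughly two and a half to three times the usage you started with.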

Then there’s the number that should make any engineering leader stop scrolling. When GitHub changed its billing structure in June 2026, one developer’s monthly bill jumped from $29 to $750. Another went from $50 to $3,000. One company, running 80 developers, calculated that its new monthly AI spend would equal a full engineer’s annual salary.⁴

A team’s AI bill matching the cost of hiring one more person. Except the bill shows up every month, and there’s no new hire to show for it.

The people selling AI are tightening the meter too

This isn’t only happening on the customer side. The companies selling AI access moved in the same direction.

Starting in May 2026, Google switched Gemini to a compute-based usage model. Quotas now factor in prompt complexity, model choice, and conversation length, resetting on a five-hour window with a weekly ceiling layered on top. It’s the same structure Anthropic had already built for Claude.⁵
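For readers who want to picture how a short window and a weekly ceiling stack, here is a minimal sketch. It is not how Google or Anthropic implement their quotas. It assumes a rolling five-hour window (the real reset may be fixed rather than rolling), and the class name, limits, and abstract "cost" units are all hypothetical.

```python
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 60 * 60          # seconds
ONE_WEEK = 7 * 24 * 60 * 60       # seconds


@dataclass
class UsageQuota:
    """Hypothetical two-level quota: a short window plus a weekly ceiling."""
    window_limit: float            # cost units allowed per five-hour window
    weekly_limit: float            # cost units allowed per week
    events: list = field(default_factory=list)  # (timestamp, cost) pairs

    def _used_since(self, now: float, span: int) -> float:
        # Sum the cost of every request made less than `span` seconds ago.
        return sum(cost for ts, cost in self.events if now - ts < span)

    def try_spend(self, now: float, cost: float) -> bool:
        """Record the request and return True only if both limits allow it."""
        if self._used_since(now, FIVE_HOURS) + cost > self.window_limit:
            return False
        if self._used_since(now, ONE_WEEK) + cost > self.weekly_limit:
            return False
        self.events.append((now, cost))
        return True
```

The point of the second check is that waiting out the short window is not enough. A heavy user can be inside the five-hour allowance and still be refused by the weekly ceiling.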

Two of the largest AI providers in the world reached the same conclusion independently: flat, predictable access doesn’t survive contact with real usage at scale.

When the people selling the resource start rationing it, that tells you something about where this was always heading.

This isn’t really a cost problem

It’s tempting to read all of this as “AI got too expensive.” That’s not quite the issue.

The real failure is that most organizations scaled AI usage without ever building the habit of asking whether a task needed a model at all, or just needed someone to think it through for five minutes. This applies as much to AI agents for business automation as it does to everyday coding assistants. Cisco’s chief product officer, Jeetu Patel, has described the discipline this requires as layered: tracking cost and usage at the infrastructure layer, the model layer, the application layer, and hardest of all, the agent layer, where autonomous systems can burn through a budget without anyone noticing until it’s gone.⁶

It’s the same failure we’ve seen play out elsewhere with AI. When teams accept output without scrutiny, understanding erodes. The token version of that habit is delegation without scrutiny: handing AI the whole task because deciding which part actually needed a human is more work than just letting the model run. The expensive habit was never using AI. It was using it without asking first.

How to avoid the surprise bill

Uber didn’t ban AI tools. It put a meter where one hadn’t existed before. That’s the model worth copying, broken into pieces that hold up in practice.

Action 01

Split the budget by layer

Infrastructure, model choice, application usage, and agent behavior each carry different cost profiles and fail in different ways. One combined “AI budget” line hides exactly where the money is going.
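As a rough illustration of what splitting by layer buys you, the sketch below totals spend per layer and flags the layers that are over their own budget. The record format, function names, and all amounts are made up for the example. Only the four layer names come from the breakdown above.

```python
from collections import defaultdict

LAYERS = ("infrastructure", "model", "application", "agent")


def spend_by_layer(records):
    """Total spend per layer from (layer, amount) pairs."""
    totals = defaultdict(float)
    for layer, amount in records:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        totals[layer] += amount
    return {layer: totals[layer] for layer in LAYERS}


def over_budget(totals, budgets):
    """Return the layers whose spend exceeds their own budget."""
    return [layer for layer in LAYERS if totals[layer] > budgets[layer]]


# Illustrative numbers only.
records = [("model", 400.0), ("agent", 900.0), ("model", 250.0),
           ("infrastructure", 300.0), ("application", 120.0)]
budgets = {"infrastructure": 500.0, "model": 700.0,
           "application": 200.0, "agent": 600.0}

totals = spend_by_layer(records)
print(over_budget(totals, budgets))  # ['agent']
```

In this made-up example the combined total (1,970) is under the combined budget (2,000), so a single “AI budget” line would show nothing wrong while the agent layer is already 50 percent over.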

Action 02

Route tasks to the cheapest model that can actually do them

Not every job needs the most capable, most expensive model on the market. Sending simple tasks to lighter models and saving the expensive ones for work that genuinely needs them is one of the highest-leverage cost controls available, and one of the easiest to skip.
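One way to make that routing rule concrete is sketched below: pick the cheapest model whose capability meets what the task needs. The tier names, capability scores, and prices are hypothetical placeholders, not any provider’s real models or rates. Scoring a task’s required capability is the hard part, and it is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int            # higher means more capable (hypothetical scale)
    cost_per_1k_tokens: float  # hypothetical price


# Placeholder tiers for illustration only.
TIERS = [
    ModelTier("small", 1, 0.10),
    ModelTier("medium", 2, 1.00),
    ModelTier("large", 3, 10.00),
]


def route(required_capability: int, tiers=TIERS) -> ModelTier:
    """Return the cheapest tier that is capable enough for the task."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError("no model tier is capable enough for this task")
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)


print(route(1).name)  # small
print(route(3).name)  # large
```

The default matters more than the lookup. If nothing forces a task to declare what it needs, everything drifts to the most expensive tier.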

Action 03

Set alerts before the limit, not after

The healthcare enterprise in Deloitte’s research didn’t have a usage problem. It had a visibility problem. By the time the cost showed up in a report, there was nothing left to do but explain it.
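A minimal sketch of “before the limit” might look like the following: warn at fixed fractions of the budget, and project month-end spend from the run rate so far. The thresholds and the straight-line projection are assumptions chosen for simplicity. The 1,500 figure echoes the Uber cap mentioned earlier, and the spend values are invented.

```python
def alert_level(spent: float, budget: float, thresholds=(0.5, 0.8, 1.0)):
    """Return the highest budget fraction already crossed, or None."""
    crossed = [t for t in thresholds if spent >= t * budget]
    return max(crossed) if crossed else None


def projected_month_end(spent: float, day_of_month: int, days_in_month: int) -> float:
    """Straight-line projection of month-end spend from the run rate so far."""
    return spent / day_of_month * days_in_month


budget = 1500.0
spent_so_far = 600.0   # invented: spend through day 10 of a 30-day month

print(alert_level(spent_so_far, budget))        # None
print(projected_month_end(spent_so_far, 10, 30))  # 1800.0
```

Here no threshold has fired yet, since 600 is under half the budget, but the projection already lands above the cap. Alerting on the projection is what turns a report into a warning.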

Action 04

Put a human decision in front of the bill, not after it

Before scaling usage on a workflow, someone should be answering one specific question: does this need a model, or does it need five minutes of thinking? That question is cheap. Skipping it isn’t.

The unlimited AI era ran on the same assumption unlimited cell phone plans eventually broke: that usage without a meter is usage without a cost. It wasn’t true then. It isn’t true now. The companies that handle this well aren’t the ones cutting AI access. They’re the ones who put the meter back where it belongs, before the bill writes itself.

Sources

1. CBC News. Companies question cost of AI as tokenmaxxing spending adds up. June 2026.

2. Crypto Briefing, citing WIRED. Silicon Valley’s new “tokenomics” problem has nothing to do with crypto. June 2026.

3. Deloitte. Navigate the economics of AI: How tokenomics is reshaping AI costs and ROI. January 2026.

4. GetDX.com, citing Troy Gray and DX customer data. AI coding assistant pricing and ROI guide. June 2026.

5. Android Central. Google Gemini Quota Boost Explained. May 2026.

6. SDxCentral. Cisco touts AI tokenomics, but admits the market is “still in phase one.” June 2026.

How we think about this

The engineering team your next project needs probably looks different from the one you’ve been hiring for.

If you’re thinking about what kind of engineering talent actually moves the needle (not just headcount, but the right kind of judgment), that’s a conversation worth having before the spec is written.
