You’re about to feel the AI money squeeze


Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic.

Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a profit. So if the users wanted its Claude AI to power their popular agents, they’d have to start paying handsomely for the privilege.

“Our subscriptions weren’t built for the usage patterns of these third-party tools,” wrote Boris Cherny, head of Claude Code, on X. “We want to be intentional in managing our growth to continue to serve our customers sustainably long-term. This change is a step toward that.”

The announcement was a sign of the times. Investors have poured hundreds of billions of dollars into companies like OpenAI and Anthropic to help them scale and build out their compute. Now, they’re expecting returns. After years of offering cheap or totally free access to advanced AI systems, the bill is starting to come due — and downstream, users are beginning to feel the pinch.

Over the past few years, most top AI labs have introduced new subscription tiers to court power users. OpenAI and Anthropic shifted their pricing plans for enterprise. OpenAI introduced in-platform advertisements. Anthropic, of course, restricted third-party tools.

In some ways, this is a tale as old as time, and particularly, a clear echo of the tech boom of the ’10s. Venture capitalists helped startups subsidize fast growth in all kinds of areas: ride-hailing apps, e-commerce, takeout and grocery delivery. Once companies cemented their power, they raised prices, added new revenue streams, and delivered a return to investors. Or they didn’t — and they crashed and burned.

But AI companies have gone through more investor money at a faster pace than any other sector in recent history, breaking ground on data centers around the world and dedicating billions of dollars with promises of better models, lower costs, and AI for everyone. Even stemming the flow of losses will be difficult — let alone making the kind of money investors are hoping for. “When you sink trillions of dollars into data centers, you’re going to expect a return,” said Will Sommer, a senior director analyst at Gartner who specializes in economic forecasting and quantitative modeling.

“When you sink trillions of dollars into data centers, you’re going to expect a return.”

“Is the era of basically free or close-to-free AI kind of coming to an end here?” said Mark Riedl, a professor in the Georgia Tech School of Interactive Computing. “It’s too soon to say for certain, but there are some signs.”

Gartner’s Sommer studies long-term economic market trends related to generative AI, including calculating just how much money is at stake. Between 2024 and 2029, he said, Gartner estimates that capital investment in AI data centers will reach about $6.3 trillion — a “massive amount of money.”

To avoid a write-down of these assets, major AI model providers would ideally generate a return on invested capital (ROIC) of about 25 percent, Sommer said. (That’s about what Amazon, Microsoft, and Google tend to earn on their overall capital investments.) On the other hand, if the returns fall below 12 percent, institutional capital loses interest — there’s better money elsewhere, Sommer said. Below 7 percent, you’re in write-down territory, which is “an unmitigated disaster for all of the investors in this technology,” Sommer said.
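Sommer’s thresholds amount to a simple classification. As a sketch (the band labels below are mine, not Gartner’s terminology):

```python
def roic_outlook(roic):
    """Map a return on invested capital (as a fraction, e.g. 0.25 = 25
    percent) onto the bands Sommer describes."""
    if roic >= 0.25:
        return "historic returns"          # on par with Amazon, Microsoft, Google
    if roic >= 0.12:
        return "keeps institutional capital"
    if roic >= 0.07:
        return "capital looks elsewhere"   # better money elsewhere
    return "write-down territory"          # "an unmitigated disaster"

print(roic_outlook(0.25))  # historic returns
print(roic_outlook(0.05))  # write-down territory
```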

To reach that bare minimum of 7 percent, Gartner forecasts that large AI companies would need to earn cumulatively close to $7 trillion in AI-driven revenue through 2029, which is close to $2 trillion per year by the end of the period. In order to achieve “historic returns,” the providers would need to earn nearly $8.2 trillion in the same period.

OpenAI has already made $600 billion in spending commitments through 2030, the company said in February, which Sommer says is already a “massive step down” from the $1.4 trillion it had planned before. Based on OpenAI’s revenue forecasts and potential compound annual growth, Sommer said that even in the best-case scenario, the lab would generate only a fraction of the revenue required to hit that 7 percent ROIC.

How do model providers like OpenAI make this money? By selling access to what are known as tokens. A token is essentially a unit of data that an AI model can understand and process — it could be text, images, audio, or something else. One token is generally worth about four characters of English text — the word “bathroom,” for instance, would likely be processed as two tokens. One paragraph in English is generally about 100 tokens, and a 1,500-word essay may be about 2,050 tokens, per an OpenAI estimate.
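The four-characters-per-token rule of thumb can be expressed as a quick back-of-envelope estimator. This is only a heuristic — real tokenizers (such as OpenAI’s tiktoken) split text on learned subwords, so actual counts vary by model:

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count using the ~4-characters-per-token rule of thumb.
    Real tokenizers split on learned subwords, so actual counts vary."""
    return max(1, math.ceil(len(text) / chars_per_token))

print(estimate_tokens("bathroom"))   # 8 characters -> 2 tokens
print(estimate_tokens("a" * 400))    # ~400-character paragraph -> 100 tokens
```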

To hit investors’ revenue expectations, providers would need to process a “mind-bending” number of tokens, Sommer said.

By most measures, companies’ numbers are already pretty big. Google announced it was processing 1.3 quadrillion tokens in October, for instance. If you add all the providers’ estimates up, Sommer said, you get 100 to 200 quadrillion tokens a year. But to achieve the $2 trillion in annual revenue Gartner calculated, providers would need to be generating, by conservative estimates, roughly 10 sextillion tokens per year. (To make that slightly less abstract, a quadrillion has 15 zeros, and a sextillion has 21.) Even assuming a very generous profit margin of 10 percent per token, that would mean that token consumption between now and 2030 would need to grow by 50,000–100,000x.
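The growth multiple checks out as simple arithmetic on the figures above:

```python
# Back-of-envelope check on the growth multiple, using the article's figures.
current_low = 100e15    # 100 quadrillion tokens/year (low end of provider estimates)
current_high = 200e15   # 200 quadrillion tokens/year (high end)
target = 10e21          # ~10 sextillion tokens/year for ~$2T in annual revenue

growth_low = target / current_high    # if usage is at the high end today
growth_high = target / current_low    # if usage is at the low end today

print(f"{growth_low:,.0f}x to {growth_high:,.0f}x")  # 50,000x to 100,000x
```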

To hit investors’ revenue expectations, providers would need to process a “mind-bending” number of tokens

Right now, constantly seeking more data centers and strapped for compute, companies aren’t capable of processing this many tokens. Even if they could, they’d face a problem: they’re likely taking a loss on them. Sommer estimates that if you only account for the direct cost of infrastructure and electricity, “every company is making very reasonable margins on every token.” But that margin is probably tighter or nonexistent with newer, more token-hungry models. And it’s eaten up completely by indirect operation costs, like building out more compute and the “ungodly” expense of constantly training the next big model.

“As soon as you then add all of the infrastructure that needs to be built for the next generation of model, and you look at how these models are going to scale, it becomes increasingly untenable,” Sommer said.

Sommer predicts that many companies “won’t be able to sustain their burn rate,” and says market consolidation is virtually inevitable — in his eyes, no more than two large language model providers in any regional market will survive. And the era where nearly every service has a fairly generous unpaid tier probably isn’t going to last.

“For the [labs] that have a lot of users that were free, I think the question was never really if you’d monetize the free tier but it was when, and how badly do you do it,” Jay Madheswaran, cofounder of legal AI startup Eve, which is a client of both OpenAI and Anthropic, told The Verge.

Even if you do find a way to square the math, building customer loyalty can be just as complicated. Top labs are constantly leapfrogging each other on model debuts, feature releases, strategy shifts, hiring announcements, and more. It can be tough to stay on top long enough to corner any part of the market — engineers and developers are famous for switching which model they’re using on any given day, and it’s easy to do so.

So labs are increasingly emphasizing the importance of locking users into their platform and tools. Anthropic, which primarily builds for enterprise clients, has been going all in on its coding efforts, and OpenAI has recently pledged to mirror Anthropic’s focus on coding and enterprise, ahead of both companies reportedly racing each other to IPO by the end of 2026.

For now, that competition is benefiting end users. “It’s an arms race where you cannot let up at all because the switching cost is zero,” said Soham Mazumdar, cofounder and CEO of Wisdom AI, adding, “As a common man, I’m going to be the winner longer-term.”

In the early days of AI, the bulk of compute costs went to training initial models, while inference (or performing tasks) was cheaper. As models have advanced and systems have added features, however, inference has gotten far more resource-intensive. AI agents, or tools that ideally can complete complex, multistep tasks on your behalf without constant hand-holding, now use vastly more tokens than the basic chatbot models did a few years back.

Reasoning models, which increasingly power AI agents, are notoriously expensive on the inference side as well, said Georgia Tech’s Riedl. These agents — such as popular open-source platform OpenClaw — are typically more efficient and effective than ones without reasoning, but they also expend far more tokens doing behind-the-scenes work the end user may not see. That may look like “thinking through” a lot of different potential paths, launching sub-agents to do portions of a task, or verifying the accuracy of different steps of the process.

“You put in your one-sentence prompt… and it’ll talk out loud to itself for thousands and thousands of tokens, thousands and thousands of words, maybe even tens of thousands when you get into coding,” Riedl said, adding, “If you have thousands or millions of people using these things every single day, the inference costs of just the users generating tons and tons of tokens all the time really outweighs the training side of things.” If model providers were making a straightforward profit on all these tokens and had the compute to handle them easily, that wouldn’t be a problem for them — but as things stand, it’s a strain.

“The use cases have exploded, and we’re out of capacity.”

“Anybody who was building agents in the past couple of years sort of saw this coming,” said Aaron Levie, CEO of Box, adding, “The use cases have exploded, and we’re out of capacity.”

Top AI labs have recently changed their policies on API usage and third-party tools due to the extra strain — like Anthropic essentially banning the use of OpenClaw unless subscribers pay extra. “You’ve got these tools that are basically just sitting as background processors on everyone’s laptops and desktops, just continuously waking themselves up, generating some tokens, doing some stuff, and putting themselves back to sleep,” said Riedl.

And no matter what you’re doing with a reasoning-model-powered AI agent, there are likely going to be wasted tokens — meaning times that an AI model goes down a non-useful path and then backtracks, or checks on how something is going but doesn’t change anything, or even pauses to write itself a poem. In an era where labs are likely losing money on some tokens and companies are strapped for compute, the industry is trying to reduce wasted tokens and build more focused and targeted models.

Although it may be good for both paying customers and AI labs alike to make models use fewer tokens, it ironically works against the mission of massively increasing token usage. As Gartner’s Sommer puts it, pricing models may change significantly down the line, but right now, there’s a “narrow space on the treadmill” between short- and long-term goals.

Add this all up, and big AI companies are at a transition point: they’ve attracted huge numbers of users by offering free access, and now they need to keep those users while charging a lot more. “On one hand, they want to see more tokens being generated but they have to either suck up the costs, which they can sort of do as long as venture capital is flowing, or pass the costs back on to [customers],” Riedl said. “Maybe the economics are a little upside down right now.”

These days, OpenAI and Anthropic are often weighing the advantages of older flat-rate subscription plans against plans with metered fees. Both companies’ enterprise plans are now token-based, since usage is “uneven,” as Andrew Filev, founder of Zencoder, put it — one person may use it once or twice a week for a few minutes, while another is running five agents in the background around the clock.
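The uneven-usage problem Filev describes can be sketched with hypothetical numbers — the $20 fee and $3-per-million-token rate below are illustrative, not either company’s actual prices:

```python
def flat_vs_metered(tokens_per_month, flat_fee=20.0, usd_per_million=3.0):
    """Hypothetical comparison of a flat subscription vs metered billing.
    The fee and per-token rate are illustrative, not real prices."""
    metered = tokens_per_month / 1e6 * usd_per_million
    return {"flat": flat_fee, "metered": round(metered, 2)}

light = flat_vs_metered(2_000_000)        # a few short sessions a week
heavy = flat_vs_metered(2_000_000_000)    # five agents running around the clock

# The flat fee overcharges the light user ($20 vs. $6 metered) and wildly
# undercharges the heavy one ($20 vs. $6,000 metered) -- which is why
# enterprise plans have shifted to token-based pricing.
```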

For consumer chatbots, some monetization is taking the form of advertising

In consumer chatbots, some model makers are trying to mitigate this with advertising. OpenAI recently introduced ads within ChatGPT, which show up as a separate sidebar, and it’s reportedly working on a tool to track how well those ads work. (Anthropic famously decried the move in its 2026 Super Bowl ads.)

But for companies that build tools on top of models like GPT-5 or Claude Opus, the price of tokens is going up, and the extra cost is largely trickling down to their customers. Multiple tech companies The Verge spoke with said they, or their customers, are changing strategies to offset the new pricing. Some are considering moving fully or partially to open-source models, and some are spending considerable time and resources evaluating how well high-end models perform on certain tasks compared with cheaper alternatives.

David DeSanto, CEO of software company Anaconda, recently returned from a five-week trip around the world speaking to customers. He said that many were moving to self-host AI models — deploying their own within Amazon Bedrock or Google’s Vertex AI to have more control over the supply chain — or changing to open-source or open-weight models for a lot of their needs, since many such models have significantly improved on benchmarks as of late. Some companies also worry about the security of sending IP to a commercial frontier lab, so they only use ChatGPT or Claude models for “mission-critical applications,” he said.

“Everyone I spoke to had some version of this problem — their token usage has gone up, so their usage-based billing cost has gone up, or the tier they were on no longer has the same cap, and now they’re having to go to a more expensive tier to try to keep the same amount of usage per month as part of their flat rate,” DeSanto said.

Eve, a company that sells software to plaintiff lawyers, is constantly balancing quality against token costs, Madheswaran said — especially since Eve’s token usage has gone up 100x year-over-year to date. So it’s continually switching between open-source models and various models from Anthropic and OpenAI.

But even a 1 percent regression in quality of output negatively impacts Eve’s customers “quite significantly,” Madheswaran said, which is why Eve spends a lot of internal resources tracking model quality. The company typically finds itself using the newer, more expensive reasoning models about 25 to 30 percent of the time, splitting the rest of its usage between Eve’s own open-source variants and smaller, cheaper models from leading labs. Madheswaran said the company has found that some cheap models are just as accurate as expensive ones, depending on the query.
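A cost-aware routing strategy like the one Madheswaran describes can be sketched as below. The tier names, prices, and routing heuristic are all hypothetical, not any lab’s actual lineup:

```python
# Hypothetical model router: send only queries that need deep reasoning to
# the expensive frontier model, and keep everything else on cheaper tiers.
TIERS = {
    "small-open-weight": 0.20,    # illustrative $/million tokens
    "mid-tier": 1.00,
    "frontier-reasoning": 15.00,
}

def route(query, needs_reasoning=False):
    """Pick the cheapest tier believed adequate for the query."""
    if needs_reasoning:
        return "frontier-reasoning"
    if len(query.split()) > 50:   # long, multi-part queries get the mid tier
        return "mid-tier"
    return "small-open-weight"

print(route("Summarize this clause."))                      # small-open-weight
print(route("Draft a settlement memo", needs_reasoning=True))  # frontier-reasoning
```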

“What open source is really doing is it’s putting pressure on these companies to make their cheaper models cheaper because their profit margins there are much, much better,” Madheswaran said.

“What open source is really doing is it’s putting pressure on these companies to make their cheaper models cheaper.”

Wisdom AI, which provides AI-powered data analysis, hasn’t had to pass on cost increases yet. The team is testing out how different models perform on different types of tasks, and then budgeting accordingly. Mazumdar said it’s been testing out Cerebras, which is popular for open-weight models, lately, “in anticipation of how expensive things will get” from the premier labs like OpenAI and Anthropic. “[Big AI companies] have been giving this away for free,” Mazumdar said. “What they’re trying to do is, the moment they sense there’s an enterprise at play, or there’s propensity to pay, they absolutely jack up the prices drastically.”

But he said there’s always a cost, especially on the coding front. “The reality is this: If you’re doing coding of any kind, then the open-source models simply don’t come close, and that’s the unfortunate reality of where we are today,” he said.

Box’s Levie believes the changes will play out over the next 24 months. He said the VC-subsidized era of AI was likely necessary for growth — after all, if two companies with largely equal products are competing for the same customers, and one is offering a (subsidized) product at a lower price, the cheaper one will obviously win out, at least in the short term. But now it’s time to build more efficiency into the system, and not everyone is going to survive it.

“The size of the market is so large that I think it actually will sort of all work out,” Levie said. “At an individual company level, you have to decide: Can you keep up with this flywheel, or are you going to be priced out based on an inability to raise capital or an inability to make the model more efficient for your tasks?”

Eve’s Madheswaran thinks the industry will soon move from focusing on the so-called “best” model to what works the best for a business’s personalized, niche use cases. “That’s my guess, and obviously I’m betting our entire company on it.”

Gartner’s Sommer likens the whole scenario to what he called the “stegosaurus paradox.” When scientists first discovered the stegosaurus fossil, he said, they didn’t understand how a large body could be supported by such a small head with a tiny mouth — and the theory they developed was that the stegosaurus would need to constantly be eating, and eating a highly nutritious diet.

“We see AI as kind of being the same deal,” Sommer said — for the stegosaurus (AI labs) to survive, then providers need to find more food for it (the entire global economy, not just the tech market) and it has to be highly nutritious, too (i.e., providers need to be able to earn a margin from it and stop subsidizing). If the stegosaurus paradox isn’t resolved, and the mouth is “too small for the body,” he said, it will lead to write-downs, falling valuations, dried-up financing, and a broad resetting of expectations for AI worldwide. Therefore, Sommer said, a sustainable business model “would require that genAI be infused in everything from billboards to checkout kiosks,” with providers taking a cut of all of those transactions.

“The free era was really a land grab — it’s a common strategy used by startups,” said Eve’s Madheswaran. “That’s just not a business model. You can’t do that for too long.”
