AI model routing is a problem for OpenAI and Anthropic


A new spending discipline is taking hold inside corporate America, as CFOs and boards start cracking down on inefficient artificial intelligence spending. The change has the potential to reshape the AI trade.

For the past two years, the playbook has been to default to the most powerful AI model and direct all queries through it, regardless of complexity. Now, with AI bills running far ahead of budgets, companies are starting to ask whether every task actually needs the frontier. Two leaders at the center of the AI buildout told CNBC this week that a solution is emerging: model routing.

What is model routing?

Routing is a tool that matches the job to the model, sending hard problems to the expensive frontier models and easy ones to cheaper, faster alternatives.
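To make the idea concrete, here is a minimal sketch of a router in Python. Everything in it is an assumption for illustration: the tier names, the per-token prices, the keyword list and the 0.5 threshold are invented, and a production router would more likely rely on a trained classifier or the vendor's own tooling than on a keyword heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real product
    cost_per_1k_tokens: float  # made-up price, for illustration only


CHEAP = ModelTier("small-model", 0.001)
FRONTIER = ModelTier("frontier-model", 0.010)

# Assumed keywords that hint at a hard task; a real system would learn these.
HARD_HINTS = ("prove", "refactor", "architecture", "debug", "multi-step")


def estimate_difficulty(prompt: str) -> float:
    """Crude score in [0, 1]: long prompts and 'hard' keywords score higher."""
    text = prompt.lower()
    length_score = min(len(text.split()) / 200, 1.0)
    hint_score = 1.0 if any(hint in text for hint in HARD_HINTS) else 0.0
    return max(length_score, hint_score)


def route(prompt: str, threshold: float = 0.5) -> ModelTier:
    """Send hard prompts to the frontier tier and everything else to the cheap tier."""
    return FRONTIER if estimate_difficulty(prompt) >= threshold else CHEAP


if __name__ == "__main__":
    for prompt in ("What is 2 + 2?", "Refactor this service and debug the deploy script"):
        print(prompt, "->", route(prompt).name)
```

The design choice that matters is the threshold: set it too low and nearly everything still goes to the expensive model, set it too high and hard problems land on a model that cannot handle them.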

Scott Wu, CEO of Cognition, which makes the coding agent Devin, said the gains on routine work are enormous. For a lot of the boilerplate work, he said, companies can get five to 10 times better cost efficiency using models that are still good enough for the task.
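A rough way to see what that means for a total bill is to compute the blended cost after routing. The sketch below assumes a hypothetical share of traffic that counts as routine (60% is invented for the example, not a figure from Wu or CNBC) and applies the five-to-10-times range to that share only.

```python
def blended_cost_ratio(share_routed: float, efficiency_gain: float) -> float:
    """Return the new bill as a fraction of the old all-frontier bill.

    share_routed: fraction of spend moved to cheaper models (assumed value).
    efficiency_gain: how many times cheaper the routed work becomes.
    """
    if not 0.0 <= share_routed <= 1.0:
        raise ValueError("share_routed must be between 0 and 1")
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    # Unrouted work keeps its old cost; routed work shrinks by the gain factor.
    return (1.0 - share_routed) + share_routed / efficiency_gain


if __name__ == "__main__":
    for gain in (5, 10):
        ratio = blended_cost_ratio(share_routed=0.6, efficiency_gain=gain)
        print(f"{gain}x cheaper on routed work -> bill is {ratio:.0%} of before")
```

Under these assumed inputs the bill falls by roughly half, and the remaining cost is dominated by the work that still goes to the frontier model, which is why the share of traffic that can be routed matters as much as the per-task saving.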

Most companies today aren’t routing at all. Glean CEO Arvind Jain has estimated that roughly 95% of enterprise AI usage still runs on the most expensive frontier models, even for tasks that cheaper alternatives could easily handle. Wu gave the example of asking a model to name the third U.S. president. Every model, no matter how expensive, will tell you it was Thomas Jefferson.

Photo: Arvind Jain, CEO of Glean, on the SaaS Monster stage during day one of Web Summit 2022 at the Altice Arena in Lisbon, Portugal, on Nov. 2, 2022. Credit: Harry Murphy | Sportsfile | Getty Images

Vendors under pressure

AI companies recognize the anxiety.

Cognition announced what it calls an AI productivity guarantee: if Devin delivers less engineering value than a customer is paying for, Cognition will fund usage up to $10 million until it’s up to par. Wu framed it as a way to cut through the noise on a metric that’s dogged the industry: return on investment.

Rather than measuring activity such as tokens consumed or lines of code written, Wu said, Cognition estimates the number of human engineering hours its agent actually saves and backs that estimate with a refund. You can spend billions of tokens and be doing nothing with them, he said. Companies should be striving for output, not activity.

If companies begin steering easy, high-volume work to cheaper open-source models out of China or elsewhere, then OpenAI and Anthropic stop getting paid for every task. They only get the more complex jobs. Both companies have built their businesses, and the IPO expectations around them, on the assumption of enormous demand at premium prices.

Patel doesn’t think that sinks the frontier labs, and says that cutting-edge technology will remain valuable. But he sees the pricing model shifting. The labs will have to get more efficient with how the models are used rather than simply charging more, which Patel predicts will lead to a concerted industry effort.

The question had been whether companies would keep spending as their AI bills climbed. It now appears that many will simply find a way to spend smartly. Pricing power is shifting from the companies selling premium AI toward the companies buying it.

The frontier labs will still command a premium for the hardest work. But how much of the market is the other stuff? The answer could go a long way to determining the valuations of the leading AI companies.
