How Shopify built an AI stack that doesn’t care which models survive

Shopify built an LLM proxy that gives every engineer access to multiple AI providers — with automatic failover when any one of them goes down, changes, or disappears. When Claude Fable 5 shut down, Shopify’s engineers didn’t go into panic mode. The proxy shifted them to Claude Opus or GPT 5.5 automatically, without interrupting their workflows.

“Fable looks amazing; we used it of course,” Farhan Thawar, Shopify’s head of engineering, says in a new VentureBeat Beyond the Pilot podcast. “When a model comes and then it goes, or it could be as innocuous as an update, the proxy allows us to spray across the different providers.”

Shopify buys tokens in bulk and all users connect to models through its proxy, Thawar says. This gives his team access to reporting and failover; when there’s an availability issue with one provider, users can be “automatically, seamlessly” transferred to another.

Enterprises can learn from this example and consider how a disruption might affect their business, Thawar says. At the very least, they should establish a solid backup plan. It’s important to have a system that allows for movement across models so enterprises are not “super tied” to a specific provider.
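The failover behavior Thawar describes can be illustrated with a minimal Python sketch. This is not Shopify’s proxy; the function name, the ProviderUnavailable exception, and the stand-in providers are all hypothetical, and a real proxy would also handle authentication, streaming, retries, and reporting.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error a provider call raises when it cannot serve a request."""


# Each provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise ProviderUnavailable("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only; real ones would call a vendor API.
def retired_model(prompt: str) -> str:
    raise ProviderUnavailable("model retired")


def backup_model(prompt: str) -> str:
    return f"backup answer to: {prompt}"
```

Because callers only see the proxy function, a provider can be reordered, added, or dropped in the list without any change to the code that sends prompts.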

Distillation is another important strategy.

With distillation, a student model learns from a teacher model and typically becomes specialized in a narrower task. These small language models (SLMs) can be more beneficial than generalized, off-the-shelf models in some circumstances. One example is Shopify’s flagship AI assistant, Sidekick, which performs numerous specialized subtasks for merchants so they can “remove toil” from their day-to-day.

Using smaller distilled models can be faster and cheaper than more generalized models, Thawar says. In some cases they have proven to be 2x cheaper and faster; in more extreme cases 30x cheaper and faster, he says.

But “it isn’t just about cost and latency, which are big; it’s about accuracy,” Thawar says.

Engineers feed the UDP, Shopify’s distillation pipeline, a teacher model, training data, evals, and a target model (say, Opus 4.8 distilling down to Qwen 3.5). The pipeline runs for about a day, then returns an evaluation showing what the fine-tuned model actually achieved on speed, cost, and accuracy for that subtask. If the tradeoff looks good, the engineer deploys it, with no approval process required. Shopify’s internal platform, Tangle, lets anyone visualize the pipeline as it runs.
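To make the deploy-or-not decision concrete, here is a small Python sketch of how an engineer might compare a distilled student against its teacher on accuracy, cost, and latency. Everything in it is an assumption for illustration: the EvalResult fields, the function names, and the 0.01 accuracy tolerance are hypothetical and do not describe the actual output of Shopify’s pipeline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical summary of one model's performance on a subtask's evals."""
    accuracy: float       # fraction of eval cases passed, 0.0 to 1.0
    cost_per_call: float  # arbitrary currency units
    latency_ms: float     # average response time in milliseconds


def worth_deploying(teacher: EvalResult, student: EvalResult,
                    max_accuracy_drop: float = 0.01) -> bool:
    """Return True if the student is cheaper and faster without losing much accuracy.

    The tolerance is an illustrative default, not a recommended value.
    """
    accurate_enough = student.accuracy >= teacher.accuracy - max_accuracy_drop
    cheaper = student.cost_per_call < teacher.cost_per_call
    faster = student.latency_ms < teacher.latency_ms
    return accurate_enough and cheaper and faster


def savings(teacher: EvalResult, student: EvalResult) -> Tuple[float, float]:
    """Return (cost multiple, speed multiple) of the teacher relative to the student."""
    return (teacher.cost_per_call / student.cost_per_call,
            teacher.latency_ms / student.latency_ms)
```

A student that matches the teacher’s accuracy at half the cost would report a cost multiple of 2.0, the low end of the range Thawar cites.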

Thawar says his “dream” is to eventually not give the distillation pipeline a target model at all. Instead, users could provide the teacher model with data and evals and the directive: ‘Based on your learnings over time, I want you to look at a different class of model, different sizes, different types, and you tell me what the right distillation target is.’

“Maybe we’ll get surprised. Maybe it’ll be such a small model it could run on a phone,” Thawar says. “Other times, maybe it comes back and says, ‘There isn’t a way to distill this down to anything better than what we have at the frontier.’”

Moving away from “AI reflexivity” to “AI leverage”

Shopify users can apply whatever harness they want: Claude Code, Codex, Cursor, GitHub Copilot for VS Code. “We expose everyone to the different harnesses so they can get a feel for what may or may not work in their workflow,” Thawar says.

But the company also implemented a usage dashboard, which lets Thawar’s team ask questions that go beyond total token spend: Who’s using the most expensive tokens? Who’s spending more time on reasoning? What types of models are being used, and by which disciplines and levels?
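Questions like these boil down to grouping usage records by different fields. The Python sketch below shows one way that could look; the UsageRecord fields, the labels, and the function names are hypothetical, since the article does not describe the dashboard’s data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class UsageRecord(NamedTuple):
    """Hypothetical per-request usage row as a proxy might log it."""
    user: str
    discipline: str   # e.g. "engineering" or "design" (illustrative labels)
    model: str
    tokens: int
    cost_cents: int


def total_by(records: Iterable[UsageRecord], field: str) -> Dict[str, int]:
    """Sum cost_cents grouped by the named field (user, discipline, or model)."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        totals[getattr(record, field)] += record.cost_cents
    return dict(totals)


def top_spender(records: Iterable[UsageRecord]) -> str:
    """Return the user with the highest total cost."""
    totals = total_by(records, "user")
    return max(totals, key=totals.get)
```

The same grouping function answers “which discipline spends the most” and “which model costs the most” by changing only the field name.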

Regarding the “tokenmaxxing” question, Shopify does have “circuit breakers” in place. If a user has a model running for a long time (say, 10 hours) and it’s consuming a lot of tokens, they will get pinged with a question: “Did you mean to spend this?”

As Thawar explains, sometimes the reply is “Oh, absolutely.” Other times it’s: ‘Whoa, I didn’t know that was running in the background. I totally forgot about it. I’d rather stop it now.’
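A check of this kind can be sketched in a few lines of Python. The 10-hour figure comes from Thawar’s example; the token threshold, the Session fields, and the function name are assumptions made for illustration, not details of Shopify’s system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical record of one long-running model session."""
    user: str
    hours_running: float
    tokens_used: int


def sessions_to_ping(sessions: List[Session],
                     max_hours: float = 10.0,
                     max_tokens: int = 5_000_000) -> List[Session]:
    """Return sessions that are both long-running and token-heavy.

    max_tokens is an illustrative threshold; the article gives no number.
    """
    return [
        s for s in sessions
        if s.hours_running >= max_hours and s.tokens_used >= max_tokens
    ]
```

The sketch only flags sessions for a confirmation message rather than stopping them, which matches the article’s description: the user decides whether the spend was intended.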

The ultimate goal, as Thawar describes it, is to move from “AI reflexivity” to “AI leverage,” and get people to really think deeply about where they can benefit most from AI in their workflows.

Listen to the full podcast to hear more about:

  • Shopify’s philosophy of building infrastructure before features. As Thawar puts it: “We’ve always built more infra. We will continue to always build more infra.”

  • How Shopify’s internal AI agent, River, creates a “substrate of information” across the company.

  • How Thawar’s OpenClaw agent figured out he was traveling from his calendar — and what that moment told him about where agents are actually headed.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.

