AI Infrastructure · 4 min read

Coinbase switches to Chinese AI models, halves AI spending

Coinbase now runs GLM 5.2 and Kimi 2.7, pays half what it used to while token use climbs, and has raised its cache hit rate from 5 to 60 percent.

The Brieftide

TL;DR

  • Coinbase has moved production workloads to Chinese AI models, including GLM 5.2 and Kimi 2.7, and says it is paying half what it used to even as token usage rises.
  • Automatic routing and better caching helped: the cache hit rate rose from 5 to 60 percent.
  • The company still lets developers pick models, and Brian Armstrong says 91 percent of developers never hit their old limits.

Coinbase has moved production workloads to Chinese AI models, including GLM 5.2 and Kimi 2.7, and says it is paying half what it used to even as token usage rises. The company still lets developers pick models, but Brian Armstrong says 91 percent of developers never hit their old limits.

What did Coinbase change, exactly?

Coinbase now runs models such as GLM 5.2 and Kimi 2.7 and reports that its AI spending has halved while token use has risen. CEO Brian Armstrong told staff that the switch to these models reduced costs even as the platform consumed more tokens. He also noted that 91 percent of developers never hit their old usage limits.

Developers can still choose which model they use. The change is framed as a move to cheaper Chinese alternatives that preserves flexibility for teams that need different models.

How does Coinbase route requests and save money?

Coinbase uses an automatic routing system that picks the best model for each request based on task, price, and caching potential. Better caching was a major lever: after optimizations, the cache hit rate rose from 5 percent to 60 percent.
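
The article does not describe how Coinbase's router works internally, so the following is only a minimal sketch of the general idea: filter models by task fit, then pick the one with the lowest expected price once caching is taken into account. Every name and number here (`Model`, `blended_price`, `route`, the placeholder models, their prices and cache discounts) is an assumption made for illustration, not a detail from Coinbase or any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str               # placeholder label, not a real product
    tasks: frozenset        # task types this model is considered fit for
    price_per_mtok: float   # hypothetical uncached input price (USD per million tokens)
    cached_discount: float  # hypothetical fraction of the price charged on a cache hit


def blended_price(model: Model, hit_rate: float) -> float:
    """Expected input price per million tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached = model.price_per_mtok * model.cached_discount
    return hit_rate * cached + (1.0 - hit_rate) * model.price_per_mtok


def route(task: str, hit_rate: float, models: list[Model]) -> Model:
    """Pick the cheapest model that fits the task, after accounting for caching."""
    fit = [m for m in models if task in m.tasks]
    if not fit:
        raise LookupError(f"no model registered for task {task!r}")
    return min(fit, key=lambda m: blended_price(m, hit_rate))


# Entirely made-up catalogue used only to exercise the functions above.
MODELS = [
    Model("model-a", frozenset({"code", "chat"}), 10.0, 0.1),
    Model("model-b", frozenset({"chat"}), 4.0, 0.5),
]

if __name__ == "__main__":
    for rate in (0.05, 0.60):
        chosen = route("chat", rate, MODELS)
        print(rate, chosen.name, round(blended_price(chosen, rate), 2))
```

With these invented prices, raising the hit rate from 5 to 60 percent roughly halves the blended input price of `model-a` (from about 9.55 to 4.60 per million tokens). That illustrates why caching can be a large lever, but it says nothing about Coinbase's real figures.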

Developers were also told to keep context lean and to start fresh sessions for new tasks, part of what the source article calls context engineering. Coinbase makes each developer’s usage visible without capping it, and pairs that visibility with an expectation Armstrong put this way: "The more you spend on AI, the more impact we expect."
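
To make "visible without capping" concrete, here is a small hypothetical sketch of a usage ledger that records and reports token use per developer but never rejects a request. The class name `UsageLedger`, its methods, and the sample figures are assumptions for illustration; the article does not say how Coinbase tracks usage.

```python
from collections import defaultdict


class UsageLedger:
    """Records token usage per developer; it reports usage but never blocks."""

    def __init__(self) -> None:
        self._tokens: dict[str, int] = defaultdict(int)

    def record(self, developer: str, tokens: int) -> None:
        """Add tokens to a developer's running total. No cap is enforced."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self._tokens[developer] += tokens

    def report(self) -> list[tuple[str, int]]:
        """Usage per developer, heaviest first (ties broken by name)."""
        return sorted(self._tokens.items(), key=lambda kv: (-kv[1], kv[0]))

    def share_under(self, limit: int) -> float:
        """Fraction of developers whose total stays below a reference limit."""
        if not self._tokens:
            return 0.0
        under = sum(1 for total in self._tokens.values() if total < limit)
        return under / len(self._tokens)


if __name__ == "__main__":
    ledger = UsageLedger()
    ledger.record("dev-1", 120_000)  # invented sample data
    ledger.record("dev-2", 15_000)
    ledger.record("dev-1", 30_000)
    print(ledger.report())
    print(ledger.share_under(100_000))
```

A metric like `share_under` is one way to arrive at a statement such as "91 percent of developers never hit their old limits", though the article does not explain how that figure was computed.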

Who else is making similar moves?

Other companies and founders are adopting Chinese models as cheaper alternatives to those from Western labs. The CEO of startup Lindy recently moved to DeepSeek v4, and Snowflake is testing Chinese models as cheaper alternatives to OpenAI and Anthropic. The article frames these shifts as part of broader pricing pressure on Western AI labs and notes a brewing price war between OpenAI and Anthropic.

The article also points to OpenAI’s own product moves: GPT-5.6-Sol costs the same as GPT-5.5 but is promoted as more token-efficient than Claude Fable and Mythos, and OpenAI is offering two weaker 5.6 variants at much lower prices.

Why it matters

These shifts create a pricing stress test for Western AI labs and could affect growth expectations tied to upcoming IPOs. By adopting cheaper Chinese models and squeezing costs through routing and caching, companies like Coinbase force a comparison on price and efficiency that Western providers must respond to. The article explicitly links those moves to pressure on OpenAI and Anthropic, and to questions about the growth numbers labs need to justify their funding.

What to watch

Look for how Western providers respond on price and token efficiency, and whether the reported OpenAI pricing changes broaden. Also watch adoption signals from large platform users: Snowflake’s tests and other public moves to Chinese models will indicate whether the pricing pressure becomes widespread.

Written by The Brieftide · Source: The Decoder
