Coinbase switches to Chinese AI models, halves AI spending
Coinbase now runs GLM 5.2 and Kimi 2.7, pays half what it used to even as token use climbs, and has raised its cache hit rate from 5 to 60 percent.
TL;DR
- Coinbase now runs GLM 5.2 and Kimi 2.7 in production and says it pays half what it used to, even as token use climbs.
- Its cache hit rate rose from 5 to 60 percent.
- The company still lets developers pick models, and Brian Armstrong says 91 percent of developers never hit their old usage limits.
Coinbase has moved production workloads to Chinese AI models, including GLM 5.2 and Kimi 2.7, and says it is paying half what it used to even as token usage rises. The company still lets developers pick models, and Brian Armstrong says 91 percent of developers never hit their old usage limits.
What did Coinbase change, exactly?
Coinbase now runs models such as GLM 5.2 and Kimi 2.7 and reports that its AI spending has fallen by half while token use rises. CEO Brian Armstrong told staff that the switch to these models reduced costs even as the platform consumed more tokens, and noted that 91 percent of developers never hit their old usage limits.
Developers can still choose which model they use. The change is framed as a move to cheaper Chinese alternatives that preserves flexibility for teams that need different models.
How does Coinbase route requests and save money?
Coinbase uses an automatic routing system that picks the best model for each request based on task, price, and caching potential. Better caching was a major lever: after optimizations, the cache hit rate rose from 5 percent to 60 percent.
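Coinbase has not published how its router works, so the following is only a minimal sketch of the general idea: filter models by task fit, then pick the one with the lowest expected input price once the cache hit rate is factored in. The model names, task labels, prices, and the functions `effective_price` and `route` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # hypothetical identifier, not a real product
    tasks: frozenset     # task types this model is considered fit for
    price_in: float      # illustrative price per 1M uncached input tokens
    price_cached: float  # illustrative price per 1M cached input tokens


def effective_price(model: Model, hit_rate: float) -> float:
    """Blend cached and uncached input prices by the expected cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * model.price_in + hit_rate * model.price_cached


def route(task: str, hit_rate: float, models: list[Model]) -> Model:
    """Pick the cheapest model (by blended price) that is fit for the task."""
    candidates = [m for m in models if task in m.tasks]
    if not candidates:
        raise LookupError(f"no model registered for task {task!r}")
    return min(candidates, key=lambda m: effective_price(m, hit_rate))


# Made-up catalogue: model-a has a steep cached discount, model-b a low list price.
MODELS = [
    Model("model-a", frozenset({"chat", "code"}), price_in=2.0, price_cached=0.1),
    Model("model-b", frozenset({"chat"}), price_in=1.0, price_cached=0.8),
]

if __name__ == "__main__":
    for rate in (0.05, 0.60):
        choice = route("chat", rate, MODELS)
        print(f"hit rate {rate:.0%}: {choice.name}")
```

With these invented numbers, the sketch routes chat requests to model-b at a 5 percent hit rate and to model-a at 60 percent, which shows how caching potential can change which model is cheapest even when list prices stay the same.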
Developers were also told to keep context lean and to start fresh sessions for new tasks, a practice The Decoder calls context engineering. Coinbase makes each developer’s usage visible without capping it, and pairs that visibility with an expectation Armstrong spelled out: "The more you spend on AI, the more impact we expect."
Who else is making similar moves?
Other companies and founders are adopting Chinese models as cheaper alternatives to models from Western labs. The CEO of startup Lindy recently moved to DeepSeek v4, and Snowflake is testing Chinese models as cheaper alternatives to OpenAI and Anthropic. The Decoder frames these shifts as part of broader pricing pressure on Western AI labs and notes a brewing price war between OpenAI and Anthropic.
The Decoder also points to OpenAI’s own product moves: GPT-5.6-Sol costs the same as GPT-5.5 but is promoted as more token-efficient than Claude Fable and Mythos, and OpenAI is offering two weaker 5.6 variants at much lower prices.
Why it matters
These shifts create a pricing stress test for Western AI labs and could affect growth expectations tied to upcoming IPOs. By adopting cheaper Chinese models and squeezing costs through routing and caching, companies like Coinbase force a comparison on price and efficiency that Western providers must answer. The Decoder explicitly links those moves to pressure on OpenAI and Anthropic and to questions about the growth numbers the labs need to justify their funding.
What to watch
Look for how Western providers respond on price and token efficiency, and whether the reported OpenAI pricing changes broaden. Also watch adoption signals from large platform users: Snowflake’s tests and other public moves to Chinese models will indicate whether the pricing pressure becomes widespread.
Written by The Brieftide · Source: The Decoder