GLM-5.2 release: 1M-token context, long-horizon coding gains
GLM-5.2 delivers a solid 1M-token context, IndexShare cost cuts, effort-level coding modes and top open-source results on key coding benchmarks.
GLM-5.2 arrives as a long-horizon workhorse, extending usable context to a solid 1M tokens while improving coding performance and inference efficiency. Z.AI says the model expands long-context training for coding-agent scenarios, introduces IndexShare to cut per-token FLOPs, and ships under an MIT license.
What are the headline changes in GLM-5.2?
GLM-5.2 provides a solid 1M-token context, stronger coding capability with explicit effort-level control, an improved multi-token prediction (MTP) layer for speculative decoding, and an MIT open-source license. It raises the maximum context length from 200K to 1M tokens, adds IndexShare to reduce per-token FLOPs by 2.9× at 1M context, and lets users select an effort level to trade extra computation for higher performance.
Beyond raw context size, Z.AI expanded 1M-context training specifically for coding-agent trajectories: implementation-scale projects, automated research runs, performance optimization and complex debugging. The release emphasizes not just token count but sustained quality across long, messy agent traces.
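To put the 2.9× figure in more familiar terms, the short sketch below converts a reduction factor into the share of per-token FLOPs saved. It assumes "2.9×" means the new cost equals the old cost divided by 2.9; the function name is illustrative and does not come from Z.AI.

```python
def fraction_saved(reduction_factor: float) -> float:
    """Share of the original per-token FLOPs removed by an N-fold reduction.

    Assumption: an N-fold reduction means new_flops = old_flops / N.
    """
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 - 1.0 / reduction_factor


saved = fraction_saved(2.9)
print(f"{saved:.1%} of per-token FLOPs saved")  # prints "65.5% of per-token FLOPs saved"
```

Under that reading, roughly two thirds of the per-token compute at 1M context disappears, which is the scale of saving the release attributes to IndexShare.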
How does GLM-5.2 perform on long-horizon and coding benchmarks?
GLM-5.2 ranks as the top open-source model on the long-horizon coding benchmarks cited, and it outperforms GLM-5.1 on standard coding tests. On Terminal-Bench 2.1, GLM-5.2 scores 81.0 versus 63.5 for GLM-5.1; on SWE-bench Pro, it scores 62.1 versus 58.4. Against closed-source rivals, GLM-5.2 lands 4.0 points behind Claude Opus 4.8 on Terminal-Bench 2.1 (Opus 4.8 scores 85.0).
On long-horizon agent benchmarks, GLM-5.2 trails Opus 4.8 by only 1% on FrontierSWE, edges out GPT-5.5 by 1% and beats Opus 4.7 by 11% on the same benchmark. On PostTrainBench GLM-5.2 outperforms Opus 4.7 and GPT-5.5 and ranks second only to Opus 4.8. On the ultra-long-horizon SWE-Marathon it trails Opus 4.8 by 13% while remaining second only to the Opus series. Across these three benchmarks Z.AI positions GLM-5.2 as the highest-ranked open-source model.
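The absolute scores quoted for Terminal-Bench 2.1 and SWE-bench Pro can be turned into point gaps with a few lines of arithmetic. The sketch below uses only the figures cited in this article; the dictionary layout and the `gap` helper are illustrative.

```python
# Scores as quoted in the article; benchmarks without a cited value are omitted.
scores = {
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "GLM-5.1": 63.5, "Opus 4.8": 85.0},
    "SWE-bench Pro": {"GLM-5.2": 62.1, "GLM-5.1": 58.4},
}


def gap(benchmark: str, model: str, reference: str = "GLM-5.2") -> float:
    """Points by which `reference` leads `model` (negative if it trails)."""
    row = scores[benchmark]
    return round(row[reference] - row[model], 1)


gains_over_glm51 = {name: gap(name, "GLM-5.1") for name in scores}
gap_to_opus48 = gap("Terminal-Bench 2.1", "Opus 4.8")

print(gains_over_glm51)  # {'Terminal-Bench 2.1': 17.5, 'SWE-bench Pro': 3.7}
print(gap_to_opus48)     # -4.0
```

The generational gain is much larger on Terminal-Bench 2.1 (17.5 points) than on SWE-bench Pro (3.7 points).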
How did Z.AI make 1M context practical and efficient?
GLM-5.2 uses IndexShare in its DeepSeek Sparse Attention (DSA) and MTP components to reduce computation and training-inference mismatch, and it pairs these architecture changes with inference-engine optimizations. IndexShare shares one indexer across each group of four sparse attention layers, so the top-k indices computed in the group's first layer are reused by all four. This reduces indexer computation and top-k operations and enables training from 128K sequence lengths.
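The sharing scheme described above can be illustrated with a toy sketch. It is not Z.AI's implementation: the random "indexer", the function names and the layer and token counts are all assumptions made for illustration. Only the group size of four comes from the article.

```python
import random

GROUP_SIZE = 4  # from the article: one indexer serves four sparse attention layers


def top_k_indices(scores, k):
    """Return the positions of the k highest scores, sorted by position."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def toy_indexer(num_tokens, rng):
    """Hypothetical stand-in for a learned indexer: one random score per token."""
    return [rng.random() for _ in range(num_tokens)]


def select_tokens(num_layers, num_tokens, k, share, seed=0):
    """Choose k key positions for every layer and count indexer invocations."""
    rng = random.Random(seed)
    indexer_calls = 0
    selected = []
    current = []
    for layer in range(num_layers):
        # Without sharing, every layer scores tokens and runs top-k itself.
        # With sharing, only the first layer of each group does so.
        if not share or layer % GROUP_SIZE == 0:
            current = top_k_indices(toy_indexer(num_tokens, rng), k)
            indexer_calls += 1
        selected.append(current)
    return selected, indexer_calls


shared, calls_shared = select_tokens(num_layers=8, num_tokens=1000, k=16, share=True)
_, calls_unshared = select_tokens(num_layers=8, num_tokens=1000, k=16, share=False)
print(calls_shared, calls_unshared)  # prints "2 8"
```

In this toy setup, eight layers need two indexer and top-k passes instead of eight, which is the kind of saving the article attributes to IndexShare. Whether reusing indices costs accuracy depends on the real model and is not captured here.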
For speculative decoding the MTP layer was redesigned: IndexShare plus KV share and additional techniques improved acceptance length from a baseline 4.56 to 5.47, a 20% increase. Inference work focused on three areas: finer-grained memory management and parallelization built on LayerSplit to increase usable KV-cache, kernel optimizations to reduce context-length overhead, and CPU-side cache and scheduling improvements to reduce GPU pipeline stalls.
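The 20% figure follows directly from the two acceptance lengths quoted above. A minimal check, with an illustrative helper name:

```python
def relative_gain(baseline: float, final: float) -> float:
    """Fractional improvement of `final` over `baseline`."""
    return (final - baseline) / baseline


gain = relative_gain(4.56, 5.47)
print(f"{gain:.1%}")  # prints "20.0%"
```

A longer acceptance length generally means more drafted tokens survive each verification step, so fewer full forward passes are needed per generated token.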
The post-training stack also changed. Z.AI used an infrastructure called slime to run agentic RL and OPD workflows, merging more than ten expert models into the final model in an OPD training run that took approximately two days. slime supports multiple rollout modes and routes training workloads into the serving side to accelerate rollout-to-production paths.
Why it matters
GLM-5.2 shifts the practical bottleneck for long prompts: context length is no longer just a model parameter but an engineering problem spanning indexers, KV-cache capacity and inference orchestration. By cutting per-token FLOPs (2.9× at 1M context), increasing speculative-decoding acceptance length by 20%, and optimizing KV-cache and CPU coordination, Z.AI is pushing long-horizon capability from a lab curiosity toward sustained agentic engineering work. The MIT license also makes GLM-5.2 immediately usable in open-source engineering projects.
What to watch
Measure adoption by watching which inference engines add the fine-grained KV-cache and LayerSplit-style optimizations Z.AI describes, and whether third-party benchmarks reproduce GLM-5.2’s top open-source placement. Also track improvements on SWE-Marathon: Z.AI cites a 13% gap to Opus 4.8 there, so closing that gap would validate GLM-5.2’s long-horizon strategy.
| Item | GLM-5.2 | GLM-5.1 | Opus 4.8 | Opus 4.7 | GPT-5.5 |
|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 81.0 | 63.5 | 85.0 | | |
| SWE-bench Pro | 62.1 | 58.4 | | | |
| FrontierSWE (relative to GLM-5.2) | baseline | | 1% higher | 11% lower | 1% lower |
| MTP acceptance length (ablation) | 5.47 final (+20% vs. 4.56 baseline) | | | | |
Written by The Brieftide · Source: Hugging Face