Open Source AI · 4 min read

Qwen3: open-source 235B Instruct, 1T 'max' beats Kimi and DeepSeek

Qwen3 is an Apache License v2.0 open-weight family spanning 0.6B dense to 480B MoE models; its 235B-Instruct ranks 8 on LMArena, and a closed-source 1T "max" variant now tops major benchmarks.

The Brieftide

TL;DR

  • The open-weight 235B-Instruct ranks 8 on LMArena, while a 1T "max" variant released on September 5 tops major benchmarks but remains closed-source.
  • Qwen3 is published under Apache License v2.0, which the author describes as making the open-weight family developer- and commercially friendly.
  • The family spans dense models as small as 0.6B parameters and mixture-of-experts models up to 480B parameters.

Qwen3, an open-weight model family, was initially released in May and updated in July. On September 5, its developer released a 1T parameter "max" variant on its platform that beats Kimi K2, DeepSeek 3.1, and Claude Opus 4 on all major benchmarks. That 1T variant is closed-source for now.

What Qwen3 offers now

Qwen3 is published under Apache License v2.0, making the open-weight family developer- and commercially friendly, according to the author. The family covers a wide spectrum of sizes and architectures: dense models start as small as 0.6B parameters, and the line-up includes mixture-of-experts models up to 480B parameters. The open-weight 235B-Instruct variant is singled out for performance: as of the article's publication, it ranked 8 on the LMArena leaderboard, tied with the proprietary Claude Opus 4.

The article notes that the two open-weight LLMs ranked higher than Qwen3 are DeepSeek 3.1 and Kimi K2, described respectively as 3x and 4x larger than the 235B variant. The 1T parameter "max" variant added on September 5 outperforms Kimi K2, DeepSeek 3.1, and Claude Opus 4 across major benchmarks, but the developer chose to keep that model closed-source for the time being.

Hands-on implementation and orientation

The author provides a from-scratch, code-first walkthrough of Qwen3 implemented in pure PyTorch. The write-up is explicit about being long and code-heavy, aiming to explain the model building blocks in executable form rather than only conceptual diagrams. Figure 1 in the piece previews both the dense and mixture-of-experts architectures that the hands-on code reimplements.
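To make the dense versus mixture-of-experts distinction concrete, here is a minimal, dependency-free sketch of top-k expert routing, the general idea behind an MoE feed-forward layer. It is not the author's PyTorch code and does not claim to match Qwen3's actual routing details: the function names (`route_top_k`, `moe_forward`), the toy experts, and the router scores are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the k selected experts' outputs; other experts never run."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts: each one just scales the input by a constant.
# In a real model each expert is a small feed-forward network.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]

token = [1.0, -1.0]
router_logits = [0.0, 2.0, 0.0, 2.0]  # made-up scores; a real router computes these from the token

routes = route_top_k(router_logits, k=2)
output = moe_forward(token, experts, router_logits, k=2)
print(routes)
print(output)
```

A dense model would instead pass every token through a single feed-forward block. In the sketch, only two of the four experts execute per token, which is why an MoE model's total parameter count can be far larger than the number of parameters used for any one token.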

The write-up builds on prior comparative and conceptual work by the same author: The Big LLM Architecture Comparison and From GPT-2 to gpt-oss: Analyzing the Architectural Advances. The practical walkthrough is aimed at readers who want to understand and adapt the underlying components for experiments or projects, and it is published as paid content.

Why it matters

Qwen3’s combination of an Apache License v2.0 open-weight family and competitive leaderboard performance reinforces the role of permissively licensed models in practical developer and commercial use. The presence of many sizes from 0.6B dense to 480B MoE allows teams to trade compute and latency for capability, while the 235B-Instruct ranking shows an open-weight model matching proprietary systems on public leaderboards.
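The compute trade-off across sizes can be roughed out with simple arithmetic. The sketch below estimates memory for the weights alone, assuming 2 bytes per parameter for bf16 storage. The helper name `weight_memory_gb` is hypothetical, and the figures exclude activations, the KV cache, and any runtime overhead.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="bf16"):
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Parameter counts taken from the sizes named in the article.
sizes = {"0.6B dense": 0.6e9, "235B-Instruct": 235e9, "480B MoE": 480e9}

for name, n in sizes.items():
    print(f"{name}: ~{weight_memory_gb(n):,.1f} GB of weights in bf16")
```

Under these assumptions, the smallest model needs on the order of a gigabyte for its weights, while the largest open-weight variants need hundreds. For mixture-of-experts models, all experts must still be held in memory even though only a subset runs for each token.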

At the same time, the September 5 release of a closed-source 1T "max" variant that tops major benchmarks highlights a growing split between open-weight releases and closed high-capacity variants that are platform-limited. That split matters for organisations weighing reproducibility and full-stack control against the immediate performance advantages of the largest models.

What to watch

Watch whether the 1T "max" variant is made available under an open license or remains closed-source, and monitor LMArena and other benchmark leaderboards for whether the 1T variant’s dominance is reflected across public rankings. Also track how the community adopts the documented PyTorch implementations for the dense and mixture-of-experts variants across compute budgets.

Figure: Qwen3 family: dense, MoE and the 1T 'max' variant

  • 0.6B dense
  • 235B-Instruct (ranked 8 on LMArena)
  • 480B parameter mixture-of-experts
  • 1T parameter "max" (closed-source, Sept 5)

Written by The Brieftide · Source: Ahead of AI
