Two AI Metrics Diverged: Bounded vs Unbounded Capabilities
Fogelson et al. show that whether frontier AI keeps a permanent lead depends on whether capabilities are measured with bounded or unbounded metrics.
TL;DR
- Fogelson et al. show that whether frontier AI keeps a permanent lead depends on whether capabilities are measured with bounded or unbounded metrics.
- Fogelson et al. build on Gundlach et al. (2025b) and provide a formal classification of performance metrics by their functional form in relation to training and inference compute.
- They show bounded performance metrics always favor "meek models inheriting the earth", while many unbounded metrics let frontier models grow their lead without limit.
The paper Two AI Metrics Diverged: Will it Make All the Difference?, submitted to arXiv on 1 Jul 2026 (arXiv:2607.00913), argues that whether frontier AI models permanently outstrip smaller-budget models depends on which performance metrics we choose. Authors Alex Fogelson, Zachary A. Brown, Hans Gundlach, Jayson Lynch and Neil Thompson present mathematical conditions classifying metrics as bounded or unbounded and show each class implies a different long-run distribution of capability.
What did the authors do and what did they find?
Fogelson et al. build on Gundlach et al. (2025b) and provide a formal classification of performance metrics by their functional form in relation to training and inference compute. They show bounded performance metrics always favor "meek models inheriting the earth", while many unbounded metrics let frontier models grow their lead without limit. The paper was accepted into the 2026 ICML Technical AI Governance Research Workshop and lays out tight mathematical conditions to decide which metrics push outcomes toward proliferation or concentration.
The authors also point to empirical patterns: conventional validation loss shows a shrinking gap between frontier and smaller models, while on other metrics frontier models keep widening their lead, illustrating the divergence in real measurements.
How do bounded and unbounded metrics differ in practice?
Bounded metrics cap performance at an upper limit, meaning improvements saturate as compute scales; under such metrics capability tends to diffuse to smaller, cheaper models. Unbounded metrics grow without a fixed ceiling as compute increases, so frontier compute confers an ever-growing advantage and concentrates capability with wealthy actors.
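A minimal sketch can make the contrast concrete. The functional forms below are hypothetical illustrations, not the paper's definitions: `bounded_metric` saturates toward a ceiling, `unbounded_metric` is a plain power law, and the smaller model is assumed to always trail the frontier by a fixed compute ratio (an assumption of this sketch, not necessarily the paper's setup).

```python
def bounded_metric(compute, ceiling=1.0, scale=1.0, alpha=0.3):
    """Hypothetical saturating metric: approaches `ceiling` as compute grows."""
    return ceiling - scale * compute ** (-alpha)


def unbounded_metric(compute, alpha=0.3):
    """Hypothetical power-law metric with no fixed ceiling."""
    return compute ** alpha


def gap(metric, frontier_compute, ratio=100.0):
    """Frontier score minus the score of a model with `ratio` times less compute."""
    return metric(frontier_compute) - metric(frontier_compute / ratio)


# Compute budgets in arbitrary units (illustrative values only).
budgets = [1e3, 1e6, 1e9]
bounded_gaps = [gap(bounded_metric, c) for c in budgets]
unbounded_gaps = [gap(unbounded_metric, c) for c in budgets]

for c, b, u in zip(budgets, bounded_gaps, unbounded_gaps):
    print(f"compute={c:.0e}  bounded gap={b:.4f}  unbounded gap={u:.2f}")
```

Under these assumed forms, the bounded gap shrinks as the frontier budget rises while the unbounded gap grows, even though the smaller model stays the same factor behind in compute.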
The paper emphasizes that many commonly used bounded metrics have closely related unbounded counterparts and vice versa, so the choice of measurement can flip the policy-relevant conclusion. The authors give domain examples where this choice matters: if software engineering, synthetic biology or rhetorical persuasiveness are unbounded when measured in the terms we care about, frontier-level capability will likely concentrate; if they are bounded, meek models will spread such capabilities more widely.
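One way to see how a bounded metric and a close unbounded relative can disagree is to score the same model two ways. The sketch below is an assumption-laden illustration rather than an example taken from the paper: `error_rate` is a hypothetical power law in compute, `accuracy` is its bounded view, and `tasks_per_failure` (the reciprocal of the error rate) is an unbounded view of the same quantity.

```python
def error_rate(compute, alpha=0.25):
    """Hypothetical per-task failure probability, falling as a power law in compute."""
    return min(1.0, compute ** (-alpha))


def accuracy(compute):
    """Bounded view: fraction of tasks solved, capped at 1."""
    return 1.0 - error_rate(compute)


def tasks_per_failure(compute):
    """Unbounded view of the same model: expected tasks attempted per failure."""
    return 1.0 / error_rate(compute)


def gaps(frontier_compute, ratio=100.0):
    """Return (accuracy gap, tasks-per-failure gap) between frontier and smaller model."""
    small = frontier_compute / ratio
    return (
        accuracy(frontier_compute) - accuracy(small),
        tasks_per_failure(frontier_compute) - tasks_per_failure(small),
    )


early = gaps(1e8)   # earlier frontier budget, arbitrary units
late = gaps(1e12)   # later, larger frontier budget

print("accuracy gap:", early[0], "->", late[0])
print("tasks-per-failure gap:", early[1], "->", late[1])
```

With these assumed forms, the accuracy gap narrows as budgets grow while the tasks-per-failure gap widens, so the same pair of models supports either conclusion depending on which measure is reported.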
Why it matters
This classification changes what policy and governance should aim for: choosing or designing performance metrics is not a neutral technical detail, because it determines whether capabilities are expected to centralize or proliferate. If validation loss is used as the benchmark, the shrinking gap could be taken as evidence that smaller actors will keep up. If instead decision-makers care about an unbounded measure of persuasive effectiveness or biological design quality, the same compute scaling implies growing concentration of power.
The paper makes the policy implication concrete: determining the apt metric in a domain is a prerequisite for policy, because bounded and unbounded metrics may suggest opposing responses. That shifts focus from raw compute and model size to the semantics of measurement and which real-world outcomes metrics actually capture.
What to watch
Watch whether benchmark designers, regulators and research groups adopt bounded or unbounded formulations for the capabilities they care about, and whether communities explicitly map bounded metrics to real-world utility. The next concrete signals will be the metrics chosen in high-profile benchmarks for domains such as software engineering and synthetic biology, and follow-on papers testing whether those metrics behave as the paper's math predicts.
Additional details and provenance: the paper is on arXiv as arXiv:2607.00913 (submitted 1 Jul 2026) with an arXiv DOI linked in the record, and was accepted into the 2026 ICML Technical AI Governance Research Workshop. Authors listed are Alex Fogelson, Zachary A. Brown, Hans Gundlach, Jayson Lynch and Neil Thompson.
Note: the paper stresses careful interpretation of performance metrics, observing that related metrics can flip boundedness; its core technical contribution is the set of mathematical conditions that determine which class a given metric falls into.
| Metric class | What it means | Implication |
|---|---|---|
| Bounded | Performance has an upper limit relative to compute | Smaller models can reach similar performance; meek models proliferate into the hands of many (per the authors, bounded metrics always favor this outcome) |
| Unbounded | Performance can grow without a fixed ceiling as compute increases | Frontier models grow their lead forever; capability likely concentrates with a few wealthy actors |
| Validation loss (example) | Empirically shows a shrinking gap between frontier and smaller models | Would suggest proliferation under that specific metric, though other metrics can show the opposite |
Written by The Brieftide · Source: arXiv