Model Compression · July 2, 2026 · 5 min read

Two AI Metrics Diverged: Bounded vs Unbounded Capabilities

Fogelson et al. show that whether frontier AI keeps a permanent lead depends on whether capabilities are measured with bounded or unbounded metrics.

The Brieftide · July 2, 2026

TL;DR

01 Fogelson et al. show that whether frontier AI keeps a permanent lead depends on whether capabilities are measured with bounded or unbounded metrics.
02 Fogelson et al. build on Gundlach et al. (2025b) and provide a formal classification of performance metrics by their functional form in relation to training and inference compute.
03 They show bounded performance metrics always favor "meek models inheriting the earth", while many unbounded metrics let frontier models grow their lead without limit.

The paper "Two AI Metrics Diverged: Will it Make All the Difference?", submitted to arXiv on 1 Jul 2026 (arXiv:2607.00913), argues that whether frontier AI models permanently outstrip smaller-budget models depends on which performance metrics we choose. Authors Alex Fogelson, Zachary A. Brown, Hans Gundlach, Jayson Lynch and Neil Thompson present mathematical conditions classifying metrics as bounded or unbounded and show that each class implies a different long-run distribution of capability.

What did the authors do and what did they find?

Fogelson et al. build on Gundlach et al. (2025b) and provide a formal classification of performance metrics by their functional form in relation to training and inference compute. They show bounded performance metrics always favor "meek models inheriting the earth", while many unbounded metrics let frontier models grow their lead without limit. The paper was accepted into the 2026 ICML Technical AI Governance Research Workshop and lays out tight mathematical conditions to decide which metrics push outcomes toward proliferation or concentration.

The authors also point to empirical patterns: conventional validation loss shows a shrinking gap between frontier and smaller models, but on other metrics frontier models keep widening their lead forever, illustrating the divergence in real measurements.

How do bounded and unbounded metrics differ in practice?

Bounded metrics cap performance at an upper limit, meaning improvements saturate as compute scales; under such metrics capability tends to diffuse to smaller, cheaper models. Unbounded metrics grow without a fixed ceiling as compute increases, so frontier compute confers an ever-growing advantage and concentrates capability with wealthy actors.
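A minimal sketch of that contrast, under assumptions that are ours rather than the paper's: the two functional forms (a saturating power law and a plain power law), the exponents, and the 100x budget ratio are hypothetical, chosen only to show the qualitative behavior. It tracks the frontier model's lead over a smaller model as both budgets scale up together.

```python
# Illustrative only: these functional forms are assumptions, not the paper's.

def bounded_metric(compute: float, alpha: float = 0.3) -> float:
    """Hypothetical saturating metric: approaches a ceiling of 1 as compute grows."""
    return 1.0 - compute ** (-alpha)


def unbounded_metric(compute: float, beta: float = 0.3) -> float:
    """Hypothetical metric with no ceiling: keeps growing with compute."""
    return compute ** beta


def lead(metric, meek_compute: float, budget_ratio: float = 100.0) -> float:
    """Frontier score minus meek score when the frontier has budget_ratio times the compute."""
    return metric(meek_compute * budget_ratio) - metric(meek_compute)


# Both actors' budgets grow over time; the frontier always has 100x more.
meek_budgets = [1e2, 1e4, 1e6, 1e8]
bounded_leads = [lead(bounded_metric, c) for c in meek_budgets]
unbounded_leads = [lead(unbounded_metric, c) for c in meek_budgets]

for c, b, u in zip(meek_budgets, bounded_leads, unbounded_leads):
    print(f"meek compute {c:.0e}: bounded lead {b:.4f}, unbounded lead {u:.1f}")
```

Under these assumed forms the bounded lead shrinks toward zero while the unbounded lead keeps growing, even though the frontier's compute advantage is the same fixed multiple throughout.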

The paper emphasizes that many commonly used bounded metrics have closely related unbounded counterparts and vice versa, so the choice of measurement can flip the policy-relevant conclusion. The authors give domain examples where this choice matters: if software engineering, synthetic biology or rhetorical persuasiveness are unbounded when measured in the terms we care about, frontier-level capability will likely concentrate; if they are bounded, meek models will spread such capabilities more widely.
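One way to see how a bounded metric can have an unbounded counterpart is to re-express accuracy as odds (accuracy divided by error rate), a standard transform that maps the interval [0, 1) onto [0, infinity). The sketch below is an illustration under our own assumptions, not an example taken from the paper: the power-law error curve, its exponent, and the 10x budget ratio are hypothetical.

```python
# Illustrative only: the error curve and budget ratio are assumptions.

def accuracy(compute: float, alpha: float = 0.25) -> float:
    """Hypothetical bounded metric: error rate falls as compute**(-alpha)."""
    return 1.0 - compute ** (-alpha)


def odds(acc: float) -> float:
    """Unbounded counterpart of accuracy: correct outcomes per incorrect outcome."""
    return acc / (1.0 - acc)


BUDGET_RATIO = 10.0  # assumed frontier-to-meek compute ratio
meek_budgets = [1e4, 1e8, 1e12]

accuracy_gaps = []
odds_gaps = []
for meek in meek_budgets:
    meek_acc = accuracy(meek)
    frontier_acc = accuracy(meek * BUDGET_RATIO)
    accuracy_gaps.append(frontier_acc - meek_acc)
    odds_gaps.append(odds(frontier_acc) - odds(meek_acc))

for meek, a, o in zip(meek_budgets, accuracy_gaps, odds_gaps):
    print(f"meek compute {meek:.0e}: accuracy gap {a:.5f}, odds gap {o:.1f}")
```

The same two models produce both series. Read as accuracy, the smaller model is catching up; read as odds, the frontier model is pulling away.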

Why it matters

This classification changes what policy and governance should aim for: choosing or designing performance metrics is not a neutral technical detail; it determines whether capabilities are expected to centralize or proliferate. If validation loss is used as the benchmark, the shrinking gap could be taken as evidence that smaller actors will keep up. If instead decision-makers care about an unbounded measure of persuasive effectiveness or biological design quality, the same compute scaling implies growing concentration of power.

The paper makes the policy implication concrete: determining the apt metric in a domain is a prerequisite for policy, because bounded and unbounded metrics may suggest opposing responses. That shifts focus from raw compute and model size to the semantics of measurement and which real-world outcomes metrics actually capture.

What to watch

Watch whether benchmark designers, regulators and research groups adopt bounded or unbounded formulations for the capabilities they care about, and whether communities explicitly map bounded metrics to real-world utility. The next concrete signals will be the metrics chosen in high-profile benchmarks for domains such as software engineering and synthetic biology, and follow-on papers testing whether those metrics behave as the paper's math predicts.

Additional details and provenance: the paper is on arXiv as arXiv:2607.00913 (submitted 1 Jul 2026) with an arXiv DOI linked in the record, and was accepted into the 2026 ICML Technical AI Governance Research Workshop. Authors listed are Alex Fogelson, Zachary A. Brown, Hans Gundlach, Jayson Lynch and Neil Thompson.

Note: the paper stresses careful interpretation of performance metrics, observing that related metrics can flip boundedness; its core technical contribution is the set of mathematical conditions that determine which class a given metric falls into.

How metric class changes long-run capability outcomes

Metric class	What it means	Long-run implication
Bounded	Performance saturates at an upper limit as compute scales	Smaller models can reach similar performance; meek models proliferate into the hands of many (the authors show bounded metrics always favor this outcome)
Unbounded	Performance can grow without a fixed ceiling as compute increases	Frontier models can grow their lead without limit; capability likely concentrates with a few wealthy actors
Validation loss (example)	Empirically shows a shrinking gap between frontier and smaller models	Would suggest proliferation under that specific metric, though other metrics can show the opposite

Written by The Brieftide · Source: arXiv
