PROPEL: Training Task Generators at the Learnable Frontier
PROPEL trains a lightweight activation probe to predict solver pass rate.
TL;DR
- PROPEL trains a lightweight activation probe to predict solver pass rate.
- PROPEL reduces the cost of training task generators by replacing repeated solver rollouts with a probe that predicts solver pass rates from generator activations.
- The paper, submitted to arXiv on 10 Jun 2026 by Lorenz Wolf and six coauthors, frames solver time as the bottleneck for producing tasks that are valid and just hard enough to train current agents.
What is PROPEL and how does it work?
PROPEL is a solver-amortized framework. It trains a lightweight activation probe on a one-time labeled corpus of generated tasks and their solver outcomes, then uses that probe as a proxy for solve rate during generator optimization. The probe predicts a target solver's pass rate from the activations of a frozen generator reference model, so scoring a generated task takes a single forward pass rather than repeated solver rollouts. The authors motivate this by noting that direct generator optimization requires repeated solver rollouts for every candidate task, and that for software-engineering tasks a single rollout can take tens of minutes.
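To make the probe-as-proxy idea concrete, here is a minimal sketch. It rests on several assumptions that do not come from the paper. The probe is taken to be a linear model with a sigmoid output over a pooled activation vector per task. It is fit with binary cross-entropy against fractional pass-rate labels. The generator's reward peaks at a chosen target solve rate. The function names, the training loop, and the reward shape are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights: list[float], bias: float, activation: list[float]) -> float:
    """Predicted solver pass rate in (0, 1) for one pooled activation vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, activation)))

def train_probe(activations, pass_rates, lr=0.5, epochs=500):
    """Fit a linear probe by full-batch gradient descent on binary cross-entropy.

    activations: equal-length float vectors (hypothetical pooled generator activations).
    pass_rates: observed solver pass rates in [0, 1] from the one-time labeled corpus.
    """
    dim, n = len(activations[0]), len(activations)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, pass_rates):
            err = predict(weights, bias, x) - y  # d(BCE)/d(logit) with soft labels
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def frontier_reward(predicted_rate: float, target_rate: float = 0.5) -> float:
    """Hypothetical generator reward: 1.0 at the target solve rate, falling linearly to 0.0."""
    return 1.0 - abs(predicted_rate - target_rate) / max(target_rate, 1.0 - target_rate)
```

After training, each candidate task would be scored with `frontier_reward(predict(weights, bias, act))`. That is one cheap probe call in place of a batch of solver rollouts. The 0.5 default target is a placeholder. In practice the target would be set per solver and domain.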
How much improvement does PROPEL deliver?
Across math, code, and software-engineering tasks at multiple model scales, PROPEL shifts generation toward the targeted solve rate, often roughly doubling the share of tasks at the learnable frontier. For coding, the share of generated tasks at the learnable frontier rises from 10.1% to 20.0% for a Qwen2.5-3B-Instruct solver and from 5.3% to 12.6% for a Qwen2.5-7B-Instruct solver. For software-engineering (SWE) tasks, PROPEL raises the share at the targeted solve rate from 9.8% to 19.6% for Qwen3.5-27B on repositories not seen during training of the probe and generator. The full paper runs 30 pages, with 9 figures and 12 tables.
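The "roughly doubling" claim can be checked directly against the three reported before/after pairs. The short script below computes the fold change and the absolute gain in percentage points. The dictionary keys are only labels, and the values are the figures quoted in this brief.

```python
# Reported shares (%) of generated tasks at the learnable frontier / targeted solve rate,
# stored as (baseline, with PROPEL).
REPORTED = {
    "Qwen2.5-3B-Instruct (coding)": (10.1, 20.0),
    "Qwen2.5-7B-Instruct (coding)": (5.3, 12.6),
    "Qwen3.5-27B (SWE, unseen repos)": (9.8, 19.6),
}

def fold_change(before: float, after: float) -> float:
    """Ratio of the post-PROPEL share to the baseline share."""
    return after / before

for name, (before, after) in REPORTED.items():
    print(f"{name}: {before:.1f}% -> {after:.1f}% "
          f"(x{fold_change(before, after):.2f}, +{after - before:.1f} points)")
```

The ratios come out at about 1.98, 2.38, and 2.00. "Roughly doubling" is therefore a fair summary, and the 7B coding solver lands somewhat above 2x.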
Why it matters
Solver time and solver availability constrain the supply of useful training tasks as agent capabilities rise. PROPEL changes where compute is spent: instead of running slow solvers repeatedly during generator training, teams can train a one-time probe and then optimize generators with cheap forward passes. That reduces wall-clock cost for pushing task difficulty to the learnable frontier and makes it practical to target specific solve rates for different solver sizes and domains.
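A back-of-envelope model helps show where the savings come from. Direct optimization pays for solver rollouts on every candidate. The amortized approach pays for rollouts only once, on the labeled corpus, and then adds a cheap probe call per candidate. Every number in the example below is a hypothetical placeholder, not a figure from the paper. A 20-minute rollout is merely consistent with the "tens of minutes" the authors cite for SWE tasks. The sketch also ignores probe training cost and parallelism.

```python
from dataclasses import dataclass

@dataclass
class CostInputs:
    candidates: int             # generated tasks scored during generator training
    rollouts_per_task: int      # solver attempts used to estimate one pass rate
    minutes_per_rollout: float  # solver wall-clock minutes per attempt
    corpus_tasks: int           # tasks in the one-time labeled corpus for the probe
    minutes_per_probe: float    # minutes to score one candidate with the probe

def direct_minutes(c: CostInputs) -> float:
    """Serial minutes if every candidate is scored with fresh solver rollouts."""
    return c.candidates * c.rollouts_per_task * c.minutes_per_rollout

def amortized_minutes(c: CostInputs) -> float:
    """One-time corpus labeling plus one probe call per candidate (probe training not counted)."""
    labeling = c.corpus_tasks * c.rollouts_per_task * c.minutes_per_rollout
    return labeling + c.candidates * c.minutes_per_probe

# Placeholder values for illustration only; none of these come from the paper.
example = CostInputs(candidates=100_000, rollouts_per_task=8, minutes_per_rollout=20.0,
                     corpus_tasks=5_000, minutes_per_probe=0.001)
ratio = direct_minutes(example) / amortized_minutes(example)
```

With these placeholders the ratio is roughly 20. Because only the labeling term involves the solver, the saving grows as generator training scores more candidates relative to the size of the labeled corpus.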
How did the authors validate generality?
The paper evaluates PROPEL across multiple domains and model scales and explicitly reports results on repositories that were not seen during training of the probe or generator. The Qwen3.5-27B SWE result (9.8% before versus 19.6% after) shows that the gain carries over to unseen codebases. The authors also report gains for two Qwen2.5 instruct-model solvers (3B and 7B) on coding tasks, showing that the method holds across solver sizes.
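Holding out whole repositories matters because a task-level random split could put closely related tasks from the same codebase on both sides of the split. The brief does not describe the paper's exact split procedure. The sketch below shows one common way to get a repository-grouped split, under two assumptions: each task record carries a `repo` field, and repositories are assigned by hashing their names. Both the record format and the hashing scheme are illustrative.

```python
import hashlib

def is_heldout(repo: str, heldout_fraction: float = 0.2) -> bool:
    """Deterministically assign a repository to the held-out side by hashing its name."""
    digest = hashlib.sha256(repo.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform-ish value in [0, 1)
    return bucket < heldout_fraction

def split_by_repo(tasks, heldout_fraction=0.2):
    """Split task records so that no repository appears on both sides."""
    train, test = [], []
    for task in tasks:
        (test if is_heldout(task["repo"], heldout_fraction) else train).append(task)
    return train, test
```

Because assignment depends only on the repository name, every task from a given repository lands on the same side. The split is also reproducible across runs without storing a list of held-out repositories.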
What to watch
Watch whether probes trained on a one-time labeled corpus stay accurate as generator architectures or solver policies change, and whether the approach scales to solvers with different failure modes. The next concrete signals will be wider replication on additional solver families and published metrics on how well the probe generalizes as the generator's output distribution shifts.
The paper is available as arXiv:2606.18284 and lists Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, and Matthew Daborn-Sargent as authors.
| Solver (domain) | Baseline share at frontier (%) | PROPEL share at frontier (%) |
|---|---|---|
| Qwen2.5-3B-Instruct (coding) | 10.1 | 20.0 |
| Qwen2.5-7B-Instruct (coding) | 5.3 | 12.6 |
| Qwen3.5-27B (SWE, unseen repos) | 9.8 | 19.6 |
Written by The Brieftide · Source: arXiv