PreAct: Computer-using agents that replay tasks 8.5–13x faster
PreAct compiles successful UI runs into small state-machine programs that replay without per-step language-model calls.
TL;DR
- PreAct compiles successful UI runs into small state-machine programs that replay without per-step language-model calls.
- Replay runs contain no per-step language-model calls, and PreAct returns control to the agent whenever the screen diverges from the program's expected state.
- PreAct turns a successful, agent-driven interaction into a compact state-machine program that checks the screen at each state and executes transitions that act.
PreAct, a system described by Bojie Li in a paper submitted 16 Jun 2026, compiles successful UI runs into tiny state-machine programs so computer-using agents can replay previously solved tasks 8.5–13x faster. Replay runs contain no per-step language-model calls, and PreAct returns control to the agent whenever the screen diverges from the program's expected state.
What is PreAct and how does it work?
PreAct turns a successful, agent-driven interaction into a compact state-machine program in which each state checks the screen and each transition performs an action. The first time an agent solves a task, PreAct compiles that run into program states that verify screen contents and transitions that perform the clicks and typing. On later runs it replays the program directly rather than invoking the agent.
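As a rough illustration of the idea, not PreAct's actual implementation, a compiled program can be pictured as a list of steps, each pairing a screen check with an action. The `Screen` dictionary, the `Step` class, and the `replay` function below are hypothetical names invented for this sketch; a real system would inspect a live UI rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a real screen observation (assumption for this sketch).
Screen = Dict[str, str]


@dataclass
class Step:
    name: str
    check: Callable[[Screen], bool]  # program state: verifies the screen
    act: Callable[[Screen], None]    # transition: performs the click or typing


def replay(program: List[Step], screen: Screen) -> Tuple[str, int]:
    """Run steps in order with no model calls; stop at the first failed check."""
    for i, step in enumerate(program):
        if not step.check(screen):
            # Screen diverged from the expected state: hand control back to the agent.
            return ("handoff", i)
        step.act(screen)
    return ("done", len(program))


# Toy program: from the home page, open settings, then switch wifi on.
program = [
    Step("home",
         lambda s: s.get("page") == "home",
         lambda s: s.update(page="settings")),
    Step("settings",
         lambda s: s.get("page") == "settings",
         lambda s: s.update(wifi="on")),
]
```

Calling `replay(program, {"page": "home"})` runs both steps, while a screen that starts on an unexpected page triggers a handoff at step 0 before any action is taken.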
Replay is guarded: at each step PreAct checks that the screen matches the program's expectations, and it hands control back to the agent if something is off. The system also enforces a store-time validation discipline: a freshly compiled program enters the store only if, when re-run from a clean state, an independent evaluator confirms it solved the task. This prevents programs that replay to their last step yet leave the task undone from entering the store.
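The store-time check can be sketched as a small admission gate. This is a minimal sketch under stated assumptions: the `admit` function, the dictionary-based environment, and the `evaluator` callback are all hypothetical, standing in for whatever clean-state reset and independent evaluator the paper uses.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a program is a list of steps that mutate an environment.
Program = List[Callable[[dict], None]]


def admit(store: Dict[str, Program],
          task: str,
          program: Program,
          fresh_env: Callable[[], dict],
          evaluator: Callable[[dict], bool]) -> bool:
    """Store the program only if a clean re-run passes an independent check."""
    env = fresh_env()  # re-run from a clean state, not the state the agent left behind
    try:
        for step in program:
            step(env)
    except Exception:
        return False  # the program could not even replay to completion
    if not evaluator(env):
        return False  # replayed to its last step, yet the task is not done
    store[task] = program
    return True


store: Dict[str, Program] = {}
good = [lambda e: e.update(wifi="on")]
bad = [lambda e: e.update(page="settings")]  # finishes, but never enables wifi
wifi_is_on = lambda e: e.get("wifi") == "on"

accepted = admit(store, "enable wifi", good, dict, wifi_is_on)
rejected = admit(store, "enable wifi (bad)", bad, dict, wifi_is_on)
```

Here `bad` runs to its final step without error, which is exactly the failure case the check is meant to catch: the evaluator rejects it, so only the validated program enters `store`.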
How much faster is replay and where did PreAct help?
Replayed runs execute 8.5–13x faster than invoking the agent step by step, because replay avoids per-step language-model calls. The paper evaluates PreAct on three benchmarks (one mobile, one desktop, and one web) and reports that the store-time check yields a net improvement of 1.75–2.6 tasks per benchmark, with the effect pointing in the same direction on all three.
When no stored program fits a new run, PreAct falls back to exploring the task afresh, and that fallback brings PreAct in line with a strong record-and-replay baseline. The paper also tested selector mechanisms and runtime settings and reports that prompt wording, runtime guardrails, and whether a language model or a plain embedding retriever selects which program to reuse did not materially affect the outcome.
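To make the selection-with-fallback step concrete, here is a toy retriever. It is only a sketch: the bag-of-words `embed` function stands in for a real embedding model, and the `select_program` name and the 0.5 similarity threshold are assumptions made for illustration, not values from the paper.

```python
import math
from collections import Counter
from typing import Dict, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real retriever would use a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def select_program(task: str,
                   store: Dict[str, str],
                   threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching stored program, or None to signal fallback."""
    query = embed(task)
    best_key, best_score = None, 0.0
    for key in store:
        score = cosine(query, embed(key))
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None or best_score < threshold:
        return None  # no stored program fits: explore the task afresh with the agent
    return store[best_key]


# Hypothetical store mapping task descriptions to program identifiers.
store = {"turn on wifi": "prog_wifi", "send email to bob": "prog_email"}
hit = select_program("turn on wifi now", store)
miss = select_program("book a flight", store)
```

A `None` result corresponds to the fallback path described above, where the agent solves the task from scratch.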
Why it matters
PreAct addresses a practical inefficiency: current computer-using agents re-read and re-reason every step on repeated tasks, paying the full language-model cost each time. By compiling a successful run into a replayable program that verifies screens and only invokes the agent on mismatch, PreAct cuts per-run cost and latency, while its store-time validation guards against accumulating faulty programs. That combination targets both speed and reliability for repeated UI automation across mobile, desktop, and web contexts.
What to watch
Look for follow-up evaluations that publish the benchmarks and program-store behavior on diverse, real-world task sets, and for any released code or data that reproduces the 8.5–13x replay speedups and the 1.75–2.6 tasks-per-benchmark gains. Also watch whether future work refines the store validation policy or integrates different program selection heuristics beyond the embedding retriever and language-model-based selectors the paper tested.
Bibliographic note: the paper, titled "PreAct: Computer-Using Agents that Get Faster on Repeated Tasks," is authored by Bojie Li and was submitted to arXiv on 16 Jun 2026 (arXiv:2606.17929).
Written by The Brieftide · Source: arXiv