Skill-Guided Continuation Distillation raises GUI agents' success
SGCD mixes skill-guided continuations with expert trajectories and raised three base models' success rates from the low-30% range to over 50% on OSWorld-Verified.
TL;DR
- SGCD mixes skill-guided continuations with expert trajectories and raised three base models' success rates from the low-30% range to over 50% on OSWorld-Verified.
- The paper shows SGCD improves the success rate of three base models from the low-30% range to over 50% on the OSWorld-Verified benchmark.
- SGCD extracts skills from both successful and failed rollouts.
Zhimin Fan and 11 coauthors submitted a paper titled "Skill-Guided Continuation Distillation for GUI Agents" to arXiv on 17 Jun 2026, proposing SGCD, an iterative self-improvement framework for GUI agents. The paper shows SGCD improves the success rate of three base models from the low-30% range to over 50% on the OSWorld-Verified benchmark.
What is Skill-Guided Continuation Distillation and how does it work?
SGCD is an iterative self-improvement framework that deliberately exposes the policy to policy-induced off-trajectory states, then supplies supervision for those states using skill-guided continuations. The method first runs the plain policy, without skill guidance, for a few steps to reach realistic off-trajectory states. A skill-guided policy then completes the task from those states, and the resulting successful continuations are mixed with expert trajectories to form the supervision data.
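The two-phase collection step described above can be sketched on a toy problem. Everything here is a hypothetical stand-in: `ToyTask` replaces a real GUI task with "move a cursor from 0 to `target`", `plain_policy` is an unguided policy that errs 30% of the time, and `skill_guided_policy` always steps toward the target. In the paper both phases use the agent's own policy, with and without skill guidance; this sketch only illustrates the control flow, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyTask:
    """Hypothetical stand-in for a GUI task: move a cursor from 0 to `target`."""
    target: int
    max_steps: int = 20


def plain_policy(state: int, task: ToyTask, rng: random.Random) -> int:
    """Hypothetical unguided policy: heads for the target but errs 30% of the time."""
    correct = 1 if state < task.target else -1
    return correct if rng.random() >= 0.3 else -correct


def skill_guided_policy(state: int, task: ToyTask) -> int:
    """Hypothetical skill-guided policy: always steps toward the target."""
    return 1 if state < task.target else -1


def collect_continuation(task: ToyTask, k: int,
                         rng: random.Random) -> Optional[List[Tuple[int, int]]]:
    """Run the plain policy for k steps, then let the skill-guided policy finish.

    Returns the (state, action) pairs recorded from the branch point onward
    if the task is completed, otherwise None.
    """
    state = 0
    # Phase 1: unguided steps push the agent into a policy-induced state.
    for _ in range(k):
        if state == task.target:
            return None  # solved before branching; nothing new to label
        state += plain_policy(state, task, rng)
    # Phase 2: the skill-guided policy completes the task from that state.
    continuation: List[Tuple[int, int]] = []
    for _ in range(task.max_steps - k):
        if state == task.target:
            break
        action = skill_guided_policy(state, task)
        continuation.append((state, action))
        state += action
    # Keep only non-empty continuations that actually reach the goal.
    if continuation and state == task.target:
        return continuation
    return None


continuation = collect_continuation(ToyTask(target=5), k=3, rng=random.Random(0))
print(continuation)
```

The exact branch state depends on the random seed, but every returned continuation starts at a state the plain policy reached on its own, which is the point of the first phase.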
SGCD extracts skills from both successful and failed rollouts. The authors describe the collected skill elements as "Continuation Plans, Critical Targets, Failure Traps, and Success Criteria." These elements guide the skill-guided policy as it completes tasks from off-trajectory states, and the resulting successful continuations provide training signal exactly where expert trajectories are absent.
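One way to picture the skill record is as a container with the four named elements, filled from a batch of rollouts. The field names mirror the paper's terms, but `Rollout`, `Skill`, `extract_skill`, and the extraction rules (shared steps become targets, steps seen only in failures become traps) are hypothetical simplifications chosen for illustration; the paper's actual extraction procedure is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rollout:
    """Hypothetical rollout summary: UI steps taken and whether the task succeeded."""
    steps: List[str]
    succeeded: bool
    final_check: str = ""  # what the agent verified at the end, if anything


@dataclass
class Skill:
    """Container for the four skill elements named in the paper."""
    continuation_plans: List[List[str]] = field(default_factory=list)
    critical_targets: List[str] = field(default_factory=list)
    failure_traps: List[str] = field(default_factory=list)
    success_criteria: List[str] = field(default_factory=list)


def extract_skill(rollouts: List[Rollout]) -> Skill:
    """Toy heuristic: successes yield plans, targets and criteria; failures yield traps."""
    skill = Skill()
    success_steps = [set(r.steps) for r in rollouts if r.succeeded]
    for r in rollouts:
        if r.succeeded:
            skill.continuation_plans.append(list(r.steps))
            if r.final_check and r.final_check not in skill.success_criteria:
                skill.success_criteria.append(r.final_check)
    # Critical targets (hypothetical rule): steps shared by every successful rollout.
    if success_steps:
        skill.critical_targets = sorted(set.intersection(*success_steps))
    # Failure traps (hypothetical rule): steps seen in failures but never in a success.
    seen_in_success = set().union(*success_steps)
    for r in rollouts:
        if not r.succeeded:
            for step in r.steps:
                if step not in seen_in_success and step not in skill.failure_traps:
                    skill.failure_traps.append(step)
    return skill


rollouts = [
    Rollout(["open Settings", "click Display", "set scale 125%"], True, "scale shows 125%"),
    Rollout(["open Settings", "click Display", "set scale 125%", "click Apply"], True,
            "scale shows 125%"),
    Rollout(["open Settings", "click Sound"], False),
]
skill = extract_skill(rollouts)
print(skill.failure_traps)  # ['click Sound']
```

The design choice worth noting is that failed rollouts are not discarded: in this sketch they are the only source of Failure Traps, which matches the paper's statement that skills come from both outcomes.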
How much improvement does SGCD deliver on GUI agents?
SGCD raised three base models' success rates on OSWorld-Verified from the low-30% range to over 50%, a consistent gain across all three models. The paper frames this as closing a supervision gap: when a learned policy deviates from expert trajectories at execution time, it encounters states with no expert demonstration. SGCD supplies continuations from those states so the policy can learn the correct actions there.
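To put the headline numbers in proportion, here is a back-of-envelope bound. It assumes (our reading, not a figure from the paper) that "low-30% range" means a base success rate $p_0 \le 34\%$, and it takes "over 50%" as $p_1 > 50\%$; the paper's exact per-model numbers are not reproduced here.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= p_1 - p_0 > 50\% - 34\% = 16 \text{ percentage points},\\
\Delta_{\text{rel}} &= \frac{p_1}{p_0} - 1 > \frac{50}{34} - 1 \approx 0.47,\\
1 - p_1 &< 50\% \quad \text{versus} \quad 1 - p_0 \ge 66\%.
\end{aligned}
```

Under that reading, each model's success rate rises by at least roughly half of its starting value, and the share of failed tasks drops from about two in three to fewer than one in two.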
The paper ties the improvement to mixing the successful continuations with the original expert trajectories, which gives direct supervision on policy-induced off-trajectory states that the experts did not cover. The authors position SGCD as broadly applicable because it produces gains in the same direction for all three base models on the OSWorld-Verified benchmark.
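The mixing step can be sketched as building one shuffled training set from two sources. The paper does not specify its ratio in this summary, so `continuation_fraction` is a hypothetical knob, and `mix_supervision` is an illustrative helper rather than the authors' code. It keeps every expert example and adds enough continuation examples to make up the requested share.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_supervision(expert: Sequence[T], continuations: Sequence[T],
                    continuation_fraction: float, seed: int = 0) -> List[T]:
    """Return a shuffled mix where about `continuation_fraction` of the
    examples are skill-guided continuations and the rest are expert data.

    `continuation_fraction` is a hypothetical parameter for illustration.
    """
    if not 0.0 <= continuation_fraction < 1.0:
        raise ValueError("continuation_fraction must be in [0, 1)")
    # Keep all expert examples; size the continuation share relative to them.
    n_cont = round(len(expert) * continuation_fraction / (1.0 - continuation_fraction))
    n_cont = min(n_cont, len(continuations))  # cannot use more than we collected
    rng = random.Random(seed)
    mixed = list(expert) + rng.sample(list(continuations), n_cont)
    rng.shuffle(mixed)
    return mixed


expert = [f"e{i}" for i in range(6)]
continuations = [f"c{i}" for i in range(10)]
mixed = mix_supervision(expert, continuations, continuation_fraction=0.25)
print(len(mixed))  # 8: all 6 expert examples plus 2 continuations
```

Keeping the full expert set means the continuations add coverage of off-trajectory states without diluting the original demonstrations; a real pipeline might instead cap or reweight the sources.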
Why does this matter?
GUI agents trained only by behavior cloning depend on expert trajectories, leaving them blind to states created by their own mistakes. SGCD attacks that fundamental training mismatch by creating realistic off-trajectory states and then teaching the agent how to recover. That addresses a core failure mode of closed-loop agent execution, shifting the training distribution toward the states the agent actually visits during deployment.
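A standard back-of-envelope argument shows why behavior cloning alone leaves an agent exposed to unseen states. Assume, purely for illustration, that the cloned policy independently leaves the expert's trajectory with probability $\varepsilon$ at each step of a $T$-step task; the values $\varepsilon = 0.05$ and $T = 15$ are hypothetical, not taken from the paper.

```latex
P(\text{on expert trajectory for all } T \text{ steps}) = (1 - \varepsilon)^{T},
\qquad (1 - 0.05)^{15} \approx 0.46.
```

Under these assumptions, about 54% of episodes reach at least one state the expert data never covered, which is the kind of state SGCD deliberately generates and then labels.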
Practically, moving success rates from the low-30% range to over 50% on a benchmark such as OSWorld-Verified means substantially fewer failed end-to-end GUI tasks and a smaller gap between offline expert data and online behavior. For teams building interactive agents, SGCD offers a procedural way to harvest corrective continuations without collecting large numbers of additional expert demonstrations.
What to watch
Look for code or external evaluations linked from the paper's arXiv entry, and for applications of SGCD to larger or different GUI benchmarks beyond OSWorld-Verified. Evidence that the same skill-extraction and continuation scheme scales to more complex interfaces or to additional base models will determine whether the gains generalize across modalities and real-world interactive systems.
References and concrete facts drawn from the paper: the method name Skill-Guided Continuation Distillation (SGCD); the procedure that runs a plain policy for a few steps to reach off-trajectory states then uses a skill-guided policy for successful continuations; the skill components named "Continuation Plans, Critical Targets, Failure Traps, and Success Criteria"; and the empirical claim that SGCD "improves the success rate of three base models from the low-30% range to over 50%" on OSWorld-Verified.
Written by The Brieftide · Source: arXiv