Sony AI's coachable agents: UVFA gameplay framework (2026)
A 49-author Sony AI paper (arXiv:2607.00642) describes a UVFA-based coaching framework used in Horizon Forbidden West.
TL;DR
- A 49-author Sony AI paper (arXiv:2607.00642) describes a UVFA-based coaching framework used in Horizon Forbidden West.
- The paper, filed under cs.AI and cs.LG and available as an 8,673 KB PDF, demonstrates the approach in Horizon Forbidden West, Gran Turismo, and an open-source humanoid walking domain.
- Sony AI built a coaching framework that produces agents which respond to user-specified "styles" while still completing their core tasks.
Sony AI submitted a 49-author paper titled "Coachable agents for interactive gameplay" to arXiv on 1 Jul 2026 (arXiv:2607.00642), presenting a framework that lets end users choose agent behavior at run time. The paper, filed under cs.AI and cs.LG and available as an 8,673 KB PDF, demonstrates the approach in Horizon Forbidden West, Gran Turismo, and an open-source humanoid walking domain.
What did Sony AI build?
Sony AI built a coaching framework that produces agents which respond to user-specified "styles" while still completing their core tasks. The team combines universal value function approximators (UVFAs) with selected training scenarios, learning algorithms, and data augmentation to create agents that demonstrate coherent stylistic variation across car racing, stylized game combat, and humanoid walking.
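UVFAs, introduced by Schaul et al. (2015), extend a value function with an extra goal input so that one approximator covers many goals. One way to read the coaching setup formally is to let a style vector w play the role of that goal. The additive split of the reward into a task term and a weighted style term, and the symbols w and \phi, are our assumptions for illustration; the paper's own formulation may differ.

```latex
\begin{aligned}
% One shared approximator for every style w (w plays the UVFA "goal" role)
Q(s, a, w; \theta) &\approx Q^{*}_{w}(s, a) \\
% Assumed reward: core task term plus a style term weighted by w
r_{w}(s, a) &= r_{\text{task}}(s, a) + w^{\top} \phi(s, a) \\
% Q-learning target and loss for the shared parameters \theta
y &= r_{w}(s, a) + \gamma \max_{a'} Q(s', a', w; \theta), \qquad
\mathcal{L}(\theta) = \bigl(y - Q(s, a, w; \theta)\bigr)^{2}
\end{aligned}
```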
The paper argues that conventional reinforcement learning typically yields one near-optimal behavior, so the framework focuses on enabling multiple, controllable behaviors. Demonstrations in two AAA titles, Horizon Forbidden West and Gran Turismo, plus an open-source humanoid test domain, show the same method working across diverse control problems.
How does the coaching framework work?
The framework trains agents using UVFAs together with carefully chosen training scenarios, learning algorithms, and data augmentation so style can be specified and adjusted at run time. UVFAs provide value predictions conditioned on goals or style parameters, the training scenarios expose the model to different stylistic trade-offs, and data augmentation and learning choices improve robustness.
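To make the combination concrete, here is a toy sketch assuming a deterministic "racing" line where both actions make progress toward the finish and a single style weight w trades caution (w = -1) against aggression (w = +1). Transitions are collected once without any style, then relabeled with every style value and the reward recomputed; this hindsight-style relabeling is our illustrative choice of augmentation, not necessarily the paper's. The value function is a linear UVFA Q(s, a, w) trained with semi-gradient Q-learning. All names (step, features, train, rollout) and numbers are hypothetical.

```python
import numpy as np

# Hypothetical toy setup, not Sony AI's code.
N_STATES = 6          # positions 0..5; position 5 is the finish (core task goal)
ACTIONS = (0, 1)      # 0 = "cautious" (+1 step), 1 = "aggressive" (+2 steps)
STYLES = (-1.0, 1.0)  # style weight w: -1 rewards caution, +1 rewards aggression
GAMMA = 0.9
N_SA = N_STATES * len(ACTIONS)


def step(s, a):
    """Deterministic dynamics; returns (s_next, done, task reward, style feature)."""
    s_next = min(s + 1 + a, N_STATES - 1)
    done = s_next == N_STATES - 1
    r_task = 1.0 if done else -0.1          # finish bonus, small time penalty
    style_feat = 1.0 if a == 1 else -1.0    # phi(s, a): +1 aggressive, -1 cautious
    return s_next, done, r_task, style_feat


def features(s, a, w):
    """UVFA input: one-hot(s, a) concatenated with w * one-hot(s, a)."""
    x = np.zeros(2 * N_SA)
    idx = s * len(ACTIONS) + a
    x[idx] = 1.0
    x[N_SA + idx] = w
    return x


def q_value(theta, s, a, w):
    return float(theta @ features(s, a, w))


def greedy(theta, s, w):
    return max(ACTIONS, key=lambda a: q_value(theta, s, a, w))


def train(sweeps=300, alpha=0.25):
    # 1) Collect style-free transitions once (hypothetical stand-in for scenarios).
    base = [(s, a, *step(s, a)) for s in range(N_STATES - 1) for a in ACTIONS]
    # 2) Augment: relabel each transition with every style, recomputing the reward.
    data = [(s, a, s2, done, r_task + w * feat, w)
            for (s, a, s2, done, r_task, feat) in base for w in STYLES]
    # 3) Semi-gradient Q-learning on one shared parameter vector.
    theta = np.zeros(2 * N_SA)
    for _ in range(sweeps):
        for s, a, s2, done, r, w in data:
            bootstrap = 0.0 if done else GAMMA * max(
                q_value(theta, s2, b, w) for b in ACTIONS)
            td_error = r + bootstrap - q_value(theta, s, a, w)
            theta += alpha * td_error * features(s, a, w)
    return theta


def rollout(theta, w, s=0):
    """Follow the greedy policy for style w until the finish; return the actions."""
    actions = []
    while s != N_STATES - 1:
        a = greedy(theta, s, w)
        actions.append(a)
        s = step(s, a)[0]
    return actions


theta = train()
print(rollout(theta, w=1.0))   # [1, 1, 1]
print(rollout(theta, w=-1.0))  # [0, 0, 0, 0, 0]
```

Both styles reach the finish (the core task) but take visibly different routes. Because the features pair a one-hot (state, action) code with w times the same code, the two training styles are represented exactly; a real agent would use a neural network with continuous style inputs instead.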
Concretely, the paper describes combining these components so that a single model encodes both task performance and style variation. At run time an end user can select the final behavior, giving flexible control over how the task is performed without retraining. The authors report coherent adherence to style requests in all three tested domains while maintaining the agents' ability to satisfy the main task.
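The run-time side can be sketched on its own. Assuming a style-conditioned model has already been trained, choosing a behavior reduces to feeding a different style vector into the same frozen weights. The presets, action names, and weights here are hand-set placeholders for illustration, not learned values and not from the paper.

```python
import numpy as np

ACTIONS = ("brake", "coast", "accelerate")

# Hypothetical user-facing presets: style vector w = (aggression, caution).
STYLE_PRESETS = {
    "default": np.array([0.0, 0.0]),
    "aggressive": np.array([1.0, 0.0]),
    "cautious": np.array([0.0, 1.0]),
}

# Hand-set weights standing in for a trained model (NOT learned, NOT from the paper).
# Q(s, a, w) = BASE[a] . s + STYLE[a] . w
BASE = np.array([[0.0], [0.5], [0.4]])   # task value per action, shape (3, 1)
STYLE = np.array([[-1.0, 0.8],           # brake
                  [0.0, 0.1],            # coast
                  [1.0, -0.6]])          # accelerate


def q_values(state, w):
    """Style-conditioned action values for one state; the weights never change."""
    return BASE @ state + STYLE @ w


def act(state, style_name):
    """Pick the greedy action for a user-selected style preset at run time."""
    w = STYLE_PRESETS[style_name]
    return ACTIONS[int(np.argmax(q_values(state, w)))]


state = np.array([1.0])  # toy one-feature state (a bias term)
for name in STYLE_PRESETS:
    print(name, "->", act(state, name))
# default -> coast
# aggressive -> accelerate
# cautious -> brake
```

Only the style input changes between calls, never the weights, which is what lets behavior change without retraining.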
Why it matters
Reinforcement learning agents that can only execute a single near-optimal strategy limit interactivity and designer control. By enabling multiple stylistic behaviors within one policy, the framework opens practical options for game designers, players, and robotics practitioners who want to influence how a task is performed without breaking core performance. Demonstrating the same approach in AAA titles and a humanoid domain suggests the methods address both high-fidelity entertainment environments and physics-based control problems.
The work is also a large collaboration, with authors based at Sony AI Zurich, North America, and Tokyo, which suggests internal investment in interactive, controllable agents across domains. The paper frames "styles" as explicit modifications of a core task and shows a path to putting those controls in the hands of end users at run time.
What to watch
Check the paper's arXiv entry for linked code, data, and demos; the submission page lists Code, Data and Media sections associated with the article. Indicators of impact to watch for include any artifacts released alongside the submission and follow-up results applying the same UVFA-plus-augmentation approach to live player-facing tools or broader robotics benchmarks.
Additional facts: the paper appears as arXiv:2607.00642 (cs.AI, cs.LG), was submitted on 1 Jul 2026, and the PDF on arXiv is 8,673 KB. The author list begins with Roberto Capobianco, followed by 48 co-authors across Sony AI locations.
Written by The Brieftide · Source: arXiv