R2D-RL: RoboCup 2D Soccer RL environment and benchmark
R2D-RL links RCSS2D and HELIOS clients to a Python MARL interface with shared-memory sync, EPV reward shaping, and an 11-vs-11 benchmark.
TL;DR
- R2D-RL links RCSS2D and HELIOS clients to a Python MARL interface with shared-memory sync, EPV reward shaping, and an 11-vs-11 benchmark.
- R2D-RL, a reinforcement learning environment for robot soccer, was submitted to arXiv on 17 Jun 2026 by Haobin Qin, Baofeng Zhang, Hidehisa Akiyama and Keisuke Fujii (arXiv:2606.18786).
- R2D-RL is a bridge between the competition-oriented RCSS2D server-client architecture and modern Python MARL workflows, exposing simulation features as a reinforcement learning environment.
R2D-RL, a reinforcement learning environment for robot soccer, was submitted to arXiv on 17 Jun 2026 by Haobin Qin, Baofeng Zhang, Hidehisa Akiyama and Keisuke Fujii (arXiv:2606.18786). It connects RoboCup 2D Soccer Simulation (RCSS2D) and HELIOS-based player clients to a Python multi-agent reinforcement learning interface using shared-memory communication and cycle-level synchronization.
What is R2D-RL?
R2D-RL is a bridge between the competition-oriented RCSS2D server-client architecture and modern Python MARL workflows, exposing simulation features as a reinforcement learning environment. The environment supports full-field and scenario-based training, configurable opponents, Base discrete and Hybrid parameterized action spaces, action masks, expected possession value (EPV)-based reward shaping, and parallel execution. It ships front-goal scenarios plus an 11-vs-11 full-field benchmark with baseline results.
The paper lists the core goals as easing integration of RCSS2D into Python toolchains and providing both scenario and full-match benchmarks. The arXiv record lists the submission size as 6,181 KB.
How does R2D-RL connect RCSS2D to Python MARL workflows?
R2D-RL uses shared-memory communication and cycle-level synchronization to attach HELIOS-based player clients and RCSS2D to a Python interface, allowing step-level coordination between simulator and learning code. Shared memory passes simulator state and actions; cycle-level synchronization enforces the simulator loop timing and deterministic interaction.
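The lock-step idea can be illustrated with a small sketch. This is not R2D-RL's code: the 20-byte memory layout, the field names, the toy ball dynamics and the polling loop are all assumptions made for illustration, and a thread stands in for the simulator process. It only shows the general pattern of passing state and actions through a shared block while a cycle counter keeps both sides in step.

```python
import struct
import threading
import time
from multiprocessing import shared_memory

# Hypothetical 20-byte layout, invented for this sketch (not R2D-RL's format):
#   0: cycle (int32)   4: acked cycle (int32)   8: ball_x (float64)   16: action (int32)
CYCLE, ACKED, BALL_X, ACTION, SIZE = 0, 4, 8, 16, 20


def wait_for(buf, offset, value, timeout=5.0):
    """Poll an int32 field until it equals `value`, sleeping briefly between reads."""
    deadline = time.monotonic() + timeout
    while struct.unpack_from("<i", buf, offset)[0] != value:
        if time.monotonic() > deadline:
            raise TimeoutError(f"field at offset {offset} never reached {value}")
        time.sleep(0.0005)


def simulator_side(buf, n_cycles, trace):
    """Stand-in for the simulator: publish state, then block until the action arrives."""
    ball_x = 0.0
    for cycle in range(1, n_cycles + 1):
        struct.pack_into("<d", buf, BALL_X, ball_x)  # write the payload first
        struct.pack_into("<i", buf, CYCLE, cycle)    # then announce the new cycle
        wait_for(buf, ACKED, cycle)                  # cycle-level barrier
        action = struct.unpack_from("<i", buf, ACTION)[0]
        ball_x += 1.0 if action == 1 else -1.0       # toy dynamics
        trace.append((cycle, action, ball_x))


def learner_side(buf, n_cycles, policy):
    """Stand-in for the Python agent: read state, write an action, acknowledge the cycle."""
    for cycle in range(1, n_cycles + 1):
        wait_for(buf, CYCLE, cycle)
        ball_x = struct.unpack_from("<d", buf, BALL_X)[0]
        struct.pack_into("<i", buf, ACTION, policy(ball_x))
        struct.pack_into("<i", buf, ACKED, cycle)


def push_until_two(ball_x):
    """Toy policy: action 1 pushes the ball forward until it reaches x = 2."""
    return 1 if ball_x < 2.0 else 0


def run_lockstep(n_cycles=5):
    shm = shared_memory.SharedMemory(create=True, size=SIZE)
    trace = []
    try:
        shm.buf[:SIZE] = bytes(SIZE)  # zero all fields before starting
        sim = threading.Thread(
            target=simulator_side, args=(shm.buf, n_cycles, trace), daemon=True
        )
        sim.start()
        learner_side(shm.buf, n_cycles, push_until_two)
        sim.join()
    finally:
        shm.close()
        shm.unlink()
    return trace
```

In a real two-process setup the second process would attach to the block by name with `shared_memory.SharedMemory(name=...)`, and a production bridge would likely replace the polling loop with a semaphore or similar primitive. Writing the payload before the counter is what lets the reader trust the state once it sees the new cycle number.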
The environment exposes multiple action-space choices, including a Base discrete option and a Hybrid parameterized option, and supports action masks so agents can avoid invalid moves. It also provides EPV-based reward shaping to enrich sparse match rewards. The authors supply front-goal scenarios for focused training and an 11-vs-11 full-field benchmark to evaluate full-match tactics; baseline results accompany those benchmarks.
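As a rough illustration of how action masks and a hybrid parameterized action fit together, the sketch below samples a discrete action type from masked logits and attaches continuous parameters to it. The action names, the `HybridAction` fields and the parameter ranges are hypothetical and are not taken from R2D-RL; the masked softmax itself is a standard technique.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical discrete action types, chosen only for this example.
BASE_ACTIONS = ("move", "dribble", "pass", "shoot", "tackle")


@dataclass(frozen=True)
class HybridAction:
    kind: str         # discrete choice
    power: float      # continuous parameter, clipped to [0, 1]
    angle_deg: float  # continuous parameter, clipped to [-180, 180]


def masked_softmax(logits, mask):
    """Softmax over logits where masked-out (False) entries get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    peak = max(l for l, ok in zip(logits, mask) if ok)  # for numerical stability
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid(logits, mask, params, rng):
    """Pick a valid discrete action, then attach its clipped continuous parameters."""
    probs = masked_softmax(logits, mask)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    power, angle = params[idx]
    return HybridAction(
        kind=BASE_ACTIONS[idx],
        power=min(max(power, 0.0), 1.0),
        angle_deg=min(max(angle, -180.0), 180.0),
    )


# Usage: a player without the ball cannot dribble, pass or shoot.
rng = random.Random(0)
mask = [True, False, False, False, True]
logits = [0.1, 2.0, 3.0, 1.0, -0.5]
params = [(1.7, 30.0), (0.5, 0.0), (0.8, 45.0), (1.0, 10.0), (0.6, -200.0)]
action = sample_hybrid(logits, mask, params, rng)
```

Masking before normalization means the high logits on the invalid kick actions have no influence on the sampled choice, so the agent never emits a pass or shot it cannot execute.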
Why it matters
R2D-RL makes a mature RoboCup platform accessible to Python-first MARL researchers by removing the friction of the competition-oriented server-client design. That lowers the barrier to training multi-agent policies on long-horizon, partially observable, cooperative-and-adversarial tasks such as 11-vs-11 soccer. The inclusion of EPV-based reward shaping and configurable action spaces tackles two central RL pain points: sparse rewards and large discrete-continuous action combinations.
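To see why a possession-value signal helps with sparse rewards, consider one common way to fold such a value into the reward: potential-based shaping, where the bonus is the discounted change in the value estimate between consecutive states. Whether R2D-RL uses exactly this form is not stated in the arXiv summary, so treat the formula, the `toy_epv` function and the weights below as assumptions for illustration only.

```python
PITCH_HALF_LENGTH = 52.5  # metres; assumes the standard 105 m pitch


def toy_epv(ball_x, has_possession):
    """Hypothetical stand-in for an EPV model: value grows as the ball advances.

    Returns a value in [0, 1] when our team has the ball and in [-1, 0] otherwise.
    """
    progress = (ball_x + PITCH_HALF_LENGTH) / (2 * PITCH_HALF_LENGTH)
    return progress if has_possession else progress - 1.0


def shaped_reward(reward, epv_now, epv_next, gamma=0.99, weight=1.0, terminal=False):
    """Potential-based shaping: r + weight * (gamma * Phi(s') - Phi(s)).

    The potential of a terminal state is taken as 0 so the bonus cannot leak
    past the end of an episode.
    """
    next_potential = 0.0 if terminal else epv_next
    return reward + weight * (gamma * next_potential - epv_now)


# Usage: no goal was scored (reward 0), but the ball advanced 10.5 m in possession,
# so the agent still receives a small positive learning signal.
step_reward = shaped_reward(
    reward=0.0,
    epv_now=toy_epv(0.0, True),
    epv_next=toy_epv(10.5, True),
)
```

With a discount of 1, the shaping bonuses along a trajectory telescope to the difference between the final and initial potentials, which is why this family of shaping terms is often chosen when the goal is to densify feedback without changing which policies are optimal.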
The package also standardizes scenario-based evaluation alongside a full-field benchmark, which can help compare algorithms on both tactical subproblems and complete-match performance.
What to watch
The arXiv record notes that code is available at a URL given in the submission; check that repository for examples and replication material. Also watch for follow-up papers or community baselines that adopt the 11-vs-11 benchmark, and for results that report more detailed metrics than the authors' initial baselines.
References and notes
- Paper: "R2D-RL: A RoboCup 2D Soccer Environment for Multi-Agent Reinforcement Learning", Haobin Qin, Baofeng Zhang, Hidehisa Akiyama, Keisuke Fujii, arXiv:2606.18786, submitted 17 Jun 2026.
- The arXiv entry describes support for RCSS2D, HELIOS-based players, shared-memory communication, cycle-level synchronization, Base discrete and Hybrid parameterized action spaces, action masks, EPV-based reward shaping, parallel execution, front-goal scenarios, and an 11-vs-11 full-field benchmark with baseline results.
Written by The Brieftide · Source: arXiv