
R2D-RL: RoboCup 2D Soccer RL environment and benchmark

R2D-RL links RCSS2D and HELIOS clients to a Python MARL interface with shared-memory sync, EPV reward shaping, and an 11-vs-11 benchmark.

The Brieftide

TL;DR

  • R2D-RL links RCSS2D and HELIOS clients to a Python MARL interface with shared-memory sync, EPV reward shaping, and an 11-vs-11 benchmark.
  • R2D-RL, a reinforcement learning environment for robot soccer, was submitted to arXiv on 17 Jun 2026 by Haobin Qin, Baofeng Zhang, Hidehisa Akiyama and Keisuke Fujii (arXiv:2606.18786).
  • R2D-RL is a bridge between the competition-oriented RCSS2D server-client architecture and modern Python MARL workflows, exposing simulation features as a reinforcement learning environment.

R2D-RL, a reinforcement learning environment for robot soccer, was submitted to arXiv on 17 Jun 2026 by Haobin Qin, Baofeng Zhang, Hidehisa Akiyama and Keisuke Fujii (arXiv:2606.18786). It connects RoboCup 2D Soccer Simulation (RCSS2D) and HELIOS-based player clients to a Python multi-agent reinforcement learning interface using shared-memory communication and cycle-level synchronization.

What is R2D-RL?

R2D-RL is a bridge between the competition-oriented RCSS2D server-client architecture and modern Python MARL workflows, exposing simulation features as a reinforcement learning environment. The environment supports full-field and scenario-based training, configurable opponents, Base discrete and Hybrid parameterized action spaces, action masks, expected possession value (EPV)-based reward shaping, and parallel execution. It ships with front-goal scenarios plus an 11-vs-11 full-field benchmark with baseline results.

The paper's stated goals are to ease integration of RCSS2D into Python toolchains and to provide both scenario-level and full-match benchmarks. The arXiv record for arXiv:2606.18786 lists a submission size of 6,181 KB.

How does R2D-RL connect RCSS2D to Python MARL workflows?

R2D-RL uses shared-memory communication and cycle-level synchronization to attach HELIOS-based player clients and RCSS2D to a Python interface, allowing step-level coordination between simulator and learning code. Shared memory carries simulator state and agent actions between the two sides; cycle-level synchronization keeps the learning code in lockstep with the simulator's cycle so that the interaction stays deterministic.
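The lockstep idea can be illustrated with a small Python sketch. It is not R2D-RL's implementation: the buffer layout, the `run_demo` function, and the toy dynamics are all hypothetical, and two threads stand in for what would really be separate simulator-side and Python-side processes. It only shows the pattern the paper names: one shared buffer holds state and action, and each side blocks until the other has finished its part of the cycle.

```python
import struct
import threading
from multiprocessing import shared_memory

# Hypothetical buffer layout (not taken from the paper):
# cycle number (int32), ball x-position (float64), chosen action (int32).
LAYOUT = struct.Struct("<idi")


def run_demo(n_cycles=5):
    """Run a toy simulator/agent pair in lockstep over one shared buffer."""
    shm = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
    state_ready = threading.Semaphore(0)   # simulator -> agent
    action_ready = threading.Semaphore(0)  # agent -> simulator
    log = []

    def simulator():
        ball_x = 0.0
        for cycle in range(n_cycles):
            # Publish this cycle's state; -1 marks "no action yet".
            LAYOUT.pack_into(shm.buf, 0, cycle, ball_x, -1)
            state_ready.release()
            # Cycle-level sync: do not advance until the agent has answered.
            action_ready.acquire()
            _, _, action = LAYOUT.unpack_from(shm.buf, 0)
            ball_x += 1.0 if action == 1 else -1.0
            log.append((cycle, action, ball_x))

    def agent():
        for _ in range(n_cycles):
            state_ready.acquire()
            cycle, ball_x, _ = LAYOUT.unpack_from(shm.buf, 0)
            # Toy policy: push the ball forward until it reaches x = 3.
            action = 1 if ball_x < 3.0 else 0
            LAYOUT.pack_into(shm.buf, 0, cycle, ball_x, action)
            action_ready.release()

    threads = [threading.Thread(target=simulator), threading.Thread(target=agent)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        shm.close()
        shm.unlink()
    return log
```

Because neither side can run ahead of the other, the recorded sequence of cycles and actions is the same on every run, which is the property a step-based RL interface needs.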

The environment exposes multiple action-space choices, including a Base discrete option and a Hybrid parameterized option, and supports action masks so agents can avoid invalid moves. It also provides EPV-based reward shaping to enrich sparse match rewards. The authors supply front-goal scenarios for focused training and an 11-vs-11 full-field benchmark to evaluate full-match tactics; baseline results accompany those benchmarks.
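As a rough illustration of how an action mask and a parameterized action fit together, the sketch below samples a discrete action type from masked logits and then attaches continuous parameters to it. The action names, the `HybridAction` fields, and both functions are assumptions made for this example; the brief does not describe R2D-RL's actual action definitions or API.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical discrete action types, for illustration only.
ACTIONS = ("move", "dash", "kick", "tackle")


@dataclass
class HybridAction:
    """A discrete action type plus its continuous parameters."""
    kind: str
    power: float = 0.0
    direction: float = 0.0


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked-out entries get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, mask, params, rng):
    """Sample a valid action type, then attach that type's (power, direction)."""
    probs = masked_softmax(logits, mask)
    idx = rng.choices(range(len(ACTIONS)), weights=probs, k=1)[0]
    power, direction = params[idx]
    return HybridAction(ACTIONS[idx], power, direction)


# A player without the ball cannot kick, so "kick" is masked out even
# though the policy gives it the highest logit.
logits = [1.0, 2.0, 5.0, 0.5]
mask = [True, True, False, True]
params = [(0.0, 0.0), (80.0, 15.0), (100.0, -30.0), (50.0, 0.0)]
action = select_action(logits, mask, params, random.Random(0))
```

Masking before sampling means the agent never spends exploration on moves the simulator would reject, while the per-type parameters keep the continuous part of the action tied to the discrete choice.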

Why it matters

R2D-RL makes a mature RoboCup platform accessible to Python-first MARL researchers by removing the friction of the competition-oriented server-client design. That lowers the barrier to training multi-agent policies on long-horizon, partially observable, cooperative-and-adversarial tasks such as 11-vs-11 soccer. The inclusion of EPV-based reward shaping and configurable action spaces tackles two central RL pain points: sparse rewards and large discrete-continuous action combinations.
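To make the sparse-reward point concrete, here is one common way a value estimate such as EPV can be folded into the reward: potential-based shaping, where the bonus at each step is the change in the potential. The brief does not give R2D-RL's formula, so this is an assumed form, and the EPV numbers and function names below are invented for illustration.

```python
def shaped_reward(env_reward, epv_now, epv_next, gamma=1.0, weight=1.0):
    """Potential-based shaping with EPV as the potential:
    r' = r + weight * (gamma * EPV(s') - EPV(s))."""
    return env_reward + weight * (gamma * epv_next - epv_now)


def shape_episode(rewards, epv, gamma=1.0, weight=1.0):
    """Shape a whole episode; epv[t] is the EPV of the state before step t.
    The potential of the terminal state is taken to be zero."""
    shaped = []
    for t, reward in enumerate(rewards):
        terminal = t == len(rewards) - 1
        epv_next = 0.0 if terminal else epv[t + 1]
        shaped.append(shaped_reward(reward, epv[t], epv_next, gamma, weight))
    return shaped


# Invented example: three steps of one possession that ends in a goal.
rewards = [0.0, 0.0, 1.0]   # the match reward arrives only at the goal
epv = [0.02, 0.05, 0.12]    # made-up EPV estimates for each pre-step state
shaped = shape_episode(rewards, epv)
```

In this toy episode the first two steps, which earn nothing from the match reward alone, now receive small positive signals for moving into more valuable positions. With `gamma=1.0` the shaped rewards sum to the original return minus the starting EPV, so the shaping redistributes credit along the possession rather than adding new reward.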

The package also standardizes scenario-based evaluation alongside a full-field benchmark, which can help compare algorithms on both tactical subproblems and complete-match performance.

What to watch

Check the code repository linked in the paper for examples and replication material; the arXiv record notes that code is available at the URL given in the submission. Watch for follow-up papers or community baselines that adopt the 11-vs-11 benchmark, and for results that report more detailed metrics than the authors' initial baselines.

References and notes

  • Paper: "R2D-RL: A RoboCup 2D Soccer Environment for Multi-Agent Reinforcement Learning", Haobin Qin, Baofeng Zhang, Hidehisa Akiyama, Keisuke Fujii, arXiv:2606.18786, submitted 17 Jun 2026.
  • The arXiv entry describes support for RCSS2D, HELIOS-based players, shared-memory communication, cycle-level synchronization, Base discrete and Hybrid parameterized action spaces, action masks, EPV-based reward shaping, parallel execution, front-goal scenarios, and an 11-vs-11 full-field benchmark with baseline results.

Written by The Brieftide · Source: arXiv
