DigenRL: Disaggregated RL for Diffusion Visual LLMs, 1.56–2.10x
DigenRL disaggregates rollout and training for diffusion-based generative LLMs, boosting throughput 1.56–2.10x versus veRL-Omni and GenRL.
TL;DR
- DigenRL disaggregates rollout and training for diffusion-based generative LLMs, boosting throughput 1.56–2.10x versus veRL-Omni and GenRL.
- DigenRL is a disaggregated reinforcement learning framework for diffusion-based generative large language models, submitted to arXiv on 23 Jun 2026.
- GAP and TSP change how diffusion models are partitioned for parallel execution so rollout and training can overlap more effectively.
DigenRL is a disaggregated reinforcement learning framework for diffusion-based generative large language models, submitted to arXiv on 23 Jun 2026. The system targets the inefficiencies of colocated execution. In experiments on three hardware testbeds with 16–32 GPUs, it achieved 1.56–2.10x throughput improvements over veRL-Omni and GenRL.
How does DigenRL speed diffusion RL?
DigenRL speeds up diffusion-based generative LLM training by separating rollout and training resources and adding three pipeline optimizations. Generation-axis pipeline (GAP) and time-step parallelism (TSP) enable finer-grained pipelining between rollout and training. An elastic trainer-assisted generation (TAG) method lets trainer GPUs dynamically assist rollouts. A tightly constrained one-step asynchronous strategy puts the pipeline tail bubble to use.
GAP and TSP change how diffusion models are partitioned for parallel execution so rollout and training can overlap more effectively. TAG allows trainer-side GPU resources to temporarily execute rollout generations when idle, reducing idle time in a disaggregated deployment. The asynchronous constraint reduces synchronization stalls while keeping correctness for the one-step interactions the authors target.
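To make the one-step constraint concrete, here is a minimal sketch of a staleness gate. It assumes one plausible reading of "one-step constrained": a rollout batch may start at most one policy version ahead of the trainer. The class `OneStepGate` and its fields are hypothetical names for illustration and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class OneStepGate:
    """Toy gate bounding how far rollout may run ahead of training.

    Assumption: batch `b` ideally uses weights produced by `b` completed
    training steps; with one-step asynchrony it may start one step early.
    """
    trainer_version: int = 0  # training steps completed so far
    max_lag: int = 1          # the assumed "one-step" bound

    def can_start_rollout(self, batch_index: int) -> bool:
        # Allow the batch only if it is at most `max_lag` steps ahead.
        return batch_index - self.trainer_version <= self.max_lag

    def finish_train_step(self) -> None:
        # Called when the trainer completes one optimizer step.
        self.trainer_version += 1


gate = OneStepGate()
print(gate.can_start_rollout(1))  # one step ahead of the trainer
print(gate.can_start_rollout(2))  # two steps ahead of the trainer
gate.finish_train_step()
print(gate.can_start_rollout(2))  # now only one step ahead
```

A gate like this lets rollout workers begin the next batch while the trainer finishes the current step, while still capping how stale the rollout weights can be.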
How was DigenRL evaluated and compared?
The authors ran experiments on three hardware testbeds using clusters of 16–32 GPUs and four generative models: HunyuanVideo-13B, Wan2.1-14B, FLUX.1-12B, and QwenImage-20B. Across those setups, DigenRL produced throughput improvements in the range 1.56–2.10x compared with state-of-the-art diffusion RL systems veRL-Omni and GenRL. The paper is 14 pages long and includes 18 figures and 1 table documenting the results.
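As a quick way to read the reported range, the snippet below converts a throughput speedup into the fraction of baseline wall-clock time needed for the same amount of work. It assumes throughput is measured on a fixed workload; the helper `time_fraction` is a hypothetical name used only here.

```python
def time_fraction(speedup: float) -> float:
    """Fraction of baseline time needed for the same work at a given speedup."""
    return 1.0 / speedup


for s in (1.56, 2.10):
    frac = time_fraction(s)
    print(f"{s:.2f}x throughput -> {frac:.1%} of baseline time ({1 - frac:.1%} less)")
```

Under that assumption, 1.56x corresponds to about 64.1% of the baseline time and 2.10x to about 47.6%.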
The evaluation emphasizes heterogeneous GPU support and flexible resource allocation, contrasting with veRL-Omni which, the paper notes, relies on colocated execution that couples rollout and training resources and limits independent scaling. The authors position DigenRL to accommodate heterogeneous GPUs and to facilitate more efficient task scheduling in disaggregated architectures.
Why it matters
Disaggregating rollout and training removes the requirement that the same machines handle both tasks, which can free teams to scale compute pools independently and to mix GPU types. The paper’s techniques directly target the common performance problem in disaggregated RL systems: execution bubbles created by mismatched rates of rollout and training. Allowing trainer GPUs to assist rollouts and applying finer-grained pipelining reduces those bubbles, which explains the measured 1.56–2.10x throughput range. For teams training diffusion-oriented generative LLMs, that translates to better utilization of GPU fleets and a clearer path to heterogeneous deployment.
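The bubble argument can be illustrated with a toy steady-state model. This is a back-of-the-envelope sketch, not the paper's cost model: it assumes perfectly divisible work, identical GPUs, no communication overhead, and that trainer GPUs can help with rollout but not the reverse. All function names and numbers are hypothetical.

```python
def step_time(rollout_work: float, train_work: float,
              rollout_gpus: int, trainer_gpus: int,
              trainer_assist: bool = False) -> float:
    """Per-step wall-clock time of a two-stage pipeline in steady state.

    Work is in GPU-seconds. Without assistance, the slower stage sets the pace.
    With assistance, spare trainer capacity absorbs rollout work, but the
    trainer pool must still finish all training work itself.
    """
    train_time = train_work / trainer_gpus
    if trainer_assist:
        pooled = (rollout_work + train_work) / (rollout_gpus + trainer_gpus)
        return max(train_time, pooled)
    return max(rollout_work / rollout_gpus, train_time)


def utilization(rollout_work: float, train_work: float,
                rollout_gpus: int, trainer_gpus: int,
                trainer_assist: bool = False) -> float:
    """Share of total GPU time spent on useful work (1.0 means no bubbles)."""
    t = step_time(rollout_work, train_work, rollout_gpus, trainer_gpus, trainer_assist)
    return (rollout_work + train_work) / ((rollout_gpus + trainer_gpus) * t)


# Hypothetical workload: rollout is three times heavier than training.
args = (120.0, 40.0, 8, 8)
print(step_time(*args), utilization(*args))
print(step_time(*args, trainer_assist=True), utilization(*args, trainer_assist=True))
```

In this made-up configuration the trainer pool sits idle for two thirds of each step until it is allowed to assist. Real gains depend on overheads the model ignores, so the numbers here say nothing about the paper's measured range.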
What to watch
Check the paper’s Code, Data and Media section and the external toggles listed on the arXiv page such as Hugging Face, DagsHub, Replicate and Hugging Face Spaces for code or demos linked to the submission. Independent reproductions on public testbeds and per-system breakdowns of the reported 1.56–2.10x gains versus veRL-Omni and GenRL will be the next concrete signals to confirm how broadly the improvements apply.
| Item | Value |
|---|---|
| Throughput improvement (x) | 1.56–2.10 |
| Testbed GPUs | 16–32 |
| Models used | HunyuanVideo-13B; Wan2.1-14B; FLUX.1-12B; QwenImage-20B |
| Paper length / figures / tables | 14 pages; 18 figures; 1 table |
Written by The Brieftide · Source: arXiv