
LaGO: Latent Action Guidance boosts online RL success rates

LaGO uses pretrained LLMs as latent action priors to guide online policy optimization.

The Brieftide

TL;DR

  • LaGO uses pretrained LLMs as latent action priors to guide online policy optimization.
  • The paper, by Kuan-Yen Liu, Ren-Jyun Huang and Ti-Rong Wu, was submitted to arXiv on 23 Jun 2026 and accepted at the ICML 2026 Workshop on Large Language Models for Planning (LM4Plan).
  • The authors report the paper is 9 pages with 2 figures.

LaGO (Latent Action Guidance for Online Reinforcement Learning) is a framework that uses a pretrained large language model as a latent action prior, softly guiding online policy optimization instead of acting as a direct controller.

What is LaGO and how does it work?

LaGO places a pretrained LLM alongside a learning policy and treats the model as a latent action prior that softly biases online policy updates, rather than demanding exact action outputs from the LLM. In practice the LLM supplies guidance in a latent space that is used during policy optimization. The framework is designed so that the LLM never has to serve as a planner or controller that generates precise actions.

The paper frames this design as a response to prior work that relied on LLMs for direct control, which can be unreliable in action-level generation. LaGO’s setup lets the online RL algorithm retain primary control while drawing on the LLM’s sequential-decision knowledge as a probabilistic prior during optimization.
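
To make "soft guidance" concrete, here is a minimal sketch of one generic way a prior can bias a policy update: a policy-gradient term plus a weighted KL penalty that pulls the policy toward a prior distribution. This is an illustration only, not LaGO's actual objective, which this brief does not spell out. The function names, the `beta` weight, the example prior, and the use of a plain discrete action distribution in place of a latent space are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def guided_loss(policy_logits, prior_probs, action, advantage, beta=0.1):
    """Hypothetical objective: policy-gradient term + beta * KL(policy || prior).

    With beta = 0 the prior is ignored and the RL term alone drives learning;
    a larger beta pulls the policy harder toward the prior.
    """
    probs = softmax(policy_logits)
    pg_term = -advantage * math.log(probs[action])
    prior_term = beta * kl_divergence(probs, prior_probs)
    return pg_term + prior_term

# Usage: a uniform policy nudged by a made-up prior that favors action 0.
prior = [0.7, 0.2, 0.1]  # stand-in for LLM-derived guidance (assumed values)
loss = guided_loss([0.0, 0.0, 0.0], prior, action=0, advantage=1.0, beta=0.1)
print(loss)
```

The point of the sketch is the division of labor: the RL term stays in charge of learning, and the prior only adds a penalty whose weight can be tuned or switched off.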

How did LaGO perform on benchmarks?

LaGO raised average success rates over Vanilla PPO on both benchmarks tested: from 15.1% to 27.2% on CLEVR-Robot (discrete control) and from 2.7% to 15.2% on Meta-World (continuous control). Across both benchmarks, the paper reports consistent improvements in reward as well as success rate.

The authors also report an analysis that links guidance effectiveness to LLM strength: stronger pretrained LLMs provided more effective guidance in their experiments. The paper does not claim LaGO removes the need for online learning; instead it positions the LLM as a complementary source of planning knowledge that the optimizer incorporates softly.

Why it matters

LaGO’s approach reduces reliance on LLMs as precise controllers. That matters because direct control demands exact action generation and can be brittle. By using LLMs as priors during online optimization, the method captures planning knowledge without making the LLM the execution layer. That design could make LLM-derived knowledge easier to integrate into conventional RL pipelines and may reduce failures tied to action-level hallucination from LLMs.

The reported gains are concrete. Against Vanilla PPO, average success rose by 12.1 percentage points on CLEVR-Robot (about 1.8 times the baseline) and by 12.5 points on Meta-World (about 5.6 times the baseline), so the improvement shows up in both discrete and continuous control.
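
The relative gains quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four success rates reported in this brief; the `gains` helper and the dictionary layout are illustrative names, not anything from the paper.

```python
# Reported average success rates in percent: (Vanilla PPO, LaGO)
reported = {
    "CLEVR-Robot": (15.1, 27.2),
    "Meta-World": (2.7, 15.2),
}

def gains(baseline, guided):
    """Return (percentage-point difference, ratio to baseline)."""
    return guided - baseline, guided / baseline

for name, (ppo, lago) in reported.items():
    diff, ratio = gains(ppo, lago)
    print(f"{name}: +{diff:.1f} points, {ratio:.2f}x")

# prints:
# CLEVR-Robot: +12.1 points, 1.80x
# Meta-World: +12.5 points, 5.63x
```

The absolute gains are nearly identical on the two benchmarks. The Meta-World ratio is much larger only because its baseline is so low.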

What to watch

Whether LaGO’s improvements scale with larger or different pretrained LLMs is the immediate signal to follow, because the paper’s analysis ties guidance quality to LLM strength. Also watch for tests beyond Vanilla PPO and the two benchmarks used here to see if the latent-prior pattern generalizes across algorithms and domains.

Paper and provenance

The full manuscript is available on arXiv as arXiv:2606.24669, submitted 23 Jun 2026. The authors list Kuan-Yen Liu, Ren-Jyun Huang and Ti-Rong Wu, and note the paper was accepted at the ICML 2026 Workshop on Large Language Models for Planning (LM4Plan). The document runs 9 pages and contains 2 figures.

LaGO vs Vanilla PPO: reported average success rates

Benchmark      Vanilla PPO    LaGO
CLEVR-Robot    15.1%          27.2%
Meta-World     2.7%           15.2%

Written by The Brieftide · Source: arXiv
