Ghost Attractor Networks: 2.3M decoder matches 1.07B Diffusion Transformer
Ghost Attractor Networks, a 2.3-million-parameter dynamical decoder, matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer with 462× fewer parameters.
TL;DR
- Ghost Attractor Networks, a 2.3-million-parameter dynamical decoder, matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer with 462× fewer parameters.
- Ghost Attractor Networks, a dynamical decoder by Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang and Lihui Wang, was submitted to arXiv on 16 Jun 2026.
- The paper proposes a learned potential-drift latent dynamics that builds a basin-attractor geometry for closed-loop sequential generation and tests it as a robotic action decoder.
Ghost Attractor Networks, a dynamical decoder by Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang and Lihui Wang, was submitted to arXiv on 16 Jun 2026. The paper proposes a learned potential-drift latent dynamics that builds a basin-attractor geometry for closed-loop sequential generation and tests it as a robotic action decoder.
How does Ghost Attractor Networks work?
Ghost Attractor Networks is a dynamical decoder whose latent state evolves under a learned potential with a drift term, which produces a basin-attractor structure by construction. Mode transitions arise as saddle-node bifurcations with ghost-attractor escape, and a hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. The authors motivate the potential-drift form with three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) and train Ghost end-to-end with a behavioral-cloning and contrastive objective.
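To make the saddle-node and ghost-attractor language concrete, here is a minimal one-dimensional sketch. It is not the paper's model: the hand-written tilted double-well potential V(z) = z^4/4 - z^2/2 - h*z, the tilt parameter `h`, and all function names are illustrative assumptions standing in for the learned potential and drift. Below a critical tilt the potential has two basins; just past it the left basin disappears, yet trajectories still linger near where it used to be before escaping.

```python
def velocity(z, h):
    """dz/dt = -dV/dz for the toy potential V(z) = z**4/4 - z**2/2 - h*z."""
    return -z**3 + z + h

def integrate(z0, h, dt=0.01, steps=10000):
    """Forward-Euler rollout of the latent; returns the whole trajectory."""
    traj = [z0]
    z = z0
    for _ in range(steps):
        z = z + dt * velocity(z, h)
        traj.append(z)
    return traj

def first_crossing_time(traj, threshold, dt=0.01):
    """Time at which the trajectory first exceeds `threshold`, or None."""
    for i, z in enumerate(traj):
        if z > threshold:
            return i * dt
    return None

# Tilt at which the left well and the saddle merge and vanish (saddle-node).
H_CRIT = 2.0 / (3.0 * 3.0 ** 0.5)

# No tilt: two basins, and each start point settles in its own well.
left_end = integrate(-1.2, h=0.0)[-1]
right_end = integrate(1.2, h=0.0)[-1]

# Past the threshold only the right basin remains. Close to the threshold the
# vanished well leaves a slow region (the "ghost") that delays the escape.
slow_escape = first_crossing_time(integrate(-1.2, h=H_CRIT + 0.01), 0.5)
fast_escape = first_crossing_time(integrate(-1.2, h=H_CRIT + 0.3), 0.5)

print(f"basins at h=0: {left_end:.3f}, {right_end:.3f}")
print(f"escape time near threshold: {slow_escape}, far past it: {fast_escape}")
```

In this toy setting the tilt plays the role of a conditioning signal: changing it moves the latent from one mode to another in a single rollout, with the delay set by how close the system sits to the bifurcation.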
How does it perform versus alternatives?
A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer while using 462 times fewer parameters and running at 32 times lower latency. On held-out latent dynamics, its gradient norm decayed by 67 percent across five integration steps on 1,430 samples. The paper also reports that the 2.3M Ghost achieves 5.9 to 29 percent lower offline mean squared error than five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion).
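For readers wondering what a gradient-norm decay figure measures, the sketch below computes the same kind of quantity on a toy system. Everything in it is an assumption for illustration: the separable double-well potential, the 8-dimensional latent, the step size, and the random starting points stand in for the paper's learned potential and held-out data, so the printed number is not expected to reproduce the reported 67 percent.

```python
import math
import random

def grad_potential(z):
    """Gradient of the toy potential V(z) = sum_i (z_i**2 - 1)**2 / 4."""
    return [zi**3 - zi for zi in z]

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def gradient_norm_decay(z0, steps=5, dt=0.1):
    """Fractional drop in ||grad V|| after `steps` Euler gradient-flow steps."""
    z = list(z0)
    g_start = norm(grad_potential(z))
    if g_start == 0.0:
        return 0.0  # already at a stationary point, nothing to decay
    for _ in range(steps):
        g = grad_potential(z)
        z = [zi - dt * gi for zi, gi in zip(z, g)]
    g_end = norm(grad_potential(z))
    return 1.0 - g_end / g_start

# Hypothetical "held-out" latents: 1,430 random 8-dimensional start points.
rng = random.Random(0)
samples = [[rng.uniform(-2.0, 2.0) for _ in range(8)] for _ in range(1430)]

decays = [gradient_norm_decay(z) for z in samples]
mean_decay = sum(decays) / len(decays)
print(f"mean gradient-norm decay over {len(samples)} samples: {mean_decay:.1%}")
```

A positive mean decay indicates that, on average, latents are flowing toward stationary points of the potential, which is the contraction behavior a basin-structured decoder relies on.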
In closed-loop evaluation on the LIBERO-10 benchmark, phase conditioning on Ghost's basin-structured latent produced a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reached a 95.7 percent final success rate.
Why does the basin-structured latent matter?
A structured latent with stable basins gives the decoder explicit geometry for phase-conditioned action generation and for carrying information across steps without storing long histories. That geometry lets Ghost implement mode switching via dynamical bifurcations rather than by iterating large decoders or expanding memory, yielding the parameter and latency reductions reported. For robotics tasks that impose closed-loop control constraints, the paper’s results show a practical route to lower-cost decoders that retain or improve task accuracy.
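The constant-memory idea can be sketched as a control loop whose only state is a fixed-size latent. This is a hedged illustration, not the paper's architecture: the `PersistentLatentController` class, the tilted double-well update, and the identity action readout are all hypothetical, and the observation is simply used as the tilt on each latent dimension.

```python
class PersistentLatentController:
    """Toy closed-loop decoder whose only memory is a fixed-size latent."""

    def __init__(self, latent_dim, dt=0.1, inner_steps=5):
        self.z = [0.0] * latent_dim   # persistent latent, carried across calls
        self.dt = dt
        self.inner_steps = inner_steps

    def step(self, observation):
        """Advance the latent a few Euler steps, then read out an action.

        Each latent dimension follows dz/dt = -z**3 + z + o, where o is the
        matching observation entry (an assumed stand-in for learned drift).
        """
        for _ in range(self.inner_steps):
            self.z = [
                zi + self.dt * (-(zi**3) + zi + oi)
                for zi, oi in zip(self.z, observation)
            ]
        return list(self.z)  # identity readout, purely for illustration

controller = PersistentLatentController(latent_dim=2)

# Phase A: a positive observation pulls the latent into the right-hand basin.
for _ in range(40):
    action_a = controller.step([0.5, 0.5])

# Phase B: the observation flips sign and the latent switches basins.
# No history buffer is kept; the state is still two floats.
for _ in range(40):
    action_b = controller.step([-0.5, -0.5])

print("action in phase A:", [round(a, 3) for a in action_a])
print("action in phase B:", [round(a, 3) for a in action_b])
```

The design point is that memory cost stays fixed no matter how long the episode runs, because past context lives in which basin the latent currently occupies and no log of earlier observations is kept.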
What to watch
See whether Ghost's basin-structured approach replicates these gains beyond LIBERO-10 and across more diverse robotics tasks and deployment settings, and whether the gradient-flow contraction measured on 1,430 held-out samples sustains over longer horizons. The next confirming signals would be matched offline accuracy reported on additional benchmarks, or comparable closed-loop success rates when phase conditioning and persistent-latent ensembling are applied elsewhere.
References and provenance
The description, numbers and claims here are drawn from the arXiv submission "Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation," submitted 16 Jun 2026 by Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang and Lihui Wang (arXiv:2606.18315).
| Item | Ghost | Diffusion Transformer | 2M-parameter decoders |
|---|---|---|---|
| Parameters | 2.3 million | 1.07 billion | approximately 2 million |
| Relative parameter factor | 462× fewer (vs Diffusion Transformer) | baseline | baseline |
| Latency | 32× lower (vs Diffusion Transformer) | baseline | baseline |
| Offline accuracy / MSE | Matches Diffusion Transformer; 5.9%–29% lower MSE vs five 2M decoders | matches (baseline) | baseline (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) |
| Latent dynamics metric | Gradient norm decayed by 67% across five integration steps on 1,430 held-out samples | n/a | n/a |
| LIBERO-10 closed-loop | Phase conditioning +13.5 percentage points vs MLP; persistent-latent ensembling 95.7% final success rate | n/a | MLP baseline (for comparison) |
Written by The Brieftide · Source: arXiv