InfoDelphi: Multi-agent deliberation improves forecasting
Partitioning evidence into shared public and disjoint private subsets cuts error correlation and improves forecasts on a 375-question benchmark.
TL;DR
- Partitioning evidence into shared public and disjoint private subsets cuts error correlation and improves forecasts on a 375-question benchmark.
- The authors report a theoretical argument that decomposing evidence into public and private subsets reduces inter-agent error correlation.
- The paper also states that removing information asymmetry largely eliminates the deliberation gains, establishing diverse inputs as the key enabler of effective multi-agent reasoning.
InfoDelphi, introduced in a paper submitted to arXiv on 2 Jul 2026 by Yuante Li and five coauthors, is a multi-agent deliberation framework that partitions evidence into shared public and disjoint private subsets so agents exchange exclusive knowledge only via deliberation. On PolyGym, a benchmark of 375 binary forecasting questions derived from real-world prediction markets, InfoDelphi outperforms the strongest single-agent and multi-agent baselines by 12–18% in Brier score and 4–8 percentage points in accuracy.
What did the authors test and find?
InfoDelphi was evaluated on PolyGym, a set of 375 binary forecasting questions, and produced clear improvements over baselines: a 12–18% reduction in Brier score and a 4–8 percentage point lift in accuracy. The paper frames these results around one central intervention: designing information asymmetry so each agent receives both public evidence and private, disjoint evidence that must be communicated through deliberation.
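For readers unfamiliar with the two metrics, here is a minimal sketch of how a Brier score and a thresholded accuracy are usually computed for binary questions. The function names, the 0.5 threshold, and the reading of "12–18%" as a relative reduction are assumptions for illustration; the brief does not describe the paper's evaluation code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def accuracy(probs: Sequence[float], outcomes: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of questions where the thresholded forecast matches the outcome.

    The 0.5 threshold is an assumption, not something stated in the source.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    hits = sum((p >= threshold) == bool(o) for p, o in zip(probs, outcomes))
    return hits / len(probs)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop in a lower-is-better metric, e.g. 0.15 for a 15% cut."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Made-up toy forecasts, not PolyGym data.
outcomes = [1, 0, 1, 0]
baseline_probs = [0.7, 0.4, 0.6, 0.2]
improved_probs = [0.8, 0.3, 0.7, 0.2]

baseline_brier = brier_score(baseline_probs, outcomes)
improved_brier = brier_score(improved_probs, outcomes)
print(baseline_brier, improved_brier,
      relative_reduction(baseline_brier, improved_brier))
```

Because the Brier score is lower-is-better, a relative reduction and a percentage-point accuracy gain move in opposite numeric directions while both indicate improvement.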
The authors report a theoretical argument that decomposing evidence into public and private subsets reduces inter-agent error correlation. Empirical experiments then show that InfoDelphi, which combines relevance-aware evidence routing, rationale-based iterative deliberation, and confidence-weighted aggregation, beats both the best single-agent and the best multi-agent baselines. The paper also states that removing information asymmetry largely eliminates the deliberation gains, establishing diverse inputs as the key enabler of effective multi-agent reasoning.
How does InfoDelphi work?
InfoDelphi routes evidence so each agent sees a shared public subset plus a disjoint private subset, enforces iterative deliberation of rationales, and aggregates outputs weighted by agent confidence. Relevance-aware evidence routing decides which pieces become public and which remain private; agents produce rationales during iterative exchange; final forecasts combine agent outputs using confidence-weighted aggregation.
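The partition-then-aggregate idea can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the random split stands in for the relevance-aware routing (which the brief does not detail), the deliberation rounds are omitted, and the linear confidence weighting, function names, and `public_fraction` parameter are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def partition_evidence(items: Sequence[str], n_agents: int,
                       public_fraction: float = 0.5,
                       seed: int = 0) -> Tuple[List[str], List[List[str]]]:
    """Split evidence into one shared public list and n disjoint private lists.

    A seeded random shuffle is a placeholder for relevance-aware routing.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_public = int(len(shuffled) * public_fraction)
    public = shuffled[:n_public]
    private_pool = shuffled[n_public:]
    # Dealing round-robin guarantees no item lands in two private subsets.
    private = [private_pool[i::n_agents] for i in range(n_agents)]
    return public, private


def confidence_weighted_forecast(
        forecasts: Sequence[Tuple[float, float]]) -> float:
    """Combine (probability, confidence) pairs into one probability.

    Uses a confidence-weighted mean; the paper's exact rule may differ.
    """
    total = sum(conf for _, conf in forecasts)
    if total <= 0:
        raise ValueError("total confidence must be positive")
    return sum(prob * conf for prob, conf in forecasts) / total


# Hypothetical evidence IDs and agent outputs, for illustration only.
evidence = [f"e{i}" for i in range(10)]
public, private = partition_evidence(evidence, n_agents=3)
agent_views = [public + own for own in private]  # what each agent would see
final = confidence_weighted_forecast([(0.8, 3.0), (0.4, 1.0)])
print(len(public), [len(p) for p in private], final)
```

In a fuller version, each agent would forecast from its own view, share a rationale, revise after reading the others, and only then pass a probability and confidence to the aggregator.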
This pipeline is intended to force genuine belief revision across agents. When all agents receive identical evidence, the paper argues deliberation collapses into herding rather than meaningful revision. InfoDelphi’s partitioned evidence makes private information reachable only through deliberation, which the authors show theoretically reduces error correlation between agents and empirically yields the reported Brier score and accuracy gains on PolyGym.
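A standard identity, not taken from the paper, gives some intuition for why lower error correlation helps when forecasts are pooled. Assume, purely for illustration, $n$ agents whose forecast errors $e_i$ share a common variance $\sigma^2$ and a common pairwise correlation $\rho$; these symbols are assumptions and do not appear in the source. The error variance of the simple average is then:

```latex
\[
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \frac{1}{n^2}\Bigl(n\,\sigma^2 + n(n-1)\,\rho\,\sigma^2\Bigr)
  = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\,\sigma^2 .
\]
```

With $\rho = 1$ (every agent makes the same error, as in herding) the variance stays at $\sigma^2$ regardless of how many agents are added, while with $\rho$ near 0 it shrinks toward $\sigma^2/n$. The paper's own theoretical argument may be framed differently.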
Why does this matter?
Designing what information each agent receives changes the value of multi-agent deliberation. The paper demonstrates that diversity of input, not merely having multiple agents, is the mechanism that unlocks improved probabilistic forecasts. For practitioners using LLM ensembles or multi-agent systems for forecasting, the result reframes a common design choice: identical evidence can produce herding and little benefit, whereas intentionally disjoint evidence can yield measurable calibration and accuracy improvements.
That shift matters because it points to a low-level, implementable intervention (how you partition evidence) that can materially change ensemble performance without changing model size or base capability. The authors' concrete gains on a 375-question, real-world-derived benchmark link the idea to practical forecasting tasks rather than toy problems.
What to watch
Watch for replication of the PolyGym results and for open-source implementations of InfoDelphi’s routing and aggregation components. Two concrete signals stand out: whether other teams see a similar 12–18% Brier score improvement on comparable benchmarks, and whether ablations that remove private evidence consistently erase the deliberation gains. Confirmation on those points would support information asymmetry as a general design principle for multi-agent forecasting.
Authors and identifiers: the paper is titled "Diverse Evidence, Better Forecasts: Multi-Agent Deliberation Under Information Asymmetry," authored by Yuante Li, Yicheng Tao, Kate Zhang, Taozhi Wang, Gefei Gu, and Yaxin Zhou, posted to arXiv as arXiv:2607.01661 (submitted 2 Jul 2026).
Written by The Brieftide · Source: arXiv