DivInit improves agentic search on multi-hop QA by 5-7 points
DivInit, a training-free first-turn intervention, selects diverse initial queries and raises multi-hop QA performance by 5-7 points.
TL;DR
- DivInit, a training-free first-turn intervention, selects diverse initial queries and raises multi-hop QA performance by 5-7 points.
- The method draws n candidate first queries from a single model call, selects k < n diverse seeds, and runs those as parallel trajectories.
- Standard parallel sampling suffers from redundant first queries across rollouts; DivInit targets that redundancy without retraining the model, by reorganizing how the first queries are initialized.
DivInit, introduced by Sidhaarth Murali and five coauthors on arXiv (arXiv:2606.17209, submitted 15 Jun 2026), is a training-free change to the first turn of agentic search that reduces retrieval overlap and improves multi-hop question answering. The method draws n candidate first queries from a single model call, selects k < n diverse seeds, and runs those as parallel trajectories.
What is DivInit and how does it work?
DivInit is a training-free intervention applied only at the first turn: instead of sampling k independent first queries, it draws n candidates from a single call, picks k < n diverse seeds, and runs them as parallel trajectories. That change forces diversity among the initial queries, so the threads retrieve less overlapping evidence. It avoids the common failure mode in which similar first queries pull the same documents into every rollout, leaving later turns conditioned on the same shared retrieval.
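The select-k-of-n step can be sketched in a few lines. This brief does not say which diversity criterion the paper uses, so the example below is only an illustration under stated assumptions: it measures query similarity with Jaccard distance over lowercase tokens and picks seeds greedily by farthest-point (max-min) selection. The function names `token_set`, `jaccard_distance`, and `select_diverse_seeds` are hypothetical and do not come from the authors' code.

```python
from typing import List, Set


def token_set(query: str) -> Set[str]:
    """Lowercase whitespace tokens of a query (an assumed, very simple representation)."""
    return set(query.lower().split())


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse_seeds(candidates: List[str], k: int) -> List[str]:
    """Greedily pick k of the n candidates so each new pick is far from those already chosen.

    Assumption: this max-min heuristic stands in for whatever selection rule
    the paper actually uses, which the brief does not specify.
    """
    if not 0 < k < len(candidates):
        raise ValueError("k must satisfy 0 < k < n")
    tokens = [token_set(q) for q in candidates]
    chosen = [0]  # keep the first candidate as the starting seed
    while len(chosen) < k:
        best_idx, best_score = -1, -1.0
        for i in range(len(candidates)):
            if i in chosen:
                continue
            # Distance to the closest already-chosen seed
            score = min(jaccard_distance(tokens[i], tokens[j]) for j in chosen)
            if score > best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return [candidates[i] for i in chosen]
```

Given five candidate queries about the same question, calling `select_diverse_seeds(candidates, 3)` keeps the first candidate and then adds the two that share the fewest tokens with the seeds already chosen. Each selected query would then start its own parallel trajectory. An embedding-based distance could replace the token-level one without changing the loop.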
The paper frames this as a breadth-scaling fix: test-time scaling often increases depth or breadth, and standard parallel sampling shows diminishing returns because of query redundancy at the first turn. DivInit targets that redundancy without retraining the model, by reorganizing how the first queries are initialized.
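To make the redundancy argument concrete, one way to quantify how much first-turn retrievals overlap across rollouts is the mean pairwise Jaccard overlap of the retrieved document IDs. This is an illustrative metric chosen for this sketch, not necessarily the one reported in the paper, and `mean_pairwise_overlap` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Set


def mean_pairwise_overlap(retrieved: List[Set[str]]) -> float:
    """Average Jaccard overlap between the document sets of every pair of rollouts.

    1.0 means every rollout retrieved identical documents; 0.0 means no
    two rollouts share a document.
    """
    pairs = list(combinations(retrieved, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 0.0
    return total / len(pairs)


# Made-up document IDs for three rollouts, for illustration only
redundant = [{"d1", "d2"}, {"d1", "d2"}, {"d1", "d3"}]
diverse = [{"d1", "d2"}, {"d3", "d4"}, {"d5"}]
```

On these toy inputs the redundant rollouts score about 0.56 and the diverse ones score 0.0. A lower value means the parallel trajectories are covering more distinct evidence for the same retrieval budget.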
How was DivInit evaluated and what were the results?
Across five open-weight models and eight benchmarks, DivInit consistently improved over standard parallel sampling, producing average gains of five to seven points on multi-hop QA at matched compute. The authors attribute standard parallel sampling's limited returns to similar first queries across rollouts, which retrieve overlapping evidence and cause later turns to be conditioned on that shared retrieval.
The paper is 15 pages long with 8 figures and is under review at EMNLP 2026. The authors are Sidhaarth Murali, João Coelho, Jingjie Ning, João Magalhães, Bruno Martins, and Chenyan Xiong. The arXiv entry notes that code is available via a URL provided in the paper.
Why does this matter?
DivInit suggests that changing how parallel rollouts are initialized can be as consequential as changing model weights. By removing redundancy at the first turn, parallel rollouts become more informationally efficient: each trajectory is likelier to retrieve distinct evidence, so later turns are conditioned on a broader pool of evidence instead of the same shared documents. That matters for users and researchers who scale agentic search with breadth rather than depth, because the reported gains come at matched compute and require no retraining.
This is especially relevant for multi-hop QA, where assembling distinct chains of retrieval is central to success. The paper’s reported average improvement of five to seven points on multi-hop QA indicates the intervention meaningfully affects downstream task scores across multiple open-weight models and a set of eight benchmarks.
What to watch
Watch for the outcome of the EMNLP 2026 review, and for community replication using the code the authors linked from the paper. If independent evaluations reproduce the paper's average 5-7 point gains across models and benchmarks, DivInit could become a standard first-turn initialization step for breadth-scaled agentic search.
| Item | Reported value |
|---|---|
| Average improvement on multi-hop QA (matched compute) | 5-7 points |
| Models evaluated | 5 open-weight models |
| Benchmarks | 8 benchmarks |
Written by The Brieftide · Source: arXiv