When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval
An LLM-based agent iteratively creates and tests query-rewriting rules to boost BM25 on the Chinese benchmark LeCaRD-v2.
TL;DR
- An LLM-based agent iteratively creates and tests query-rewriting rules to boost BM25 on the Chinese benchmark LeCaRD-v2.
- In practice the agent produces rule candidates, schedules experiments that combine rules, observes retrieval results, and removes rules that fail to improve alignment between queries and relevant cases.
- The paper emphasizes that the method works "without any parameter training," relying instead on iterative rule discovery and empirical validation against LeCaRD-v2.
When Rules Learn, submitted 15 Jun 2026, introduces a self-evolving framework that equips an LLM-based agent to generate, validate and prune rule-driven query rewrites to improve BM25 retrieval for legal cases. The paper, authored by Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo and Guotong Geng, is marked "To appear in ACL 2026."
What is the self-evolving framework?
The framework equips an LLM-based agent with an automatic evaluation environment. Inside it, the agent iteratively creates rewriting rules, plans validation experiments over rule combinations, and eliminates ineffective rules using historical feedback, improving BM25 without any parameter training. In practice the agent produces rule candidates, schedules experiments that combine rules, observes retrieval results, and removes rules that fail to improve alignment between queries and relevant cases.
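The create, experiment and prune cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule names, the `TOY_EFFECTS` numbers and the `BASELINE` score are invented placeholders, `evaluate` stands in for the real evaluation environment, and the candidate batches stand in for rules an LLM would propose.

```python
from itertools import combinations

# Made-up per-rule effects on a retrieval metric; NOT results from the paper.
TOY_EFFECTS = {
    "expand_charge_terms": 0.05,
    "drop_party_names": 0.02,
    "append_random_terms": -0.03,
}
BASELINE = 0.50  # placeholder score for the unrewritten query


def evaluate(combo):
    """Stand-in for the evaluation environment (rewrite, run BM25, score)."""
    return BASELINE + sum(TOY_EFFECTS[rule] for rule in combo)


def evolve(candidate_batches, max_combo_size=2):
    active = []    # rules currently kept
    history = {}   # combo -> score: the feedback reused in later rounds
    for batch in candidate_batches:
        # 1. Create: an LLM would propose these rules; here they are given.
        active.extend(rule for rule in batch if rule not in active)
        # 2. Plan and run experiments over combinations not tried before.
        for size in range(1, max_combo_size + 1):
            for combo in combinations(active, size):
                if combo not in history:
                    history[combo] = evaluate(combo)
        # 3. Prune: drop rules that do not beat the baseline on their own.
        active = [r for r in active if history[(r,)] > BASELINE]
    best = max(history, key=history.get)
    return active, best, history


active, best, history = evolve([
    ["expand_charge_terms", "append_random_terms"],
    ["drop_party_names"],
])
print(active)  # rules that survived pruning
print(best)    # best-scoring combination seen so far
```

With these toy numbers the harmful rule is discarded after the first round, so the second round only spends experiments on combinations of rules that have already proven useful. The pruning criterion used here (single-rule score versus baseline) is an assumption chosen for brevity.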
How does the agent interact with BM25 and the benchmark?
The agent rewrites each query with its current rules, passes the rewritten query to BM25, and validates the outcome on the Chinese legal case retrieval benchmark LeCaRD-v2. The authors report that this self-evolving approach outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, especially when the system is powered by a high-capacity core LLM. The paper emphasizes that the method works "without any parameter training," relying instead on iterative rule discovery and empirical validation against LeCaRD-v2.
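To make the rewrite-then-retrieve step concrete, here is a small pure-Python sketch of Okapi BM25 scoring with one rewriting rule applied to the query. Everything in it is illustrative: the three English toy documents, the `expand_colloquial` rule and its synonym table are hypothetical, and the real benchmark is Chinese and far larger. The BM25 variant shown uses the common `log(1 + (N - df + 0.5) / (df + 0.5))` form of IDF.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_colloquial(tokens):
    """Hypothetical rule: append statutory wording for colloquial terms."""
    synonyms = {"stole": ["theft", "larceny"]}
    out = list(tokens)
    for t in tokens:
        out.extend(synonyms.get(t, []))
    return out


def rewrite(tokens, rules):
    """Apply each rewriting rule in order."""
    for rule in rules:
        tokens = rule(tokens)
    return tokens


docs = [
    "defendant convicted of theft of a vehicle".split(),
    "contract dispute over late delivery of goods".split(),
    "defendant charged with assault in a bar".split(),
]
query = "man stole a car".split()

before = bm25_scores(query, docs)
after = bm25_scores(rewrite(query, [expand_colloquial]), docs)
print(before)
print(after)
```

In this toy setup the original query only matches documents through the uninformative token "a", so the theft case and the assault case tie. After the rewrite adds "theft", the theft case scores strictly higher than both other documents, which is the kind of lexical alignment the rules are meant to create.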
How did the authors evaluate the method?
Evaluation used the LeCaRD-v2 dataset as the testbed and compared the self-evolving pipeline to baselines described in the paper: hand-crafted rule sets and a greedy rule selection strategy. The authors report consistent gains over those non-evolutionary baselines and conduct detailed analyses to probe which agent behaviors produce improvements, attributing gains to the LLM's ability to leverage past experimental results and to discard failing rules.
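For contrast, a greedy rule selection baseline can be sketched as plain forward selection: add whichever single rule improves the score most, and stop when nothing helps. The brief does not spell out how the paper's greedy baseline is defined, so this is one common reading rather than the authors' procedure; `toy_score`, the rule names and the numbers are invented for illustration.

```python
def greedy_select(rules, score):
    """Forward selection: repeatedly add the single rule that helps most."""
    selected = []
    best = score(selected)
    remaining = list(rules)
    while remaining:
        # Try each remaining rule on top of the current selection.
        trials = [(score(selected + [r]), r) for r in remaining]
        top_score, top_rule = max(trials)
        if top_score <= best:
            break  # no remaining rule improves the score
        selected.append(top_rule)
        remaining.remove(top_rule)
        best = top_score
    return selected, best


def toy_score(selected):
    """Made-up scorer; NOT a metric or result from the paper."""
    effects = {"rule_a": 0.04, "rule_b": 0.01, "rule_c": -0.02}
    return 0.50 + sum(effects[r] for r in selected)


selected, best = greedy_select(["rule_a", "rule_b", "rule_c"], toy_score)
print(selected, round(best, 2))
```

Unlike the self-evolving loop, this procedure never proposes new rules and never revisits an earlier choice; it only uses the scores from the current step.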
Why it matters
Legal case retrieval demands precise lexical alignment between queries and precedent cases, and BM25 remains a strong baseline in the field despite progress in dense retrieval. This paper shows a pragmatic path to improve a classic lexical retriever without re-training model parameters: have an LLM discover and validate rewriting rules automatically. That approach can lower the engineering barrier to improved retrieval because it repurposes an LLM as a planner and experimenter rather than as a retriever to be re-trained.
What to watch
The paper is slated "To appear in ACL 2026," making the conference presentation the next concrete milestone. Watch for the ACL materials and any accompanying release of the rule sets, evaluation scripts or experimental logs the authors describe in the arXiv entry.
References and concrete facts drawn from the submission: the paper titled "When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval" was submitted on 15 Jun 2026 to arXiv, lists eight authors (Mingxu Tao; Jiawei Hu; Xian Zhou; Wenpeng Hu; Jiajun Cheng; Yunbo Cao; Zhunchen Luo; Guotong Geng), targets the LeCaRD-v2 Chinese legal case retrieval benchmark, and claims improvements over human-designed rules and greedy rule selection while operating "without any parameter training."
Written by The Brieftide · Source: arXiv