Coding Agents · June 17, 2026 · 4 min read

When Rules Learn: Self-Evolving Agent for Legal Case Retrieval

An LLM-based agent iteratively creates and tests query-rewriting rules to boost BM25 on the Chinese legal case retrieval benchmark LeCaRD-v2.

The Brieftide · June 17, 2026

TL;DR

01. An LLM-based agent iteratively creates and tests query-rewriting rules to boost BM25 on the Chinese legal case retrieval benchmark LeCaRD-v2.
02. In practice the agent produces rule candidates, schedules experiments that combine rules, observes retrieval results, and removes rules that fail to help alignment between queries and relevant cases.
03. The paper emphasizes the method works "without any parameter training," relying instead on iterative rule discovery and empirical validation against LeCaRD-v2.

When Rules Learn, submitted 15 Jun 2026, introduces a self-evolving framework that equips an LLM-based agent to generate, validate and prune rule-driven query rewrites to improve BM25 retrieval for legal cases. The paper, authored by Mingxu Tao, Jiawei Hu, Xian Zhou, Wenpeng Hu, Jiajun Cheng, Yunbo Cao, Zhunchen Luo and Guotong Geng, is marked "To appear in ACL 2026."

What is the self-evolving framework?

The framework equips an LLM-based agent with an automatic evaluation environment that iteratively creates rewriting rules, plans validation experiments over rule combinations, and eliminates ineffective rules using historical feedback, improving BM25 without any parameter training. In practice the agent produces rule candidates, schedules experiments that combine rules, observes retrieval results, and removes rules that fail to help alignment between queries and relevant cases.
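The generate, validate and prune cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `evolve_rules`, `toy_propose`, `toy_evaluate`, the rule names and the score values are all hypothetical. The `propose` callable stands in for the LLM, and `evaluate` stands in for a BM25 run scored against the benchmark.

```python
from itertools import combinations


def evolve_rules(propose, evaluate, rounds=3, max_combo=2):
    """Hypothetical generate -> validate -> prune loop.

    propose(history, active) -> list of new rule names (stand-in for the LLM)
    evaluate(rule_set)       -> retrieval score for a frozenset of rule names
    """
    # Baseline experiment: retrieval with no rewriting rules applied.
    history = {frozenset(): evaluate(frozenset())}
    active = set()

    def helps(rule):
        # A rule helps if some tested combination scores higher with it
        # than the same combination does without it.
        return any(
            score > history[combo - {rule}]
            for combo, score in history.items()
            if rule in combo
        )

    for _ in range(rounds):
        # 1. Generate: request new candidates, conditioned on past results.
        active |= set(propose(history, active))
        # 2. Validate: score every untested combination, smallest first.
        for size in range(1, max_combo + 1):
            for combo in combinations(sorted(active), size):
                key = frozenset(combo)
                if key not in history:
                    history[key] = evaluate(key)
        # 3. Prune: remove rules that never improved any combination.
        active = {rule for rule in active if helps(rule)}

    best = max(history, key=history.get)
    return active, best, history


# Toy stand-ins (all values invented): two useful rules and one harmful rule.
GAINS = {"expand_charge_terms": 0.05, "drop_party_names": 0.03, "add_noise": -0.04}


def toy_propose(history, active):
    tested = {rule for combo in history for rule in combo}
    return [rule for rule in GAINS if rule not in tested][:2]


def toy_evaluate(rule_set):
    return round(0.50 + sum(GAINS[rule] for rule in rule_set), 4)


active, best, history = evolve_rules(toy_propose, toy_evaluate)
print(sorted(active))  # ['drop_party_names', 'expand_charge_terms']
```

In this toy run the harmful rule is tested alone and in pairs, never improves a combination, and is dropped. The experiment history is kept so that later proposals can take earlier results into account.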

How does the agent interact with BM25 and the benchmark?

The agent drives rule-driven query rewriting that feeds BM25, then validates outcomes on the Chinese legal case retrieval benchmark LeCaRD-v2; the authors report that this self-evolving approach outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, especially when the system is powered by a high-capacity core LLM. The paper emphasizes the method works "without any parameter training," relying instead on iterative rule discovery and empirical validation against LeCaRD-v2.
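To make the role of a rewriting rule concrete, here is a small self-contained sketch of Okapi BM25 with one rule applied to the query before scoring. Everything in it is illustrative: the English toy documents, the function names and the synonym table are assumptions made for this example, and real LeCaRD-v2 queries are Chinese texts that would first need word segmentation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_with_legal_terms(query_terms):
    """Hypothetical rule: append statutory vocabulary for everyday words."""
    synonyms = {"stole": ["theft"], "car": ["vehicle"]}
    extra = [s for term in query_terms for s in synonyms.get(term, [])]
    return query_terms + extra


docs = [
    "defendant convicted of theft of motor vehicle".split(),
    "contract dispute over late delivery of goods".split(),
    "defendant convicted of fraud in loan application".split(),
]
query = "man stole a car".split()

before = bm25_scores(query, docs)
after = bm25_scores(expand_with_legal_terms(query), docs)
```

The original query shares no terms with any document, so every score in `before` is zero. After the rule adds "theft" and "vehicle", only the first document receives a positive score, which is the kind of lexical alignment between query and relevant case that the rules aim to create.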

How did the authors evaluate the method?

Evaluation used the LeCaRD-v2 dataset as the testbed and compared the self-evolving pipeline to baselines described in the paper: hand-crafted rule sets and a greedy rule selection strategy. The authors report consistent gains over those non-evolutionary baselines and conduct detailed analyses to probe which agent behaviors produce improvements, attributing gains to the LLM's ability to leverage past experimental results and to discard failing rules.
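For readers unfamiliar with the greedy baseline, the sketch below shows one common form of it, forward selection, on an invented score table. It is an assumption that the paper's baseline works this way, and the rule names and scores are hypothetical. The table is built so that two rules only pay off together, which illustrates one way greedy selection can stall; it is not a result from the paper.

```python
def greedy_select(rules, evaluate):
    """Forward selection: keep adding the single rule that most improves the score."""
    chosen = frozenset()
    best = evaluate(chosen)
    while True:
        candidates = [
            (evaluate(chosen | {rule}), rule)
            for rule in sorted(rules)
            if rule not in chosen
        ]
        if not candidates:
            break
        score, rule = max(candidates)
        if score <= best:
            break  # no single addition helps, so greedy selection stops here
        chosen, best = chosen | {rule}, score
    return chosen, best


# Invented scores: rule_a and rule_b each hurt alone but help as a pair.
SCORES = {
    frozenset(): 0.50,
    frozenset({"rule_a"}): 0.49,
    frozenset({"rule_b"}): 0.49,
    frozenset({"rule_c"}): 0.52,
    frozenset({"rule_a", "rule_b"}): 0.56,
    frozenset({"rule_a", "rule_c"}): 0.51,
    frozenset({"rule_b", "rule_c"}): 0.51,
    frozenset({"rule_a", "rule_b", "rule_c"}): 0.57,
}

chosen, greedy_score = greedy_select(
    {"rule_a", "rule_b", "rule_c"}, lambda rule_set: SCORES[rule_set]
)
best_combo = max(SCORES, key=SCORES.get)
```

Greedy selection picks `rule_c` first and then stops at 0.52, because adding either remaining rule alone lowers the score. The best entry in the table, all three rules at 0.57, is reachable only by testing `rule_a` and `rule_b` together.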

Why it matters

Legal case retrieval demands precise lexical alignment between queries and precedent cases, and BM25 remains a strong baseline in the field despite progress in dense retrieval. This paper shows a pragmatic path to improving a classic lexical retriever without retraining model parameters: have an LLM discover and validate rewriting rules automatically. That approach can lower the engineering barrier to better retrieval because it uses the LLM as a planner and experimenter rather than as a retriever that must be retrained.

What to watch

The paper is slated "To appear in ACL 2026," making the conference presentation the next concrete milestone. Watch for the ACL materials and any accompanying release of the rule sets, evaluation scripts or experimental logs the authors describe in the arXiv entry.

References and concrete facts drawn from the submission: the paper titled "When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval" was submitted on 15 Jun 2026 to arXiv, lists eight authors (Mingxu Tao; Jiawei Hu; Xian Zhou; Wenpeng Hu; Jiajun Cheng; Yunbo Cao; Zhunchen Luo; Guotong Geng), targets the LeCaRD-v2 Chinese legal case retrieval benchmark, and claims improvements over human-designed rules and greedy rule selection while operating "without any parameter training."

Figure: Self-evolving rule framework data flow

Written by The Brieftide · Source: arXiv
