Small On-Premises LLMs: Overrefusal in Criminal Legal Prompts
An arXiv paper finds authority-style role prefixes raise refusal rates by 2–20x in small on-premises LLMs used for criminal legal prompts.
TL;DR
- 01An arXiv paper finds authority-style role prefixes raise refusal rates by 2–20x in small on-premises LLMs used for criminal legal prompts.
- 02They measured refusal rates on criminal legal prompts across multiple small LLMs when those prompts were framed with different contextual prefixes.
- 03Authority-style prefixes systematically increased refusal rates, by between 2x and 20x compared with a no-prefix baseline.
Anastasiia Kucherenko and three coauthors published a paper on arXiv on 23 Jun 2026 showing that small, on-premises large language models (LLMs) prompted with criminal legal tasks can change behavior sharply depending on contextual framing. The authors tested several modern small LLMs they identify as “most likely to be used as on-device assistants” and measured how often the models refused to assist when given different role prefixes and jailbreak prompts.
What did the researchers test and why?
They measured refusal rates on criminal legal prompts across multiple small LLMs when those prompts were framed with different contextual prefixes. The team focused on small LLMs intended for on-device or on-premises use and framed prompts with authority-style role prefixes (for example, "you are acting as an assistant of the national supreme court" or "... defense lawyer") and with a known role-play jailbreak prefix. The study aims to anticipate bias that could arise if assistants selectively refuse assistance on certain topics, which could slow or skew case processing even when LLMs are used only for translation or reformulation.
How large were the effects on refusal rates?
Authority-style prefixes systematically increased refusal rates, by between 2x and 20x compared with a no-prefix baseline. That is the paper's headline quantitative finding: authority framings produced a 2--20x rise in refusal rates over the baseline with no prefix. The authors present this range as the core measurement of instability: small on-prem LLMs did not simply shift slightly, they could multiply refusal frequency by an order of magnitude depending on the framing.
How did the role-play jailbreak prefix behave?
The role-play jailbreak prefix produced mixed results: in some models the jailbreak sharply increased refusals, in others it barely shifted refusal rates. The paper contrasts this inconsistent effect with the systematic effect of authority-style prefixes. That mixed outcome indicates different small LLMs react heterogeneously to jailbreak-style contexts, while authority cues reliably push several models toward overrefusal.
Why it matters
Selective overrefusal can create a throughput and access bias even when LLMs are used for low-risk tasks such as translation and reformulation, the authors warn. If models running on local devices systematically refuse certain legal queries when prompted in institutional voice, some cases may take longer to process or receive uneven automated assistance. The finding also means that an institutional user who naturally includes role metadata or authority cues could unintentionally trigger behavior that changes what assistance is offered across cases.
What the paper does not claim
The authors frame the work as an investigation into potential bias in on-device assistants rather than a full evaluation of safety or correctness. They report instability under contextual framings and call for further investigation to minimize opportunities for bias. The paper does not present model-by-model numeric breakdowns in the abstract; it summarizes the effects as a 2--20x increase for authority prefixes and mixed results for the jailbreak prefix.
What to watch
Look for the full paper's detailed tables or follow-up studies that publish per-model refusal rates and the exact prompt sets; the abstract signals a broad 2--20x authority-prefix effect, but model-level heterogeneity is only sketched as "mixed" for the jailbreak prefix. Also watch for work that tests whether those refusal changes affect downstream legal workflows such as translation speed or reformulation coverage.
Paper and authors: "LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context," submitted to arXiv on 23 Jun 2026 by Anastasiia Kucherenko, François Brouchoud, Dimitri Percia David, and Andrei Kucharavy.
| Item | ||
|---|---|---|
| No-prefix baseline | Baseline refusal rate | Baseline |
| Authority-style prefix (e.g. "you are acting as an assistant of the national supreme court") | Systematically increases refusals | Increase of 2–20x over no-prefix baseline |
| Known role-play jailbreak prefix | Mixed effects across models | Sharply increases refusals in some models; barely shifts them in others |
Written by The Brieftide · Source: arXiv
The Brieftide Daily · 06:00
Briefs like this one, in your inbox every morning.
Continue reading
More in Foundation ModelsAge of LLM benchmark: 1v1 reasoning, diplomacy, reliability
Arnaud Ricci's Age of LLM runs 54 matches and 5,258 actions to test 15 LLMs under fog of war, diplomacy and strict JSON reliability.
BIM-Edit: Benchmarking LLMs for IFC-based BIM Editing
BIM-Edit evaluates LLMs on 324 IFC editing tasks across 11 real models and 36 synthetic scenes; the top model averages 49.5%.
QMFOL benchmark: QMFOLBench with 2880 logic instances
QMFOL generates monadic first-order logic problems and ships QMFOLBench with 2880 instances to measure LLM deductive reasoning across.
DeFAb: Defeasible Abduction Benchmark, 372,648+ instances
DeFAb converts four decades of publicly funded knowledge bases into 372.