Instruction Bleed: Cross-Module Interference in Agentic Systems
The paper defines compositional behavioral leakage and shows, in 144 trials on Claude Sonnet 4.6, that content edits to non-focal prompt modules produce a measurable paired effect.
TL;DR
- The paper defines compositional behavioral leakage and shows, in 144 trials on Claude Sonnet 4.6, that content edits to non-focal prompt modules produce a measurable paired effect.
- Compositional behavioral leakage, or CBL, is interference between concatenated prompt modules that share a model's context window.
- The authors define CBL as behavioral leakage that arises from architectural non-isolation, specifically noting that transformer self-attention provides no formal boundary between concatenated modules.
Instruction Bleed, a paper by Ching-Yu Lin and Yifan Liu submitted 24 June 2026, formalizes a failure mode the authors call "compositional behavioral leakage" and measures it in a deployed job-evaluation agent. The study runs 144 trials on Claude Sonnet 4.6 using a reusable three-channel protocol and reports that perturbing non-focal modules by content produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero).
What is compositional behavioral leakage?
Compositional behavioral leakage, or CBL, is interference between concatenated prompt modules that share a model's context window. The authors define CBL as behavioral leakage that arises from architectural non-isolation, specifically noting that transformer self-attention provides no formal boundary between concatenated modules.
Lin and Liu present CBL as distinct from other failure axes: it is orthogonal to adversarial injection, cognitive degradation, multi-agent fault propagation, and privacy leakage. They frame CBL as a system-class characteristic that can silently alter module behavior even when modules share no variables or executable dependency.
How did the authors measure interference in practice?
They probed a deployed job-evaluation agent running on Claude Sonnet 4.6 with a three-channel perturbation protocol across 144 trials, measuring the paired effect of each perturbation on focal-module output. The protocol perturbs non-focal modules along three axes: volume, content, and form. Only the content channel produced a detectable paired effect, quantified as Cohen's d = 0.63 with a bootstrap 95% confidence interval excluding zero.
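As a rough illustration of the statistic quoted above, the sketch below computes a paired Cohen's d and a percentile bootstrap confidence interval. It assumes the d_z variant (mean of the paired differences divided by their standard deviation) and a percentile bootstrap over pairs. The brief does not say which variants the authors used, and the function names and scores are hypothetical placeholders, not the paper's code or data.

```python
import random
import statistics


def paired_cohens_d(baseline, perturbed):
    """Paired Cohen's d (d_z, an assumed variant): mean difference / SD of differences."""
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(baseline, perturbed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for d_z, resampling (baseline, perturbed) pairs."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, perturbed))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        diffs = [p - b for b, p in sample]
        sd = statistics.stdev(diffs)
        if sd == 0:
            continue  # skip degenerate resamples where d_z is undefined
        estimates.append(statistics.mean(diffs) / sd)
    estimates.sort()
    low = estimates[int((alpha / 2) * len(estimates))]
    high = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return low, high


# Hypothetical focal-module scores for the same inputs, before and after a
# content edit to a non-focal module (made-up numbers, not the paper's data).
baseline = [7.0, 6.5, 8.0, 5.5, 7.5, 6.0, 7.0, 8.5]
perturbed = [7.5, 6.5, 8.5, 6.5, 7.5, 6.5, 7.0, 9.0]

d = paired_cohens_d(baseline, perturbed)
low, high = bootstrap_ci(baseline, perturbed)
excludes_zero = low > 0 or high < 0  # the criterion the brief reports for content edits
```

Resampling whole pairs rather than individual scores keeps each baseline score matched to its perturbed counterpart, which is what makes the effect "paired".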
The protocol is designed to be reusable: it perturbs non-focal modules and observes downstream shifts in behavior. The authors report that no recommendation flipped during these trials, characterizing the observed regime as sub-threshold: invisible to standard QA yet capable of compounding across the thousands of decisions a deployed agent makes. The paper runs to 8 pages and includes 2 tables; it also contributes a falsifiable prediction set and a system-class characterization alongside the protocol.
Why does this matter?
CBL exposes a blind spot in common evaluation approaches for prompt-composed agentic systems: small, content-level edits outside a focal module can nudge other modules without triggering standard QA alarms. That matters because the effect can accumulate across many decisions, producing meaningful behavioral drift even when individual recommendations remain unchanged. For teams assembling prompt-composed agents, the finding implies evaluations that treat concatenated modules as independent will miss a measurable source of error.
The paper stakes out measurement, not mitigation: its primary contribution is operational. The authors supply an operational definition, a reusable protocol, and a falsifiable prediction set, and they conclude that cross-module interference measurement should be a required part of prompt-composed agent evaluation.
What to watch
The paper was accepted to the ICML 2026 Workshop on Failure Modes in Agentic AI (FAGEN), Seoul, South Korea. Watch for follow-ups that adopt the authors' three-channel protocol, for independent replications on other deployed agents, and for evaluation suites that add cross-module interference checks to catch sub-threshold behavioral leakage before it compounds in production.
Assemble prompt-composed agent
Concatenate focal and non-focal prompt modules into a single context window for Claude Sonnet 4.6.
Perturb non-focal modules
Apply controlled perturbations along three channels: volume, content, and form.
Run trials
Execute the protocol on the deployed job-evaluation agent for 144 trials.
Measure paired effects
Compare focal-module outputs against baseline and compute effect sizes; report Cohen's d = 0.63 with bootstrap 95% CI excluding zero for the content channel.
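The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the `Module` type, the helper names, and the concrete meaning given to each channel (volume as neutral padding, content as swapped-in text, form as the same sentences reflowed into bullets) are all hypothetical, since the brief does not specify how the paper implements them.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Module:
    name: str
    text: str
    focal: bool = False


def assemble(modules: List[Module]) -> str:
    """Concatenate modules into one prompt; nothing isolates one from another."""
    return "\n\n".join(f"## {m.name}\n{m.text}" for m in modules)


def perturb_volume(m: Module, filler: str = "This line is neutral padding.",
                   copies: int = 5) -> Module:
    # Assumed volume channel: lengthen the module without changing its instructions.
    return replace(m, text=m.text + "\n" + "\n".join([filler] * copies))


def perturb_content(m: Module, new_text: str) -> Module:
    # Assumed content channel: change what the module says.
    return replace(m, text=new_text)


def perturb_form(m: Module) -> Module:
    # Assumed form channel: same sentences, reflowed as a bullet list.
    sentences = [s.strip() for s in m.text.split(".") if s.strip()]
    return replace(m, text="\n".join(f"- {s}." for s in sentences))


def build_variants(modules: List[Module], content_edits: Dict[str, str]) -> Dict[str, str]:
    """Return one prompt per channel plus a baseline; focal modules are never edited."""
    def apply(fn: Callable[[Module], Module]) -> str:
        return assemble([m if m.focal else fn(m) for m in modules])

    return {
        "baseline": assemble(modules),
        "volume": apply(perturb_volume),
        "content": apply(lambda m: perturb_content(m, content_edits.get(m.name, m.text))),
        "form": apply(perturb_form),
    }


# Hypothetical modules for a job-evaluation agent (illustrative text only).
modules = [
    Module("scoring", "Score the candidate from 1 to 10. Explain briefly.", focal=True),
    Module("tone", "Be concise. Avoid jargon."),
]
variants = build_variants(modules, {"tone": "Be enthusiastic. Praise strong candidates."})
```

In a real run, each variant would be sent to the model on the same inputs as the baseline, and the focal-module outputs would then be compared pairwise against the baseline outputs.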
Written by The Brieftide · Source: arXiv