
Instruction Bleed: Cross-Module Interference in Agentic Systems

The paper defines compositional behavioral leakage and shows, in 144 trials on Claude Sonnet 4.6, that content edits to non-focal prompt modules produce a measurable paired effect.

The Brieftide

TL;DR

  • The paper defines compositional behavioral leakage and shows, in 144 trials on Claude Sonnet 4.6, that content edits to non-focal prompt modules produce a measurable paired effect.
  • Compositional behavioral leakage, or CBL, is interference between concatenated prompt modules that share a model's context window.
  • The authors attribute CBL to architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules.

Instruction Bleed, a paper by Ching-Yu Lin and Yifan Liu submitted 24 June 2026, formalizes a failure mode the authors call "compositional behavioral leakage" and measures it in a deployed job-evaluation agent. The study runs 144 trials on Claude Sonnet 4.6 using a reusable three-channel protocol and reports that perturbing non-focal modules by content produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero).

What is compositional behavioral leakage?

Compositional behavioral leakage, or CBL, is interference between concatenated prompt modules that share a model's context window. The authors define CBL as behavioral leakage that arises from architectural non-isolation, specifically noting that transformer self-attention provides no formal boundary between concatenated modules.
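To make the non-isolation point concrete, here is a minimal sketch of how a prompt-composed agent is typically assembled. The module names and texts are hypothetical and are not taken from the paper; the point is only that "modules" become one flat string before the model sees them.

```python
# Hypothetical module texts; the paper's actual prompts are not reproduced here.
MODULES = {
    "scoring_rubric": "Score each candidate from 1 to 10 against the job requirements.",
    "tone_guidelines": "Write feedback in a neutral, professional tone.",
    "output_format": "Return a JSON object with keys 'score' and 'recommendation'.",
}


def compose_prompt(modules: dict[str, str]) -> str:
    """Join modules into one string, as a prompt-composed agent might."""
    return "\n\n".join(f"## {name}\n{text}" for name, text in modules.items())


prompt = compose_prompt(MODULES)

# The "## name" headers are ordinary tokens. Nothing in the composed string
# enforces isolation between modules once it enters the context window.
edited = dict(MODULES, tone_guidelines="Write feedback in an enthusiastic tone.")
edited_prompt = compose_prompt(edited)
```

Here the scoring rubric text is byte-for-byte identical in `prompt` and `edited_prompt`, yet the model reads it alongside different surrounding tokens. That is the setting in which the authors say leakage can occur.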

Lin and Liu present CBL as distinct from other failure axes: it is orthogonal to adversarial injection, cognitive degradation, multi-agent fault propagation, and privacy leakage. They frame CBL as a system-class characteristic that can silently alter module behavior even when modules share no variables or executable dependency.

How did the authors measure interference in practice?

The authors probed a deployed job-evaluation agent running on Claude Sonnet 4.6 with a three-channel perturbation protocol across 144 trials, measuring the paired effect of each perturbation on focal-module output. The protocol perturbs non-focal modules along three channels: volume, content, and form. Only the content channel produced a detectable paired effect, quantified as Cohen's d = 0.63 with a bootstrap 95% confidence interval excluding zero.

The protocol is designed to be reusable on other prompt-composed agents. The authors report that no recommendation flipped during these trials and characterize the observed regime as sub-threshold: invisible to standard QA yet capable of compounding across the thousands of decisions a deployed agent makes. The paper runs to 8 pages and includes 2 tables; alongside the protocol, it contributes a falsifiable prediction set and a system-class characterization.
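For readers unfamiliar with the statistic, the following is a minimal sketch of a paired Cohen's d with a percentile bootstrap interval. It assumes the common paired variant (mean of the differences divided by their sample standard deviation); the brief does not say which variant or bootstrap scheme the authors used, and the scores below are illustrative, not the paper's data.

```python
import random
import statistics


def paired_cohens_d(baseline, perturbed):
    """Paired effect size: mean of the differences over their sample SD."""
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(baseline, perturbed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the paired effect size."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, perturbed))
    estimates = []
    for _ in range(n_resamples):
        # Resample whole (baseline, perturbed) pairs with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        diffs = [p - b for b, p in sample]
        sd = statistics.stdev(diffs)
        if sd == 0:
            continue  # degenerate resample: effect size is undefined
        estimates.append(statistics.mean(diffs) / sd)
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Illustrative focal-module scores only; these are NOT the paper's data.
baseline = [7.0, 6.5, 8.0, 7.5, 6.0, 7.0, 8.5, 6.5]
perturbed = [7.5, 6.5, 8.5, 7.5, 6.5, 7.5, 8.5, 7.0]

d = paired_cohens_d(baseline, perturbed)
lo, hi = bootstrap_ci(baseline, perturbed)
print(f"d = {d:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

An interval that excludes zero, as the authors report for the content channel, means the resampled effect sizes stay on one side of zero. Resampling whole pairs rather than individual scores keeps each baseline matched to its own perturbed run.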

Why does this matter?

CBL exposes a blind spot in common evaluation approaches for prompt-composed agentic systems: small, content-level edits outside a focal module can nudge other modules without triggering standard QA alarms. That matters because the effect can accumulate across many decisions, producing meaningful behavioral drift even when individual recommendations remain unchanged. For teams assembling prompt-composed agents, the finding implies evaluations that treat concatenated modules as independent will miss a measurable source of error.

The paper's contribution is measurement, not mitigation. The authors supply an operational definition, a reusable protocol, and a falsifiable prediction set, and they conclude that cross-module interference measurement should be a required part of prompt-composed agent evaluation.

What to watch

The paper was accepted to the ICML 2026 Workshop on Failure Modes in Agentic AI (FAGEN), Seoul, South Korea. Watch for follow-ups that adopt the authors' three-channel protocol, for independent replications on other deployed agents, and for evaluation suites that add cross-module interference checks to catch sub-threshold behavioral leakage before it compounds in production.

Three-channel probe protocol used to measure CBL
  1. Assemble prompt-composed agent

    Concatenate focal and non-focal prompt modules into a single context window for Claude Sonnet 4.6.

  2. Perturb non-focal modules

    Apply controlled perturbations along three channels: volume, content, and form.

  3. Run trials

    Execute the protocol on the deployed job-evaluation agent for 144 trials.

  4. Measure paired effects

    Compare focal-module outputs against baseline and compute effect sizes; report Cohen's d = 0.63 with bootstrap 95% CI excluding zero for the content channel.
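The steps above can be sketched as a small harness. This is an assumption-laden illustration, not the authors' code: the perturbation functions, module texts, and the `run_agent` callable are all hypothetical, and the brief does not say how the 144 trials were divided across conditions, so the trial count is left as a parameter.

```python
from typing import Callable

FOCAL = "scoring_rubric"  # hypothetical name for the focal module

# Hypothetical module texts; not taken from the paper.
MODULES = {
    "scoring_rubric": "Score each candidate from 1 to 10 against the job requirements.",
    "tone_guidelines": "Write feedback in a neutral, professional tone.",
}


def perturb_volume(text: str) -> str:
    """Volume channel (assumed): add length without new instructions."""
    return text + " " + "No further guidance applies here. " * 5


def perturb_content(text: str) -> str:
    """Content channel (assumed): change what the instruction says."""
    return text.replace("neutral", "enthusiastic")


def perturb_form(text: str) -> str:
    """Form channel (assumed): same words, different surface form."""
    return text.upper()


CHANNELS = {"volume": perturb_volume, "content": perturb_content, "form": perturb_form}


def build_variants(modules: dict[str, str], focal: str = FOCAL) -> dict[str, dict[str, str]]:
    """Return a baseline plus one variant per channel; the focal module is never edited."""
    variants = {"baseline": dict(modules)}
    for channel, perturb in CHANNELS.items():
        variants[channel] = {
            name: text if name == focal else perturb(text)
            for name, text in modules.items()
        }
    return variants


def run_trials(modules: dict[str, str], run_agent: Callable[[str], float], n_trials: int):
    """Collect focal-module scores per condition from a caller-supplied agent."""
    results = {}
    for condition, variant in build_variants(modules).items():
        prompt = "\n\n".join(variant.values())
        results[condition] = [run_agent(prompt) for _ in range(n_trials)]
    return results


# Placeholder agent that returns a constant; swap in a real model call to measure anything.
results = run_trials(MODULES, run_agent=lambda prompt: 7.0, n_trials=3)
```

Keeping the focal module untouched in every variant is what lets any shift in its output be attributed to the non-focal edit. The per-condition score lists can then be paired with the baseline to compute effect sizes.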


Written by The Brieftide · Source: arXiv
