Forced Deferral Attack: Manipulating routing in MLLM cascades
A new paper introduces the Forced Deferral Attack (FDA), an adversarial image trigger that lowers weak-model confidence and routes queries to the strong model.
TL;DR
- A new paper introduces the Forced Deferral Attack (FDA), an adversarial image trigger that lowers weak-model confidence and routes queries to the strong model.
- The authors name the technique the Forced Deferral Attack, or FDA, and evaluate it across datasets, model families, and deferral metrics.
- The paper frames the problem around MLLM cascades: systems that first query a weak but cheaper model and defer to a strong model when the weak model's output is unconfident.
Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades, a paper submitted 13 Jun 2026 by Zhongye Liu, Yaopei Zeng, Yurui Chang and Lu Lin (arXiv:2606.15308), describes an attack that forces multimodal model cascades to send queries to expensive strong models. The authors name the technique the Forced Deferral Attack, or FDA, and evaluate it across datasets, model families, and deferral metrics.
What the paper shows
The paper frames the problem around MLLM cascades: systems that first query a weak but cheaper model and defer to a strong model when the weak model's output is unconfident. The authors note that this design saves compute but creates a new attack surface because the weak model's confidence controls compute allocation.
FDA is an adversarial image attack that lowers the weak model's confidence and causes cascades to route queries to the strong model. The attack learns a universal border trigger by optimizing a temperature-flattened objective. That objective pushes the weak model's token distribution on triggered inputs toward less concentrated targets constructed from the weak model's clean responses. In other words, FDA does not directly target answer correctness; it deliberately manipulates the weak model's confidence distribution so the cascade defers more often.
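To make "temperature-flattened" concrete, here is a minimal sketch in plain Python. It is not the authors' code: the logits, the temperature of 4.0, and the choice of KL divergence as the distance are all assumptions for illustration. It shows only the general idea of dividing a clean distribution's logits by a temperature above 1 to get a less concentrated target, then measuring how far the model's current distribution is from that target.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits; a temperature above 1 flattens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a less concentrated distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), used here as an assumed training loss."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0
    )

# Hypothetical weak-model logits for one token position on a clean input.
clean_logits = [6.0, 2.0, 1.0, 0.5]

clean = softmax(clean_logits)                    # peaked: the model is confident
target = softmax(clean_logits, temperature=4.0)  # flattened target (assumed T = 4)

# A trigger optimizer would try to shrink this gap on triggered inputs.
loss_before_attack = kl_divergence(target, clean)

print(f"clean max prob  = {max(clean):.3f}, entropy = {entropy(clean):.3f}")
print(f"target max prob = {max(target):.3f}, entropy = {entropy(target):.3f}")
print(f"KL(target || clean) = {loss_before_attack:.3f}")
```

The flattened target keeps the same ranking of tokens as the clean distribution but spreads probability mass more evenly, which is what lowers a confidence score computed from it.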
The paper reports that FDA consistently increases routing to strong models across datasets, model families, and deferral metrics. The authors also state that FDA outperforms the image-perturbation and prompt-injection baselines they compared against. Because the objective targets confidence rather than answer correctness, the results suggest that a cascade can be pushed to spend more compute on the strong model purely through the weak model's confidence signal.
How the cascade and attack interact
In the cascade setup described, a query first goes to a weak, cheaper MLLM. A deferral metric derived from the weak model's confidence determines whether the system accepts the weak model's output or routes the query to a stronger, more expensive model. Because that confidence score controls compute allocation, an adversary can target confidence rather than the content of an answer.
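The routing rule can be sketched in a few lines of Python. Everything here is hypothetical: the `WeakOutput` type, the mean-token-probability metric, and the 0.7 threshold stand in for whichever deferral metric and cutoff a real cascade uses. The point of the sketch is that two identical answers can be routed differently when only the confidence changes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WeakOutput:
    """Hypothetical container for a weak model's answer."""
    text: str
    token_probs: List[float]  # probability assigned to each generated token

def confidence(output: WeakOutput) -> float:
    """Mean generated-token probability: one possible deferral metric (assumed)."""
    return sum(output.token_probs) / len(output.token_probs)

def route(output: WeakOutput, threshold: float = 0.7) -> str:
    """Accept the weak answer if confident enough, otherwise defer."""
    return "weak" if confidence(output) >= threshold else "strong"

# Same answer text, different confidence: only the second one is deferred.
confident = WeakOutput("a red bus", [0.95, 0.90, 0.92])
unsure = WeakOutput("a red bus", [0.55, 0.40, 0.60])

print(route(confident))
print(route(unsure))
```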
FDA implements that strategy via an image-based universal border trigger. The trigger is optimized with a temperature-flattened objective so that, when present, the weak model's token distribution becomes less peaked. The paper describes constructing less concentrated target distributions from the weak model's responses on clean inputs and optimizing toward those targets. The result is more frequent deferral decisions and increased strong-model usage.
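As a rough picture of what a universal border trigger looks like at inference time, the sketch below pastes one fixed pattern onto the outer frame of an image and leaves the interior alone. It is an illustration under assumptions, not the paper's procedure: the function name, the 1-pixel border width, the grayscale list-of-lists image, and the choice to overwrite border pixels rather than add a bounded perturbation are all hypothetical. "Universal" is read here in its usual sense of one trigger reused across inputs.

```python
def apply_border_trigger(image, trigger, width):
    """Return a copy of `image` with its outer `width` pixels taken from `trigger`.

    `image` and `trigger` are same-sized 2D lists of pixel values.
    """
    rows, cols = len(image), len(image[0])
    patched = [row[:] for row in image]  # copy so the original stays untouched
    for r in range(rows):
        for c in range(cols):
            on_border = (
                r < width or r >= rows - width or
                c < width or c >= cols - width
            )
            if on_border:
                patched[r][c] = trigger[r][c]
    return patched

# Hypothetical 6x6 grayscale image and a trigger of the same size.
image = [[0.5] * 6 for _ in range(6)]
trigger = [[1.0] * 6 for _ in range(6)]

patched = apply_border_trigger(image, trigger, width=1)
changed = sum(
    1 for r in range(6) for c in range(6) if patched[r][c] != image[r][c]
)
print(f"pixels changed: {changed} of 36")
```

Because the same `trigger` is applied to any image, the attacker optimizes it once and then reuses it on every query.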
Why it matters
The paper demonstrates a class of attacks that manipulate compute allocation rather than correctness. That matters because it suggests attackers may be able to force higher-cost model usage without visibly changing what end users see. For systems that cascade models to save compute, the result is unintended strong-model invocation triggered by an adversary-controlled input. The attack therefore ties model security directly to operational cost and resource allocation.
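A back-of-the-envelope calculation shows why the deferral rate matters for cost. The numbers below are invented for illustration and do not come from the paper: a weak model costing 1 unit per query, a strong model costing 10 units, and deferral rates of 20% and 80%. The sketch assumes the weak model always runs and the strong model runs only on deferred queries.

```python
def expected_cost(defer_rate, weak_cost, strong_cost):
    """Expected cost per query: weak model always runs, strong model on deferrals."""
    return weak_cost + defer_rate * strong_cost

# Hypothetical costs and deferral rates, chosen only to show the arithmetic.
WEAK_COST = 1.0
STRONG_COST = 10.0

baseline = expected_cost(0.2, WEAK_COST, STRONG_COST)
attacked = expected_cost(0.8, WEAK_COST, STRONG_COST)

print(f"baseline cost per query: {baseline:.1f}")
print(f"attacked cost per query: {attacked:.1f}")
print(f"cost multiplier: {attacked / baseline:.1f}x")
```

Under these assumed numbers, the per-query cost rises from 3 units to 9, and at a high enough deferral rate the cascade costs more than calling the strong model alone, since every deferred query pays for both models.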
The authors' experiments across datasets, model families, and deferral metrics indicate the vulnerability is not confined to a single benchmark or metric. That breadth raises questions about how robust current deferral policies are against targeted confidence manipulation.
What to watch
Look for follow-up work reproducing FDA's effect on deployed cascades and for defensive research that hardens deferral metrics against confidence manipulation. The next concrete signal will be whether cascaded systems still show increased strong-model routing when exposed to universal border triggers optimized with temperature-flattened objectives.
Paper metadata: arXiv:2606.15308, submitted 13 Jun 2026, authors Zhongye Liu, Yaopei Zeng, Yurui Chang, Lu Lin.
Written by The Brieftide · Source: arXiv