Foundation Models · 3 min read

Forced Deferral Attack: Manipulating routing in MLLM cascades

A new paper introduces the Forced Deferral Attack (FDA), an adversarial image trigger that lowers weak-model confidence so that cascades route queries to the expensive strong model.

The Brieftide

TL;DR

  • 01 A new paper introduces the Forced Deferral Attack (FDA), an adversarial image trigger that lowers weak-model confidence so that cascades route queries to the expensive strong model.
  • 02 The authors evaluate FDA across datasets, model families, and deferral metrics.
  • 03 The paper frames the problem around MLLM cascades: systems that first query a weak but cheaper model and defer to a strong model when the weak model's output is unconfident.

Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades, a paper submitted 13 Jun 2026 by Zhongye Liu, Yaopei Zeng, Yurui Chang and Lu Lin (arXiv:2606.15308), describes an attack that forces multimodal model cascades to send queries to expensive strong models. The authors name the technique the Forced Deferral Attack, or FDA, and evaluate it across datasets, model families, and deferral metrics.

What the paper shows

The paper frames the problem around MLLM cascades: systems that first query a weak but cheaper model and defer to a strong model when the weak model's output is unconfident. The authors note that this design saves compute but creates a new attack surface because the weak model's confidence controls compute allocation.

FDA is an adversarial image attack that lowers the weak model's confidence and causes cascades to route queries to the strong model. The attack learns a universal border trigger by optimizing a temperature-flattened objective. That objective pushes the weak model's token distribution on triggered inputs toward less concentrated targets constructed from the weak model's clean responses. In other words, FDA does not directly target answer correctness; it deliberately manipulates the weak model's confidence distribution so the cascade defers more often.
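To make "temperature-flattened" concrete, here is a minimal sketch in Python. It is not the paper's objective, which this brief does not reproduce. It assumes that flattening means applying a softmax with temperature above 1 to the weak model's clean logits, and that a KL divergence measures the distance to that target. The logits, the temperature value, and all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits; temperature > 1 flattens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means less concentrated."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(target, predicted):
    """KL(target || predicted): a loss of this kind pulls predicted toward target."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical logits for one token position of the weak model on a clean input.
clean_logits = [4.0, 1.0, 0.5, 0.0]

clean_probs = softmax(clean_logits)                    # peaked distribution
flat_target = softmax(clean_logits, temperature=4.0)   # assumed flattened target

# The gap an attacker would try to close by optimizing the trigger.
gap = kl_divergence(flat_target, clean_probs)
```

In this sketch the flattened target keeps the same top token as the clean distribution but spreads probability mass more evenly, which is the sense in which the attack goes after confidence rather than the answer itself.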

The paper reports that FDA consistently increases routing to strong models across datasets, model families, and deferral metrics. The authors also state that FDA outperforms the image-perturbation and prompt-injection baselines they compared against. Taken together, the results suggest that an attacker can push a cascade to spend more compute on the strong model by targeting the weak model's confidence alone, rather than the correctness of its answers.

How the cascade and attack interact

In the cascade setup described, a query first goes to a weak, cheaper MLLM. A deferral metric derived from the weak model's confidence determines whether the system accepts the weak model's output or routes the query to a stronger, more expensive model. Because that confidence score controls compute allocation, an adversary can target confidence rather than the content of an answer.
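The routing rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's setup: the deferral metric here is the mean of each token's top probability, and the threshold of 0.7 is arbitrary. The paper evaluates several deferral metrics that this brief does not enumerate.

```python
def mean_max_prob(token_distributions):
    """Assumed deferral metric: average of each token's highest probability."""
    return sum(max(d) for d in token_distributions) / len(token_distributions)

def route(token_distributions, threshold=0.7):
    """Accept the weak model's answer if it is confident enough, else defer."""
    confidence = mean_max_prob(token_distributions)
    return "weak" if confidence >= threshold else "strong"

# Hypothetical per-token distributions from the weak model.
peaked = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]       # confident output
flattened = [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25]]    # same top tokens, less peaked

decision_clean = route(peaked)         # stays on the weak model
decision_attacked = route(flattened)   # deferred to the strong model
```

Both inputs keep the same top token at every position, yet only the flattened one crosses the threshold. That is why a confidence-only attack is enough to change the routing decision.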

FDA implements that strategy via an image-based universal border trigger. The trigger is optimized with a temperature-flattened objective so that, when present, the weak model's token distribution becomes less peaked. The paper describes constructing less concentrated target distributions from the weak model's responses on clean inputs and optimizing toward those targets. The result is more frequent deferral decisions and increased strong-model usage.
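For readers unfamiliar with border triggers, the sketch below shows one plausible way such a trigger could be applied. It assumes the trigger overwrites a frame of fixed width around the image and leaves the interior untouched; the brief does not say whether the paper overwrites or pads. The images are tiny grayscale grids, and the function name and values are hypothetical. The optimization of the trigger itself is not shown.

```python
def apply_border_trigger(image, trigger, width):
    """Return a copy of `image` whose outer `width` pixels come from `trigger`.

    `image` and `trigger` are equally sized 2D lists of pixel values.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the clean image is not modified
    for r in range(rows):
        for c in range(cols):
            on_border = (
                r < width or r >= rows - width
                or c < width or c >= cols - width
            )
            if on_border:
                out[r][c] = trigger[r][c]
    return out

# "Universal" means one trigger is reused across every input image.
trigger = [[9] * 4 for _ in range(4)]    # hypothetical learned border pattern
image_a = [[0] * 4 for _ in range(4)]
image_b = [[5] * 4 for _ in range(4)]

triggered_a = apply_border_trigger(image_a, trigger, width=1)
triggered_b = apply_border_trigger(image_b, trigger, width=1)
```

Because the interior pixels are preserved in this sketch, the visual content the query depends on stays visible while the frame carries the adversarial signal.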

Why it matters

The paper demonstrates a class of attacks that manipulate compute allocation rather than correctness. That matters because it shows attackers can force higher-cost model usage without altering end-user outputs in obvious ways. For systems that cascade models to save compute, the result is unintended strong-model invocation triggered by an adversary-controlled input. The attack therefore links model security to operational cost and resource allocation in a direct way.
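The cost link is simple arithmetic. The sketch below assumes that every query pays for the weak model and that deferred queries additionally pay for the strong model. The cost units and deferral rates are hypothetical numbers chosen for illustration, not figures from the paper.

```python
def expected_cost(deferral_rate, weak_cost, strong_cost):
    """Expected per-query cost when the weak model always runs first."""
    return weak_cost + deferral_rate * strong_cost

# Hypothetical costs in arbitrary units, and hypothetical deferral rates.
WEAK_COST = 1.0
STRONG_COST = 10.0

baseline = expected_cost(0.2, WEAK_COST, STRONG_COST)      # normal traffic
under_attack = expected_cost(0.8, WEAK_COST, STRONG_COST)  # forced deferrals
cost_ratio = under_attack / baseline
```

With these made-up numbers, raising the deferral rate from 20% to 80% takes the expected cost from 3 units to 9 units per query, a threefold increase that comes entirely from routing.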

The authors' experiments across datasets, model families, and deferral metrics indicate the vulnerability is not confined to a single benchmark or metric. That breadth raises questions about how robust current deferral policies are against targeted confidence manipulation.

What to watch

Look for follow-up work reproducing FDA's effect on deployed cascades and for defensive research that hardens deferral metrics against confidence manipulation. The next concrete signal will be whether cascaded systems still show increased strong-model routing when exposed to universal border triggers optimized with temperature-flattened objectives.

Paper metadata: arXiv:2606.15308, submitted 13 Jun 2026, authors Zhongye Liu, Yaopei Zeng, Yurui Chang, Lu Lin.

[Figure: MLLM cascade and Forced Deferral Attack (FDA) flow. Nodes: input image (possible adversary trigger); adversary (adds universal border trigger); weak, cheaper MLLM; deferral metric (weak-model confidence); routing decision; strong, expensive MLLM; final output / compute allocation.]

Written by The Brieftide · Source: arXiv
