Multimodal AI · July 3, 2026 · 4 min read

Hidden Forgetting in MLLMs: RCL reduces evidence drift

A replay-free reliance-constrained continual learning (RCL) method preserves answers while cutting modality reliance drift and hidden forgetting.

The Brieftide · July 3, 2026

TL;DR

01 A replay-free reliance-constrained continual learning (RCL) method preserves answers while cutting modality reliance drift and hidden forgetting.
02 The paper introduces RCL, a replay-free reliance-constrained continual learning framework, and evaluates it across CoIN, COAST, MCITlib and an evidence-sensitive multimodal stream.
03 Hidden forgetting is a failure mode where final accuracy remains intact but the model silently shifts which evidence channels it uses to justify answers.

The paper "Hidden Forgetting in Continual Multimodal Learning: When Accuracy Survives but Grounding Fails", submitted 2 Jul 2026 by Qianyu Chen, Canran Xiao and Runxuan Tang, identifies a failure mode where models keep correct answers while changing the evidence they rely on. The paper introduces RCL, a replay-free reliance-constrained continual learning framework, and evaluates it across CoIN, COAST, MCITlib and an evidence-sensitive multimodal stream.

What is hidden forgetting?

Hidden forgetting is a failure mode where final accuracy remains intact but the model silently shifts which evidence channels it uses to justify answers. The paper names this phenomenon "hidden evidence-use forgetting" and contrasts it with standard continual learning metrics that only check whether old answers remain correct. In such cases the answer stays correct, but its grounding in visual, textual, OCR, chart or document evidence erodes or shifts to a different channel.

How does RCL work?

RCL freezes the previous checkpoint as a behavioral reference, estimates teacher and student evidence-reliance profiles using counterfactual channel interventions, and jointly optimizes task learning, prediction preservation and reliance preservation without adding inference-time cost. In practice RCL operates replay-free: it does not add replay buffers at training time and it preserves reliance by comparing a frozen teacher checkpoint to the current student via interventions that isolate evidence channels.
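A counterfactual channel intervention can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a hypothetical `score_fn` that returns the model's score for the correct answer, blanks one evidence channel at a time, and normalizes the resulting score drops into a reliance profile. The function names, the blanking intervention and the toy scoring model are all assumptions made for illustration.

```python
from typing import Callable, Dict

Inputs = Dict[str, str]  # channel name -> evidence content (hypothetical format)


def reliance_profile(
    score_fn: Callable[[Inputs], float],
    inputs: Inputs,
    blank: str = "",
) -> Dict[str, float]:
    """Estimate how much the answer score depends on each evidence channel.

    Each channel is blanked in turn (the counterfactual intervention), the
    drop in the answer score is recorded, and the non-negative drops are
    normalized so that they sum to 1.
    """
    base = score_fn(inputs)
    drops = {}
    for channel in inputs:
        ablated = dict(inputs)
        ablated[channel] = blank
        drops[channel] = max(0.0, base - score_fn(ablated))
    total = sum(drops.values())
    if total == 0.0:
        # No channel changes the score: fall back to a uniform profile.
        return {c: 1.0 / len(inputs) for c in inputs}
    return {c: d / total for c, d in drops.items()}


def toy_score(weights: Dict[str, float]) -> Callable[[Inputs], float]:
    """Toy stand-in for a model: sums the weights of non-empty channels."""
    return lambda x: sum(w for c, w in weights.items() if x.get(c))


example = {
    "image": "chart.png",
    "ocr": "Q3 revenue 4.1M",
    "text": "What was Q3 revenue?",
}
# Hypothetical frozen teacher leans on OCR; hypothetical student leans on the image.
teacher = reliance_profile(toy_score({"image": 0.2, "ocr": 0.6, "text": 0.2}), example)
student = reliance_profile(toy_score({"image": 0.6, "ocr": 0.2, "text": 0.2}), example)
```

In this toy setup both models would score the full input identically, yet the teacher's profile is dominated by the OCR channel and the student's by the image channel, which is the kind of shift a reliance comparison is meant to expose.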

The method has three explicit objectives: task learning, to acquire new skills; prediction preservation, to keep previously correct outputs; and reliance preservation, to keep the modalities and channels the model actually uses. The authors position these goals as necessary to prevent silent shifts in evidence use even when accuracy stays high.
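One way to picture the three objectives is as a weighted sum of three loss terms. The sketch below is an assumption, not the paper's formulation: it uses cross-entropy for task learning and KL divergence from the frozen teacher for both preservation terms, with hypothetical weights `lam_pred` and `lam_rel`. The source does not state which divergences or weights RCL actually uses.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the correct label."""
    return -math.log(max(probs[label], EPS))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0.0
    )


def rcl_style_loss(
    student_task_probs: Sequence[float],
    label: int,
    teacher_probs: Sequence[float],
    student_ref_probs: Sequence[float],
    teacher_reliance: Sequence[float],
    student_reliance: Sequence[float],
    lam_pred: float = 1.0,  # hypothetical weight
    lam_rel: float = 1.0,   # hypothetical weight
) -> float:
    """Illustrative joint objective: task + prediction + reliance terms."""
    task = cross_entropy(student_task_probs, label)        # learn the new task
    pred = kl_divergence(teacher_probs, student_ref_probs)  # keep old predictions
    rel = kl_divergence(teacher_reliance, student_reliance)  # keep evidence use
    return task + lam_pred * pred + lam_rel * rel


teacher_probs = [0.8, 0.1, 0.1]
teacher_rel = [0.2, 0.6, 0.2]  # image, ocr, text (toy values)

# Student matches the teacher's predictions and reliance profile.
aligned = rcl_style_loss(
    [0.7, 0.2, 0.1], 0, teacher_probs, teacher_probs, teacher_rel, teacher_rel
)
# Same predictions, but the student's reliance has moved from OCR to image.
drifted = rcl_style_loss(
    [0.7, 0.2, 0.1], 0, teacher_probs, teacher_probs, teacher_rel, [0.6, 0.2, 0.2]
)
```

Here `drifted` exceeds `aligned` even though the predictions are identical, so the reliance term alone penalizes the shift. All three terms are computed at training time against the frozen teacher, which fits the claim that the method adds no inference-time cost.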

How was RCL evaluated and what changed?

RCL was tested across CoIN, COAST, MCITlib and an evidence-sensitive multimodal stream. Across those datasets the paper reports that RCL consistently improves final performance and reduces forgetting when compared with replay-free baselines, PEFT, routing approaches and memory-assisted baselines. The authors further report that RCL substantially lowers modality reliance drift, dominant evidence flips and hidden forgetting rates. The evaluations emphasize evidence-path preservation rather than only answer retention.
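The three evidence-path quantities named above can be made concrete with a small sketch. The definitions used here are assumptions for illustration and may differ from the paper's: drift is taken as the total variation distance between reliance profiles before and after an update, a dominant flip as a change in the highest-reliance channel, and hidden forgetting as an answer that stays correct while its dominant channel flips.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """One evaluated example, before and after a continual-learning update."""
    correct_before: bool
    correct_after: bool
    reliance_before: Dict[str, float]
    reliance_after: Dict[str, float]


def dominant(profile: Dict[str, float]) -> str:
    """Channel with the largest reliance weight."""
    return max(profile, key=profile.get)


def drift(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Total variation distance between two reliance profiles."""
    channels = set(before) | set(after)
    return 0.5 * sum(abs(before.get(c, 0.0) - after.get(c, 0.0)) for c in channels)


def summarize(records: List[Record]) -> Dict[str, float]:
    """Aggregate drift, flip rate and hidden forgetting rate (assumed definitions)."""
    n = len(records)
    if n == 0:
        return {"mean_drift": 0.0, "flip_rate": 0.0, "hidden_forgetting_rate": 0.0}
    flips = [dominant(r.reliance_before) != dominant(r.reliance_after) for r in records]
    hidden = [
        r.correct_before and r.correct_after and flipped
        for r, flipped in zip(records, flips)
    ]
    return {
        "mean_drift": sum(drift(r.reliance_before, r.reliance_after) for r in records) / n,
        "flip_rate": sum(flips) / n,
        "hidden_forgetting_rate": sum(hidden) / n,
    }


ocr_led = {"image": 0.25, "ocr": 0.75}
image_led = {"image": 0.75, "ocr": 0.25}
records = [
    Record(True, True, ocr_led, image_led),   # still correct, evidence flipped
    Record(True, True, ocr_led, ocr_led),     # still correct, evidence stable
    Record(True, False, ocr_led, image_led),  # visible forgetting: answer lost
    Record(True, True, image_led, image_led), # still correct, evidence stable
]
summary = summarize(records)
```

In the toy records, three of four answers remain correct, so an accuracy-only metric would report modest forgetting, while the summary separately counts the one correct answer whose dominant channel changed.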

Why it matters

Continual adaptation of multimodal large language models can hide degradations in how models ground answers, creating models that answer correctly but for the wrong reasons. That undermines trust in model explanations and weakens safe use in tasks that require verifiable visual or document evidence. The paper argues that preserving the evidence path behind correct answers is a different technical objective from preserving accuracy, and it presents a concrete training recipe that targets that objective without adding inference-time overhead.

What to watch

Watch for RCL being applied beyond the evaluated benchmarks and for public code and datasets tied to this work. The submission appears on arXiv as arXiv:2607.02020 (v1), and the paper lists code, data and media toggles alongside the preprint, which suggests follow-up artifacts may appear for others to reproduce the reliance-profiling interventions and comparisons against PEFT, routing and memory-assisted baselines.

Written by The Brieftide · Source: arXiv

