MIT method boosts LLM uncertainty estimates without accuracy loss

A training technique teaches models to say "I’m not sure," improving calibration and cutting hallucinations while preserving accuracy.

The Brieftide

TL;DR

  • A training technique teaches models to say "I’m not sure," improving calibration and cutting hallucinations while preserving accuracy.
  • MIT researchers have developed a training method that makes large language models say "I’m not sure" when appropriate, improving their uncertainty estimates without reducing task accuracy.
  • The team published the results on April 22, 2026, demonstrating improved calibration and fewer hallucinations across multiple reasoning benchmarks.

MIT researchers have developed a training method that makes large language models say "I’m not sure" when appropriate, improving their uncertainty estimates without reducing task accuracy. The team published the results on April 22, 2026, demonstrating improved calibration and fewer hallucinations across multiple reasoning benchmarks.

The technique adds an abstention-aware objective to model training, so models explicitly learn when to decline to answer rather than give a confident but incorrect reply. The researchers evaluated the approach on standard reasoning tasks and compared it with baseline models of similar size under the same inference settings, finding substantially better-calibrated confidence with little or no drop in measured task performance.

How the training works

The core idea is to augment the model's output space with an abstention option and train with a loss that rewards correct answers and correct abstentions. During training, examples are labeled not only for correctness but also for whether the model should attempt to answer given the available context. The loss penalizes confidently incorrect predictions and encourages the model to assign low confidence when an answer cannot be reliably produced from the input.
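To make the idea concrete, here is a minimal sketch of one way such an objective could look. It is not the MIT team's published loss. It assumes a multiple-choice setting in which the last logit is a hypothetical abstain slot, and the names `abstention_aware_loss`, `gold_index`, and `answerable` are illustrative.

```python
import math

def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))) over a list of logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def abstention_aware_loss(logits, gold_index, answerable):
    """Cross-entropy over K answer slots plus one abstain slot (the last).

    Assumption: each training example carries an `answerable` flag.
    If the example is answerable, the target is the gold answer;
    otherwise the target is the abstain slot.
    """
    abstain_index = len(logits) - 1
    target = gold_index if answerable else abstain_index
    # Negative log-probability of the target under a softmax.
    return log_sum_exp(logits) - logits[target]

# Three answer options plus an abstain slot; the gold answer is index 1.
confident_wrong = abstention_aware_loss([4.0, 0.0, 0.0, 0.0], 1, True)
hedged = abstention_aware_loss([0.0, 0.0, 0.0, 2.0], 1, True)
correct_abstain = abstention_aware_loss([0.0, 0.0, 0.0, 4.0], 1, False)
```

Under these assumptions the confidently wrong prediction costs roughly 4.05, the hedged one roughly 2.34, and the correct abstention roughly 0.05, so the loss ranks the three cases the way the paragraph above describes.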

Implementation uses the same model architectures and decoding pipelines as the baselines. The abstention signal is implemented as an additional token or output head indicating uncertainty, and calibration is measured on held-out validation sets. The training regime includes a modest curriculum that gradually introduces ambiguous or out-of-distribution examples so the model learns to withhold answers under uncertainty rather than only when an explicit abstention label is present.
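The article does not give the curriculum schedule, so the following is only a sketch of one plausible form: a linear ramp in the share of ambiguous examples per batch. The function names, the 30% ceiling, and the warm-up length are all assumptions chosen for illustration.

```python
import random

def ambiguous_fraction(step, warmup_steps, max_fraction=0.3):
    """Share of each batch drawn from ambiguous or out-of-distribution data.

    Assumed schedule: ramps linearly from 0 to max_fraction over
    warmup_steps, then stays flat.
    """
    if warmup_steps <= 0:
        return max_fraction
    return max_fraction * min(1.0, step / warmup_steps)

def sample_batch(clear, ambiguous, step, warmup_steps, batch_size, rng):
    """Mix clear and ambiguous examples according to the current schedule."""
    n_ambiguous = round(batch_size * ambiguous_fraction(step, warmup_steps))
    n_clear = batch_size - n_ambiguous
    # Sampling with replacement keeps the sketch short.
    batch = rng.choices(ambiguous, k=n_ambiguous) + rng.choices(clear, k=n_clear)
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
clear = [("clear", i) for i in range(100)]
ambiguous = [("ambiguous", i) for i in range(100)]
early = sample_batch(clear, ambiguous, step=0, warmup_steps=1000, batch_size=10, rng=rng)
late = sample_batch(clear, ambiguous, step=1000, warmup_steps=1000, batch_size=10, rng=rng)
```

With this schedule the first batch contains no ambiguous examples and a batch after warm-up contains three out of ten.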

Benchmarks and results

Evaluations reported by the team show marked improvements in calibration metrics such as expected calibration error (ECE) and in practical measures like hallucination rate on reasoning-style prompts. In the selected metrics, ECE fell from 0.12 to 0.05 and the hallucination rate from 28% to 12%. Task accuracy across arithmetic and logic problems remained effectively unchanged (87.0% for the baseline versus 86.8%), indicating the model does not sacrifice correctness to gain better uncertainty estimates.
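For readers unfamiliar with the metric, expected calibration error groups predictions into confidence bins and averages the gap between stated confidence and actual accuracy, weighted by bin size. The sketch below uses the common equal-width binning variant. The bin count and the left-closed bin edges are assumptions, and the article does not say which variant the team used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1].

    confidences: predicted probability that each answer is right.
    correct:     matching booleans, True where the answer was right.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Overconfident: claims 90% confidence but is right one time in four.
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])
# Calibrated: claims 50% confidence and is right half the time.
calibrated = expected_calibration_error([0.5, 0.5], [True, False])
```

A lower value means stated confidence tracks accuracy more closely; the overconfident toy case scores 0.65 and the calibrated one scores 0.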

In head-to-head comparisons, the abstention-trained models produced fewer confidently wrong answers and more appropriate "I’m not sure" responses on questions designed to elicit hallucinations. The training method also reduced downstream error propagation in multi-step reasoning tests, because the model stopped or requested clarification when intermediate steps were ambiguous.
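The multi-step behavior can be illustrated with a small sketch of a reasoning chain that halts at the first abstention instead of passing a guess downstream. The `ABSTAIN` sentinel, the `run_chain` helper, and the result dictionary are hypothetical and do not come from the researchers' code.

```python
ABSTAIN = "I'm not sure"

def run_chain(steps, initial_input):
    """Run steps in order and stop at the first one that abstains."""
    value = initial_input
    for index, step in enumerate(steps):
        value = step(value)
        if value == ABSTAIN:
            # Surface the abstention so a caller can ask for clarification.
            return {"status": "needs_clarification", "failed_step": index, "result": None}
    return {"status": "ok", "failed_step": None, "result": value}

def double(x):
    return x * 2

def unsure_if_large(x):
    # Toy stand-in for a model step that abstains on inputs it cannot handle.
    return ABSTAIN if x > 10 else x + 1

finished = run_chain([double, unsure_if_large], 3)
halted = run_chain([double, unsure_if_large, double], 8)
```

In the second call the chain stops at step 1 and the final `double` never runs, which is the point: an explicit abstention is easy for surrounding code to detect and handle.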

The researchers tested the method on several model sizes and found the gains in calibration were consistent, though the absolute number of abstentions varied with model capacity and prompt design. The team notes that prompt engineering and downstream application constraints will influence optimal abstention thresholds.
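How an application might pick its threshold can be sketched as a sweep over held-out data: answer only when confidence clears the threshold, then choose the lowest threshold that keeps wrong answers within a budget. The helper names and the toy validation data are assumptions for illustration, not the team's procedure.

```python
def sweep_thresholds(confidences, correct, thresholds):
    """For each threshold, return (threshold, coverage, wrong_rate).

    coverage:   share of questions the model answers.
    wrong_rate: share of all questions answered incorrectly.
    """
    n = len(confidences)
    rows = []
    for t in thresholds:
        answered = [ok for c, ok in zip(confidences, correct) if c >= t]
        coverage = len(answered) / n
        wrong_rate = sum(1 for ok in answered if not ok) / n
        rows.append((t, coverage, wrong_rate))
    return rows

def pick_threshold(rows, max_wrong_rate):
    """Lowest threshold whose wrong-answer rate fits the budget, else None."""
    for t, _coverage, wrong_rate in sorted(rows):
        if wrong_rate <= max_wrong_rate:
            return t
    return None

# Toy validation set: five questions with model confidence and correctness.
confidences = [0.95, 0.9, 0.6, 0.55, 0.3]
correct = [True, True, False, True, False]
rows = sweep_thresholds(confidences, correct, [0.0, 0.5, 0.7])
lenient = pick_threshold(rows, max_wrong_rate=0.2)
strict = pick_threshold(rows, max_wrong_rate=0.1)
```

On this toy data a 20% error budget selects a threshold of 0.5 and a 10% budget selects 0.7, which shows the trade-off: a stricter budget means more abstentions and lower coverage.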

Why it matters

Improved confidence estimates address a fundamental source of hallucination in reasoning contexts by giving models a principled option to decline when information is insufficient. Systems that must avoid silently making up facts, such as tutoring tools, code assistants, and medical summarization aids, stand to gain immediate safety and usability benefits. The approach also enables more reliable chained reasoning and simpler human oversight because abstentions are easier to detect and handle than subtle misstatements.

Baseline vs. abstention-trained model (selected metrics)

Metric                              Baseline    Abstention-trained
Expected calibration error (ECE)    0.12        0.05
Task accuracy                       87.0%       86.8%
Hallucination rate                  28%         12%
Abstention rate                     1%          9%

Written by The Brieftide · Source: MIT News · AI
