Good Explanations and LLMs: prior beliefs shape explainability
Louis Mahon, Elliot Ford and Callum Hackett propose a definition that factors in interlocutors' prior beliefs and show why LLM outputs resist explanation.
TL;DR
- Louis Mahon, Elliot Ford and Callum Hackett propose a definition that factors in interlocutors' prior beliefs and show why LLM outputs resist explanation.
- The paper, arXiv:2606.14838 in cs.AI, argues that explainability must account for an interlocutor's prior beliefs and explores why that requirement makes LLM outputs difficult to explain.
- The authors offer a definition of good explanations inspired by the notion of counterfactual explanations.
Louis Mahon, Elliot Ford and Callum Hackett submitted a paper to arXiv on 12 Jun 2026 that proposes a formal definition of what makes an explanation "good" and applies it to AI outputs, particularly large language models. The paper, arXiv:2606.14838 in cs.AI, argues that explainability must account for an interlocutor's prior beliefs and explores why that requirement makes LLM outputs difficult to explain.
What the paper says
The authors offer a definition of good explanations inspired by the notion of counterfactual explanations. They extend that idea by insisting explanations should not be judged solely by their structural or causal form, but also by how they interact with the recipient's existing beliefs. Concretely, the paper argues one must "take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation." The authors then explore the ramifications of this definition for AI explainability and focus attention on the challenges posed by large language model outputs.
The submission is listed under Computer Science > Artificial Intelligence (cs.AI) and is available through arXiv as arXiv:2606.14838. The archived record links to a PDF and gives a DOI via DataCite (pending registration).
Definition and the role of prior beliefs
The core move in the paper is to combine counterfactual-style explanations with an epistemic constraint: explanations should change an interlocutor's beliefs in a targeted way. Rather than treating an explanation as a standalone artifact, the authors frame it as a relation between candidate explanatory facts and the recipient's prior credences in those facts. That reframing shifts the task away from listing causes or model internals toward selecting facts that will actually revise an interlocutor's state of belief.
This emphasis on prior beliefs affects both what counts as explanatory content and how that content must be presented. The paper argues that a fact that is informative for one interlocutor may be redundant or misleading for another, depending on their prior beliefs. The implication is that no explanation is "good" for every audience: a good explanation has to be tailored to the recipient's epistemic state.
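As a rough illustration of this audience-dependence (not the paper's own formalism, which this brief does not reproduce), the sketch below ranks candidate facts by how surprising they would be to a given recipient. The names `CandidateFact`, `select_facts` and `max_credence`, the use of surprisal as a stand-in for belief revision, and the example priors are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFact:
    statement: str
    # Hypothetical estimate of the recipient's prior probability that the
    # fact is true. Assumed to lie in (0, 1]; 0 would make log2 fail.
    prior_credence: float


def surprisal_bits(prior_credence: float) -> float:
    """Bits of information a true fact carries for someone holding this prior."""
    return -math.log2(prior_credence)


def select_facts(candidates, max_credence=0.9):
    """Drop facts the recipient already firmly believes; rank the rest,
    most surprising first. The 0.9 cutoff is an arbitrary illustrative choice."""
    informative = [f for f in candidates if f.prior_credence < max_credence]
    return sorted(
        informative,
        key=lambda f: surprisal_bits(f.prior_credence),
        reverse=True,
    )


# The same two facts, offered to two recipients with different assumed priors.
PARAPHRASE = "the answer paraphrases the user's prompt"
NO_LIVE_DATA = "the model has no access to live data"

novice = [CandidateFact(PARAPHRASE, 0.5), CandidateFact(NO_LIVE_DATA, 0.2)]
expert = [CandidateFact(PARAPHRASE, 0.5), CandidateFact(NO_LIVE_DATA, 0.99)]

novice_selection = [f.statement for f in select_facts(novice)]
expert_selection = [f.statement for f in select_facts(expert)]
```

Under these assumed priors, the novice is offered both facts with the less expected one first, while the expert is offered only the fact they do not already take for granted. A fuller treatment would also need to model how each fact shifts belief in the output being explained, which this sketch leaves out.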
Why LLM outputs are difficult to explain
Mahon, Ford and Hackett apply their definition to outputs from large language models and identify specific frictions. LLM outputs commonly arise from complex, distributed patterns in training data and model weights; the paper suggests that those origins are problematic because supplying internal model facts will not reliably shift an interlocutor's beliefs in the intended way. In short, the authors explore why the usual technical descriptions or provenance traces attached to LLM responses may fail to serve as good explanations under their prior-belief-sensitive definition.
The paper does not stop at diagnosis. It examines implications for explainability practice by showing how the prior-belief requirement complicates attempts to produce explanations that are both faithful to the system and effective for diverse users. The authors use this analysis to argue that explainability efforts need to incorporate models of the recipient's knowledge and expectations when selecting explanatory facts.
Why it matters
Tying explanations to interlocutors' prior beliefs reframes explainability from a purely system-centric task into an interactional one. That matters because many current explainability proposals assume that exposing internal mechanisms or data provenance alone suffices. If Mahon, Ford and Hackett are correct, those exposures will often miss the mark: they may be accurate but uninformative for the person who needs to understand the output. The shift forces researchers and toolmakers to consider audience modeling and communicative effectiveness as part of technical explainability work.
What to watch
Look for follow-up work testing this definition empirically: studies that measure how different explanation facts change recipients' beliefs for LLM outputs would validate or refute the paper's claims. Also watch whether explainability toolkits begin to include interlocutor models or belief-estimation components as part of their pipelines.
References
Title: A Definition of Good Explanations and the Challenges Explaining LLM Outputs
Authors: Louis Mahon, Elliot Ford, Callum Hackett
arXiv:2606.14838, submitted 12 Jun 2026 (cs.AI)
Written by The Brieftide · Source: arXiv