ProvenanceGuard paper: source-aware verifier, block F1 0.802
ProvenanceGuard verifies claim-to-source attribution for MCP-based LLM agents.
TL;DR
- ProvenanceGuard verifies claim-to-source attribution for MCP-based LLM agents.
- ProvenanceGuard, introduced on arXiv on 16 Jun 2026 by Ander Alvarez, Santhiya Rajan, Samuel Mugel and Román Orús, is a source-aware factuality verifier for agents that use the Model Context Protocol.
- The system consumes MCP traces with stable tool IDs and source IDs, decomposes answers into atomic claims, and returns per-claim verdicts plus an answer-level allow or block decision.
ProvenanceGuard, introduced on arXiv on 16 Jun 2026 by Ander Alvarez, Santhiya Rajan, Samuel Mugel and Román Orús, is a source-aware factuality verifier for agents that use the Model Context Protocol. The system consumes MCP traces with stable tool IDs and source IDs, decomposes answers into atomic claims, and returns per-claim verdicts plus an answer-level allow or block decision.
How does ProvenanceGuard verify provenance?
ProvenanceGuard works from captured MCP traces that contain stable tool IDs, source IDs, and raw tool outputs. It explicitly targets cross-source conflation, where a claim is supported somewhere in the evidence but attributed to the wrong source.
The pipeline runs in five steps: it decomposes the answer into atomic claims, routes each claim to source-specific evidence, checks support with natural language inference (NLI) plus a token-alignment proxy, compares the routed source with the answer's stated attribution, and emits per-claim verdicts along with an answer-level allow or block decision. A blocked answer can be repaired with retrieval-augmented answer revision and then verified again.
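The flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class names, the `support_score` function, the 0.8 threshold, the "block if any claim is blocked" rule, and the sample trace are all hypothetical, and a plain token-overlap score stands in for the NLI model and the token-alignment proxy. Claim decomposition is assumed to have already happened.

```python
import re
from dataclasses import dataclass


@dataclass
class SourceRecord:
    tool_id: str     # stable tool ID from the MCP trace
    source_id: str   # stable source ID from the MCP trace
    raw_output: str  # raw tool output captured in the trace


@dataclass
class Claim:
    text: str              # one atomic claim from the answer
    stated_source_id: str  # source the answer attributes the claim to


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim_text, evidence_text):
    """Assumed stand-in for NLI + token alignment:
    fraction of claim tokens that also appear in the evidence."""
    claim_tokens = tokens(claim_text)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(evidence_text)) / len(claim_tokens)


def verify_claim(claim, sources, threshold=0.8):
    """Route the claim to its best-supporting source, then compare that
    source with the attribution stated in the answer."""
    result = {"claim": claim.text, "routed_source_id": None}
    if not sources:
        return {**result, "verdict": "block", "reason": "unsupported"}
    best = max(sources, key=lambda s: support_score(claim.text, s.raw_output))
    if support_score(claim.text, best.raw_output) < threshold:
        return {**result, "verdict": "block", "reason": "unsupported"}
    result["routed_source_id"] = best.source_id
    if best.source_id != claim.stated_source_id:
        # Supported somewhere, but attributed to the wrong source.
        return {**result, "verdict": "block", "reason": "wrong_source"}
    return {**result, "verdict": "allow", "reason": "supported"}


def verify_answer(claims, sources):
    """Assumed answer-level rule: block if any claim is blocked."""
    verdicts = [verify_claim(c, sources) for c in claims]
    blocked = any(v["verdict"] == "block" for v in verdicts)
    return ("block" if blocked else "allow"), verdicts


# Hypothetical two-source trace; the first claim cites the wrong source.
sources = [
    SourceRecord("lab_tool", "src-lab-1", "Serum potassium was 5.9 mmol/L on admission."),
    SourceRecord("notes_tool", "src-note-7", "Patient reports taking lisinopril 10 mg daily."),
]
claims = [
    Claim("Serum potassium was 5.9 mmol/L", "src-note-7"),
    Claim("Patient reports taking lisinopril 10 mg daily", "src-note-7"),
]
decision, verdicts = verify_answer(claims, sources)
print(decision)
for v in verdicts:
    print(v["verdict"], v["reason"], v["routed_source_id"])
```

In this toy trace the first claim is fully supported by the lab record but attributed to the clinical note, so it is blocked as `wrong_source` and the answer as a whole is blocked. A check against pooled evidence would have accepted the same claim, which is the gap that source-aware verification is meant to close.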
How did it perform on medical MCP traces?
On a 40-trace held-out split, ProvenanceGuard achieved block F1 0.802 and source accuracy 0.858 over 260 source-eligible claims, and on a harder multi-source benchmark it reached block F1 0.846 while source-plus-relation accuracy dropped to 0.229. The authors evaluated the system on 281 medical-domain MCP-agent traces; a 266-trace adjudicated subset produced 2,325 LLM-assisted claim labels split by trace, and 361 held-out labels were human-verified.
The paper highlights two performance patterns. First, ProvenanceGuard outperforms source-blind baselines that do not emit claim-to-source IDs on the held-out split, producing high block detection and source accuracy. Second, exact source ownership remains difficult when sources are semantically close: the multi-source benchmark shows strong block detection (F1 0.846) but weak source-plus-relation accuracy (0.229). Repair-and-reverify resolved all blocked answers in the full trace set, commonly by falling back to conservative revisions. In 50 controlled clinical conflation probes ProvenanceGuard detected all injected attribution swaps with no retained wrong attribution.
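For readers unfamiliar with the two headline metrics, the sketch below shows one common way to compute them. It assumes "block" is treated as the positive class for F1 and that source accuracy is the share of source-eligible claims whose routed source ID matches the gold source ID; the paper's exact definitions may differ. The function names and the toy labels are hypothetical and are not taken from the paper's data.

```python
def block_f1(gold_block, pred_block):
    """F1 with 'block' as the positive class.
    Inputs are parallel lists of booleans (True means block)."""
    pairs = list(zip(gold_block, pred_block))
    tp = sum(g and p for g, p in pairs)        # correctly blocked
    fp = sum((not g) and p for g, p in pairs)  # blocked but should be allowed
    fn = sum(g and (not p) for g, p in pairs)  # allowed but should be blocked
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def source_accuracy(gold_source_ids, pred_source_ids):
    """Share of claims whose predicted source ID equals the gold source ID."""
    if not gold_source_ids:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_source_ids, pred_source_ids))
    return correct / len(gold_source_ids)


# Toy labels for illustration only.
f1 = block_f1(
    [True, True, False, False, True],
    [True, False, False, True, True],
)
acc = source_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])
print(round(f1, 3), acc)
```

Under these definitions the two metrics can diverge: a verifier can block the right answers while routing claims to the wrong source, which matches the gap the multi-source benchmark shows between block F1 and source-plus-relation accuracy.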
Why it matters
Cross-source conflation is a distinct factuality failure mode for tool-using LLM agents: a claim can be supported somewhere in the evidence pool while being attributed to the wrong tool or record. ProvenanceGuard demonstrates that verifying source attribution is an independent axis of factuality verification, not covered by pooled-evidence checks alone. In medical settings the difference matters because claims linked to the wrong source could mislead downstream decisions; the paper's medical-domain evaluation and the 50 controlled clinical probes show that provenance checks can catch attribution swaps that pooled verification would miss.
What to watch
Watch whether future work narrows the gap on source-plus-relation accuracy, which fell to 0.229 on the multi-source benchmark, and whether repair-and-reverify strategies generalize beyond the evaluated medical traces. Also track extensions that test ProvenanceGuard-style verification on other MCP tool mixes and on larger, more diverse trace collections.
Authors and submission details: the paper "ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents" was submitted to arXiv on 16 Jun 2026 by Ander Alvarez, Santhiya Rajan, Samuel Mugel and Román Orús. The evaluation numbers cited above come from the paper's reported experiments on 281 traces and the specified held-out splits and benchmarks.
| Metric | Evaluation setting | Value |
|---|---|---|
| Block F1 | 40-trace held-out split | 0.802 |
| Source accuracy | 40-trace held-out split (260 source-eligible claims) | 0.858 |
| Block F1 | Multi-source benchmark | 0.846 |
| Source-plus-relation accuracy | Multi-source benchmark | 0.229 |
| Evaluated MCP traces | Full evaluation set | 281 |
| LLM-assisted claim labels | 266-trace adjudicated subset | 2,325 |
| Human-verified labels | Held-out labels | 361 |
| Clinical conflation probes detected | Controlled probes | 50/50 |
Written by The Brieftide · Source: arXiv