Multimodal AI · June 25, 2026 · 4 min read

Generative Causal Testing: Microsoft Research explains cortex

GCT distills LLM-based brain-prediction models into short explanations and verifies them by generating stories, tested with fMRI, that drive targeted brain regions.

The Brieftide · June 25, 2026

TL;DR

01 GCT distills LLM-based brain-prediction models into short explanations and verifies them by generating stories, tested with fMRI, that drive targeted brain regions.
02 The approach, described in a paper accepted in Nature Neuroscience, confirmed targeted cortical responses in experiments where three subjects returned to the scanner.
03 GCT works in two concrete stages: explanation, then verification.

Microsoft Research and collaborators introduced generative causal testing, or GCT, a two-step method that turns LLM-based brain-prediction models into short verbal explanations and then verifies them with fMRI. The approach, described in a paper accepted in Nature Neuroscience, confirmed targeted cortical responses in experiments where three subjects returned to the scanner.

How does generative causal testing (GCT) work?

GCT works in two concrete stages: explanation, then verification. First, researchers start from a predictive model for a single voxel or region and extract the short phrases that most strongly drive its predicted response; an LLM then summarizes those driving phrases into a concise verbal explanation, often a single phrase such as "food preparation" or "location names." Second, the same or another LLM writes new stories whose paragraphs are constructed to match that explanation; subjects hear those synthetic stories in the scanner, and the region's response to the "driving" paragraphs is compared against baseline.
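The explanation stage can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `toy_voxel_model`, `top_driving_phrases`, `build_summary_prompt`, the candidate phrases, and the prompt wording are all hypothetical stand-ins, and the real predictive model is an LLM-based encoder fit to fMRI data rather than a word counter.

```python
from typing import Callable, List, Tuple


def top_driving_phrases(
    candidates: List[str],
    predict_response: Callable[[str], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k candidate phrases with the highest predicted response."""
    scored = [(phrase, predict_response(phrase)) for phrase in candidates]
    # Sort by predicted response, strongest first (ties keep input order).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_summary_prompt(drivers: List[Tuple[str, float]]) -> str:
    """Assemble a prompt asking an LLM to name what the phrases share.

    The wording is an assumption; no LLM is called in this sketch.
    """
    listed = "\n".join(f"- {phrase}" for phrase, _ in drivers)
    return (
        "These phrases most strongly drive one brain region's predicted "
        "response. Summarize what they have in common in a few words:\n"
        f"{listed}"
    )


# Hypothetical stand-in for a fitted voxel model: fraction of food words.
FOOD_WORDS = {"chop", "simmer", "bake", "stir", "recipe"}


def toy_voxel_model(phrase: str) -> float:
    words = phrase.lower().split()
    return sum(word in FOOD_WORDS for word in words) / max(len(words), 1)


candidates = [
    "chop the onions",
    "stir and simmer",
    "drove to Tokyo",
    "bake the bread",
    "he said quietly",
]
drivers = top_driving_phrases(candidates, toy_voxel_model, k=3)
prompt = build_summary_prompt(drivers)
print(prompt)
```

In this toy setup the three food-related phrases rank highest, so an LLM given the prompt would plausibly answer with something like "food preparation", which then becomes the hypothesis to test in the scanner.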

The method closes the loop between model and experiment. If a target region's response to its driving paragraphs is significantly greater than baseline, the explanation passes a causal test rather than remaining a correlational claim. The paper frames this as translating uninterpretable predictive models back into concise, testable scientific hypotheses.
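The comparison against baseline can be illustrated with a one-sided permutation test. The brief does not say which statistical procedure the paper uses, so treat this as a generic sketch under that assumption; `permutation_p_value` and the response values are made up for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    driving: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int = 5000,
    seed: int = 0,
) -> float:
    """One-sided p-value for mean(driving) > mean(baseline)."""
    rng = random.Random(seed)
    observed = mean(driving) - mean(baseline)
    pooled = list(driving) + list(baseline)
    n_driving = len(driving)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffle the labels: any split is equally likely under the null.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_driving]) - mean(pooled[n_driving:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Illustrative (invented) responses of one region, one value per paragraph.
driving_responses = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
baseline_responses = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1]

p = permutation_p_value(driving_responses, baseline_responses)
print(f"p = {p:.4f}")
```

A small p-value here means the region responded more to its driving paragraphs than label shuffling would explain, which is the sense in which an explanation "passes" the test.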

What did GCT find in the brain?

GCT confirmed known selectivity, separated neighboring place-processing regions, and revealed tiny prefrontal micro-regions tuned to highly specific concepts. In mapping results, the explanation "Locations" produced strong responses in the place areas retrosplenial cortex (RSC), occipital place area (OPA), and parahippocampal place area (PPA). The explanation "food preparation" activated a region in ventral occipital cortex near the fusiform face area (FFA).

GCT also disentangled three neighboring place regions previously treated as similar. Differential stimuli crafted by the method showed that RSC responds more strongly to proper-noun location names such as Tokyo or Connecticut than to general location language, a nuance the raw predictive model alone could not provide. Scanning a grid of candidate prefrontal sites and keeping the most stable locations, the team surfaced micro-regions selective for dialogue (words like "said" or "told"), clock times (examples like "one o'clock"), and numeric measurements (phrases like "50 feet"). Across all three subjects, the synthetic stories reliably drove their target regions above baseline, and explanations were most trustworthy where the underlying brain-prediction models were strongest.

Why it matters

GCT offers a practical path from black-box prediction to readable, testable theory. Predictive LLMs have been the best tools for forecasting how language evokes brain activity, but their learned parameters are not themselves scientific explanations. GCT converts model features into concise hypotheses and immediately evaluates them with generated experiments, reducing the gap between accurate prediction and human-interpretable explanation. For neuroscience this promises a faster, hypothesis-rich mapping of cortex. The authors also emphasize the broader point that generate-and-verify workflows can extend to other fields where predictive models outrun understanding.

What to watch

Look for the peer-reviewed paper in Nature Neuroscience and the project's code on GitHub for replication and extension. The next concrete signals will be independent labs using LLM-generated stimuli to replicate the RSC place-name selectivity and the reported prefrontal micro-regions, or applying GCT to other sensory or cognitive systems.

Two-step GCT workflow

01 Explanation
Extract the short phrases that most strongly drive a predictive model for a voxel or region; an LLM summarizes those phrases into a concise verbal explanation (for example "food preparation").
02 Verification
Use an LLM to generate new stories whose paragraphs are designed to activate the explained concept; have subjects hear them in the scanner and compare the region's response to baseline.

Written by The Brieftide · Source: Microsoft Research
