Multimodal AI · March 13, 2026 · 4 min read · via Berkeley AI Research

SPEX: Berkeley's method identifies LLM interactions at scale

Berkeley AI Research released SPEX, a toolkit and benchmark that isolates pairwise and higher-order interactions inside large language models.

The Brieftide

TL;DR

01 Berkeley AI Research released SPEX, a toolkit and benchmark that isolates pairwise and higher-order interactions inside large language models.
02 The release, dated March 13, 2026, includes an evaluation suite designed to identify and quantify those interactions.
03 SPEX frames interaction discovery as a staged pipeline.

Berkeley AI Research released SPEX on March 13, 2026, a toolkit and evaluation suite designed to identify and quantify interactions inside large language models. The project includes algorithms for enumerating candidate interactions, a scoring pipeline that combines targeted perturbations and group ablations, and an open benchmark with baseline results and tooling for visualization.

How SPEX works

SPEX frames interaction discovery as a staged pipeline. First, candidate interaction sets are generated from model inputs, intermediate representations, or neurons using heuristics and statistical screening. Next, the pipeline runs targeted perturbations and grouped ablations on those candidates while measuring downstream effects on model outputs. Statistical tests and scoring rules rank interactions by effect size and significance. The designers emphasize scalability: SPEX prunes the search space with screening heuristics and uses sampling to estimate scores for large candidate sets.
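The staged flow described above can be illustrated with a small sketch. This is not SPEX's code: `toy_model`, `ablate`, `screen`, and `rank_pairs` are hypothetical names, the screening rule (keep the features with the largest single-ablation effect) is an assumed heuristic, and the score is a plain second-order finite difference over grouped ablations.

```python
from itertools import combinations

def toy_model(x):
    """Hypothetical stand-in for a model output: features 0 and 1 interact."""
    return 2.0 * x[0] * x[1] + 1.0 * x[2] + 0.5 * x[3]

def ablate(x, idxs, baseline=0.0):
    """Return a copy of x with the given positions replaced by a baseline."""
    return [baseline if i in idxs else v for i, v in enumerate(x)]

def single_effect(f, x, i):
    """Output change when feature i alone is ablated."""
    return f(x) - f(ablate(x, {i}))

def pairwise_interaction(f, x, i, j):
    """Second-order difference: the joint effect not explained by either feature alone."""
    return f(x) - f(ablate(x, {i})) - f(ablate(x, {j})) + f(ablate(x, {i, j}))

def screen(f, x, keep):
    """Stage 1 (assumed heuristic): keep the features with the largest single effects."""
    ranked = sorted(range(len(x)), key=lambda i: abs(single_effect(f, x, i)), reverse=True)
    return sorted(ranked[:keep])

def rank_pairs(f, x, keep):
    """Stages 2 and 3: score every surviving pair, then rank by absolute effect."""
    candidates = screen(f, x, keep)
    scores = {(i, j): pairwise_interaction(f, x, i, j) for i, j in combinations(candidates, 2)}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

x = [1.0, 1.0, 1.0, 1.0, 1.0]
ranking = rank_pairs(toy_model, x, keep=3)
```

In this toy, only the pair (0, 1) receives a nonzero score, because the other terms are additive. Screening on single effects is cheap but can miss interactions whose members have no individual effect, so it trades recall for speed.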

The toolkit is model-agnostic by design. It accepts logits, hidden activations, attention maps, or any writable model hook, and it can operate on both synthetic circuits and production language models. The code provides standardized metrics for pairwise and higher-order effects, plus visualization components to inspect interaction structure at different layers and granularities.
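What a "writable hook" means in practice can be sketched without any framework. Everything below is hypothetical and is not the SPEX API: `TinyModel` is a two-stage toy with a named hook site, and `ablation_effect` installs a hook that zeroes selected hidden units, reruns the model, and then removes the hook.

```python
from typing import Callable, Dict, List, Set

Hook = Callable[[List[float]], List[float]]

class TinyModel:
    """Hypothetical two-stage model exposing a writable hook on its hidden layer."""

    def __init__(self) -> None:
        self.hooks: Dict[str, Hook] = {}

    def forward(self, x: List[float]) -> float:
        hidden = [x[0] + x[1], x[0] - x[1]]
        # A registered hook may read and overwrite the intermediate activation.
        if "hidden" in self.hooks:
            hidden = self.hooks["hidden"](hidden)
        return hidden[0] * hidden[1]  # stands in for a logit

def zero_units(units: Set[int]) -> Hook:
    """Build a hook that sets the chosen hidden units to zero."""
    def hook(h: List[float]) -> List[float]:
        return [0.0 if i in units else v for i, v in enumerate(h)]
    return hook

def ablation_effect(model: TinyModel, x: List[float], site: str, units: Set[int]) -> float:
    """Clean output minus output with the given units ablated at the hook site."""
    clean = model.forward(x)
    model.hooks[site] = zero_units(units)
    try:
        ablated = model.forward(x)
    finally:
        del model.hooks[site]  # always restore the model to its clean state
    return clean - ablated

model = TinyModel()
effect = ablation_effect(model, [3.0, 1.0], "hidden", {0})
```

Because the measurement code only touches the hook registry, the same `ablation_effect` function would work for any model that exposes a comparable read-and-overwrite point, which is the sense of "model-agnostic" assumed here.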

SPEX also ships with a benchmark dataset and evaluation protocol for comparing interaction-detection methods. The benchmark mixes synthetic tasks, where ground-truth interactions are known, with real-language tasks intended to surface meaningful model behaviors. Baseline numbers in the release show that screening combined with grouped ablation improves the precision of detected interactions compared with simple one-by-one perturbations.
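On the synthetic tasks, where the true interactions are known, a detector can be scored with ordinary set-based precision and recall. The sketch below assumes that scoring rule; the function name and the example sets are invented for illustration and are not taken from the benchmark or its baseline numbers.

```python
def precision_recall(detected, truth):
    """Score detected interaction sets against ground truth, ignoring member order."""
    detected_sets = {frozenset(d) for d in detected}
    truth_sets = {frozenset(t) for t in truth}
    true_positives = len(detected_sets & truth_sets)
    precision = true_positives / len(detected_sets) if detected_sets else 0.0
    recall = true_positives / len(truth_sets) if truth_sets else 0.0
    return precision, recall

# Illustrative data only: two planted interactions, four reported by a detector.
truth = [(0, 1), (2, 3)]
detected = [(1, 0), (2, 4), (2, 3), (0, 4)]
precision, recall = precision_recall(detected, truth)
```

Here the detector recovers both planted pairs but also reports two spurious ones, so recall is 1.0 and precision is 0.5. Using frozensets means (1, 0) and (0, 1) count as the same interaction, and the same function handles higher-order sets such as (0, 1, 2).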

Results, release and limitations

BAIR published the SPEX code and benchmark alongside a technical paper detailing the pipeline, scoring choices, and evaluation methodology. The release includes scripts to reproduce baseline experiments, visualization notebooks, and APIs to plug SPEX into common model frameworks. The team says staged pruning and Monte Carlo estimates make SPEX practical on models and datasets of nontrivial size, though running full higher-order sweeps remains computationally intensive.
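The role of Monte Carlo estimation can be shown with a toy case. The sketch below assumes one common formulation, which may differ from the paper's: the interaction between two features is averaged over randomly sampled on/off settings of all other features, and a standard error is reported with the mean. `toy_model` and both helper names are hypothetical.

```python
import random
import statistics

def toy_model(mask):
    """Hypothetical output: features 0 and 1 interact, and feature 2 amplifies them."""
    x0, x1, x2, x3 = mask
    return x0 * x1 * (1 + x2) + 0.5 * x3

def interaction_in_context(f, i, j, context):
    """Second-order difference for features i and j with the other features fixed."""
    def value(vi, vj):
        m = list(context)
        m[i], m[j] = vi, vj
        return f(m)
    return value(1, 1) - value(1, 0) - value(0, 1) + value(0, 0)

def mc_interaction(f, i, j, n_features, n_samples, seed=0):
    """Estimate the mean interaction over uniformly sampled contexts, with standard error."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        context = [rng.randint(0, 1) for _ in range(n_features)]
        samples.append(interaction_in_context(f, i, j, context))
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / n_samples ** 0.5
    return mean, stderr

mean_01, stderr_01 = mc_interaction(toy_model, 0, 1, n_features=4, n_samples=2000)
mean_03, stderr_03 = mc_interaction(toy_model, 0, 3, n_features=4, n_samples=2000)
```

For this toy the exact answer is known: the (0, 1) interaction equals 1 when feature 2 is off and 2 when it is on, so its average under uniform sampling is 1.5, while (0, 3) is exactly zero in every context. Each sample costs four model calls, so cost grows with the number of samples rather than with the number of possible contexts, which doubles with every added feature.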

The authors caution about common confounders. Correlated features, distributed representations, and indirect causal chains can produce apparent interaction effects that are difficult to disentangle. SPEX provides statistical diagnostics to flag ambiguous cases, but the toolkit does not eliminate fundamental limits on identifying causal structure from observational probes and interventions alone.
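One simple diagnostic of this kind can be sketched as follows. It is an assumed example, not one of SPEX's own checks: candidate pairs whose features are strongly correlated across the probe data are flagged, since ablating one such feature is hard to separate from ablating the other. The threshold of 0.9 and all names are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def flag_ambiguous(features, pairs, threshold=0.9):
    """Return the candidate pairs whose members are strongly correlated."""
    return [
        (i, j) for i, j in pairs
        if abs(pearson(features[i], features[j])) >= threshold
    ]

# Illustrative activations only: "a" and "b" move together, "c" does not.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],
}
flagged = flag_ambiguous(features, [("a", "b"), ("a", "c"), ("b", "c")])
```

A flag here does not mean the interaction is spurious, only that the evidence is ambiguous. Linear correlation is also a narrow test: it says nothing about distributed representations or indirect causal chains, the other confounders the authors name.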

Why it matters

SPEX supplies a repeatable, open protocol for locating interactions that shape LLM outputs, giving researchers and engineers a shared language and metrics for comparison. That standardization can speed work on model interpretability, failure analysis, and targeted mitigation by making interaction hypotheses easier to generate and test. For developers, the pipeline clarifies where interventions or simpler model edits might reduce unwanted behaviors or improve robustness.

Primary source: Berkeley AI Research (bair.berkeley.edu)
