Coding Agents · June 26, 2026 · 5 min read

auto-psych: Agent-driven theory discovery and experiments

An agent-based system called auto-psych generates, tests and refines probabilistic cognitive models by running crowdsourced experiments.

The Brieftide · June 26, 2026

TL;DR

01 An agent-based system called auto-psych generates, tests and refines probabilistic cognitive models by running crowdsourced experiments.
02 The authors implemented this pipeline in a computational cognitive science testbed: judgments of which sequences of coin flips seem subjectively more random.
03 They emphasize that psychology is especially amenable to this approach because many theories are expressible as code and crowdsourcing platforms allow programmatic, large-scale human data collection.

Ben Prystawski and six coauthors submitted a paper to arXiv on 24 Jun 2026 describing auto-psych, an agent-driven system that generates and tests theories of human cognition by running online experiments. The system uses nested agent loops to conjecture probabilistic models, design crowdsourced experiments, collect human data and analyze results. In three independent sequences of human experiments, it produced theories that fit participants' responses better than theories drawn from the scientific literature.

How does auto-psych work?

Auto-psych runs two nested discovery loops: an inner loop that conjectures, fits and critiques probabilistic cognitive models, and an outer loop that designs experiments, launches them online and analyzes the data. The inner loop proposes candidate models of behavior and evaluates them against available data, while the outer loop chooses which experiments to run on crowdsourcing platforms, collects human responses programmatically and feeds the new data back to the inner loop for refinement.
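
The two-loop control flow described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the function names (inner_loop, design_experiment, run_experiment), the one-parameter model family, the grid of candidate values and the simulated participants standing in for a crowdsourcing platform are all hypothetical.

```python
import math
import random

def alternation_rate(seq):
    """Fraction of adjacent flips that differ ('HTHT' -> 1.0, 'HHHH' -> 0.0)."""
    return sum(a != b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

def p_random(seq, theta):
    """Hypothetical one-parameter theory: P(participant calls seq 'random').
    Peaks when the alternation rate equals theta; clipped away from 0 and 1."""
    p = math.exp(-8.0 * (alternation_rate(seq) - theta) ** 2)
    return min(max(p, 0.01), 0.99)

def log_likelihood(theta, data):
    """Log-likelihood of binary responses under the theory with parameter theta."""
    return sum(math.log(p_random(s, theta) if r else 1 - p_random(s, theta))
               for s, r in data)

def inner_loop(data, candidates):
    """Inner loop (toy): conjecture candidate models, fit each, keep the best.
    A real agent would propose and critique new model code; here the
    'conjectures' are just grid values of theta."""
    return max(candidates, key=lambda th: log_likelihood(th, data))

def design_experiment(rng, n_stimuli=40, length=8):
    """Outer loop, step 1 (toy): choose stimuli. Random sequences stand in
    for an agent that picks informative experiments."""
    return ["".join(rng.choice("HT") for _ in range(length))
            for _ in range(n_stimuli)]

def run_experiment(stimuli, rng, true_theta=0.6):
    """Outer loop, step 2 (toy): simulated participants replace the
    crowdsourcing platform. true_theta is an assumed ground truth."""
    return [(s, rng.random() < p_random(s, true_theta)) for s in stimuli]

def discover(n_rounds=5, seed=0):
    rng = random.Random(seed)
    candidates = [i / 20 for i in range(21)]   # theta in {0.00, 0.05, ..., 1.00}
    data, best = [], None
    for _ in range(n_rounds):                  # outer loop: design, launch, analyze
        data += run_experiment(design_experiment(rng), rng)
        best = inner_loop(data, candidates)    # inner loop: conjecture, fit, critique
    return best, len(data)

best_theta, n_responses = discover()
print(best_theta, n_responses)
```

Each outer-loop round adds fresh responses to the pool before the inner loop re-fits, which mirrors the feedback path the paper describes in only the loosest sense.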

The authors implemented this pipeline in a computational cognitive science testbed: judgments of which sequences of coin flips seem subjectively more random. They emphasize that psychology is especially amenable to this approach because many theories are expressible as code and crowdsourcing platforms allow programmatic, large-scale human data collection. The paper states that data collection is a major bottleneck for automated scientific cycles, and auto-psych targets that bottleneck directly by launching online survey experiments without human intervention.
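
To make "theories expressible as code" concrete, here is a minimal sketch of one possible probabilistic model for the coin-flip task. It is illustrative only: the features (longest streak, head/tail balance), the weights and the logistic choice rule are assumptions chosen for this example, not a model taken from the paper or the literature.

```python
import math
from itertools import groupby

def longest_run(seq):
    """Length of the longest streak of identical flips."""
    return max(len(list(group)) for _, group in groupby(seq))

def randomness_score(seq, w_run=1.0, w_balance=2.0):
    """Hypothetical subjective-randomness score: penalize long streaks and
    unequal numbers of heads and tails. The weights are free parameters."""
    imbalance = abs(seq.count("H") - seq.count("T")) / len(seq)
    return -w_run * longest_run(seq) - w_balance * imbalance

def p_choose_first(seq_a, seq_b, temperature=1.0):
    """Probability that a participant judges seq_a as more random than seq_b,
    using a logistic (two-option softmax) choice rule."""
    diff = (randomness_score(seq_a) - randomness_score(seq_b)) / temperature
    return 1.0 / (1.0 + math.exp(-diff))

# A streaky sequence should be chosen as "more random" less often.
print(p_choose_first("HTTHTHHT", "HHHHTTTT"))
```

Because a model of this kind assigns a probability to every possible response, it can be scored against choice data by likelihood, which is what makes automated comparison between candidate theories possible.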

What did the experiments find?

In simulation, the system reliably recovered ground-truth theories from synthetic data through systematic experimentation. In three independent sequences of human experiments, it then discovered theories that fit the collected data better than theories derived from the published literature.
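
This brief does not say which metric the paper uses to decide that one theory "fits better" than another. One common choice is log-likelihood on held-out responses, sketched below with synthetic data. The 70% response rate, the sample sizes and the two toy "theories" (a fixed 50/50 baseline and a fitted response rate) are assumptions made for illustration.

```python
import math
import random

def bernoulli_log_lik(p, responses):
    """Log-likelihood of binary responses when each is True with probability p."""
    return sum(math.log(p if r else 1.0 - p) for r in responses)

def fit_rate(responses):
    """Maximum-likelihood response rate, nudged away from 0 and 1."""
    rate = sum(responses) / len(responses)
    return min(max(rate, 1e-6), 1 - 1e-6)

rng = random.Random(1)
# Synthetic "participants" who call a stimulus random 70% of the time (assumed).
responses = [rng.random() < 0.7 for _ in range(400)]
train, test = responses[:200], responses[200:]

baseline_ll = bernoulli_log_lik(0.5, test)               # fixed reference theory
candidate_ll = bernoulli_log_lik(fit_rate(train), test)  # theory fit on training data

# A higher held-out log-likelihood means the theory predicts unseen responses better.
print(candidate_ll > baseline_ll)
```

Scoring on responses that were not used for fitting guards against a flexible model winning simply because it has more parameters to tune.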

The arXiv submission is a 30-page manuscript with five figures. The authors present the coin-flip judgment case study as a demonstration of feasibility, and they report that the nested structure of agents was critical to model performance: the two-loop architecture materially contributed to the system's ability to find better-fitting theories.

Why it matters

Auto-psych addresses a concrete bottleneck: collecting human behavioral data at scale for iterative theory testing. By combining programmatic experiment launches with automated model proposal and critique, the system shortens the loop between hypothesis and evidence. For computational cognitive science, where models can be encoded as executable code, this approach reduces the manual labor required to run surveys and analyze results, and it enables systematic exploration of model space that would be slow by hand.

The finding that the nested agent structure is critical suggests that simple automation is not enough: effective scientific automation needs coordinated components that both invent models and choose informative experiments. That requirement poses an engineering challenge as well as a scientific one for future systems that aim to automate parts of the research process.

What to watch

Watch whether the auto-psych architecture generalizes beyond the coin-flip randomness case to other cognitive tasks where theories can be expressed as code and crowdsourced data can be collected. Also watch whether subsequent work replicates the two central claims in other experimental domains: that agent-discovered theories fit human data better than literature-derived ones, and that the nested loop structure is what makes the difference.

References and paper details: the arXiv submission is "auto-psych: Automating the science of mind using agent-driven theory discovery and experimentation" by Ben Prystawski, Kushin Mukherjee, Daniel Wurgaft, Linas Nasvytis, Michael Y. Li, Noah D. Goodman and Michael C. Frank, posted 24 Jun 2026.

Figure: auto-psych system architecture, nested loops and data flow

Written by The Brieftide · Source: arXiv
