CreativityNeuro: Steering LLM Weights to Boost Divergent Thinking
CreativityNeuro uses data-free contrastive weight steering to raise DAT scores by up to 14 human percentile points and reduce mode collapse.
TL;DR
- CreativityNeuro uses data-free contrastive weight steering to raise DAT scores by up to 14 human percentile points and reduce mode collapse.
- CreativityNeuro, a data-free contrastive weight-steering method submitted to arXiv on 1 Jul 2026, improves divergent thinking in large language models and reduces measures of mode collapse.
- The paper reports up to a 14 human percentile point gain on the Divergent Association Task and significant gains in human evaluations of creativity across longer-form tasks.
CreativityNeuro, a data-free contrastive weight-steering method submitted to arXiv on 1 Jul 2026, improves divergent thinking in large language models and reduces measures of mode collapse. The paper reports up to a 14 human percentile point gain on the Divergent Association Task and significant gains in human evaluations of creativity across longer-form tasks.
What is CreativityNeuro and how does it work?
CreativityNeuro is a weight-space intervention: it steers a model's parameters via contrastive objectives and requires no behavioral data, re-training, or gradient-based fine-tuning. Because it edits weights rather than activations, the authors compare it directly against activation steering in their experiments.
The paper frames divergent thinking as a target behavior and addresses the artificial hivemind effect, where LLMs produce similar responses to open-ended prompts. CreativityNeuro aims to broaden response diversity by shifting the model's weights along contrastive directions designed to increase novelty and variety.
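To make the distinction between editing weights and editing activations concrete, here is a toy sketch on a single linear layer. It is not the paper's algorithm: the brief does not say how CreativityNeuro derives its contrastive directions, so `weight_direction`, `activation_vector`, and `alpha` below are hypothetical placeholders with made-up values.

```python
# Toy illustration only: one linear layer y = W x with made-up numbers.
# The directions below are hypothetical placeholders, not values or
# procedures taken from the CreativityNeuro paper.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def steer_weights(W, direction, alpha):
    """Weight-space steering: return W + alpha * direction (same shape as W)."""
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, direction)]

def steer_activations(h, vector, alpha):
    """Activation steering: return h + alpha * vector, applied at inference time."""
    return [hi + alpha * vi for hi, vi in zip(h, vector)]

W = [[1.0, 0.0], [0.0, 1.0]]
weight_direction = [[0.0, 1.0], [0.0, 0.0]]  # hypothetical direction in weight space
activation_vector = [1.0, 0.0]               # hypothetical direction in activation space
alpha = 0.5                                  # hypothetical steering strength

# The weight edit is applied once; the steered layer is then used as usual.
W_steered = steer_weights(W, weight_direction, alpha)

for x in ([1.0, 2.0], [3.0, -1.0]):
    via_weights = matvec(W_steered, x)
    via_activations = steer_activations(matvec(W, x), activation_vector, alpha)
    print(x, via_weights, via_activations)
```

In this toy setup the weight edit changes the layer's output by an amount that depends on the input (`alpha * direction @ x`), while the activation edit adds the same offset to every input. Whether that difference explains the transfer results reported in the paper is not something this sketch can show.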
How did it perform on creativity tests?
CreativityNeuro improved performance on multiple creativity assessments. It boosted scores on the Divergent Association Task (DAT) by up to 14 human percentile points. In a large-scale human evaluation (N=720), it delivered significant improvements in originality, surprise, and creativity on the Alternative Uses Test (AUT) and the Task Task. Across all three tasks, the method demonstrably reduced measures of mode collapse.
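For readers unfamiliar with the DAT, it is commonly scored as the average pairwise semantic distance between the words a respondent names, scaled by 100. The sketch below shows that calculation under stated assumptions: `toy_embeddings` is a hypothetical three-dimensional lookup invented for illustration, whereas real scoring relies on pretrained word embeddings, and the brief does not say which scoring pipeline the authors used.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_score(words, embeddings):
    """Average pairwise cosine distance between the words' vectors, times 100."""
    vectors = [embeddings[w] for w in words]
    distances = [cosine_distance(u, v) for u, v in combinations(vectors, 2)]
    return 100.0 * sum(distances) / len(distances)

# Hypothetical toy vectors; real word embeddings have hundreds of dimensions.
toy_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [3.0, 4.0, 0.0],
    "algebra": [0.0, 1.0, 0.0],
    "volcano": [0.0, 0.0, 1.0],
}

related = dat_score(["cat", "dog"], toy_embeddings)
unrelated = dat_score(["cat", "algebra", "volcano"], toy_embeddings)
print(f"related words:   {related:.1f}")
print(f"unrelated words: {unrelated:.1f}")
```

A higher score means the words sit further apart in embedding space, which is why the task is used as a proxy for divergent thinking.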
The paper also compares weight-space steering to activation steering. Activation steering achieved comparable performance to CreativityNeuro on the DAT, but it did not transfer to the AUT and Task Task. That contrast suggests that the weight-space intervention generalized better to more open-ended and longer-form creative tasks, at least in the authors' evaluations.
How do the authors validate their claims?
The core numeric result given is the DAT improvement: up to 14 human percentile points. The human evaluation component covered the AUT and the Task Task with a total sample size of N=720. The authors report significant improvements on three subjective axes: originality, surprise, and creativity. They additionally measure reductions in mode collapse across the DAT, AUT, and Task Task, though the abstract does not give exact numeric values for those mode collapse metrics.
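A gain stated in "human percentile points" is a difference between two percentile ranks measured against a human reference sample. The sketch below illustrates the arithmetic only. `human_scores`, `baseline_score`, and `steered_score` are made-up numbers, and the percentile definition used here (share of human scores at or below the model's score) is an assumption, since the brief does not describe the authors' reference data or exact procedure.

```python
from bisect import bisect_right

def percentile_rank(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)

# Hypothetical human DAT scores; not data from the paper.
human_scores = [60, 65, 70, 72, 75, 78, 80, 82, 85, 90]

baseline_score = 75  # hypothetical score before steering
steered_score = 79   # hypothetical score after steering

baseline_pct = percentile_rank(baseline_score, human_scores)
steered_pct = percentile_rank(steered_score, human_scores)
gain = steered_pct - baseline_pct

print(f"baseline percentile: {baseline_pct}")
print(f"steered percentile:  {steered_pct}")
print(f"gain in percentile points: {gain}")
```

Because percentile ranks depend on the shape of the human distribution, the same raw score change can translate into a different number of percentile points depending on where the baseline sits.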
The authors present a head-to-head distinction: activation steering matched CreativityNeuro on the vocabulary-space DAT but failed to transfer its gains to the human-evaluated AUT and Task Task, whereas CreativityNeuro transferred across tasks. The method requires no behavioral data, re-training, or gradient-based fine-tuning, which the authors emphasize as a practical advantage.
Why it matters
If weight-space steering can consistently broaden a model's outputs without additional data collection or re-training, it offers a practical intervention for use cases that need more divergent or creative output. The transferable gains from DAT to human-evaluated AUT and Task Task suggest the change is not solely a narrow artifact of a single benchmark. Reducing mode collapse while improving measures like originality and surprise addresses a common critique of LLMs in creative settings: they converge on repetitive or safe answers.
What to watch
The paper was accepted at the ICML 2026 Workshop on Creativity & Generative AI; discussion at that venue will be the first public scrutiny of the method and its details. Key follow-ups to watch are full numeric breakdowns of the mode collapse metrics, replication of the N=720 human evaluation in other settings, and whether the contrastive weight-steering approach scales across different model families or sizes.
References: Samuel Schapiro, Core Francisco Park, Felix Sosa, Lav R. Varshney, "CreativityNeuro: Steering Language Model Weights to Improve Divergent Thinking and Reduce Mode Collapse," arXiv:2607.01433, submitted 1 Jul 2026; accepted at ICML 2026 Workshop on Creativity & Generative AI.
| Item | DAT | Human evaluation (AUT and Task Task) | Mode collapse |
|---|---|---|---|
| CreativityNeuro | Improves DAT by up to 14 human percentile points | Significant improvements in originality, surprise, and creativity (N=720) | Demonstrably reduces measures of mode collapse across all three tasks |
| Activation steering | Achieves comparable performance to CreativityNeuro on the DAT | Does not transfer to AUT and Task Task | Not reported as reducing mode collapse across all tasks in the abstract |
Written by The Brieftide · Source: arXiv