EMORSION: How audio parameters alter emotion and immersion
A proof-of-concept study manipulated frequency, dynamics and directionality across four film scenes and found that subtle audio changes shift emotional perception and immersion.
TL;DR
- A proof-of-concept study manipulated frequency, dynamics and directionality across four film scenes and found that subtle audio changes shift emotional perception and immersion.
- EMORSION, a proof-of-concept study submitted 29 May 2026, examined how film audio design shapes audience emotion and immersion.
- The project created multiple alternative mixes for four film scenes and measured viewer responses with questionnaires, heart rate monitoring and video-based motion tracking.
EMORSION, a proof-of-concept study submitted 29 May 2026, examined how film audio design shapes audience emotion and immersion. The project created multiple alternative mixes for four film scenes and measured viewer responses with questionnaires, heart rate monitoring and video-based motion tracking.
What did they change and measure?
The study manipulated three core audio parameters: frequency (pitch), dynamics (loudness), and directionality (spatial placement), across four film scenes. The scenes came from two horror and two drama productions, balanced between mainstream and independent films; for each scene the researchers produced multiple manipulated mixes plus a control mix. Three audience groups viewed the scenes, with each group exposed to one manipulated mix alongside the control mix, and responses were assessed with self-report questionnaires, physiological measures including heart rate monitoring, and video-based motion tracking.
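The viewing design described above can be sketched as a small schedule builder. This is only an illustration, not the authors' procedure: the scene labels and group names are hypothetical, and the pairing of each group with a single parameter is an assumption, since the summary says only that each group saw one manipulated mix alongside the control.

```python
from itertools import product

# The three parameters named in the study summary.
PARAMETERS = ("frequency", "dynamics", "directionality")

# Hypothetical scene labels: the summary says only "two horror, two drama".
SCENES = ("horror_1", "horror_2", "drama_1", "drama_2")


def build_schedule(groups=("A", "B", "C")):
    """Return, per group, every (scene, mix) pair that group would view."""
    # Assumption: group i always gets the mix manipulating PARAMETERS[i].
    schedule = {}
    for group, parameter in zip(groups, PARAMETERS):
        schedule[group] = list(product(SCENES, ("control", parameter)))
    return schedule


schedule = build_schedule()
```

Under these assumptions each group ends up with eight viewings: four scenes, each in a control and a manipulated version.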
What did they find?
The protocol captured measurable and interpretable differences across audio conditions, showing that even subtle changes in audio design can shape emotional perception and immersion. The authors report that unconventional mixes tended to produce greater variability in audience interpretation, while conventional immersive mixes were associated with stronger cross-audience agreement. The multimodal framework combining self-report, heart rate data and motion tracking proved feasible for detecting those differences across the manipulated conditions.
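The summary does not say how variability in audience interpretation was quantified. One simple possibility, offered here as an assumption rather than the paper's method, is the spread of questionnaire ratings within a mix: a lower spread indicates stronger agreement. The function name, mix labels and ratings below are all hypothetical placeholders.

```python
from statistics import pstdev


def rating_spread(ratings):
    """Population standard deviation of one mix's ratings (lower = more agreement)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to measure spread")
    return pstdev(ratings)


# Placeholder values invented for illustration only; NOT data from the study.
example_ratings = {
    "mix_x": [4, 4, 5, 4],
    "mix_y": [1, 5, 2, 4],
}

spreads = {mix: rating_spread(r) for mix, r in example_ratings.items()}
```

With these placeholder numbers `mix_x` has the smaller spread, which is the pattern the summary attributes to conventional immersive mixes.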
Who did the work and where was it presented?
The paper is authored by Nelly Garcia, Ruby Crocker, Bleiz M Del Sette, Fabrizio Smeraldi, Charalampos Saitis, George Fazekas and Joshua Reiss, and is listed with arXiv identifier arXiv:2606.18266. The submission notes AES Europe 2026 as a comment, indicating the work was prepared for that venue.
Why it matters
EMORSION demonstrates a practical experimental protocol for quantifying how specific audio decisions affect viewers, linking subjective reports with physiological and motion-based measures. That combination matters because it moves audio design research beyond anecdote and single-measure studies: the paper shows a replicable setup capable of distinguishing mixes that produce uniform audience agreement from mixes that generate divergent interpretations. Filmmakers, sound designers and researchers who need evidence of perceptual impact now have a tested method rather than purely qualitative judgment.
What to watch
The authors recommend larger-scale studies to characterise the role of specific audio parameters in shaping audience experience; the next milestone is scaling the EMORSION protocol beyond four scenes and three audience groups to map which parameters reliably drive particular emotional or immersion outcomes.
References and identifiers
- Title: EMORSION: Examining the Impact of Audio Parameters on Emotional Responses and Immersion in Film
- Authors: Nelly Garcia; Ruby Crocker; Bleiz M Del Sette; Fabrizio Smeraldi; Charalampos Saitis; George Fazekas; Joshua Reiss
- arXiv:2606.18266 [cs.HC], submitted 29 May 2026
- Comment: AES Europe 2026
Written by The Brieftide · Source: arXiv