Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time


 
Type: Perception
Thursday, May 28
 

10:30am CEST

Effect of an Active Acoustic Reinforcement System on Musical Performance in a Recording Studio
Thursday May 28, 2026 10:30am - 11:00am CEST
This work presents the results of a perceptual study
investigating the influence on musicians of a virtual
acoustics system installed in the live room of a
professional recording studio. The study focused on
analyzing relationships between a selection of objective
acoustic parameters (T30, STLate, LJ) and subjective
perceptions of 19 solo
musicians performing under 11 different acoustic
conditions. The experiment was conducted using the VAT
(Virtual Acoustic Technology) system and the VAT Suite
software developed at the Immersive Media Laboratory
(IMLab) in the Sound Recording Department at McGill
University. Correlations between quantitative and
qualitative analyses
show that musicians’ preferences converge on conditions
with T30 ≈ 1 s, and that late and lateral energy increases
the perception of spatiality, providing a positive balance
between clarity and acoustic support. However, longer
reverberation reduces comfort and executive control.
Authors
Gianluca Grazioli

Montreal, Canada, McGill University
Richard King

McGill University, McGill University
Montreal
Wieslaw Woszczyk

McGill University
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:30am CEST

The efficacy of phantom image perception: an active listener perspective.
Thursday May 28, 2026 11:30am - 12:00pm CEST
A “phantom image” is the illusion of an independent sound
source created by two or more loudspeakers. Most often
created by manipulating level differences between
stereophonic channels (aka, “panning”), the effect is used
to create a sense of auditory space between loudspeakers
and is largely taken for granted. In recent years,
surround and immersive audio systems have attempted to
utilize phantom image processing to render audio objects in
desired positions across multiple loudspeaker arrays. This
research examined the efficacy of phantom image perception
horizontally and vertically from an active listener
perspective. After listening to a target loudspeaker,
listeners (n = 442) were asked to move a phantom sound to a
position to match that of the target loudspeaker. The
listener’s phantom placement was then compared to the
target, and subjects were allowed to “correct” their phantom
position. The horizontal experiment was based on a
standard stereophonic 60° loudspeaker array with the target
loudspeaker at 15° off center. The vertical experiment
utilized elevated loudspeakers in a 60° arc with the target
loudspeaker elevated 10° above the horizon (lower
loudspeaker). Results show nearly universal “undershoot” in
horizontal placement error on first attempts with gradual
improvement over trials that coalesced around the projected
target location. However, after repeated tries, final
perceptual image locations were spread over 2/3 of the
sound-field around the target loudspeaker. In the vertical
trials perceptual locations were spread across the entire
sound field in all three trials and failed to show any
patterns of coalescence around the target loudspeaker.
Authors
Song Hui CHON

Associate Professor, Belmont University
Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."
Wesley Bulla

Belmont University
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A New Reference Target Curve for Studio Headphones
Thursday May 28, 2026 1:30pm - 2:00pm CEST
Target curves for the sound signature of headphones are a
helpful design target during the development process. While
much attention has been paid to finding target curves that
match the listening preference of consumers, equivalents
for studio headphones date back to the 90s. In the context
of music production, a mutual target or even a standard is
essential to make mixing and mastering more
gear-independent. This becomes even more important as
the main tool for sound engineers shifts from loudspeakers
in professional environments such as acoustically treated
studios to headphones, often additionally equipped with
virtualization algorithms. This enables them to be more
flexible and to rely less on potentially expensive
loudspeaker setups. The diffuse-field target curve that is
currently still the only standardized target curve for
studio headphones is often reported not to match a real
loudspeaker equivalent in studio environments. In this
paper, we aim to find a new standard target curve for
studio headphones emulating the frequency response of a
loudspeaker setup in modern studio environments.
For this, we give an overview of current target curves and
match them to their equivalent loudspeaker setups.
Based on that we propose a new methodology for a
measurement-based target curve incorporating typical
panning paradigms of music signals based on measurements
inside multiple control rooms. To verify the results, we
conduct listening tests with professionals in multiple
studio environments.
Authors
Jonas Foerster

Signal Processing Engineer, beyerdynamic GmbH & Co. KG
Passionate about Headphones, Signal Processing and their interaction.

Focus on headphone target curves, spatial audio and ANC
Lukas Keppler

beyerdynamic GmbH & Co. KG
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Personalized VR for hearing research with embedded devices
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded Artificial Intelligence hardware, and
hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.
Authors
Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

The Perception and Measurement of Nonlinear Distortion in Headphones
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Few studies exist on the perception and measurement of
nonlinear distortion in headphones. This paper reports the
detection thresholds and perceived sound quality from real
distortion in headphones. Five different distortion
measurements were made on the headphones to determine how
well they predict audibility and quality. Music samples
were binaurally recorded on six headphones at playback
levels ranging from 85 to +110 dBA at 3 dB increments. The
recordings were reproduced at a normal playback level (83
dBA) through a reference headphone with low distortion. The
headphone recordings were post-processed to remove both
level and frequency response differences so only nonlinear
distortions and residual noise remained. In a second test,
listeners rated the similarity in quality of headphones
relative to an undistorted reference and a hidden version
of it. The results provide evidence that audible distortion
in headphones with music occurs at significantly higher
playback levels (104 to 112 dBA SPL) than what is
considered typical and safe. The percentage of measured THD
in the headphone had the highest correlation with the
detection thresholds while the non-coherent distortion with
music best predicted the similarity ratings. We discuss the
results and the practical implications they might have on
future headphone design, testing and measurement.
Authors
Sean Olive

Audio Consultant, Sean Olive Audio Consulting
United States
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation
Thursday May 28, 2026 2:00pm - 2:30pm CEST
This work presents a perceptual model based on a complex
IIR filterbank. The filterbank with a frequency resolution
of 4 bands per Bark consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are used to obtain masking thresholds
with the following post-processing. To obtain reasonable
masking thresholds from the spreading outputs, a
postmasking stage is required. Here, we propose a
comodulation-dependent adaptation of the postmasking decay
to model Comodulation Masking Release (CMR) effects. This
approach explicitly considers the dip-listening effect
known from the literature. The final masking thresholds are
obtained by weighting the postmasking outputs by a
tonality-dependent gain, controlled using spectral
flatness estimation. A
listening test compares the proposed method to an already
known approach using direct CMR based modification of the
masking threshold gains.
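As a rough illustration of the spectral flatness estimate mentioned above: spectral flatness is commonly computed as the ratio of the geometric mean to the arithmetic mean of a power spectrum, giving values near 1 for noise-like spectra and near 0 for tonal ones. The sketch below assumes that common definition. The function name, the `eps` guard, and the example spectra are hypothetical, and the abstract does not state how the authors map flatness to the tonality-dependent gain.

```python
import math

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Assumed textbook definition, not taken from the paper.
    """
    n = len(power)
    # Average the logs so the geometric mean does not overflow or underflow.
    geometric = math.exp(sum(math.log(p + eps) for p in power) / n)
    arithmetic = sum(power) / n
    return geometric / (arithmetic + eps)

noise_like = [1.0, 1.0, 1.0, 1.0]    # hypothetical flat spectrum
tone_like = [1e-6, 1e-6, 1.0, 1e-6]  # hypothetical spectrum with one dominant bin

print(spectral_flatness(noise_like))  # close to 1
print(spectral_flatness(tone_like))   # close to 0
```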
Authors
Bernd Edler

International Audio Laboratories Erlangen, Germany
Fabian Schaller

Fraunhofer IIS, Erlangen, Germany
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

EMORSION – Examining the Impact of Audio Features on Emotional Responses and Immersion in Film.
Thursday May 28, 2026 2:30pm - 3:00pm CEST
EMORSION is an exploratory study examining how film audio
design shapes audience emotion and immersion. It was
conducted using scenes from four films in the horror (2)
and drama (2) genres, with two mainstream and two
independent productions. For each scene, multiple
alternative audio mixes were created by systematically
manipulating three core aspects of audio design: frequency
(pitch), dynamics (loudness), and directionality (spatial
placement). Three audience groups were exposed to the
scenes in a cinema setting, with each group experiencing
one manipulated audio mix and a control mix.
Audience responses were assessed through a multimodal
framework combining self-reported emotion and immersion via
a questionnaire, and physiological measures, including
heart rate monitoring and video-based motion tracking.
Results show that subtle changes in audio design
significantly affect emotional perception and immersion.
Unconventional mixes produced greater variability in
interpretation, while conventional immersive mixes led to
stronger agreement across audiences. Notably, participants
often reported perceived visual changes despite no
alterations to the visual content.
Authors
Charalampos Saitis

Queen Mary University of London
George Fazekas

Queen Mary University of London
Josh Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...
Nelly Garcia

PhD Researcher, Queen Mary University of London
I'm Nelly Garcia.
I'm an engineer in communications and electronics with the specialty in acoustics.
Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London.
My main interest is sound design, ways to create sounds from scratch, optimize the workflow of a sound designer and innovative ways to label, categorise or access samples...
Ruby Crocker

Queen Mary University of London
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which deep-
learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the framework
provides a scalable, perceptually grounded approach to
interpreting listener-preference data when developing
headphone target curves.
Authors
Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology
▪  Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...
Julian Villegas

University of Aizu, University of Aizu
Japan
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions (including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration), none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
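The QDI definition above (energy in the two sa quintina bands divided by total spectral energy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `qdi`, the inclusive band edges, and the toy spectrum are hypothetical, and the input is assumed to be an already computed power spectrum for one direction or microphone position.

```python
def qdi(freqs_hz, power, bands=((600.0, 750.0), (1200.0, 1400.0))):
    """Ratio of in-band spectral energy to total spectral energy.

    freqs_hz and power are parallel sequences (bin frequency, bin power).
    Band edges are treated as inclusive, which is an assumption.
    """
    total = sum(power)
    if total == 0:
        return 0.0  # silent input: define the index as zero
    in_band = sum(
        p for f, p in zip(freqs_hz, power)
        if any(lo <= f <= hi for lo, hi in bands)
    )
    return in_band / total

# Hypothetical five-bin spectrum; 650 Hz and 1300 Hz fall inside the bands.
freqs = [300.0, 650.0, 1000.0, 1300.0, 2000.0]
power = [1.0, 2.0, 1.0, 3.0, 1.0]
print(qdi(freqs, power))  # (2 + 3) / 8 = 0.625
```

Evaluating the same function on spectra taken at different directions or microphone positions would give the directional profile the abstract describes.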
Authors
Felicita Brusoni

PhD candidate Musikhögskolan i Malmö, Lund University
Luca Frigo

Conservatorio G. Nicolini Piacenza
Martino Sarolli

Conservatorio Paganini Genova
Riccardo Dapelo

Conservatorio Nicolini Piacenza
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

NAVIQUAL: Creating Spatial Audio Quality Maps for Virtual Live Music Environments
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Live music environments can be simulated and evaluated
through spatial audio and augmented reality (AR)
technology. However, conducting perceptual studies on AR
environments can be challenging, as multiple design
considerations and uncontrolled variables come into play.
Hence, we developed Naviqual, a tool to create a spatial
audio quality map for a virtual live music environment. We
generated objective quality contour and polar maps to
predict the quality of experience (QoE) across listener
locations and directions, respectively. We found that these
maps strongly aligned with perceptual evaluations by
normal-hearing listeners through listening tests. We also
found that binaural objective metrics and signal-to-noise
ratio both strongly predict QoE across listener
translations, with the former outperforming the latter in
predicting QoE across listener directions. Overall,
Naviqual provides a QoE map for virtual live music
environments robust across various listener locations and
directions, noise locations, music content, and room
acoustics.
Authors
Carl Timothy Tolentino

University College Dublin
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:00pm CEST

Influences of Nonlinear Distortion in Music Playback on Listeners’ Stress Evaluated by PPI and RMSSD of PPG
Thursday May 28, 2026 4:00pm - 4:30pm CEST
The phenomenon in which listeners’ impressions of music are
unintentionally altered even when the same sound source is
played back remains an important issue. Previous research
has shown that the state and combination of audio equipment
affect the characteristics of nonlinear distortion in music
playback. Hence, we conducted a subjective evaluation of
auditory and musical impressions using sound sources with
various nonlinear distortions. However, the subjective
evaluation was unstable and difficult to assess. The reason
was that the sound change was perceived emotionally as a
slight change in sound image and musicality, and the
interpretation of evaluation terms varies widely among
subjects due to the difficulty of verbalizing the
impression. Therefore, we evaluated the change in
listeners’ stress caused by nonlinear distortion in music
playback using photoplethysmography (PPG). In this
study, we conducted a follow-up experiment with improved
accuracy.
In the experiment, 41 subjects listened to sound sources
with even-order harmonic distortion at 2.69% THD, odd-order
harmonic distortion at 2.69% THD, and no distortion. The
musical piece used for the sound sources is an original
composition, to eliminate familiarity and bias toward
existing music.
We evaluated changes in subjects’ stress states using the
mean pulse-pulse interval (PPI) and the root mean square of
successive differences (RMSSD), computed from the PPG
signal, as indicators of stress.
These results reconfirm that nonlinear distortion in music
playback affects listeners’ vital responses, as evidenced
by significant differences in both mean PPI and RMSSD, as
assessed by Cochran's Q test at the 5% significance level.
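The two indicators named above have standard definitions that can be sketched briefly: mean PPI is the average interval between successive pulses, and RMSSD is the root mean square of the differences between consecutive intervals. The sketch below assumes the pulse-pulse intervals have already been extracted from the PPG signal and are given in milliseconds. The function names and the sample intervals are hypothetical and are not data from the study.

```python
import math

def mean_ppi(ppi_ms):
    """Average pulse-pulse interval in milliseconds."""
    return sum(ppi_ms) / len(ppi_ms)

def rmssd(ppi_ms):
    """Root mean square of successive differences between intervals."""
    if len(ppi_ms) < 2:
        raise ValueError("RMSSD needs at least two intervals")
    diffs = [b - a for a, b in zip(ppi_ms, ppi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical intervals (ms), purely for illustration.
intervals = [800.0, 810.0, 790.0, 805.0]
print(mean_ppi(intervals))  # 801.25
print(rmssd(intervals))     # sqrt((100 + 400 + 225) / 3)
```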
Authors
Kenshin Nakada

Tokyo University of Science
Shun Muramatsu

The University of Tokyo
Takahiro Yoshida

Professor, Tokyo University of Science
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices
Thursday May 28, 2026 4:30pm - 5:00pm CEST
This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The proposed method originally consists of four stages:
high-resolution anatomical scanning of the listener’s upper
body, including the pinnae, individualized HRTF computation
using the boundary element method, selective removal of
spatial encoding components to derive a personalized
reference target response curve (PR-TRC), and perceptual
optimization using a listener-specific weighting
coefficient grounded in auditory reference fidelity rather
than preference. In this paper, each stage is simplified
and automated using smartphone-based scanning and
AI-assisted processing, enabling end users to complete the
entire personalization process via a smartphone connected
to a cloud-based server. The resulting personalized target
response curve is implemented within the computational and
memory constraints of the DSP pipeline of commercial
consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.
Authors
Atsushi Hara

final Inc.
Haruto Hirai

final Inc.
Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...
Mitsuru Hosoo

final Inc.
Nao Tojo

final Inc.
Shun Saito

final Inc./post-doc

Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

From Gaze to Gnosis: A Critical Framework for Embodied Audio Production
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Audio engineering standards often present as objective, yet
they frequently rely on a systemic data bias which Perez
characterises as the 'default male bias' [1]. This paper
examines the hegemony of the male ear, a system of norms
that privileges masculine modes of hearing by prioritizing
technical structure and text over affective experience and
timbre [2]. By transitioning from a visual-centric auditory
gaze toward an embodied sonic gnosis, researchers can
recover haptic and physiological ways of knowing sound.
Drawing on the feminist listening praxis of the Female Ear
[3], this work explores the recording studio as an
analytical space where sonic microaggressions [4] enforce
rigid technical standards. The author argues for a new
audio praxis that centers ear pleasures [5], validating
subjective and affective sensory data as legitimate
engineering input. This approach seeks to dismantle the
regulatory fiction [6] of a universal hearing standard,
promoting a pluralistic understanding of musicking [7] that
is inclusive of non-normative perspectives.
Authors
Katie Ambrose

PhD Student, University of York
Katie is a postgraduate researcher at the University of York, working on a th...
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Friday, May 29
 

9:30am CEST

Who Controls the Space? Artistic Intent and Sound Diffusion in Immersive Concert Performance
Friday May 29, 2026 9:30am - 10:00am CEST
Recent advances in large-scale multichannel loudspeaker
systems have enabled immersive concert formats that extend
spatial control beyond conventional stereo and small
multichannel configurations. High-density loudspeaker
arrays (HDLAs) allow sound to be distributed across complex
architectural spaces, challenging established distinctions
between composition, performance, and live sound practice.
In live contexts, however, the realization of spatial
attributes is often constrained by system complexity,
limited rehearsal time, and the lack of artist-facing
spatial control interfaces. As a result, spatial
realization and sound diffusion are frequently delegated to
sound engineers, who translate artistic material to the
acoustic and architectural conditions of the venue in real
time.

This paper examines three immersive concerts presented
during Sonic Days 2025 in Denmark, realized on both
large-scale and small-scale multichannel loudspeaker
systems. The concerts represent contrasting production
contexts, including a site-specific spatial composition
conceived explicitly for a high-density loudspeaker array
and performances by artists whose practices are typically
oriented toward stereo or small multichannel formats.
Across these cases, spatialization functioned variously as
compositional material, interpretive layer, and adaptive
live-mixing practice.

The paper analyzes how control over spatial attributes is
negotiated between artists and sound engineers in live
immersive concert settings, and how this negotiation
affects the interpretation of artistic intent and audience
experience. Particular attention is given to the role of
sound engineers as active mediators whose decisions shape
spatial form, listening perspective, and the relationship
between sound and architecture. The findings suggest that
immersive concert formats redistribute creative agency
across artists, technicians, and technological
infrastructures, and point toward the need for revised
conceptual frameworks for authorship, performance, and
listening in large-scale spatial audio environments.
Authors
Kasper Fangel Skov

Assistant Professor, PhD, Sonic College (UC SYD)
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

The cognition of sound in museums: Toward a spectrum of meanings
Friday May 29, 2026 10:30am - 11:00am CEST
This presentation develops a conceptual framework for
understanding how visitors cognize sound in museum
exhibitions. While sound increasingly features in museum
practice, research has focused primarily on measuring
visitor enjoyment and engagement rather than examining the
specific meanings sound generates. This gap reflects the
absence of a framework conceptualizing sound's
meaning-making capacities to guide empirical investigation.
Drawing on scholarship from music studies, semiotics,
phenomenology, and embodied cognition, I propose a
seven-component spectrum identifying distinct yet
interrelated meanings that sound can convey in museums:
aesthetic, representational, emotional, sensorial,
imaginative, social, and political. These meanings can be
apprehended independently or in combination, typically
through emergent, pre-conscious perception rather than
deliberate awareness.
The spectrum builds on the premise that museum sound
meaning-making unfolds through dynamics internalized from
early childhood as we attune to the world sonically. It
draws on the notion of sound as a "sonic aggregate"
(Grimshaw and Garner 2015), encompassing social, contextual,
temporal, and embodied experiences, rather than reducing
sound to wave phenomena. Visitors actively co-produce
meanings by drawing on their moods, memories, knowledge,
and imagination during exhibition encounters.
Each meaning category is illustrated with exhibition case
studies, demonstrating the spectrum's applicability across
diverse sound-based multimodal museum practices—from
popular music exhibitions to sound art installations. The
spectrum aims to catalyze research through varied
methodological approaches and establish analytical
standards for studying sound in museums, with potential
adoption by international standardization bodies.
Authors
Alcina Cortez

Sound Studies Researcher, INET-md | NOVA University Lisbon
A PhD in ethnomusicology and museum studies and a curator, I am committed to exploring the diverse meaning-making capabilities of sound when exhibited in museums, encompassing the representational, emotional, sensorial, and social, as well as its ability to foster imagination and...
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Perceptual Evaluation of the Open Binaural Renderer
Friday May 29, 2026 12:30pm - 1:00pm CEST
This paper presents the perceptual evaluation of the Open Binaural Renderer (OBR), an open-source library developed for headphone-based rendering of Immersive Audio Model and Formats (IAMF) content. The evaluation followed an iterative framework in which findings from a pilot listening study informed the tuning of rendering profiles, and the resulting renderer was benchmarked against established proprietary solutions. In the pilot study, 19 expert listeners rated the Overall Listening Experience (OLE) of the initial prototype (OBRv1) and five external renderers across diverse audio content. Qualitative feedback was analysed using inductive coding to identify salient perceptual dimensions. The pilot revealed content-dependent performance and showed that a single default profile was inadequate, yielding mixed responses in both the numerical scale and in the qualitative feedback and motivating the development of multiple rendering profiles in OBRv2. The main study evaluated two OBRv2 profiles targeting different reverberation characteristics (Direct and Ambient) alongside three top-performing external renderers. A total of 39 participants, divided into expert and non-expert groups, rated five perceptual attributes: Voice Quality, Envelopment, Externalisation, Overall Listening Experience, and Timbral Balance. Mixed-design ANOVA revealed significant main effects of renderer condition on all attributes. Pairwise comparisons showed that OBRv2 Ambient achieved significantly higher OLE ratings than one proprietary renderer and reached statistical parity with the remaining two, representing a measurable improvement over the prototype. A trade-off between Voice Quality and Externalisation was observed, driven by the level of reverberation in each renderer. The results demonstrate that iterative, perceptually informed tuning can yield competitive binaural rendering quality in an open-source framework.
Authors
FL

Felicia Lim

Google LLC

Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...

Jan Skoglund

Google, Google


Jani Huoponen

Google, Google LLC
With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives...

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

TR

Tomasz Rudzki

University of York
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Evaluation of Objective Speech Intelligibility Metrics for Hearing-Aid Users in Multi-Talker Spatial Environments
Friday May 29, 2026 12:30pm - 1:00pm CEST
Despite the growing number of hearing-impaired workers
wearing hearing aids in occupational settings,
understanding speech in multi-talker situations remains
challenging. This difficulty is particularly pronounced in
open-plan offices, where simultaneous talkers and room
reverberation tend to degrade speech intelligibility.
While spatial cues are essential for segregating target
speech from competing sources, hearing-aid signal
processing may alter binaural information that supports
spatial hearing.
Accurate evaluation of hearing-aid performance is
therefore crucial. Objective speech intelligibility metrics
offer an efficient alternative to time-consuming listening
tests; however, their validity in complex spatial scenarios
involving hearing-impaired listeners remains unclear.
Monaural metrics such as HASPI account for individual
hearing loss but neglect spatial information, whereas
binaural metrics such as MBSTOI incorporate spatial cues
but are primarily designed for normal-hearing listeners.
This study evaluates the ability of existing objective
metrics to predict speech intelligibility for hearing-aid
users in multi-talker spatial environments. Listening tests
are conducted with 20 hearing-impaired participants fitted
with binaural hearing aids. Four types of multi-talker
auditory scenes representative of open-plan offices are
reproduced using a loudspeaker array. Each involves target
speech combined with diffuse noise and a localized
competing speech source. Objective measurements are
performed using an acoustic mannequin fitted with the
participants’ hearing aids. HASPI and MBSTOI values are
computed from the binaural signals recorded at the eardrums,
incorporating individual hearing losses.
Objective predictions are compared with subjective
intelligibility scores, and an ablation analysis is
conducted to distinguish the effects of hearing loss
modeling from those of binaural processing.
Authors
JA

Jean-Pierre Arz

INRS (Vandoeuvre lès Nancy) - Institut national de recherche et de sécurité (Vandoeuvre lès Nancy)
JD

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
LD

Louis Delebecque

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

1:00pm CEST

Assessing Situational Awareness of Hearing-Impaired People Through their Perception of Non-Speech Sound Events: a Literature Review
Friday May 29, 2026 1:00pm - 1:30pm CEST
Situational awareness is a multisensory ability that
enables individuals to perceive and appropriately take into
account their immediate environment. This perception of the
world through our senses is carried out continuously and
unconsciously throughout the day. When auditory perception
is degraded, an individual may no longer correctly perceive
a doorbell, a water leak, or an alarm signal, which
negatively affects quality of life and may lead to
dangerous situations. Auditory perception can in particular
be degraded by hearing loss, a common and widespread
condition. The most common treatment consists of wearing
hearing aids, which are mainly designed to improve speech
intelligibility, especially in noisy environments. Feedback
from hearing-impaired people and hearing-aid users
indicates that, although auditory situational awareness has
been recognised as an essential component of well-being, it
remains insufficiently studied and requires further
investigation. There is currently no standard method for
assessing the extent to which one's situational awareness is
affected by hearing impairment and the use of hearing aids.
This is a complex process that requires assessing the
perception of relevant sound events within a continuous
stream of multisensory information, by individuals who
have different subjective preferences. Most existing
methods are limited to evaluating only a subset of the
problem, such as identification and localisation of
non-speech sound events. The rise of new technologies, such
as virtual reality, enables the development of assessment
methods within more realistic yet controlled environments.
This study aims to review existing methods in order to
highlight their limitations in addressing the issue at hand.
Authors
AF

Adil Faiz

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France
BM

Balbine Maillou

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France

EG

Emma Granier

Université de Lorraine, CNRS, Inria, Loria
JD

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

1:30pm CEST

Transient Evoked Otoacoustic Emissions and Self Reported Sound Exposure
Friday May 29, 2026 1:30pm - 2:00pm CEST
Headphone listening has become an integral part of everyday
life, spanning music consumption, communication, online
media, and, increasingly, computer gaming. These diverse
listening contexts make individual sound exposure highly
variable and difficult to quantify. While music listening
and occupational headphone use have been widely studied,
sound exposure from gaming remains comparatively
undocumented. This study investigated the relationship
between self‑reported exposure through headphones and
cochlear function assessed using transient evoked
otoacoustic emissions (TEOAE). Forty‑one university
students completed a detailed questionnaire on listening
habits, and TEOAEs were recorded in both ears across five
half‑octave frequency bands. Estimated weekly exposure
levels were derived from participants’ reported durations
and contexts of use. TEOAE amplitude, signal‑to‑noise ratio
(SNR), and reproducibility showed clear frequency‑dependent
patterns and small ear asymmetries, consistent with typical
OAE behaviour. Only limited associations were found between
self‑reported exposure and TEOAE measures, with significant
effects emerging primarily for SNR and reproducibility in
the highest‑exposure group. No consistent differences were
observed between long‑term gamers and non‑gamers. These
findings suggest that self‑reported exposure alone may be
insufficient to detect subtle cochlear changes in young
adults, and underscore the need for more precise
exposure‑monitoring methods when evaluating recreational
sound exposure risks.
Authors
DH

Dorte Hammershøi

Professor, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University
RO

Rodrigo Ordoñez

Aalborg University
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A Perceptual Evaluation Method for Binaural Rendering Algorithms via Minimum Audible Angle Measurements
Friday May 29, 2026 1:30pm - 2:00pm CEST
Binaural rendering is typically assessed via timbre and
localization accuracy, while its intrinsic spatial
resolution remains rarely quantified. This paper proposes a
perceptual evaluation method based on Minimum Audible Angle
(MAA) measurements to estimate the azimuthal
just-noticeable difference (JND) introduced by binaural
rendering algorithms. We systematically compared several
rendering algorithms across eight reference azimuths using
two participant-allocation paradigms. The results show that
spatial resolution is significantly influenced by Ambisonic
order and the choice of rendering algorithm, with MAA
thresholds systematically decreasing as the truncation
order increases. Furthermore, the proposed method
successfully captures physiological spatial characteristics
and identifies resolution limits imposed by reference
angles. While both participant-allocation paradigms yield
consistent qualitative trends, the repeated-measures design
provides superior data stability. These findings
demonstrate that the proposed MAA-based method is an
effective tool for quantifying the spatial resolution of
binaural rendering algorithms.
Authors
HZ

Houlin Zhu

Peking University
TQ

Tianshu Qu

Peking University
XW

Xihong Wu

Peking University
YQ

Yufan Qian

Peking University
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Exploring Rendering Variability in Next-Generation Audio Reproduction
Friday May 29, 2026 2:30pm - 3:00pm CEST
This study evaluates three Next-Generation Audio (NGA)
rendering systems through listening tests using real-life
audio content. The testing paradigm prioritized subjective
preference over adherence to a ground-truth reference.
Participants assessed perceptual spatial audio attributes
in both 5.1 and 7.1.4 loudspeaker setups. The findings
suggest that strict adherence to the rendering algorithm
used during content creation is not mandatory in terms of
listener preference. While not advocating disregarding
artistic intent without consideration, this study proposes
that such flexibility in reproduction can be an acceptable
compromise.
Authors
ES

Ema Souza-Blanes

Samsung Research America

Toni Hirvonen

Researcher, Samsung Research America
Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects...
WJ

Wonbeen Jo

Samsung Research
YK

Yongmin Kwon

Samsung Research

Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Saturday, May 30
 

9:30am CEST

The artistic role of the sound engineer in immersive spatialisation. Investigation of the influence of space in the emotional interpretation of sounds.
Saturday May 30, 2026 9:30am - 10:00am CEST
Historically, music has developed primarily as a frontal
phenomenon, thus limiting the expressive and perceptual
potential related to sound space. The recent development of
immersive audio systems opens new creative possibilities by
expanding the artistic action space from a narrow frontal
area to a complete sphere around the listener. The
Ambisonic system (Scene-Based Audio), together with
Object-Based formats and hybrid solutions, represents
fertile ground for creative experimentation and the
redefinition of workflows in the field of spatialized sound.
In this new context, what is the role of the sound
engineer, as an electroacoustic interpreter, in immersive
musical artistic creation?
The research is based on a multidisciplinary analysis that
combines an in-depth study of current immersive audio
technologies and their performance, with observations of
existing compositional and production approaches.
Additionally, a comparative study is conducted on the
design choices of the sound engineer as an interpreter,
investigating workflows, emerging musical semantics,
available tools, and the recovery of the historical
repertoire.
Particular attention is paid to an experiment aimed at
investigating a correlation between the position of a sound
and an emotional trigger in the listener.
New directions emerge in the creative role of the sound
engineer, who goes beyond the mere technical aspect to
become an integral part of the compositional and
interpretative process, harmonizing the relationship
between technique and art.
Authors
LF

Luca Frigo

Conservatorio G. Nicolini Piacenza
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Melodical Mashup of Classical Pieces: How to Maximize Audience Enjoyment?
Saturday May 30, 2026 10:00am - 10:30am CEST
Mashup is a distinctive form of music composition which
integrates elements from existing songs to create a
cohesive audio experience. The digital music landscape,
with various audio processing tools and sharing platforms,
has facilitated the creation and propagation of mashups by
musicians, remixers, audio engineers, and automated
systems. While most prior research and studies focus on
mashups created by combining elements from individual audio
tracks, typically using pop songs, there exist other types
of mashups, for example those incorporating phrases from base
melodies into a new arrangement. In this study, we examined
listener enjoyment ratings for this type of mashup,
utilizing well-known Western classical melodies. A
listening test was conducted to assess whether variations
in pitch, tempo, and familiarity with the source material
correlate with enhanced enjoyment. This paper presents our
preliminary findings, with plans for future studies and
additional survey responses to strengthen the results and
uncover insights for crafting more engaging classical
mashups.
Authors
AD

Anh-Dung Dinh

The Hong Kong University of Science and Technology
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Optimising Sound Effects to Enhance Dialogue Perception in Audio Mixes Using Selective Auditory Attention
Saturday May 30, 2026 11:00am - 11:30am CEST
Dialogue intelligibility is a fundamental aspect of audio
post-production. Ensuring speech clarity in complex sound
mixes remains challenging across different playback
systems. Selective auditory attention plays a central role
in how listeners track dialogue in busy mixes, so small
changes in spectral or spatial structure can influence
perceived clarity in unexpected ways. This study
investigates the effectiveness of psychoacoustically
informed techniques, equalisation and spatialisation, in
reducing auditory masking and improving the clarity of
dialogue. The listening test was completed on participants’
own playback systems, which reflects typical domestic
viewing conditions and aligns the study with real-world
listening environments. The techniques were tested
individually and in combination to assess their impact.
Results show that equalisation was more effective than
spatialisation in reducing masking, while their combination
produced a significant improvement in intelligibility,
clarity, and reduced interference. The effectiveness of
these methods varied between the two groups of clips,
suggesting that their application should be adapted to the
specific acoustic context of each scene.
Authors

Federico Aramini

Edinburgh Napier University
Dialogue and sound editor with 3+ years' experience and 30+ credits in film across feature film, animation, documentary and TV series. Contributed to award-winning and festival-recognised productions, including films screened at the Venice Film Festival and the David di Donatello Awards...
IM

Iain McGregor

Edinburgh Napier University
RS

Rod Selfridge

Edinburgh Napier University
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

The Missing Next Step: Sound, Agency, and Plausibility in Virtual Reality - A Narrative Review
Saturday May 30, 2026 11:00am - 11:30am CEST
Sound plays a critical role in virtual reality (VR),
shaping attention, narrative comprehension, emotional
engagement, and experiential plausibility under conditions
of embodiment and user agency. Although a growing body of
research addresses VR audio techniques, perceptual effects,
and sound taxonomies, existing approaches remain fragmented
and largely descriptive. In particular, they do not provide
a unifying, VR-specific account of how sound meaning and
emotional intent are operationally linked to user agency
and non-linear narrative progression. This paper presents a
narrative review of selected literature spanning game audio
frameworks, immersive sound design, narrative theory, and
plausibility-related research in games and VR. Through
synthesis of these perspectives, the review identifies a
conceptual gap in current research, namely the absence of a
VR-specific, agency-coupled sound design framework for
structuring sound meaning and emotional intent in support
of experiential plausibility as users actively shape events
in interactive VR environments.
Authors

Eve Klein

Senior Lecturer, Music Technology & Popular Music, The University of Queensland, School of Music
Dr Eve Klein is a lecturer in music technology at the University of Queensland, Australia. She is also an operatic mezzo soprano, a composer, and an Ableton Live Certified Trainer. Eve's research is concentrated on music technology, recording cultures and contemporary music. Her current...
NH

Neil Hillman

The Audio Suite
NB

Nilufar Baghaei

The University of Queensland, School of Electrical Engineering and Computer Science
PK

Peter Kurucz

The University of Queensland, School of Electrical Engineering and Computer Science
SS

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark