Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time


LIVESTREAMS : A and B


ON DEMAND VIDEOS (previous days)
 
Subject: Lecture
Thursday, May 28
 

9:00am CEST

Deep Learning-Based Lower-Layer Upmixing
Thursday May 28, 2026 9:00am - 9:30am CEST
This paper introduces a novel approach for generating a
lower layer in multichannel audio upmixing, addressing a
gap in existing methods that primarily focus on mid and top
layers. Leveraging Harmonic-Percussive Separation (HPS),
the proposed framework dynamically adjusts key parameters
(separation factor, harmonic attenuation, and phase shift)
to enhance percussive components while diffusing harmonic
elements. We compared three neural network architectures
for this task: LSTM, TCN, and Transformer. Experimental
results show comparable perceptual quality and objective
metrics across all models, with the TCN being the most
balanced and suitable for deployment on edge devices.
Authors

Ema Souza-Blanes

Samsung Research America

Luis Madrid

Samsung Research Tijuana

Thaddeus Páez

Research Engineer, Samsung Research Tijuana
Research Engineer at Samsung Mexico.
Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Design and Optimization of Acoustic Lenses for Audible Frequency
Thursday May 28, 2026 9:00am - 9:30am CEST
Acoustic lenses are structures that enable the focusing of
acoustic waves, with increasing applications in audio
devices like loudspeakers to concentrate energy toward a
listening position. While typically employed at higher
frequencies, achieving effective performance within the
audible frequency range remains a significant challenge due
to long acoustic wavelengths, which necessitate structures
of substantially larger dimensions.
This paper addresses the design of an acoustic lens
dedicated to operation in the audible range. The proposed
lens is composed of periodically arranged acoustic unit
cells, enabling precise control over both the sound
transmission coefficient and the phase delay. A parametric
analysis of a single acoustic unit cell was performed,
followed by global optimization of the complete lens
structure using the Particle Swarm Optimization (PSO)
algorithm. The outcome of the study is an acoustic lens
design with predefined properties that demonstrate the
desired directional characteristics. The findings highlight
the potential of this approach for effectively manipulating
the acoustic wave field and the directivity of sound
sources within the audible frequency range.
Authors

Jadwiga Hyla

AGH University of Krakow

Jarosław Rubacha

AGH University of Krakow
Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Equipment, Lecture

9:30am CEST

Spectral Optimization for Automatic Multitrack Mixing Using Answer Set Programming
Thursday May 28, 2026 9:30am - 10:00am CEST
The mixing stage in music production involves a complex set
of interdependent technical and creative decisions aimed at
achieving a coherent and industry-level result. Intelligent
Music Production (IMP) is an emerging research area that
integrates Artificial Intelligence techniques into music
creation and post-production processes, spanning from
composition to mastering. Within this context, Answer Set
Programming (ASP), a declarative paradigm from Knowledge
Representation and Reasoning, has proven effective for
modeling and solving complex optimization problems. This
article presents frmixerr, an ASP-based intelligent system
designed to optimize the mixing process by automatically
generating balanced mixes. The system formulates mixing as
a combinatorial optimization problem and evaluates
candidate solutions against a reference spectral profile.
To assess its performance, a subjective listening test was
conducted comparing mixes generated by frmixerr with mixes
produced by human engineers with varying levels of
professional experience. The results indicate no
significant differences in perceived quality between
frmixerr mixes and those created by professionals, suggesting
that ASP constitutes a viable approach for intelligent
assistance in music mixing.
Authors

Carlos Benítez

Tec de Monterrey

Flavio Everardo

Tec de Monterrey, University of Potsdam
Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Mutual coupling investigation of bass horn loaded speakers
Thursday May 28, 2026 9:30am - 10:00am CEST
In today’s live and electronic music events, some sound
reinforcement systems use horn-loaded bass speaker cabinets
to provide the low-end section. Especially for electronic
music applications, the PA system is designed to use one or
multiple clusters of bass cabinets to provide the needed
SPL and impact in the low frequency range. Despite being
large and heavy, horn-loaded bass speakers have some
advantages, such as efficiency and directivity, which make
them a great option for electronic music. Moreover,
enthusiasts describe them as having a longer projection of
the sound when compared with bass reflex units. When used
in clusters, the bass horns present a mutual coupling due
to a larger mouth surface area and the physics behind it.
This effect alters the working parameters in a way that
benefits sound reproduction and is clearly noticed at high
levels. This mechanism increases the output close to the
low edge of the frequency response interval and changes the
directivity pattern. A cluster of four or six double 18”
horn-loaded bass bins placed in the front middle of a dance
area will provide good impact, described as a “punchy”
sound, so acclaimed in the electronic music party scene. In
this paper I will describe an investigation of the mutual
coupling between horn cabinets using electrical and
acoustical measurements to reveal the above-mentioned
mechanism. Electrical impedance measurements together with
SPL and frequency response in coupled and uncoupled
scenarios are used to describe and demystify the mutual
coupling phenomenon.
Authors

Aurelian Botau

Sound system design engineer, Resound
Sound system design and calibration engineer.
I am running a company providing professional sound systems and DJ equipment rental. Sound system setup design, numerical simulations and technical support are included in the portfolio.
Horn speakers and Vacuum tube amplifiers enthus...
Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Experimental study of sound zone methods for indoor/outdoor active noise cancellation
Thursday May 28, 2026 10:00am - 10:30am CEST
The development of personal sound zone systems in recent
years shows great potential for low-frequency noise control
outside of noisy spaces. These approaches show promising
applications to manage noise pollution arising from
concerts in large venues or urban festivals. However, most
of the literature considered that the created sound zones
would exist in the same room or acoustic space as the noise
source. This premise hence discards all setups where the
disturbances would occur outside of concert venues (e.g., in
neighboring houses). This paper presents a first
experimental study of the behavior of sound zone methods
for indoor sound zones and outdoor noise sources. These
initial results present a good efficiency of these methods
in this edge case, opening new use cases for these
approaches.
Authors

Lucas Hocquette

L-Acoustics

Yves Pene

Research Engineer, L-Acoustics
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Beyond Species Identification: Real-Time Spatial Interaction Analysis in Avian Bioacoustics Using Microphone Arrays and Hybrid Beamforming on Edge Architectures
Thursday May 28, 2026 10:00am - 10:30am CEST
Conventional ornithological monitoring systems rely heavily
on single-channel recorders and deep learning classifiers
to identify "what" species is present, but fail to capture
"where" it is located or how individuals interact
spatially. This limitation hinders the study of complex
ecological behaviors, such as inter-specific spacing in
dense vegetation and predator-prey dynamics. We propose a
novel, dual-mode acoustic localization system designed to
unify semantic classification and spatial tracking.
Utilizing an economically scalable 16-channel Uniform
Rectangular Array (UMA-16) interfaced with edge-computing
platforms, we implement a hybrid spatial filtering pipeline
structured to balance real-time latency constraints with
achievable angular resolution. The first stage employs a
computationally efficient, noise-robust linear scanning
technique to generate an acoustic energy map and estimate
source multiplicity. This preliminary data initializes a
second-stage, super-resolution spectral estimation
algorithm predicated on signal-noise subspace
orthogonality, combining the noise robustness of
non-parametric beamforming methods with the precision of
parametric approaches. By integrating these spatial filters
with standard deep learning classifiers, the system
resolves overlapping vocalizations in "Cocktail Party"
scenarios and improves Signal-to-Noise Ratio (SNR) for
cryptic species detection. We address the physical
"Localization-Detection Range Disparity," demonstrating
that while detection is viable at long ranges, precise
localization is constrained by the array aperture to the
near-to-mid field. The system outputs real-time video
overlays of acoustic heatmaps for field observation and
generates autonomous volumetric territory maps in fixed
deployments, collectively providing ornithologists with a
robust capability for analyzing the spatial ecology of
avian vocalizations.
Authors

Emre Göktuğ AKTAŞ

Istanbul Technical University

Mesut Kartal

Istanbul Technical University
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Comparative Quantitative Analysis of Immersive Mixing Practices: Tracking Spatial Trends in Award-Winning and Popular Streaming Media
Thursday May 28, 2026 10:00am - 10:30am CEST
Since 2021, 7.1.4 musical content has transitioned from a
niche specialty to a mainstream commercial deliverable
within major streaming ecosystems. However, industry
discourse indicates a disparity in how the immersive stage
is utilized across different production tiers. This paper
presents a targeted quantitative study of thirty 7.1.4
tracks (N = 30 total; 15 per category; 2021–2026),
employing a matched-pair sampling strategy driven by the
availability of 'Established Excellence' (Grammy
Award-winning/nominated immersive albums) against
genre-equivalent 'Market Dominance' (top-charting streaming
tracks). The study utilizes a multi-parameter measurement
methodology, including Inter-Channel Cross-Correlation,
hemispheric symmetry and spatial width analysis.
Furthermore, vertical spectral centroid distribution and
channel occupancy (Center and LFE) are analyzed to identify
recurring structural immersive design markers. Preliminary
findings suggest a consistent forward-facing bias and lower
activity in select channels in charting commercial releases
compared to award-recognized counterparts. By documenting
these technical indicators, such as quarter-sphere
correlation and LFE handling differences, this study
establishes a benchmark for current immersive mixing
practices and highlights the technical indicators that may
limit the transition from enhanced stereo to true immersive
envelopment.
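As a rough illustration of one measurement named in the abstract, inter-channel cross-correlation between two channels can be sketched as the peak of the normalized cross-correlation inside a small lag window. The function name, the 48-sample lag window, and the random test signals are assumptions made for this sketch and are not taken from the paper, whose exact measurement procedure is not described here.

```python
import numpy as np

def inter_channel_correlation(a, b, max_lag=48):
    """Peak |normalized cross-correlation| of two equal-length channels
    within +/- max_lag samples (hypothetical helper, illustration only)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    norm = np.sqrt(np.sum(a**2) * np.sum(b**2))
    if norm == 0.0:
        return 0.0  # a silent channel carries no correlation information
    n = len(a)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.sum(a[lag:] * b[:n - lag])
        else:
            c = np.sum(a[:n + lag] * b[-lag:])
        best = max(best, abs(c) / norm)
    return best

# Assumed test signals: identical channels versus independent noise.
rng = np.random.default_rng(0)
left = rng.standard_normal(4800)
right = rng.standard_normal(4800)
print("identical channels:", inter_channel_correlation(left, left))
print("independent noise: ", inter_channel_correlation(left, right))
```

Values near 1 indicate strongly correlated (narrow) channel pairs, while values near 0 indicate decorrelated (wide) ones.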
Authors

Can Murtezaoglu

Research Assistant, Istanbul Technical University
Immersive audio recording and mixing techniques, audio design for visual media
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Immersive Audio, Lecture

10:30am CEST

Effect of an Active Acoustic Reinforcement System on Musical Performance in a Recording Studio
Thursday May 28, 2026 10:30am - 11:00am CEST
This work presents the results of a perceptual study
investigating the influence on musicians of a virtual
acoustics system installed in the live room of a
professional recording studio. The study focused on
analyzing relationships between a selection of objective
acoustic parameters (T30, STLate, LJ) and subjective
perceptions of 19 solo musicians performing under 11
different acoustic conditions. The experiment was conducted
using the VAT (Virtual Acoustic Technology) system and the
VAT Suite software developed at the Immersive Media
Laboratory (IMLab) in the Sound Recording Department at
McGill University. Correlations between quantitative and
qualitative analyses show that musicians’ preferences
converge on conditions with T30 ≈ 1 s, and that late and
lateral energy increases the perception of spatiality,
providing a positive balance between clarity and acoustic
support. However, longer reverberation reduces comfort and
executive control.
Authors

Gianluca Grazioli

Montreal, Canada, McGill University

Richard King

McGill University, McGill University
Montreal

Wieslaw Woszczyk

McGill University
Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Confidently Wrong: Evaluating AudioSet-Trained Models Under Real-World Deployment
Thursday May 28, 2026 10:30am - 11:00am CEST
Audio event-classification models trained on AudioSet are
widely adopted and form a central component of the state of
the art in machine listening, yet their behavior when
deployed in complex, open acoustic environments remains
largely unexplored. In this study, we evaluate several
widely adopted AudioSet-pretrained architectures
(particularly models from the PANNs family, including
MobileNetV2 and Wavegram, and the Transformer-based PaSST
model) when applied to a real operational scenario at
the commercial Port of Valencia, Spain. We observed a
recurring and systematic unexpected behavior: the models
frequently assigned disproportionately high probability to
the class Music for non-musical industrial and
transportation sounds. These mislabeled events included
train-wheel squealing, motorcycle acceleration, emergency
sirens, and reversing beeps, sound categories that are
common in port logistics environments but acoustically
different from music. By analyzing the probability
distributions output by the models, we demonstrate that
this erroneous Music activation is not an isolated failure
but a pervasive pattern across several architectures. Our
findings highlight a critical gap in the robustness and
domain generalization of AudioSet-derived models and
emphasize the need for targeted adaptation techniques when
deploying them in real industrial settings.
Authors

Javier Naranjo Alcazar

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Jordi Grau de Haro

Instituto Tecnológico de Informática


Marta Garcia Ballesteros

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Ruben Ribes Serrano

Instituto Tecnologico de Informatica (ITI), Paterna, Spain
Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Nonlinear viscoelasticity in loudspeaker suspensions
Thursday May 28, 2026 10:30am - 11:00am CEST
Damping in viscoelastic materials such as rubbers is often
desirable, especially in loudspeaker suspensions. Under
high strain loads however, viscoelastic materials can also
exhibit a hysteretic stiffness behavior, causing a
stiffness decrease with amplitude. In this study, we
examine the viscoelastic rubber suspension of a
loudspeaker, using the loudspeaker motor system as actuator
and sensor. From measurements we observe the hysteretic
force-displacement behavior and pronounced odd-order
harmonic distortion even at low amplitudes, in accordance
with the literature. We further explore a
macro-thermodynamic plastic flow model to model the
stiffness of viscoelastic materials. The results show that
the plastic flow suspension model explains and replicates
the observed nonlinear hysteretic behavior. We also show
that a fitted time-domain loudspeaker model including
plastic flow matches the measured distortion profile. In
contrast, models with polynomial stiffness and viscous
damping fail to explain the observed amplitude dependencies
such as odd order harmonic levels. The experiments
demonstrate that viscoelastic hysteresis occurs not only at
high but also at low amplitudes, where the elastic
stiffness is approximately linear.
Authors

Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.


Manuel Hahmann

Dynaudio A/S
Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Audio data augmentation techniques for frame drum stroke recognition
Thursday May 28, 2026 11:00am - 11:30am CEST
This work addresses the problem of frame drum (bendir)
stroke technique recognition in simulated real-world
conditions. The traditional frame drum technique includes
three discrete strokes that are used to create rhythmic
patterns: dum, tek and slap. In the presented work, audio
data augmentation is investigated on a dataset containing
recordings of instruments of various construction
attributes. The used techniques are selected in the
direction of generalizing classification in real-world
conditions. Moreover, the mixing of the frame drum samples
with accompanying guitar chords is introduced, simulating
the more complicated problem of hit technique recognition
when playing in a duo. The application of the
aforementioned data augmentation leads to the formation of
different available datasets for training and testing. Two
convolutional neural network architectures (one- and
two-dimensional) are taken into consideration, trained on
waveforms and mel-scale spectrograms of the different
subsets, respectively.
Authors

Antonis Pagonis

Pagonis Percussion

Charalampos Dimoulas

Aristotle University of Thessaloniki

Labros Vasileiou

Aristotle University of Thessaloniki

Nikolaos Vryzas

Aristotle University of Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering in the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master degrees on Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...
Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Input-output linearization of loudspeaker dynamics via automatic differentiation
Thursday May 28, 2026 11:00am - 11:30am CEST
Input-output linearization is a technique for compensating
nonlinear distortion in loudspeakers. To apply it to
complex loudspeaker models, we describe an end-to-end
framework for estimating model parameters from data and
deriving the linearizing control laws using automatic
differentiation. The parameter estimation approach combines
frequency-domain linear parameter estimation with a
time-domain prediction-error method for the nonlinear
parameters. The linearization approach supports non-linear
reference systems and stabilization of the control law
using trajectory tracking. We implement the framework in
dynax, an open-source Python package based on JAX, and
validate it experimentally as a feed-forward controller on
a closed-box loudspeaker. Results demonstrate validation
errors of 1–5 % NRMSE and total harmonic distortion
reductions of 6–12 dB. The framework enables researchers
and engineers to rapidly prototype and validate complex
loudspeaker models for distortion compensation without
manual symbolic derivations.
Authors

Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Comfortability analysis of immersive sound playback system for cabin noise based on frontal lobe fNIRS experiment: an application of 4th order ambisonics
Thursday May 28, 2026 11:00am - 11:30am CEST
This study introduces a fourth-order Ambisonics-based decoding system to reproduce railway cabin running noise in a studio environment, enabling enhanced spatial impression and detailed sound field variation. Real-world operational noise was recorded using a multichannel fourth-order Ambisonics microphone (Eigenmike® EM32, mh acoustics LLC, USA), and the reproduced sound field was implemented through a multichannel loudspeaker system. The reproduced signals were quantitatively compared with the original operational noise in terms of spectral variation and waveform distortion.
Authors

Yonghee Lee

Research Associate, Changwon National University
Yonghee Lee
Ph.D., Mechanical Engineering.
Ultrasonic, Acoustic, SHM, NDE, fNIRS, and Bio-medical engineering.
Contact: [email protected]
Institute: Changwon National University, South Korea
Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Immersive Audio, Lecture

11:30am CEST

System-Level Remapping for Electronic Music Spatial Reproduction: A Case Study of the Cross-Venue Reperformance of Symphonic Coding
Thursday May 28, 2026 11:30am - 12:00pm CEST
Taking the premiere and reperformance of the sci-tech symphonic suite Symphonic Coding as a case study, this paper discusses audio system organization, sound diffusion, and cross-venue migration in the co-performance of symphonic and electronic music. Given the challenges of diverse live inputs, real-time control of the electronic music part, concurrent recording and live streaming, and varying acoustic conditions, the article analyzes how a single workflow handles traditional miking, electronic music generation and control, live spatial diffusion, and multi-purpose distribution. The study is structured across four levels: system design requirements, signal organization, dual-venue implementation, and engineering discussion. It illustrates the development of an interconnected workflow comprising Content, Rendering, and Distribution Layers through mixing console organization, immersive rendering, and AoIP distribution. Results indicate that the significance of this work lies not in the reproduction of the listening experience of the entire performance, but in enabling the spatial presentation of the electronic music part to remain valid across different environments based on a consistent reference. Furthermore, the project enhances reperformance capability and production flexibility through the separation of functions, roles, and systems.
Authors

Chuhan Gao

Communication University of China

Xiuquan Yao

Communication University of China

Yilong Zhang

Communication University of China
Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:30am CEST

Virtualization-Based Mechanical Loudspeaker Protection Using Nonlinear Wave Digital Modeling
Thursday May 28, 2026 11:30am - 12:00pm CEST
Mechanical overload remains a primary limitation in
high-output loudspeaker operation, particularly at low
frequencies where large coil excursions are required.
Conventional mechanical protection strategies are typically
implemented as signal-domain limiters or filters, which act
indirectly on the loudspeaker’s mechanical state and may
introduce discontinuities, spectral modification, or
unnecessary attenuation.

This paper proposes a methodological framework for
mechanical loudspeaker protection based on the
virtualization of admissible system behavior. The approach
is formulated within a nonlinear wave digital loudspeaker
model and realized using a direct–inverse–direct
architecture. Mechanical protection is embedded directly
into the virtual loudspeaker dynamics by shaping the
nonlinear suspension compliance as a function of voice-coil
displacement. As the excursion approaches a prescribed
admissible limit, the virtual compliance is progressively
reduced using a smooth raised-cosine law, resulting in a
continuous increase of the virtual mechanical stiffness.
Excessive excursion is therefore prevented as a consequence
of the system dynamics, without explicit limiting,
clipping, or signal-domain intervention.
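The compliance shaping described above can be sketched as follows. This is a minimal illustration only: the abstract does not give the exact law, so the onset displacement `x_onset`, the limit `x_limit`, the nominal compliance `c0`, and the small compliance floor `c_min_ratio` (added here so the stiffness stays finite) are all assumed names and values, not the author's formulation.

```python
import numpy as np

def virtual_compliance(x, c0, x_onset, x_limit, c_min_ratio=1e-3):
    """Displacement-dependent compliance with a raised-cosine taper.

    Below x_onset the compliance equals c0; between x_onset and x_limit it
    falls smoothly along a raised cosine; beyond x_limit it stays at the
    floor c0 * c_min_ratio. All parameters are hypothetical.
    """
    ax = np.abs(np.asarray(x, dtype=float))
    t = np.clip((ax - x_onset) / (x_limit - x_onset), 0.0, 1.0)
    taper = 0.5 * (1.0 + np.cos(np.pi * t))  # 1 at onset, 0 at the limit
    return c0 * (c_min_ratio + (1.0 - c_min_ratio) * taper)

def virtual_stiffness(x, c0, x_onset, x_limit, c_min_ratio=1e-3):
    """Stiffness is the reciprocal of compliance, so it rises continuously."""
    return 1.0 / virtual_compliance(x, c0, x_onset, x_limit, c_min_ratio)

# Assumed example values: 1 mm/N nominal compliance, taper from 4 mm to 6 mm.
for x_mm in (0.0, 4.0, 5.0, 6.0):
    k = virtual_stiffness(x_mm * 1e-3, c0=1e-3, x_onset=4e-3, x_limit=6e-3)
    print(f"x = {x_mm:.1f} mm -> stiffness = {float(k):.1f} N/m")
```

Because the taper and its first derivative are continuous at both ends of the transition, the restoring force has no kink at the onset or at the limit, which is the property the abstract attributes to the smooth law.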

The proposed framework is evaluated through numerical
simulations using steady-state low-frequency sinusoids and
low-frequency sine bursts under free-air loading. Results
are compared against an unprotected loudspeaker and a fixed
high-pass filter configured to meet the same excursion
constraint. The simulations verify that the proposed method
enforces a soft excursion ceiling without discontinuities,
preserves low-frequency output in the near-limit operating
region, and exhibits stable and immediate recovery
following transient excitation. Distortion behavior is
characterized and shown to increase smoothly as a result of
the introduced mechanical nonlinearity.

The results demonstrate that mechanical protection can be
realized as an emergent property of a virtual loudspeaker
model rather than as an external control action. The
proposed approach provides a physically interpretable and
numerically robust foundation for virtualization-based
loudspeaker protection.
Authors

Lucio Bianchi

Elettromedia s.p.a.
Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Equipment, Lecture

11:30am CEST

The efficacy of phantom image perception: an active listener perspective.
Thursday May 28, 2026 11:30am - 12:00pm CEST
A “phantom image” is the illusion of an independent sound
source created by two or more loudspeakers. Most often
created by manipulating level differences between
stereophonic channels (aka, “panning”), the effect is used
to create a sense of auditory space between loudspeakers
and is largely taken for granted. In recent years,
surround and immersive audio systems have attempted to
utilize phantom image processing to render audio objects in
desired positions across multiple loudspeaker arrays. This
research examined the efficacy of phantom image perception
horizontally and vertically from an active listener
perspective. After listening to a target loudspeaker,
listeners (n = 442) were asked to move a phantom sound to a
position to match that of the target loudspeaker. The
listener’s phantom placement was then compared to the
target, and subjects were allowed to “correct” their phantom
position. The horizontal experiment was based on a
standard stereophonic 60° loudspeaker array with the target
loudspeaker at 15° off center. The vertical experiment
utilized elevated loudspeakers in a 60° arc with the target
loudspeaker elevated 10° above the horizon (lower
loudspeaker). Results show nearly universal “undershoot” in
horizontal placement error on first attempts with gradual
improvement over trials that coalesced around the projected
target location. However, after repeated tries, final
perceptual image locations were spread over 2/3 of the
sound-field around the target loudspeaker. In the vertical
trials perceptual locations were spread across the entire
sound field in all three trials and failed to show any
patterns of coalescence around the target loudspeaker.
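For readers unfamiliar with level-difference panning, the sketch below uses the classic stereophonic tangent law to compute the gains that would nominally place a phantom image 15° off center in a 60° (±30°) array. The abstract does not say which panning law the experiment used, so the tangent law, the constant-power normalization, and the function names are assumptions made purely for illustration.

```python
import math

def tangent_law_gains(theta_deg, half_angle_deg=30.0):
    """Constant-power stereo gains from the tangent panning law.

    theta_deg is the intended image angle (positive towards the left
    loudspeaker); half_angle_deg is half the loudspeaker base angle.
    The law states (gL - gR) / (gL + gR) = tan(theta) / tan(half_angle).
    """
    ratio = math.tan(math.radians(theta_deg)) / math.tan(math.radians(half_angle_deg))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g_left, g_right)  # normalize so gL^2 + gR^2 = 1
    return g_left / norm, g_right / norm

def level_difference_db(g_left, g_right):
    """Inter-channel level difference in dB (requires g_right > 0)."""
    return 20.0 * math.log10(g_left / g_right)

g_l, g_r = tangent_law_gains(15.0)
print(f"gL = {g_l:.3f}, gR = {g_r:.3f}, "
      f"level difference = {level_difference_db(g_l, g_r):.2f} dB")
```

The reported placement errors suggest that a nominal level difference of this kind does not guarantee that listeners perceive the image at the intended angle.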
Authors

Song Hui CHON

Associate Professor, Belmont University
Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."

Wesley Bulla

Belmont University
Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A New Reference Target Curve for Studio Headphones
Thursday May 28, 2026 1:30pm - 2:00pm CEST
Target curves for the sound signature of headphones are a
helpful design target during the development process. While
a lot of attention has been paid to finding target curves
that match the listening preference of consumers,
equivalents for studio headphones date back to the 90’s. In
the context of music production, a mutual target or even
standard is essential to make mixing and mastering more
gear-independent. This becomes even more important since
the main tool for sound engineers shifts from loudspeakers
in professional environments such as acoustically treated
studios to headphones, often additionally equipped with
virtualization algorithms. This enables them to be more
flexible and to rely less on potentially expensive
loudspeaker setups. The diffuse field target curve that is
currently still the only standardized target curve for
studio headphones is often reported to not match a real
loudspeaker-equivalent of studio environments. In this
paper, we aim to find a new standard target curve for
studio headphones emulating the frequency response of a
loudspeaker setup in modern studio environments.
For this, we give an overview of current target curves and
match them to their equivalent loudspeaker setups.
Based on that we propose a new methodology for a
measurement-based target curve incorporating typical
panning paradigms of music signals based on measurements
inside multiple control rooms. To verify the results, we
conduct listening tests with professionals in multiple
studio environments.
Authors

Jonas Foerster

Signal Processing Engineer, beyerdynamic GmbH & Co. KG
Passionate about Headphones, Signal Processing and their interaction.

Focus on headphone target curves, spatial audio and ANC

Lukas Keppler

beyerdynamic GmbH & Co. KG
Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Joint Neural Translation and Classification of Videos for Audio Processing
Thursday May 28, 2026 1:30pm - 2:00pm CEST
A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve ≈90% classification
accuracy on a test set using a combined translation and
classification model (with only ~22K parameters),
demonstrating a globally-scalable approach for robust
classification on the edge.
Authors
AC

Alejandro Cajica

Samsung Research Mexico
avatar for Sunil Bharitkar

Sunil Bharitkar

Samsung Research America

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Personalized VR for hearing research with embedded devices
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded Artificial Intelligence hardware, and
hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.
Authors
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
SS

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

The Perception and Measurement of Nonlinear Distortion in Headphones
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Few studies exist on the perception and measurement of
nonlinear distortion in headphones. This paper reports the
detection thresholds and perceived sound quality from real
distortion in headphones. Five different distortion
measurements were made on the headphones to determine how
well they predict audibility and quality. Music samples
were binaurally recorded on six headphones at playback
levels ranging from 85 to +110 dBA at 3 dB increments. The
recordings were reproduced at a normal playback level (83
dBA) through a reference headphone with low distortion. The
headphone recordings were post-processed to remove both
level and frequency response differences so only nonlinear
distortions and residual noise remained. In a second test,
listeners rated the similarity in quality of headphones
relative to an undistorted reference and a hidden version
of it. The results provide evidence that audible distortion
in headphones with music occurs at significantly higher
playback levels (104 to 112 dBA SPL) than what is
considered typical and safe. The percentage of measured THD
in the headphone had the highest correlation with the
detection thresholds while the non-coherent distortion with
music best predicted the similarity ratings. We discuss the
results and the practical implications they might have on
future headphone design, testing and measurement.
Authors
avatar for Sean Olive

Sean Olive

Audio Consultant, Sean Olive Audio Consulting
United States
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation
Thursday May 28, 2026 2:00pm - 2:30pm CEST
This work presents a perceptual model based on a complex
IIR filterbank. The filterbank with a frequency resolution
of 4 bands per Bark consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are used to obtain masking thresholds
with the following post-processing. To obtain reasonable
masking thresholds from the spreading outputs, a
postmasking stage is required. Here, we propose a
comodulation-dependent adaptation of the postmasking decay
to model Comodulation Masking Release (CMR) effects. This
approach explicitly considers the dip-listening effect
known from literature. The final masking thresholds are
obtained by weighting the postmasking outputs by a
tonality-dependent gain, controlled using spectral flatness
estimation. A listening test compares the proposed method
to an already known approach using direct CMR-based
modification of the
masking threshold gains.
Authors
BE

Bernd Edler

International Audio Laboratories Erlangen, Germany
FS

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Sound source localization and identity tracking are
fundamental tasks in acoustic scene analysis, enabling
machines to determine what, where and when produces sound
events. While deep attractor-based networks have
demonstrated improved performance under an unknown number
of sources, maintaining continuous source tracking over
long-form audio remains challenging due to memory
limitations and permutation ambiguities across adjacent
segments. In this paper, we propose a Recursive Attractor
Network (RANet) for long-form sound source localization and
identity tracking with a variable number of sources. RANet
explicitly represents source attractors as transferable
embeddings and recursively propagates them across adjacent
audio segments using an LSTM-based model, thereby preserving
source identity continuity over time. Experimental results
on simulated datasets demonstrate that RANet achieves
robust long-form sound source localization and consistent
source identity tracking, outperforming baseline approaches
under variable and dynamic source conditions.
Authors
JD

Jiaqi Du

Peking University
TQ

Tianshu Qu

Peking University
XW

Xihong Wu

Peking University
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Optical MEMS microphones leverage architectural advantages to achieve 80dB SNR
Thursday May 28, 2026 2:30pm - 3:00pm CEST
There are three architectural approaches to
microelectromechanical systems (MEMS) microphones,
miniature devices used in a wide range of products.
Capacitive microelectromechanical systems (MEMS)
microphones are embedded in billions of consumer
electronics. Solder-compatible and providing tight
part-to-part sensitivity matching—all in a small
footprint—capacitive MEMS microphones have demonstrated
improved performance in recent years. State-of-the-art
digital capacitive MEMS microphones can now achieve up to
72dB signal-to-noise ratio (SNR), with a 22dBA noise floor
and overall dynamic range on the order of 106 dB.

However, capacitive MEMS microphone technology has now
reached the limits of its architecture, which constrains
the key audio performance metrics: SNR and acoustic
overload point (AOP).

Piezoelectric MEMS microphones have not demonstrated SNR
performance exceeding 65dB, and require new materials to be
developed to increase their performance.
Optical MEMS microphones (a new architectural approach that
combines a laser optical subsystem, a MEMS and advanced
CMOS circuit design) have exceeded the limits of capacitive
technology. With 80dB SNR supporting a 14 dBA noise floor,
132 dB dynamic range, and a 146dB AOP, optical MEMS
microphones accomplish studio-quality performance in a tiny
form factor that supports semiconductor-level yields in
high-volume manufacturing.

This presentation will explain the architectural
advancements of optical MEMS microphones in comparison to
capacitive MEMS microphones. It will provide example use
cases of high-SNR and high-AOP microphones in high-volume
applications.
Authors
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

EMORSION – Examining the Impact of Audio Features on Emotional Responses and Immersion in Film.
Thursday May 28, 2026 2:30pm - 3:00pm CEST
EMORSION is an exploratory study examining how film audio
design shapes audience emotion and immersion. It was
conducted using scenes from four films in the horror (2)
and drama (2) genres, with two mainstream and two
independent productions. For each scene, multiple
alternative audio mixes were created by systematically
manipulating three core aspects of audio design: frequency
(pitch), dynamics (loudness), and directionality (spatial
placement). Three audience groups were exposed to the
scenes in a cinema setting, with each group experiencing
either one manipulated audio mix or a control mix.
Audience responses were assessed through a multimodal
framework combining self-reported emotion and immersion via
a questionnaire, and physiological measures, including
heart rate monitoring and video-based motion tracking.
Results show that subtle changes in audio design
significantly affect emotional perception and immersion.
Unconventional mixes produced greater variability in
interpretation, while conventional immersive mixes led to
stronger agreement across audiences. Notably, participants
often reported perceived visual changes despite no
alterations to the visual content.
Authors
CS

Charalampos Saitis

Queen Mary University of London
GF

George Fazekas

Queen Mary University of London
avatar for Josh Reiss

Josh Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...
avatar for Nelly Garcia

Nelly Garcia

PhD Researcher, Queen Mary University of London
I'm Nelly Garcia.
I'm an engineer in communications and electronics with the specialty in acoustics.
Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London.
My main interest is sound design, ways to create sounds from scratch, optimize the workflow of a sound designer and innovative ways to label, categorise or access samples...
avatar for Ruby Crocker

Ruby Crocker

Queen Mary University of London
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Sound Absorber Estimation with Deep Neural Network
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Boundary conditions are a critical part of room acoustic
simulations. In the case of ray tracing, absorption
coefficients of nearly all materials are measured and
provided. However, wave-based simulations face several
issues. The first one is the variety of boundary conditions
used. Depending on the method, surface impedance or
admittance might be needed, either in the frequency or in
the time domain, as an angle-dependent or averaged
variable. This limitation hinders the development of a
standard measured quantity for boundary conditions in
wave-based simulations. In turn, this leads to the second
issue encountered, which is the lack of widely available
data to describe the characteristics of the different
materials commonly found in rooms. In this study, a deep
neural network has been trained to estimate the material
properties of porous absorbers from their absorption
coefficient in octave bands. These estimated material
properties can then be used to calculate any boundary
condition needed. This method thus makes it possible to characterize
the boundary conditions for any type of room acoustic
simulation from the most commonly available data. Moreover,
it provides a new tool to identify the sound absorber
corresponding to a desired absorption profile during the
design phase of a project. The training dataset in this
study was generated from finite element method simulations.
The poroelastic properties of the material, the sample
thickness, as well as the depth of the air cavity backing
the material were varied to create the training dataset.
Authors
BM

Boris Mondet

COMSOL A/S
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which
deep-learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the framework
provides a scalable, perceptually grounded approach to
interpreting listener-preference data when developing
headphone target curves.
Authors
avatar for Gabriele Ravizza

Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology
▪  Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...
avatar for Julian Villegas

Julian Villegas

University of Aizu
Japan
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions, including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration, none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
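
As a rough illustration of the QDI definition in this abstract (not the authors' implementation), the band-energy ratio could be sketched as follows. The function name, the inclusive band edges, and the six-bin example spectrum are assumptions made for illustration; the abstract does not say how the spectrum is estimated.

```python
# Hypothetical sketch: energy in the two sa quintina bands divided by
# total spectral energy, for a single power spectrum.

QUINTINA_BANDS_HZ = ((600.0, 750.0), (1200.0, 1400.0))  # bands named in the abstract


def quintina_directionality_index(freqs_hz, power, bands=QUINTINA_BANDS_HZ):
    """Return band energy / total energy for one power spectrum.

    freqs_hz and power are equal-length sequences (bin frequency, bin energy).
    Band edges are treated as inclusive, which is an assumption.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("total spectral energy must be positive")
    in_band = sum(
        p
        for f, p in zip(freqs_hz, power)
        if any(lo <= f <= hi for lo, hi in bands)
    )
    return in_band / total


# Made-up six-bin spectrum, for illustration only.
example_freqs = [500.0, 650.0, 700.0, 1000.0, 1300.0, 2000.0]
example_power = [1.0, 2.0, 3.0, 1.0, 2.0, 1.0]
example_qdi = quintina_directionality_index(example_freqs, example_power)
```

Evaluating the index as a function of direction or microphone position would then amount to calling the function once per direction or position and comparing the resulting values.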
Authors
FB

Felicita Brusoni

PhD candidate, Musikhögskolan i Malmö, Lund University
LF

Luca Frigo

Conservatorio G. Nicolini Piacenza
MS

Martino Sarolli

Conservatorio Paganini Genova
RD

Riccardo Dapelo

Conservatorio Nicolini Piacenza
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Center Extraction GAN
Thursday May 28, 2026 3:30pm - 4:00pm CEST
This paper presents a method for extracting a center signal
from two-channel stereo signals for upmixing and
reproduction with additional center loudspeakers.
It uses a generative adversarial network with a generator
trained with multiple reconstruction losses and adversarial
losses obtained from a discriminator.
The processing is of low computational complexity, causal,
and can be configured for latencies down to one audio frame
of 46 ms length.
It is described how training data are created using only
publicly available signals and how the generation of target
data enables control of the attenuation of diffuse signals
and direct signals panned off-center.
An evaluation with a listening test and the computational
metrics SI-SDR and F2 measure is presented.
It shows an advantage compared to methods based on
classical signal processing in terms of computational
metrics for source separation and listeners' preference.
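
For readers unfamiliar with SI-SDR, one of the metrics named in this abstract, the commonly used definition can be sketched as below. This is a generic illustration, not the authors' evaluation code; the function name and the handling of the degenerate zero-energy cases are assumptions.

```python
import math


def si_sdr_db(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB (common definition)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference must not be silent")
    # Project the estimate onto the reference to get the scaled target.
    scale = sum(r * e for r, e in zip(reference, estimate)) / ref_energy
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    # Everything in the estimate that is not the scaled target counts as distortion.
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0:
        return math.inf  # assumption: a perfect estimate maps to +inf
    if target_energy == 0:
        return -math.inf  # assumption: an orthogonal estimate maps to -inf
    return 10.0 * math.log10(target_energy / noise_energy)
```

Because the target is rescaled by the projection, multiplying the estimate by any nonzero gain leaves the result unchanged, which is the property that gives the metric its name.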
Authors
AW

Andreas Walther

Fraunhofer IIS

avatar for Christian Uhle

Christian Uhle

Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS
Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen.
He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and...
JK

Julian Klapp

Fraunhofer Institute for Integrated Circuits IIS
PP

Pablo Panter

Fraunhofer Institute for Integrated Circuits IIS
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Measurement Uncertainty of MEMS Microphone Sensitivity in a Free-Field Condition
Thursday May 28, 2026 3:30pm - 4:00pm CEST
This work presents a measurement uncertainty evaluation of
the free-field sensitivity of a MEMS microphone using a
substitution comparison method. The measurement setup is
based on principles used in secondary microphone
calibration, with sensitivity determined relative to a
calibrated reference microphone. The uncertainty analysis
follows the Guide to the Expression of Uncertainty in
Measurement (GUM), where Type A and Type B uncertainty
evaluations are propagated through a defined measurement
model to obtain the final measurement result. The MEMS
microphone sensitivity is estimated together with an
expanded uncertainty, where the calibration uncertainty of
the reference microphone is identified as the dominant
contributor. Broadband results show that the measured
sensitivity is close to the typical manufacturer
sensitivity over a wide frequency range and follows a
similar frequency trend. The proposed approach enables
reproducible estimation of the free-field sensitivity of
MEMS microphones and provides a clear framework for
uncertainty evaluation.
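
The GUM-style combination of uncertainty contributions mentioned in this abstract can be illustrated with a small sketch. It is not the authors' measurement model: it assumes uncorrelated contributions with unit sensitivity coefficients, a coverage factor of k = 2, and hypothetical function names.

```python
import math


def type_a_uncertainty(readings):
    """Type A evaluation: standard uncertainty of the mean of repeated readings."""
    n = len(readings)
    if n < 2:
        raise ValueError("at least two readings are required")
    mean = sum(readings) / n
    variance = sum((x - mean) ** 2 for x in readings) / (n - 1)
    return math.sqrt(variance / n)


def combined_standard_uncertainty(components):
    """Root-sum-of-squares of standard uncertainties.

    Assumes uncorrelated inputs and unit sensitivity coefficients.
    """
    return math.sqrt(sum(u * u for u in components))


def expanded_uncertainty(components, coverage_factor=2.0):
    """Expanded uncertainty U = k * u_c (k = 2 is an assumed coverage factor)."""
    return coverage_factor * combined_standard_uncertainty(components)
```

In a budget of this form, the largest single contribution dominates the root-sum-of-squares, which is consistent with the abstract's observation that the reference microphone calibration uncertainty is the dominant contributor.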
Authors
SB

Salvador Barrera Figueroa

Danish Fundamental Metrology A/S, 2970 Hørsholm, Denmark
TA

Teguh Aditanoyo

DTU Electrical and Photonics Engineering, Technical University of Denmark (DTU), 2800 Kgs. Lyngby, Denmark
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Equipment, Lecture

3:30pm CEST

NAVIQUAL: Creating Spatial Audio Quality Maps for Virtual Live Music Environments
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Live music environments can be simulated and evaluated
through spatial audio and augmented reality (AR)
technology. However, conducting perceptual studies on AR
environments can be challenging, as multiple design
considerations and uncontrolled variables come into play.
Hence, we developed Naviqual, a tool to create a spatial
audio quality map for a virtual live music environment. We
generated objective quality contour and polar maps to
predict the quality of experience (QoE) across listener
locations and directions, respectively. We found that these
maps strongly aligned with perceptual evaluations by
normal-hearing listeners through listening tests. We also
found that binaural objective metrics and signal-to-noise
ratio both strongly predict QoE across listener
translations, with the former outperforming the latter in
predicting QoE across listener directions. Overall,
Naviqual provides a QoE map for virtual live music
environments robust across various listener locations and
directions, noise locations, music content, and room
acoustics.
Authors
CT

Carl Timothy Tolentino

University College Dublin
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:00pm CEST

Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching
Thursday May 28, 2026 4:00pm - 4:30pm CEST
Higher-Order Ambisonics (HOA) encoding from sparse,
irregular microphone arrays remains a critical challenge
for consumer spatial audio capture in immersive
communication and XR. We propose Flow-HOA, a generative
framework that jointly optimizes a multi-dimensional
perceptual objective while producing a deployable,
time-invariant bank of Finite Impulse Response (FIR)
encoding filters. Using conditional flow matching, the
model learns to map a simple prior distribution to the
target distribution of FIR filter coefficients. Training is
guided by a composite loss that balances time-domain
waveform fidelity, multi-resolution spectral consistency,
sub-band energy preservation, and spatial directivity
constraints. Objective evaluations demonstrate improved
performance over strong model-based baselines in both
signal fidelity and spatial accuracy metrics. Subjective
listening tests further confirm that Flow-HOA yields higher
overall sound quality with reduced artifacts.
Authors
TQ

Tianshu Qu

Peking University
XL

Xueyang Lv

Xiaomi Communications Co., Ltd
YQ

Yufan Qian

Peking University
avatar for Yuhuan You

Yuhuan You

Master, Peking University
Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:00pm CEST

Accurate Characterization of Integrated Microphone Arrays for Device-Related Transfer Function Synthesis
Thursday May 28, 2026 4:00pm - 4:30pm CEST
This paper presents an improved method for characterizing
integrated microphone arrays for Device‑Related Transfer
Function (DRTF) synthesis. A probe‑array extension of the
IMPro technique is introduced to measure all device
microphones simultaneously, eliminating unknown timing
offsets that arise in asynchronous device–probe recordings.
A custom four‑element probe array and modular test jig were
developed to evaluate relative inter‑channel propagation
delay (RIPD) accuracy across varied microphone‑port
geometries. Hybrid free‑field DRTFs were synthesized by
combining IMPro data with Boundary Element Method (BEM)
acoustic scattering simulations, demonstrating that the
probe‑array measurements capture small delay variations
essential for precise spatial‑audio modeling. The extended
IMPro method offers a practical, scalable alternative to
anechoic‑chamber measurements for modern multi‑microphone
devices.
Authors
avatar for John Cozens

John Cozens

JCoustics
avatar for Matti Hamalainen

Matti Hamalainen

Head of Audio Technologies and Ecosystems, Nokia Technology Standards
Matti S. Hämäläinen is a seasoned expert in audio technologi...
MP

Mikko Pekkarinen

Nokia Technology Standards
Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:00pm CEST

Influences of Nonlinear Distortion in Music Playback on Listeners’ Stress Evaluated by PPI and RMSSD of PPG
Thursday May 28, 2026 4:00pm - 4:30pm CEST
The phenomenon in which listeners’ impressions of music are
unintentionally altered even when the same sound source is
played back remains an important issue. Previous research
has shown that the state and combination of audio equipment
affect the characteristics of nonlinear distortion in music
playback. Hence, we conducted a subjective evaluation of
auditory and musical impressions using sound sources with
various nonlinear distortions. However, the subjective
evaluation was unstable and difficult to assess. The reason
was that the sound change was perceived emotionally as a
slight change in sound image and musicality, and the
interpretation of evaluation terms varies widely among
subjects due to the difficulty of verbalizing the
impression. Therefore, we evaluated the change in
listeners’ stress caused by nonlinear distortion in music
playback using photoplethysmography (PPG). In this
study, we conducted a follow-up experiment with improved
accuracy.
In the experiment, 41 subjects listened to sound sources
with even-order harmonic distortion at 2.69% THD, odd-order
harmonic distortion at 2.69% THD, and no distortion. The
musical piece used for the sound sources is an original
composition, chosen to eliminate familiarity and bias
toward existing music.
We evaluated changes in subjects’ stress states using the
mean pulse-pulse interval (PPI) and the root mean square of
successive differences (RMSSD), computed from the PPG
signal, as indicators of stress.
These results reconfirm that nonlinear distortion in music
playback affects listeners’ vital responses, as evidenced
by significant differences in both mean PPI and RMSSD, as
assessed by Cochran's Q test at the 5% significance level.
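
The two indicators named in this abstract have standard definitions that can be sketched in a few lines. This is a generic illustration rather than the authors' processing chain; it assumes the pulse-pulse intervals have already been extracted from the PPG signal and are given in milliseconds, and the function names are hypothetical.

```python
import math


def mean_ppi(intervals_ms):
    """Mean pulse-pulse interval of a sequence of intervals (ms)."""
    if not intervals_ms:
        raise ValueError("at least one interval is required")
    return sum(intervals_ms) / len(intervals_ms)


def rmssd(intervals_ms):
    """Root mean square of successive differences between adjacent intervals."""
    if len(intervals_ms) < 2:
        raise ValueError("at least two intervals are required")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For example, the hypothetical interval sequence 800, 810, 790, 800 ms has successive differences of 10, -20, and 10 ms, so its RMSSD is the square root of 200, roughly 14.1 ms.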
Authors
KN

Kenshin Nakada

Tokyo University of Science
SM

Shun Muramatsu

The University of Tokyo
TY

Takahiro Yoshida

Professor, Tokyo University of Science
Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices
Thursday May 28, 2026 4:30pm - 5:00pm CEST
This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The proposed method originally consists of four stages:
high-resolution anatomical scanning of the listener’s upper
body, including the pinnae, individualized HRTF computation
using the boundary element method, selective removal of
spatial encoding components to derive a personalized
reference target response curve (PR-TRC), and perceptual
optimization using a listener-specific weighting
coefficient grounded in auditory reference fidelity rather
than preference. In this paper, each stage is simplified
and automated using smartphone-based scanning and
AI-assisted processing, enabling end users to complete the
entire personalization process via a smartphone connected
to a cloud-based server. The resulting personalized target
response curve is implemented within the computational and
memory constraints of the DSP pipeline of commercial
consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.
Authors
AH

Atsushi Hara

final Inc.
HH

Haruto Hirai

final Inc.
avatar for Kimio Hamasaki

Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...
MH

Mitsuru Hosoo

final Inc.
NT

Nao Tojo

final Inc.
SS

Shun Saito

final Inc./post-doc

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking
Thursday May 28, 2026 4:30pm - 5:00pm CEST
While Neural Audio Codecs (NAC) have revolutionized
monaural audio compression, achieving high-fidelity
dual-channel coding at low bitrates remains a significant
challenge. Existing approaches often rely on naive
independent channel quantization, leading to phase
incoherence, or entangled latent modeling, which sacrifices
spatial precision for spectral energy. This paper proposes
a novel dual-channel coding framework based on
content-spatial disentanglement. Reframing spatial
reconstruction as an informed source separation task, our
architecture synergizes a frozen, pre-trained DAC encoder
for robust mono content preservation with a
parameter-efficient side information encoder that predicts
fine-grained time-frequency masks. To ensure precise
spatial imaging, we introduce explicit physical constraints
into the end-to-end training. Experimental results indicate
that at low bitrates of 9 and 11 kbps, the proposed method
outperforms state-of-the-art dual-mono neural baselines and
industry standards in both objective spatial metrics and
subjective MUSHRA evaluations.
Authors
QH

Qingbo Huang

MMLab, ByteDance
TQ

Tianshu Qu

Peking University
YW

Yihan Wang

Peking University
YQ

Yufan Qian

Peking University
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

From Gaze to Gnosis: A Critical Framework for Embodied Audio Production
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Audio engineering standards often present as objective, yet
they frequently rely on a systemic data bias which Perez
characterises as the 'default male bias' [1]. This paper
examines the hegemony of the male ear, a system of norms
that privileges masculine modes of hearing by prioritizing
technical structure and text over affective experience and
timbre [2]. By transitioning from a visual-centric auditory
gaze toward an embodied sonic gnosis, researchers can
recover haptic and physiological ways of knowing sound.
Drawing on the feminist listening praxis of the Female Ear
[3], this work explores the recording studio as an
analytical space where sonic microaggressions [4] enforce
rigid technical standards. The author argues for a new
audio praxis that centers ear pleasures [5], validating
subjective and affective sensory data as legitimate
engineering input. This approach seeks to dismantle the
regulatory fiction [6] of a universal hearing standard,
promoting a pluralistic understanding of musicking [7] that
is inclusive of non-normative perspectives.
Authors

Katie Ambrose

PhD Student, University of York
Katie is a postgraduate researcher at the University of York, working on a th...
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Friday, May 29
 

9:00am CEST

A method to synchronize dynamic media stream on heterogenous media playback devices
Friday May 29, 2026 9:00am - 9:30am CEST
Audio synchronization across heterogeneous media playback
devices is essential for delivering immersive sound
experiences in applications such as speaker group play and
multi-room audio playback. Existing synchronization
techniques predominantly rely on tightly coupled network
infrastructures and often embed media sequence and
timestamp information in the media packet at the
transmitting source end, which restricts flexibility in
selecting the transmitting source and also compromises
robustness under dynamic network conditions. This paper
proposes a network- and source-independent audio
synchronization framework that eliminates the dependency on
embedded media sequence and timestamp information. The proposed
system employs an audio fingerprinting-based media
sequencing algorithm amongst the media playback devices
without relying on the type of transmitting source or the
network availability. A novel audio synchronization
algorithm is proposed which first determines common
sequence start information given a dynamic media stream
from the transmitting source and then communicates the
fingerprint and timestamp amongst the media playback
devices without modifying the original audio packet
structure. Experimental results demonstrate that the
proposed approach achieves audio-to-audio
synchronization within 10 ms across media playback
devices in a no-network environment, thereby extending the
scope of immersive audio applications irrespective of the
transmitting source.
Authors
AS

Avinash Singh

Samsung Research Institute, Delhi (SRID)
MS

Mohit Singh

Samsung Research Institute, Delhi (SRID)

Natasha Meena

Samsung Research Institute, Delhi (SRID)
I am working as Software developer in Samsung Research Institute India - Delhi and am responsible for development of features related to Samsung sound device’s
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring 2D Ambisonics by Amplitudes and Phases
Friday May 29, 2026 9:00am - 9:30am CEST
We present a spectral-like reformulation of 2D ambisonics,
enabling an alternative representation of the sound field
in terms of amplitudes and phases. We hypothesise that it
simplifies the representation and creative manipulation of
2D ambisonics, beyond encoded directional point sources.

In 2D high-order ambisonics (HOA) of order N, a sound field
can be represented as a 2π-periodic angular function as a
combination of circular harmonics (Y_m) weighted by the
coefficients (a_m) with m ∈ [-N, N]. This representation
can be reformulated in terms of N+1 amplitudes and N
phases, similarly to a Fourier decomposition.

A simple example of this representation is the ambisonic
encoder at an angle theta. Phases are then multiples of a
phase phi = theta/2π, as frequencies are multiples of a
fundamental in harmonic sounds. Therefore, the
amplitude-phase approach can draw on the field of sound
synthesis, between harmonic and inharmonic modelling.
Operations on ambisonic vectors in amplitude-phase also
rely on Fourier representation, namely the spectral
convolution of two vectors (element-wise products of the
amplitudes, element-wise sums of the phases). Spectral
convolution has vast potential in ambisonics, allowing to
represent all the usual spatial operations (geometric and
transformative) in a simple manner.
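
As a rough illustration of the amplitude-phase idea described above, the sketch below encodes a unit point source and applies spectral convolution as a rotation. It is not the authors' Faust implementation. The function names, the use of radians instead of the normalised phase phi = theta/2π, the unit amplitudes, and the sign convention are all assumptions made for this example, and phases are not wrapped.

```python
import math

def encode_point_source(theta, order):
    """Amplitude-phase form of a unit point source at angle theta (radians).

    Returns order + 1 amplitudes (m = 0..N) and order phases (m = 1..N);
    the phase of harmonic m is m * theta, i.e. a multiple of a base phase.
    """
    amplitudes = [1.0] * (order + 1)
    phases = [m * theta for m in range(1, order + 1)]
    return amplitudes, phases

def spectral_convolve(a, b):
    """Element-wise product of amplitudes, element-wise sum of phases."""
    amplitudes = [x * y for x, y in zip(a[0], b[0])]
    phases = [p + q for p, q in zip(a[1], b[1])]
    return amplitudes, phases

def evaluate(amplitudes, phases, angle):
    """Angular function f(angle) = A_0 + 2 * sum_m A_m * cos(m * angle - phase_m)."""
    total = amplitudes[0]
    for m, (amp, phase) in enumerate(zip(amplitudes[1:], phases), start=1):
        total += 2.0 * amp * math.cos(m * angle - phase)
    return total

# Convolving a source at 0.3 rad with a unit-amplitude vector whose phases
# are m * 0.5 moves the source to 0.8 rad (a rotation).
source = encode_point_source(0.3, order=3)
rotation = encode_point_source(0.5, order=3)
rotated = spectral_convolve(source, rotation)
```

In this sketch a rotation only touches the phases, while a vector with non-unit amplitudes and zero phases would act as an order-dependent weighting, which is one way to read the "geometric and transformative" operations mentioned in the abstract.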

To test this approach, we are currently developing an
ambisonic synthesiser based on Faust functions running in
the Max environment. We are evaluating the scope of this
representation, both theoretical and compositional, and
will then attempt to expand this approach to 3D ambisonics.
Authors

Alain Bonardi

Professor in Computer Science and Music Creation, University of Paris 8
Alain Bonardi is a Professor of Computer Science and Music Creation at Paris 8 University, where he is based in the Music Department and is a member of the Musidanse laboratory.
There, he co-directs the CICM (Center for Research in Computer Science and Music Creation) with Anne...
A

Axel Chemla-Romeu-Santos

University of Paris 8
EF

Emma Frid

University of Paris 8
PG

Paul Goutmann

University of Paris 8
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Altering the Immersive Potential: The Case of the Heilung Concert at Roskilde Festival
Friday May 29, 2026 9:00am - 9:30am CEST
Immersive audio systems are increasingly deployed in
large-scale live music contexts, yet there is limited
research addressing how immersive concerts are perceived
and experienced by audiences. This paper presents a
practice-based and ethnographically informed study of the
immersive audio design and audience experience of the band
Heilung’s concert at Roskilde Festival, staged in the Arena
Tent where a large-scale multichannel loudspeaker system
including main, surround, and overhead arrays was used.
The study combines insights into technical system design and
pre-production methods with qualitative audience research
in order to explore how immersive sound alters perception,
embodiment, and social engagement in live concerts.
Pre-production involved scaled system simulations,
reference listening positions, timing strategies, and
power-matched test environments to translate an immersive
studio mix to a festival-scale venue. During and after the
concert, audience experience was investigated through
in-depth interviews, focus group discussions, participant
observation, binaural and ambisonic recordings, and
phenomenologically inspired interview techniques.
Findings indicate that immersive audio contributes to
heightened affective engagement, bodily involvement, and a
sense of envelopment that exceeds conventional stereo
concert experiences. Audience members described the
experience as multisensory, ritualistic, and spatially
ambiguous, often lacking technical vocabulary but
emphasizing embodied and emotional responses. Importantly,
immersion was not perceived as sound alone, but as emerging
from the interaction of sound, visuals, architecture,
social presence, and narrative framing.
The paper argues that understanding immersive concerts
requires the integration of anthropological insights with
audio engineering knowledge. While technical approaches
explain how immersive sound systems operate,
anthropological perspectives are essential for
understanding how such systems are experienced,
interpreted, and given meaning by audiences. The study
contributes to the limited body of research on the effects
of immersive concert formats by examining how audiences
perceive immersion and how they ascribe meaning to
immersive sound.
Authors

Birgitte Folmann

Senior Associate Professor, Sonic College
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Design and analysis of sound insulation soft-solid metamaterials with periodic inclusions
Friday May 29, 2026 9:30am - 10:00am CEST
One of the many applications of acoustic metamaterials is
the ability to substantially improve acoustic insulation in
the low-frequency range compared to traditional materials.
The objective of this study was to investigate a
vibroacoustic metamaterial consisting of a soft solid plane
with embedded inclusions. The analysed structure consists
of a porous layer with periodic solid elements, which
allows for enhanced insulation properties. A numerical
model considering interactions between the acoustic domain
and a solid was developed using COMSOL Multiphysics. The
influence of selected material and geometric parameters,
such as the shape of the inclusions and their placement, on
the overall effectiveness of the structure was analysed.
Based on the simulation results, a variant of the structure
was selected and used to create a prototype of the
metamaterial. The acoustic insulation of the constructed
structure was then measured in the diffuse field. The next
step is to conduct an optimization using the PSO algorithm
in order to find the geometry of the structure that can achieve
the most favourable results in the selected frequency
range. The optimized structure will then be validated by
creating an additional sample and conducting another
measurement.
Authors
AM

Agata Maciuszek

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
KC

Klara Chojnacka

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Generate 4pi SBA reverberation from virtual sound sources detected from x-y-z sound intensities. -Improvement of the source detection method.
Friday May 29, 2026 9:30am - 10:00am CEST
Generating the 4pi acoustical atmosphere of a target space is
important for creating immersive sound content. An
SBA-based reverb is a useful tool for this purpose. We
developed VSVerb, an SBA reverb that generates 4pi
reverberation from the virtual sound sources detected from
three orthogonal x-y-z sound intensities measured at the
target space. A virtual sound source, also known as a
mirror source, is an acoustic concept in geometrical
acoustics. According to this theory, many virtual sound
sources are considered to be located outside the room and
provide reflection sounds inside the room. Since the
spatial information of virtual sound sources is a kind of
fingerprint of a room’s reverberant characteristics,
correctly sampled virtual sound sources enable us to
recreate a room's reverberation precisely.
Several methods have been proposed for detecting virtual
sound sources of a room, i.e., dominant reflection sounds
in a room, by using the spatial room impulse responses
(SRIRs). However, these methods have the disadvantage of
failing to detect small virtual sound sources that provide
late reflections, because they detect sources by focusing
on the peak amplitude values in SRIRs. It is difficult to
distinguish whether a small peak in the latter part of an SRIR
indicates a reflection or a noise component. Additionally,
in low-band analysis, side lobes of the band-pass filter
add many large peaks to the SRIRs, and they make it
difficult to detect true reflection peaks.
To overcome these disadvantages of the conventional
methods, we developed a method that detects virtual sound
sources without using the amplitude characteristics of
SRIRs. We call this method “Speed Detection.” This method
detects virtual sound sources based on the spatial moving
speed of the sound intensity. Instead of measuring SRIRs of
the sound pressure, we measure SRIRs of x-y-z instantaneous
sound intensities. Since we can assume that the reflection
sound comes from a “certain-sized” virtual sound source
over a “certain period,” the sound intensity provided by
the virtual sound source is considered to remain within a
small area and move slowly while the source emits the
reflection sound. We focused on this behavior of sound
intensities and developed the new detection method.
First, we identify the portions of the sound intensity that
move slowly and isolate them as the “Source intensity.”
Then, we calculate the positions, strengths, and phase
characteristics of the virtual sound sources from these
Source intensities of the x, y, and z directions. We
examined the Speed Detection method by generating several types
of 4pi reverbs from the virtual sound sources detected
using this method, and verified that it works well in many
cases. However, we have also found that it does not always
work well. We have realized the necessity of improving the
threshold value for classifying sound intensity into the
source intensity or other.
Until now, we have classified sound intensities into source
intensities and others by referring to a threshold value,
vt=40(1000t+10)^1.5 [m/s], where t indicates the arrival
time [s] of the sound intensity. This equation is based on
our practical experience, rather than scientific facts. It
works well in most cases, but some adjustments are required
in very rare cases. To apply the threshold value to various
acoustical conditions of the target spaces, we propose
switching the threshold value from our conventional
equation to an averaged value using a time-varying time
window. To examine the newly proposed threshold value, we
conducted experiments on detecting virtual sound sources of
a simple rectangular room. The results demonstrated the
validity of the new threshold value. We expect this new
threshold value to improve the sound quality of VSVerb and
V2MA as well.
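
For readers who want to try the conventional threshold quoted above, the following minimal sketch evaluates vt = 40(1000t + 10)^1.5 and labels a sample as source intensity when its spatial moving speed falls below that value. The function names and the strict "less than" comparison are assumptions made for this example. The newly proposed time-window-averaged threshold is not sketched because the abstract does not give its details.

```python
def conventional_threshold(t):
    """Threshold speed in m/s for arrival time t in seconds: 40 * (1000 t + 10) ** 1.5."""
    return 40.0 * (1000.0 * t + 10.0) ** 1.5

def is_source_intensity(speed, t):
    """Assumed rule: slow-moving intensity (below the threshold) counts as source intensity."""
    return speed < conventional_threshold(t)

# Hypothetical samples given as (speed in m/s, arrival time in s).
samples = [(1000.0, 0.0), (50000.0, 0.09)]
labels = [is_source_intensity(speed, t) for speed, t in samples]
```

Because the threshold grows with arrival time, later portions of the response are allowed to move faster before being rejected.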
Authors
AO

Akira Omoto

Kyushu University / ONFUTURE Ltd.

Masataka Nakahara

President / Senior Managing Director, ONFUTURE Ltd. / SONA Corp.
Masataka Nakahara is an acoustician specializing in studio acoustic design and room acoustics R&D. After studying acoustics at the Kyushu Institute of Design, he began his professional career as an acoustic designer at SONA Corporation. He earned his Ph.D. in acoustics from Kyushu...
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Who Controls the Space? Artistic Intent and Sound Diffusion in Immersive Concert Performance
Friday May 29, 2026 9:30am - 10:00am CEST
Recent advances in large-scale multichannel loudspeaker
systems have enabled immersive concert formats that extend
spatial control beyond conventional stereo and small
multichannel configurations. High-density loudspeaker
arrays (HDLAs) allow sound to be distributed across complex
architectural spaces, challenging established distinctions
between composition, performance, and live sound practice.
In live contexts, however, the realization of spatial
attributes is often constrained by system complexity,
limited rehearsal time, and the lack of artist-facing
spatial control interfaces. As a result, spatial
realization and sound diffusion are frequently delegated to
sound engineers, who translate artistic material to the
acoustic and architectural conditions of the venue in real
time.

This paper examines three immersive concerts presented
during Sonic Days 2025 in Denmark, realized on both
large-scale and small-scale multichannel loudspeaker
systems. The concerts represent contrasting production
contexts, including a site-specific spatial composition
conceived explicitly for a high-density loudspeaker array
and performances by artists whose practices are typically
oriented toward stereo or small multichannel formats.
Across these cases, spatialization functioned variously as
compositional material, interpretive layer, and adaptive
live-mixing practice.

The paper analyzes how control over spatial attributes is
negotiated between artists and sound engineers in live
immersive concert settings, and how this negotiation
affects the interpretation of artistic intent and audience
experience. Particular attention is given to the role of
sound engineers as active mediators whose decisions shape
spatial form, listening perspective, and the relationship
between sound and architecture. The findings suggest that
immersive concert formats redistribute creative agency
across artists, technicians, and technological
infrastructures, and point toward the need for revised
conceptual frameworks for authorship, performance, and
listening in large-scale spatial audio environments.
Authors

Kasper Fangel Skov

Assistant Professor, PhD, Sonic College (UC SYD)
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Designing Music Spaces in Educational Buildings: Challenges and Considerations
Friday May 29, 2026 10:00am - 10:30am CEST
The acoustic design of music rooms is well supported by
existing guidance covering recording spaces,
practice rooms, green rooms, and large-scale performance
environments. However, the direct application of these
standards to high school and college buildings is often
constrained by limitations in budget, space, client
requirements and construction timelines. As a result,
educational music spaces present various design challenges
that require specially considered solutions. This paper
examines key architectural and acoustic issues for music
teaching and performance spaces in high schools, including
wall performance between non-compatible spaces, limited
room volumes, and other acoustic challenges, such as
interconnecting doors and windows between the spaces. A
case study of a good design implemented at a large school
project is presented to demonstrate how strategic planning
and interdisciplinary coordination can result in
high-quality, functional, and acoustically successful
learning environments. It is highlighted that the
collaboration between the design team and acoustic
consultants was key to resolving the major project
challenges and achieving the best possible performance results
across all spaces.
Authors
EP

Elena Prokofieva

Edinburgh Napier University
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Detecting Bandwidth Variation Artifacts in Perceptual Audio Coding
Friday May 29, 2026 10:00am - 10:30am CEST
Accurate identification of audio coding artifacts is
instrumental in encoder design, audio post-processing, and
perceptual quality assessment. This paper addresses the
detection of artifacts arising from changes in the
effective bandwidth of coded audio signals caused by coarse
spectral quantization. Such bandwidth variations give rise
to two prominent artifact types: bandwidth limitation (BL)
and birdies, also referred to as spectral islands (SI).
Blind detection methods, requiring no reference signal, are
presented for both artifact types. Bandwidth limitation
is detected by analyzing variations in the zero-crossing
count across time-domain subband signals, enabling
estimation of both fixed and time-varying cutoff
frequencies. Spectral islands are identified through
analysis of the spectrogram by detecting clusters of
isolated components in the time–frequency domain,
characterized by their temporal and spectral extents. The
proposed methods are evaluated using audio material from
the ODAQ and USAC verification datasets. Results show that
the BL detection method achieves an average bandwidth
estimation error of approximately 160 Hz and demonstrates
robustness to noisy bandwidth-limited signals. In addition,
the detected birdie artifacts are perceptually validated
through listening tests, indicating an improvement in
perceived quality following detection and subsequent
suppression of the birdie artifacts.
Authors
AN

Andreas Niedermeier

Fraunhofer IIS, Erlangen

BE

Bernd Edler

International Audio Laboratories Erlangen, Germany
DD

Dipanjan Datta Roy

International Audio Labs, Erlangen

Sascha Dick

Fraunhofer IIS, Fraunhofer IIS, Erlangen
Germany
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

10:00am CEST

Predicting Sonic Atmospheres - Expectation and Attunement
Friday May 29, 2026 10:00am - 10:30am CEST
Soundscapes and sonic atmospheres are often approached as
environmental conditions perceived and evaluated through
their acoustic properties and affective qualities. Recent
predictive and inferential accounts of perception, however,
suggest a different understanding: that perception operates
as an anticipatory process in which sensory input is
primarily used to minimise error in an ongoing predictive
model of the world, rather than to construct experience
from the bottom up. From this perspective, auditory
perception is an active, temporally extended process shaped
by expectation, memory, attention, and action.

This paper explores what such a predictive understanding
contributes to the study of everyday sonic atmospheres.
Drawing on predictive processing as a conceptual
framework—while acknowledging its contested status—the
paper situates auditory perception alongside other sensory
modalities as part of a broader inferential engagement with
environments. Classical auditory phenomena and longer-term
perceptual “illusions” motivate this reframing by
highlighting how expectations shape experience across
multiple timescales.
The main analytical focus is the case of transitioning from
one atmosphere to another. Atmospheres are approached here
as multimodal, quasi-objective phenomena that do not reside
in sound, space, or subjects alone, but emerge through
shared, situated engagement. Transitions foreground this
process by exposing how expectations, attentional
strategies, and perceptual norms are recalibrated over
time. From a predictive perspective, atmospheres are
constituted through collective anticipatory activity, in
which agents continuously negotiate environmental cues and
affordances across sensory modalities. Attunement is thus
understood as a temporally extended, socially coordinated
process shaped by prior experience and anticipated action.
By analysing atmospheric transitions through a predictive
lens, the paper argues that sonic atmospheres can be
understood as dynamically constituted and reconfigurable
achievements. This reframing challenges object-centred or
purely subjective accounts of atmospheres and opens new
ways of thinking about how sonic environments are shaped,
staged, and transformed in everyday life.
Authors

Jonas Kirkegaard

Lecturer & Internship coordinator, UC SYD
BIO: Jonas R. Kirkegaard (1982) is a Danish sound artist, composer and sound designer working in the field of interaction design, sound installations, multi channel composition and designing “place specific” atmospheres through sound. Upon replacing nano science with music studies back in 2005, he now...
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Spatial Estimation of Room Acoustic Parameters using Sound Field Reconstruction Methods
Friday May 29, 2026 10:30am - 11:00am CEST
The acoustic characterisation of indoor spaces is crucial
for a wide range of applications. While global metrics
provide convenient descriptors of a room's overall
behaviour, a more spatially detailed analysis offers deeper
insight into the spatio-temporal structure of the sound
field, albeit at a higher experimental cost. This paper
proposes a methodology that leverages the predictive
capabilities of sound field reconstruction methods to
estimate room acoustic parameters as a function of
position. The approach is experimentally evaluated in an
auditorium, where it achieves accurate estimation of
temporal and energetic room acoustic parameters across the
entire audience area. In addition, the reconstructed field
yields higher intelligibility indices compared to the raw
measurements. Overall, these results highlight the
potential of sound field reconstruction techniques as a
practical tool for room acoustic characterisation and for
supporting assistive listening technologies.
Authors

Antonio Figueroa-Duran

Universidad Politécnica de Madrid
EF

Efren Fernandez-Grande

Universidad Politécnica de Madrid
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Lossless Audio Coding revisited
Friday May 29, 2026 10:30am - 11:00am CEST
MPEG-4 SLS (scalable lossless coding) was published more
than 20 years ago. In the meantime several tools to improve
coding efficiency and flexibility have been invented.
Currently, in MPEG WG6 (audio coding) there are two
standardization activities on lossless audio coding: Audio
Coding for Machines (ACoM) and Biomedical and general
waveform signal coding (BWC).
ACoM phase 1 originally was targeted only towards lossless
storage formats for training of machine listening schemes,
but additional use cases like “user generated content
analysis”, “live stream content analysis”, and “artistic
creation” have been added. The focus was extended to the
transmission of audio data from microphone (arrays) to
central processing units.
BWC is a joint activity with ITU-T SG21. While ACoM started
with a large number of use cases and includes the
specification of a rich set of metadata, BWC started with a
focus on medical data like electroencephalogram (EEG) and
electrocardiogram (ECG). However, BWC can be used for audio
signals, too, and medical data coding is on the list of use
cases for ACoM.
The call for proposals (CfP) for ACoM was completed in
January 2025. Two proposals, both outperforming MPEG-4 SLS,
had been submitted. Both proposals reused and optimized
core codecs from BWC. Currently, MPEG audio investigates
how the ACoM proposals can be merged into BWC. This merge
process must be completed by the end of April 2026.
The presentation will give details about ACoM use cases,
the ACoM CfP process, the results of the CfP, and results
from the merge process.
Authors

Thomas Sporer

Deputy Director IDMT / Convenor MPEG audio, Fraunhofer IDMT
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

The cognition of sound in museums: Toward a spectrum of meanings
Friday May 29, 2026 10:30am - 11:00am CEST
This presentation develops a conceptual framework for
understanding how visitors cognize sound in museum
exhibitions. While sound increasingly features in museum
practice, research has focused primarily on measuring
visitor enjoyment and engagement rather than examining the
specific meanings sound generates. This gap reflects the
absence of a framework conceptualizing sound's
meaning-making capacities to guide empirical investigation.
Drawing on scholarship from music studies, semiotics,
phenomenology, and embodied cognition, I propose a
seven-component spectrum identifying distinct yet
interrelated meanings that sound can convey in museums:
aesthetic, representational, emotional, sensorial,
imaginative, social, and political. These meanings can be
apprehended independently or in combination, typically
through emergent, pre-conscious perception rather than
deliberate awareness.
The spectrum builds on the premise that museum sound
meaning-making unfolds through dynamics internalized from
early childhood as we attune to the world sonically. It
draws on the notion of sound as a "sonic aggregate"
(Grimshaw and Garner 2015), encompassing social, contextual,
temporal, and embodied experiences, rather than reducing
sound to wave phenomena. Visitors actively co-produce
meanings by drawing on their moods, memories, knowledge,
and imagination during exhibition encounters.
Each meaning category is illustrated with exhibition case
studies, demonstrating the spectrum's applicability across
diverse sound-based multimodal museum practices—from
popular music exhibitions to sound art installations. The
spectrum aims to catalyze research through varied
methodological approaches and establish analytical
standards for studying sound in museums, with potential
adoption by international standardization bodies.
Authors

alcina cortez

Sound Studies Researcher, INET-md | NOVA University Lisbon
A PhD in ethnomusicology and museum studies and a curator, I am committed to exploring the diverse meaning-making capabilities of sound when exhibited in museums, encompassing the representational, emotional, sensorial, and social, as well as its ability to foster imagination and...
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Acoustic and Perceptual Consequences of Time Misalignments in Line Array Speakers
Friday May 29, 2026 11:00am - 11:30am CEST
Variable‑curvature line arrays achieve their intended directivity and spectral balance through phase‑coherent summation across cabinets. Even small timing disparities between elements perturb the interference patterns that shape the array response, with consequences for both spatial coverage and timbre. In this work we quantify these effects end‑to‑end. Using simulations for a typical 12‑element array, we examine how inter‑element delays modify the frequency response across an audience area. We then apply an auditory coloration model to predict the perceived impact of those modifications and validate the predictions through controlled listening tests. We observe that delays of a few dozen microseconds generate pronounced spectral coloration that listeners consistently judge as degraded quality, whereas coloration becomes detectable at delays on the order of one microsecond. These results translate into synchronization accuracy targets for high‑fidelity line‑array deployments.
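
To give a feel for the orders of magnitude involved, the sketch below uses a deliberately simplified model that is not the authors' simulation: equal-gain elements that would sum perfectly in phase at an on-axis listening point, each with an additional timing error. The function name, the far-field on-axis assumption, and the example delay values are assumptions made for this illustration; directivity, array curvature, and audience position are ignored.

```python
import cmath
import math

def on_axis_level_db(delays_s, freq_hz):
    """Level in dB, relative to a perfectly aligned array, at one frequency.

    Each element contributes a unit phasor delayed by its timing error.
    Assumes the phasors do not cancel completely (log10 of zero is undefined).
    """
    total = sum(cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays_s)
    return 20.0 * math.log10(abs(total) / len(delays_s))

# Hypothetical 12-element array: perfectly aligned, and with one element 20 microseconds late.
aligned = [0.0] * 12
one_late = [0.0] * 11 + [20e-6]

loss_at_20k = on_axis_level_db(one_late, 20000.0)
loss_at_100 = on_axis_level_db(one_late, 100.0)
```

In this toy model the same timing error is negligible at low frequencies and costs more than 1 dB near 20 kHz, which is consistent with the abstract's point that small delays reshape the spectrum mainly at high frequencies.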
Authors

Nicolas Epain

Application Research Engineer, L-Acoustics
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Experimental and Numerical Design of Vibroacoustic Metamaterials for Guitar Soundboard Resonance Control
Friday May 29, 2026 11:00am - 11:30am CEST
Metamaterials are engineered structures whose acoustic and
mechanical behavior arises from their geometric
configuration and internal architecture rather than their
material properties. Within this group, vibroacoustic
metamaterials offer the ability to influence elastic wave
propagation by introducing frequency bands in which
flexural vibrations are either suppressed or selectively
altered. The integration of such structures into musical
instruments, particularly acoustic guitars, provides a
promising approach to shaping their vibroacoustic response
and mitigating undesirable structural resonances.
The objective of this project is to design a vibroacoustic
metamaterial capable of modifying the resonance properties
of an acoustic guitar soundboard. For this purpose,
vibration measurements with Laser Doppler Vibrometer were
conducted to identify the fundamental resonant modes of the
soundboard. Based on these measurements, a coupled
structural-acoustic numerical model was developed using
COMSOL Multiphysics; subsequently calibrated with the
experimental data. In the following phase, various
vibroacoustic metamaterial configurations were designed,
; their influence on the resonance characteristics of the
soundboard was investigated. The most effective resonator
design was fabricated using 3D printing; its performance
was experimentally evaluated.
The anticipated outcome of this research is the development
of an effective method for tailoring; enhancing the
tonal response of an acoustic guitar without modifying its
conventional construction, thereby contributing to new
design strategies for stringed musical instruments.
Authors
AS

Aleksandra Sawczuk

AGH University of Krakow
KC

Klara Chojnacka

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Technologies of Everyday Life: “Yupiter” and the Formation of a Personal Acoustic Environment in the Ukrainian SSR
Friday May 29, 2026 11:00am - 11:30am CEST
This article examines open-reel tape recorders marketed
under the “Yupiter” brand as a key technology of everyday
life in late Soviet Ukraine and as a material foundation
for the formation of a personal acoustic environment in the
Ukrainian SSR. The study aims to reconstruct the
“biography” of the device, including its design, serial
production ramp-up, distribution, and use. It shows how the
institutional constraints of a planned economy and
defense-sector priorities were translated into domestic
regimes of listening and recording. Methodologically, the
article combines approaches from sound studies, the history
of technology, and the history of everyday life,
supplemented by concepts of the “domestication” of
technology, DIY culture, and “phonographic labor.” The
source base includes internal documents of the Kyiv
“Kommunist” plant (annual reports, explanatory memoranda,
plans, and quality-related materials for 1968-1976),
interdepartmental reviews and programmatic materials of the
sector, technical handbooks and instructions, as well as
oral interviews with users. Bringing together the “upper”
level of managerial reporting and the “lower” level of user
experience makes it possible to identify a gap between
quality as a planning category and quality as a daily
practice: repairability, shortages of parts and tape,
re-recording, and selective choice of media were more the
norm than the exception. The article demonstrates that the
“fine-tuning” of tape recorders became institutionalized
through networks of amateur knowledge and informal service,
while fluctuating availability (shortage and overstock)
shaped the social geography of purchase. Ultimately,
“Yupiter” emerges not as a symbol of progress or nostalgia,
but as a material trace of late-socialist modernization -
one that helps integrate the Ukrainian case into
international debates on media materiality, listening, and
the politics of audibility. Particular attention is paid to
the temporality of the object: the extension of “Yupiter’s”
normative life cycle through repair and re-recording, as
well as its “outliving” of the Soviet system in the 1990s.
This makes it possible to interpret the tape recorder as a
carrier of acoustic memory and an indicator of social
hierarchies of access to technology. The findings refine
the understanding of shortage not as mere lack, but as an
everyday regime in the life of things.
Authors
Rostyslav Konta

Professor, Taras Shevchenko National University of Kyiv
Professor at Taras Shevchenko National University of Kyiv (Ukraine) working in the fields of cultural anthropology, ethnology, history of science and technology, and sound studies. My research focuses on everyday life, music, media, and technology in Eastern Europe, especially in...
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:30am CEST

Qualifying Timing Errors in Audio-over-Ethernet Networks for Live Sound
Friday May 29, 2026 11:30am - 12:00pm CEST
Audio-over-Ethernet (AoE) protocols have become fundamental
in modern live sound reinforcement systems, yet their
real-world synchronization behavior under diverse stress
conditions, both in terms of load and configuration, is not
accurately characterized. Microsecond-scale timing
mismatches between amplifier outputs can disrupt line-array
interference patterns, reducing directivity control and
spectral consistency. Ensuring robust timing accuracy
across large, mixed-traffic network topologies is therefore
critical for predictable system performance.
This paper presents a comprehensive, application-oriented
evaluation of Dante, AES67 and Milan-AVB. A representative
multi-hop architecture typical of touring deployments has
been considered. A controlled measurement campaign,
combining eight daisy-chained switches, heavy concurrent
data traffic approaching link saturation, and sub-sampled
latency tracking, assesses each protocol under ideal
conditions, typical field situations, and common
misconfigurations.
Results reveal clear performance distinctions. Dante
exhibits substantial timing variations, exceeding
100 µs under load. AES67 provides tighter
synchronization but remains vulnerable to configuration
errors, which can induce latency drift or even audio packet
loss. Milan-AVB consistently maintains sub-microsecond
accuracy across all scenarios.
Authors
BD

Benjamin Duval

L-Acoustics
Genio Kronauer

Executive Director of Electronics & Networks Technologies, L-Acoustics
Nicolas Epain

Application Research Engineer, L-Acoustics
Friday May 29, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Spatial Quality Measure for Mixed-phase Impulse Response Equalization
Friday May 29, 2026 12:30pm - 1:00pm CEST
Mixed-phase impulse response equalization can improve
magnitude and phase response, but conventional objectives
such as mean-squared error (MSE) can favor solutions that
introduce objectionable temporal artifacts, including
pre-echo and extended post-echo ringing. This paper
proposes a Spatial Equalization Quality Measure (SEQM) to
select a mixed-phase equalization filter that better
controls these artifacts while remaining computationally
simple and applicable across multiple listening positions.
SEQM combines (i) a temporal-domain metric that penalizes
energy preceding the main pulse of an impulse response and
energy persisting after it, while also accounting for the
decay rate of the post-response tail, with (ii) a spatial
aggregation rule that summarizes quality across measurement
positions. We use SEQM to select the modeling delay for
mixed-phase finite-impulse-response (FIR) equalization and
to compare mixed-phase FIR designs with minimum-phase FIR
and IIR alternatives under a common multi-position
measurement framework. Experiments using semi-anechoic
measurements across 34 spatial positions for two
loudspeakers show that SEQM consistently selects
substantially shorter delays than MSE-based selection and
yields impulse responses with reduced pre-echo and faster
post-response decay, while maintaining comparable
frequency-response equalization. These results suggest that
SEQM is a practical objective tool for designing
multi-position mixed-phase equalization filters.
Authors
BD

Bill Decanio

Samsung Electronics
Sunil Bharitkar

Samsung Research America

Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

12:30pm CEST

Perceptual Evaluation of the Open Binaural Renderer
Friday May 29, 2026 12:30pm - 1:00pm CEST
This paper presents the perceptual evaluation of the Open Binaural Renderer (OBR), an open-source library developed for headphone-based rendering of Immersive Audio Model and Formats (IAMF) content. The evaluation followed an iterative framework in which findings from a pilot listening study informed the tuning of rendering profiles, and the resulting renderer was benchmarked against established proprietary solutions. In the pilot study, 19 expert listeners rated the Overall Listening Experience (OLE) of the initial prototype (OBRv1) and five external renderers across diverse audio content. Qualitative feedback was analysed using inductive coding to identify salient perceptual dimensions. The pilot revealed content-dependent performance and showed that a single default profile was inadequate, yielding mixed responses in both the numerical scale and in the qualitative feedback and motivating the development of multiple rendering profiles in OBRv2. The main study evaluated two OBRv2 profiles targeting different reverberation characteristics (Direct and Ambient) alongside three top-performing external renderers. A total of 39 participants, divided into expert and non-expert groups, rated five perceptual attributes: Voice Quality, Envelopment, Externalisation, Overall Listening Experience, and Timbral Balance. Mixed-design ANOVA revealed significant main effects of renderer condition on all attributes. Pairwise comparisons showed that the OBRv2 Ambient profile achieved significantly higher OLE ratings than one proprietary renderer and reached statistical parity with the remaining two, representing a measurable improvement over the prototype. A trade-off between Voice Quality and Externalisation was observed, driven by the level of reverberation in each renderer. The results demonstrate that iterative, perceptually informed tuning can yield competitive binaural rendering quality in an open-source framework.
Authors
FL

Felicia Lim

Google LLC
Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...
Jan Skoglund

Google

Jani Huoponen

Google LLC
With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives...
Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

TR

Tomasz Rudzki

University of York
Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Evaluation of Objective Speech Intelligibility Metrics for Hearing-Aid Users in Multi-Talker Spatial Environments
Friday May 29, 2026 12:30pm - 1:00pm CEST
Despite the growing number of hearing-impaired workers
wearing hearing aids in occupational settings,
understanding speech in multi-talker situations remains
challenging. This difficulty is particularly pronounced in
open-plan offices, where simultaneous talkers and room
reverberation are prone to degrade speech intelligibility.
While spatial cues are essential for segregating target
speech from competing sources, hearing-aid signal
processing may alter binaural information that supports
spatial hearing.
Accurate evaluation of hearing-aid performance is
therefore crucial. Objective speech intelligibility metrics
offer an efficient alternative to time-consuming listening
tests; however, their validity in complex spatial scenarios
involving hearing-impaired listeners remains unclear.
Monaural metrics such as HASPI account for individual
hearing loss but neglect spatial information, whereas
binaural metrics such as MBSTOI incorporate spatial cues
but are primarily designed for normal-hearing listeners.
This study evaluates the ability of existing objective
metrics to predict speech intelligibility for hearing-aid
users in multi-talker spatial environments. Listening tests
are conducted with 20 hearing-impaired participants fitted
with binaural hearing aids. Four types of multi-talker
auditory scenes representative of open-plan offices are
reproduced using a loudspeaker array. They involve a target
speech, combined with diffuse noise and a localized
competing speech source. Objective measurements are
performed using an acoustic mannequin fitted with the
participants’ hearing aids. HASPI and MBSTOI values are
computed from the binaural signals recorded at the eardrums
and incorporating individual hearing losses.
Objective predictions are compared with subjective
intelligibility scores, and an ablation analysis is
conducted to distinguish the effects of hearing loss
modeling from those of binaural processing.
Authors
JA

Jean-Pierre Arz

INRS (Vandoeuvre lès Nancy) - Institut national de recherche et de sécurité (Vandoeuvre lès Nancy)
JD

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
LD

Louis Delebecque

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

1:00pm CEST

Systematization of Multiplier-less Convolution for 1-bit Audio Signal
Friday May 29, 2026 1:00pm - 1:30pm CEST
High-speed 1-bit signals generated by oversampling are
widely used in audio applications as they allow simple
demodulation via low-pass filtering while preserving
in-band spectral characteristics with high accuracy.
However, conventional FIR filtering of such signals
generally requires conversion to a multi-bit representation
at a common sampling frequency, which increases
computational cost and complicates the overall processing
flow. This paper addresses the convolution of high-speed
1-bit audio signals with multi-bit FIR impulse responses
and presents a systematic formulation of a multiplier-less
convolution approach. Based on a mathematical
reinterpretation of convolution, the proposed formulation
describes how time shifting and amplitude weighting can be
expressed through structured rearranging of 1-bit samples
without arithmetic operations. This provides a theoretical
description of previously reported 1-bit convolution
methods; however, its validity has not been fully
formalized. We examine the spectral characteristics of the
proposed convolution method and compare them with those
obtained by multi-bit convolution followed by ΔΣ
modulation. Experiments are conducted by convolving 1-bit
input signals with FIR filters having multi-band frequency
responses. Spectral analysis shows that the proposed method
achieves extremely high agreement with the standard
approach within the audible band while the differences
appear primarily at much higher frequencies outside the
audible range. These results demonstrate that convolution
of high-speed 1-bit audio signals can be achieved without
multipliers, suggesting the potential for highly efficient
hardware-oriented signal processing architectures.
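For orientation, the sketch below shows the elementary reason a 1-bit input needs no multipliers: each sample is +1 or -1, so every product with a coefficient reduces to adding or subtracting that coefficient. This is a simpler, textbook formulation and not the rearrangement-based method of the paper; it yields a multi-bit output rather than a 1-bit stream. The function name `one_bit_fir` and the 0/1 encoding of the stream are assumptions made for illustration.

```python
import numpy as np

def one_bit_fir(bits, h):
    """FIR-filter a 1-bit stream using only additions and subtractions.

    bits: sequence of 0/1 values, where 1 encodes +1 and 0 encodes -1.
    h:    multi-bit (integer) FIR coefficients.
    Returns the full convolution as integers (multi-bit output).
    """
    h = np.asarray(h, dtype=np.int64)
    out = np.zeros(len(bits) + len(h) - 1, dtype=np.int64)
    for n, b in enumerate(bits):
        # Each input sample adds or subtracts a time-shifted copy of h.
        if b:
            out[n:n + len(h)] += h
        else:
            out[n:n + len(h)] -= h
    return out
```

The result matches an ordinary convolution of the ±1 sequence with `h`, which is the reference against which any multiplier-less scheme would be checked.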
Authors
IS

Iori Sakurai

Waseda University
TS

Tomohiro Sakaguchi

Doctoral student, Waseda University
YO

Yasuhiro Oikawa

Waseda University

YG

Yuta Gomi

Waseda University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

1:00pm CEST

Gaussian Splatting-Based Head and Pinna Reconstruction for Individualized HRTF Computation from Commodity Multi-View Images
Friday May 29, 2026 1:00pm - 1:30pm CEST
Individualized head-related transfer functions (HRTFs)
require accurate pinna geometry, yet commodity multi-view
captures leave the ear region self-occluded and weakly
textured. We present a practical pipeline that couples
ear-centric acquisition with 3D Gaussian splatting (3DGS)
and the boundary element method (BEM) for complete HRTF
computation. The protocol augments horizontal views with
per-ear elevated captures under directional lighting; 3DGS
training with depth-distortion regularization yields
watertight meshes via truncated signed distance function
(TSDF) fusion. Standardized head coordinates and ear-canal
annotations interface the mesh with BEM. Experimental
evaluations demonstrate that our method achieves lower
ear-region geometric error and lower full-band spectral
distortion compared to existing image-based personalized
reconstruction baselines including AudioEar, NeuS, and
Metashape MVS.
Authors
HZ

Houlin Zhu

Peking University
TQ

Tianshu Qu

Peking University
XW

Xihong Wu

Peking University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Assessing Situational Awareness of Hearing-Impaired People Through their Perception of Non-Speech Sound Events: a Literature Review
Friday May 29, 2026 1:00pm - 1:30pm CEST
Situational awareness is a multisensory ability that
enables individuals to perceive and appropriately take into
account their immediate environment. This perception of the
world through our senses is carried out continuously and
unconsciously throughout the day. When auditory perception
is degraded, an individual may no longer correctly perceive
a doorbell, a water leak, or an alarm signal, which
negatively affects quality of life and may lead to
dangerous situations. Auditory perception can in particular
be degraded by hearing loss, a common and widespread
condition. The most common treatment consists of wearing
hearing aids, which are mainly designed to improve speech
intelligibility, especially in noisy environments. Feedback
from hearing-impaired people and hearing-aid users
indicates that, although auditory situational awareness has
been recognised as an essential component of well-being, it
remains insufficiently studied and requires further
investigation. There is currently no standard method for
assessing the extent to which one's situational awareness is
affected by hearing impairment and the use of hearing aids.
This is a complex process that requires assessing the
perception of relevant sound events within a continuous
stream of multisensory information, by individuals who
have different subjective preferences. Most existing
methods are limited to evaluating only a subset of the
problem, such as identification and localisation of
non-speech sound events. The rise of new technologies, such
as virtual reality, enables the development of assessment
methods within more realistic yet controlled environments.
This study aims to review existing methods in order to
highlight their limitations in addressing the issue at hand.
Authors
AF

Adil Faiz

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France
BM

Balbine Maillou

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France

EG

Emma Granier

Université de Lorraine, CNRS, Inria, Loria
JD

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

1:30pm CEST

Transient Evoked Otoacoustic Emissions and Self Reported Sound Exposure
Friday May 29, 2026 1:30pm - 2:00pm CEST
Headphone listening has become an integral part of everyday
life, spanning music consumption, communication, online
media, and increasingly, computer gaming. These diverse
listening contexts make individual sound exposure highly
variable and difficult to quantify. While music listening
and occupational headphone use have been widely studied,
sound exposure from gaming remains comparatively
undocumented. This study investigated the relationship
between self‑reported exposure through headphones and
cochlear function assessed using transient evoked
otoacoustic emissions (TEOAE). Forty‑one university
students completed a detailed questionnaire on listening
habits, and TEOAEs were recorded in both ears across five
half‑octave frequency bands. Estimated weekly exposure
levels were derived from participants’ reported durations
and contexts of use. TEOAE amplitude, signal‑to‑noise ratio
(SNR), and reproducibility showed clear frequency‑dependent
patterns and small ear asymmetries, consistent with typical
OAE behaviour. Only limited associations were found between
self‑reported exposure and TEOAE measures, with significant
effects emerging primarily for SNR and reproducibility in
the highest‑exposure group. No consistent differences were
observed between long‑term gamers and non‑gamers. These
findings suggest that self‑reported exposure alone may be
insufficient to detect subtle cochlear changes in young
adults, and underscore the need for more precise
exposure‑monitoring methods when evaluating recreational
sound exposure risks.
Authors
DH

Dorte Hammershøi

Professor, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University
RO

Rodrigo Ordoñez

Aalborg University
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

An Extended Multichannel Frequency-Domain FxLMS Algorithm for Real-Time Full-Band Adaptive Transaural Reproduction
Friday May 29, 2026 1:30pm - 2:00pm CEST
This paper presents a multichannel adaptive filtering
algorithm for real-time full-band adaptive transaural
reproduction on general-purpose hardware. It is based on a
multichannel frequency-domain FxLMS algorithm using an
overlap-save framework for both filtering and adaptation,
and is extended with (i) online plant identification for
fully adaptive operation, (ii) frequency-dependent
normalization for faster convergence, and (iii)
frequency-dependent regularization to stabilize adaptation.
The proposed algorithm is implemented in C on a
standard desktop PC and evaluated on a 4x2 transaural
configuration running in real time at 48 kHz with 2048-tap
control filters. Two evaluation tests are conducted. The
first test consists of reproducing two uncorrelated
white-noise signals at the ears of a manikin using
crosstalk cancellation as the performance metric. An
average crosstalk cancellation of 32 dB over 100 Hz–20 kHz
is demonstrated. The second test considers binaural
signal reproduction as a more realistic use case of the
algorithm. In both cases, performance is assessed for both
a static listener and a moving listener scenario,
demonstrating the algorithm’s ability to rapidly re-adapt.
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A Perceptual Evaluation Method for Binaural Rendering Algorithms via Minimum Audible Angle Measurements
Friday May 29, 2026 1:30pm - 2:00pm CEST
Binaural rendering is typically assessed via timbre and
localization accuracy, while its intrinsic spatial
resolution remains rarely quantified. This paper proposes a
perceptual evaluation method based on Minimum Audible Angle
(MAA) measurements to estimate the azimuthal
just-noticeable difference (JND) introduced by binaural
rendering algorithms. We systematically compared several
rendering algorithms across eight reference azimuths using
two participant-allocation paradigms. The results show that
spatial resolution is significantly influenced by Ambisonic
order and choice of the rendering algorithm, with MAA
thresholds systematically decreasing as the truncation
order increases. Furthermore, the proposed method
successfully captures physiological spatial characteristics
and identifies resolution limits imposed by reference
angles. While both participant-allocation paradigms yield
consistent qualitative trends, the repeated-measures design
provides superior data stability. These findings
demonstrate that the proposed MAA-based method is an
effective tool for quantifying the spatial resolution of
binaural rendering algorithms.
Authors
HZ

Houlin Zhu

Peking University
TQ

Tianshu Qu

Peking University
XW

Xihong Wu

Peking University
YQ

Yufan Qian

Peking University
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Real-Time Implementation of Personal Sound Zones Using Partitioned Convolution in Purr Data
Friday May 29, 2026 2:00pm - 2:30pm CEST
Personal sound zones aim to reproduce distinct audio
contents in separate spatial regions using loudspeaker
arrays, while minimizing acoustic interference between
zones. Although well established theoretically, their
real-time implementation remains challenging due to the
long impulse responses involved and the latency constraints
of audio processing systems.
This work presents a real-time implementation of personal
sound zones based on the pressure matching method in a
static context, i.e. transfer functions between the
loudspeakers and the zones are assumed to remain constant.
Sound zone filters are computed in the frequency domain
from experimentally measured impulse responses between an
array of 18 loudspeakers and two microphone arrays of 9
microphones defining a bright zone and a dark zone. The
system performance is then evaluated in terms of acoustic
contrast, reproduction error, and effective frequency
range. To meet real-time constraints, a fast partitioned
convolution algorithm has been used, namely the
Uniformly-Partitioned Overlap Save (UPOLS). This method
has been implemented in C++ as an external block for the
Purr Data real-time audio environment. Experimental
results, obtained in a semi-anechoic environment,
demonstrate that it enables stable real-time multichannel
convolution with negligible numerical error compared to
offline convolution. The proposed system results in a
functional real-time sound zones demonstrator, suitable for
experimental and interactive spatial audio applications.
The code is shared in a GitHub repository so that the
scientific community can benefit from it.
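The uniformly partitioned overlap-save scheme named above can be sketched in a few lines. This is a minimal single-channel NumPy illustration of the general technique, not the authors' C++ external for Purr Data; the class name `UPOLS`, its `process` method and the block size used in the usage note are assumptions made for this example.

```python
import numpy as np

class UPOLS:
    """Single-channel uniformly partitioned overlap-save convolver."""

    def __init__(self, h, block):
        self.B = block
        n_parts = -(-len(h) // block)              # ceil(len(h) / block)
        h_pad = np.zeros(n_parts * block)
        h_pad[:len(h)] = h
        parts = h_pad.reshape(n_parts, block)
        # Each filter partition is zero-padded to 2B before the FFT.
        self.H = np.fft.rfft(parts, n=2 * block, axis=1)
        self.fdl = np.zeros_like(self.H)           # frequency-domain delay line
        self.prev = np.zeros(block)                # previous input block

    def process(self, x_block):
        """Consume one block of B input samples, return B output samples."""
        B = self.B
        buf = np.concatenate([self.prev, x_block])  # sliding window of 2B samples
        self.prev = np.array(x_block, dtype=float)
        self.fdl = np.roll(self.fdl, 1, axis=0)     # age the stored input spectra
        self.fdl[0] = np.fft.rfft(buf)
        # Multiply-accumulate every partition with its delayed input spectrum.
        y = np.fft.irfft(np.sum(self.H * self.fdl, axis=0), n=2 * B)
        return y[B:]                                # discard the aliased first half
```

Feeding consecutive blocks, for example `UPOLS(h, 64).process(x[i:i + 64])` in a loop, reproduces the first `len(x)` samples of a direct convolution while the latency stays at one block regardless of the filter length.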
Authors
GP

Guilhem Pagès

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
JB

Jean Beuchet

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
Manuel Melon

Professor, LAUM / LE MANS Université


TL

Titouan Lefrancois

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Toward an improved auditory model for predicting binaural coloration
Friday May 29, 2026 2:00pm - 2:30pm CEST
The evaluation of audio quality is important in the
development of immersive audio algorithms and reproduction
systems, and binaural models are often used for this as a
quick alternative to listening tests. Coloration (i.e.,
perceived loudness differences integrated across ears and
frequency) is one key quality aspect; however, the majority
of models used to predict coloration are often
oversimplified or are missing a dedicated binaural stage to
consider the relative contribution of the left and right
ear signals. A binaural coloration model is presented that
builds upon previous work and tests three different
approaches for its binaural stage. The proposed model is
evaluated in comparison with nine models that are
frequently used to predict coloration by using data from
five listening tests totaling 252 stimuli with various
audio contents and source positions. The proposed model
performed best with 85% of explained variance, followed by
predictions based on ISO 532-1 loudness, yielding 78%
explained variance. The commonly used log-spectral distance
performed worst, with only 44% explained variance. The
three tested binaural stages had little influence on the
performance of the proposed model. The model is made freely
available to download.
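For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of one common form of log-spectral distance: the root-mean-square difference, in dB, between two magnitude spectra. Definitions vary across the literature (frequency range, smoothing, weighting), so this is an assumed generic variant, not necessarily the one evaluated in the paper. The function name, the FFT length and the `eps` guard are illustrative choices.

```python
import numpy as np

def log_spectral_distance_db(ref, test, n_fft=1024, eps=1e-12):
    """RMS difference in dB between the magnitude spectra of two signals.

    ref, test: 1-D arrays (for example impulse responses) no longer than n_fft.
    """
    ref_mag = np.abs(np.fft.rfft(ref, n=n_fft))
    test_mag = np.abs(np.fft.rfft(test, n=n_fft))
    diff_db = 20.0 * np.log10((ref_mag + eps) / (test_mag + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

Because every frequency bin is weighted equally and no auditory filtering or loudness model is applied, a purely spectral measure of this kind can diverge from perceived coloration, which is consistent with the low explained variance reported for it.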
Authors
Thomas McKenzie

Lecturer in Acoustics, University of Edinburgh
Thomas McKenzie is a Lecturer in Acoustics and Architectural Acoustics at the Reid School of Music, Edinburgh College of Art, University of Edinburgh, UK. He completed a B.Sc. in Music, Multimedia, and Electronics at the University of Leeds, UK, in 2013, before completing his M.Sc...
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Immersive Audio, Lecture

2:30pm CEST

Exploring Rendering Variability in Next-Generation Audio Reproduction
Friday May 29, 2026 2:30pm - 3:00pm CEST
This study evaluates three Next-Generation Audio (NGA)
rendering systems through listening tests using real-life
audio content. The testing paradigm prioritized subjective
preference over adherence to a ground-truth reference.
Participants assessed perceptual spatial audio attributes
in both 5.1 and 7.1.4 loudspeaker setups. The findings
suggest that strict adherence to the rendering algorithm
used during content creation is not mandatory in terms of
listener preference. While not advocating disregarding
artistic intent without consideration, this study proposes
that such flexibility in reproduction can be an acceptable
compromise.
Authors
ES

Ema Souza-Blanes

Samsung Research America
Toni Hirvonen

Researcher, Samsung Research America
Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects...
WJ

Wonbeen Jo

Samsung Research
YK

Yongmin Kwon

Samsung Research

Friday May 29, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Immersive Underwater Audio Capture Using a Wideband Spatial Hydrophone Array
Friday May 29, 2026 2:30pm - 3:00pm CEST
Immersive audio continues to expand beyond traditional
studio and terrestrial field-recording environments, yet
underwater soundscapes—particularly those involving marine
mammals—remain largely documented in mono or stereo
formats. This paper presents a practical and low-cost
approach for capturing immersive underwater audio using a
newly developed wideband hydrophone and a multichannel
array optimized for marine environments. The hydrophones,
designed by the author, feature a low noise floor, extended
frequency response exceeding 100 kHz, and direct
compatibility with standard P48 phantom-powered audio
recorders, enabling deployment without specialized
underwater preamplifiers or power systems.

To translate established immersive recording techniques
into the ocean environment, an array architecture was
developed based on a compact eight-element cube geometry.
Two array variants were constructed to account for the
significantly higher speed of sound in water compared to
air, allowing the spatial characteristics of underwater
sources to be captured with appropriate inter-element
spacing. Field recordings were conducted off the coast of
Hawaii in January during the peak season for humpback whale
song. Recordings were made at multiple depths and positions
to explore variations in reverberation, propagation, and
ambient biological activity.
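
The spacing adjustment mentioned above can be illustrated with a rough calculation. This is only a sketch: the nominal sound speeds (343 m/s in air, 1500 m/s in seawater) and the 0.25 m in-air spacing are assumed values, not figures from the paper. It scales the element spacing so that the largest inter-element time delay in water matches the in-air case.

```python
# Nominal sound speeds (assumed values, not taken from the paper).
C_AIR = 343.0     # m/s, air at roughly 20 degrees C
C_WATER = 1500.0  # m/s, typical seawater

def max_delay(spacing_m, c):
    """Largest time-of-arrival difference between two elements, in seconds."""
    return spacing_m / c

def water_spacing(air_spacing_m):
    """Spacing in water that gives the same maximum delay as in air."""
    return air_spacing_m * C_WATER / C_AIR

air_spacing = 0.25  # hypothetical spacing of an in-air array, in metres
print(f"equivalent spacing in water: {water_spacing(air_spacing):.2f} m")
```

Under these assumptions the water spacing is roughly four times the air spacing, which is one plausible reason for building two array variants.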

Preliminary results indicate that the system captures
detailed spatial cues from humpback whale vocalizations
while simultaneously preserving the rich ambient marine
soundscape. The extended ultrasonic response further allows
slowed or pitch-shifted playback to reveal fine temporal
structures not typically audible. This work demonstrates a
feasible method for immersive underwater recording and
provides a foundation for both scientific research and
creative content production.
Authors

Jules Ryckebusch

Sound Sleuth, Sound Sleuth
Jules' career with audio and electronics started early. At 16 he built an analog synthesizer from a PAiA kit. While still in high school, he designed and built a mixing board, then started doing sound for local bands.
Jules went to college, studied physics, and then joined the US Navy where he spent 20 years as a nuclear submariner. In between submarines, he was an instructor at the Naval Nuclear Power School in Orlando, Florida. He taught Reactor Kinetics by day, and spent many a night in local...
Friday May 29, 2026 2:30pm - 3:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark
 
Saturday, May 30
 

9:00am CEST

Adaptive Deesser Application
Saturday May 30, 2026 9:00am - 9:30am CEST
High-fidelity vocal processing is frequently compromised by
sibilance, a phenomenon characterized by stochastic
high-frequency energy that presents unique dynamic range
challenges. While traditional de-essing techniques often
rely on static frequency bands, they fail to account for
inter-speaker variability and changing dynamics. This
project presents an adaptive real-time de-essing
application, developed using the JUCE framework, which
automatically detects and suppresses sibilant frequencies.
The proposed methodology integrates a derivative-based
frequency tracking algorithm to estimate the spectral
centroid without the computational overhead of the Fast
Fourier Transform (FFT). This is coupled with a dual-path
envelope detection system and a relative threshold logic to
distinguish sibilance from the wideband signal.
Additionally, a dynamic harmonic exciter is implemented to
restore high-frequency presence during non-sibilant
periods. Objective spectral analysis confirms the system's
ability to selectively attenuate energy in the 6–11 kHz
range while maintaining spectral transparency and
minimizing artifacts.
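
One way to track frequency from a signal's derivative without an FFT is sketched below. It is an assumption about what such a tracker could look like, not the authors' algorithm: it compares the energy of the first difference with the energy of the signal, which recovers the frequency exactly for a pure sinusoid and gives an energy-weighted frequency estimate for broadband input. The function name and test tone are hypothetical.

```python
import math

def dominant_frequency(x, fs):
    """Estimate a dominant frequency (Hz) from first-difference energy."""
    num = sum((x[n] - x[n - 1]) ** 2 for n in range(1, len(x)))
    den = sum(x[n] ** 2 for n in range(1, len(x)))
    if den == 0.0:
        return 0.0
    # For a sinusoid at w rad/sample, the first difference scales the
    # amplitude by 2*sin(w/2), so w = 2*asin(ratio/2).
    half_ratio = min(math.sqrt(num / den) / 2.0, 1.0)
    return fs * math.asin(half_ratio) / math.pi

fs = 48000
tone = [math.sin(2 * math.pi * 8000 * n / fs) for n in range(4800)]
estimate = dominant_frequency(tone, fs)
print(f"estimated frequency: {estimate:.0f} Hz")
```

The estimate needs only one subtraction and two multiply-accumulates per sample, which is the kind of saving over an FFT that the abstract alludes to.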
Authors

Cumhur Erkut

Aalborg University
Cumhur Erkut (M.Sc. 1997, D.Sc. 2002) has received a PhD in acoustics and audio signal processing from Helsinki University of Technology, Finland. During his post-doctoral period, he has contributed to national and international projects (EU FP5 and 6). Between 2007 and 2012, he has conducted i...

Stefanos Biliousis

Aalborg University
Saturday May 30, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Augmented Ubiquity - A fully volumetric music composition for VR
Saturday May 30, 2026 9:00am - 9:30am CEST
This paper presents the production of a fully volumetric
audiovisual music composition with six degrees of
freedom, featuring a dance performance. The project
realizes the artistic potential enabled by recent
technological
advancements in volumetric video capture and spatial audio
rendering. The interdisciplinary production team
consisted of music composers, dancers, sound engineers, and
experts in 4D Gaussian Splats. An existing 3D body
scanner consisting of 112 cameras was used to capture a
dance performance in high definition video. For the
visual scene, custom 4D Gaussian Splat algorithms were
developed and employed to create the dynamic model.
Additional static 3D Gaussian Splats were captured with the
same scanner and integrated into the scene in Unity.
The acoustic scene is dynamically binauralized via SPAT
Revolution, depending on the position of the listener
in the virtual space. Audio and video scenes are run on
separate PCs, synchronized via OSC and presented via
commercially available head-mounted displays (HMDs).
Audiences report a high level of immersion at the initial
presentation at an exhibition event. A detailed evaluation
is planned in the near future. Furthermore, a unified
application for both visual and audio scenes is planned in
order to reach a wider audience.
Authors

Benedikt Samuel Jäger

Kreativinstitut.OWL | Detmold Music University

Lou Kilger

Kreativinstitut.OWL | Detmold Music University

Manuel Steitz

Wunderkammer Visual Engineering

Pablo Dawson

Wunderkammer Visual Engineering; CENIA

Sascha Armin Etezazi

Music Director, Artistic Research Assistant, Kreativinstitut.OWL | Detmold Music University
Saturday May 30, 2026 9:00am - 9:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

When Excellence Fails the Mix: Non-Compensatory Relationships in Mix Preparation for Music Production
Saturday May 30, 2026 9:30am - 10:00am CEST
Mix preparation, the foundational stage encompassing
technical, musical, and organisational tasks preceding
creative mixing, remains under-examined despite
professional acknowledgement. This study investigated
whether preparatory effectiveness operates through
compensatory relationships, where excellence in one
dimension offsets weakness in another, or through threshold
requirements demanding adequacy across all dimensions
simultaneously.

Nine professional audio practitioners each prepared three
sessions from a pool of nine multitrack recordings spanning
diverse genres. Nine engineers (with partial overlap) then
evaluated the resulting twenty-seven preparations across
five dimensions derived from Phase 1 practitioner
interviews: Session Organisation, Signal Integrity, Musical
Refinement, Processing Boundaries, and Workflow
Facilitation. Professional 'adequacy' was established at a
4.0 threshold based on practitioner consensus regarding
preparations they would 'work with' versus 'send back'.

Results revealed consistent non-compensatory patterns:
exceptional performance in isolated dimensions failed to
compensate for failures elsewhere. One practitioner
achieved perfect Workflow Facilitation (5.00) yet overall
inadequacy (3.43) due to Signal Integrity failure (2.50).
Another achieved strong Musical Refinement (4.75) whilst
Workflow Facilitation collapse (1.75) produced a
below-threshold outcome (3.49). These patterns held across
all inadequate sessions. No track produced exclusively
adequate or inadequate outcomes, confirming source material
did not determine success.

The findings challenge three assumptions: that
practitioners can specialise and compensate, that education
can sequence skills for later integration, and that
intelligent systems can optimise tasks independently.
Preparatory adequacy requires meeting threshold standards
across all dimensions concurrently, with implications for
professional hiring, curriculum design, and AI-assisted
tool development.
Authors

Ashour Ahmed

University of West London - London College of Music
Saturday May 30, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

The artistic role of the sound engineer in immersive spatialisation. Investigation of the influence of space in the emotional interpretation of sounds.
Saturday May 30, 2026 9:30am - 10:00am CEST
Historically, music has developed primarily as a frontal
phenomenon, thus limiting the expressive and perceptual
potential related to sound space. The recent development of
immersive audio systems opens new creative possibilities by
expanding the artistic action space from a narrow frontal
area to a complete sphere around the listener. The
Ambisonic system (Scene-Based Audio), together with
Object-Based formats and hybrid solutions, represents
fertile ground for creative experimentation and the
redefinition of workflows in the field of spatialized sound.
In this new context, what is the role of the sound
engineer, as an electroacoustic interpreter, in immersive
musical artistic creation?
The research is based on a multidisciplinary analysis that
combines an in-depth study of current immersive audio
technologies and their performance, with observations of
existing compositional and production approaches.
Additionally, a comparative study is conducted on the
design choices of the sound engineer as an interpreter,
investigating workflows, emerging musical semantics,
available tools, and the recovery of the historical
repertoire.
Particular attention is paid to the experiment aimed at
investigating a correlation between the position of a sound
and an emotional trigger in the listener.
New directions emerge in the creative role of the sound
engineer, who goes beyond the mere technical aspect to
become an integral part of the compositional and
interpretative process, harmonizing the relationship
between technique and art.
Authors

Luca Frigo

Conservatorio G. Nicolini Piacenza
Saturday May 30, 2026 9:30am - 10:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Low-Frequency Limits of Cross-Talk Cancellation Systems Under Robustness Constraints
Saturday May 30, 2026 10:00am - 10:30am CEST
The low-frequency performance of cross-talk cancellation
(CTC) systems is fundamentally limited by the condition
number of the plant matrix, which indicates the robustness
of the inverse system in the absence of regularisation.
This condition number, in turn, depends on the relationship
between loudspeaker spacing, listener distance, and
acoustic wavelength.
This paper derives a simple approximate expression for the
low-frequency limit of CTC performance, defined for a given
maximum affordable condition number as a function of these
parameters. The increase in condition number is also shown
to be directly related to the increase in array effort
relative to the minimum achievable array effort. The
formulation is derived for a centered listener and can be
extended to the case of off-center listener positions,
demonstrating the method's applicability to
listener-position-adaptive cross-talk cancellation systems.
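
The dependence of the condition number on frequency and geometry can be explored numerically. The sketch below is not the paper's approximate expression. It assumes free-field point sources, a centered listener modelled as two points 0.16 m apart, and a symmetric 2x2 plant matrix [[a, b], [b, a]], whose singular values are |a + b| and |a - b|. All geometry values and function names are hypothetical.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in air, m/s

def condition_number(freq_hz, span_m=0.2, distance_m=1.0, ear_gap_m=0.16):
    """Condition number of a symmetric 2x2 free-field plant matrix."""
    k = 2.0 * math.pi * freq_hz / C
    r_near = math.hypot((span_m - ear_gap_m) / 2.0, distance_m)
    r_far = math.hypot((span_m + ear_gap_m) / 2.0, distance_m)
    a = cmath.exp(-1j * k * r_near) / r_near  # same-side path
    b = cmath.exp(-1j * k * r_far) / r_far    # cross-talk path
    s1, s2 = abs(a + b), abs(a - b)           # singular values of [[a, b], [b, a]]
    return max(s1, s2) / min(s1, s2)

def lowest_usable_frequency(max_cond, start_hz=20.0, stop_hz=2000.0, step_hz=5.0):
    """First scanned frequency whose condition number is within max_cond."""
    f = start_hz
    while f <= stop_hz:
        if condition_number(f) <= max_cond:
            return f
        f += step_hz
    return None

limit = lowest_usable_frequency(20.0)
print(f"lowest frequency with condition number <= 20: {limit} Hz")
```

Widening the loudspeaker span or moving the listener closer increases the path-length difference, which in this simple model lowers the frequency returned by the scan.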
Speakers

Filippo Fazi

Chief Scientist, Audioscenic
Authors

Filippo Fazi

Chief Scientist, Audioscenic

Francesco Veronesi

University of Southampton
Saturday May 30, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Melodical Mashup of Classical Pieces: How to Maximize Audience Enjoyment?
Saturday May 30, 2026 10:00am - 10:30am CEST
Mashup is a distinctive form of music composition which
integrates elements from existing songs to create a
cohesive audio experience. The digital music landscape,
with various audio processing tools and sharing platforms,
has facilitated the creation and propagation of mashups by
musicians, remixers, audio engineers, and automated
systems. While most prior research and studies focus on
mashups created by combining elements from individual audio
tracks, typically using pop songs, there exist other types
of mashups; for example, by incorporating phrases from base
melodies into a new arrangement. In this study, we examined
listener enjoyment ratings for this type of mashup,
utilizing well-known Western classical melodies. A
listening test was conducted to assess whether variations
in pitch, tempo, and familiarity with the source material
correlate with enhanced enjoyment. This paper presents our
preliminary findings, with plans for future studies and
additional survey responses to strengthen the results and
uncover insights for crafting more engaging classical
mashups.
Authors

Anh-Dung Dinh

The Hong Kong University of Science and Technology
Saturday May 30, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

A multitrack dataset of a ten-song album with stereo and immersive 7.1.4 masters
Saturday May 30, 2026 10:30am - 11:00am CEST
This paper presents a multitrack dataset designed to
support music production research and education, including
machine learning techniques such as automatic mixing and
source separation. The dataset comprises a cohesive 10-song
indie album (indie rock/folk), with separate stems for
individual instruments, such that each song has between 13
and 35 individual tracks (stems). For each song, three
versions of each stem are provided: the raw unprocessed
stems, a dry mixed version (processed but without
reverberation or delay effects), and a full mixed version.
Additionally, each song includes two final master formats:
stereo and immersive 7.1.4. This album-format dataset
enables studies of mix consistency across a thematically
aligned collection of songs, as well as stereo upmixing to
immersive formats, and contains far more stems per song
than traditional four-stem datasets. To illustrate an
example usage of the dataset, the MEGAMI automatic mixing
model is used to produce a mix for two songs. The results
are analysed in comparison to the raw (unmixed) and human
mixed versions. The dataset is made open-access and free to
download.
Authors

Alec Wright

University of Edinburgh

Eloi Moliner

Aalto University

Thomas McKenzie

Lecturer in Acoustics, University of Edinburgh
Thomas McKenzie is a Lecturer in Acoustics and Architectural Acoustics at the Reid School of Music, Edinburgh College of Art, University of Edinburgh, UK. He completed a B.Sc. in Music, Multimedia, and Electronics at the University of Leeds, UK, in 2013, before completing his M.Sc...
Saturday May 30, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Room Measurement Based Calibration of MPEG-I Tracked Loudspeaker Rendering
Saturday May 30, 2026 10:30am - 11:00am CEST
The MPEG-I Immersive Audio standard for Virtual and
Augmented Reality (VR/AR) audio with six degrees of freedom
(6DoF) was completed in November 2025 by the MPEG Audio
group (ISO/IEC JTC 1/SC 29/WG 6). It offers compressed
representation of virtual audio scenes as well as an
efficient and acoustically sophisticated rendering to both
head-tracked binaural headphones and loudspeaker setups.
The latter is a unique feature among VR/AR audio
specifications and enables convincing reproduction of
conventional stereo, surround and 3D material with a large
sweet area in home entertainment setups for a single
tracked user, without the need for a head-mounted
display. This paper describes the technology of MPEG-I
Audio listener-tracked loudspeaker rendering as a
stand-alone application with a special focus on practical
considerations for optimal room calibration.
Authors

Christof Faller

Illusonic GmbH

Juergen Herre

International Audio Laboratories Erlangen

Sascha Disch

Fraunhofer IIS, Fraunhofer IIS
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...
Saturday May 30, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Optimising Sound Effects to Enhance Dialogue Perception in Audio Mixes Using Selective Auditory Attention
Saturday May 30, 2026 11:00am - 11:30am CEST
Dialogue intelligibility is a fundamental aspect of audio
post-production. Ensuring speech clarity in complex sound
mixes remains challenging across different playback
systems. Selective auditory attention plays a central role
in how listeners track dialogue in busy mixes, so small
changes in spectral or spatial structure can influence
perceived clarity in unexpected ways. This study
investigates the effectiveness of psychoacoustically
informed techniques, equalisation and spatialisation, in
reducing auditory masking and improving the clarity of
dialogue. The listening test was completed on participants’
own playback systems, which reflects typical domestic
viewing conditions and aligns the study with real-world
listening environments. The techniques were tested
individually and in combination to assess their impact.
Results show that equalisation was more effective than
spatialisation in reducing masking, while their combination
produced a significant improvement in intelligibility,
clarity, and reduced interference. The effectiveness of
these methods varied between the two groups of clips,
suggesting that their application should be adapted to the
specific acoustic context of each scene.
Authors

Federico Aramini

Edinburgh Napier University
Dialogue and sound editor with 3+ years' experience and 30+ credits in film across feature film, animation, documentary and TV series. Contributed to award-winning and festival-recognised productions, including films screened at the Venice Film Festival and the David di Donatello Awards...

Iain McGregor

Edinburgh Napier University

Rod Selfridge

Edinburgh Napier University
Saturday May 30, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

The Missing Next Step: Sound, Agency, and Plausibility in Virtual Reality — A Narrative Review
Saturday May 30, 2026 11:00am - 11:30am CEST
Sound plays a critical role in virtual reality (VR),
shaping attention, narrative comprehension, emotional
engagement, and experiential plausibility under conditions
of embodiment and user agency. Although a growing body of
research addresses VR audio techniques, perceptual effects,
and sound taxonomies, existing approaches remain fragmented
and largely descriptive. In particular, they do not provide
a unifying, VR-specific account of how sound meaning and
emotional intent are operationally linked to user agency
and non-linear narrative progression. This paper presents a
narrative review of selected literature spanning game audio
frameworks, immersive sound design, narrative theory, and
plausibility-related research in games and VR. Through
synthesis of these perspectives, the review identifies a
conceptual gap in current research, namely the absence of a
VR-specific, agency-coupled sound design framework for
structuring sound meaning and emotional intent in support
of experiential plausibility as users actively shape events
in interactive VR environments.
Authors

Eve Klein

Senior Lecturer, Music Technology & Popular Music, The University of Queensland, School of Music
Dr Eve Klein is a lecturer in music technology at the University of Queensland, Australia. She is also an operatic mezzo soprano, a composer, and an Ableton Live Certified Trainer. Eve's research is concentrated on music technology, recording cultures and contemporary music. Her current...

Neil Hillman

The Audio Suite

Nilufar Baghaei

The University of Queensland, School of Electrical Engineering and Computer Science

Peter Kurucz

The University of Queensland, School of Electrical Engineering and Computer Science

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Saturday May 30, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:30am CEST

Intelligent Audio personalization for Enhanced user experience
Saturday May 30, 2026 11:30am - 12:00pm CEST
Most available music content is stereo, which results in
inadequate spatial treatment and leaves listeners feeling
disconnected from the music, failing to transport them into
the intended sonic environment. Insufficient separation
between instruments can lead to an unbalanced mix, where
certain elements dominate others and disrupt the overall
harmony. Instruments may appear flat and confined to a
narrow area, reducing the sense of dimensionality in the
mix. Stereo audio offers limited spatial information,
restricting its adaptability to immersive sound
environments. This research presents a novel approach for
converting stereo audio into a personalized immersive
experience by leveraging object-based audio rendering,
the listener's sound stage, and surround speaker capability.
The proposed system separates audio signals into individual
objects (such as instruments or vocals) and dynamically
maps these objects to specific speakers based on
personalized preferences and spatial configurations. This
method improves audio localization and enhances the
listener's engagement by delivering a tailored auditory
experience.
Authors

Avinash Singh

Samsung Research Institute, Delhi (SRID)

Natasha Meena

Samsung Research Institute, Delhi (SRID)
I work as a software developer at Samsung Research Institute India - Delhi and am responsible for developing features related to Samsung sound devices.

Sumit Panwar

Samsung Research Institute, Delhi (SRID)
Saturday May 30, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Systematic Literature Review on Inverse Synthesis and Sound Matching
Saturday May 30, 2026 1:00pm - 1:30pm CEST
This paper presents a systematic literature review on
inverse synthesis and sound matching, which focus on
predicting synthesizer parameters to recreate a target
audio waveform. Automating this process using machine
learning is impeded by distinct technical challenges:
many-to-one mappings where different parameter settings produce
the exact same sound, the non-differentiability of
commercial black-box synthesizers, a scarcity of musically
structured training data, and a lack of standardized
perceptual metrics. Existing approaches are categorized
into non-differentiable synthesizer methods, utilizing
evolutionary algorithms and deep learning, incorporating
techniques to bypass gradient limitations such as neural
proxies or generative models. In contrast, differentiable
synthesizer methods enable the integration of audio loss
functions into training pipelines via custom signal
processing environments. The analysis identifies a critical
reliance on spectral representations for evaluating
perceptual similarity, given that parameter-based metrics
frequently fail to align with human hearing. The findings
indicate that while deep learning has reduced inference
times, the field lacks a unified production solution.
Future progress requires the establishment of standardized
benchmarks to evaluate models, the implementation of novel
advancements in generative models not yet applied to this
problem, and the development of hybrid architectures to
simultaneously address these distinct technical challenges.
Authors

Bruno Gawęcki

Poznan University of Technology, Institute of Computing Science

Ewa Łukasik

Poznan University of Technology, Institute of Computing Science
Saturday May 30, 2026 1:00pm - 1:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Artificial ear for bone-conducted vibrations, simulation and measurement
Saturday May 30, 2026 1:00pm - 1:30pm CEST
The bone-conducted occlusion effect (OE) is a major source
of acoustic discomfort for users of hearing aids, earbuds,
earplugs, and related devices. Conventional objective OE
measurements rely on in-ear microphones in human subjects,
which are time-consuming, invasive, and difficult to
control during product development. The aim of this paper
is to present a new artificial ear, specifically designed
for objective OE measurements under bone-conducted
excitation, coupled with a finite element analysis (FEA)
model developed in COMSOL Multiphysics. Both the model and
the artificial ear demonstrate good agreement regarding the
sound pressure found at the tympanic membrane for a
conventional dome at shallow, medium and deep insertions.
The validated FEA model is then used to perform a virtual
test of the bone-conducted objective OE for different
occluding devices, including plastic and foam earplugs and
a conventional closed dome for hearing aids. This is to
investigate how the relative contributions and phases of the
ear-canal and device surfaces govern the resulting occluded
sound pressure. The proposed artificial ear and modeling
approach provide a controlled and repeatable platform for
studying the OE and for evaluating occluding devices during
the development process.
Authors

Roberta Dattilo

GN Hearing A/S

Yu Luan

GN Hearing A/S
Saturday May 30, 2026 1:00pm - 1:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

From DSP to AI Audio Engineering: The Heritage and the Future of Physical Modeling Sound Synthesis
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Digital Audio Signal Processing has long enabled precise
analysis of musical instrument behavior, supporting digital
sound synthesis. In parallel, physical modeling has evolved
into a mature synthesis and simulation technology capable
of running in real time, coupling vibro-acoustic models
with perceptual control interfaces. Over the last decade,
advances in machine learning have begun to transform both
ends of this pipeline. Instead of relying solely on
analytical DSP methods, we are increasingly able to learn
impulse and frequency responses, infer parameters, and
drive synthesis models directly from data. This broader
transition from classical DSP to *AI Audio Engineering*
brings not only new algorithms but also new workflows,
evaluation practices, and deployment contexts for musical
acoustics.

Two demonstrators illustrate this shift. *First*,
measurement-driven studies of musical instruments can
constrain model architectures and reduce parameter search
spaces. The measurement-derived priors can inform both
classical modeling and data-driven neural surrogates.
*Second*, real-time physical modeling integrated into XR
environments highlights how haptic control, perceptual
feedback, and spatial audio can create convincing virtual
instruments suitable for experimentation, pedagogy, and
performance.

These demonstrators motivate an AI Audio Engineering
workflow in which measurement, modeling, learning, and
perceptual evaluation form a continuous loop, to enable
immersive XR experiences, rapid prototyping of novel
instruments, and new modes of digital lutherie. The
approach invites collaboration across acoustics, DSP,
spatial audio, and AI Audio Engineering: an emerging
discipline that considers audio models as deployable,
maintainable, and continuously improvable artifacts
governed by data, inference, evaluation, and lifecycle
operations.
Authors

Cumhur Erkut

Aalborg University
Cumhur Erkut (M.Sc. 1997, D.Sc. 2002) has received a PhD in acoustics and audio signal processing from Helsinki University of Technology, Finland. During his post-doctoral period, he has contributed to national and international projects (EU FP5 and 6). Between 2007 and 2012, he has conducted i...
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A Study on Uncertainty of Sound Pressure Measurements in Cars
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Accurate and efficient measurement of sound pressure levels
around the ears of occupants in cars is essential for
objective evaluation of basic sound quality and automotive
audio features such as personal sound zones and active
noise control. In this paper, the uncertainties of sound
pressure measurements obtained with 5 commonly used methods
are compared: the AES 6-microphone method, the
single-microphone method, the two-microphone method with
occupants present, the head-and-torso simulator method,
and the human binaural method. Measurements were conducted
in the front-right seat of a 4-door electric sedan, using
either all car body loudspeakers or a pair of headrest
loudspeakers driven by a two-channel uncorrelated pink
noise to generate an average sound pressure level of 70 dBA
in the seat. Each method underwent 3 complete
install–measure–remove cycles, a total of 54 recordings
were collected, and the standard deviation of the measured
average sound pressure levels was adopted to quantify
measurement uncertainty. The test results show that all the
5 methods have good repeatability and low uncertainty below
200 Hz and above 15 kHz, but have large uncertainty between
200 Hz and 15 kHz. The AES 6-microphone method demonstrates
the best repeatability with the lowest uncertainty across
most frequency resolutions, and its maximum uncertainty in
1/3 octave bands is less than 2.0 dB for sound pressure
measurements in the car. Therefore, the AES 6-microphone
method is recommended for use in engineering comparison and
reporting.
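
The uncertainty metric described above, the standard deviation of levels across repeated install-measure-remove cycles, can be sketched as follows. The levels and method labels are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics

def repeatability_db(levels_db):
    """Sample standard deviation of repeated level measurements, in dB."""
    return statistics.stdev(levels_db)

# Hypothetical band levels (dB) from three install-measure-remove cycles.
cycles = {
    "method A": [70.1, 69.8, 70.4],
    "method B": [71.0, 69.0, 70.0],
}

uncertainty = {name: repeatability_db(levels) for name, levels in cycles.items()}
best = min(uncertainty, key=uncertainty.get)

for name, value in uncertainty.items():
    print(f"{name}: {value:.2f} dB")
print(f"most repeatable: {best}")
```

Repeating the calculation per frequency band (for example per 1/3 octave band) would give an uncertainty curve for each method rather than a single number.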
Authors

Jiancheng Tao

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University

Ruoyan Chen

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University

Xiaojun Qiu

Yinwang Intelligent Technology Co., Ltd, Shanghai, China

Zhou Zhou

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Knowledge-Driven Optimization of Reverberation Parameters Using Declarative Audio Constraints
Saturday May 30, 2026 2:00pm - 2:30pm CEST
Artificial reverberation is a fundamental process in music
production and audio post-production. However, the large
and highly interdependent parameter spaces of modern
reverberation algorithms make the identification of
perceptually optimal configurations difficult, particularly
when attempting to minimize audible artifacts. This paper
presents a knowledge-driven framework for reverberation
parameter optimization that evaluates candidate
configurations using rule-based audio quality constraints
derived from perceptual and signal-processing principles.
The system automatically detects and prevents common
artifacts including spectral obfuscation, clipping, spatial
collapse, and ringing phenomena. Instead of relying on
data-driven training procedures, the proposed approach
employs declarative reasoning to model audio engineering
knowledge and systematically constrain parameter
exploration. Experimental evaluation demonstrates that the
framework successfully reduces artifact occurrence across
diverse audio material while maintaining computational
feasibility. The results suggest that knowledge-based
reasoning can provide an interpretable and controllable
alternative to data-driven optimization strategies in audio
signal processing.
Authors

Flavio Everardo

Tec de Monterrey, University of Potsdam

Noah Haussmann

TU Berlin, University of Potsdam
Saturday May 30, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Optimal levels and measurement time for separation of nonlinear components
Saturday May 30, 2026 2:00pm - 2:30pm CEST
Linear loudspeaker parameters are often estimated via
fitting of transfer functions, under the assumption of
linearity. This paper investigates the corruption of the
measurement caused by nonlinearities in the system and
presents a new and improved method for separating the true
linear response from the nonlinear components by analyzing
a sequence of measurements done at different levels. The
method is improved by analyzing the influence of the chosen
measurement levels as well as the measurement time at each
level, and numerically optimal values are presented for the most
typical cases of nonlinear behaviour. While the influence
of noise and nonlinear distortion can be eliminated
completely in the case of finite orders of nonlinearities
in the system, the method is also shown to provide improved
accuracy in the more realistic case where all orders are
present but only a finite number of them dominate.
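
The idea of separating the linear response by measuring at several levels can be illustrated with a deliberately simple model. This is not the paper's method. It assumes a memoryless system y = h1*x + h3*x**3 driven by a sine of amplitude A, for which the gain observed at the fundamental is h1 + 0.75*h3*A**2, so two measurements at different amplitudes suffice to extrapolate to zero drive level. All names and coefficient values are hypothetical.

```python
def apparent_gain(h1, h3, amplitude):
    """Fundamental gain of y = h1*x + h3*x**3 for a sine of given amplitude."""
    return h1 + 0.75 * h3 * amplitude ** 2

def linear_gain_from_two_levels(g_low, a_low, g_high, a_high):
    """Extrapolate the gain to zero drive level, assuming only a cubic term."""
    return (g_low * a_high ** 2 - g_high * a_low ** 2) / (a_high ** 2 - a_low ** 2)

# Hypothetical system: true linear gain 2.0, cubic coefficient -0.5.
g1 = apparent_gain(2.0, -0.5, 0.5)   # measured at the lower level
g2 = apparent_gain(2.0, -0.5, 1.0)   # measured at the higher level
estimate = linear_gain_from_two_levels(g1, 0.5, g2, 1.0)
print(f"apparent gains: {g1:.4f}, {g2:.4f}; recovered linear gain: {estimate:.4f}")
```

With more nonlinear orders present, more levels are needed, and the choice of levels and averaging time per level then affects how noise propagates into the estimate, which is the trade-off the abstract addresses.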
Authors

Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Saturday May 30, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Topology Optimized Tweeter Waveguides for Automotive Audio
Saturday May 30, 2026 2:30pm - 3:00pm CEST
Automotive audio is challenging for a variety of reasons.
The acoustic environment is noisy, the geometry is complex
with many reflecting surfaces, and there are several
listening positions of interest. While digital signal
processing can to some degree alleviate some of the
associated issues, there often is a need for specialized
waveguides that directly affect the sound propagation from
the transducers. However, with the desired objectives being
quite intricate and involving on-axis pressure,
directivity, beam width, and possibly other metrics, the
design process is highly non-trivial. A strategy based on
acoustical topology optimization is presented here,
where a tweeter waveguide to be mounted in the dashboard is
optimized towards certain objective functions.
Authors

René Christensen

CEO, Acculution ApS
PhD in microacoustics. CEO of Acculution ApS. Consultant, vibroacoustics.
Saturday May 30, 2026 2:30pm - 3:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 

