AES Europe 2026: Full Schedule

Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

9:00am CEST

Deep Learning-Based Lower-Layer Upmixing

Thursday May 28, 2026 9:00am - 9:30am CEST

Aud 43

This paper introduces a novel approach for generating a
lower layer in multichannel audio upmixing, addressing a
gap in existing methods that primarily focus on mid and top
layers. Leveraging Harmonic-Percussive Separation (HPS),
the proposed framework dynamically adjusts key parameters
(separation factor, harmonic attenuation, and phase shift)
to enhance percussive components while diffusing harmonic
elements. We compared three neural network architectures
for this task: LSTM, TCN, and Transformer. Experimental
results show comparable perceptual quality and objective
metrics across all models, with the TCN being the most
balanced and suitable for deployment on edge devices.

Authors

Ema Souza-Blanes

Samsung Research America

Luis Madrid

Samsung Research Tijuana

Thaddeus Páez

Research Engineer, Samsung Research Tijuana

Research Engineer at Samsung Mexico.

Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture | Immersive Audio, Lecture

Presentation Type Lecture

9:00am CEST

Design and Optimization of Acoustic Lenses for Audible Frequency

Thursday May 28, 2026 9:00am - 9:30am CEST

Aud 44

Acoustic lenses are structures that enable the focusing of
acoustic waves, with increasing applications in audio
devices like loudspeakers to concentrate energy toward a
listening position. While typically employed at higher
frequencies, achieving effective performance within the
audible frequency range remains a significant challenge due
to long acoustic wavelengths, which necessitate structures
of substantially larger dimensions.
This paper addresses the design of an acoustic lens
dedicated to operation in the audible range. The proposed
lens is composed of periodically arranged acoustic unit
cells, enabling precise control over both the sound
transmission coefficient and the phase delay. A parametric
analysis of a single acoustic unit cell was performed,
followed by global optimization of the complete lens
structure using the Particle Swarm Optimization (PSO)
algorithm. The outcome of the study is an acoustic lens
design with predefined properties that demonstrate the
desired directional characteristics. The findings highlight
the potential of this approach for effectively manipulating
the acoustic wave field and the directivity of sound
sources within the audible frequency range.

Authors

Jadwiga Hyla

AGH University of Krakow

Jarosław Rubacha

AGH University of Krakow

Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

9:30am CEST

Spectral Optimization for Automatic Multitrack Mixing Using Answer Set Programming

Thursday May 28, 2026 9:30am - 10:00am CEST

Aud 43

The mixing stage in music production involves a complex set
of interdependent technical and creative decisions aimed at
achieving a coherent and industry-level result. Intelligent
Music Production (IMP) is an emerging research area that
integrates Artificial Intelligence techniques into music
creation and post-production processes, spanning from
composition to mastering. Within this context, Answer Set
Programming (ASP), a declarative paradigm from Knowledge
Representation and Reasoning, has proven effective for
modeling and solving complex optimization problems. This
article presents frmixerr, an ASP-based intelligent system
designed to optimize the mixing process by automatically
generating balanced mixes. The system formulates mixing as
a combinatorial optimization problem and evaluates
candidate solutions against a reference spectral profile.
To assess its performance, a subjective listening test was
conducted comparing mixes generated by frmixerr with mixes
produced by human engineers with varying levels of
professional experience. The results indicate no
significant differences in perceived quality between
frmixerr mixes and those created by professionals, suggesting
that ASP constitutes a viable approach for intelligent
assistance in music mixing.

Authors

Carlos Benítez

Tec de Monterrey

Flavio Everardo

Tec de Monterrey, University of Potsdam

Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture

Presentation Type Lecture

9:30am CEST

Mutual coupling investigation of bass horn loaded speakers

Thursday May 28, 2026 9:30am - 10:00am CEST

Aud 44

In today’s live and electronic music events, some
sound reinforcement systems use horn-loaded bass
speaker cabinets to provide the low-end section. Especially
for electronic music applications, the PA system is
designed to use one or multiple clusters of bass cabinets
to provide the needed SPL and impact in the low-frequency
range. Despite being large and heavy, horn-loaded bass
speakers have some advantages, such as efficiency and
directivity, which make them a great option for electronic
music. Moreover, enthusiasts describe them as
having a longer projection of the sound when compared with
bass reflex units. When used in clusters, the bass horns
exhibit mutual coupling due to a larger mouth surface
area and the physics behind it. This effect alters the working
parameters in a way that benefits sound reproduction and
is clearly noticed at high levels. This mechanism increases
the output close to the low edge of the frequency response
interval and changes the directivity pattern. A cluster of
four or six double 18” horn-loaded bass bins placed in the
front middle of a dance area will provide good impact,
described as a “punchy” sound, so acclaimed in the electronic
music party scene. In this paper I will describe an
investigation of the mutual coupling between horn cabinets
using electrical and acoustical measurements to reveal the
above-mentioned mechanism. Electrical impedance measurements
together with SPL and frequency response in coupled and
uncoupled scenarios are used to describe and demystify the
mutual coupling phenomenon.

Authors

Aurelian Botau

Sound system design engineer, Resound

Sound system design and calibration engineer.
I am running a company providing professional sound systems and DJ equipment rental. Sound system setup design, numerical simulations and technical support are included in the portfolio.
Horn speakers and Vacuum tube amplifiers enthus...

Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Sound Design, Lecture

Presentation Type Lecture

10:00am CEST

Experimental study of sound zone methods for indoor/outdoor active noise cancellation

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 44

The development of personal sound zone systems in recent
years shows great potential for low-frequency noise control
outside of noisy spaces. These approaches show promising
applications to manage noise pollution arising from
concerts in large venues or urban festivals. However, most
of the literature assumes that the created sound zones
exist in the same room or acoustic space as the noise
source. This premise hence excludes all setups where the
disturbances occur outside of concert venues (e.g., in
neighboring houses). This paper presents a first
experimental study of the behavior of sound zone methods
for indoor sound zones and outdoor noise sources. These
initial results indicate good efficiency of these methods
in this edge case, opening new use cases for these
approaches.

Authors

Lucas Hocquette

L-Acoustics

Yves Pene

Research Engineer, L-Acoustics

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Audio Processing, Lecture

Presentation Type Lecture

10:00am CEST

Beyond Species Identification: Real-Time Spatial Interaction Analysis in Avian Bioacoustics Using Microphone Arrays and Hybrid Beamforming on Edge Architectures

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 43

Conventional ornithological monitoring systems rely heavily
on single-channel recorders and deep learning classifiers
to identify "what" species is present, but fail to capture
"where" it is located or how individuals interact
spatially. This limitation hinders the study of complex
ecological behaviors, such as inter-specific spacing in
dense vegetation and predator-prey dynamics. We propose a
novel, dual-mode acoustic localization system designed to
unify semantic classification and spatial tracking.
Utilizing an economically scalable 16-channel Uniform
Rectangular Array (UMA-16) interfaced with edge-computing
platforms, we implement a hybrid spatial filtering pipeline
structured to balance real-time latency constraints with
achievable angular resolution. The first stage employs a
computationally efficient, noise-robust linear scanning
technique to generate an acoustic energy map and estimate
source multiplicity. This preliminary data initializes a
second-stage, super-resolution spectral estimation
algorithm predicated on signal-noise subspace
orthogonality, combining the noise robustness of
non-parametric beamforming methods with the precision of
parametric approaches. By integrating these spatial filters
with standard deep learning classifiers, the system
resolves overlapping vocalizations in "Cocktail Party"
scenarios and improves Signal-to-Noise Ratio (SNR) for
cryptic species detection. We address the physical
"Localization-Detection Range Disparity," demonstrating
that while detection is viable at long ranges, precise
localization is constrained by the array aperture to the
near-to-mid field. The system outputs real-time video
overlays of acoustic heatmaps for field observation and
generates autonomous volumetric territory maps in fixed
deployments, collectively providing ornithologists with a
robust capability for analyzing the spatial ecology of
avian vocalizations.

Authors

Emre Göktuğ AKTAŞ

Istanbul Technical University

Mesut Kartal

Istanbul Technical University

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Cross-Disciplinary Sound Studies, Lecture

Presentation Type Lecture

10:00am CEST

Comparative Quantitative Analysis of Immersive Mixing Practices: Tracking Spatial Trends in Award-Winning and Popular Streaming Media

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 42

Since 2021, 7.1.4 musical content has transitioned from a
niche specialty to a mainstream commercial deliverable
within major streaming ecosystems. However, industry
discourse indicates a disparity in how the immersive stage
is utilized across different production tiers. This paper
presents a targeted quantitative study of thirty 7.1.4
tracks (N = 30 total; 15 per category; 2021–2026),
employing a matched-pair sampling strategy driven by the
availability of 'Established Excellence' (Grammy
Award-winning/nominated immersive albums) against
genre-equivalent 'Market Dominance' (top-charting streaming
tracks). The study utilizes a multi-parameter measurement
methodology, including Inter-Channel Cross-Correlation,
hemispheric symmetry and spatial width analysis.
Furthermore, vertical spectral centroid distribution and
channel occupancy (Center and LFE) are analyzed to identify
recurring structural immersive design markers. Preliminary
findings suggest a consistent forward-facing bias and lower
activity in select channels in charting commercial releases
compared to award-recognized counterparts. By documenting
these technical indicators, such as quarter-sphere
correlation and LFE handling differences, this study
establishes a benchmark for current immersive mixing
practices and highlights the technical indicators that may
limit the transition from enhanced stereo to true immersive
envelopment.
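
As a rough illustration of the kind of inter-channel correlation measurement the abstract mentions, the sketch below computes a zero-lag normalized cross-correlation between two channels. The abstract does not specify the exact formulation used in the study, so the zero-lag choice, the function name `inter_channel_correlation`, and the synthetic test signals are all assumptions made here for illustration.

```python
import math


def inter_channel_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length channels.

    Returns a value in [-1, 1]: +1 for identical signals, -1 for
    polarity-inverted signals, near 0 for uncorrelated signals.
    This is an illustrative definition, not the study's exact metric.
    """
    if len(a) != len(b) or not a:
        raise ValueError("channels must be non-empty and of equal length")
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


# Hypothetical test signals: 5 full sine periods over 480 samples.
n = 480
left = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
inverted = [-x for x in left]
quadrature = [math.cos(2 * math.pi * 5 * i / n) for i in range(n)]

print(inter_channel_correlation(left, left))
print(inter_channel_correlation(left, inverted))
print(inter_channel_correlation(left, quadrature))
```

Values close to +1 between, for example, front and height channels would be consistent with an "enhanced stereo" style of mix, while lower values indicate more decorrelated, enveloping content.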

Authors

Can Murtezaoglu

Research Assistant, Istanbul Technical University

Immersive audio recording and mixing techniques, audio design for visual media

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture

Presentation Type Lecture

10:30am CEST

Effect of an Active Acoustic Reinforcement System on Musical Performance in a Recording Studio

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 42

This work presents the results of a perceptual study
investigating the influence on musicians of a virtual
acoustics system installed in the live room of a
professional recording studio. The study focused on
analyzing relationships between a selection of objective
acoustic parameters (T30, STLate, LJ) and subjective
perceptions of 19 solo musicians performing under 11
different acoustic conditions. The experiment was conducted
using the VAT (Virtual Acoustic Technology) system and the
VAT Suite software developed at the Immersive Media
Laboratory (IMLab) in the Sound Recording Department at
McGill University. Correlations between quantitative and
qualitative analyses show that musicians’ preferences
converge on conditions with T30 ≈ 1 s, and that late and
lateral energy increases the perception of spatiality,
providing a positive balance between clarity and acoustic
support. However, longer reverberation reduces comfort and
executive control.

Authors

Gianluca Grazioli

Montreal, Canada, McGill University

Richard King

McGill University

Montreal

Wieslaw Woszczyk

McGill University

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Immersive Audio, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

10:30am CEST

Confidently Wrong: Evaluating AudioSet-Trained Models Under Real-World Deployment

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 43

Audio event-classification models trained on AudioSet are
widely adopted and form a central component of the state of
the art in machine listening, yet their behavior when
deployed in complex, open acoustic environments remains
largely unexplored. In this study, we evaluate several
widely adopted AudioSet-pretrained
architectures—particularly models from the PANNs family,
including MobileNetV2 and Wavegram, and the Transformer-based
PaSST model—when applied to a real operational scenario at
the commercial Port of Valencia, Spain. We observed a
recurring and systematic unexpected behavior: the models
frequently assigned disproportionately high probability to
the class Music for non-musical industrial and
transportation sounds. These mislabeled events included
train-wheel squealing, motorcycle acceleration, emergency
sirens, and reversing beeps—sound categories that are
common in port logistics environments but acoustically
different from music. By analyzing the probability
distributions output by the models, we demonstrate that
this erroneous Music activation is not an isolated failure
but a pervasive pattern across several architectures. Our
findings highlight a critical gap in the robustness and
domain generalization of AudioSet-derived models and
emphasize the need for targeted adaptation techniques when
deploying them in real industrial settings.

Authors

Javier Naranjo Alcazar

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Jordi Grau de Haro

Instituto Tecnológico de Informática

Marta Garcia Ballesteros

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Pedro Diego Zuccarello González Victorica

Ruben Ribes Serrano

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture

Presentation Type Lecture

10:30am CEST

Nonlinear viscoelasticity in loudspeaker suspensions

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 44

Damping in viscoelastic materials such as rubbers is often
desirable, especially in loudspeaker suspensions. Under
high strain loads however, viscoelastic materials can also
exhibit a hysteretic stiffness behavior, causing a
stiffness decrease with amplitude. In this study, we
examine the viscoelastic rubber suspension of a
loudspeaker, using the loudspeaker motor system as actuator
and sensor. From measurements we observe the hysteretic
force-displacement behavior and pronounced odd-order
harmonic distortion even at low amplitudes, in accordance
with the literature. We further explore a
macro-thermodynamic plastic flow model to model the
stiffness of viscoelastic materials. The results show that
the plastic flow suspension model explains and replicates
the observed nonlinear hysteretic behavior. We also show
that a fitted time-domain loudspeaker model including
plastic flow matches the measured distortion profile. In
contrast, models with polynomial stiffness and viscous
damping fail to explain the observed amplitude dependencies
such as odd order harmonic levels. The experiments
demonstrate that viscoelastic hysteresis occurs not only at
high but also at low amplitudes, where the elastic
stiffness is approximately linear.

Authors

Finn Agerkvist

Technical University of Denmark

My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Franz Heuchel

GN Audio

Manuel Hahmann

Dynaudio A/S

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

11:00am CEST

Audio data augmentation techniques for frame drum stroke recognition

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 43

This work addresses the problem of frame drum (bendir)
stroke technique recognition in simulated real-world
conditions. The traditional frame drum technique includes
three discrete strokes that are used to create rhythmic
patterns: dum, tek and slap. In the presented work, audio
data augmentation is investigated on a dataset containing
recordings of instruments of various construction
attributes. The used techniques are selected in the
direction of generalizing classification in real-world
conditions. Moreover, the mixing of the frame drum samples
with accompanying guitar chords is introduced, simulating
the more complicated problem of hit technique recognition
when playing in a duo. The application of the
aforementioned data augmentation leads to the formation of
different available datasets for training and testing. Two
convolutional neural network architectures (one- and
two-dimensional) are taken into consideration, trained on
waveforms and mel-scale spectrograms of the different
subsets, respectively.

Authors

Antonis Pagonis

Pagonis Percussion

Charalampos Dimoulas

Aristotle University of Thessaloniki

Labros Vasileiou

Aristotle University of Thessaloniki

Nikolaos Vryzas

Aristotle University of Thessaloniki

Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering in the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master degrees on Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture

Presentation Type Lecture

11:00am CEST

Input-output linearization of loudspeaker dynamics via automatic differentiation

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 44

Input-output linearization is a technique for compensating
nonlinear distortion in loudspeakers. To apply it to
complex loudspeaker models, we describe an end-to-end
framework for estimating model parameters from data and
deriving the linearizing control laws using automatic
differentiation. The parameter estimation approach combines
frequency-domain linear parameter estimation with a
time-domain prediction-error method for the nonlinear
parameters. The linearization approach supports nonlinear
reference systems and stabilization of the control law
using trajectory tracking. We implement the framework in
dynax, an open-source Python package based on JAX, and
validate it experimentally as a feed-forward controller on
a closed-box loudspeaker. Results demonstrate validation
errors of 1–5 % NRMSE and total harmonic distortion
reductions of 6–12 dB. The framework enables researchers
and engineers to rapidly prototype and validate complex
loudspeaker models for distortion compensation without
manual symbolic derivations.

Authors

Finn Agerkvist

Technical University of Denmark

My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Franz Heuchel

GN Audio

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

11:00am CEST

Comfortability analysis of immersive sound playback system for cabin noise based on frontal lobe fNIRS experiment: an application of 4th order ambisonics

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 42

This study introduces a fourth-order Ambisonics-based decoding system to reproduce railway cabin running noise in a studio environment, enabling enhanced spatial impression and detailed sound field variation. Real-world operational noise was recorded using a multichannel fourth-order Ambisonics microphone (Eigenmike® EM32, mh acoustics LLC, USA), and the reproduced sound field was implemented through a multichannel loudspeaker system. The reproduced signals were quantitatively compared with the original operational noise in terms of spectral variation and waveform distortion.

Authors

Yonghee Lee

Research Associate, Changwon National University

Yonghee Lee
Ph.D., Mechanical Engineering.
Ultrasonic, Acoustic, SHM, NDE, fNIRS, and Bio-medical engineering.
Institute: Changwon National University, South Korea

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture

Presentation Type Lecture

11:30am CEST

System-Level Remapping for Electronic Music Spatial Reproduction: A Case Study of the Cross-Venue Reperformance of Symphonic Coding

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 43

Taking the premiere and reperformance of the sci-tech symphonic suite Symphonic Coding as a case study, this paper discusses audio system organization, sound diffusion, and cross-venue migration in the co-performance of symphonic and electronic music. Given the challenges of diverse live inputs, real-time control of the electronic music part, concurrent recording and live streaming, and varying acoustic conditions, the article analyzes how a single workflow handles traditional miking, electronic music generation and control, live spatial diffusion, and multi-purpose distribution. The study is structured across four levels: system design requirements, signal organization, dual-venue implementation, and engineering discussion. It illustrates the development of an interconnected workflow comprising Content, Rendering, and Distribution Layers through mixing console organization, immersive rendering, and AoIP distribution. Results indicate that the significance of this work lies not in the reproduction of the listening experience of the entire performance, but in enabling the spatial presentation of the electronic music part to remain valid across different environments based on a consistent reference. Furthermore, the project enhances reperformance capability and production flexibility through the separation of functions, roles, and systems.

Authors

Chuhan Gao

Communication University of China

Xiuquan Yao

Communication University of China

Yilong Zhang

Communication University of China

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Immersive Audio, Lecture

Presentation Type Lecture

11:30am CEST

Virtualization-Based Mechanical Loudspeaker Protection Using Nonlinear Wave Digital Modeling

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 44

Mechanical overload remains a primary limitation in
high-output loudspeaker operation, particularly at low
frequencies where large coil excursions are required.
Conventional mechanical protection strategies are typically
implemented as signal-domain limiters or filters, which act
indirectly on the loudspeaker’s mechanical state and may
introduce discontinuities, spectral modification, or
unnecessary attenuation.

This paper proposes a methodological framework for
mechanical loudspeaker protection based on the
virtualization of admissible system behavior. The approach
is formulated within a nonlinear wave digital loudspeaker
model and realized using a direct–inverse–direct
architecture. Mechanical protection is embedded directly
into the virtual loudspeaker dynamics by shaping the
nonlinear suspension compliance as a function of voice-coil
displacement. As the excursion approaches a prescribed
admissible limit, the virtual compliance is progressively
reduced using a smooth raised-cosine law, resulting in a
continuous increase of the virtual mechanical stiffness.
Excessive excursion is therefore prevented as a consequence
of the system dynamics, without explicit limiting,
clipping, or signal-domain intervention.

The proposed framework is evaluated through numerical
simulations using steady-state low-frequency sinusoids and
low-frequency sine bursts under free-air loading. Results
are compared against an unprotected loudspeaker and a fixed
high-pass filter configured to meet the same excursion
constraint. The simulations verify that the proposed method
enforces a soft excursion ceiling without discontinuities,
preserves low-frequency output in the near-limit operating
region, and exhibits stable and immediate recovery
following transient excitation. Distortion behavior is
characterized and shown to increase smoothly as a result of
the introduced mechanical nonlinearity.

The results demonstrate that mechanical protection can be
realized as an emergent property of a virtual loudspeaker
model rather than as an external control action. The
proposed approach provides a physically interpretable and
numerically robust foundation for virtualization-based
loudspeaker protection.
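
To make the raised-cosine compliance shaping described above more concrete, here is a minimal sketch of one possible taper. The abstract does not give the exact law, so the knee position `x_knee`, the residual `floor_ratio`, the function names, and the numeric values are hypothetical choices for illustration only, not the author's parameters.

```python
import math


def virtual_compliance(x, c0, x_knee, x_limit, floor_ratio=0.05):
    """Displacement-dependent compliance with a raised-cosine taper.

    x           voice-coil displacement (m)
    c0          nominal small-signal compliance (m/N)
    x_knee      displacement where the taper starts (assumed parameter)
    x_limit     prescribed admissible excursion limit
    floor_ratio fraction of c0 retained at and beyond the limit (assumed)
    """
    ax = abs(x)
    if ax <= x_knee:
        return c0
    if ax >= x_limit:
        return c0 * floor_ratio
    t = (ax - x_knee) / (x_limit - x_knee)    # 0 at the knee, 1 at the limit
    w = 0.5 * (1.0 + math.cos(math.pi * t))   # raised cosine: 1 -> 0, smooth ends
    return c0 * (floor_ratio + (1.0 - floor_ratio) * w)


def virtual_stiffness(x, c0, x_knee, x_limit, floor_ratio=0.05):
    """Stiffness is the reciprocal of compliance, so it rises toward the limit."""
    return 1.0 / virtual_compliance(x, c0, x_knee, x_limit, floor_ratio)


# Hypothetical values: 1 mm/N compliance, taper between 4 mm and 6 mm.
for x_mm in (0.0, 4.0, 5.0, 6.0):
    k = virtual_stiffness(x_mm * 1e-3, 1e-3, 4e-3, 6e-3)
    print(f"x = {x_mm:.1f} mm  stiffness = {k:.1f} N/m")
```

Because the raised cosine has zero slope at both ends of the taper, the compliance and its first derivative stay continuous, which is consistent with the abstract's claim of protection without discontinuities.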

Authors

Lucio Bianchi

Elettromedia s.p.a.

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

11:30am CEST

The efficacy of phantom image perception: an active listener perspective.

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 42

A “phantom image” is the illusion of an independent sound
source created by two or more loudspeakers. Most often
created by manipulating level differences between
stereophonic channels (aka, “panning”), the effect is used
to create a sense of auditory space between loudspeakers
and is largely taken for granted. In recent years,
surround and immersive audio systems have attempted to
utilize phantom image processing to render audio objects in
desired positions across multiple loudspeaker arrays. This
research examined the efficacy of phantom image perception
horizontally and vertically from an active listener
perspective. After listening to a target loudspeaker,
listeners (n = 442) were asked to move a phantom sound to a
position to match that of the target loudspeaker. The
listener’s phantom placement was then compared to the
target, and subjects were allowed to “correct” their phantom
position. The horizontal experiment was based on a
standard stereophonic 60° loudspeaker array with the target
loudspeaker at 15° off center. The vertical experiment
utilized elevated loudspeakers in a 60° arc with the target
loudspeaker elevated 10° above the horizon (lower
loudspeaker). Results show nearly universal “undershoot” in
horizontal placement error on first attempts with gradual
improvement over trials that coalesced around the projected
target location. However, after repeated tries, final
perceptual image locations were spread over 2/3 of the
sound-field around the target loudspeaker. In the vertical
trials perceptual locations were spread across the entire
sound field in all three trials and failed to show any
patterns of coalescence around the target loudspeaker.
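
For readers unfamiliar with level-difference panning, the sketch below uses the classic stereophonic tangent law to predict the channel gains that would nominally place a phantom image at 15° inside a 60° loudspeaker pair (±30°). The abstract does not state which panning law, if any, was used in the experiment, so the tangent law, the constant-power normalization, and the function name `tangent_law_gains` are assumptions for illustration.

```python
import math


def tangent_law_gains(target_deg, half_span_deg=30.0):
    """Constant-power gains for a loudspeaker pair via the tangent law.

    The tangent law states
        (g_near - g_far) / (g_near + g_far) = tan(target) / tan(half_span),
    where positive target angles point toward the "near" loudspeaker.
    Returns (g_near, g_far), normalized so g_near**2 + g_far**2 == 1.
    """
    ratio = math.tan(math.radians(target_deg)) / math.tan(math.radians(half_span_deg))
    g_near = 1.0 + ratio
    g_far = 1.0 - ratio
    norm = math.hypot(g_near, g_far)
    return g_near / norm, g_far / norm


g_near, g_far = tangent_law_gains(15.0)
level_difference_db = 20.0 * math.log10(g_near / g_far)
print(f"g_near = {g_near:.3f}, g_far = {g_far:.3f}, "
      f"level difference = {level_difference_db:.2f} dB")
```

The reported spread of listener placements suggests that such a nominal prediction should be treated as a starting point rather than as a guarantee of where an image will actually be heard.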

Authors

Song Hui CHON

Associate Professor, Belmont University

Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."

Wesley Bulla

Belmont University

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

1:30pm CEST

A New Reference Target Curve for Studio Headphones

Thursday May 28, 2026 1:30pm - 2:00pm CEST

Aud 44

Target curves for the sound signature of headphones are a
helpful design target during the development process. While
a lot of attention has been paid to finding target curves that
match the listening preference of consumers, equivalents
for studio headphones date back to the 1990s. In the context
of music production, a mutual target or even standard is
essential to make mixing and mastering more
gear-independent. This becomes even more important since
the main tool for sound engineers is shifting from loudspeakers
in professional environments such as acoustically treated
studios to headphones, often additionally equipped with
virtualization algorithms. This enables them to be more
flexible and to rely less on potentially expensive
loudspeaker setups. The diffuse-field target curve that is
currently still the only standardized target curve for
studio headphones is often reported to not match a real
loudspeaker equivalent in studio environments. In this
paper, we aim to find a new standard target curve for
studio headphones emulating the frequency response of a
loudspeaker setup in modern studio environments.
For this, we give an overview of current target curves and
match them to their equivalent loudspeaker setups.
Based on that, we propose a new methodology for a
measurement-based target curve incorporating typical
panning paradigms of music signals based on measurements
inside multiple control rooms. To verify the results, we
conduct listening tests with professionals in multiple
studio environments.

Authors

Jonas Foerster

Signal Processing Engineer, beyerdynamic GmbH & Co. KG

Passionate about Headphones, Signal Processing and their interaction.

Focus on headphone target curves, spatial audio and ANC

Lukas Keppler

beyerdynamic GmbH & Co. KG

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Audio Equipment, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

1:30pm CEST

Joint Neural Translation and Classification of Videos for Audio Processing

Thursday May 28, 2026 1:30pm - 2:00pm CEST

Aud 43

A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals, as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve approximately 90% classification
accuracy on a test set using a combined translation and a
classification model (with only about 22K parameters),
demonstrating a globally-scalable approach for robust
classification on the edge.

Authors

Alejandro Cajica

Samsung Research Mexico

Sunil Bharitkar

Samsung Research America

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

2:00pm CEST

Personalized VR for hearing research with embedded devices

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 42

Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded Artificial Intelligence hardware, and
hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.

Authors

Romain Michon

INRIA

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

2:00pm CEST

The Perception and Measurement of Nonlinear Distortion in Headphones

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 44

Few studies exist on the perception and measurement of
nonlinear distortion in headphones. This paper reports the
detection thresholds and perceived sound quality from real
distortion in headphones. Five different distortion
measurements were made on the headphones to determine how
well they predict audibility and quality. Music samples
were binaurally recorded on six headphones at playback
levels ranging from 85 to +110 dBA at 3 dB increments. The
recordings were reproduced at a normal playback level (83
dBA) through a reference headphone with low distortion. The
headphone recordings were post-processed to remove both
level and frequency response differences so only nonlinear
distortions and residual noise remained. In a second test,
listeners rated the similarity in quality of headphones
relative to an undistorted reference and a hidden version
of it. The results provide evidence that audible distortion in
headphones with music occurs at significantly higher
playback levels (104 to 112 dBA SPL) than what is
considered typical and safe. The percentage of measured THD
in the headphone had the highest correlation with the
detection thresholds, while the non-coherent distortion with
music best predicted the similarity ratings. We discuss the
results and the practical implications they might have on
future headphone design, testing and measurement.

Authors

Pierre-Emmanuel Lelièvre

Rtings

Sean Olive

Audio Consultant, Sean Olive Audio Consulting

United States

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture | Perception, Lecture

Presentation Type Lecture

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 43

This work presents a perceptual model based on a complex
IIR filterbank. The filterbank with a frequency resolution
of 4 bands per Bark consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are used to obtain masking thresholds
with the following post-processing. To obtain reasonable
masking thresholds from the spreading outputs, a post
masking stage is required. Here, we propose a comodulation
dependent adaptation of the postmasking decay to model
Comodulation Masking Release (CMR) effects. This approach
explicitly considers the dip-listening effect known from the
literature. The final masking thresholds are obtained by
weighting the postmasking outputs by a tonality dependent
gain, controlled using spectral flatness estimation. A
listening test compares the proposed method to an already
known approach using direct CMR based modification of the
masking threshold gains.

Authors

Bernd Edler

International Audio Laboratories Erlangen, Germany

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

2:30pm CEST

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 42

Sound source localization and identity tracking are
fundamental tasks in acoustic scene analysis, enabling
machines to determine what produces sound events, where, and
when. While deep attractor-based networks have
demonstrated improved performance under an unknown number
of sources, maintaining continuous source tracking over
long-form audio remains challenging due to memory
limitations and permutation ambiguities across adjacent
segments. In this paper, we propose a Recursive Attractor
Network (RANet) for long-form sound source localization and
identity tracking with a variable number of sources. RANet
explicitly represents source attractors as transferable
embeddings and recursively propagates them across adjacent
audio segments using an LSTM-based model, thereby preserving
source identity continuity over time. Experimental results
on simulated datasets demonstrate that RANet achieves
robust long-form sound source localization and consistent
source identity tracking, outperforming baseline approaches
under variable and dynamic source conditions.

Authors

Jiaqi Du

Peking University

Tianshu Qu

Peking University

Xihong Wu

Peking University

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

2:30pm CEST

Optical MEMS microphones leverage architectural advantages to achieve 80dB SNR

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 44

There are three architectural approaches to
microelectromechanical systems (MEMS) microphones,
miniature devices used in a wide range of products.
Capacitive MEMS
microphones are embedded in billions of consumer
electronics. Solder-compatible and providing tight
part-to-part sensitivity matching, all in a small
footprint, capacitive MEMS microphones have demonstrated
improved performance in recent years. State-of-the-art
digital capacitive MEMS microphones can now achieve up to
72 dB signal-to-noise ratio (SNR), with a 22 dBA noise floor
and an overall dynamic range on the order of 106 dB.

However, capacitive MEMS microphone technology has now
reached the limits of its architecture, which constrains
the key audio performance metrics: SNR and acoustic
overload point (AOP).

Piezoelectric MEMS microphones have not demonstrated SNR
performance exceeding 65 dB, and require new materials to be
developed to increase their performance.

Optical MEMS microphones (a new architectural approach that
combines a laser optical subsystem, a MEMS and advanced
CMOS circuit design) have exceeded the limits of capacitive
technology. With 80 dB SNR supporting a 14 dBA noise floor,
132 dB dynamic range, and a 146 dB AOP, optical MEMS
microphones accomplish studio-quality performance in a tiny
form factor that supports semiconductor-level yields in
high-volume manufacturing.

This presentation will explain the architectural
advancements of optical MEMS microphones in comparison to
capacitive MEMS microphones. It will provide example use
cases of high-SNR and high-AOP microphones in high-volume
applications.

Authors

Jakob Vennerød

sensiBel

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture

Presentation Type Lecture

2:30pm CEST

EMORSION – Examining the Impact of Audio Features on Emotional Responses and Immersion in Film.

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 43

EMORSION is an exploratory study examining how film audio
design shapes audience emotion and immersion. It was
conducted using scenes from four films in the horror (2)
and drama (2) genres, with two mainstream and two
independent productions. For each scene, multiple
alternative audio mixes were created by systematically
manipulating three core aspects of audio design: frequency
(pitch), dynamics (loudness), and directionality (spatial
placement). Three audience groups were exposed to the
scenes in a cinema setting, with each group experiencing
one manipulated audio mix and a control mix.
Audience responses were assessed through a multimodal
framework combining self-reported emotion and immersion via
a questionnaire, and physiological measures, including
heart rate monitoring and video-based motion tracking.
Results show that subtle changes in audio design
significantly affect emotional perception and immersion.
Unconventional mixes produced greater variability in
interpretation, while conventional immersive mixes led to
stronger agreement across audiences. Notably, participants
often reported perceived visual changes despite no
alterations to the visual content.

Authors

Bleiz Macsen Del Sette

Charalampos Saitis

Queen Mary University of London

George Fazekas

Queen Mary University of London

Josh Reiss

Professor, Queen Mary University of London

Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...

Nelly Garcia

PhD Researcher, Queen Mary University of London

I'm Nelly Garcia.
I'm an engineer in communications and electronics with a specialty in acoustics.
Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London.
My main interest is sound design, ways to create sounds from scratch, optimize the workflow of a sound designer and innovative ways to label, categorise or access samples...

Ruby Crocker

Queen Mary University of London

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture | Sound Design, Lecture

Presentation Type Lecture

3:00pm CEST

Sound Absorber Estimation with Deep Neural Network

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 42

Boundary conditions are a critical part of room acoustic
simulations. In the case of ray tracing, absorption
coefficients of nearly all materials are measured and
provided. However, wave-based simulations face several
issues. The first one is the variety of boundary conditions
used. Depending on the method, surface impedance or
admittance might be needed, either in the frequency or in
the time domain, as an angle-dependent or averaged
variable. This limitation hinders the development of a
standard measured quantity for boundary conditions in
wave-based simulations. In turn, this leads to the second
issue encountered, which is the lack of widely available
data to describe the characteristics of the different
materials commonly found in rooms. In this study, a deep
neural network has been trained to estimate the material
properties of porous absorbers from their absorption
coefficient in octave bands. These estimated material
properties can then be used to calculate any boundary
condition needed. This method thus makes it possible to characterize
the boundary conditions for any type of room acoustic
simulation from the most commonly available data. Moreover,
it provides a new tool to identify the sound absorber
corresponding to a desired absorption profile during the
design phase of a project. The training dataset in this
study was generated from finite element method simulations.
The poroelastic properties of the material, the sample
thickness, as well as the depth of the air cavity backing
the material were varied to create the training dataset.

Authors

Boris Mondet

COMSOL A/S

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture

Presentation Type Lecture

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 44

Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which
deep-learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the
framework
provides a scalable, perceptually grounded approach to
interpreting listener-preference data when developing
headphone target curves.

Authors

Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology

▪ Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...

Julian Villegas

University of Aizu

Japan

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 43

Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions (including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration), none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
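
As a rough illustration of the QDI definition quoted in the abstract, the sketch below computes the ratio of band energy to total energy from a precomputed power spectrum. It is not the authors' implementation: the function name, the inclusive band edges, the summing of both bands into one numerator, and the toy flat spectrum are all assumptions made for this example.

```python
def quintina_directionality_index(freqs_hz, power,
                                  bands=((600.0, 750.0), (1200.0, 1400.0))):
    """Ratio of spectral energy inside `bands` to total spectral energy.

    freqs_hz and power are equal-length sequences describing a power
    spectrum. Band edges are treated as inclusive (an assumption).
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        return 0.0  # no energy at all: define the index as zero
    in_band = sum(
        p for f, p in zip(freqs_hz, power)
        if any(lo <= f <= hi for lo, hi in bands)
    )
    return in_band / total


# Toy data (hypothetical): flat power on a 50 Hz grid from 0 to 2000 Hz.
freqs = [50.0 * k for k in range(41)]
flat_power = [1.0] * len(freqs)
qdi_flat = quintina_directionality_index(freqs, flat_power)

# Toy data (hypothetical): all energy concentrated in the 700 Hz bin.
peak_power = [1.0 if f == 700.0 else 0.0 for f in freqs]
qdi_peak = quintina_directionality_index(freqs, peak_power)
```

Evaluating such a function once per look direction (or per microphone position) would give the kind of direction-dependent index the abstract describes, under the assumptions stated above.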

Authors

Felicita Brusoni

PhD candidate Musikhögskolan i Malmö, Lund University

Luca Frigo

Conservatorio G. Nicolini Piacenza

Martino Sarolli

Conservatorio Paganini Genova

Riccardo Dapelo

Conservatorio Nicolini Piacenza

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Processing, Lecture | Cross-Disciplinary Sound Studies, Lecture | Perception, Lecture

Presentation Type Lecture

3:30pm CEST

Center Extraction GAN

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 42

This paper presents a method for extracting a center signal
from two-channel stereo signals for upmixing and
reproduction with additional center loudspeakers.
It uses a generative adversarial network with a generator
trained with multiple reconstruction losses and adversarial
losses obtained from a discriminator.
The processing has low computational complexity, is causal,
and can be configured for latencies down to one audio frame
of 46 ms length.
It is described how training data are created using only
publicly available signals and how the generation of target
data enables control of the attenuation of diffuse signals
and direct signals panned off-center.
An evaluation with a listening test and the computational
metrics SI-SDR and F2 measure is presented.
It shows an advantage compared to methods based on
classical signal processing in terms of computational
metrics for source separation and listener preference.
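
For readers unfamiliar with SI-SDR, one of the computational metrics named in the abstract, the following is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio. It is a generic textbook formulation, not code from the paper; the function name and the toy signals are hypothetical.

```python
import math


def si_sdr_db(reference, estimate):
    """Scale-invariant SDR in dB between a reference and an estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference signal has zero energy")
    # Project the estimate onto the reference to find the optimal scaling.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0:
        return math.inf  # estimate is a scaled copy of the reference
    if target_energy == 0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / noise_energy)


# Toy signals (hypothetical): the estimate adds a small orthogonal error.
ref = [1.0, 0.0, -1.0, 0.0]
est = [1.0, 0.1, -1.0, -0.1]
score = si_sdr_db(ref, est)
# Rescaling the estimate should leave the score unchanged.
score_scaled = si_sdr_db(ref, [3.0 * x for x in est])
```

Because the estimate is projected onto the reference before the ratio is taken, a pure gain change in the extracted center signal does not affect the score.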

Authors

Andreas Walther

Fraunhofer IIS

Christian Uhle

Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS

Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen.
He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and...

Julian Klapp

Fraunhofer Institute for Integrated Circuits IIS

Pablo Panter

Fraunhofer Institute for Integrated Circuits IIS

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture

Presentation Type Lecture

3:30pm CEST

Measurement Uncertainty of MEMS Microphone Sensitivity in A Free-Field Condition

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 44

This work presents a measurement uncertainty evaluation of
the free-field sensitivity of a MEMS microphone using a
substitution comparison method. The measurement setup is
based on principles used in secondary microphone
calibration, with sensitivity determined relative to a
calibrated reference microphone. The uncertainty analysis
follows the Guide to the Expression of Uncertainty in
Measurement (GUM), where Type A and Type B uncertainty
evaluations are propagated through a defined measurement
model to obtain the final measurement result. The MEMS
microphone sensitivity is estimated together with an
expanded uncertainty, where the calibration uncertainty of
the reference microphone is identified as the dominant
contributor. Broadband results show that the measured
sensitivity is close to the typical manufacturer
sensitivity over a wide frequency range and follows a
similar frequency trend. The proposed approach enables
reproducible estimation of the free-field sensitivity of
MEMS microphones and provides a clear framework for
uncertainty evaluation.

Authors

Salvador Barrera Figueroa

Danish Fundamental Metrology A/S, 2970 Hørsholm, Denmark

Teguh Aditanoyo

DTU Electrical and Photonics Engineering, Technical University of Denmark (DTU), 2800 Kgs. Lyngby, Denmark

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

3:30pm CEST

NAVIQUAL: Creating Spatial Audio Quality Maps for Virtual Live Music Environments

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 43

Live music environments can be simulated and evaluated
through spatial audio and augmented reality (AR)
technology. However, conducting perceptual studies on AR
environments can be challenging, as multiple design
considerations and uncontrolled variables come into play.
Hence, we developed Naviqual, a tool to create a spatial
audio quality map for a virtual live music environment. We
generated objective quality contour and polar maps to
predict the quality of experience (QoE) across listener
locations and directions, respectively. We found that these
maps strongly aligned with perceptual evaluations by
normal-hearing listeners through listening tests. We also
found that binaural objective metrics and signal-to-noise
ratio both strongly predict QoE across listener
translations, with the former outperforming the latter in
predicting QoE across listener directions. Overall,
Naviqual provides a QoE map for virtual live music
environments robust across various listener locations and
directions, noise locations, music content, and room
acoustics.

Authors

Andrew Hines

Carl Timothy Tolentino

University College Dublin

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture

Presentation Type Lecture

4:00pm CEST

Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 42

Higher-Order Ambisonics (HOA) encoding from sparse,
irregular microphone arrays remains a critical challenge
for consumer spatial audio capture in immersive
communication and XR. We propose Flow-HOA, a generative
framework that jointly optimizes a multi-dimensional
perceptual objective while producing a deployable,
time-invariant bank of Finite Impulse Response (FIR)
encoding filters. Using conditional flow matching, the
model learns to map a simple prior distribution to the
target distribution of FIR filter coefficients. Training is
guided by a composite loss that balances time-domain
waveform fidelity, multi-resolution spectral consistency,
sub-band energy preservation, and spatial directivity
constraints. Objective evaluations demonstrate improved
performance over strong model-based baselines in both
signal fidelity and spatial accuracy metrics. Subjective
listening tests further confirm that Flow-HOA yields higher
overall sound quality with reduced artifacts.

Authors

Tianshu Qu

Peking University

Xueyang Lv

Xiaomi Communications Co., Ltd

Yufan Qian

Peking University

Yuhuan You

Master, Peking University

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

4:00pm CEST

Accurate Characterization of Integrated Microphone Arrays for Device-Related Transfer Function Synthesis

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 44

This paper presents an improved method for characterizing
integrated microphone arrays for Device‑Related Transfer
Function (DRTF) synthesis. A probe‑array extension of the
IMPro technique is introduced to measure all device
microphones simultaneously, eliminating unknown timing
offsets that arise in asynchronous device–probe recordings.
A custom four‑element probe array and modular test jig were
developed to evaluate relative inter‑channel propagation
delay (RIPD) accuracy across varied microphone‑port
geometries. Hybrid free‑field DRTFs were synthesized by
combining IMPro data with Boundary Element Method (BEM)
acoustic scattering simulations, demonstrating that the
probe‑array measurements capture small delay variations
essential for precise spatial‑audio modeling. The extended
IMPro method offers a practical, scalable alternative to
anechoic‑chamber measurements for modern multi‑microphone
devices.

Authors

Hannu Pulakka

John Cozens

JCoustics

Matti Hamalainen

Head of Audio Technologies and Ecosystems, Nokia Technology Standards

Matti S. Hämäläinen is a seasoned expert in audio technologi...

Mikko Pekkarinen

Nokia Technology Standards

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture

Presentation Type Lecture

4:00pm CEST

Influences of Nonlinear Distortion in Music Playback on Listeners’ Stress Evaluated by PPI and RMSSD of PPG

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 43

The phenomenon in which listeners’ impressions of music are
unintentionally altered even when the same sound source is
played back remains an important issue. Previous research
has shown that the state and combination of audio equipment
affect the characteristics of nonlinear distortion in music
playback. Hence, we conducted a subjective evaluation of
auditory and musical impressions using sound sources with
various nonlinear distortions. However, the subjective
evaluation was unstable and difficult to assess. The reason
was that the sound change was perceived emotionally as a
slight change in sound image and musicality, and the
interpretation of evaluation terms varied widely among
subjects due to the difficulty of verbalizing the
impression. Therefore, we evaluated the change in
listeners’ stress caused by nonlinear distortion in music
playback using photoplethysmography (PPG). In this
study, we conducted a follow-up experiment with improved
accuracy.
In the experiment, 41 subjects listened to sound sources
with even-order harmonic distortion at 2.69% THD, odd-order
harmonic distortion at 2.69% THD, and no distortion. The
musical piece used for the sound sources is an original
composition, chosen to eliminate familiarity with and bias
toward existing music.
We evaluated changes in subjects’ stress states using the
mean pulse-pulse interval (PPI) and the root mean square of
successive differences (RMSSD), computed from the PPG
signal, as indicators of stress.
These results reconfirm that nonlinear distortion in music
playback affects listeners’ vital responses, as evidenced
by significant differences in both mean PPI and RMSSD, as
assessed by Cochran's Q test at the 5% significance level.
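
The two indicators named in the abstract, mean PPI and RMSSD, have standard definitions that can be sketched in a few lines. The example below is only an illustration under assumed inputs: it takes an already-extracted list of pulse-pulse intervals in milliseconds (the peak detection on the PPG signal is not shown), and the function names and sample values are hypothetical rather than taken from the study.

```python
import math


def mean_ppi(ppi_ms):
    """Mean pulse-pulse interval in milliseconds."""
    if not ppi_ms:
        raise ValueError("need at least one interval")
    return sum(ppi_ms) / len(ppi_ms)


def rmssd(ppi_ms):
    """Root mean square of successive differences between intervals."""
    diffs = [b - a for a, b in zip(ppi_ms, ppi_ms[1:])]
    if not diffs:
        raise ValueError("need at least two intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical intervals in milliseconds, for illustration only.
example_ppi_ms = [800.0, 810.0, 790.0, 805.0]
example_mean = mean_ppi(example_ppi_ms)
example_rmssd = rmssd(example_ppi_ms)
```

Mean PPI reflects the average pulse rate over the window, while RMSSD captures short-term beat-to-beat variability; computing both per listening condition yields the kind of per-subject values the abstract compares.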

Authors

Kenshin Nakada

Tokyo University of Science

Shun Muramatsu

The University of Tokyo

Takahiro Yoshida

Professor, Tokyo University of Science

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 44

This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The proposed method originally consists of four stages:
high-resolution anatomical scanning of the listener’s upper
body, including the pinnae, individualized HRTF computation
using the boundary element method, selective removal of
spatial encoding components to derive a personalized
reference target response curve (PR-TRC), and perceptual
optimization using a listener-specific weighting
coefficient grounded in auditory reference fidelity rather
than preference. In this paper, each stage is simplified
and automated using smartphone-based scanning and
AI-assisted processing, enabling end users to complete the
entire personalization process via a smartphone connected
to a cloud-based server. The resulting personalized target
response curve is implemented within the computational and
memory constraints of the DSP pipeline of commercial
consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.

Authors

Atsushi Hara

final Inc.

Haruto Hirai

final Inc.

Kimio Hamasaki

President, Artsridge LLC

Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Mitsuru Hosoo

final Inc.

Nao Tojo

final Inc.

Shun Saito

final Inc./post-doc

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

4:30pm CEST

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 42

While Neural Audio Codecs (NAC) have revolutionized
monaural audio compression, achieving high-fidelity
dual-channel coding at low bitrates remains a significant
challenge. Existing approaches often rely on naive
independent channel quantization, leading to phase
incoherence, or entangled latent modeling, which sacrifices
spatial precision for spectral energy. This paper proposes
a novel dual-channel coding framework based on
content-spatial disentanglement. Reframing spatial
reconstruction as an informed source separation task, our
architecture synergizes a frozen, pre-trained DAC encoder
for robust mono content preservation with a
parameter-efficient side information encoder that predicts
fine-grained time-frequency masks. To ensure precise
spatial imaging, we introduce explicit physical constraints
into the end-to-end training. Experimental results indicate
that at low bitrates of 9 and 11 kbps, the proposed method
outperforms state-of-the-art dual-mono neural baselines and
industry standards in both objective spatial metrics and
subjective MUSHRA evaluations.

Authors

Qingbo Huang

MMLab, ByteDance

Tianshu Qu

Peking University

Yihan Wang

Peking University

Yufan Qian

Peking University

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

4:30pm CEST

From Gaze to Gnosis: A Critical Framework for Embodied Audio Production

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 43

Audio engineering standards often present as objective, yet
they frequently rely on a systemic data bias which Perez
characterises as the 'default male bias' [1]. This paper
examines the hegemony of the male ear, a system of norms
that privileges masculine modes of hearing by prioritizing
technical structure and text over affective experience and
timbre [2]. By transitioning from a visual-centric auditory
gaze toward an embodied sonic gnosis, researchers can
recover haptic and physiological ways of knowing sound.
Drawing on the feminist listening praxis of the Female Ear
[3], this work explores the recording studio as an
analytical space where sonic microaggressions [4] enforce
rigid technical standards. The author argues for a new
audio praxis that centers ear pleasures [5], validating
subjective and affective sensory data as legitimate
engineering input. This approach seeks to dismantle the
regulatory fiction [6] of a universal hearing standard,
promoting a pluralistic understanding of musicking [7] that
is inclusive of non-normative perspectives.

Authors

Katie Ambrose

PhD Student, University of York

Katie is a postgraduate researcher at the University of York, working on a th...

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Cross-Disciplinary Sound Studies, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture
