AES Europe 2026: Full Schedule

Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

8:00am CEST

Attendee Registration

Thursday May 28, 2026 8:00am - 5:00pm CEST

Foyer Building 306

Thursday May 28, 2026 8:00am - 5:00pm CEST
Foyer Building 306 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Registration attendees

9:00am CEST

Deep Learning-Based Lower-Layer Upmixing

Thursday May 28, 2026 9:00am - 9:30am CEST

Aud 43

This paper introduces a novel approach for generating a
lower layer in multichannel audio upmixing, addressing a
gap in existing methods that primarily focus on the mid and
top layers. Leveraging Harmonic-Percussive Separation (HPS),
the proposed framework dynamically adjusts key parameters
(separation factor, harmonic attenuation, and phase shift)
to enhance percussive components while diffusing harmonic
elements. We compared three neural network architectures
for this task: LSTM, TCN, and Transformer. Experimental
results show comparable perceptual quality and objective
metrics across all models, with the TCN being the most
balanced and the most suitable for deployment on edge devices.
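
The abstract does not say how the HPS stage is computed. The widely used median-filtering formulation with a separation factor can be sketched as below, assuming a magnitude spectrogram stored as spec[frame][bin]. The names hps_masks, lower_layer, and harmonic_gain are hypothetical; the phase-shift stage and the neural networks are not modelled, and the residual (neither harmonic nor percussive) is simply dropped.

```python
from statistics import median


def running_median(values, width):
    """Centered running median; the window is truncated at the edges."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def hps_masks(spec, width=5, beta=2.0, eps=1e-12):
    """Binary harmonic/percussive masks for a magnitude spectrogram spec[frame][bin].

    beta is the separation factor: a bin is harmonic if H > beta * P,
    percussive if P > beta * H, and residual otherwise.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    # Harmonic estimate: smooth each frequency bin along time.
    cols = [running_median([spec[t][f] for t in range(n_frames)], width)
            for f in range(n_bins)]
    harm = [[cols[f][t] for f in range(n_bins)] for t in range(n_frames)]
    # Percussive estimate: smooth each frame along frequency.
    perc = [running_median(row, width) for row in spec]
    h_mask = [[harm[t][f] > beta * (perc[t][f] + eps) for f in range(n_bins)]
              for t in range(n_frames)]
    p_mask = [[perc[t][f] > beta * (harm[t][f] + eps) for f in range(n_bins)]
              for t in range(n_frames)]
    return h_mask, p_mask


def lower_layer(spec, harmonic_gain=0.3, width=5, beta=2.0):
    """Keep percussive bins, attenuate harmonic bins, drop the residual."""
    h_mask, p_mask = hps_masks(spec, width, beta)
    return [[spec[t][f] * (float(p_mask[t][f]) + harmonic_gain * float(h_mask[t][f]))
             for f in range(len(spec[0]))] for t in range(len(spec))]
```

A steady tone shows up as a horizontal line in the spectrogram and survives the time-direction median, while a transient is a vertical line that survives the frequency-direction median; the separation factor decides how dominant one estimate must be before a bin is assigned to it.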

Authors

Ema Souza-Blanes

Samsung Research America

Luis Madrid

Samsung Research Tijuana

Thaddeus Páez

Research Engineer, Samsung Research Tijuana

Research Engineer at Samsung Mexico.

Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture | Immersive Audio, Lecture

Presentation Type Lecture

9:00am CEST

Design and Optimization of Acoustic Lenses for Audible Frequency

Thursday May 28, 2026 9:00am - 9:30am CEST

Aud 44

Acoustic lenses are structures that enable the focusing of
acoustic waves, with increasing applications in audio
devices like loudspeakers to concentrate energy toward a
listening position. While typically employed at higher
frequencies, achieving effective performance within the
audible frequency range remains a significant challenge due
to long acoustic wavelengths, which necessitate structures
of substantially larger dimensions.
This paper addresses the design of an acoustic lens
dedicated to operation in the audible range. The proposed
lens is composed of periodically arranged acoustic unit
cells, enabling precise control over both the sound
transmission coefficient and the phase delay. A parametric
analysis of a single acoustic unit cell was performed,
followed by global optimization of the complete lens
structure using the Particle Swarm Optimization (PSO)
algorithm. The outcome of the study is an acoustic lens
design with predefined properties that demonstrates the
desired directional characteristics. The findings highlight
the potential of this approach for effectively manipulating
the acoustic wave field and the directivity of sound
sources within the audible frequency range.

Authors

Jadwiga Hyla

AGH University of Krakow

Jarosław Rubacha

AGH University of Krakow

Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

9:00am CEST

The Roaring Twenties - the first decade of consumer loudspeakers

Thursday May 28, 2026 9:00am - 10:00am CEST

Aud 49

The proposed workshop/tutorial serves as a prequel to the
presentation on the history of dynamic loudspeakers given
at the 158th Convention (Warsaw, 2025). It focuses on the
earliest phase of consumer loudspeaker technology in the
1920s, prior to the widespread adoption of dynamic
loudspeakers in the mass market.

Loudspeakers had been in use since the mid-1910s for public
address applications, and the rapid global expansion of
broadcast radio soon brought loudspeakers into domestic
use. The 1920s constituted a period of rapid innovation in
loudspeaker design, preceding the introduction of the
dynamic loudspeaker, which achieved significant commercial
impact only in the latter part of the decade.

The workshop/tutorial will examine consumer loudspeaker
technologies of the 1920s, the concurrent advancements in
audio electronics and signal sources that enabled
subsequent developments, and the earliest efforts in
systematic loudspeaker theory and measurement.

Two loudspeaker types dominated this period: horn
loudspeakers driven by electromagnetic drivers similar to
those used in headphones and telephone receivers (with
headphones, particularly Baldwin models, also serving as
the basis for do-it-yourself loudspeakers), and open-baffle
cone loudspeakers, frequently actuated by electromagnetic
reed drivers.

Although these transducer technologies were rapidly
superseded during the following decade, the electromagnetic
loudspeaker era already featured multi-way loudspeakers
employing passive crossovers. Early measurements exposed
deficiencies in frequency response, leading to the
introduction of equalisation techniques, including notch
filters, to correct these responses.

Developments in amplification were equally significant. The
1920s saw the introduction of push-pull amplifiers
(described at the time as “distortionless”) and, as a key
contributor to improved bandwidth and reduced distortion,
new audio transformers derived from Bell Labs’ telephone
research. Amplifier power limitations nevertheless remained
a dominant constraint in loudspeaker design, resulting in
the widespread use of strong resonances to achieve high
sensitivity. Improvements in signal source quality from the
mid-1920s onwards — including advances in radio
transmission and the introduction of electrical disc
recording and playback — further increased the demand for
improved loudspeaker performance, ultimately contributing
to the development of dynamic loudspeakers. In contrast,
headphone technology appears to have undergone relatively
little development during this period.

The tutorial will conclude with a brief overview of the
loudspeaker manufacturing landscape of the era, noting that
only a small proportion of manufacturers survived the
transition to dynamic loudspeaker technology.

Speakers

Juha Backman

Bang & Olufsen

Thursday May 28, 2026 9:00am - 10:00am CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Tutorial

Presentation Type Tutorial

9:00am CEST

Exhibitor Registration

Thursday May 28, 2026 9:00am - 10:00am CEST

Foyer Building 306

Thursday May 28, 2026 9:00am - 10:00am CEST
Foyer Building 306 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Registration Exhibitors

9:00am CEST

Student Welcome Meeting

Thursday May 28, 2026 9:00am - 10:00am CEST

Aud 41

Come and meet fellow student peers and AES leadership from
around the world. Attendees will gain an overview of
student-focused events at the Convention, other upcoming
student events and competitions organized by AES, and learn
about the finalists in the Student Recording Competition.

Participants will have the opportunity to introduce
themselves and their local student sections. The short
session encourages international connection and
collaboration among students, fostering a global network of
future audio professionals.

Speakers

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC

Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...

Thursday May 28, 2026 9:00am - 10:00am CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Student Events, Special Event

Presentation Type Special Event

9:30am CEST

Spectral Optimization for Automatic Multitrack Mixing Using Answer Set Programming

Thursday May 28, 2026 9:30am - 10:00am CEST

Aud 43

The mixing stage in music production involves a complex set
of interdependent technical and creative decisions aimed at
achieving a coherent, industry-level result. Intelligent
Music Production (IMP) is an emerging research area that
integrates Artificial Intelligence techniques into music
creation and post-production processes, spanning from
composition to mastering. Within this context, Answer Set
Programming (ASP), a declarative paradigm from Knowledge
Representation and Reasoning, has proven effective for
modeling and solving complex optimization problems. This
article presents frmixerr, an ASP-based intelligent system
designed to optimize the mixing process by automatically
generating balanced mixes. The system formulates mixing as
a combinatorial optimization problem and evaluates
candidate solutions against a reference spectral profile.
To assess its performance, a subjective listening test was
conducted comparing mixes generated by frmixerr with mixes
produced by human engineers with varying levels of
professional experience. The results indicate no
significant differences in perceived quality between
frmixerr mixes and those created by professionals,
suggesting that ASP constitutes a viable approach for
intelligent assistance in music mixing.
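
frmixerr itself is ASP-based, with an ASP solver searching the space declaratively. The sketch below swaps in plain exhaustive search to show the same underlying idea of scoring discrete gain combinations against a reference spectral profile. It assumes per-band track energies are already available, that tracks are uncorrelated so band energies add, and that only the normalized profile shape matters; best_mix, the band layout, and the gain grid are hypothetical.

```python
import math
from itertools import product


def band_mix(tracks, gains_db):
    """Per-band energy of the mix; gains are amplitude gains in dB (energy x 10^(g/10))."""
    n_bands = len(tracks[0])
    return [sum(t[b] * 10 ** (g / 10) for t, g in zip(tracks, gains_db))
            for b in range(n_bands)]


def profile_db(energies):
    """Spectral shape in dB relative to total energy (overall level removed)."""
    total = sum(energies)
    return [10 * math.log10(e / total) for e in energies]


def spectral_distance(mix, ref_profile):
    return sum((p - r) ** 2 for p, r in zip(profile_db(mix), ref_profile))


def best_mix(tracks, ref_profile, levels=(-18, -12, -6, 0, 6)):
    """Exhaustive search over discrete gains; track 0 is the 0 dB anchor."""
    best_gains, best_cost = None, math.inf
    for rest in product(levels, repeat=len(tracks) - 1):
        gains = (0,) + rest
        cost = spectral_distance(band_mix(tracks, gains), ref_profile)
        if cost < best_cost:
            best_gains, best_cost = gains, cost
    return best_gains, best_cost
```

Exhaustive search grows exponentially with the number of tracks and gain steps, which is one motivation for handing the same combinatorial problem to a dedicated solver.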

Authors

Carlos Benítez

Tec de Monterrey

Flavio Everardo

Tec de Monterrey, University of Potsdam

Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture

Presentation Type Lecture

9:30am CEST

Mutual coupling investigation of bass horn loaded speakers

Thursday May 28, 2026 9:30am - 10:00am CEST

Aud 44

At today's live and electronic music events, some sound
reinforcement systems use horn-loaded bass cabinets to
provide the low end. Especially in electronic music
applications, the PA system is designed to use one or
multiple clusters of bass cabinets to provide the needed
SPL and impact in the low-frequency range. Despite being
large and heavy, horn-loaded bass speakers have advantages
such as efficiency and directivity, which make them a great
option for electronic music. Moreover, enthusiasts
describe them as having a longer projection of the sound
than bass reflex units. When used in clusters, bass horns
exhibit mutual coupling due to the larger combined mouth
surface area and the underlying physics. This effect alters
the operating parameters in a way that benefits sound
reproduction and is clearly noticeable at high levels. The
mechanism increases the output near the lower edge of the
frequency response and changes the directivity pattern. A
cluster of four or six double 18" horn-loaded bass bins
placed in the front middle of a dance area will provide
good impact, described as a "punchy" sound, much acclaimed
in the electronic music party scene. In this paper I will
describe an investigation of the mutual coupling between
horn cabinets using electrical and acoustical measurements
to reveal the mechanism mentioned above. Electrical
impedance measurements, together with SPL and frequency
response in coupled and uncoupled scenarios, are used to
describe and demystify the mutual coupling phenomenon.
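
Two textbook relations help frame the effect described above; they are rough free-field rules of thumb, not the author's model. N identical in-phase sources placed close together (relative to the wavelength) sum coherently to 20*log10(N) dB on axis, about +6 dB per doubling, commonly explained as +3 dB from doubled drive power plus +3 dB from mutual coupling. A horn mouth is often taken to load well down to the frequency at which its circumference equals one wavelength, so merging N mouths into one larger equivalent mouth lowers that frequency by a factor of sqrt(N). The equivalent-circle approximation, the 343 m/s sound speed, and the example mouth area are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (about 20 degrees C)


def coherent_gain_db(n_sources):
    """On-axis far-field SPL gain of N identical, closely spaced in-phase sources vs one."""
    return 20 * math.log10(n_sources)


def mouth_cutoff_hz(mouth_area_m2, c=SPEED_OF_SOUND):
    """Rule of thumb: frequency at which the (equivalent circular) mouth
    circumference equals one wavelength."""
    radius = math.sqrt(mouth_area_m2 / math.pi)
    return c / (2 * math.pi * radius)


# Hypothetical 1 m^2 mouth per cabinet, clustered in fours.
single = mouth_cutoff_hz(1.0)        # about 97 Hz
cluster = mouth_cutoff_hz(4 * 1.0)   # about 48 Hz, one octave lower
```

Both relations hold only while the cabinets stay acoustically close together; at higher frequencies the sources no longer add coherently everywhere, which is consistent with the abstract's observation that coupling mainly affects the low edge of the response and the directivity.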

Authors

Aurelian Botau

Sound system design engineer, Resound

Sound system design and calibration engineer.
I am running a company providing professional sound systems and DJ equipment rental. Sound system setup design, numerical simulations and technical support are included in the portfolio.
Horn speakers and Vacuum tube amplifiers enthus...

Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Sound Design, Lecture

Presentation Type Lecture

10:00am CEST

Experimental study of sound zone methods for indoor/outdoor active noise cancellation

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 44

The development of personal sound zone systems in recent
years shows great potential for low-frequency noise control
outside of noisy spaces. These approaches show promising
applications for managing noise pollution arising from
concerts in large venues or urban festivals. However, most
of the literature has assumed that the created sound zones
exist in the same room or acoustic space as the noise
source. This premise excludes all setups where the
disturbances occur outside of concert venues (e.g., in
neighboring houses). This paper presents a first
experimental study of the behavior of sound zone methods
with indoor sound zones and outdoor noise sources. These
initial results show that the methods perform well in
this edge case, opening new use cases for these
approaches.
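
The abstract does not name the sound zone algorithms tested. The simplest instance of the idea can be sketched with two free-field monopoles and a single dark-zone control point: source 0 is driven with unit strength and source 1 is weighted so that both contributions cancel at that point. Positions, frequency, and the free-field Green's function are assumptions; practical methods such as acoustic contrast control or pressure matching use many sources and control points with regularized least squares.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def green(src, rcv, freq):
    """Free-field monopole transfer function exp(-jkr) / (4*pi*r)."""
    k = 2 * math.pi * freq / C
    r = math.dist(src, rcv)
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)


def null_weights(sources, dark_point, freq):
    """Source 0 at unit strength; source 1 chosen to cancel it at the dark point."""
    g0 = green(sources[0], dark_point, freq)
    g1 = green(sources[1], dark_point, freq)
    return [1.0 + 0j, -g0 / g1]


def pressure(sources, weights, point, freq):
    return sum(w * green(s, point, freq) for s, w in zip(sources, weights))


# Hypothetical layout: two sources 0.5 m apart, a "neighbor" point outside,
# and an audience point on the other side.
srcs = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
dark, bright, f = (20.0, 5.0, 0.0), (-10.0, 0.0, 0.0), 63.0
w = null_weights(srcs, dark, f)
```

With a single control point the null is exact only there; covering an extended zone (such as a neighboring building) is what the multi-point least-squares formulations are for.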

Authors

Lucas Hocquette

L-Acoustics

Yves Pene

Research Engineer, L-Acoustics

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Audio Processing, Lecture

Presentation Type Lecture

10:00am CEST

Beyond Species Identification: Real-Time Spatial Interaction Analysis in Avian Bioacoustics Using Microphone Arrays and Hybrid Beamforming on Edge Architectures

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 43

Conventional ornithological monitoring systems rely heavily
on single-channel recorders and deep learning classifiers
to identify "what" species is present, but fail to capture
"where" it is located or how individuals interact
spatially. This limitation hinders the study of complex
ecological behaviors, such as inter-specific spacing in
dense vegetation and predator-prey dynamics. We propose a
novel, dual-mode acoustic localization system designed to
unify semantic classification and spatial tracking.
Utilizing an economically scalable 16-channel Uniform
Rectangular Array (UMA-16) interfaced with edge-computing
platforms, we implement a hybrid spatial filtering pipeline
structured to balance real-time latency constraints with
achievable angular resolution. The first stage employs a
computationally efficient, noise-robust linear scanning
technique to generate an acoustic energy map and estimate
source multiplicity. This preliminary data initializes a
second-stage, super-resolution spectral estimation
algorithm predicated on signal-noise subspace
orthogonality, combining the noise robustness of
non-parametric beamforming methods with the precision of
parametric approaches. By integrating these spatial filters
with standard deep learning classifiers, the system
resolves overlapping vocalizations in "Cocktail Party"
scenarios and improves Signal-to-Noise Ratio (SNR) for
cryptic species detection. We address the physical
"Localization-Detection Range Disparity," demonstrating
that while detection is viable at long ranges, precise
localization is constrained by the array aperture to the
near-to-mid field. The system outputs real-time video
overlays of acoustic heatmaps for field observation and
generates autonomous volumetric territory maps in fixed
deployments, collectively providing ornithologists with a
robust capability for analyzing the spatial ecology of
avian vocalizations.
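
Only the first stage of the described pipeline is sketched here: a narrowband delay-and-sum scan over one 4-microphone row of the array, treated as a uniform linear array. The 42 mm spacing, 4 kHz frequency, and noise-free plane-wave snapshot are assumptions, and the subspace (MUSIC-style) refinement of the second stage is not shown.

```python
import cmath
import math


def steering(n_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Plane-wave phase at each microphone of a uniform linear array."""
    k = 2 * math.pi * freq_hz / c
    s = math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * k * spacing_m * m * s) for m in range(n_mics)]


def das_scan(snapshot, spacing_m, freq_hz, step_deg=0.5):
    """Delay-and-sum power over -90..90 degrees; returns (peak angle, angles, powers)."""
    n = len(snapshot)
    angles = [-90 + i * step_deg for i in range(int(round(180 / step_deg)) + 1)]
    powers = []
    for ang in angles:
        a = steering(n, spacing_m, ang, freq_hz)
        y = sum(ai.conjugate() * xi for ai, xi in zip(a, snapshot)) / n
        powers.append(abs(y) ** 2)
    best = max(range(len(angles)), key=powers.__getitem__)
    return angles[best], angles, powers


# Noise-free snapshot from a hypothetical source at 20 degrees.
x = steering(4, 0.042, 20.0, 4000.0)
peak, angles, powers = das_scan(x, 0.042, 4000.0)
```

With only four elements the main lobe is broad, so two nearby birds would merge into one peak; that resolution limit is what a subspace method initialized from this scan is meant to overcome.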

Authors

Emre Göktuğ AKTAŞ

Istanbul Technical University

Mesut Kartal

Istanbul Technical University

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Cross-Disciplinary Sound Studies, Lecture

Presentation Type Lecture

10:00am CEST

Comparative Quantitative Analysis of Immersive Mixing Practices: Tracking Spatial Trends in Award-Winning and Popular Streaming Media

Thursday May 28, 2026 10:00am - 10:30am CEST

Aud 42

Since 2021, 7.1.4 musical content has transitioned from a
niche specialty to a mainstream commercial deliverable
within major streaming ecosystems. However, industry
discourse indicates a disparity in how the immersive stage
is utilized across different production tiers. This paper
presents a targeted quantitative study of thirty 7.1.4
tracks (N = 30 total; 15 per category; 2021–2026),
employing a matched-pair sampling strategy driven by the
availability of 'Established Excellence' (Grammy
Award-winning/nominated immersive albums) against
genre-equivalent 'Market Dominance' (top-charting streaming
tracks). The study utilizes a multi-parameter measurement
methodology, including Inter-Channel Cross-Correlation,
hemispheric symmetry; spatial width analysis.
Furthermore, vertical spectral centroid distribution;
channel occupancy (Center; LFE) are analyzed to identify
recurring structural immersive design markers. Preliminary
findings suggest a consistent forward-facing bias; lower
activity in select channels in charting commercial releases
compared to award-recognized counterparts. By documenting
these technical indicators, such as quarter-sphere
correlation; LFE handling differences, this study
establishes a benchmark for current immersive mixing
practices; highlights the technical indicators that may
limit the transition from enhanced stereo to true immersive
envelopment.
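
The abstract does not give the exact Inter-Channel Cross-Correlation definition. A common form, analogous to IACC, takes the maximum absolute normalized cross-correlation between two channels over a small lag range (here ±1 ms, i.e. ±48 samples at an assumed 48 kHz). The sketch assumes equal-length, time-aligned channel signals; the example signals are synthetic.

```python
import math
import random


def icc(x, y, max_lag):
    """Max |normalized cross-correlation| of equal-length x, y over lags in [-max_lag, max_lag]."""
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 0.0
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):  # keep i + lag inside y
            acc += x[i] * y[i + lag]
        best = max(best, abs(acc) / norm)
    return best


# Synthetic example: identical content 3 samples apart vs. independent noise.
rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(2400)]
right_delayed = [0.0] * 3 + left[:-3]
independent = [rng.gauss(0.0, 1.0) for _ in range(2400)]
```

Values near 1 indicate near-identical channel content (as in upmixed or "enhanced stereo" material), while values near 0 indicate decorrelated content that contributes to envelopment.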

Authors

Can Murtezaoglu

Research Assistant, Istanbul Technical University

Immersive audio recording and mixing techniques, audio design for visual media

Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture

Presentation Type Lecture

10:00am CEST

Distortion Measurements: Can We Measure What We Hear?

Thursday May 28, 2026 10:00am - 11:00am CEST

Aud 49

Many different types of distortion can be measured, from
linear to non-linear. Often the two are intertwined, and
the linear distortion influences the non-linear distortion.
Distortion is also highly signal- and level-dependent, and
it is hard to compare one type of distortion measurement to
another. There are many types of non-linear distortion
metrics; THD, THD+N, and IMD, which use sine tones as the
test signal, are the most classic. But how can we measure
distortion with real signals such as speech and music, or
even noise, and compare the results to audibility? This
tutorial covers a wide range of distortion measurements and
discusses what is audible and what distortion sounds like.
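
For reference, the classic sine-based THD metric can be computed from a DFT of an integer number of cycles, which avoids spectral leakage. This sketch evaluates only the needed bins and normalizes to the fundamental (some definitions normalize to total RMS instead). The 48 kHz rate, 1 kHz tone, and number of harmonics are assumptions; THD+N, IMD, and the real-signal methods the tutorial covers are not implemented.

```python
import cmath
import math


def bin_amplitude(x, k):
    """Peak amplitude of DFT bin k (exact for sinusoids with whole cycles in len(x))."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 2 * abs(acc) / n


def thd(x, fundamental_bin, n_harmonics=9):
    """sqrt(sum of harmonic amplitudes squared) / fundamental amplitude."""
    a1 = bin_amplitude(x, fundamental_bin)
    harmonics = [bin_amplitude(x, h * fundamental_bin)
                 for h in range(2, n_harmonics + 2)
                 if h * fundamental_bin < len(x) // 2]  # stay below Nyquist
    return math.sqrt(sum(a * a for a in harmonics)) / a1


FS, N, F0 = 48000, 4800, 1000
K0 = F0 * N // FS  # fundamental falls exactly in bin 100
tone_plus_h3 = [math.sin(2 * math.pi * K0 * i / N)
                + 0.1 * math.sin(2 * math.pi * 3 * K0 * i / N) for i in range(N)]
soft_clipped = [math.tanh(2 * math.sin(2 * math.pi * K0 * i / N)) for i in range(N)]
```

The symmetric soft clipper produces only odd harmonics, a reminder that a single THD number says nothing about which harmonics are present, one reason audibility is hard to predict from it.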

Speakers

Steve Temme

Listen Inc.

Steve Temme is founder and President of Listen, Inc., manufacturer of the SoundCheck audio test system. Steve founded the company in 1995, and for the past 30 years the company has remained on the cutting edge of research into audio measurement, regularly introducing new measurement...

Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Presentation Type Tutorial

10:00am CEST

Drone-Based Class 1 Sound Level Measurements for Three-Dimensional Characterization of Outdoor PA Systems

Thursday May 28, 2026 10:00am - 11:00am CEST

Building 302, 2nd floor

Accurate characterization of the three-dimensional sound
radiation of outdoor public-address (PA) systems is
essential for sound system engineering, environmental noise
assessment, neighbourhood protection, and the validation of
prediction models. In current practice, field measurements
around performance stages are typically restricted to
receiver heights below 5 m, limiting insight into sound
radiation at elevated positions and towards the surrounding
environment. This tutorial presents a measurement approach
using an unmanned aerial vehicle (UAV) as a platform for
Class 1 sound level measurements, enabling in-situ
three-dimensional characterization of the sound radiation
of large-scale PA systems.
A controlled case study was conducted at an open-air
festival site in Belgium where the sound radiation of a
professional line-array PA system was measured at heights
of 2 m and 30 m using both conventional ground-based
measurements and a drone-mounted sound level meter. To
ensure compatibility with standard sound engineering and
environmental noise practice, strict Class 1 methodology
was applied, including the use of an omnidirectional
microphone, broadband excitation signals, and background
noise correction in accordance with ISO 1996-2. Drone
self-noise was quantified under operational conditions, and
measurement data not meeting signal-to-noise validity
criteria were excluded.
The results show that reliable drone-based measurements are
achievable in the low-frequency range from 25 to 315 Hz,
which is of primary relevance for outdoor music systems and
community noise impact and disturbance. Directivity indices
derived at elevated height reveal weaker low-frequency
directivity compared to ground-level measurements. This
provides new insight into vertical sound radiation
behaviour of festival PA systems. A comparison between
measured and modelled sound levels demonstrates good
agreement in terms of angular distribution and relative
level differences.
The proposed drone-based measurement approach enables
three-dimensional sound field characterization of outdoor
PA systems that is not attainable using conventional
techniques. The method provides valuable data for sound
system engineering, for the validation of prediction
models, and for environmental noise assessment. This
three-dimensional measurement approach represents a step
towards standardized UAV-based measurement methodologies
for large-scale outdoor sound reinforcement systems.
This tutorial will describe in detail the protocol for
operating a measurement drone flight. After the
presentation, a practical demonstration of the drone
platform will be held outside the building.
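
Background correction (here, for drone self-noise) is done on energies rather than decibels. The sketch assumes the thresholds commonly applied in ISO 1996-2-style practice: no correction when the level difference exceeds 10 dB, energetic subtraction between 3 dB and 10 dB, and exclusion at 3 dB or less, matching the abstract's removal of data that fail the signal-to-noise criteria. The exact rules should be taken from the standard itself; the function name is hypothetical.

```python
import math


def background_corrected_level(l_meas_db, l_bg_db, min_diff_db=3.0, no_corr_db=10.0):
    """Energetically subtract a background (e.g. drone self-noise) level.

    Returns (level_db or None, status). Threshold values are assumptions.
    """
    diff = l_meas_db - l_bg_db
    if diff > no_corr_db:
        return l_meas_db, "uncorrected"          # background negligible
    if diff > min_diff_db:
        corrected = 10 * math.log10(10 ** (l_meas_db / 10) - 10 ** (l_bg_db / 10))
        return corrected, "corrected"
    return None, "excluded"                      # insufficient signal-to-noise
```

For example, 70 dB measured over a 63 dB background corrects to roughly 69.0 dB, while a 2 dB margin is rejected rather than corrected, since the subtraction becomes very sensitive to small errors in either level.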

Speakers

Marcel Kok

CEO, student at dBcontrol

Thursday May 28, 2026 10:00am - 11:00am CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Tutorial

Presentation Type Tutorial

10:00am CEST

The Early Electronic Orchestra: The Analogue Circuits Behind Electronic Keyboards Before Digital Came Along.

Thursday May 28, 2026 10:00am - 11:00am CEST

Aud 41

Before digital signal processing took over electronic
keyboard instruments, they were implemented using analogue
circuits that used tubes/valves, transistors, and even neon
lightbulbs! Yet with these components, designers built
keyboards that could mimic string and brass ensembles,
pianos and harpsichords, and many other instruments. How did
they do it?

The purpose of this tutorial is to look at both the
architecture and the circuitry of these instruments, and to
show how amazing results could be achieved using
comparatively simple electronic circuitry. It will look at:

1. The basic architecture of these instruments,
2. How they generated the right notes,
3. How they generated the desired envelopes,
4. How they imposed those envelopes on the waveform,
5. How they simulated the effect of many instruments playing
together.

It will also look at how, if required, touch sensitivity
could be achieved, such as in electronic pianos. Where
possible, there will be audio examples demonstrating the
sounds that could be achieved.

For many people who have only ever experienced the digital
world, it will be illuminating to see just how much could be
achieved by comparatively simple circuits.
In those days electronic components were expensive, so
considerable ingenuity was expended in minimising the total
number of components required.

These instruments are part of our musical and audio
heritage, and the circuit techniques they used are in danger
of being forgotten, so this tutorial will be a timely
reminder of what used to be done.
It may also provide useful information to people who are
attempting to model these instruments using modern digital
methods.

The tutorial will be accessible to everyone; you will not
need to be an electronic engineer to understand the
principles behind these unique pieces of audio engineering
history.
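
One classic way such instruments "generated the right notes" was to derive a full keyboard from twelve top-octave sources and chains of divide-by-two stages, each stage giving the same note one octave lower. In later designs the twelve top notes came from a single master clock divided by integers. The sketch below models that master-clock variant for an assumed 2 MHz clock and reports the resulting tuning error in cents; the clock value and function names are illustrative, not taken from a particular instrument or chip.

```python
import math

A4_HZ = 440.0


def equal_tempered_hz(midi_note):
    return A4_HZ * 2 ** ((midi_note - 69) / 12)


def top_octave_divisors(clock_hz, top_c_midi=108):
    """Integer divisors bringing the master clock closest to the 12 top-octave notes (C8..B8)."""
    return [round(clock_hz / equal_tempered_hz(top_c_midi + i)) for i in range(12)]


def cents_error(clock_hz, divisor, midi_note):
    """Deviation of clock/divisor from the equal-tempered target, in cents."""
    return 1200 * math.log2((clock_hz / divisor) / equal_tempered_hz(midi_note))


def octave_below(freq_hz, octaves):
    """Each divide-by-two stage halves the frequency: one octave down, same cents error."""
    return freq_hz / 2 ** octaves


divisors = top_octave_divisors(2_000_000)
errors = [cents_error(2_000_000, d, 108 + i) for i, d in enumerate(divisors)]
```

Because every lower octave is an exact binary division, the whole keyboard inherits the top octave's small tuning errors (a few cents at most here) and stays perfectly in tune across octaves, which suits ensemble-style sounds.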

Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES

Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...

Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Tutorial | Audio Equipment, Tutorial | Audio Processing, Tutorial

Presentation Type Tutorial

10:00am CEST

ECHO Project - Immersive Microphone Array Techniques for Orchestral Recording

Thursday May 28, 2026 10:00am - 11:00am CEST

Aud 31

The ECHO Project (Exploring the Cinematic Hemisphere for
Orchestra) is a collaborative initiative investigating 3D
microphone array techniques for orchestral recording.
Building on the 3D-MARCo initiative, the project provides a
platform for sound engineers, composers, researchers, and
students to explore and experiment with immersive recording
approaches. As part of this effort, an open-access database
of high-quality orchestral recordings was created from
sessions at AIR Studios, London, featuring Oscar-winning
composer Volker Bertelmann and the London Contemporary
Orchestra. The ECHO database contains recordings of four
musical pieces captured using up to 143 microphone
capsules, including seven expert-designed microphone
arrays, spot microphones, a dummy head, and a higher-order
spherical microphone system. The database enables
comparison of different recording techniques and supports
experimentation with microphone mixing, making it a
valuable resource for research, teaching, and immersive
audio production. This workshop will introduce the
microphone arrays, describe the recording process and
immersive compositional approach, and showcase selected
recordings in 7.1.4.

Speakers

Hyunkook Lee

Professor, University of Huddersfield

Professor

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)

Recording Producer and Balance Engineer with 50 GRAMMY nominations, 42 of these in the craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...

Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Tutorial | Recording Production and Reproduction, Tutorial

Presentation Type Tutorial

10:00am CEST

Exhibit Hall

Thursday May 28, 2026 10:00am - 6:00pm CEST

Aud 36

Thursday May 28, 2026 10:00am - 6:00pm CEST
Aud 36 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Exhibition

10:30am CEST

Effect of an Active Acoustic Reinforcement System on Musical Performance in a Recording Studio

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 42

This work presents the results of a perceptual study
investigating the influence on musicians of a virtual
acoustics system installed in the live room of a
professional recording studio. The study focused on
analyzing relationships between a selection of objective
acoustic parameters (T30, STLate, LJ) and the subjective
perceptions of 19 solo musicians performing under 11
different acoustic conditions. The experiment was conducted
using the VAT (Virtual Acoustic Technology) system and the
VAT Suite software developed at the Immersive Media
Laboratory (IMLab) in the Sound Recording Department at
McGill University. Correlations between the quantitative and
qualitative analyses show that musicians' preferences
converge on conditions with T30 ≈ 1 s, and that late and
lateral energy increases the perception of spatiality,
providing a positive balance between clarity and acoustic
support. However, longer reverberation reduces comfort and
executive control.
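
T30 is conventionally estimated from an impulse response by Schroeder backward integration: a straight line is fitted to the energy decay curve between -5 dB and -35 dB and extrapolated to 60 dB of decay (as in ISO 3382). The sketch below checks this on a synthetic exponentially decaying noise burst with an assumed 1 s reverberation time; it omits the octave-band filtering and noise-floor compensation that real measurements need.

```python
import math
import random


def schroeder_edc_db(ir):
    """Backward-integrated energy decay curve in dB, normalized to 0 dB at t = 0."""
    edc, acc = [0.0] * len(ir), 0.0
    for i in range(len(ir) - 1, -1, -1):
        acc += ir[i] * ir[i]
        edc[i] = acc
    return [10 * math.log10(e / edc[0]) if e > 0 else -math.inf for e in edc]


def t30_seconds(ir, fs):
    """Least-squares line through the EDC between -5 and -35 dB, extrapolated to -60 dB."""
    pts = [(i / fs, d) for i, d in enumerate(schroeder_edc_db(ir)) if -35.0 <= d <= -5.0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second (negative)
    return -60.0 / slope


# Synthetic IR: Gaussian noise whose envelope decays 60 dB (in energy) per second.
FS, RT_TRUE = 8000, 1.0
rng = random.Random(0)
ir = [rng.gauss(0.0, 1.0) * 10 ** (-3.0 * (i / FS) / RT_TRUE) for i in range(2 * FS)]
estimate = t30_seconds(ir, FS)
```

Backward integration averages out the random fluctuations of the noise-like decay, which is why a single measured response gives a stable decay slope.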

Authors

Gianluca Grazioli

Montreal, Canada, McGill University

Richard King

McGill University

Montreal

Wieslaw Woszczyk

McGill University

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Immersive Audio, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

10:30am CEST

Confidently Wrong: Evaluating AudioSet-Trained Models Under Real-World Deployment

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 43

Audio event-classification models trained on AudioSet are
widely adopted and form a central component of the state of
the art in machine listening, yet their behavior when
deployed in complex, open acoustic environments remains
largely unexplored. In this study, we evaluate several
widely adopted AudioSet-pretrained architectures
(particularly models from the PANNs family, including
MobileNetV2 and Wavegram, and the Transformer-based PaSST
model) when applied to a real operational scenario at the
commercial Port of Valencia, Spain. We observed a recurring
and systematic unexpected behavior: the models frequently
assigned disproportionately high probability to the class
Music for non-musical industrial and transportation sounds.
These mislabeled events included train-wheel squealing,
motorcycle acceleration, emergency sirens, and reversing
beeps: sound categories that are common in port logistics
environments but acoustically different from music. By
analyzing the probability distributions output by the
models, we demonstrate that this erroneous Music activation
is not an isolated failure but a pervasive pattern across
several architectures. Our findings highlight a critical
gap in the robustness and domain generalization of
AudioSet-derived models and emphasize the need for targeted
adaptation techniques when deploying them in real
industrial settings.
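
A simple way to quantify the failure mode described above is to count, over clips known to contain no music, how often the Music score exceeds a decision threshold and how often Music is the top-ranked class. The sketch assumes per-clip predictions are available as class-to-probability dictionaries; the function name, threshold, and example values are hypothetical.

```python
def music_false_activation(predictions, threshold=0.5, label="Music"):
    """For clips known NOT to contain music, return
    (fraction with P(label) >= threshold, fraction where label is top-1)."""
    n = len(predictions)
    above = sum(1 for p in predictions if p.get(label, 0.0) >= threshold)
    top1 = sum(1 for p in predictions if max(p, key=p.get) == label)
    return above / n, top1 / n


# Hypothetical multi-label scores for four non-musical port recordings.
clips = [
    {"Music": 0.80, "Siren": 0.60},
    {"Music": 0.30, "Vehicle": 0.70},
    {"Music": 0.55, "Train": 0.90},
    {"Music": 0.10, "Beep": 0.20},
]
rate_above, rate_top1 = music_false_activation(clips)
```

Because AudioSet models are multi-label with independent sigmoid outputs, the two rates can differ: a clip can cross the Music threshold even when another class ranks higher.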

Authors

Javier Naranjo Alcazar

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Jordi Grau de Haro

Instituto Tecnológico de Informática

Marta Garcia Ballesteros

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Pedro Diego Zuccarello González Victorica

Ruben Ribes Serrano

Instituto Tecnologico de Informatica (ITI), Paterna, Spain

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture

Presentation Type Lecture

10:30am CEST

Nonlinear viscoelasticity in loudspeaker suspensions

Thursday May 28, 2026 10:30am - 11:00am CEST

Aud 44

Damping in viscoelastic materials such as rubbers is often
desirable, especially in loudspeaker suspensions. Under
high strain loads, however, viscoelastic materials can also
exhibit a hysteretic stiffness behavior, causing a
stiffness decrease with amplitude. In this study, we
examine the viscoelastic rubber suspension of a
loudspeaker, using the loudspeaker motor system as actuator
and sensor. From measurements we observe the hysteretic
force-displacement behavior and pronounced odd-order
harmonic distortion even at low amplitudes, in accordance
with the literature. We further explore a
macro-thermodynamic plastic flow model to describe the
stiffness of viscoelastic materials. The results show that
the plastic flow suspension model explains and replicates
the observed nonlinear hysteretic behavior. We also show
that a fitted time-domain loudspeaker model including
plastic flow matches the measured distortion profile. In
contrast, models with polynomial stiffness and viscous
damping fail to explain the observed amplitude dependencies
such as odd-order harmonic levels. The experiments
demonstrate that viscoelastic hysteresis occurs not only at
high but also at low amplitudes, where the elastic
stiffness is approximately linear.
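The amplitude-dependent, hysteretic stiffness described above can be illustrated with a much simpler model than the paper's plastic flow formulation. This sketch uses a linear spring in parallel with a few elasto-plastic (Jenkins) elements, each a spring in series with a Coulomb slider. It is a swapped-in textbook model, not the authors' model, and every parameter value is a hypothetical placeholder. It reproduces two qualitative effects named in the abstract: a closed force-displacement loop (energy dissipation) and a secant stiffness that drops as amplitude grows.

```python
import numpy as np

def jenkins_force(x, k_lin, ks, fy):
    """Restoring force of a linear spring plus parallel Jenkins elements.

    x: displacement history (1-D array).
    k_lin: stiffness of the purely elastic spring.
    ks, fy: per-element spring stiffness and slip force (sequences).
    """
    ks = np.asarray(ks, dtype=float)
    fy = np.asarray(fy, dtype=float)
    z = np.zeros_like(ks)  # slider positions, initially unstretched
    force = np.empty(len(x))
    for n, xn in enumerate(x):
        f_el = ks * (xn - z)
        slipping = np.abs(f_el) > fy
        # A slipping slider moves so that its element force stays at +/- fy.
        z[slipping] = xn - np.sign(f_el[slipping]) * fy[slipping] / ks[slipping]
        f_el = ks * (xn - z)
        force[n] = k_lin * xn + f_el.sum()
    return force

def secant_stiffness_and_loop_area(amplitude, k_lin=1.0,
                                   ks=(0.5, 0.5, 0.5), fy=(0.05, 0.1, 0.2)):
    """Drive two sine cycles and analyze the second (steady) cycle."""
    t = np.linspace(0.0, 2.0, 2001)  # two periods of a unit-frequency sine
    x = amplitude * np.sin(2 * np.pi * t)
    f = jenkins_force(x, k_lin, ks, fy)
    xs, fs = x[1000:], f[1000:]  # second cycle only
    k_sec = (fs.max() - fs.min()) / (xs.max() - xs.min())
    # Trapezoidal closed-loop integral of F dx = energy dissipated per cycle.
    area = np.sum(0.5 * (fs[1:] + fs[:-1]) * np.diff(xs))
    return k_sec, area
```

At small amplitude every element stays elastic, so the loop collapses to a line; at large amplitude the sliders slip, the loop opens and the secant stiffness falls toward `k_lin`.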

Authors

Finn Agerkvist

Technical University of Denmark

My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Franz Heuchel

GN Audio

Manuel Hahmann

Dynaudio A/S

Thursday May 28, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

11:00am CEST

Audio data augmentation techniques for frame drum stroke recognition

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 43

This work addresses the problem of frame drum (bendir)
stroke technique recognition in simulated real-world
conditions. The traditional frame drum technique includes
three discrete strokes used to create rhythmic patterns:
dum, tek, and slap. In the presented work, audio
data augmentation is investigated on a dataset containing
recordings of instruments with various construction
attributes. The techniques were selected to help the
classifier generalize to real-world conditions. Moreover,
mixing of the frame drum samples with accompanying guitar
chords is introduced, simulating the more complicated
problem of stroke technique recognition when playing in a
duo. Applying this data augmentation yields several
datasets for training and testing. Two convolutional
neural network architectures (one- and two-dimensional)
are considered, trained on waveforms and mel-scale
spectrograms, respectively, of the different subsets.
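The duo-simulation step above, mixing frame drum strokes with guitar chords, amounts to adding an accompaniment signal at a chosen drum-to-guitar level ratio. This is a minimal sketch, assuming mono NumPy arrays at a shared sample rate. The function name and the dB ratio parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def mix_at_ratio(drum, guitar, ratio_db):
    """Mix `guitar` under `drum` so the drum-to-guitar power ratio is `ratio_db`.

    The guitar is tiled or truncated to the drum length. The result is
    peak-normalized only if it would otherwise exceed full scale (1.0).
    """
    drum = np.asarray(drum, dtype=float)
    guitar = np.resize(np.asarray(guitar, dtype=float), drum.shape)  # cyclic tiling
    p_drum = np.mean(drum ** 2)
    p_guitar = np.mean(guitar ** 2)
    if p_guitar == 0.0:
        return drum.copy()
    gain = np.sqrt(p_drum / (p_guitar * 10 ** (ratio_db / 10)))
    mix = drum + gain * guitar
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix
```

Drawing `ratio_db` from a range per training example is one way to produce augmented subsets of graded difficulty.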

Authors

Antonis Pagonis

Pagonis Percussion

Charalampos Dimoulas

Aristotle University of Thessaloniki

Labros Vasileiou

Aristotle University of Thessaloniki

Nikolaos Vryzas

Aristotle University of Thessaloniki

Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degree in Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture

Presentation Type Lecture

11:00am CEST

Input-output linearization of loudspeaker dynamics via automatic differentiation

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 44

Input-output linearization is a technique for compensating
nonlinear distortion in loudspeakers. To apply it to
complex loudspeaker models, we describe an end-to-end
framework for estimating model parameters from data and
deriving the linearizing control laws using automatic
differentiation. The parameter estimation approach combines
frequency-domain linear parameter estimation with a
time-domain prediction-error method for the nonlinear
parameters. The linearization approach supports nonlinear
reference systems and stabilization of the control law
using trajectory tracking. We implement the framework in
dynax, an open-source Python package based on JAX, and
validate it experimentally as a feed-forward controller on
a closed-box loudspeaker. Results demonstrate validation
errors of 1-5% NRMSE and total harmonic distortion
reductions of 6-12 dB. The framework enables researchers
and engineers to rapidly prototype and validate complex
loudspeaker models for distortion compensation without
manual symbolic derivations.
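Here is a toy illustration of the feed-forward idea. It does not use dynax or automatic differentiation. Consider a single mass-spring-damper whose stiffness depends on displacement. Choose a linear reference system, then invert the nonlinear model along the reference trajectory. This gives the input that makes the nonlinear plant follow the linear one. The model, its quadratic stiffness term and all parameter values are hypothetical. The paper derives such control laws for richer loudspeaker models by automatic differentiation.

```python
import numpy as np

M, R, K0, K2 = 0.01, 0.5, 2000.0, 4e9  # hypothetical mass, damping, stiffness terms

def stiffness(x):
    # Nonlinear plant: stiffness grows with displacement squared (hypothetical).
    return K0 + K2 * x ** 2

def simulate(force, nonlinear, dt):
    """Semi-implicit Euler integration; returns displacement and velocity per step."""
    x = v = 0.0
    xs, vs = [], []
    for f in force:
        xs.append(x)
        vs.append(v)
        k = stiffness(x) if nonlinear else K0
        a = (f - R * v - k * x) / M
        v = v + dt * a
        x = x + dt * v
    return np.array(xs), np.array(vs)

def linearizing_input(reference_force, dt):
    """Feed-forward law u = M*a_ref + R*v_ref + k(x_ref)*x_ref.

    a_ref comes from the linear reference model, so the nonlinear plant
    driven by u reproduces the reference trajectory.
    """
    x_ref, v_ref = simulate(reference_force, nonlinear=False, dt=dt)
    a_ref = (np.asarray(reference_force) - R * v_ref - K0 * x_ref) / M
    return M * a_ref + R * v_ref + stiffness(x_ref) * x_ref, x_ref
```

Because the plant and reference use the same integrator, the compensated plant tracks the reference up to rounding. With a model mismatch, as on real hardware, this is where the trajectory-tracking stabilization mentioned in the abstract becomes necessary.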

Authors

Finn Agerkvist

Technical University of Denmark

My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Franz Heuchel

GN Audio

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

11:00am CEST

Comfortability analysis of immersive sound playback system for cabin noise based on frontal lobe fNIRS experiment: an application of 4th order ambisonics

Thursday May 28, 2026 11:00am - 11:30am CEST

Aud 42

This study introduces a fourth-order Ambisonics-based decoding system to reproduce railway cabin running noise in a studio environment, enabling enhanced spatial impression and detailed sound field variation. Real-world operational noise was recorded using a multichannel fourth-order Ambisonics microphone (Eigenmike® EM32, mh acoustics LLC, USA), and the reproduced sound field was implemented through a multichannel loudspeaker system. The reproduced signals were quantitatively compared with the original operational noise in terms of spectral variation and waveform distortion.
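For orientation on the channel counts involved: a full-sphere Ambisonics signal of order N has (N+1)^2 components. Fourth order therefore means 25 channels, fewer than the 32 capsules of the EM32. The sketch below computes that count and the Ambisonic Channel Number (ACN) ordering, ACN = n^2 + n + m for degree n and index m. It assumes ACN ordering, which the abstract does not specify.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of full-sphere Ambisonics components up to `order`."""
    return (order + 1) ** 2

def acn(n: int, m: int) -> int:
    """Ambisonic Channel Number for spherical-harmonic degree n and index m."""
    if not -n <= m <= n:
        raise ValueError("require -n <= m <= n")
    return n * n + n + m
```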

Authors

Yonghee Lee

Research Associate, Changwon National University

Yonghee Lee
Ph.D., Mechanical Engineering.
Ultrasonic, Acoustic, SHM, NDE, fNIRS, and Bio-medical engineering.
Institute: Changwon National University, South Korea

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture

Presentation Type Lecture

11:00am CEST

Kseniya Kawko: 3D Masterclass

Thursday May 28, 2026 11:00am - 12:00pm CEST

Aud 31

Kseniya Kawko, a Munich- and London-based Tonmeister and
recording engineer specializing in classical music and
jazz, shares selections from her recent live and studio
recording and mixing projects, featuring leading orchestras
and jazz ensembles, and provides an introduction to the
artistic and production considerations behind immersive
formats.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best, as we
discuss qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, also detailing
this specific listening room with respect to ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.

Speakers

Kseniya Kawko

Tonmeister, msm studios

Kseniya Kawko is a producer and recording engineer specializing in classical music and jazz. She holds Master of Music degrees from two world-renowned audio programs: Sound Recording, McGill University (Montréal, Canada) and Musikregie / Tonmeister, Hochschule für Musik Detmold (Germany...

Thomas Lund

Genelec Oy

Denmark

Thursday May 28, 2026 11:00am - 12:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Masterclass | Immersive Audio, Masterclass | Perception, Masterclass | Recording Production and Reproduction, Masterclass

Presentation Type Masterclass

11:00am CEST

Immersive Audio Formats: Innovation, Fragmentation, or Both?

Thursday May 28, 2026 11:00am - 12:00pm CEST

Aud 41

Immersive music is at a critical point in its development.
While production tools, workflows, and distribution models have begun to stabilise, the market remains fragile, and long-term adoption is far from guaranteed.

New immersive audio formats are now entering a field where creators, labels, and platforms have only recently started to commit resources and build confidence. This raises a fundamental question: does the introduction of additional formats strengthen immersive music, or does it increase uncertainty at a time when the market can least afford it?

This panel-based workshop focuses on immersive audio formats for music and explores whether current challenges are best addressed through new formats, or through innovation and improvement within existing ones.

Topics for discussion include:
- What are the most pressing problems facing immersive music today?
- Do emerging formats solve these problems, or risk fragmenting production, distribution, and listening experiences?
- How does format uncertainty affect investment, release strategies, and creative willingness, especially in smaller markets?
- What are the potential consequences if industry stakeholders decide that immersive music is too complex or too risky to prioritise?
- How do issues such as translation between loudspeaker-based and headphone listening fit into this broader picture?

The session is designed as an open, moderated discussion with panelists from production, research, mastering, education, and technology development.

Speakers

Stefan Bock

Managing Director, msm-studios GmbH

Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991, where he was the Chief Mastering Engineer and General Manager.

He was leading msm-studios t...

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

Michael Romanowski

Owner-Head Engineer, Coast Mastering

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)

Lars Tirsbæk

Head of Sonic Days, Sonic College

With expertise in Dolby Atmos and immersive sound, Lars Tirsbæk leads the way in teaching studio production at Sonic College. His innovative approach combines the best of both studio and live sound, focusing on efficient workflows, technical tools, and the creative process. Additionally...

Thursday May 28, 2026 11:00am - 12:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Panel

Presentation Type Panel

11:00am CEST

Measurement tools for immersive audio production

Thursday May 28, 2026 11:00am - 12:00pm CEST

Aud 49

Multichannel audio formats require attention to
inter-channel correlation and sometimes a special approach.
In this workshop, we would like to continue the discussion
started at AES Show 2025 in LA and show how different
measurement tools can be used to avoid certain problems in
the final mix, for example the mutual influence between
the upper and main beds in an immersive layout, problems in
the LFE channel, and how to check the mix for correlation
issues outside the sweet spot.
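The correlation checks mentioned above are typically built on a normalized zero-lag correlation between two channels, the quantity a phase-correlation meter displays. It reads +1 for identical signals, about 0 for uncorrelated ones and -1 for polarity-inverted ones. This is a minimal sketch over analysis blocks, assuming NumPy arrays. The 4800-sample block (100 ms at 48 kHz) is an illustrative assumption. Commercial meters add their own integration time and ballistics, which are not modeled here.

```python
import numpy as np

def correlation(a, b, eps=1e-12):
    """Normalized zero-lag correlation of two channel blocks, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > eps else 0.0

def block_correlation(a, b, block=4800):
    """Correlation per non-overlapping block, e.g. 100 ms at 48 kHz."""
    n = min(len(a), len(b)) // block
    return [correlation(a[i * block:(i + 1) * block], b[i * block:(i + 1) * block])
            for i in range(n)]
```

Running this on pairs such as a main-layer channel and the height channel above it shows how strongly the layers share content. That shared content is a rough proxy for the mutual influence the workshop discusses.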

Speakers

Pavel Smokotnin

RTW GmbH & Co. KG

Germany, Köln

Thursday May 28, 2026 11:00am - 12:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Tutorial | Perception, Tutorial

Presentation Type Tutorial

11:30am CEST

System-Level Remapping for Electronic Music Spatial Reproduction: A Case Study of the Cross-Venue Reperformance of Symphonic Coding

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 43

Taking the premiere and reperformance of the sci-tech symphonic suite Symphonic Coding as a case study, this paper discusses audio system organization, sound diffusion, and cross-venue migration in the co-performance of symphonic and electronic music. Given the challenges of diverse live inputs, real-time control of the electronic music part, concurrent recording and live streaming, and varying acoustic conditions, the article analyzes how a single workflow handles traditional miking, electronic music generation and control, live spatial diffusion, and multi-purpose distribution. The study is structured across four levels: system design requirements, signal organization, dual-venue implementation, and engineering discussion. It illustrates the development of an interconnected workflow comprising Content, Rendering, and Distribution Layers through mixing console organization, immersive rendering, and AoIP distribution. Results indicate that the significance of this work lies not in the reproduction of the listening experience of the entire performance, but in enabling the spatial presentation of the electronic music part to remain valid across different environments based on a consistent reference. Furthermore, the project enhances reperformance capability and production flexibility through the separation of functions, roles, and systems.

Authors

Chuhan Gao

Communication University of China

Xiuquan Yao

Communication University of China

Yilong Zhang

Communication University of China

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Immersive Audio, Lecture

Presentation Type Lecture

11:30am CEST

Virtualization-Based Mechanical Loudspeaker Protection Using Nonlinear Wave Digital Modeling

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 44

Mechanical overload remains a primary limitation in
high-output loudspeaker operation, particularly at low
frequencies where large coil excursions are required.
Conventional mechanical protection strategies are typically
implemented as signal-domain limiters or filters, which act
indirectly on the loudspeaker’s mechanical state and may
introduce discontinuities, spectral modification, or
unnecessary attenuation.

This paper proposes a methodological framework for
mechanical loudspeaker protection based on the
virtualization of admissible system behavior. The approach
is formulated within a nonlinear wave digital loudspeaker
model and realized using a direct–inverse–direct
architecture. Mechanical protection is embedded directly
into the virtual loudspeaker dynamics by shaping the
nonlinear suspension compliance as a function of voice-coil
displacement. As the excursion approaches a prescribed
admissible limit, the virtual compliance is progressively
reduced using a smooth raised-cosine law, resulting in a
continuous increase of the virtual mechanical stiffness.
Excessive excursion is therefore prevented as a consequence
of the system dynamics, without explicit limiting,
clipping, or signal-domain intervention.

The proposed framework is evaluated through numerical
simulations using steady-state low-frequency sinusoids
and low-frequency sine bursts under free-air loading. Results
are compared against an unprotected loudspeaker and a fixed
high-pass filter configured to meet the same excursion
constraint. The simulations verify that the proposed method
enforces a soft excursion ceiling without discontinuities,
preserves low-frequency output in the near-limit operating
region, and exhibits stable and immediate recovery
following transient excitation. Distortion behavior is
characterized and shown to increase smoothly as a result of
the introduced mechanical nonlinearity.

The results demonstrate that mechanical protection can be
realized as an emergent property of a virtual loudspeaker
model rather than as an external control action. The
proposed approach provides a physically interpretable and
numerically robust foundation for virtualization-based
loudspeaker protection.
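The raised-cosine compliance law described above can be sketched as a function of voice-coil displacement. Compliance stays at its full value inside a free zone, then follows a smooth half-cosine taper down to a small residual value at the admissible limit. The threshold `x_start`, the limit `x_max` and the floor ratio `c_min_ratio` are hypothetical parameters. The abstract does not give the exact law, so this is one plausible raised-cosine taper, not the author's formula.

```python
import numpy as np

def virtual_compliance(x, c0, x_start, x_max, c_min_ratio=0.05):
    """Compliance vs. displacement with a raised-cosine taper.

    |x| <= x_start:          c0 (unmodified suspension)
    x_start < |x| < x_max:   smooth taper from c0 to c_min_ratio * c0
    |x| >= x_max:            c_min_ratio * c0 (very stiff virtual suspension)
    """
    x = np.abs(np.asarray(x, dtype=float))
    # Normalized position inside the taper region, clipped to [0, 1].
    u = np.clip((x - x_start) / (x_max - x_start), 0.0, 1.0)
    taper = 0.5 * (1.0 + np.cos(np.pi * u))  # 1 at u=0, 0 at u=1, zero slope at both ends
    return c0 * (c_min_ratio + (1.0 - c_min_ratio) * taper)
```

The zero slope of the taper at both ends keeps the virtual stiffness 1/C continuous and smooth. That smoothness avoids the discontinuities of a hard limiter.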

Authors

Lucio Bianchi

Elettromedia s.p.a.

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

11:30am CEST

The efficacy of phantom image perception: an active listener perspective.

Thursday May 28, 2026 11:30am - 12:00pm CEST

Aud 42

A “phantom image” is the illusion of an independent sound
source created by two or more loudspeakers. Most often
created by manipulating level differences between
stereophonic channels (a.k.a. “panning”), the effect is used
to create a sense of auditory space between loudspeakers
and is largely taken for granted. In recent years,
surround and immersive audio systems have attempted to
utilize phantom image processing to render audio objects in
desired positions across multiple loudspeaker arrays. This
research examined the efficacy of phantom image perception
horizontally and vertically from an active listener
perspective. After listening to a target loudspeaker,
listeners (n = 442) were asked to move a phantom sound to a
position to match that of the target loudspeaker. The
listener’s phantom placement was then compared to the
target, and subjects were allowed to “correct” their phantom
position. The horizontal experiment was based on a
standard stereophonic 60° loudspeaker array with the target
loudspeaker at 15° off center. The vertical experiment
utilized elevated loudspeakers in a 60° arc with the target
loudspeaker elevated 10° above the horizon (lower
loudspeaker). Results show nearly universal “undershoot” in
horizontal placement error on first attempts with gradual
improvement over trials that coalesced around the projected
target location. However, after repeated tries, final
perceptual image locations were spread over 2/3 of the
sound-field around the target loudspeaker. In the vertical
trials perceptual locations were spread across the entire
sound field in all three trials and failed to show any
patterns of coalescence around the target loudspeaker.
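Level-difference panning of this kind is often described by the stereophonic tangent law, tan(θ)/tan(θ0) = (gL - gR)/(gL + gR). Here ±θ0 are the loudspeaker angles and θ is the predicted image angle, positive toward the left loudspeaker. The sketch below computes constant-power gains for a target angle, such as 15° in a standard ±30° (60°) setup like the horizontal experiment. It is a textbook panning model used only for illustration, not the procedure of the study. The study's results suggest that perceived positions can differ considerably from any such nominal target.

```python
import math

def tangent_law_gains(theta_deg, theta0_deg=30.0):
    """Constant-power (gL, gR) placing a phantom image at theta by the tangent law.

    Positive angles point toward the left loudspeaker at +theta0.
    """
    if abs(theta_deg) > theta0_deg:
        raise ValueError("target must lie between the loudspeakers")
    r = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta0_deg))
    gl, gr = 1.0 + r, 1.0 - r  # any pair with (gl - gr) / (gl + gr) == r
    norm = math.hypot(gl, gr)  # normalize so that gl**2 + gr**2 == 1
    return gl / norm, gr / norm

def predicted_angle(gl, gr, theta0_deg=30.0):
    """Invert the tangent law: image angle predicted for gains (gL, gR)."""
    r = (gl - gr) / (gl + gr)
    return math.degrees(math.atan(r * math.tan(math.radians(theta0_deg))))
```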

Authors

Song Hui CHON

Associate Professor, Belmont University

Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."

Wesley Bulla

Belmont University

Thursday May 28, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

12:00pm CEST

Opening Ceremony, Keynote Session, Awards

Thursday May 28, 2026 12:00pm - 1:30pm CEST

Aud 42

This is the official Opening Ceremony of the 160th AES Convention in Copenhagen.

AGENDA

WELCOME
Colleen Harper, AES Executive Director
Brecht De Man, AES President
Jan Abildgaard Pedersen, AES 160th Convention Chair

PRESENTATION OF AWARDS
Cesar Lamschtein, AES President Elect
Finn T. Agerkvist and Lars Tirsbæk, Papers Co-Chairs

KEYNOTE ADDRESS
Jan Abildgaard Pedersen, Committee Chair
Geoff Martin: “The Perceptual Irrelevance of Physical Measurements”

CLOSING REMARKS
Jan Abildgaard Pedersen, AES 160th Convention Chair

Moderators

Jan Abildgaard Pedersen

Convention Chair, Audio Engineering Society

Jan Abildgaard Pedersen Consult offers a wide variety of services: Sound Tuning, Innovation Process, Audio DSP Algorithms, Solving impossible Audio Problems, Room Adaptation, Audio System Development, Audio Research, Audio Strategy Advisor, Patent Advice, White Papers, Scientific...

Speakers

Cesar Lamschtein

President Elect, Audio Engineering Society

Colleen Harper

Executive Director, Audio Engineering Society

Brecht De Man

Head of Research, AES President

Brecht De Man is Head of Research at PXL-Music, guest lecturer at the Royal Conservatoire of The Hague, and author of Intelligent Music Production (Routledge 2019). He holds a PhD from the Centre for Digital Music at Queen Mary University of London, where he developed and evaluated...

Lars Tirsbæk

Head of Sonic Days, Sonic College

Geoff Martin

Director Specialist for Audio Quality, Bang & Olufsen

Authors

Finn Agerkvist

Technical University of Denmark

My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Thursday May 28, 2026 12:00pm - 1:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Special Events, Special Event

1:30pm CEST

A New Reference Target Curve for Studio Headphones

Thursday May 28, 2026 1:30pm - 2:00pm CEST

Aud 44

Target curves for the sound signature of headphones are a
helpful reference during the development process. While
much attention has been paid to finding target curves that
match the listening preferences of consumers, equivalents
for studio headphones date back to the 1990s. In the context
of music production, a mutual target or even a standard is
essential to make mixing and mastering more
gear-independent. This becomes even more important as
the main tool for sound engineers shifts from loudspeakers
in professional environments such as acoustically treated
studios to headphones, often additionally equipped with
virtualization algorithms. This enables engineers to be more
flexible and to rely less on potentially expensive
loudspeaker setups. The diffuse-field target curve, which is
currently still the only standardized target curve for
studio headphones, is often reported not to match a real
loudspeaker equivalent of studio environments. In this
paper, we aim to find a new standard target curve for
studio headphones that emulates the frequency response of a
loudspeaker setup in modern studio environments.
For this, we give an overview of current target curves and
match them to their equivalent loudspeaker setups.
Based on that, we propose a new methodology for a
measurement-based target curve incorporating typical
panning paradigms of music signals based on measurements
inside multiple control rooms. To verify the results, we
conduct listening tests with professionals in multiple
studio environments.

Authors

Jonas Foerster

Signal Processing Engineer, beyerdynamic GmbH & Co. KG

Passionate about Headphones, Signal Processing and their interaction.

Focus on headphone target curves, spatial audio and ANC

Lukas Keppler

beyerdynamic GmbH & Co. KG

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture | Audio Equipment, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

1:30pm CEST

Joint Neural Translation and Classification of Videos for Audio Processing

Thursday May 28, 2026 1:30pm - 2:00pm CEST

Aud 43

A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and then optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve approximately 90% classification
accuracy on a test set using a combined translation and
classification model (with only ~22K parameters),
demonstrating a globally scalable approach for robust
classification on the edge.
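To make the scale of a roughly 22K-parameter text classifier concrete, here is a sketch of one possible lightweight design. It feeds hashed bag-of-words features from already-translated metadata text into a single linear layer with a softmax over the three classes. The hashing dimension, tokenization and class order are hypothetical, and the weights are untrained placeholders. The abstract does not describe the authors' actual architecture.

```python
import zlib
import numpy as np

CLASSES = ("movie", "music", "dialog/other")
N_FEATURES = 7000  # hypothetical hashing dimension

def featurize(text: str) -> np.ndarray:
    """Hashed bag-of-words counts for lower-cased whitespace tokens."""
    x = np.zeros(N_FEATURES)
    for token in text.lower().split():
        # crc32 is stable across runs (unlike Python's salted built-in hash()).
        x[zlib.crc32(token.encode("utf-8")) % N_FEATURES] += 1.0
    return x

class LinearTextClassifier:
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(len(CLASSES), N_FEATURES))  # untrained
        self.b = np.zeros(len(CLASSES))

    def n_params(self) -> int:
        return self.W.size + self.b.size

    def predict_proba(self, text: str) -> np.ndarray:
        logits = self.W @ featurize(text) + self.b
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()
```

With these assumed sizes the model has 3 × 7000 + 3 = 21,003 parameters, the same order as the figure reported in the abstract. Inference is a single sparse-friendly dot product, with no audio or video decoding.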

Authors

Alejandro Cajica

Samsung Research Mexico

Sunil Bharitkar

Samsung Research America

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

1:30pm CEST

Headphone development is not over yet

Thursday May 28, 2026 1:30pm - 2:30pm CEST

Aud 41

Headphones have become the dominant device for music
playback, and their design appears to have reached a
certain level of technical maturity. This workshop presents
an overview of the current state of the art in headphone
design and examines potential directions for future
technological development, addressing both acoustic
aspects—including transducer design—and signal-processing
approaches.

The workshop establishes a common foundation by introducing
the fundamentals of headphone acoustics and design
principles, together with a brief overview of the
historical development of headphones and the main headphone
types in use today.

Based on this foundation, the workshop addresses current
challenges and future development potential in headphone
technology, including:
• Transducer and acoustic development potential: materials,
design methodologies and simulation techniques, and
advances in measurement technology
• Characteristics of a high-quality headphone: What
differentiates an excellent headphone from a good one? To
what extent can headphone performance be characterized
using current measurement techniques, and what additional
metrics, target criteria, or perceptual considerations may
be required? What is the role of mechanical quality?
• Signal processing potential: from advanced noise
cancellation and augmented hearing to spatial audio
processing
• Challenges in realistic spatial reproduction: interaction
between auditory and visual environments
• Emerging wireless technologies: technologies such as UWB
and Bluetooth 6 offer not only increased bandwidth and
reduced latency but also the capability to localize the
playback device. What are the implications for conventional
headphone performance and for spatial audio applications?
• Changes in studio workflows: professional practice has
evolved from loudspeakers as the primary monitoring tools,
with headphones mainly used for detailed analysis, toward
headphones playing a central role in the early stages of
recording and mixing. What are the consequences of this
shift for headphone design and signal processing?
• Technically feasible but not yet commercialized
solutions: advanced headphone concepts that are achievable
with current technology but have not yet been adopted due
to economic or practical constraints

Speakers

Juha Backman

Bang & Olufsen

Naotaka Tsunoda

Huawei

Axel Grell

Grell Audio

Thursday May 28, 2026 1:30pm - 2:30pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Panel | Audio Equipment, Panel | Immersive Audio, Panel

Presentation Type Panel

1:30pm CEST

New Paths for Immersive Music Streaming: Channel-based and High Resolution

Thursday May 28, 2026 1:30pm - 3:00pm CEST

Aud 49

Streaming of immersive audio is known to western audiences
almost exclusively in the object-based format, Atmos,
developed by Dolby and employing lossy codecs to limit bit
rates. Other object-based formats like Sony 360 have had
limited success, and until recently there were no
channel-based streamed versions. But this situation is changing,
as it has already done in Japan.

Responding to growing interest in very high quality
immersive music for both on-demand streaming and live
broadcast, two new services are now active that offer,
first, channel-based audio and, second, audio streamed in
high-resolution PCM. Binaural mixes, a range of PCM formats and
video are variously included, with extensions to portables,
loudspeakers, and home theater.

This workshop provides a forum for discussion of both the
genuine promise and the challenges in these new
initiatives. Included are the advantages of high
resolution over lossy; channel-based versus object-based;
the degree of adoption of transducers for multichannel;
adaptive bit rates; data sources; and the Japanese
approach; amongst others.
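The bandwidth gap between lossy and high-resolution PCM streaming is easy to estimate. Uncompressed PCM needs channels × bit depth × sample rate bits per second. The sketch below works this out at 24 bit / 96 kHz for a few layouts, including a hypothetical 12-channel (7.1.4) bed. The layouts and formats are illustrative assumptions, not the configurations offered by the services discussed.

```python
def pcm_bitrate_mbps(channels: int, bit_depth: int, sample_rate_hz: int) -> float:
    """Uncompressed linear PCM payload rate in Mbit/s (no container or FEC overhead)."""
    return channels * bit_depth * sample_rate_hz / 1e6

for label, ch in (("stereo", 2), ("5.1", 6), ("7.1.4", 12)):
    print(f"{label:>6}: {pcm_bitrate_mbps(ch, 24, 96_000):.3f} Mbit/s at 24 bit / 96 kHz")
```

For 7.1.4 this comes to about 27.6 Mbit/s before any lossless packing. That figure makes it clear why adaptive bit rates appear on the workshop agenda.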

Speakers

Kimio Hamasaki

President, Artsridge LLC

Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Stefan Bock

Managing Director, msm-studios GmbH

Bert van Daele

CTO, Goer Dynamics BV

Bert Van Daele is CTO at NewAuro.
After graduating as an Engineer in Digital Electronics in 1997, he started out as an electronics designer at Philips Electronics, mainly working on digital products related to Surround Sound.
During a sabbatical leave, he worked at the Galaxy Studi...

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)

Vicki Melchior

Chair, AES Technical Committee - HRA; also: Independent Consultant, Audio DSP and Software

Thursday May 28, 2026 1:30pm - 3:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Panel | Audio Processing, Panel | Immersive Audio, Panel | Recording Production and Reproduction, Panel

Presentation Type Panel

1:30pm CEST

Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

We present Binaspect, an open-source Python library for
binaural audio analysis, visualization, and feature
generation. Binaspect generates interpretable “azimuth
maps” by calculating modified interaural time and level
difference spectrograms and clustering those
time-frequency (TF) bins into stable time-azimuth histogram
representations. This allows multiple active sources to
appear as distinct azimuthal clusters, while degradations
manifest as broadened, diffused, or shifted distributions.
Crucially, Binaspect operates blindly on audio, requiring
no prior knowledge of head models. These visualizations
enable researchers and engineers to observe how binaural
cues are degraded by codec and renderer design choices,
among other downstream processes. We demonstrate the tool
on bitrate ladders, ambisonic rendering, and VBAP source
positioning, where degradations are clearly revealed. In
addition to their diagnostic value, the proposed
representations can be exported as structured features
suitable for training machine learning models in quality
prediction, spatial audio classification, and other
binaural tasks. Binaspect is released under an open-source
license with full reproducibility scripts at: (link removed
for blind review)
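A simplified version of the azimuth-map idea can be sketched with level differences alone. Compute an interaural level difference (ILD) for every time-frequency bin, map it to an azimuth estimate, and histogram the bins frame by frame. This is not Binaspect's API. It omits the interaural time difference cue and the clustering step, and the linear ILD-to-azimuth mapping (`ild_span_db`) is a crude hypothetical stand-in for a proper cue model.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=512):
    """Magnitude STFT with a Hann window; frames in rows."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def ild_azimuth_map(left, right, n_bins=37, ild_span_db=20.0, eps=1e-10):
    """Per-frame histogram of azimuth estimates derived from TF-bin ILDs.

    Returns (hist, edges): hist has shape (n_frames, n_bins) over -90..+90 deg,
    with positive azimuth toward the left ear.
    """
    L, R = stft_mag(left), stft_mag(right)
    ild = 20.0 * np.log10((L + eps) / (R + eps))
    azimuth = 90.0 * np.clip(ild / ild_span_db, -1.0, 1.0)  # hypothetical mapping
    weight = L ** 2 + R ** 2  # energy weighting so quiet bins do not dominate
    edges = np.linspace(-90.0, 90.0, n_bins + 1)
    hist = np.stack([np.histogram(a, bins=edges, weights=w)[0]
                     for a, w in zip(azimuth, weight)])
    return hist, edges
```

A single level-panned source shows up as one narrow ridge in `hist`. A spread or shifted ridge after coding or rendering is the kind of degradation the abstract describes.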

Authors

Alessandro Ragano

University College Dublin

Andrew Hines

Dan Barry

University College Dublin

Davoud Shariat Panah

University College Dublin

Jan Skoglund

Google

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Processing, Poster | Immersive Audio, Poster | Perception, Poster

Presentation Type Poster

1:30pm CEST

Lightweight Real-time Spatial Audio Interpolation for Standalone VR using Hand Claps

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

Realistic spatial audio consistent with visual information
is essential for providing a high degree of immersion in
Augmented Reality (AR) environments. However, conventional
high-precision real-time acoustic simulations require
significant computational power, limiting their
implementation on standalone mobile VR devices such as the
Meta Quest. This study proposes a practical method to
enhance reverb realism using only a standalone VR HMD,
without the need for additional external equipment. By
measuring impulse responses with a few hand claps in the
physical space, we interpolate room acoustic parameters
(specifically RT60 and early/late energy ratios) to reflect
the environment's unique characteristics.
These extracted parameters are then applied to the VR
engine's built-in reverb effects, enabling dynamic,
location-aware real-time rendering with minimal
computational load. The proposed method demonstrates that a
brief calibration period of 3 to 5 minutes yields
significantly improved realism compared to static reverb
templates, offering an efficient and practical spatial
audio solution for mobile AR environments.
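The abstract does not say how RT60 and the early/late energy ratios are computed from the clap recordings. A common approach, assumed here rather than taken from the paper, is Schroeder backward integration with a line fit over the -5 to -25 dB range (T20), plus a clarity ratio such as C50. The function names and the synthetic decaying-noise signal below are hypothetical; a real clap recording would also need band filtering and noise-floor handling.

```python
import numpy as np

def schroeder_edc_db(ir):
    """Schroeder backward-integrated energy decay curve, normalized to 0 dB."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0] + 1e-300)

def estimate_rt60(ir, fs, lo_db=-5.0, hi_db=-25.0):
    """RT60 extrapolated from a linear fit of the EDC between lo_db and hi_db (T20)."""
    onset = int(np.argmax(np.abs(ir)))          # assume the clap's direct sound is the peak
    edc = schroeder_edc_db(ir[onset:])
    t = np.arange(edc.size) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    slope, _ = np.polyfit(t[mask], edc[mask], 1)  # decay rate in dB per second
    return -60.0 / slope

def clarity_db(ir, fs, split_ms=50.0):
    """Early-to-late energy ratio (C50 by default) in dB, measured from the onset."""
    onset = int(np.argmax(np.abs(ir)))
    energy = np.asarray(ir[onset:], dtype=float) ** 2
    split = int(round(split_ms * 1e-3 * fs))
    return 10.0 * np.log10(energy[:split].sum() / energy[split:].sum())

# Synthetic stand-in for a clap response: noise whose energy decays 60 dB in 0.5 s.
fs = 16000
t = np.arange(int(1.5 * fs)) / fs
ir = np.random.default_rng(0).standard_normal(t.size) * np.exp(-3.0 * np.log(10.0) * t / 0.5)
print(estimate_rt60(ir, fs), clarity_db(ir, fs))
```

Fitting a limited dB range rather than the whole decay keeps the estimate away from the direct sound at the top and the noise floor at the bottom, which matters for short, noisy clap recordings.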

Authors

Minsu Kim

Seoul National University

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Audio Processing, Poster | Immersive Audio, Poster

Presentation Type Poster

1:30pm CEST

Perceptual Evaluation of the MPEG-I Immersive Audio Standard

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

The recently finalized ISO international standard (IS) on
MPEG-I immersive audio enables interactive
six-degrees-of-freedom (6DoF) audio rendering for a
multitude of virtual-reality and augmented-reality (VR/AR)
acoustic scenarios and applications, with comprehensive
modeling of room acoustics and intricate acoustic
phenomena, including occlusion, reflection, transmission
and diffraction caused by sound obstacles, the Doppler
effect, and dynamic environment changes triggered by user
interactivity. This paper describes the concept,
methodology and results of the final verification test of
this standard. In the verification test, the perceptual
quality of the renderer was assessed in an interactive
listening test using different indoor and outdoor acoustic
scenes, testing the above-mentioned features of the
standard. More than 50 listeners, distributed across six
labs, participated in the test, which used the ITU-R
BS.2132 [1] multi-stimulus method on a 100-point scale for
three conditions (IS, mid anchor and low anchor) in 10 VR
scenes plus two repetitions. The results of several anchor
processing configurations are presented. The selected mid
and low anchors have demonstrated stable quality across
diverse scenes with progressive timbre and spatial
degradations. The listening test results show a clear
separation of the conditions (IS > mid > low), and the low
anchor was stable (median around 16 points) while the mid
anchor varied by scene (around 47 points). The IS received
a median rating of 84 points across all labs, which lies in
the “excellent” region of the scale. The individual scenes
were rated differently, and for some scenes the
interquartile range reaches 20 points. The median IS rating
also varied between labs, some being somewhat more critical
than others.

Authors

Alejandro Restrepo Garcia

Fraunhofer IIS

Andreas Silzle

Fraunhofer IIS

Germany

Erlendur Karlsson

Ericsson

Hiroyuki Ehara

Panasonic

Jussi Leppanen

Nokia

Leon Terentiv

Dolby

Germany

Pablo Delgado

Fraunhofer IIS

Erlangen, DE

Sam Jelfs

Philips

Sascha Disch

Fraunhofer IIS

Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Perception, Poster

Presentation Type Poster

1:30pm CEST

Can the individual winner HRTFs be determined in a shooting task during onboarding for an Audio Only VR?

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

The significance of individual versus generic HRTFs in
virtual audio can be difficult to ascertain given the
variety of scenarios and tasks related to the spatial
listening experience. Are we working on the most
significant 80% of the success or fine-tuning the last 5%
of the sound quality? When VR users are blind, it is
fair to assume that the quality of the spatial audio
becomes a critical and more important factor. This is the
challenge as we see it. In the present project, we will
investigate options for powerful game components relying on
spatialized sound, using effects that are natural for the
blind gamer. As a first step, we have implemented a test
platform offering different HRTF options, in which the
onboarding process is intended to reveal the optimal
solution for the given user. The test scenario is inspired
by a “classical” scenario of shooting down sound sources,
in which we will vary e.g. the task definition and the
success criteria (hit zone, number of attempts and elapsed
time), as well as monitor internal game parameters of a
more complex nature (e.g. navigation trajectories). The
results will show the variation among normally sighted
listeners and produce normative data for later comparison
with blind participants. The platform also includes options
for simple mirror-image room models and standardized
reverberation, which will be used in later tests to learn
whether room acoustics may play a stronger role for the
blind gamers’ navigation and source identification than for
normally sighted listeners.

Authors

Dorte Hammershøi

Professor, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University

Flemming Christensen

Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University

Max Væhrens

PhD Fellow, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalbor...

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Poster

Presentation Type Poster

1:30pm CEST

Exploiting Source Directivity for Robust Asymmetric Crosstalk Cancellation

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

This study investigates the relationship between the
robustness of crosstalk cancellation and the symmetry of
the system configuration. Analytical results show that, when
the positions of the sound sources are fixed, increasing
asymmetry caused by deviations in the listener’s head
position or orientation leads to a reduction in system
robustness, whereas optimal performance is consistently
achieved in symmetric layouts. For asymmetric
configurations, we propose a method to optimize the axial
angles of the sound sources. This method leverages source
directivity patterns to adjust level differences along the
acoustic propagation paths, thereby improving system
robustness. Experiments confirm the effectiveness of the
proposed method in asymmetric crosstalk cancellation
systems, demonstrating enhanced robustness and yielding
higher binaural channel separation under slight listener
head movements.
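One way to make the robustness argument concrete is to build the 2x2 acoustic plant matrix between two loudspeakers and two ears and inspect its condition number, since a poorly conditioned plant needs large filter gains and is sensitive to small head movements. The sketch below is a free-field point-source model with a hypothetical cardioid directivity and Tikhonov-regularized inversion; it is not the authors' model, and `plant_matrix`, `cardioid_gain`, `ctc_filters` and the geometry are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def cardioid_gain(axis_deg, direction):
    """Hypothetical first-order cardioid directivity (1 + cos theta) / 2 in the horizontal plane."""
    axis = np.array([np.cos(np.radians(axis_deg)), np.sin(np.radians(axis_deg))])
    unit = direction / np.linalg.norm(direction)
    return 0.5 * (1.0 + float(axis @ unit))

def plant_matrix(freq_hz, src_pos, src_axes_deg, ear_pos):
    """Free-field plant H[ear, source]: directivity x propagation delay x 1/r spreading."""
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    H = np.zeros((len(ear_pos), len(src_pos)), dtype=complex)
    for m, ear in enumerate(ear_pos):
        for n, (src, axis) in enumerate(zip(src_pos, src_axes_deg)):
            path = ear - src
            r = np.linalg.norm(path)
            H[m, n] = cardioid_gain(axis, path) * np.exp(-1j * k * r) / r
    return H

def ctc_filters(H, beta=1e-3):
    """Tikhonov-regularized inverse C = (H^H H + beta I)^-1 H^H."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H + beta * np.eye(H.shape[1]), Hh)

# Listener at the origin facing +y; loudspeakers at +/-30 degrees, 2 m away.
ears = np.array([[-0.0875, 0.0], [0.0875, 0.0]])
speakers = np.array([[-1.0, np.sqrt(3.0)], [1.0, np.sqrt(3.0)]])
axes_deg = [-60.0, -120.0]  # both loudspeakers aimed at the listener

H_sym = plant_matrix(1000.0, speakers, axes_deg, ears)
H_shift = plant_matrix(1000.0, speakers, axes_deg, ears + np.array([0.1, 0.0]))  # head moved 10 cm
print(np.linalg.cond(H_sym), np.linalg.cond(H_shift))
```

Rotating the entries of `axes_deg` changes the relative gains of the ipsilateral and contralateral paths; sweeping those angles and comparing condition numbers is one simple way to explore the directivity lever the abstract describes.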

Authors

Jianbin Yang

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Keyu Pan

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Ning Cong

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Xing Tian

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Poster

Presentation Type Poster

1:30pm CEST

Capturing Immersive Sound in Concert Halls: A Comparative Analysis of PCMA-3D and Decca Cuboid Recording Techniques

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A

This paper presents a comparative analysis of two immersive
recording techniques for classical music: the PCMA-3D
(Perspective Control Microphone Array) and the Decca
Cuboid. While the Decca Cuboid relies primarily on
time-of-arrival differences to generate spatial
impressions, the PCMA-3D utilises intensity differences and
separates ambience from direct sound. A recording session
was conducted in a concert hall using a classical guitar
soloist and two distinct folk music ensembles to capture
performances simultaneously with both arrays. Subjective
evaluation was performed using a MUSHRA listening test with
18 participants, assessing parameters such as sensation of
space, localisation precision, and sound quality.
Statistical analysis reveals that while both systems
provide high-quality immersive experiences, the PCMA-3D
scored significantly higher in the sensation of space (p

Authors

Zechen Wang

University of York

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Poster | Acoustics of Music Rooms, Poster | Perception, Poster | Recording Production and Reproduction, Poster

Presentation Type Poster

1:30pm CEST

Poster Session 1

Thursday May 28, 2026 1:30pm - 3:30pm CEST

Foyer Building 303A Posters

- Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation

- Lightweight Real-time Spatial Audio Interpolation for Standalone VR using Hand Claps

- Perceptual Evaluation of the MPEG-I Immersive Audio Standard

- Can the individual winner HRTFs be determined in a shooting task during onboarding for an Audio Only VR?

- Exploiting Source Directivity for Robust Asymmetric Crosstalk Cancellation

- Capturing Immersive Sound in Concert Halls: A Comparative Analysis of PCMA-3D and Decca Cuboid Recording Techniques

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Posters Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

POSTER SESSIONS

2:00pm CEST

Personalized VR for hearing research with embedded devices

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 42

Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded artificial intelligence hardware,
and hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.

Authors

Romain Michon

INRIA

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

2:00pm CEST

The Perception and Measurement of Nonlinear Distortion in Headphones

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 44

Few studies exist on the perception and measurement of
nonlinear distortion in headphones. This paper reports the
detection thresholds and perceived sound quality of real
distortion in headphones. Five different distortion
measurements were made on the headphones to determine how
well they predict audibility and quality. Music samples
were binaurally recorded on six headphones at playback
levels ranging from 85 to +110 dBA in 3 dB increments. The
recordings were reproduced at a normal playback level (83
dBA) through a reference headphone with low distortion. The
headphone recordings were post-processed to remove both
level and frequency response differences so that only
nonlinear distortions and residual noise remained. In a
second test, listeners rated the similarity in quality of
headphones relative to an undistorted reference and a
hidden version of it. The results provide evidence that
audible distortion in headphones with music occurs at
significantly higher playback levels (104 to 112 dBA SPL)
than what is considered typical and safe. The percentage of
measured THD in the headphone had the highest correlation
with the detection thresholds, while the non-coherent
distortion with music best predicted the similarity
ratings. We discuss the results and the practical
implications they might have for future headphone design,
testing and measurement.
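As a reference point for the THD metric mentioned above, the sketch below computes THD as a percentage from the harmonic bins of a single-tone measurement. The stimulus, the number of harmonics and the cubic nonlinearity standing in for a headphone are assumptions; the abstract does not specify how the five distortion measurements were made, and non-coherent distortion with music is not covered here.

```python
import numpy as np

def thd_percent(signal, fs, f0, n_harmonics=5):
    """THD (%) = sqrt(sum of harmonic powers) / fundamental amplitude * 100.

    Assumes the record holds an integer number of periods of f0 so that
    each harmonic falls exactly on an FFT bin (no window needed).
    """
    spectrum = np.abs(np.fft.rfft(signal))
    bin_hz = fs / len(signal)
    fundamental = spectrum[int(round(f0 / bin_hz))]
    harmonics = [spectrum[int(round(h * f0 / bin_hz))] for h in range(2, n_harmonics + 1)]
    return 100.0 * np.sqrt(np.sum(np.square(harmonics))) / fundamental

fs, f0 = 48000, 1000.0
t = np.arange(fs) / fs                   # one second gives 1 Hz bin spacing
x = np.sin(2 * np.pi * f0 * t)
y = x + 0.1 * x ** 3                     # hypothetical memoryless cubic nonlinearity
print(round(thd_percent(y, fs, f0), 3))  # 0.025 / 1.075, i.e. about 2.326 %
```

For this cubic term the third harmonic has amplitude 0.025 and the fundamental grows to 1.075, which is why the result is about 2.33 % rather than 2.5 %.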

Authors

Pierre-Emmanuel Lelièvre

Rtings

Sean Olive

Audio Consultant, Sean Olive Audio Consulting

United States

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture | Perception, Lecture

Presentation Type Lecture

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation

Thursday May 28, 2026 2:00pm - 2:30pm CEST

Aud 43

This work presents a perceptual model based on a complex
IIR filterbank. The filterbank, with a frequency resolution
of 4 bands per Bark, consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are converted into masking thresholds
by the following post-processing. To obtain reasonable
masking thresholds from the spreading outputs, a
postmasking stage is required. Here, we propose a
comodulation-dependent adaptation of the postmasking decay
to model Comodulation Masking Release (CMR) effects. This
approach explicitly considers the dip-listening effect
known from the literature. The final masking thresholds are
obtained by weighting the postmasking outputs with a
tonality-dependent gain, controlled using spectral flatness
estimation. A listening test compares the proposed method
to an existing approach that directly modifies the masking
threshold gains based on CMR.
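Spectral flatness, the ratio of the geometric to the arithmetic mean of the power spectrum, is a standard tonality cue: it is near 0 for a pure tone and considerably higher for noise. The sketch below computes it and maps it to a masking-threshold offset. The linear mapping and the -18 dB and -6 dB endpoints in `tonality_gain_db` are hypothetical placeholders, not the values used in the paper.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (near 0 = tonal, higher = noise-like)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def tonality_gain_db(flatness, tonal_db=-18.0, noise_db=-6.0):
    """Hypothetical mapping: interpolate a masking-threshold offset between a
    tonal masker (flatness 0) and a noise-like masker (flatness 1)."""
    f = min(max(flatness, 0.0), 1.0)
    return tonal_db + f * (noise_db - tonal_db)

n = np.arange(1024)
tone = np.sin(2 * np.pi * 64 * n / 1024)              # falls exactly on bin 64
noise = np.random.default_rng(0).standard_normal(1024)
print(spectral_flatness(tone), spectral_flatness(noise))
print(tonality_gain_db(spectral_flatness(tone)), tonality_gain_db(spectral_flatness(noise)))
```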

Authors

Bernd Edler

International Audio Laboratories Erlangen, Germany

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany

Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

2:00pm CEST

Florian Camerer: 3D Masterclass

Thursday May 28, 2026 2:00pm - 3:00pm CEST

Aud 31

Florian details the design of his brilliant and durable
Double-Ufix 3D mic array, capable of high resolution
outdoor recording. Attendees are treated to memorable
listening examples from natural and rural environments in
Austria and the Nordics.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best and to
discuss the qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, also detailing
this specific listening room with regard to ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.

Speakers

Florian Camerer

Senior Sound Engineer, ORF

Thomas Lund

Genelec Oy

Denmark

Thursday May 28, 2026 2:00pm - 3:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Masterclass | Perception, Masterclass | Recording Production and Reproduction, Masterclass | Sound Design, Masterclass

Presentation Type Masterclass

2:00pm CEST

Student Recording Competition Category 2: Studio Recording

Thursday May 28, 2026 2:00pm - 3:00pm CEST

Building 302, 2nd floor

Join us to hear the finalists selected for this category of
the Student Recording Competition. We will hear their
presentations and recordings, along with comments and
feedback from the judges. Award and prize placements will be
announced on the last day of the convention.

Speakers

Magdalena Piotrowska

Margaret Luthar

Dark Sky Mastering

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC

Kia Eshghi

CUNY LaGuardia Community College

New York City

Authors

Szymon Zaporowski

Gdańsk University of Technology

Thursday May 28, 2026 2:00pm - 3:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

Recording Production and Reproduction, Special Event

Presentation Type Special Event

2:30pm CEST

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 42

Sound source localization and identity tracking are
fundamental tasks in acoustic scene analysis, enabling
machines to determine what produces sound events, and where
and when they occur. While deep attractor-based networks have
demonstrated improved performance under an unknown number
of sources, maintaining continuous source tracking over
long-form audio remains challenging due to memory
limitations and permutation ambiguities across adjacent
segments. In this paper, we propose a Recursive Attractor
Network (RANet) for long-form sound source localization and
identity tracking with a variable number of sources. RANet
explicitly represents source attractors as transferable
embeddings and recursively propagates them across adjacent
audio segments using an LSTM-based model, thereby preserving
source identity continuity over time. Experimental results
on simulated datasets demonstrate that RANet achieves
robust long-form sound source localization and consistent
source identity tracking, outperforming baseline approaches
under variable and dynamic source conditions.

Authors

Jiaqi Du

Peking University

Tianshu Qu

Peking University

Xihong Wu

Peking University

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

2:30pm CEST

Optical MEMS microphones leverage architectural advantages to achieve 80dB SNR

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 44

There are three architectural approaches to
microelectromechanical systems (MEMS) microphones,
miniature devices used in a wide range of products.
Capacitive MEMS microphones are embedded in billions of
consumer electronics devices. Solder-compatible and offering
tight part-to-part sensitivity matching in a small
footprint, capacitive MEMS microphones have improved in
performance in recent years. State-of-the-art digital
capacitive MEMS microphones can now achieve up to 72 dB
signal-to-noise ratio (SNR), with a 22 dBA noise floor and
an overall dynamic range on the order of 106 dB.

However, capacitive MEMS microphone technology has now
reached the limits of its architecture, which constrains
the key audio performance metrics: SNR and acoustic
overload point (AOP).

Piezoelectric MEMS microphones have not demonstrated SNR
performance exceeding 65 dB, and they require new materials
to be developed to increase their performance.
Optical MEMS microphones, a new architectural approach that
combines a laser optical subsystem, a MEMS and advanced
CMOS circuit design, have exceeded the limits of capacitive
technology. With 80 dB SNR, a 14 dBA noise floor, 132 dB
dynamic range and a 146 dB AOP, optical MEMS microphones
achieve studio-quality performance in a tiny form factor
that supports semiconductor-level yields in high-volume
manufacturing.

This presentation will explain the architectural
advancements of optical MEMS microphones in comparison to
capacitive MEMS microphones. It will provide example use
cases of high-SNR and high-AOP microphones in high-volume
applications.
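The microphone figures quoted above are mutually consistent under the usual datasheet convention, assumed here, that SNR is referenced to 94 dB SPL (1 Pa at 1 kHz) and that dynamic range is the acoustic overload point minus the noise floor. Under that convention the capacitive example implies an AOP of roughly 128 dB SPL, a value inferred here and not stated in the abstract.

```python
REFERENCE_SPL_DB = 94.0  # 1 Pa at 1 kHz, the usual reference for microphone SNR

def snr_db(noise_floor_dba):
    """SNR as commonly quoted for MEMS microphones: 94 dB SPL reference minus A-weighted noise floor."""
    return REFERENCE_SPL_DB - noise_floor_dba

def dynamic_range_db(aop_db_spl, noise_floor_dba):
    """Dynamic range taken as acoustic overload point minus noise floor."""
    return aop_db_spl - noise_floor_dba

print(snr_db(22.0), snr_db(14.0))     # capacitive vs optical: 72.0 80.0
print(dynamic_range_db(146.0, 14.0))  # optical: 132.0
print(22.0 + 106.0)                   # AOP implied by the capacitive example: 128.0
```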

Authors

Jakob Vennerød

sensiBel

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture

Presentation Type Lecture

2:30pm CEST

EMORSION – Examining the Impact of Audio Features on Emotional Responses and Immersion in Film.

Thursday May 28, 2026 2:30pm - 3:00pm CEST

Aud 43

EMORSION is an exploratory study examining how film audio
design shapes audience emotion and immersion. It was
conducted using scenes from four films in the horror (2)
and drama (2) genres, with two mainstream and two
independent productions. For each scene, multiple
alternative audio mixes were created by systematically
manipulating three core aspects of audio design: frequency
(pitch), dynamics (loudness), and directionality (spatial
placement). Three audience groups were exposed to the
scenes in a cinema setting, with each group experiencing
one manipulated audio mix and a control mix.
Audience responses were assessed through a multimodal
framework combining self-reported emotion and immersion via
a questionnaire with physiological measures, including
heart rate monitoring and video-based motion tracking.
Results show that subtle changes in audio design
significantly affect emotional perception and immersion.
Unconventional mixes produced greater variability in
interpretation, while conventional immersive mixes led to
stronger agreement across audiences. Notably, participants
often reported perceived visual changes despite no
alterations to the visual content.

Authors

Bleiz Macsen Del Sette

Charalampos Saitis

Queen Mary University of London

George Fazekas

Queen Mary University of London

Josh Reiss

Professor, Queen Mary University of London

Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...

Nelly Garcia

PhD Researcher, Queen Mary University of London

I'm Nelly Garcia.
I'm an engineer in communications and electronics, specializing in acoustics.
Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London.
My main interests are sound design, creating sounds from scratch, optimizing the workflow of a sound designer and innovative ways to label, categorise or access samples...

Ruby Crocker

Queen Mary University of London

Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture | Sound Design, Lecture

Presentation Type Lecture

2:30pm CEST

Be A Leader!

Thursday May 28, 2026 2:30pm - 3:30pm CEST

Aud 41

Have you ever wondered how AES works? Let's meet up and
talk about the benefits of volunteering and the path to
leadership in AES! You could be our next Chair, Vice
President, or even AES President!

Speakers

Jan Abildgaard Pedersen

Convention Chair, Audio Engineering Society

Cesar Lamschtein

President Elect, Audio Engineering Society

Agnieszka Roginska

Professor of Music Technology, New York University

Lars Tirsbæk

Head of Sonic Days, Sonic College

Ewa Łukasik

Poznan University of Technology, Institute of Computing Science

Brecht De Man

Head of Research, AES President

Thursday May 28, 2026 2:30pm - 3:30pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Special Events, Panel

Presentation Type Panel

3:00pm CEST

Sound Absorber Estimation with Deep Neural Network

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 42

Boundary conditions are a critical part of room acoustic
simulations. In the case of ray tracing, absorption
coefficients of nearly all materials are measured and
available. However, wave-based simulations face several
issues. The first one is the variety of boundary conditions
used. Depending on the method, surface impedance or
admittance might be needed, either in the frequency or in
the time domain, as an angle-dependent or averaged
variable. This variety hinders the development of a
standard measured quantity for boundary conditions in
wave-based simulations. In turn, this leads to the second
issue encountered, which is the lack of widely available
data to describe the characteristics of the different
materials commonly found in rooms. In this study, a deep
neural network has been trained to estimate the material
properties of porous absorbers from their absorption
coefficient in octave bands. These estimated material
properties can then be used to calculate any boundary
condition needed. This method thus makes it possible to characterize
the boundary conditions for any type of room acoustic
simulation from the most commonly available data. Moreover,
it provides a new tool to identify the sound absorber
corresponding to a desired absorption profile during the
design phase of a project. The training dataset in this
study was generated from finite element method simulations.
The poroelastic properties of the material, the sample
thickness, as well as the depth of the air cavity backing
the material were varied to create the training dataset.
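To illustrate how estimated material properties can be turned into a boundary condition, the sketch below uses the empirical Delany-Bazley model for a rigid-frame porous layer, optionally backed by an air cavity, and computes the normal-incidence surface impedance and absorption coefficient per octave band. This is a deliberately simpler substitute for the poroelastic FEM model used in the paper; the flow resistivity and thickness values are hypothetical, and Delany-Bazley is only intended for roughly 0.01 < rho0 f / sigma < 1.

```python
import numpy as np

RHO0, C0 = 1.21, 343.0  # air density (kg/m^3) and speed of sound (m/s)

def delany_bazley(freq_hz, flow_resistivity):
    """Characteristic impedance Zc and wavenumber kc of a porous layer
    (Delany-Bazley empirical model, e^{+j omega t} time convention)."""
    X = RHO0 * freq_hz / flow_resistivity
    zc = RHO0 * C0 * (1 + 0.0571 * X ** -0.754 - 1j * 0.087 * X ** -0.732)
    kc = (2 * np.pi * freq_hz / C0) * (1 + 0.0978 * X ** -0.700 - 1j * 0.189 * X ** -0.595)
    return zc, kc

def surface_impedance(freq_hz, flow_resistivity, thickness, cavity_depth=0.0):
    """Normal-incidence surface impedance of a porous layer over an air cavity,
    or directly on a rigid wall when cavity_depth == 0."""
    zc, kc = delany_bazley(freq_hz, flow_resistivity)
    t = np.tan(kc * thickness)
    if cavity_depth == 0.0:
        return -1j * zc / t  # rigid backing: -j Zc cot(kc d)
    k0 = 2 * np.pi * freq_hz / C0
    zb = -1j * RHO0 * C0 / np.tan(k0 * cavity_depth)  # air gap in front of a rigid wall
    return zc * (zb + 1j * zc * t) / (zc + 1j * zb * t)  # impedance translation through the layer

def absorption_coefficient(zs):
    """Normal-incidence absorption from surface impedance: 1 - |R|^2."""
    r = (zs - RHO0 * C0) / (zs + RHO0 * C0)
    return 1.0 - np.abs(r) ** 2

octave_bands = np.array([125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0])
alpha = absorption_coefficient(surface_impedance(octave_bands, 10000.0, 0.05))
print(np.round(alpha, 2))
```

The surface impedance `zs` is the quantity a frequency-domain wave solver would take as its boundary condition; a time-domain solver would first need it fitted to a realizable form.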

Authors

Boris Mondet

COMSOL A/S

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Lecture

Presentation Type Lecture

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 44

Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which
deep-learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the
framework provides a scalable, perceptually grounded
approach to interpreting listener-preference data when
developing headphone target curves.
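The convergence measure described above can be illustrated with a minimal interactive differential evolution loop (DE/rand/1/bin) in which selection is a paired comparison rather than a numeric fitness. The simulated listener, the five-band gain parameterization and all parameter values are hypothetical; the paper's second-stage absolute-rating task and the VLP analysis are not modeled.

```python
import numpy as np

def population_variance(pop):
    """Mean per-band variance across the population (dB^2), used to track convergence."""
    return float(np.mean(np.var(pop, axis=0)))

def interactive_de(prefers, n_bands=5, pop_size=10, generations=40,
                   f_weight=0.5, crossover=0.7, gain_range_db=6.0, seed=0):
    """DE/rand/1/bin where selection is a paired comparison: prefers(a, b) is True if a wins."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-gain_range_db, gain_range_db, size=(pop_size, n_bands))
    history = [population_variance(pop)]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + f_weight * (pop[r2] - pop[r3])
            mask = rng.random(n_bands) < crossover
            mask[rng.integers(n_bands)] = True  # take at least one band from the mutant
            trial = np.clip(np.where(mask, mutant, pop[i]), -gain_range_db, gain_range_db)
            if prefers(trial, pop[i]):          # the paired comparison replaces a fitness value
                pop[i] = trial
        history.append(population_variance(pop))
    return pop, history

# Hypothetical simulated listener who prefers the curve closer to a hidden target.
hidden_target = np.array([3.0, 1.0, 0.0, -1.0, 2.0])

def simulated_listener(a, b):
    return np.sum((a - hidden_target) ** 2) < np.sum((b - hidden_target) ** 2)

final_pop, var_history = interactive_de(simulated_listener)
print(var_history[0], var_history[-1])
```

With a real listener, `prefers` would be the outcome of each A/B trial, so the number of comparisons (population size times generations) directly sets the session length.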

Authors

Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology

▪ Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...

Julian Villegas

University of Aizu

Japan

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study

Thursday May 28, 2026 3:00pm - 3:30pm CEST

Aud 43

Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions (including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration), none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber, and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
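The QDI definition above can be sketched directly from an FFT. Whether the two bands are summed or treated separately is not specified in the abstract; this sketch sums them. To evaluate QDI as a function of direction, it would be applied to each beam steered from the ambisonic recording (or to each microphone position in the field setting); that beamforming step is not shown, and the function name and test tones are hypothetical.

```python
import numpy as np

QUINTINA_BANDS_HZ = ((600.0, 750.0), (1200.0, 1400.0))

def quintina_directionality_index(signal, fs, bands=QUINTINA_BANDS_HZ):
    """Ratio of spectral energy inside the sa quintina bands to total spectral energy."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = np.zeros(freqs.shape, dtype=bool)
    for lo, hi in bands:
        in_band |= (freqs >= lo) & (freqs <= hi)
    return float(power[in_band].sum() / power.sum())

fs = 48000
t = np.arange(fs) / fs
in_band_tones = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1300 * t)
low_only = np.sin(2 * np.pi * 200 * t)
print(quintina_directionality_index(in_band_tones, fs), quintina_directionality_index(low_only, fs))
```

Because QDI is a ratio, it is insensitive to overall level, so values from different directions or microphone positions can be compared without gain matching.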

Authors

Felicita Brusoni

PhD candidate Musikhögskolan i Malmö, Lund University

Luca Frigo

Conservatorio G. Nicolini Piacenza

Martino Sarolli

Conservatorio Paganini Genova

Riccardo Dapelo

Conservatorio Nicolini Piacenza

Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Processing, Lecture | Cross-Disciplinary Sound Studies, Lecture | Perception, Lecture

Presentation Type Lecture

3:00pm CEST

Jim and Ulrike Anderson: 3D Masterclass

Thursday May 28, 2026 3:00pm - 4:00pm CEST

Aud 31

Jim and Ulrike have been recording in and for immersive
audio for broadcast, film and audiophile staples for
decades. They specialize in turning traditional acoustic
New York Studio recordings into vast spatial experiences.
Attendees will experience the breathtaking
virtuosity of the likes of Jane Ira Bloom, the Secret Trio,
Donald Vega and large format ensembles under Franco
Ambrosetti and Jim Pugh.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best and to
discuss the qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, also detailing
this specific listening room with regard to ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.

Speakers

Jim Anderson

Professor, Anderson Audio New York

Jim has been the President of the AES Educational Foundation since 2020 and is a professor of recorded music with the Clive Davis Institute of Recorded Music in the Tisch School of the Arts at New York University. Jim was the Institute’s Chair from 2004 – 2008. A graduate of the...

Thomas Lund

Genelec Oy

Denmark

Ulrike Anderson

Anderson Audio New York

Thursday May 28, 2026 3:00pm - 4:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Masterclass | Immersive Audio, Masterclass | Perception, Masterclass | Recording Production and Reproduction, Masterclass

Presentation Type Masterclass

3:00pm CEST

TC-NAS: AES Technical Committee on "NETWORK AUDIO SYSTEMS"

Thursday May 28, 2026 3:00pm - 4:00pm CEST

Aud 93

AES Technical Committee on "NETWORK AUDIO SYSTEMS"

The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole. Connect and engage!

Speakers

Kevin Gross

Thursday May 28, 2026 3:00pm - 4:00pm CEST
Aud 93 Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

AES Technical Committee Meetings

3:00pm CEST

From Spec to Studio: Immersive Audio Creation and Delivery using Eclipsa Audio

Thursday May 28, 2026 3:00pm - 4:00pm CEST

Aud 49

Eclipsa Audio, based on the Immersive Audio Model and
Format (IAMF) specification developed by members of the
Alliance for Open Media, represents an open and
royalty-free approach to immersive audio creation and
delivery. Eclipsa Audio provides a growing ecosystem for
producing and distributing spatial audio content, with
hardware integration and streaming platform support,
including YouTube, actively being rolled out. This panel
brings together practitioners, researchers, and engineers
directly involved in the development of IAMF and Eclipsa
Audio to inform the audio engineering community about the
current state of the format and its evolving toolkit.
Presenters will provide an overview of the specification's
design principles, discuss the collaborative research and
development effort behind the Open Audio Renderer (OAR) and
Open Audio Codec (OAC), introduce the content creation
tools currently available within the Eclipsa Audio
ecosystem, and propose practical workflows for immersive
audio production and delivery. The session will include
presentations followed by an open discussion addressing
format interoperability, integration with existing
production environments, listener experience
considerations, and future directions for development.
Audience participation is encouraged.

Speakers

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

Toni Hirvonen

Researcher, Samsung Research America

Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects...

Jani Huoponen

Google LLC

With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives...

Tomasz Rudzki

University of York

Thursday May 28, 2026 3:00pm - 4:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Panel

Presentation Type Panel

3:00pm CEST

Student Recording Competition Category 1: Traditional Acoustic Recording

Thursday May 28, 2026 3:00pm - 4:00pm CEST

Building 302, 2nd floor

Speakers

Richard King

McGill University

Montreal

Kseniya Kawko

Tonmeister, msm studios

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC

Kia Eshghi

CUNY LaGuardia Community College

New York City

Thursday May 28, 2026 3:00pm - 4:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

Recording Production and Reproduction, Special Event

Presentation Type Special Event

3:30pm CEST

Center Extraction GAN

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 42

This paper presents a method for extracting a center signal
from two-channel stereo signals for upmixing and
reproduction with additional center loudspeakers.
It uses a generative adversarial network with a generator
trained with multiple reconstruction losses and adversarial
losses obtained from a discriminator.
The processing has low computational complexity, is causal
and can be configured for latencies down to one audio frame
of 46 ms length.
The paper describes how training data are created using only
publicly available signals, and how the generation of target
data enables control over the attenuation of diffuse signals
and of direct signals panned off-center.
An evaluation with a listening test and the computational
metrics SI-SDR and F2 measure is presented.
It shows an advantage compared to methods based on
classical signal processing in terms of computational
metrics for source separation and listener preference.
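SI-SDR, one of the metrics named above, takes only a few lines to compute. The sketch below follows the commonly used zero-mean, scale-invariant definition (optimal gain α = ⟨ŝ, s⟩ / ‖s‖²). It illustrates the metric only and is not the authors' evaluation code; the function name `si_sdr` and the test signals are hypothetical.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both signals are made zero-mean; the reference is scaled by the optimal
    gain alpha = <estimate, reference> / ||reference||^2, and whatever that
    scaled reference does not explain is counted as distortion.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps)
                           / (np.dot(distortion, distortion) + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    center = rng.standard_normal(48_000)   # hypothetical reference center signal
    noise = rng.standard_normal(48_000)    # hypothetical leakage from other sources
    print(si_sdr(center + 0.1 * noise, center))   # mild leakage
    print(si_sdr(center + 0.5 * noise, center))   # stronger leakage
```

Because of the optimal gain α, scaling the estimate by any nonzero factor leaves the score unchanged, so a level mismatch in the extracted center does not affect it.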

Authors

Andreas Walther

Fraunhofer IIS

Christian Uhle

Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS

Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen.
He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and...

Julian Klapp

Fraunhofer Institute for Integrated Circuits IIS

Pablo Panter

Fraunhofer Institute for Integrated Circuits IIS

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Processing, Lecture

Presentation Type Lecture

3:30pm CEST

Measurement Uncertainty of MEMS Microphone Sensitivity in a Free-Field Condition

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 44

This work presents a measurement uncertainty evaluation of
the free-field sensitivity of a MEMS microphone using a
substitution comparison method. The measurement setup is
based on principles used in secondary microphone
calibration, with sensitivity determined relative to a
calibrated reference microphone. The uncertainty analysis
follows the Guide to the Expression of Uncertainty in
Measurement (GUM), where Type A and Type B uncertainty
evaluations are propagated through a defined measurement
model to obtain the final measurement result. The MEMS
microphone sensitivity is estimated together with an
expanded uncertainty, where the calibration uncertainty of
the reference microphone is identified as the dominant
contributor. Broadband results show that the measured
sensitivity is close to the typical manufacturer
sensitivity over a wide frequency range and follows a
similar frequency trend. The proposed approach enables
reproducible estimation of the free-field sensitivity of
MEMS microphones and provides a clear framework for
uncertainty evaluation.
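The GUM workflow described above (Type A and Type B evaluations propagated through a measurement model, then expanded with a coverage factor) can be sketched as follows. The substitution model M_dut = M_ref · V_dut / V_ref, the use of voltage readings (that is, an analog MEMS output), the function names and all numbers are illustrative assumptions, not the paper's model or data.

```python
import math
from statistics import stdev

def type_a_uncertainty(readings):
    """Type A: experimental standard deviation of the mean of n repeated readings."""
    return stdev(readings) / math.sqrt(len(readings))

def type_b_rectangular(half_width):
    """Type B: standard uncertainty of a quantity known only to lie within +/- half_width."""
    return half_width / math.sqrt(3.0)

def combined_relative_uncertainty(relative_uncertainties):
    """Root-sum-of-squares combination; valid for a pure product/quotient model
    with uncorrelated inputs, where relative uncertainties add in quadrature."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))

def substitution_sensitivity(m_ref, v_dut, v_ref):
    """Hypothetical substitution model: M_dut = M_ref * V_dut / V_ref, assuming
    both microphones are exposed in turn to the same free-field sound pressure."""
    return m_ref * v_dut / v_ref


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    m_dut = substitution_sensitivity(m_ref=50.0e-3, v_dut=12.0e-3, v_ref=48.0e-3)
    u_rel = combined_relative_uncertainty([
        0.010,  # reference-microphone calibration (assumed dominant, as in the abstract)
        type_a_uncertainty([12.01e-3, 11.98e-3, 12.02e-3, 11.99e-3]) / 12.0e-3,
        type_b_rectangular(0.005),  # e.g. a +/-0.5 % positioning effect (assumed)
    ])
    expanded = 2.0 * u_rel * m_dut  # expanded uncertainty, coverage factor k = 2
    print(f"M_dut = {m_dut * 1e3:.2f} mV/Pa, U = {expanded * 1e3:.3f} mV/Pa (k = 2)")
```

With k = 2, the expanded uncertainty corresponds to a coverage probability of approximately 95 % when the combined distribution is close to normal.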

Authors

Salvador Barrera Figueroa

Danish Fundamental Metrology A/S, 2970 Hørsholm, Denmark

Teguh Aditanoyo

DTU Electrical and Photonics Engineering, Technical University of Denmark (DTU), 2800 Kgs. Lyngby, Denmark

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Lecture

Presentation Type Lecture

3:30pm CEST

NAVIQUAL: Creating Spatial Audio Quality Maps for Virtual Live Music Environments

Thursday May 28, 2026 3:30pm - 4:00pm CEST

Aud 43

Live music environments can be simulated and evaluated
through spatial audio and augmented reality (AR)
technology. However, conducting perceptual studies on AR
environments can be challenging, as multiple design
considerations and uncontrolled variables come into play.
Hence, we developed Naviqual, a tool to create a spatial
audio quality map for a virtual live music environment. We
generated objective quality contour and polar maps to
predict the quality of experience (QoE) across listener
locations and directions, respectively. We found that these
maps strongly aligned with perceptual evaluations by
normal-hearing listeners through listening tests. We also
found that binaural objective metrics and signal-to-noise
ratio both strongly predict QoE across listener
translations, with the former outperforming the latter in
predicting QoE across listener directions. Overall,
Naviqual provides a QoE map for virtual live music
environments robust across various listener locations and
directions, noise locations, music content, and room
acoustics.

Authors

Andrew Hines

Carl Timothy Tolentino

University College Dublin

Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Immersive Audio, Lecture | Perception, Lecture

Presentation Type Lecture

3:30pm CEST

Audio engineering music for listeners with hearing loss

Thursday May 28, 2026 3:30pm - 4:30pm CEST

Aud 41

Audio engineering often implicitly assumes a uniformity in
hearing across listeners; this is an assumption that does
not reflect real-world diversity. How could technologies
and practices in production, mixing, and reproduction be
adapted to create music that is more inclusive? While the
AES has a conference series on Audio and Music Induced
Hearing Disorders, this has focused on the causes of
hearing loss with little on audio engineering for listeners
who have a hearing loss.

In western countries, about one in three adults are deaf,
have hearing loss or suffer from tinnitus. Hearing loss can
lead to many challenges with music such as: inaudibility of
quieter passages, distortion, degraded pitch perception,
and difficulty in identifying and picking out lyrics and
instruments. The most common intervention for mild to
moderately severe hearing loss is hearing aids. But while
many of these devices have music programs, their efficacy
is mixed, to the point that many opt not to use them. With
the rise of machine learning within Audio Engineering,
there are opportunities to better personalise music, and
therefore address issues listeners face. Consumer devices
are also increasingly gaining audio accessibility features,
but their usefulness lacks independent testing. This
workshop will consider opportunities for
making music more accessible.

The workshop will start by exploring how hearing loss harms
the experience of listening to music and how this varies
between people. This will lead to discussion of why no
technology can fully ‘correct’ music to achieve a ‘perfect’
listening experience for those with hearing loss. There is
no technology to recreate a ‘golden-ears’ experience. This
leads to a key research question: what is the best
rendition of a piece of music for someone who has hearing
loss? What do listeners want from music, and how can we get
closest to achieving that?

We will bring in findings from research projects and
listening tests to explore what is known, and also to
highlight that there are significant gaps in knowledge that
require further research. We will then explore the
state of the art in wearables such as hearing aids and
sound reproduction systems. This will include the current
Cadenza project, which has been running a series of machine
learning challenges to improve music for those with hearing
loss.

Throughout, we will encourage questions and engagement from
delegates. We want to hear about lived experience of
hearing difference and how that has changed professional
practice and personal lives. We are also keen to hear
suggestions from delegates on what approaches might be used
to improve music for those with hearing loss.

We aim to raise awareness of the importance of considering
diverse audiences in Audio Engineering practice. Where
possible, the workshop will provide practical guidance for
audio engineers, highlighting techniques and emerging
technologies that can better support listeners with diverse
hearing profiles.

The workshop will be organised by the Cadenza Project Team
(https://cadenzachallenge.org/), a large UK-funded project
about improving music for those with hearing loss.

Speakers

Josh Reiss

Professor, Queen Mary University of London

Trevor Cox

University of Salford

Sara Madsen

GN Store Nord

Adam Steed

Contact Theatre, Manchester

Thursday May 28, 2026 3:30pm - 4:30pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Presentation Type Panel

4:00pm CEST

Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 42

Higher-Order Ambisonics (HOA) encoding from sparse,
irregular microphone arrays remains a critical challenge
for consumer spatial audio capture in immersive
communication and XR. We propose Flow-HOA, a generative
framework that jointly optimizes a multi-dimensional
perceptual objective while producing a deployable,
time-invariant bank of Finite Impulse Response (FIR)
encoding filters. Using conditional flow matching, the
model learns to map a simple prior distribution to the
target distribution of FIR filter coefficients. Training is
guided by a composite loss that balances time-domain
waveform fidelity, multi-resolution spectral consistency,
sub-band energy preservation, and spatial directivity
constraints. Objective evaluations demonstrate improved
performance over strong model-based baselines in both
signal fidelity and spatial accuracy metrics. Subjective
listening tests further confirm that Flow-HOA yields higher
overall sound quality with reduced artifacts.
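The abstract does not say which probability path Flow-HOA uses. The sketch below assumes the widely used linear (optimal-transport) path, where a point x_t = (1 − t)·x0 + t·x1 is regressed onto the constant velocity x1 − x0, and sampling integrates the learned velocity field from the prior at t = 0 to t = 1. The flattened filter-bank size and all function names are hypothetical; the conditioning on the microphone array and the composite perceptual loss are omitted.

```python
import numpy as np

# Sketch of conditional flow matching on the linear (optimal-transport) path.
# Assumption: the paper's exact path, network and conditioning are not given
# in the abstract; x1 stands for a flattened bank of FIR encoding filters.

def cfm_training_pair(x0, x1, t):
    """Return (x_t, v_target) for prior samples x0, data samples x1 and times t.

    x0, x1: arrays of shape (batch, dim); t: array of shape (batch,) in [0, 1].
    """
    t = np.asarray(t, dtype=float)[:, None]   # broadcast over the coefficient axis
    x_t = (1.0 - t) * x0 + t * x1             # straight line from prior to data
    v_target = x1 - x0                        # constant velocity along that line
    return x_t, v_target


def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(velocity_fn, x0, steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with forward Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim = 8, 1024                          # hypothetical flattened filter-bank size
    x0 = rng.standard_normal((batch, dim))        # samples from the simple prior
    x1 = 0.1 * rng.standard_normal((batch, dim))  # placeholder "filter" targets
    t = rng.uniform(size=batch)
    x_t, v_target = cfm_training_pair(x0, x1, t)
    # A trained network v_theta(x_t, t, array_condition) would be fit to
    # v_target by minimising cfm_loss; a zero predictor gives the baseline.
    print("baseline loss:", cfm_loss(np.zeros_like(v_target), v_target))
```

Because the output of sampling is a fixed coefficient vector, the generated filters can be deployed as an ordinary time-invariant FIR bank, which matches the deployability goal stated in the abstract.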

Authors

Tianshu Qu

Peking University

Xueyang Lv

Xiaomi Communications Co., Ltd

Yufan Qian

Peking University

Yuhuan You

Master, Peking University

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

4:00pm CEST

Accurate Characterization of Integrated Microphone Arrays for Device-Related Transfer Function Synthesis

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 44

This paper presents an improved method for characterizing
integrated microphone arrays for Device‑Related Transfer
Function (DRTF) synthesis. A probe‑array extension of the
IMPro technique is introduced to measure all device
microphones simultaneously, eliminating unknown timing
offsets that arise in asynchronous device–probe recordings.
A custom four‑element probe array and modular test jig were
developed to evaluate relative inter‑channel propagation
delay (RIPD) accuracy across varied microphone‑port
geometries. Hybrid free‑field DRTFs were synthesized by
combining IMPro data with Boundary Element Method (BEM)
acoustic scattering simulations, demonstrating that the
probe‑array measurements capture small delay variations
essential for precise spatial‑audio modeling. The extended
IMPro method offers a practical, scalable alternative to
anechoic‑chamber measurements for modern multi‑microphone
devices.
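The relative delay between two simultaneously captured microphone signals is commonly estimated from the peak of their cross-correlation. The sketch below shows that generic estimator with a three-point parabolic refinement for sub-sample resolution. It is an illustration under that assumption, not the IMPro procedure, whose details the abstract does not give; the function name `relative_delay` and the test signal are hypothetical.

```python
import numpy as np

def relative_delay(x, y, fs=None):
    """Delay of y relative to x, from the peak of their full cross-correlation.

    Positive result: y arrives later than x. The integer peak is refined with
    a three-point parabolic fit. Returns samples, or seconds when fs is given.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.correlate(y, x, mode="full")   # c[i] <-> lag i - (len(x) - 1)
    i = int(np.argmax(c))
    lag = float(i - (len(x) - 1))
    if 0 < i < len(c) - 1:                # refinement needs both neighbours
        a, b, d = c[i - 1], c[i], c[i + 1]
        denom = a - 2.0 * b + d
        if denom != 0.0:
            lag += 0.5 * (a - d) / denom  # vertex of the fitted parabola
    return lag / fs if fs else lag


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(4800)                    # hypothetical test signal
    y = np.concatenate([np.zeros(3), x])[: len(x)]   # same signal, 3 samples later
    print(relative_delay(x, y), relative_delay(x, y, fs=48_000))
```

Capturing all channels in one synchronous recording is what makes such a lag meaningful: with asynchronous device and probe recordings, an unknown clock offset would be added to every estimate.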

Authors

Hannu Pulakka

John Cozens

JCoustics

Matti Hamalainen

Head of Audio Technologies and Ecosystems, Nokia Technology Standards

Matti S. Hämäläinen is a seasoned expert in audio technologi...

Mikko Pekkarinen

Nokia Technology Standards

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Lecture | Audio Equipment, Lecture

Presentation Type Lecture

4:00pm CEST

Influences of Nonlinear Distortion in Music Playback on Listeners’ Stress Evaluated by PPI and RMSSD of PPG

Thursday May 28, 2026 4:00pm - 4:30pm CEST

Aud 43

The phenomenon in which listeners’ impressions of music are
unintentionally altered even when the same sound source is
played back remains an important issue. Previous research
has shown that the state and combination of audio equipment
affect the characteristics of nonlinear distortion in music
playback. Hence, we conducted a subjective evaluation of
auditory and musical impressions using sound sources with
various nonlinear distortions. However, the subjective
evaluation was unstable and difficult to assess, because
the sound change was perceived emotionally as a slight
change in sound image and musicality, and the
interpretation of evaluation terms varied widely among
subjects due to the difficulty of verbalizing the
impression. Therefore, we evaluated the change in
listeners’ stress caused by nonlinear distortion in music
playback using photoplethysmography (PPG). In this
study, we conducted a follow-up experiment with improved
accuracy.
In the experiment, 41 subjects listened to sound sources
with even-order harmonic distortion at 2.69% THD, odd-order
harmonic distortion at 2.69% THD, and no distortion. The
musical piece used as the sound source was an original
composition, to eliminate familiarity and bias toward
existing music.
We evaluated changes in subjects’ stress states using the
mean pulse-to-pulse interval (PPI) and the root mean square
of successive differences (RMSSD), computed from the PPG
signal, as indicators of stress.
The results reconfirm that nonlinear distortion in music
playback affects listeners’ vital responses, as evidenced
by significant differences in both mean PPI and RMSSD, as
assessed by Cochran's Q test at the 5% significance level.
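The two PPG indicators named above follow standard heart-rate-variability definitions: mean PPI is the average interval between successive pulse peaks, and RMSSD is the root mean square of the differences between adjacent intervals. The sketch below assumes pulse-peak times have already been detected in the PPG signal (peak detection is not shown); the function names are hypothetical.

```python
import math

def ppi_from_peak_times(peak_times_s):
    """Pulse-to-pulse intervals in ms from successive PPG pulse-peak times in seconds.
    Assumes peaks were already detected; detection itself is not shown."""
    return [1000.0 * (b - a) for a, b in zip(peak_times_s, peak_times_s[1:])]

def mean_ppi(ppi_ms):
    """Mean pulse-to-pulse interval in ms."""
    return sum(ppi_ms) / len(ppi_ms)

def rmssd(ppi_ms):
    """Root mean square of successive differences between adjacent intervals, in ms."""
    diffs = [b - a for a, b in zip(ppi_ms, ppi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    intervals = [800.0, 810.0, 790.0, 800.0]   # illustrative intervals, not study data
    print(mean_ppi(intervals), rmssd(intervals))
```

For intervals of 800, 810, 790 and 800 ms, the successive differences are 10, −20 and 10 ms, so `mean_ppi` returns 800.0 and `rmssd` returns √200 ≈ 14.14 ms.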

Authors

Kenshin Nakada

Tokyo University of Science

Shun Muramatsu

The University of Tokyo

Takahiro Yoshida

Professor, Tokyo University of Science

Thursday May 28, 2026 4:00pm - 4:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

4:00pm CEST

Stefan Bock: 3D Masterclass

Thursday May 28, 2026 4:00pm - 5:00pm CEST

Aud 31

Stefan reports from the front lines of recording, mixing,
and live streaming immersive music, highlighting the
technical and creative challenges of delivering
three-dimensional sound in real time. He shares practical
insights into spatial mixing, format compatibility, and the
realities of reliable immersive streaming across diverse
playback environments.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best, as we
discuss the qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, and also describe
this specific listening room in terms of ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.

Speakers

Stefan Bock

Managing Director, msm-studios GmbH

Thomas Lund

Genelec Oy

Denmark

Thursday May 28, 2026 4:00pm - 5:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Masterclass | Immersive Audio, Masterclass | Perception, Masterclass | Recording Production and Reproduction, Masterclass

Presentation Type Masterclass

4:00pm CEST

TC-AA: AES Technical Committee on "AUTOMOTIVE AUDIO"

Thursday May 28, 2026 4:00pm - 5:00pm CEST

Aud 93

AES Technical Committee on "AUTOMOTIVE AUDIO"

The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole. Connect and engage!

Speakers

Roger Shively

Thursday May 28, 2026 4:00pm - 5:00pm CEST
Aud 93 Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

AES Technical Committee Meetings

4:00pm CEST

Best practices for wireless audio in modern RF environments

Thursday May 28, 2026 4:00pm - 5:00pm CEST

Aud 49

The demand for wireless audio keeps growing, while the
available RF spectrum has shrunk over recent decades and
become more crowded. This session will explore strategies
for making wireless audio work cleanly and reliably:
essential information for live production as well as TV
and film production.

Speakers

Robert Lee

Applications Engineer / Trainer, RF Venue, Inc.

Thursday May 28, 2026 4:00pm - 5:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Equipment, Tutorial

Presentation Type Tutorial

4:00pm CEST

Student Mix Critiques 1

Thursday May 28, 2026 4:00pm - 5:00pm CEST

Building 302, 2nd floor

These sessions are an opportunity for AES student members
to receive feedback on their mixes from a panel of industry
professionals, in a live, non-competitive setting. Join us
to hear mixes by other students, and get tips, tricks, and
advice to push your skills to the next level! Mixes can be
submitted in advance by following the instructions
posted at:
https://www.aesstudents.org/competitions/student-mix-critiques/
Very limited on-site submission may also be possible.
Maybe one of your mixes can be featured!

Speakers

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC

Thursday May 28, 2026 4:00pm - 5:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

Recording Production and Reproduction, Masterclass

Presentation Type Masterclass

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 44

This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The original method consists of four stages:
high-resolution anatomical scanning of the listener’s upper
body, including the pinnae; individualized HRTF computation
using the boundary element method; selective removal of
spatial encoding components to derive a personalized
reference target response curve (PR-TRC); and perceptual
optimization using a listener-specific weighting
coefficient grounded in auditory reference fidelity rather
than preference. In this paper, each stage is simplified
and automated using smartphone-based scanning and
AI-assisted processing, enabling end users to complete the
entire personalization process via a smartphone connected
to a cloud-based server. The resulting personalized target
response curve is implemented within the computational and
memory constraints of the DSP pipeline of commercial
consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.

Authors

Atsushi Hara

final Inc.

Haruto Hirai

final Inc.

Kimio Hamasaki

President, Artsridge LLC

Mitsuru Hosoo

final Inc.

Nao Tojo

final Inc.

Shun Saito

final Inc./post-doc

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Applications and Technologies, Lecture | Audio Equipment, Lecture | Audio Processing, Lecture | Perception, Lecture

Presentation Type Lecture

4:30pm CEST

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 42

While Neural Audio Codecs (NAC) have revolutionized
monaural audio compression, achieving high-fidelity
dual-channel coding at low bitrates remains a significant
challenge. Existing approaches often rely on naive
independent channel quantization, leading to phase
incoherence, or entangled latent modeling, which sacrifices
spatial precision for spectral energy. This paper proposes
a novel dual-channel coding framework based on
content-spatial disentanglement. Reframing spatial
reconstruction as an informed source separation task, our
architecture synergizes a frozen, pre-trained DAC encoder
for robust mono content preservation with a
parameter-efficient side information encoder that predicts
fine-grained time-frequency masks. To ensure precise
spatial imaging, we introduce explicit physical constraints
into the end-to-end training. Experimental results indicate
that at low bitrates of 9 and 11 kbps, the proposed method
outperforms state-of-the-art dual-mono neural baselines and
industry standards in both objective spatial metrics and
subjective MUSHRA evaluations.

Authors

Qingbo Huang

MMLab, ByteDance

Tianshu Qu

Peking University

Yihan Wang

Peking University

Yufan Qian

Peking University

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Lecture | Audio Processing, Lecture

Presentation Type Lecture

4:30pm CEST

From Gaze to Gnosis: A Critical Framework for Embodied Audio Production

Thursday May 28, 2026 4:30pm - 5:00pm CEST

Aud 43

Audio engineering standards often present as objective, yet
they frequently rely on a systemic data bias which Perez
characterises as the 'default male bias' [1]. This paper
examines the hegemony of the male ear, a system of norms
that privileges masculine modes of hearing by prioritizing
technical structure and text over affective experience and
timbre [2]. By transitioning from a visual-centric auditory
gaze toward an embodied sonic gnosis, researchers can
recover haptic and physiological ways of knowing sound.
Drawing on the feminist listening praxis of the Female Ear
[3], this work explores the recording studio as an
analytical space where sonic microaggressions [4] enforce
rigid technical standards. The author argues for a new
audio praxis that centers ear pleasures [5], validating
subjective and affective sensory data as legitimate
engineering input. This approach seeks to dismantle the
regulatory fiction [6] of a universal hearing standard,
promoting a pluralistic understanding of musicking [7] that
is inclusive of non-normative perspectives.

Authors

Katie Ambrose

PhD Student, University of York

Katie is a postgraduate researcher at the University of York, working on a th...

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Cross-Disciplinary Sound Studies, Lecture | Perception, Lecture | Recording Production and Reproduction, Lecture

Presentation Type Lecture

5:00pm CEST

Richard Heyser Memorial Lecture : From head-related transfer functions to risk of damage and hearing rehabilitation

Thursday May 28, 2026 5:00pm - 6:30pm CEST

Aud 42

The famous Richard Heyser Memorial Lecture will this year be given by Professor Dorte Hammershøi from Aalborg University.

Throughout a distinguished academic career, the lecturer’s work in measuring outer ear transfer functions and headphone characteristics served not only to develop and refine methods for binaural recording and reproduction, but eventually provided a stepping stone into the field of technical audiology and hearing-aid rehabilitation. In 2026, an earphone is rarely just a sound reproduction device, and a hearing aid is rarely just a medical device. The talk will give highlights from 36 years of work in the field, and discuss what the presenter considers to be the contemporary challenges when earphones become hearing aids and vice versa. Finally, the presenter may address the challenges of creating audio-only virtual reality for blind gamers.

Moderators

Jayant Datta

Technical Council Vice Chair, Audio Engineering Society

Worked in various fields of audio – digital mixer design at Wheatstone (broadcast), DSP at Motorola (consumer, professional), R&D and product development at THX (amplification, line arrays, automotive sound), engineering strategy as CTO of Audio Precision (test & measurement); worked...

Authors

Dorte Hammershøi

Professor, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University

Thursday May 28, 2026 5:00pm - 6:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Special Events, Special Event

6:00pm CEST

Auro-3D & Dynaudio Immersive Sound Evening at Black Tornado Studios

Thursday May 28, 2026 6:00pm - 10:00pm CEST

Black Tornado Studios

Step into Immersive Sound with the High-Res Immersive Audio @ Dynaudio Reference Studio.

Join us for an evening of immersive sound.
Connect, listen and experience!

Black Tornado Studios
Refshalevej 209, 1432 København, Denmark

28 May, 2026 from 6pm

A 30-minute car journey from the AES Europe 2026 Convention at DTU.

Sponsored by Auro-3D and Dynaudio

Thursday May 28, 2026 6:00pm - 10:00pm CEST
Black Tornado Studios Refshalevej 209, 1432 København, Denmark

Special Events, Special Event

Keywords Immersive, Event

6:30pm CEST

Opening Reception: Drinks, snacks and live music with vocal ensemble "Tonika".

Thursday May 28, 2026 6:30pm - 7:30pm CEST

Foyer Building 303A

This is the social start of the convention, following directly after the famous "Richard Heyser Memorial Lecture" given by Professor Dorte Hammershøi with the title "From head-related transfer functions to risk of damage and hearing rehabilitation".

There will be drinks, snacks and live music with vocal ensemble "Tonika"!

Come and join us: catch up with your connections and make new ones!

Thursday May 28, 2026 6:30pm - 7:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Special Events, Special Event