Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time
You can change your view to your time zone (look for "Timezone" on the right)


LIVESTREAMS : A and B


ON DEMAND VIDEOS (previous days)
 
Friday, May 29
 

8:00am CEST

Attendee Registration
Friday May 29, 2026 8:00am - 2:30pm CEST
Foyer Building 306 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

9:00am CEST

A method to synchronize dynamic media stream on heterogeneous media playback devices
Friday May 29, 2026 9:00am - 9:30am CEST
Audio synchronization across heterogeneous media playback
devices is essential for delivering immersive sound
experiences in applications such as speaker group play and
multi-room audio playback. Existing synchronization
techniques predominantly rely on tightly coupled network
infrastructures and often embed media sequence and
timestamp information in the media packet at the
transmitting source end, which restricts flexibility in
selecting the transmitting source and also compromises
robustness under dynamic network conditions. This paper
proposes a network- and source-independent audio
synchronization framework that eliminates the dependency on
embedded media sequence and timestamps. The proposed
system employs an audio fingerprinting-based media
sequencing algorithm amongst the media playback devices
without relying on the type of transmitting source or the
network availability. A novel audio synchronization
algorithm is proposed which first determines common
sequence start information given a dynamic media stream
from the transmitting source and then communicates the
fingerprint and timestamp amongst the media playback
devices without modifying the original audio packet
structure. Experimental results demonstrate that the
proposed approach achieves tight audio-audio
synchronization of less than 10 ms across media playback
devices in a no-network environment, thereby extending the
scope of immersive audio applications irrespective of the
transmitting source.
Authors
Avinash Singh
Samsung Research Institute, Delhi (SRID)

Mohit Singh
Samsung Research Institute, Delhi (SRID)

Natasha Meena
Samsung Research Institute, Delhi (SRID)
I am working as Software developer in Samsung Research Institute India - Delhi and am responsible for development of features related to Samsung sound device’s
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring 2D Ambisonics by Amplitudes and Phases
Friday May 29, 2026 9:00am - 9:30am CEST
We present a spectral-like reformulation of 2D ambisonics,
enabling an alternative representation of the sound field
in terms of amplitudes and phases. We hypothesise that it
simplifies the representation and creative manipulation of
2D ambisonics, beyond encoded directional point sources.

In 2D high-order ambisonics (HOA) of order N, a sound field
can be represented as a 2π-periodic angular function as a
combination of circular harmonics (Y_m) weighted by the
coefficients (a_m) with m ∈ [-N, N]. This representation
can be reformulated in terms of N+1 amplitudes and N
phases, similarly to a Fourier decomposition.

A simple example of this representation is the ambisonic
encoder at an angle theta. Phases are then multiples of a
phase phi = theta/2π, as frequencies are multiples of a
fundamental in harmonic sounds. Therefore, the
amplitude-phase approach can draw on the field of sound
synthesis, between harmonic and inharmonic modelling.
Operations on ambisonic vectors in amplitude-phase also
rely on Fourier representation, namely the spectral
convolution of two vectors (element-wise products of the
amplitudes, element-wise sums of the phases). Spectral
convolution has vast potential in ambisonics, allowing
all the usual spatial operations (geometric and
transformative) to be represented in a simple manner.

To test this approach, we are currently developing an
ambisonic synthesiser based on Faust functions running in
the Max environment. We are evaluating the scope of this
representation, both theoretical and compositional, and will
then attempt to expand this approach to 3D ambisonics.
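As a rough illustration of the amplitude-phase idea in this abstract, the Python sketch below encodes a point source as N+1 unit amplitudes and N phases that are integer multiples of the source angle. It then applies the spectral convolution described above (amplitudes multiplied, phases summed). The function names, the use of radians instead of the normalised phase theta/2π, and the cosine/sine channel layout are assumptions made for illustration only; this is not the authors' Faust implementation.

```python
import math

def encode_amp_phase(theta, order):
    """Amplitude-phase form of a 2D point-source encoder at angle theta (radians).

    Returns (amplitudes, phases): N+1 unit amplitudes and N phases m * theta.
    """
    amplitudes = [1.0] * (order + 1)
    phases = [m * theta for m in range(1, order + 1)]
    return amplitudes, phases

def spectral_convolve(a, b):
    """Element-wise product of amplitudes, element-wise sum of phases."""
    amps = [x * y for x, y in zip(a[0], b[0])]
    phases = [(p + q) % (2 * math.pi) for p, q in zip(a[1], b[1])]
    return amps, phases

def to_circular_harmonics(amps, phases):
    """Convert to a flat list [a0, cos1, sin1, cos2, sin2, ...] (assumed layout)."""
    coeffs = [amps[0]]
    for amp, phi in zip(amps[1:], phases):
        coeffs += [amp * math.cos(phi), amp * math.sin(phi)]
    return coeffs

# Convolving a source at 30 degrees with an encoder at 45 degrees acts as a rotation.
source = encode_amp_phase(math.radians(30), order=3)
rotation = encode_amp_phase(math.radians(45), order=3)
rotated = to_circular_harmonics(*spectral_convolve(source, rotation))
direct = to_circular_harmonics(*encode_amp_phase(math.radians(75), order=3))
```

Under these assumptions, `rotated` and `direct` hold the same seven channel values, which shows how a rotation reduces to a sum of phases in this representation.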
Authors
Alain Bonardi
Professor in Computer Science and Music Creation, University of Paris 8
Alain Bonardi is a Professor of Computer Science and Music Creation at Paris 8 University, where he is based in the Music Department and is a member of the Musidanse laboratory.
There, he co-directs the CICM (Center for Research in Computer Science and Music Creation) with Anne...

Axel Chemla-Romeu-Santos
University of Paris 8

Emma Frid
University of Paris 8

Paul Goutmann
University of Paris 8
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Altering the Immersive Potential: The Case of the Heilung Concert at Roskilde Festival
Friday May 29, 2026 9:00am - 9:30am CEST
Immersive audio systems are increasingly deployed in
large-scale live music contexts, yet there is limited
research addressing how immersive concerts are perceived
and experienced by audiences. This paper presents a
practice-based and ethnographically informed study of the
immersive audio design and audience experience of the band
Heilung’s concert at Roskilde Festival, staged in the Arena
Tent where a large-scale multichannel loudspeaker system
including main, surround, and overhead arrays was used.
The study combines insights in technical system design and
pre-production methods with qualitative audience research
in order to explore how immersive sound alters perception,
embodiment, and social engagement in live concerts.
Pre-production involved scaled system simulations,
reference listening positions, timing strategies, and
power-matched test environments to translate an immersive
studio mix to a festival-scale venue. During and after the
concert, audience experience was investigated through
in-depth interviews, focus group discussions, participant
observation, binaural and ambisonic recordings, and
phenomenologically inspired interview techniques.
Findings indicate that immersive audio contributes to
heightened affective engagement, bodily involvement, and a
sense of envelopment that exceeds conventional stereo
concert experiences. Audience members described the
experience as multisensory, ritualistic, and spatially
ambiguous, often lacking technical vocabulary but
emphasizing embodied and emotional responses. Importantly,
immersion was not perceived as sound alone, but as emerging
from the interaction of sound, visuals, architecture,
social presence, and narrative framing.
The paper argues that understanding immersive concerts
requires the integration of anthropological insights with
audio engineering knowledge. While technical approaches
explain how immersive sound systems operate,
anthropological perspectives are essential for
understanding how such systems are experienced,
interpreted, and given meaning by audiences. The study
contributes to the limited body of research on the effects
of immersive concert formats by examining how audiences
perceive immersion and how they ascribe meaning to
immersive sound.
Authors
Birgitte Folmann
Senior Associate Professor, Sonic College
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Student Recording Competition Category 3: Sound for Visual Media
Friday May 29, 2026 9:00am - 10:00am CEST
Join us to hear the finalists selected for this category of
the Student Recording Competition. We will hear their
presentations and recordings, and comments and feedback
from the judges. Award and prize placements will be
announced on the last day of the convention.
Speakers
Niels Böttcher
Sound Designer and more, Floppy Club
20+ years experience in working with sound professionally from many perspectives. My main focus has been on sound design for computer games, toys and other interactive applications. The last couple of years I have also been working in the field of UX sound design for mobility.
I h...

Ken Candelas
Metal Mastermind®, Metal Mastermind®
New York, NY

Ian Corbett
AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...

Kia Eshghi
CUNY LaGuardia Community College, CUNY LaGuardia Community College
New York City
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Immersive Sound Volume II - Authors Table
Friday May 29, 2026 9:00am - 10:30am CEST
As spatial audio moves to a consumer standard, the demand
for sophisticated design frameworks has never been higher.
This panel brings together key contributors from the
recently published "Immersive Sound Volume II: The Design
and Practice of Binaural and Multi-Channel Experiences" to
bridge the gap between theoretical research and real-world
production. The session will explore the recent and
developing trends and practices of immersive audio,
focusing on the core design principles that define modern
binaural and multi-channel workflows. Panelists will
discuss themes from the text, including current guiding
principles and system design, and the creative practice of
building immersive sound experiences. Through a mix of case
studies and technical perspectives, the authors will
provide insights into a roadmap for engineers and creators
looking to master the immersive soundstage.
Speakers
Kimio Hamasaki
President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Lasse Nipkow
CEO, Silent Work LLC
Lasse Nipkow – 3D Audio Opinion Leader & Spatial Design Specialist Lasse Nipkow shapes the intersection of immersive audio production, spatial perception, and high-end listening environments. He develops perceptual approaches to 3D audio in collaboration with leading producers and engineers, connecting musical practice with how sound is actually experi...

Hyunkook Lee
Professor, University of Huddersfield
Professor

Jim Anderson
Professor, Anderson Audio New York
Jim has been the President of the AES Educational Foundation since 2020 and is a professor of recorded music with the Clive Davis Institute of Recorded Music in the Tisch School of the Arts at New York University. Jim was the Institute’s Chair from 2004 – 2008. A graduate of the...

Michael Romanowski
Owner-Head Engineer, Coast Mastering

Agnieszka Roginska
Professor of Music Technology, New York University
Professor of Music Technology

Paul Geluso
Director of the Music Technolo, New York University

Authors
Stefania Serafin
Department of Engineering Technology and Didactics, Technical University of Denmark
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Use of Headphones in Stereo Mastering and 3D Recording
Friday May 29, 2026 9:00am - 10:30am CEST
Loudspeaker monitoring is the reference when audio
professionals evaluate content. Headphones are also
important quality-checking tools; and many consumers enjoy
music using “close-fitting listening devices”, as all
different flavours of headphones are known in recent
standards writing.

We discuss the two reproduction methods from perceptual,
recording and mastering perspectives; especially
differences in timbre, imaging and auditory envelopment
when listening to stereo. Applications of headphones in
recording, when setting up and trimming stereo or 3D
microphone arrays, are also practically detailed.

In the last part of the workshop, attendees are invited to
personally compare the two domains on the qualities and
applications discussed; with guided listening to audio
examples between a pair of precision nearfield monitors,
Genelec 8351B, and a pair of excellent headphones, Audeze
CRBN2.
Speakers
Stefan Bock
Managing Director, msm-studios GmbH
Stefan Bock, born 20.08.1964 in southern Germany was starting his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991 where he was the Chief Mastering Engineer and General Manager.
He was leading msm-studios t...

Thomas Lund
Genelec Oy, Genelec Oy
Denmark

Morten Lindberg
Engineer and Producer, 2L (Lindberg Lyd)
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...

Ulrike Anderson
Anderson Audio New York

Chris Berens
Artist and Industry Relations, Audeze
Brand ambassador for Audeze, I love all aspects of audio production and engineering, especially immersive audio!
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Music generation model based on global emotional feature perception
Friday May 29, 2026 9:00am - 11:00am CEST
The rapid development of artificial intelligence
composition technology has brought innovation to music
creation. However, current deep learning music generation
models often neglect the global correlation of emotional
features, resulting in fragmented emotional expression in
generated works and insufficient alignment with human
emotional perception, making it difficult to meet the core
demand for emotional conveyance in diverse music creation.
This study aims to propose a music generation method that
integrates a global perception mechanism for emotional
features. Taking the EMOPIA and VGMIDI preprocessed
datasets as the research objects, an improved model based
on EMelodyGen (EMelodyGen-PPO) is constructed: a GLU
network layer is introduced in the feature extraction stage
to enhance the model's ability to filter and represent
emotion-related features; an improved PPO-Clip algorithm is
integrated in the training process, and a multi-dimensional
emotional reward function is designed to achieve global
dynamic perception and optimization of emotional features.
Experimental results show that the music21 parsing rate of
the EMelodyGen-PPO model on the target dataset is 3% and 4%
higher than that of the baseline model, respectively. An
automated quality assessment system based on fluency,
rhythm stability, harmony richness, melodic smoothness, and
structural integrity verifies that the comprehensive score
of the model's generated works is significantly better than
that of the comparative model. This study provides an
efficient technical path for emotion-oriented music
generation, which can empower grassroots cultural workers
and independent musicians at low cost, facilitate diverse
music creation practices and emotional audio content
dissemination, and align with the diversity and innovative
development concept of the AES audio community.
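For readers unfamiliar with PPO-Clip, the following Python sketch shows the standard clipped surrogate objective for a single sample. The abstract does not describe what its "improved" variant changes, so this is only the textbook form. The function name and the clip range of 0.2 are assumptions chosen for illustration.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate for one sample (not the paper's variant).

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage of the sampled action
    epsilon   -- clip range; 0.2 is a common default, assumed here
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, a ratio of 1.5 is capped at 1.2. With a negative advantage, a ratio of 0.5 is evaluated as if it were 0.8, so the objective stays pessimistic in both directions.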
Authors
Chen Li
Wuhan Polytechnic University

Heng Wang
Wuhan Polytechnic University

Lingzhi Chen
Wuhan Polytechnic University

Mingyan Gao
Wuhan Polytechnic University

XUETING WANG
Wuhan Polytechnic University
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators
Friday May 29, 2026 9:00am - 11:00am CEST
In this paper, we analyze two main factors of Bonafide
Resource (BR) or AI-based Generator (AG) which affect the
performance and the generality of a Deepfake Speech
Detection (DSD) model. To this end, we first propose a
deep-learning based model, referred to as the baseline.
Then, we conducted experiments on the baseline by which
we indicate how Bonafide Resource (BR) and AI-based
Generator (AG) factors affect the threshold score used to
detect fake or bonafide input audio in the inference
process. Given the experimental results, a dataset, which
re-uses public Deepfake Speech Detection (DSD) datasets and
shows a balance between Bonafide Resource (BR) and AI-based
Generator (AG), is proposed. We then train various
deep-learning based models on the proposed dataset and
conduct cross-dataset evaluation on different benchmark
datasets. The cross-dataset evaluation results prove that
the balance of Bonafide Resources (BR) and AI-based
Generators (AG) is the key factor to train and achieve a
general Deepfake Speech Detection (DSD) model.
Authors
Dat Tran
FPT University

David Fischinger
Austrian Institute of Technology

Davide Antonutti
Austrian Institute of Technology

Ian McLoughlin
Singapore Institute of Technology

Khoi Vu
FPT University

Lam Pham
Austrian Institute of Technology

Marcel Hasenbalg
Austrian Institute of Technology

Martin Boyer
Austrian Institute of Technology

Simon Freitter
Austrian Institute of Technology
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Semantic Audio Encoders from EQ Parameters Alone: Effects of Training Data Composition on Limited-Data Learning
Friday May 29, 2026 9:00am - 11:00am CEST
We investigate how training data composition influences
semantic audio encoders that learn perceptual descriptors
such as "warm," "bright," and "muddy" from equalization
(EQ) parameter datasets without labeled audio examples.
Using the SAFE-DB dataset of 1,369 labeled EQ settings, we
train audio encoders via an inverse problem formulation in
which labeled EQ parameters are applied to source audio and
the encoder is trained to recognize the resulting semantic
characteristics. Three training configurations are
compared, varying both class sampling strategy (uniform
versus balanced) and source audio type (pink noise versus
real music). Despite severe class imbalance in SAFE-DB,
where 76 percent of examples are labeled "bright" or
"warm," balanced class sampling combined with mixed-source
training (50 percent pink noise and 50 percent FMA music)
successfully learns physically meaningful semantic-spectral
relationships: "warm" and "muddy" show negative correlation
with spectral centroid (r = -0.56), while "bright" and
"thin" show positive correlation (r = +0.49). However,
prediction confidence decreases substantially (from 0.96 to
0.76 to 0.86), and top-1 predictions remain dominated by
the "bright" class across all evaluated music genres,
reflecting inherent dataset bias rather than training
failure. These results demonstrate that training data
composition significantly affects model calibration but
cannot fully overcome fundamental bias in the underlying
label distribution, highlighting key challenges for
semantic audio understanding systems.
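One common way to realise the balanced class sampling mentioned in this abstract is to weight each example inversely to the frequency of its class, so that every class is drawn with equal total probability. The Python sketch below illustrates that idea. The label list is hypothetical (it is not SAFE-DB data), and the function name and weighting scheme are assumptions, not the authors' training code.

```python
import random
from collections import Counter

def balanced_sampling_weights(labels):
    """Per-example weights that give every class the same total probability."""
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[label]) for label in labels]

# Hypothetical, deliberately imbalanced label list (illustrative only).
labels = ["bright"] * 6 + ["warm"] * 3 + ["muddy"]
weights = balanced_sampling_weights(labels)

# Draw a batch with replacement; a fixed seed keeps the draw reproducible.
rng = random.Random(0)
batch = rng.choices(labels, weights=weights, k=9)
```

Each class here receives a total weight of one third, even though "bright" has six times as many examples as "muddy".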
Authors
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Voice-Based Fatigue Detection for Military Personnel: A Multi-Modal Machine Learning Framework with Acoustic Feature Emphasis
Friday May 29, 2026 9:00am - 11:00am CEST
This study presents a voice-centered machine learning
framework for detecting mental fatigue in military
personnel, integrating acoustic analysis with physiological
biosensors to enhance detection robustness. Mental fatigue
poses critical safety and performance challenges in
military operations, yet cultural stigma often prevents
self-reporting. We collected multi-modal data from 23
participants across two fatigue states, extracting
comprehensive acoustic features including sound pressure
level (SPL), formants, mel-frequency cepstral coefficients
(MFCCs), jitter, shimmer, harmonic-to-noise ratio (HNR),
and temporal speech characteristics. These voice features
were combined with electroencephalography (EEG),
photoplethysmography (PPG), and temperature data to train
multiple machine learning classifiers. The voice-based
models achieved accuracies between 82% and 85%, with support
vector machines (SVM) and long short-term memory (LSTM)
networks demonstrating superior performance. When acoustic
features were combined with physiological markers,
classification accuracy improved to 92%, with
Classification and Regression Trees (CART) and Linear
Discriminant Analysis (LDA) emerging as top performers.
Statistical analysis identified SPL and formant variance as
the most discriminative voice features, while Lempel-Ziv
Complexity (LZC) and theta/beta ratio proved most reliable
for EEG. Evaluation on new participants yielded 67%
accuracy, revealing model generalization challenges that
inform future research directions. This work demonstrates
that voice-based machine learning systems, when augmented
with physiological data, offer a promising non-invasive
approach to real-time fatigue monitoring in operational
military environments.
Authors
Claire Courchene
Applied Perception Associate Engineer, GN
I’m a creative technologist and interaction designer exploring how sound, technology, and human experience meet. With an MScEng in Sound & Music Computing, I prototype audio interactions, build ML‑driven tools, and design experiments around perception. My background spans music...
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals
Friday May 29, 2026 9:00am - 11:00am CEST
Current deep learning approaches to speech enhancement rely
heavily on objective measures like mean squared error or
scale-invariant signal-to-distortion ratio as both training
objectives and evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, the inconsistent integration of metrics like
Short-Term Objective Intelligibility or Perceptual
Evaluation of Speech Quality into training and evaluation
pipelines leaves a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modeling. Our study
explores two distinct methods to characterise enhanced
signals. On one hand, we employ a perceptual approach based
on the Cambridge loudness model to assess the preservation
of spectral excitation patterns and perceived intensity. On
the other hand, we adopt a biophysical approach by
utilising CoNNear, a convolutional model of the human
auditory periphery. This allows us to simulate
representations of responses at different stages of the
auditory periphery to observe how speech enhancement
processing affects the physiological representation of
speech. We analyse pre-trained speech enhancement models
using automatic speech recognition and Short-Term Objective
Intelligibility as an additional proxy for human
intelligibility. By mapping automatic speech recognition
performance against loudness and peripheral response
patterns, we investigate the extent to which current
enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than purely
signal-based objective functions.
Authors
François Effa
Université de Lorraine, CNRS, Inria, Loria, Nancy, France

Romain Serizel
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Objective Quality Models for Decision-Making in Speech Coding
Friday May 29, 2026 9:00am - 11:00am CEST
Objective quality evaluation is widely used in speech
coding, yet objective estimates often show limited
agreement with subjective listening-test results. Rather
than focusing on absolute score accuracy, this paper
evaluates objective speech quality models from a
decision-making perspective, defined as their ability to
support comparative judgments between speech codecs or
codec configurations. A formal ITU-T P.800 Absolute
Category Rating (ACR) listening test was conducted with 30
listeners across 24 conditions, covering conventional and
neural monophonic speech codecs operating under
clear-channel conditions at sampling frequencies from 16 to
48 kHz and bit rates ranging from below 1 kbps to above 16
kbps. The speech material consisted of internally recorded,
clean French-language speech that was not used in the
development or training of any of the evaluated codecs or
objective quality models. Seven objective quality models,
namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA,
UTMOS, and DistillMOS, were evaluated on the same material.
Decision-making performance was assessed by comparing
subjective and objective rankings using Kendall’s rank
correlation coefficient and by analyzing pairwise codec
comparisons using t-tests at a 95% confidence level. The
results show that some objective quality models are
effective for comparing bit rate variations within a given
speech coding technology, provided that all other codec
parameters remain unchanged (e.g., sampling frequency).
However, all models exhibit limitations, including
tendencies toward over- or underestimation for certain
technologies, as well as reduced reliability when applied
across different sampling frequencies. Despite its
conventional origins, PESQ remains capable of supporting
decision-making even when applied to neural speech codecs.
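To make the ranking comparison concrete, here is a small Python sketch of Kendall's rank correlation (the tau-a form, which assumes no tied scores) applied to two score lists. The scores are hypothetical illustrative numbers, not results from the paper, and the function name is an assumption. The abstract does not say which tau variant was used.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both measures order the pair the same way
        elif sign < 0:
            discordant += 1   # the two measures disagree on the pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for four codec conditions (illustrative only).
subjective_mos = [1.8, 2.9, 3.6, 4.3]
objective_score = [2.1, 2.0, 3.2, 4.0]
tau = kendall_tau(subjective_mos, objective_score)
```

In this made-up example the objective model swaps the order of the two lowest conditions, giving five concordant pairs and one discordant pair, so tau is 4/6.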
Authors
Clémence Lamballe
Universite de Sherbrooke

Philippe Gournay
Universite de Sherbrooke

Roch Lefebvre
Universite de Sherbrooke
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

The Ambisonic Denoising Paradox: U-Net Processing Degrades ASR Transcription Quality for Medical Speech
Friday May 29, 2026 9:00am - 11:00am CEST
Spatial audio recording using higher-order Ambisonics
offers rich directional information for medical speech
capture, yet challenging hospital acoustic environments
motivate preprocessing with neural denoising algorithms.
This study investigates whether U-Net-based denoising of
third-order ambisonic recordings improves automatic speech
recognition (ASR) quality for medical applications. We
developed the Medical Immersive Audio Corpus (MIAC),
comprising 1,759 utterances (6.43 hours) of Polish medical
speech recorded with a Zylia ZM-1 microphone in
uncontrolled hospital environments, capturing 16-channel
third-order Ambisonics across multiple specializations
including thyroid ultrasonography, surgical procedures, and
general diagnostics. We applied a U-Net architecture with
dual attention mechanisms trained using the Noise2Noise
paradigm to denoise the corpus, then evaluated
transcription quality using ten Whisper ASR models ranging
from 39 million to 1.55 billion parameters, including
domain-adapted medical variants. Surprisingly, we
discovered a "noise reduction paradox" where denoising
degraded transcription quality for seven of ten models,
with statistically significant increases in Word Error Rate
(WER) and Character Error Rate (CER) for general-purpose
base, small, and medium models. Only the domain-adapted
whisper-medium-68000-abbr model showed statistically
significant improvement (p=0.0008), while large-scale
models (large-v2, large-v3) exhibited robustness with
negligible changes. Effect sizes remained small (Cohen's d
< 0.2) across all models. These counterintuitive findings
suggest modern ASR systems implicitly utilize background
noise characteristics as informative features, and that
preprocessing pipelines should be reconsidered for
domain-specific applications. Our results provide practical
guidance for medical speech processing system design.
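Word Error Rate, the main metric in this abstract, is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The Python sketch below is a minimal version that assumes whitespace tokenisation and a non-empty reference. The function name and example sentences are hypothetical, and the study's own text normalisation is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution and one deletion against four reference words.
example_wer = word_error_rate("a b c d", "a x c")
```

Character Error Rate follows the same recurrence with characters in place of words.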
Authors
Bartlomiej Mroz
Assistant Professor, Gdańsk University of Technology
PhD, Spatial Audio & Immersive Media Researcher, Recording Engineer, Statistics enthusiast

Szymon Zaporowski
Gdańsk University of Technology
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

A perceptual evaluation of various commercial models of music source separation, with a focus on model performance against non-traditional source material
Friday May 29, 2026 9:00am - 11:00am CEST
Music source separation (MSS) systems are commonly used in
production, remixing, and audio analysis work, yet
questions arise regarding the extent to which objective
evaluations of model performance align with human
perceptual evaluations, particularly when tasked with
non-traditional source material (in this case, heavily
processed electronic music). This study seeks to set a
framework for an evaluation of three machine learning
approaches to MSS: a spectrogram-domain model (spleeter), a
waveform-domain model (Demucs v2), and a hybrid-domain
model (HTDemucs). Subjective evaluations of model
performance were collected via a MUSHRA-style listening
test, while objective evaluations were assessed using
signal-to-distortion ratio (SDR) and Fréchet Audio Distance
(FAD). Results showed consistent agreement across objective
metrics, with the hybrid-domain model outperforming the
other singular-domain models. Perceptual ratings also
favored the hybrid model; interestingly, listeners
occasionally rated the model output as equal or better in
quality than the original reference. Preliminary analysis
indicates some moderate but insignificant correlations
between the two assessment paths, reinforcing concerns
about relying solely on numerical evaluations when
discussing MSS model performance. Implications for model
design and future evaluation procedures are discussed.
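In its simplest form, the signal-to-distortion ratio named above is the energy of the reference stem divided by the energy of the estimation error, expressed in decibels. The Python sketch below uses that plain definition. The abstract does not say which SDR variant the study used (for example a BSS Eval style decomposition), so this function and its name are illustrative assumptions.

```python
import math

def sdr_db(reference, estimate):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumes equal-length sample sequences and a non-zero estimation error.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)

# Toy example: the estimate is the reference scaled by 0.9.
reference_stem = [1.0, 0.0, -1.0, 0.0]
separated_stem = [0.9, 0.0, -0.9, 0.0]
toy_sdr = sdr_db(reference_stem, separated_stem)
```

A 10 percent amplitude error leaves an error energy one hundredth of the signal energy, which corresponds to about 20 dB.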
Authors
Sahan Wijewardane
University of Miami
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Automating sound design for adaptive video game narration
Friday May 29, 2026 9:00am - 11:00am CEST
HAMLET is a research project that investigates the
integration of Artificial Intelligence; co-creation
practices within the creative industries. The project
proposes AI-driven enablers to support artists through
collaborative workflows between creative practitioners;
technology providers. This work focuses on an automated
sound design framework for text-based role-playing games,
where the game narration is dynamically generated through
player textual interaction with an LLM. To address this
unpredictability, the proposed system generates adaptive
soundscapes automatically from textual scene descriptions.
An LLM identifies semantically relevant sound sources,
which are then matched to audio libraries through metadata
alignment. The files are assessed for quality and are fed
to an automated mixing module. The framework addresses
challenges related to semantic alignment, audio quality,
aesthetic balance, and file size constraints.
Authors
Charalampos Dimoulas

Aristotle University of Thessaloniki
George Kalliris

Aristotle University of Thessaloniki
Lazaros Vrysis

Aristotle University of Thessaloniki
Marina Eirini Stamatiadou

Aristotle University of Thessaloniki
Nikolaos Vryzas

Aristotle University of Thessaloniki
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering in the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master degrees on Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Poster Session 2
Friday May 29, 2026 9:00am - 11:00am CEST
- Music generation model based on global emotional feature perception


- A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators


- Semantic Audio Encoders from EQ Parameters Alone: Effects of Training Data Composition on Limited-Data Learning


- Voice-Based Fatigue Detection for Military Personnel: A Multi-Modal Machine Learning Framework with Acoustic Feature Emphasis


- Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals


- Objective Quality Models for Decision-Making in Speech Coding


- The Ambisonic Denoising Paradox: U-Net Processing Degrades ASR Transcription Quality for Medical Speech


- A perceptual evaluation of various commercial models of music source separation, with a focus on model performance against non-traditional source material


- Automating sound design for adaptive video game narration
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Posters Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exhibit Hall
Friday May 29, 2026 9:00am - 5:00pm CEST

Aud 36 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Design and analysis of sound insulation soft-solid metamaterials with periodic inclusions
Friday May 29, 2026 9:30am - 10:00am CEST
One of the many applications of acoustic metamaterials is
the ability to substantially improve acoustic insulation in
the low-frequency range compared to traditional materials.
The objective of this study was to investigate a
vibroacoustic metamaterial consisting of a soft solid plane
with embedded inclusions. The analysed structure consists
of a porous layer with periodic solid elements, which
allows for enhanced insulation properties. A numerical
model considering interactions between the acoustic domain
and a solid was developed using COMSOL Multiphysics. The
influence of selected material and geometric parameters,
such as the shape of the inclusions and their placement, on
the overall effectiveness of the structure was analysed.
Based on the simulation results, a variant of the structure
was selected and used to create a prototype of the
metamaterial. The acoustic insulation of the constructed
structure was then measured in the diffuse field. The next
step is to conduct an optimization using the PSO algorithm
in order to find a geometry of the structure that can achieve
the most favourable results in the selected frequency
range. The optimized structure will then be validated by
creating an additional sample and conducting another
measurement.
Authors
Agata Maciuszek

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
Klara Chojnacka

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Generate 4pi SBA reverberation from virtual sound sources detected from x-y-z sound intensities. -Improvement of the source detection method.
Friday May 29, 2026 9:30am - 10:00am CEST
Generating the 4pi acoustical atmosphere of a target space is
important for creating immersive sound content. An
SBA-based reverb is a useful tool for this purpose. We
developed VSVerb, an SBA reverb that generates 4pi
reverberation from the virtual sound sources detected from
three orthogonal x-y-z sound intensities measured in the
target space. A virtual sound source, also known as a
mirror source, is a concept in geometrical
acoustics. According to this theory, many virtual sound
sources are considered to be located outside the room and
to provide reflection sounds inside the room. Since the
spatial information of virtual sound sources is a kind of
fingerprint of a room’s reverberant characteristics,
correctly sampled virtual sound sources enable us to
recreate the room's reverberation precisely.
Several methods have been proposed for detecting virtual
sound sources of a room, i.e., dominant reflection sounds
in a room, by using the spatial room impulse responses
(SRIRs). However, these methods have the disadvantage of
failing to detect small virtual sound sources that provide
late reflections, because they detect sources by focusing
on the peak amplitude values in SRIRs. It is difficult to
distinguish whether a small peak in the latter part of an SRIR
indicates a reflection or a noise component. Additionally,
in low-band analysis, side lobes of the band-pass filter
add many large peaks to the SRIRs, and they make it
difficult to detect true reflection peaks.
To overcome these disadvantages of the conventional
methods, we developed a method that detects virtual sound
sources without using the amplitude characteristics of
SRIRs. We call this method “Speed Detection.” This method
detects virtual sound sources based on the spatial moving
speed of the sound intensity. Instead of measuring SRIRs of
the sound pressure, we measure SRIRs of x-y-z instantaneous
sound intensities. Since we can assume that the reflection
sound comes from a “certain-sized” virtual sound source
over a “certain period,” the sound intensity provided by
the virtual sound source is considered to remain within a
small area and move slowly while the source emits the
reflection sound. We focused on this behavior of sound
intensities and developed the new detection method.
First, we identify the portions of the sound intensity that
move slowly and isolate them as the “Source intensity.”
Then, we calculate the positions, strengths, and phase
characteristics of the virtual sound sources from these
Source intensities of the x, y, and z directions. We
examined the Speed Detection method by generating several types
of 4pi reverbs from the virtual sound sources detected
using this method, and verified that it works well in many
cases. However, we have also found that it does not always
work well. We have realized the necessity of improving the
threshold value for classifying sound intensity into the
source intensity or other.
Until now, we have classified sound intensities into source
intensities and others by referring to a threshold value,
vt=40(1000t+10)^1.5 [m/s], where t indicates the arrival
time [s] of the sound intensity. This equation is based on
our practical experience, rather than scientific facts. It
works well in most cases, but some adjustments are required
in very rare cases. To apply the threshold value to various
acoustical conditions of the target spaces, we propose
switching the threshold value from our conventional
equation to an averaged value using a time-varying time
window. To examine the newly proposed threshold value, we
conducted experiments on detecting virtual sound sources of
a simple rectangular room. The results demonstrated the
validity of the new threshold value. We expect this new
threshold value to improve the sound quality of VSVerb and
V2MA as well.
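The conventional threshold quoted in the abstract can be evaluated directly. The sketch below is only an illustration of that formula, vt = 40(1000t + 10)^1.5 m/s; the function names are hypothetical, and the rule that a speed below the threshold counts as source intensity is an assumption drawn from the abstract's description of slowly moving intensity. The newly proposed averaged, time-windowed threshold is not specified in the abstract and is not implemented here.

```python
def speed_threshold(arrival_time_s):
    """Conventional threshold vt = 40 * (1000 t + 10)^1.5 in m/s, as quoted above."""
    if arrival_time_s < 0:
        raise ValueError("arrival time must be non-negative")
    return 40.0 * (1000.0 * arrival_time_s + 10.0) ** 1.5


def is_source_intensity(speed_m_per_s, arrival_time_s):
    """Assumed rule: intensity moving slower than the threshold is source intensity."""
    return speed_m_per_s < speed_threshold(arrival_time_s)


# Hypothetical arrival times, chosen only to show how quickly the threshold grows.
for t in (0.0, 0.01, 0.09):
    print(f"t = {t:.2f} s -> vt = {speed_threshold(t):.0f} m/s")
```

Because the threshold grows with arrival time, later reflections are allowed a much faster moving intensity before being rejected, which matches the abstract's concern with detecting weak late reflections.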
Authors
Akira Omoto

Kyushu University / ONFUTURE Ltd.
Masataka Nakahara

President / Senior Managing Director, ONFUTURE Ltd. / SONA Corp.
Masataka Nakahara is an acoustician specializing in studio acoustic design and room acoustics R&D. After studying acoustics at the Kyushu Institute of Design, he began his professional career as an acoustic designer at SONA Corporation. He earned his Ph.D. in acoustics from Kyushu...
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Who Controls the Space? Artistic Intent and Sound Diffusion in Immersive Concert Performance
Friday May 29, 2026 9:30am - 10:00am CEST
Recent advances in large-scale multichannel loudspeaker
systems have enabled immersive concert formats that extend
spatial control beyond conventional stereo and small
multichannel configurations. High-density loudspeaker
arrays (HDLAs) allow sound to be distributed across complex
architectural spaces, challenging established distinctions
between composition, performance, and live sound practice.
In live contexts, however, the realization of spatial
attributes is often constrained by system complexity,
limited rehearsal time, and the lack of artist-facing
spatial control interfaces. As a result, spatial
realization and sound diffusion are frequently delegated to
sound engineers, who translate artistic material to the
acoustic and architectural conditions of the venue in real
time.

This paper examines three immersive concerts presented
during Sonic Days 2025 in Denmark, realized on both
large-scale and small-scale multichannel loudspeaker
systems. The concerts represent contrasting production
contexts, including a site-specific spatial composition
conceived explicitly for a high-density loudspeaker array
and performances by artists whose practices are typically
oriented toward stereo or small multichannel formats.
Across these cases, spatialization functioned variously as
compositional material, interpretive layer, and adaptive
live-mixing practice.

The paper analyzes how control over spatial attributes is
negotiated between artists and sound engineers in live
immersive concert settings, and how this negotiation
affects the interpretation of artistic intent and audience
experience. Particular attention is given to the role of
sound engineers as active mediators whose decisions shape
spatial form, listening perspective, and the relationship
between sound and architecture. The findings suggest that
immersive concert formats redistribute creative agency
across artists, technicians, and technological
infrastructures, and point toward the need for revised
conceptual frameworks for authorship, performance, and
listening in large-scale spatial audio environments.
Authors
Kasper Fangel Skov

Assistant Professor, PhD, Sonic College (UC SYD)
Friday May 29, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Designing Music Spaces in Educational Buildings: Challenges and Considerations
Friday May 29, 2026 10:00am - 10:30am CEST
The acoustic design of music rooms is well supported by
existing guidance covering recording spaces,
practice rooms, green rooms, and large-scale performance
environments. However, the direct application of these
standards to high school and college buildings is often
constrained by limitations in budget, space, client
requirements, and construction timelines. As a result,
educational music spaces present various design challenges
that require specially considered solutions. This paper
examines key architectural and acoustic issues for music
teaching and performance spaces in high schools, including
wall performance between non-compatible spaces, limited
room volumes, and other acoustic challenges such as
interconnecting doors and windows between the spaces. A
case study of a good design implemented at a large school
project is presented to demonstrate how strategic planning
and interdisciplinary coordination can result in
high-quality, functional, and acoustically successful
learning environments. It is highlighted that the
collaboration between the design team and acoustic
consultants was key to resolving the major project
challenges and achieving the best possible performance results
across all spaces.
Authors
Elena Prokofieva

Edinburgh Napier University
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Detecting Bandwidth Variation Artifacts in Perceptual Audio Coding
Friday May 29, 2026 10:00am - 10:30am CEST
Accurate identification of audio coding artifacts is
instrumental in encoder design, audio post-processing, and
perceptual quality assessment. This paper addresses the
detection of artifacts arising from changes in the
effective bandwidth of coded audio signals caused by coarse
spectral quantization. Such bandwidth variations give rise
to two prominent artifact types: bandwidth limitation (BL)
and birdies, also referred to as spectral islands (SI).
Blind detection methods, requiring no reference signal, are
presented for both artifact types. Bandwidth limitation
is detected by analyzing variations in the zero-crossing
count across time-domain subband signals, enabling
estimation of both fixed and time-varying cutoff
frequencies. Spectral islands are identified through
analysis of the spectrogram by detecting clusters of
isolated components in the time–frequency domain,
characterized by their temporal and spectral extents. The
proposed methods are evaluated using audio material from
the ODAQ and USAC verification datasets. Results show that
the BL detection method achieves an average bandwidth
estimation error of approximately 160 Hz and demonstrates
robustness to noisy bandwidth-limited signals. In addition,
the detected birdie artifacts are perceptually validated
through listening tests, indicating an improvement in
perceived quality following detection and subsequent
suppression of the birdie artifacts.
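To give some intuition for why zero-crossing counts carry bandwidth information, the sketch below counts sign changes in a signal and converts the count to a rough frequency estimate (a tone at f Hz crosses zero about 2f times per second). It is not the authors' detector: the subband filter bank, the analysis of count variations, and the cutoff estimation are omitted, and the helper names and test tones are assumptions for illustration.

```python
import math


def zero_crossing_count(samples):
    """Count sign changes between consecutive samples."""
    count = 0
    for previous, current in zip(samples, samples[1:]):
        if (previous < 0) != (current < 0):
            count += 1
    return count


def frequency_from_crossings(samples, sample_rate_hz):
    """Rough dominant-frequency estimate: crossings / (2 * duration)."""
    duration_s = len(samples) / sample_rate_hz
    return zero_crossing_count(samples) / (2.0 * duration_s)


def sine(frequency_hz, sample_rate_hz, num_samples, phase=0.1):
    """Hypothetical test tone; the small phase offset avoids exact-zero samples."""
    return [
        math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz + phase)
        for n in range(num_samples)
    ]


low_tone = sine(1000.0, 48000, 48000)
high_tone = sine(4000.0, 48000, 48000)
print(zero_crossing_count(low_tone), zero_crossing_count(high_tone))
```

A subband that lies above the effective cutoff of a coded signal contains little or no content of its own, so its crossing statistics differ from those of an occupied band; the abstract's method builds on that kind of contrast.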
Authors
Andreas Niedermeier

Fraunhofer IIS, Erlangen

Bernd Edler

International Audio Laboratories Erlangen, Germany
Dipanjan Datta Roy

International Audio Labs, Erlangen
Sascha Dick

Fraunhofer IIS, Fraunhofer IIS, Erlangen
Germany
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

10:00am CEST

Predicting Sonic Atmospheres - Expectation and Attunement
Friday May 29, 2026 10:00am - 10:30am CEST
Soundscapes and sonic atmospheres are often approached as
environmental conditions perceived and evaluated through
their acoustic properties and affective qualities. Recent
predictive and inferential accounts of perception, however,
suggest a different understanding: that perception operates
as an anticipatory process in which sensory input is
primarily used to minimise error in an ongoing predictive
model of the world, rather than to construct experience
from the bottom up. From this perspective, auditory
perception is an active, temporally extended process shaped
by expectation, memory, attention, and action.

This paper explores what such a predictive understanding
contributes to the study of everyday sonic atmospheres.
Drawing on predictive processing as a conceptual
framework—while acknowledging its contested status—the
paper situates auditory perception alongside other sensory
modalities as part of a broader inferential engagement with
environments. Classical auditory phenomena and longer-term
perceptual “illusions” motivate this reframing by
highlighting how expectations shape experience across
multiple timescales.
The main analytical focus is the case of transitioning from
one atmosphere to another. Atmospheres are approached here
as multimodal, quasi-objective phenomena that do not reside
in sound, space, or subjects alone, but emerge through
shared, situated engagement. Transitions foreground this
process by exposing how expectations, attentional
strategies, and perceptual norms are recalibrated over
time. From a predictive perspective, atmospheres are
constituted through collective anticipatory activity, in
which agents continuously negotiate environmental cues and
affordances across sensory modalities. Attunement is thus
understood as a temporally extended, socially coordinated
process shaped by prior experience and anticipated action.
By analysing atmospheric transitions through a predictive
lens, the paper argues that sonic atmospheres can be
understood as dynamically constituted and reconfigurable
achievements. This reframing challenges object-centred or
purely subjective accounts of atmospheres and opens new
ways of thinking about how sonic environments are shaped,
staged, and transformed in everyday life.
Authors
Jonas Kirkegaard

Lecturer & Internship coordinator, UC SYD
BIO: Jonas R. Kirkegaard (1982) is a Danish sound artist, composer and sound designer working in the field of interaction design, sound installations, multi channel composition and designing “place specific” atmospheres through sound. Upon replacing nano science with music studies back in 2005, he now...
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Introduction to Active Acoustics Systems in Recording Studios
Friday May 29, 2026 10:00am - 11:00am CEST
Virtual acoustics and active acoustic systems are
increasingly used in architectural acoustics to extend the
acoustic response of performance spaces. While these
technologies have traditionally been associated with
concert halls, theaters, and multipurpose venues, their
application has recently expanded to more controlled
environments such as recording studios and music production
spaces.

This tutorial introduces the fundamental principles of
virtual acoustics implemented through active acoustic
systems, starting from their role in architectural
acoustics and room acoustics enhancement. Basic concepts
such as room impulse responses, acoustical parameters,
system architectures, and feedback control strategies are
presented at an introductory level, with emphasis on common
practices and practical limitations. The discussion then
progressively narrows to the specific case of recording
studios, where virtual acoustics are used not only to
simulate performance spaces, but also to influence musical
performance, comfort, and interaction during recording
sessions, including the use of immersive microphone
techniques.

Through practical examples and listening demonstrations
developed at the Immersive Media Laboratory in the
Department of Music Research of McGill University, the
tutorial illustrates how different virtual acoustic
conditions can be designed and applied in studio contexts,
highlighting their perceptual effects and implications for
musicians, recording engineers, and producers. The tutorial
aims to provide attendees with a clear conceptual framework
and practical insight into how virtual acoustics and active
sound reinforcement systems can be effectively employed
across architectural and studio applications, preparing the
audience for more advanced technical discussions on these
topics.
Speakers
Gianluca Grazioli

Montreal, Canada, McGill University
Friday May 29, 2026 10:00am - 11:00am CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

10:00am CEST

TC-MSQ : AES Technical Committee on "MEASUREMENT & SOUND QUALITY"
Friday May 29, 2026 10:00am - 11:00am CEST
AES Technical Committee on "MEASUREMENT & SOUND QUALITY"



The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole.  Connect and engage!
Speakers
Friday May 29, 2026 10:00am - 11:00am CEST
Aud 93 Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Student Mix Critiques 2
Friday May 29, 2026 10:00am - 11:00am CEST
These sessions are an opportunity for AES student members
to receive feedback on their mixes from a panel of industry
professionals, in a live, non-competitive setting. Join us
to hear mixes by other students, and get tips, tricks, and
advice to push your skills to the next level! Mixes can be
submitted in advance by following the instructions
posted at:
https://www.aesstudents.org/competitions/student-mix-critiques/
Very limited on-site submission may also be possible.
Maybe one of your mixes can be featured!
Speakers
Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...
Friday May 29, 2026 10:00am - 11:00am CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Spatial Estimation of Room Acoustic Parameters using Sound Field Reconstruction Methods
Friday May 29, 2026 10:30am - 11:00am CEST
The acoustic characterisation of indoor spaces is crucial
for a wide range of applications. While global metrics
provide convenient descriptors of a room's overall
behaviour, a more spatially detailed analysis offers deeper
insight into the spatio-temporal structure of the sound
field, albeit at a higher experimental cost. This paper
proposes a methodology that leverages the predictive
capabilities of sound field reconstruction methods to
estimate room acoustic parameters as a function of
position. The approach is experimentally evaluated in an
auditorium, where it achieves accurate estimation of
temporal and energetic room acoustic parameters across the
entire audience area. In addition, the reconstructed field
yields higher intelligibility indices compared to the raw
measurements. Overall, these results highlight the
potential of sound field reconstruction techniques as a
practical tool for room acoustic characterisation and for
supporting assistive listening technologies.
Authors
Antonio Figueroa-Duran

Universidad Politécnica de Madrid
Efren Fernandez-Grande

Universidad Politécnica de Madrid
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Lossless Audio Coding revisited
Friday May 29, 2026 10:30am - 11:00am CEST
MPEG-4 SLS (scalable lossless coding) was published more
than 20 years ago. In the meantime, several tools to improve
coding efficiency and flexibility have been invented.
Currently, in MPEG WG6 (audio coding) there are two
standardization activities on lossless audio coding: Audio
Coding for Machines (ACoM) and Biomedical and general
waveform signal coding (BWC).
ACoM phase 1 originally was targeted only towards lossless
storage formats for training of machine listening schemes,
but additional use cases like “user generated content
analysis”, “live stream content analysis”, and “artistic
creation” have been added. The focus was extended to the
transmission of audio data from microphone (arrays) to
central processing units.
BWC is a joint activity with ITU-T SG21. While ACoM started
with a large number of use cases and includes the
specification of a rich set of metadata, BWC started with a
focus on medical data like electroencephalogram (EEG) and
electrocardiogram (ECG). However, BWC can be used for audio
signals, too, and medical data coding is on the list of use
cases for ACoM.
The call for proposals (CfP) for ACoM was completed in
January 2025. Two proposals, both outperforming MPEG-4 SLS,
were submitted. Both proposals reused and optimized
core codecs from BWC. Currently, MPEG audio is investigating
how the ACoM proposals can be merged into BWC. This merge
process must be completed by the end of April 2026.
The presentation will give details about ACoM use cases,
the ACoM CfP process, the results of the CfP, and results
from the merge process.
Authors
Thomas Sporer

Deputy Director IDMT / Convenor MPEG audio, Fraunhofer IDMT
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

The cognition of sound in museums: Toward a spectrum of meanings
Friday May 29, 2026 10:30am - 11:00am CEST
This presentation develops a conceptual framework for
understanding how visitors cognize sound in museum
exhibitions. While sound increasingly features in museum
practice, research has focused primarily on measuring
visitor enjoyment and engagement rather than examining the
specific meanings sound generates. This gap reflects the
absence of a framework conceptualizing sound's
meaning-making capacities to guide empirical investigation.
Drawing on scholarship from music studies, semiotics,
phenomenology, and embodied cognition, I propose a
seven-component spectrum identifying distinct yet
interrelated meanings that sound can convey in museums:
aesthetic, representational, emotional, sensorial,
imaginative, social, and political. These meanings can be
apprehended independently or in combination, typically
through emergent, pre-conscious perception rather than
deliberate awareness.
The spectrum builds on the premise that museum sound
meaning-making unfolds through dynamics internalized from
early childhood as we attune to the world sonically. It
draws on the notion of sound as a "sonic aggregate"
(Grimshaw and Garner 2015)—encompassing social, contextual,
temporal, and embodied experiences—rather than reducing
sound to wave phenomena. Visitors actively co-produce
meanings by drawing on their moods, memories, knowledge,
and imagination during exhibition encounters.
Each meaning category is illustrated with exhibition case
studies, demonstrating the spectrum's applicability across
diverse sound-based multimodal museum practices—from
popular music exhibitions to sound art installations. The
spectrum aims to catalyze research through varied
methodological approaches and establish analytical
standards for studying sound in museums, with potential
adoption by international standardization bodies.
Authors
Alcina Cortez

Sound Studies Researcher, INET-md | NOVA University Lisbon
A PhD in ethnomusicology and museum studies and a curator, I am committed to exploring the diverse meaning-making capabilities of sound when exhibited in museums, encompassing the representational, emotional, sensorial, and social, as well as its ability to foster imagination and...
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Credibilitizing in Immersive Audio: Impact Panel
Friday May 29, 2026 10:30am - 12:00pm CEST
Join leaders in immersive audio who work with affinity
groups: women and minorities who are building social
capital in audio in unexpected ways. Following the
"Credibilitizing" framework proposed by Dr Leslie
Gaston-Bird, the panel discusses how role models,
networking, and the right kind of mentoring opportunities
are helping people from underrepresented groups upskill,
innovate, and find new purpose and career pathways in
immersive audio.
Speakers
Rebekah Wilson

CEO, Source Elements
As a New Zealand-native living in the northern hemisphere, Rebekah understands very well how important it is to be connected even when separated by oceans; there is no excuse for creativity to suffer just because of where we are in the world. Her background as composer, software and...
Aline Bruijns

CEO & Sound designer, AudioRally Sounddesign
Aline has a never ending drive to create sound design that enriches a soundtrack and that elevates the story. As a jazz vocalist (and multi-instrumentalist), she obtained her Bachelor’s Degree at the Conservatory in Enschede (the Netherlands) in 2004. She continued at the HKU (school...
Leslie Gaston-Bird

Audio Engineering Society, City St George's University of London
Dr. Leslie Gaston-Bird (AMPS, MPSE) is Past President of the Audio Engineering Society and author of the books "Women in Audio", part of the AES Presents series and published by Focal Press (Routledge); and Math Fundamentals for Audio (A-R Editions). She is a voting member of the...
Agnieszka Roginska

Professor of Music Technology, New York University
Professor of Music Technology
Banu Sahin

Founder | Technical Consultant & Application Engineer, Sonostein
Banu Sahin has been involved in a wide range of projects from small-scale venues to large open-air events with a focus on immersive audio applications. She supported various artists and productions through spatial audio system design and operation. She received her M.Sc. in Music...
Phebean Oluwagbemi

Audio Girl Africa
Friday May 29, 2026 10:30am - 12:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Immersive Audio, Panel
  • Presentation Type Panel

10:30am CEST

What Makes Binaural Audio Truly Immersive?
Friday May 29, 2026 10:30am - 12:00pm CEST
Binaural audio delivered over headphones is currently the
most widely used method for experiencing immersive sound.
However, significant challenges remain in achieving the
highest possible quality of immersive listening through
current binaural delivery systems. Numerous technical
factors influence the listening experience, including the
personalisation of head-related transfer functions (HRTFs),
room simulation, head tracking, headphone type,
equalisation, etc. Beyond these technical considerations,
however, content types, production practices, and user
expectations and preferences also play critical roles in
shaping perceptual outcomes. Binaural audio is often
treated primarily as a post-processing technique for
simulating loudspeaker reproduction. Yet binaural
approaches also offer substantial creative opportunities
when considered from the earliest stages of content
production. This panel workshop will discuss these topics,
bringing together perspectives from creative practice,
research and development. The session will explore how
immersive listening experiences can be more effectively
designed, produced, and evaluated using binaural audio
technologies.
Speakers
avatar for Hyunkook Lee

Hyunkook Lee

Professor, University of Huddersfield
Professor
avatar for Stefan Bock

Stefan Bock

Managing Director, msm-studios GmbH
Stefan Bock, born 20.08.1964 in southern Germany was starting his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991 where he was the Chief Mastering Engineer and General Manager.

He was leading msm-studios t... Read More →
avatar for Katarzyna Sochaczewska

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York

avatar for Morten Lindberg

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive... Read More →

Chris Berens

Artist and Industry Relations, Audeze
Brand ambassador for Audeze, I love all aspects of audio production and engineering, especially immersive audio!

Friday May 29, 2026 10:30am - 12:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Acoustic and Perceptual Consequences of Time Misalignments in Line Array Speakers
Friday May 29, 2026 11:00am - 11:30am CEST
Variable‑curvature line arrays achieve their intended directivity and spectral balance through phase‑coherent summation across cabinets. Even small timing disparities between elements perturb the interference patterns that shape the array response, with consequences for both spatial coverage and timbre. In this work we quantify these effects end‑to‑end. Using simulations for a typical 12‑element array, we examine how inter‑element delays modify the frequency response across an audience area. We then apply an auditory coloration model to predict the perceived impact of those modifications and validate the predictions through controlled listening tests. We observe that delays of a few dozen microseconds generate pronounced spectral coloration that listeners consistently judge as degraded quality, whereas coloration becomes detectable at delays on the order of one microsecond. These results translate into synchronization accuracy targets for high‑fidelity line‑array deployments.
Authors

Nicolas Epain

Application Research Engineer, L-Acoustics
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Experimental and Numerical Design of Vibroacoustic Metamaterials for Guitar Soundboard Resonance Control
Friday May 29, 2026 11:00am - 11:30am CEST
Metamaterials are engineered structures whose acoustic and
mechanical behavior arises from their geometric
configuration and internal architecture rather than their
material properties. Within this group, vibroacoustic
metamaterials offer the ability to influence elastic wave
propagation by introducing frequency bands in which
flexural vibrations are either suppressed or selectively
altered. The integration of such structures into musical
instruments, particularly acoustic guitars, provides a
promising approach to shaping their vibroacoustic response
and mitigating undesirable structural resonances.
The objective of this project is to design a vibroacoustic
metamaterial capable of modifying the resonance properties
of an acoustic guitar soundboard. For this purpose,
vibration measurements with a Laser Doppler Vibrometer were
conducted to identify the fundamental resonant modes of the
soundboard. Based on these measurements, a coupled
structural-acoustic numerical model was developed using
COMSOL Multiphysics and subsequently calibrated with the
experimental data. In the following phase, various
vibroacoustic metamaterial configurations were designed,
and their influence on the resonance characteristics of the
soundboard was investigated. The most effective resonator
design was fabricated using 3D printing, and its performance
was experimentally evaluated.
The anticipated outcome of this research is the development
of an effective method for tailoring and enhancing the
tonal response of an acoustic guitar without modifying its
conventional construction, thereby contributing to new
design strategies for stringed musical instruments.
Authors

Aleksandra Sawczuk

AGH University of Krakow

Klara Chojnacka

Department of Mechanics and Vibroacoustics, AGH University of Cracow, Poland
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Technologies of Everyday Life: “Yupiter” and the Formation of a Personal Acoustic Environment in the Ukrainian SSR
Friday May 29, 2026 11:00am - 11:30am CEST
This article examines open-reel tape recorders marketed
under the “Yupiter” brand as a key technology of everyday
life in late Soviet Ukraine and as a material foundation
for the formation of a personal acoustic environment in the
Ukrainian SSR. The study aims to reconstruct the
“biography” of the device, including its design, serial
production ramp-up, distribution, and use. It shows how the
institutional constraints of a planned economy and
defense-sector priorities were translated into domestic
regimes of listening and recording. Methodologically, the
article combines approaches from sound studies, the history
of technology, and the history of everyday life,
supplemented by concepts of the “domestication” of
technology, DIY culture, and “phonographic labor.” The
source base includes internal documents of the Kyiv
“Kommunist” plant (annual reports, explanatory memoranda,
plans, and quality-related materials for 1968-1976),
interdepartmental reviews and programmatic materials of the
sector, technical handbooks and instructions, as well as
oral interviews with users. Bringing together the “upper”
level of managerial reporting and the “lower” level of user
experience makes it possible to identify a gap between
quality as a planning category and quality as a daily
practice: repairability, shortages of parts and tape,
re-recording, and selective choice of media were more the
norm than the exception. The article demonstrates that the
“fine-tuning” of tape recorders became institutionalized
through networks of amateur knowledge and informal service,
while fluctuating availability (shortage and overstock)
shaped the social geography of purchase. Ultimately,
“Yupiter” emerges not as a symbol of progress or nostalgia,
but as a material trace of late-socialist modernization -
one that helps integrate the Ukrainian case into
international debates on media materiality, listening, and
the politics of audibility. Particular attention is paid to
the temporality of the object: the extension of “Yupiter’s”
normative life cycle through repair and re-recording, as
well as its “outliving” of the Soviet system in the 1990s.
This makes it possible to interpret the tape recorder as a
carrier of acoustic memory and an indicator of social
hierarchies of access to technology. The findings refine
the understanding of shortage not as mere lack, but as an
everyday regime in the life of things.
Authors

Rostyslav Konta

Professor, Taras Shevchenko National University of Kyiv
Professor at Taras Shevchenko National University of Kyiv (Ukraine) working in the fields of cultural anthropology, ethnology, history of science and technology, and sound studies. My research focuses on everyday life, music, media, and technology in Eastern Europe, especially in... Read More →
Friday May 29, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Li Dakang: 3D Masterclass
Friday May 29, 2026 11:00am - 12:00pm CEST
Prof. Li Dakang is a preeminent recording engineer and
pioneer of 3D recording of Chinese traditional music,
ancient instruments and spaces. Attendees are treated to a
selection of unique 3D recordings, including a new and
glorious version of China’s National Anthem. Prof. Li
describes the LDK-Cube for capturing the envelopment of an
acoustic space, and questions reliable reproduction of this
important quality.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best; as we
discuss qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, detailing also
this specific listening room, regarding ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.
Speakers

Hanying Feng

China National Director, Communications University of China

Thomas Lund

Genelec Oy, Genelec Oy
Denmark
Friday May 29, 2026 11:00am - 12:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

11:00am CEST

TC-SA : AES Technical Committee on "SPATIAL AUDIO"
Friday May 29, 2026 11:00am - 12:00pm CEST
AES Technical Committee on "SPATIAL AUDIO"



The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole.  Connect and engage!
Speakers
Friday May 29, 2026 11:00am - 12:00pm CEST
Aud 93 Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Obsidian Neural: Open-Source VST3 for Real-Time Generative AI – Architecting the AI as a Live Performance Instrument
Friday May 29, 2026 11:00am - 12:00pm CEST
Obsidian Neural is a novel, open-source VST3 plugin that
addresses the technical challenges of integrating
generative AI models directly into a low-latency digital
audio workstation (DAW) environment. This workshop will
provide a deep dive into the architecture designed to use
AI as a real-time performance instrument. We will cover the
C++/DSP strategies necessary for minimizing latency during
the asynchronous generation of audio loops via models like
Stable Audio Open. Crucially, we will detail the system's
ability to maintain musical coherence during a live mix,
achieved through an internal LLM "Brain" that processes
contextual session data (BPM, key, existing tracks) to
enrich generation prompts. Furthermore, we will explore the
technical solutions implemented for seamless integration
with the live mixing paradigm: quantized MIDI triggering,
multi-output routing, and the novel "Draw-to-Sound"
feature, which employs a Vision Language Model (VLM) to
translate visual input into musical parameters. This work
demonstrates a robust framework for generative AI to
function as an instantaneous, adaptable partner within
professional audio engineering workflows.
Speakers

Anthony Charretier

Independent Developer
Friday May 29, 2026 11:00am - 12:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

11:30am CEST

Qualifying Timing Errors in Audio-over-Ethernet Networks for Live Sound
Friday May 29, 2026 11:30am - 12:00pm CEST
Audio-over-Ethernet (AoE) protocols have become fundamental
in modern live sound reinforcement systems, yet their
real-world synchronization behavior under diverse stress
conditions, both in terms of load and configuration, is not
accurately characterized. Microsecond-scale timing
mismatches between amplifier outputs can disrupt line-array
interference patterns, reducing directivity control and
spectral consistency. Ensuring robust timing accuracy
across large, mixed-traffic network topologies is therefore
critical for predictable system performance.
This paper presents a comprehensive, application-oriented
evaluation of Dante, AES67 and Milan-AVB. A representative
multi-hop architecture typical of touring deployments has
been considered. A controlled measurement campaign,
combining eight daisy-chained switches, heavy concurrent
data traffic approaching link saturation, and sub-sampled
latency tracking, assesses each protocol under ideal
conditions, typical field situations, and common
misconfigurations.
Results reveal clear performance distinctions. Dante
exhibits substantial timing variations, exceeding
100 µs under load. AES67 provides tighter
synchronization but remains vulnerable to configuration
errors, which can induce latency drift or even audio packet
loss. Milan-AVB consistently maintains sub-microsecond
accuracy across all scenarios.
Authors

Benjamin Duval

L-Acoustics

Genio Kronauer

Executive Director of Electronics & Networks Technologies, L-Acoustics
Executive Director of Electronics & Networks Technologies

Nicolas Epain

Application Research Engineer, L-Acoustics
Friday May 29, 2026 11:30am - 12:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:00pm CEST

Saul Walker Student Design Competition
Friday May 29, 2026 12:00pm - 1:30pm CEST
The Saul Walker Student Design Competition is a long-running event of the Audio Engineering Society that highlights practical and creative work in audio design. It brings together experienced judges and a wide range of strong student submissions each year.

During this session, students from around the world will present their projects and bring their hardware designs for hands-on inspection by the judges. The format encourages open discussion, giving attendees a chance to hear how ideas are evaluated and improved in a professional setting.

Sponsored by API, the competition includes cash prizes for the winners. More importantly, it offers students valuable feedback and the opportunity to connect with people working in the industry. The session is open to everyone—students and non-students alike—who are interested in seeing what participants have created and learning more about current work in audio design.
Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti... Read More →

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio... Read More →

Ewa Łukasik

Poznan University of Technology, Institute of Computing Science
Authors

Sascha Disch

Fraunhofer IIS, Fraunhofer IIS
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual... Read More →
Friday May 29, 2026 12:00pm - 1:30pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Spatial Quality Measure for Mixed-phase Impulse Response Equalization
Friday May 29, 2026 12:30pm - 1:00pm CEST
Mixed-phase impulse response equalization can improve
magnitude and phase response, but conventional objectives
such as mean-squared error (MSE) can favor solutions that
introduce objectionable temporal artifacts, including
pre-echo and extended post-echo ringing. This paper
proposes a Spatial Equalization Quality Measure (SEQM) to
select a mixed-phase equalization filter that better
controls these artifacts while remaining computationally
simple and applicable across multiple listening positions.
SEQM combines (i) a temporal-domain metric that penalizes
energy preceding the main pulse of an impulse response and
energy persisting after it, while also accounting for the
decay rate of the post-response tail, with (ii) a spatial
aggregation rule that summarizes quality across measurement
positions. We use SEQM to select the modeling delay for
mixed-phase finite-impulse-response (FIR) equalization and
to compare mixed-phase FIR designs with minimum-phase FIR
and IIR alternatives under a common multi-position
measurement framework. Experiments using semi-anechoic
measurements across 34 spatial positions for two
loudspeakers show that SEQM consistently selects
substantially shorter delays than MSE-based selection and
yields impulse responses with reduced pre-echo and faster
post-response decay, while maintaining comparable
frequency-response equalization. These results suggest that
SEQM is a practical objective tool for designing
multi-position mixed-phase equalization filters.
Authors

Bill Decanio

Samsung Electronics

Sunil Bharitkar

Samsung Research America

Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

12:30pm CEST

Perceptual Evaluation of the Open Binaural Renderer
Friday May 29, 2026 12:30pm - 1:00pm CEST
This paper presents the perceptual evaluation of the Open Binaural Renderer (OBR), an open-source library developed for headphone-based rendering of Immersive Audio Model and Formats (IAMF) content. The evaluation followed an iterative framework in which findings from a pilot listening study informed the tuning of rendering profiles, and the resulting renderer was benchmarked against established proprietary solutions. In the pilot study, 19 expert listeners rated the Overall Listening Experience (OLE) of the initial prototype (OBRv1) and five external renderers across diverse audio content. Qualitative feedback was analysed using inductive coding to identify salient perceptual dimensions. The pilot revealed content-dependent performance and showed that a single default profile was inadequate, yielding mixed responses in both the numerical scale and the qualitative feedback, and motivating the development of multiple rendering profiles in OBRv2. The main study evaluated two OBRv2 profiles targeting different reverberation characteristics (Direct and Ambient) alongside three top-performing external renderers. A total of 39 participants, divided into expert and non-expert groups, rated five perceptual attributes: Voice Quality, Envelopment, Externalisation, Overall Listening Experience, and Timbral Balance. Mixed-design ANOVA revealed significant main effects of renderer condition on all attributes. Pairwise comparisons showed that the OBRv2 Ambient profile achieved significantly higher OLE ratings than one proprietary renderer and reached statistical parity with the remaining two, representing a measurable improvement over the prototype. A trade-off between Voice Quality and Externalisation was observed, driven by the level of reverberation in each renderer. The results demonstrate that iterative, perceptually informed tuning can yield competitive binaural rendering quality in an open-source framework.
Authors

Felicia Lim

Google LLC

Gavin Kearney

Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January... Read More →

Jan Skoglund

Google, Google


Jani Huoponen

Google, Google LLC
With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives... Read More →

Katarzyna Sochaczewska

Immersive Music Producer, Researcher, University of York


Tomasz Rudzki

University of York
Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Evaluation of Objective Speech Intelligibility Metrics for Hearing-Aid Users in Multi-Talker Spatial Environments
Friday May 29, 2026 12:30pm - 1:00pm CEST
Despite the growing number of hearing-impaired workers
wearing hearing aids in occupational settings,
understanding speech in multi-talker situations remains
challenging. This difficulty is particularly pronounced in
open-plan offices, where simultaneous talkers and room
reverberation are prone to degrade speech intelligibility.
While spatial cues are essential for segregating target
speech from competing sources, hearing-aid signal
processing may alter binaural information that supports
spatial hearing.
Accurate evaluation of hearing-aid performance is
therefore crucial. Objective speech intelligibility metrics
offer an efficient alternative to time-consuming listening
tests; however, their validity in complex spatial scenarios
involving hearing-impaired listeners remains unclear.
Monaural metrics such as HASPI account for individual
hearing loss but neglect spatial information, whereas
binaural metrics such as MBSTOI incorporate spatial cues
but are primarily designed for normal-hearing listeners.
This study evaluates the ability of existing objective
metrics to predict speech intelligibility for hearing-aid
users in multi-talker spatial environments. Listening tests
are conducted with 20 hearing-impaired participants fitted
with binaural hearing aids. Four types of multi-talker
auditory scenes representative of open-plan offices are
reproduced using a loudspeaker array. They involve a target
speech, combined with diffuse noise and a localized
competing speech source. Objective measurements are
performed using an acoustic mannequin fitted with the
participants’ hearing aids. HASPI and MBSTOI values are
computed from the binaural signals recorded at the eardrums
and incorporating individual hearing losses.
Objective predictions are compared with subjective
intelligibility scores, and an ablation analysis is
conducted to distinguish the effects of hearing loss
modeling from those of binaural processing.
Authors

Jean-Pierre Arz

INRS (Vandoeuvre lès Nancy) - Institut national de recherche et de sécurité (Vandoeuvre lès Nancy)

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée

Louis Delebecque

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

12:30pm CEST

Morten Lindberg: 3D Masterclass
Friday May 29, 2026 12:30pm - 1:30pm CEST
Morten describes his excellent recording techniques, and
attendees are treated to
a unique selection of high resolution 3D music listening
examples.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best; as we
discuss qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, detailing also
this specific listening room, regarding ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Speakers

Thomas Lund

Genelec Oy, Genelec Oy
Denmark

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive... Read More →
Friday May 29, 2026 12:30pm - 1:30pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Innovative Measurement of Speech Intelligibility – Applications of Listening Effort in Research & Practice
Friday May 29, 2026 12:30pm - 2:00pm CEST
Speech intelligibility is a key factor in successful
communication across various domains, including research,
post-production for film and television, live sound
reinforcement, and audio production. Traditional assessment
methods often lack objectivity or fail to capture the
listener’s experience in real-world scenarios. In this
workshop, we introduce an innovative approach to measuring
speech intelligibility based on the concept of “Listening
Effort.” We will present the underlying technology, share
practical examples from different application areas, and
demonstrate how this method can be integrated into
workflows to optimize intelligibility. Attendees will have
the opportunity to participate in a hands-on demonstration
and discuss potential use cases relevant to their own work.
This session is designed for professionals and researchers
seeking reliable and actionable tools for evaluating and
improving speech intelligibility in diverse environments.
In this workshop, we present a new technology for measuring
speech intelligibility (“Listening Effort”). The method is
used in research, post-production (film/TV), live sound,
and audio production. The session is aimed at professionals
from both academia and industry who are interested in
objectively assessing and optimizing speech intelligibility.

Participants will be able to join a short demo/exercise and
ask questions.

Introduction & Relevance: Overview of the importance of
speech intelligibility across different fields
Technology & Methodology: Presentation of the measurement
method and underlying concepts
Practical Examples: Case studies from research,
post-production (film/TV), live sound, and production
Live Demo / Interactive Exercise: Practical demonstration
and opportunity for active participation
Discussion & Outlook: Q&A, exchange of ideas, and future
perspectives
Speakers

Hannah Baumgartner

Fraunhofer IDMT

Jan Rennies-Hochmuth

Fraunhofer IDMT
Friday May 29, 2026 12:30pm - 2:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Systematization of Multiplier-less Convolution for 1-bit Audio Signal
Friday May 29, 2026 1:00pm - 1:30pm CEST
High-speed 1-bit signals generated by oversampling are
widely used in audio applications as they allow simple
demodulation via low-pass filtering while preserving
in-band spectral characteristics with high accuracy.
However, conventional FIR filtering of such signals
generally requires conversion to a multi-bit representation
at a common sampling frequency, which increases
computational cost and complicates the overall processing
flow. This paper addresses the convolution of high-speed
1-bit audio signals with multi-bit FIR impulse responses
and presents a systematic formulation of a multiplier-less
convolution approach. Based on a mathematical
reinterpretation of convolution, the proposed formulation
describes how time shifting and amplitude weighting can be
expressed through structured rearranging of 1-bit samples
without arithmetic operations. This provides a theoretical
description of previously reported 1-bit convolution
methods; however, its validity has not been fully
formalized. We examine the spectral characteristics of the
proposed convolution method and compare them with those
obtained by multi-bit convolution followed by ΔΣ
modulation. Experiments are conducted by convolving 1-bit
input signals with FIR filters having multi-band frequency
responses. Spectral analysis shows that the proposed method
achieves extremely high agreement with the standard
approach within the audible band while the differences
appear primarily at much higher frequencies outside the
audible range. These results demonstrate that convolution
of high-speed 1-bit audio signals can be achieved without
multipliers, suggesting the potential for highly efficient
hardware-oriented signal processing architectures.
Authors

Iori Sakurai

Waseda University

Tomohiro Sakaguchi

Doctoral student, Waseda University

Yasuhiro Oikawa

Waseda University


Yuta Gomi

Waseda University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

1:00pm CEST

Gaussian Splatting-Based Head and Pinna Reconstruction for Individualized HRTF Computation from Commodity Multi-View Images
Friday May 29, 2026 1:00pm - 1:30pm CEST
Individualized head-related transfer functions (HRTFs)
require accurate pinna geometry, yet commodity multi-view
captures leave the ear region self-occluded and weakly
textured. We present a practical pipeline that couples
ear-centric acquisition with 3D Gaussian splatting (3DGS)
and the boundary element method (BEM) for complete HRTF
computation. The protocol augments horizontal views with
per-ear elevated captures under directional lighting; 3DGS
training with depth-distortion regularization yields
watertight meshes via truncated signed distance function
(TSDF) fusion. Standardized head coordinates and ear-canal
annotations interface the mesh with BEM. Experimental
evaluations demonstrate that our method achieves lower
ear-region geometric error and lower full-band spectral
distortion compared to existing image-based personalized
reconstruction baselines including AudioEar, NeuS, and
Metashape MVS.
Authors

Houlin Zhu

Peking University

Tianshu Qu

Peking University

Xihong Wu

Peking University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Assessing Situational Awareness of Hearing-Impaired People Through their Perception of Non-Speech Sound Events: a Literature Review
Friday May 29, 2026 1:00pm - 1:30pm CEST
Situational awareness is a multisensory ability that
enables individuals to perceive and appropriately take into
account their immediate environment. This perception of the
world through our senses is carried out continuously and
unconsciously throughout the day. When auditory perception
is degraded, an individual may no longer correctly perceive
a doorbell, a water leak, or an alarm signal, which
negatively affects quality of life and may lead to
dangerous situations. Auditory perception can in particular
be degraded by hearing loss, a common and widespread
condition. The most common treatment consists of wearing
hearing aids, which are mainly designed to improve speech
intelligibility, especially in noisy environments. Feedback
from hearing-impaired people and hearing-aid users
indicates that, although auditory situational awareness has
been recognised as an essential component of well-being, it
remains insufficiently studied and requires further
investigation. There is currently no standard method for
assessing the extent to which one's situational awareness is
affected by hearing impairment and the use of hearing aids.
This is a complex process that requires assessing the
perception of relevant sound events within a continuous
stream of multisensorial information, by individuals who
have different subjective preferences. Most existing
methods are limited to evaluating only a subset of the
problem, such as identification and localisation of
non-speech sound events. The rise of new technologies, such
as virtual reality, enables the development of assessment
methods within more realistic yet controlled environments.
This study aims to review existing methods in order to
highlight their limitations in addressing the issue at hand.
Authors

Adil Faiz

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France

Balbine Maillou

Université de Lorraine, CNRS, LEMTA, F-54000 Nancy, France


Emma Granier

Université de Lorraine, CNRS, Inria, Loria

Joël Ducourneau

LEMTA - Laboratoire d'Energétique et Mécanique Théorique et Appliquée

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Perception, Lecture

1:00pm CEST

Geometry Sensitivity in Low-Count Virtual Microphone Arrays: From Tetrahedral Baselines to Stochastic Spherical Layouts
Friday May 29, 2026 1:00pm - 3:00pm CEST
Virtual Microphone Array techniques are being investigated
by the authors to support room acoustics optimisation in
live sound environments. In our recent AES paper, “Room
Acoustics Optimisation Using Virtual Microphone Arrays”, a
notable outcome was that a compact four-microphone
tetrahedral array performed strongly relative to its low
sensor count. Recent virtual sensing and Remote Microphone
Technique research treats microphone placement as an
explicit design variable. It reports improved remote
estimation performance when microphone layouts are
deliberately chosen for the task, rather than adopted as
fixed, standard configurations.
This submission builds on our prior VMA work by focusing on
the four-microphone case, where geometry choices are
especially constrained. We compare a tetrahedral baseline
with an ensemble of stochastically generated spherical
layouts at the same array aperture using Monte Carlo
simulation. We apply a consistent evaluation protocol
across multiple listening-region offsets and standard
beamforming estimators to isolate variability due to
geometry alone. The central proposition is that, for
low-count VMAs, geometry is a first-order design parameter.
Tetrahedral remains a credible baseline, but lightweight
stochastic exploration can reveal alternative layouts that
are competitive and, in some cases, superior without
increasing channel count.
Authors

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...
DD

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Clustered Virtual Microphone Arrays for Listener-Level Monitoring and Room-Correction in Live Sound
Friday May 29, 2026 1:00pm - 3:00pm CEST
This paper introduces clustered virtual microphone arrays
as a step toward improving listener-level virtual
microphone estimation for live sound. Multiple compact
microphone sub-arrays are placed around a nominal overhead
position. Each sub-array produces a virtual microphone
estimate, and the estimates are fused. The aim is to attack
the estimation problem from multiple viewpoints and reduce
sensitivity to any one array placement or geometry.
The work builds on our earlier paper, “Room Acoustics
Optimisation Using Virtual Microphone Arrays”. That paper
proposed virtual microphones estimated from an overhead
array as a measurement layer for live sound optimisation.
It also highlighted a key limitation: in its initial form,
virtual microphone estimation quality was not yet strong
enough for reliable use across positions. The present paper
targets that limitation. We outline the clustered array
idea and treat cluster count and inter-cluster spacing as
design parameters. Virtual microphones are estimated using
beamforming and combined using simple fusion. Performance
is assessed with objective signal measures, including SNR
and frequency- and phase-related error measures, across
multiple listener-level target positions. The results
support further refinement under more realistic room
conditions and further study of the link between improved
estimation quality and FIR-based correction outcomes.
Authors

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...
DD

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Time–Frequency Integrated Framework for Frequency-Invariant Beamforming in Loudspeaker Arrays
Friday May 29, 2026 1:00pm - 3:00pm CEST
Loudspeaker array beamforming technology has been widely
used; however, current frequency-domain and time-domain
design methods for calculating FIR filters face challenges,
including the need for modeling delay and high
computational complexity. To address these issues, this
paper proposes a time–frequency integrated framework. This
framework supports both pressure matching and amplitude
matching methods, enabling not only the realization of
traditional superdirective beams but also the design of
frequency-invariant beams. For the nonlinear optimization
problem in amplitude matching, an efficient solving
algorithm based on the Alternating Direction Method of
Multipliers (ADMM) is introduced. Experimental results
demonstrate that the proposed method combines the
advantages of existing frequency-domain and time-domain
approaches, directly computing FIR filter coefficients
without delay modeling while maintaining high computational
efficiency. This provides an effective solution for beam
control in loudspeaker arrays.
Authors
JY

Jianbin Yang

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
KP

Keyu Pan

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
NC

Ning Cong

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
XT

Xing Tian

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

The Impact of Frequency Gradient on Nonlinear Pulse Distribution in the Farina Technique
Friday May 29, 2026 1:00pm - 3:00pm CEST
The Exponential Sine Sweep (ESS) technique, popularized by
Angelo Farina, has become a cornerstone of modern
electroacoustic measurement due to its unique capability to
simultaneously extract a system’s linear impulse response
and its individual harmonic distortion components. Standard
implementation of this method almost exclusively utilizes a
low-to-high (upward) exponential sine sweep. However,
during a technical Q&A session at the AES Europe 2025
Convention in Warsaw, a question was raised: what are the
practical consequences of reversing the sweep direction?
This inquiry is particularly relevant given that several
industry-standard measurement platforms often employ
high-to-low (downward) sweeps to optimize the mechanical
and thermal stability of the device under test (DUT) while
performing stepped or swept sinusoidal analysis.
This paper provides an investigation into the temporal
behavior of nonlinearities when the frequency gradient of
an exponential sweep is inverted. Through formal
mathematical derivation and numerical simulations, the study
proves that while the spacing between distortion orders
remains identical in magnitude, the polarity and time
distribution of these impulses are reversed. Specifically,
we demonstrate that in a downward sweep, the distortion
products shift from the "pre-causal" negative time region
to the "post-causal" positive time region. This shift
causes harmonic distortion pulses to emerge within the
reverberant tail of the impulse response, leading to
significant contamination of decay measurements and
energy-time curves. By contrasting the "tracking filter"
paradigm with "time-domain deconvolution," this work
clarifies why sweep direction is a critical parameter that
must be aligned with the specific goals of the measurement
protocol.
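
As a rough illustration of the timing relationship this abstract describes (not the author's derivation): for an exponential sweep of duration T between frequencies f1 and f2, the commonly used result is that the N-th order distortion response is offset from the linear impulse response by T·ln(N)/ln(f2/f1). The sketch below assumes that relation and the sign convention stated in the abstract (negative time for an upward sweep, positive time for a downward sweep). The function name and the example sweep parameters are hypothetical.

```python
import math

def harmonic_offset(order, duration_s, f_start_hz, f_end_hz):
    """Signed time offset (seconds) of the order-th harmonic impulse
    relative to the linear impulse response after deconvolution."""
    # Positive for an upward sweep, negative for a downward sweep.
    log_ratio = math.log(f_end_hz / f_start_hz)
    # An upward sweep gives a negative offset (before the linear response);
    # a downward sweep flips the sign and keeps the magnitude.
    return -duration_s * math.log(order) / log_ratio

# Hypothetical example: 10 s sweep between 20 Hz and 20 kHz.
for n in (2, 3, 4):
    up = harmonic_offset(n, 10.0, 20.0, 20000.0)
    down = harmonic_offset(n, 10.0, 20000.0, 20.0)
    print(f"order {n}: upward {up:+.3f} s, downward {down:+.3f} s")
```

With a downward sweep the positive offsets fall inside the decay of the linear response, which is the contamination of the reverberant tail mentioned above.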
Authors

Daniele Ponteggia

Materiacustica Srl
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Real-Time Heart Rate Sonification Using Spectral Filtering of Preferred Music for Running Training
Friday May 29, 2026 1:00pm - 3:00pm CEST
The purpose of this study was to evaluate a sonification
system that maps live heart rate data to real-time spectral
filtering of a runner's preferred music. Assessed using a
within-subjects design (n = 13), the system employs
high-pass and low-pass filters to indicate deviations from
target heart rate zones, providing instantaneous
biofeedback without requiring visual attention.
Quantitative analysis revealed no statistically significant
differences in target zone accuracy or response time
between auditory, visual, and combined conditions. However,
qualitative thematic analysis identified a clear division
in user preference. Participants favouring the auditory
condition demonstrated faster mean response times to audio
biofeedback. Findings suggest that while sonification
promotes environmental focus and "gamifies" training, its
efficacy is highly dependent on individual processing
styles and music familiarity.
Authors

Duncan Williams

Senior Lecturer, Acoustics Research Centre, University of Salford
JS

Jay Steel

Acoustics Research Centre, University of Salford
NR

Nicholas Ripley

School of Health and Society, University of Salford
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Psychoacoustic Framework for In-Vehicle Audio-Light Mapping
Friday May 29, 2026 1:00pm - 3:00pm CEST
This paper proposes a psychoacoustic-based audio-visual
mapping framework for intelligent vehicle cabins to enhance
immersion and stabilize spatial auditory perception. By
establishing mappings between auditory descriptors—such as
Direction of Arrival (DOA), spectral centroid, and temporal
envelope—and ambient lighting parameters, the framework
leverages "ambient vision" to augment the perceptual
experience without increasing the driver's cognitive load.
Theoretical analysis based on Stevens’ Power Law indicates
that the proposed mapping strategies effectively
synchronize audio-visual intensities and mitigate
perceptual fatigue, providing a conceptual reference for
future multisensory HMI design.
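
Stevens' power law models perceived magnitude as ψ = k·I^a. As a minimal sketch of how an audio-to-light intensity mapping could use it, the code below solves for the light intensity whose predicted perceived magnitude equals that of a given sound level. The exponents (0.67 for sound pressure, 0.33 for brightness) are commonly quoted textbook values assumed here purely for illustration; the abstract does not state which exponents, constants, or mapping the author uses, and all names and the normalised units are hypothetical.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: psi = k * intensity ** exponent."""
    return k * intensity ** exponent

def matched_light_intensity(sound_pressure, a_sound=0.67, a_light=0.33):
    """Light intensity whose predicted brightness equals the predicted
    loudness of the sound (relative units, k = 1 for both; assumed)."""
    target = perceived_magnitude(sound_pressure, a_sound)
    # Invert psi = L ** a_light for L.
    return target ** (1.0 / a_light)

# Hypothetical example: relative sound pressures mapped to relative light levels.
for p in (0.5, 1.0, 2.0):
    print(p, matched_light_intensity(p))
```

Because the assumed brightness exponent is smaller than the assumed loudness exponent, the light intensity has to change by a larger ratio than the sound pressure to keep the two predicted magnitudes in step.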
Authors

Kangwei Wang

Acoustic System Engineer, GoerDynamics Lab2
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Poster Session 3
Friday May 29, 2026 1:00pm - 3:00pm CEST
- Geometry Sensitivity in Low-Count Virtual Microphone Arrays: From Tetrahedral Baselines to Stochastic Spherical Layouts
- A Time–Frequency Integrated Framework for Frequency-Invariant Beamforming in Loudspeaker Arrays
- The Impact of Frequency Gradient on Nonlinear Pulse Distribution in the Farina Technique
- Real-Time Heart Rate Sonification Using Spectral Filtering of Preferred Music for Running Training
- A Psychoacoustic Framework for In-Vehicle Audio-Light Mapping
- Sound field creation with a cube-like loudspeaker array designed using Lamé function based on virtual sound source distribution
- Spatial Sound Field Reproduction Systems for Cabin Noise in Rail Vehicles: Performance Evaluation Based on Sound Quality Indices
- Clustered Virtual Microphone Arrays for Listener-Level Monitoring and Room-Correction in Live Sound

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Posters Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Sound field creation with a cube-like loudspeaker array designed using Lamé function based on virtual sound source distribution
Friday May 29, 2026 1:00pm - 3:00pm CEST
The diversification of audio content production has
increased the demand for realistic, immersive sound field
reproduction. Conventional methods struggle to separate
direct and reflected sounds, limiting accuracy. To address
this issue, this study proposes a method for sound field
reproduction that identifies the arrival directions of
reflected sounds based on the virtual sound source
distribution. In this study, the virtual sound source
distribution was calculated using the closely located
four-point microphone method. Assuming that spherical waves
emitted from distant virtual sound sources arrive as plane
waves within the listening area, the target sound field is
generated through plane wave synthesis, enabling more
accurate and flexible sound field generation. Furthermore,
considering practical systems and typical room shapes, we
investigated the reproducibility of plane wave sound fields
using not only a spherical array but also a cube-like
loudspeaker array configured by the Lamé function, which
allows continuous geometric transformation from a sphere to
a cube-like form. In this study, the ideal plane wave sound
field derived from the wave equation was regarded as the
reference, and the sound fields generated by the
loudspeaker arrays were evaluated and compared using mean
square error (MSE). Furthermore, the evaluation was
extended beyond a single time instant, enabling assessment
that also accounts for temporal variations. The results
indicated that changing the order of the Lamé function
maintained the desired level of reproducibility.
Consequently, it was confirmed that cube-like loudspeaker
arrays can achieve a level of reproducibility equivalent to
that of the spherical array.
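
To make the sphere-to-cube transformation concrete, the sketch below places points on a Lamé surface |x|^n + |y|^n + |z|^n = r^n by rescaling direction vectors. It is a minimal illustration that assumes this common superellipsoid form; the abstract does not give the authors' exact parameterisation, and the function name, radius, and example orders are hypothetical.

```python
import math

def lame_point(direction, order, radius=1.0):
    """Scale a non-zero direction vector so that it lies on the surface
    |x|^order + |y|^order + |z|^order = radius^order."""
    x, y, z = direction
    norm = (abs(x) ** order + abs(y) ** order + abs(z) ** order) ** (1.0 / order)
    scale = radius / norm
    return (x * scale, y * scale, z * scale)

# Hypothetical example: the same diagonal direction at increasing order.
diagonal = (1.0, 1.0, 1.0)
for order in (2, 4, 16):
    p = lame_point(diagonal, order)
    distance = math.sqrt(sum(c * c for c in p))
    print(f"order {order}: distance from centre {distance:.3f}")
```

Order 2 gives a sphere; as the order grows, points along the diagonals move outward toward the corners of a cube while points on the coordinate axes stay where they are.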
Authors
TS

Tomohiro Sakaguchi

Doctoral student, Waseda University
YO

Yasuhiro Oikawa

Waseda University

YE

Yuzuki Eriguchi

Waseda University
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Spatial Sound Field Reproduction Systems for Cabin Noise in Rail Vehicles: Performance Evaluation Based on Sound Quality Indices
Friday May 29, 2026 1:00pm - 3:00pm CEST
Innovative railway vehicle systems such as high-speed rail,
maglev, and emerging transportation concepts are expected
to reduce conventional noise sources related to wheel–rail
and aerodynamic interactions. As these changes alter the
acoustic characteristics inside railway cabins, reliable
laboratory reproduction of interior noise becomes
increasingly important for evaluating passenger acoustic
comfort and guiding sound design during vehicle
development. The study focuses on practical methods for
assessing reproduction accuracy.
Conventional validation of reproduced sound fields
typically relies on sound pressure level and spectral
matching; however, these metrics alone may not fully
reflect perceptually relevant differences between in-situ
and reproduced environments. In this work, sound quality
indices are employed as complementary evaluation metrics to
examine whether reproduced sound fields maintain
perceptually meaningful characteristics of the original
cabin noise. Comparisons between in-situ recordings and
reproduced sound fields were conducted in terms of overall
sound pressure level, frequency characteristics, and
selected sound quality indices. In addition, the influence
of loudspeaker number and spatial configuration on
reproduction performance was examined. The results show
that sound quality–based evaluation provides useful
additional information for assessing perceptual fidelity
and for optimizing spatial sound reproduction systems for
railway cabin noise. The proposed reproduction platform
supports laboratory-based assessment of interior railway
noise and provides a practical framework for perceptually
informed acoustic evaluation and noise control during the
design of next-generation railway vehicles.
Authors
HK

Hyo-In Koh

Korea Railroad Research Institute
JH

Jiyoung Hong

Korea Railroad Research Institute
WS

Wooseok Song

University of Science and Technology

Yonghee Lee

Research Associate, Changwon National University
Yonghee Lee
Ph.D., Mechanical Engineering.
Ultrasonic, Acoustic, SHM, NDE, fNIRS, and Bio-medical engineering.
Contact: [email protected]
Institute: Changwon National University, South Korea
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Transient Evoked Otoacoustic Emissions and Self Reported Sound Exposure
Friday May 29, 2026 1:30pm - 2:00pm CEST
Headphone listening has become an integral part of everyday
life, spanning music consumption, communication, online
media, and increasingly, computer gaming. These diverse
listening contexts make individual sound exposure highly
variable and difficult to quantify. While music listening
and occupational headphone use have been widely studied,
sound exposure from gaming remains comparatively
undocumented. This study investigated the relationship
between self‑reported exposure through headphones and
cochlear function assessed using transient evoked
otoacoustic emissions (TEOAE). Forty‑one university
students completed a detailed questionnaire on listening
habits, and TEOAEs were recorded in both ears across five
half‑octave frequency bands. Estimated weekly exposure
levels were derived from participants’ reported durations
and contexts of use. TEOAE amplitude, signal‑to‑noise ratio
(SNR), and reproducibility showed clear frequency‑dependent
patterns and small ear asymmetries, consistent with typical
OAE behaviour. Only limited associations were found between
self‑reported exposure and TEOAE measures, with significant
effects emerging primarily for SNR and reproducibility in
the highest‑exposure group. No consistent differences were
observed between long‑term gamers and non‑gamers. These
findings suggest that self‑reported exposure alone may be
insufficient to detect subtle cochlear changes in young
adults, and underscore the need for more precise
exposure‑monitoring methods when evaluating recreational
sound exposure risks.
Authors
DH

Dorte Hammershøi

Professor, Acoustics and Hearing, AI and Sound, Department of Electronic Systems, Aalborg University
RO

Rodrigo Ordoñez

Aalborg University
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

An Extended Multichannel Frequency-Domain FxLMS Algorithm for Real-Time Full-Band Adaptive Transaural Reproduction
Friday May 29, 2026 1:30pm - 2:00pm CEST
This paper presents a multichannel adaptive filtering
algorithm for real-time full-band adaptive transaural
reproduction on general-purpose hardware. It is based on a
multichannel frequency-domain FxLMS algorithm using an
overlap-save framework for both filtering and adaptation,
and is extended with (i) online plant identification for
fully adaptive operation, (ii) frequency-dependent
normalization for faster convergence, and (iii)
frequency-dependent regularization to stabilize adaptation.
The proposed algorithm is implemented in C on a
standard desktop PC and evaluated on a 4x2 transaural
configuration running in real time at 48 kHz with 2048-tap
control filters. Two evaluation tests are conducted. The
first test consists of reproducing two uncorrelated
white-noise signals at the ears of a manikin using
crosstalk cancellation as the performance metric. An
average crosstalk cancellation of 32 dB over 100 Hz–20 kHz
is demonstrated. The second experiment considers binaural
signal reproduction as a more realistic use case of the
algorithm. In both cases, performance is assessed for both
a static listener and a moving listener scenario,
demonstrating the algorithm’s ability to rapidly re-adapt.
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A Perceptual Evaluation Method for Binaural Rendering Algorithms via Minimum Audible Angle Measurements
Friday May 29, 2026 1:30pm - 2:00pm CEST
Binaural rendering is typically assessed via timbre and
localization accuracy, while its intrinsic spatial
resolution remains rarely quantified. This paper proposes a
perceptual evaluation method based on Minimum Audible Angle
(MAA) measurements to estimate the azimuthal
just-noticeable difference (JND) introduced by binaural
rendering algorithms. We systematically compared several
rendering algorithms across eight reference azimuths using
two participant-allocation paradigms. The results show that
spatial resolution is significantly influenced by Ambisonic
order and choice of the rendering algorithm, with MAA
thresholds systematically decreasing as the truncation
order increases. Furthermore, the proposed method
successfully captures physiological spatial characteristics
and identifies resolution limits imposed by reference
angles. While both participant-allocation paradigms yield
consistent qualitative trends, the repeated-measures design
provides superior data stability. These findings
demonstrate that the proposed MAA-based method is an
effective tool for quantifying the spatial resolution of
binaural rendering algorithms.
Authors
HZ

Houlin Zhu

Peking University
TQ

Tianshu Qu

Peking University
XW

Xihong Wu

Peking University
YQ

Yufan Qian

Peking University
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Richard King: 3D Masterclass
Friday May 29, 2026 1:30pm - 2:30pm CEST
Richard is a multiple Grammy Award–winning recording
engineer and a specialist in acoustic music recording. His
work is focused primarily on classical, jazz, and film
score music. A selection of his immersive recordings will
be presented, accompanied by a discussion of the microphone
configurations and mixing decisions employed in each
example.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best; as we
discuss qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, detailing also
this specific listening room, regarding ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.
Speakers

Richard King

McGill University
Montreal

Thomas Lund

Genelec Oy
Denmark
Friday May 29, 2026 1:30pm - 2:30pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Education & Career Fair
Friday May 29, 2026 1:30pm - 3:00pm CEST
The only education and career fair focused entirely on
degree and certificate programs in audio around the world.
Come meet professors, students, and college admissions
representatives, and discover how to advance your career as
an audio professional!

For institutions wishing to participate in the 2026 AES
European Convention Education and Career Fair, please sign
up here:
Speakers

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...
Friday May 29, 2026 1:30pm - 3:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Real-Time Implementation of Personal Sound Zones Using Partitioned Convolution in Purr Data
Friday May 29, 2026 2:00pm - 2:30pm CEST
Personal sound zones aim to reproduce distinct audio
contents in separate spatial regions using loudspeaker
arrays, while minimizing acoustic interference between
zones. Although well established theoretically, their
real-time implementation remains challenging due to the
long impulse responses involved and the latency constraints
of audio processing systems.
This work presents a real-time implementation of personal
sound zones based on the pressure matching method in a
static context, i.e. transfer functions between the
loudspeakers and the zones are assumed to remain constant.
Sound zone filters are computed in the frequency domain
from experimentally measured impulse responses between an
array of 18 loudspeakers and two microphone arrays of 9
microphones defining a bright zone and a dark zone. The
system performance is then evaluated in terms of acoustic
contrast, reproduction error, and effective frequency
range. To meet real-time constraints, a fast partitioned
convolution algorithm has been used, namely the
Uniformly-Partitioned Overlap Save (UPOLS). This method
has been implemented in C++ as an external block for the
Purr Data real-time audio environment. Experimental
results, obtained in a semi-anechoic environment,
demonstrate that it enables stable real-time multichannel
convolution with negligible numerical error compared to
offline convolution. The proposed system results in a
functional real-time sound zones demonstrator, suitable for
experimental and interactive spatial audio applications.
The code is shared in a GitHub repository so that the
scientific community can benefit from it.
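
For readers unfamiliar with uniformly partitioned overlap-save convolution, the following is a minimal offline sketch in Python with NumPy. It is not the authors' C++ external for Purr Data: the function name, the block size, and the single-channel setup are assumptions made for illustration. The filter is split into equal partitions, each input block is transformed once, and the output block is the sum of partition spectra multiplied with correspondingly delayed input spectra.

```python
import numpy as np

def upols_convolve(x, h, block=64):
    """Linear convolution of x with h using uniformly partitioned overlap-save."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n_out = len(x) + len(h) - 1
    n_parts = -(-len(h) // block)            # ceil(len(h) / block)
    h_pad = np.zeros(n_parts * block)
    h_pad[:len(h)] = h
    # One spectrum per filter partition, zero-padded to 2 * block samples.
    H = np.fft.rfft(h_pad.reshape(n_parts, block), n=2 * block, axis=1)
    n_blocks = -(-n_out // block)
    x_pad = np.zeros(n_blocks * block)
    x_pad[:len(x)] = x
    fdl = np.zeros((n_parts, block + 1), dtype=complex)  # frequency-domain delay line
    window = np.zeros(2 * block)             # previous block followed by current block
    y = np.zeros(n_blocks * block)
    for k in range(n_blocks):
        window[:block] = window[block:]
        window[block:] = x_pad[k * block:(k + 1) * block]
        fdl = np.roll(fdl, 1, axis=0)        # age the stored input spectra by one block
        fdl[0] = np.fft.rfft(window)
        Y = np.sum(H * fdl, axis=0)
        # Overlap-save: only the last `block` samples are free of circular aliasing.
        y[k * block:(k + 1) * block] = np.fft.irfft(Y, n=2 * block)[block:]
    return y[:n_out]

# Hypothetical check against direct convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(300)
print(np.max(np.abs(upols_convolve(x, h) - np.convolve(x, h))))
```

In a real-time setting the loop body would run once per audio callback, so the latency is one block regardless of the filter length.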
Authors
GP

Guilhem Pagès

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
JB

Jean Beuchet

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613

Manuel Melon

Professor, LAUM / LE MANS Université


TL

Titouan Lefrancois

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Toward an improved auditory model for predicting binaural coloration
Friday May 29, 2026 2:00pm - 2:30pm CEST
The evaluation of audio quality is important in the
development of immersive audio algorithms and reproduction
systems, and binaural models are often used for this as a
quick alternative to listening tests. Coloration (i.e.,
perceived loudness differences integrated across ears and
frequency) is one key quality aspect; however, the majority
of models used to predict coloration are often
oversimplified or are missing a dedicated binaural stage to
consider the relative contribution of the left and right
ear signals. A binaural coloration model is presented that
builds upon previous work and tests three different
approaches for its binaural stage. The proposed model is
evaluated in comparison with nine models that are
frequently used to predict coloration by using data from
five listening tests totaling 252 stimuli with various
audio contents and source positions. The proposed model
performed best with 85% of explained variance, followed by
predictions based on ISO 532-1 loudness, yielding 78%
explained variance. The commonly used log-spectral distance
performed worst, with only 44% explained variance. The
three tested binaural stages had little influence on the
performance of the proposed model. The model is made freely
available to download.
Authors

Thomas McKenzie

Lecturer in Acoustics, University of Edinburgh
Thomas McKenzie is a Lecturer in Acoustics and Architectural Acoustics at the Reid School of Music, Edinburgh College of Art, University of Edinburgh, UK. He completed a B.Sc. in Music, Multimedia, and Electronics at the University of Leeds, UK, in 2013, before completing his M.Sc...
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Immersive Audio, Lecture

2:00pm CEST

Audio Design Roundtable
Friday May 29, 2026 2:00pm - 3:00pm CEST
Join us for a panel discussion about audio design featuring some of the industry’s leading audio designers and educators. This session is meant to inspire upcoming designers and encourage dialogue with established audio designers.
 
The panelists will give a brief overview of their designs, their roles in the AES, and how and why educators and students should participate in the various design competitions that the AES has to offer. The panel discussion is followed by a Q&A session that allows for questions and exchange with the panelists.

Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...

George Massenburg

Associate Professor of Sound Recording, Massenburg Design Works
George Y. Massenburg is a Grammy award-winning recording engineer and inventor. Working principally in Baltimore, Los Angeles, Nashville, and Macon, Georgia, Massenburg is widely known for submitting a paper to the Audio Engineering Society in 1972 regarding the parametric equali...

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio...
Friday May 29, 2026 2:00pm - 3:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Exploring Rendering Variability in Next-Generation Audio Reproduction
Friday May 29, 2026 2:30pm - 3:00pm CEST
This study evaluates three Next-Generation Audio (NGA)
rendering systems through listening tests using real-life
audio content. The testing paradigm prioritized subjective
preference over adherence to a ground-truth reference.
Participants assessed perceptual spatial audio attributes
in both 5.1 and 7.1.4 loudspeaker setups. The findings
suggest that strict adherence to the rendering algorithm
used during content creation is not mandatory in terms of
listener preference. While not advocating disregarding
artistic intent without consideration, this study proposes
that such flexibility in reproduction can be an acceptable
compromise.
Authors
ES

Ema Souza-Blanes

Samsung Research America

Toni Hirvonen

Researcher, Samsung Research America
Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects...
WJ

Wonbeen Jo

Samsung Research
YK

Yongmin Kwon

Samsung Research

Friday May 29, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Immersive Underwater Audio Capture Using a Wideband Spatial Hydrophone Array
Friday May 29, 2026 2:30pm - 3:00pm CEST
Immersive audio continues to expand beyond traditional
studio and terrestrial field-recording environments, yet
underwater soundscapes—particularly those involving marine
mammals—remain largely documented in mono or stereo
formats. This paper presents a practical and low-cost
approach for capturing immersive underwater audio using a
newly developed wideband hydrophone and a multichannel
array optimized for marine environments. The hydrophones,
designed by the author, feature a low noise floor, extended
frequency response exceeding 100 kHz, and direct
compatibility with standard P48 phantom-powered audio
recorders, enabling deployment without specialized
underwater preamplifiers or power systems.

To translate established immersive recording techniques
into the ocean environment, an array architecture was
developed based on a compact eight-element cube geometry.
Two array variants were constructed to account for the
significantly higher speed of sound in water compared to
air, allowing the spatial characteristics of underwater
sources to be captured with appropriate inter-element
spacing. Field recordings were conducted off the coast of
Hawaii in January during the peak season for humpback whale
song. Recordings were made at multiple depths and positions
to explore variations in reverberation, propagation, and
ambient biological activity.
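
As a rough illustration of why the higher speed of sound in water matters for inter-element spacing, the sketch below scales an in-air spacing so that the maximum inter-element time delay stays the same underwater. The sound speeds (343 m/s and 1500 m/s) are nominal textbook values and the 0.10 m air spacing is a hypothetical example. None of these figures, nor the scaling rule itself, are taken from the paper.

```python
C_AIR = 343.0     # nominal speed of sound in air, m/s (assumed value)
C_WATER = 1500.0  # nominal speed of sound in seawater, m/s (assumed value)


def max_delay_s(spacing_m, speed_m_s):
    """Largest time-of-arrival difference between two elements (source on the array axis)."""
    return spacing_m / speed_m_s


def equivalent_water_spacing(air_spacing_m, c_air=C_AIR, c_water=C_WATER):
    """Spacing that gives the same maximum delay in water as air_spacing_m gives in air."""
    return air_spacing_m * c_water / c_air


air_spacing = 0.10  # hypothetical in-air element spacing, m
water_spacing = equivalent_water_spacing(air_spacing)
print(f"air: {air_spacing:.2f} m, water: {water_spacing:.2f} m")
```

Under these assumptions the underwater spacing is larger than the in-air spacing by the ratio of the two sound speeds, roughly a factor of four.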

Preliminary results indicate that the system captures
detailed spatial cues from humpback whale vocalizations
while simultaneously preserving the rich ambient marine
soundscape. The extended ultrasonic response further allows
slowed or pitch-shifted playback to reveal fine temporal
structures not typically audible. This work demonstrates a
feasible method for immersive underwater recording and
provides a foundation for both scientific research and
creative content production.
Authors

Jules Ryckebusch

Sound Sleuth, Sound Sleuth
Jules' career with audio and electronics started early. At 16 he built an analog synthesizer from a PAiA kit. While still in high school, he designed and built a mixing board and then started doing sound for local bands.
Jules went to college, studied physics, and then joined the US Navy where he spent 20 years as a nuclear submariner. In between submarines, he was an instructor at the Naval Nuclear Power School in Orlando, Florida. He taught Reactor Kinetics by day, and spent many a night in local...
Friday May 29, 2026 2:30pm - 3:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

Transport to Copenhagen City Hall from 2:30pm - LAST BUS departs at 3:00pm !!!!
Friday May 29, 2026 2:30pm - 4:00pm CEST
If you registered for the Official Reception at Copenhagen City Hall (during registration for the Convention) then busses will take you to the City Hall in the City Center of Copenhagen.

The LAST Bus will leave at 3.00pm - EXACTLY!!!

The busses will start to board at 2:30pm and the first bus will leave at 2:40pm - so if you are ready - please come and start boarding the busses from 2:30pm !


There will be no possibility to go to City Hall AFTER 3.00pm !!!!


3.00pm will be the very LAST Bus to City Hall !
Speakers

Brecht De Man

Head of Research, AES President
Brecht De Man is Head of Research at PXL-Music, guest lecturer at the Royal Conservatoire of The Hague, and author of Intelligent Music Production (Routledge 2019). He holds a PhD from the Centre for Digital Music at Queen Mary University of London, where he developed and evaluated...

Jan Abildgaard Pedersen

Convention Chair, Audio Engineering Society
Jan Abildgaard Pedersen Consult offers a wide variety of services: Sound Tuning, Innovation Process, Audio DSP Algorithms, Solving impossible Audio Problems, Room Adaptation, Audio System Development, Audio Research, Audio Strategy Advisor, Patent Advice, White Papers, Scientific...
Friday May 29, 2026 2:30pm - 4:00pm CEST
External to the Convention Venue Just outside building 302

3:00pm CEST

TC-CAS : AES Technical Committee on "CODING OF AUDIO SIGNALS"
Friday May 29, 2026 3:00pm - 4:00pm CEST
AES Technical Committee on "CODING OF AUDIO SIGNALS"



The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole.  Connect and engage!
Friday May 29, 2026 3:00pm - 4:00pm CEST
Aud 93 Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

'Nexus Sonance - Real-Time Cosmic Decay' - A real-time spatial sonification of satellite space weather and geomagnetic data using machine learning for generative soundscape synthesis
Friday May 29, 2026 3:00pm - 4:00pm CEST
‘Nexus Sonance’ is a unique sonic space that is dependent
on time, place, and the people interacting with the
installation. At its core, the concept is a sonic portal
where the Cosmos, Earth, and Humans are unified through
sound in a single location. For this iteration of the
project, we focus on real-time satellite data
sonification as the core concept, since our network
makes it possible to collaborate with ESA (European Space
Agency). By sonifying real-time satellite data, we
create an evolving spatial environment where cosmic events
are felt and heard in the present moment.

This installation applies innovative machine-learning based
synthesis techniques with immersive spatial audio to embody
the experience of these unstoppable physical forces,
harnessing real-time satellite and geomagnetic data
provided by the European Space Agency (ESA), by simulating
neural network model corruption via cosmic radiation
following real-time patterns transmitted by the L1
satellite cluster.

While the cosmos is traditionally conceptualized through
heavily processed and "colorized" visual imagery, this
installation shifts the lens to a sonic perspective. Here,
the positioning and synthesis of sound are governed by
live-data streams from orbital satellites and global
geomagnetic stations, creating an immersive environment
where the listener experiences a unique, time-bound sonic
representation of our solar system. As our lives are
increasingly dependent on machine learning and neural
networks whose fundamentals are based on binary forms of
data, introducing the concept of cosmic radiation and data
corruption (based on a Single Event Upset phenomenon)
throughout the duration of the work references the idea of
Entropy as an ever-changing and expanding state.
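
To make the Single Event Upset idea concrete, here is a minimal sketch of a single bit flip in a 32-bit floating-point value, the kind of corruption a stored neural-network weight could suffer. The function name `flip_bit` and the choice of float32 storage are assumptions made for illustration. The abstract does not describe how the installation actually corrupts the model.

```python
import struct


def flip_bit(value, bit):
    """Return value with one bit of its IEEE 754 float32 encoding inverted (bit 0 = LSB)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 1.0  # hypothetical model weight
print(flip_bit(weight, 31))  # sign bit flipped
print(flip_bit(weight, 23))  # lowest exponent bit flipped
```

A flip in the sign or exponent bits changes the value drastically, while a flip in the low mantissa bits is barely noticeable, which is why the audible effect of such upsets can range from subtle to severe.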

The work uses ESA-provided data from satellites that
measure specific states of solar weather, namely
Interplanetary Magnetic Field (IMF) measurements. These
include variables such as Bt (the total induction of the
IMF), which is assigned to parameters such as the
intensity, saturation and amount of the created sonic
particles. The variables Bx, By and Bz, which compose a 3D
magnetic force field measurement, are translated to
positioning, velocity of travel, and direction of travel.
In case ESA cannot provide such data, a backup plan is to
use publicly available data from https://norlys.live/rtsw.
Furthermore, with data from L1 ESA-operated satellites, the
readings regarding cosmic radiation flux in space could be
used to train a model that predicts Single Event Upsets
(SEU) and simulates events that impact the way that RAVE
(Ircam) outputs audio. This creates a metaphorical and
literal decay of the machine-learning output, mirroring the
impact of radiation on digital infrastructure.
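
One possible way to turn the IMF components into control values is sketched below. It assumes that Bt is the magnitude of the (Bx, By, Bz) vector, normalises it against a hypothetical ceiling of 20 nT to get an intensity between 0 and 1, and derives a direction from the three components. The function name, the ceiling and the output keys are illustrative assumptions, not the mapping used in the installation.

```python
import math


def imf_to_sound_params(bx, by, bz, bt_max=20.0):
    """Map IMF components (in nT) to hypothetical spatial-audio control values."""
    bt = math.sqrt(bx * bx + by * by + bz * bz)   # total field magnitude
    intensity = min(bt / bt_max, 1.0)             # clamp to the range 0..1
    azimuth = math.degrees(math.atan2(by, bx))    # direction in the x-y plane
    elevation = math.degrees(math.atan2(bz, math.hypot(bx, by)))
    return {
        "bt": bt,
        "intensity": intensity,
        "azimuth_deg": azimuth,
        "elevation_deg": elevation,
    }


params = imf_to_sound_params(3.0, 4.0, 0.0)  # made-up sample values
print(params)
```

Clamping keeps unusually strong field readings from pushing the intensity control beyond its range.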

The earth and human layers, which sit as a canvas
background, are pre-sculpted soundscapes combined with
RAVE's generative output, which is trained on a private
collection of earth recordings (using geophones) and
spatial audio recordings of the Georgian polyphonic choir
“Adilei”, symbolising the earth and human element within
the spatial-sonic setting. Furthermore, using publicly
available data on geomagnetic changes across the globe from
the intermagnet.org website with the MagPy package, we are
able to influence the textural output of RAVE, where the
changes are related to the positions of the actual reading
stations. This achieves a multi-layered sonic
representation of earth’s constantly changing geomagnetic
field, which in combination with real-time space weather
sonification in spatial audio creates a complex and
innovative spatial soundscape driven predominantly by
real-time data. Every time the work is presented, the
outcome is unique, depending on the circumstances under
which it is played.
Speakers

Daniel Jones

Distinguished Audio Research Engineer, University of Iceland
I am an artist and researcher based in South London, interested in algorithmic composition, sonification, systems music, sound installations, and spatial audio. As part of my work, I develop a series of open-source frameworks for making music with Python, all available on GitHub...

Mikołaj Tchórzewski

Creative Spatial Audio Implementation Specialist, VizAion Immersive
MIKO (Mikołaj Tchórzewski) is an interdisciplinary artist with a main focus on spatial audio and sound art. Based in London, his work, shaped by experiences from backpacking across different countries and cultures, explores the subjects of sound, space and collective psychology within...

Adilei Ensemble

Adilei Ensemble
Friday May 29, 2026 3:00pm - 4:00pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

4:00pm CEST

Official Reception at Copenhagen City Hall including Technical Presentation (separate registration needed!)
Friday May 29, 2026 4:00pm - 5:30pm CEST
If you registered for the Official Reception at Copenhagen City Hall (separate registration during registration for the Convention) then you can participate in this unique Official Reception at City Hall in Copenhagen.

We are pleased to invite AES Europe 2026 participants to a special reception at Copenhagen City Hall, offering a warm and official welcome to Denmark’s capital. Located in the very heart of the city, Copenhagen City Hall is one of the city’s most iconic landmarks. Designed by architect Martin Nyrop and inaugurated in 1905, the building is inspired by the medieval town hall of Siena, Italy. Its impressive interiors and renowned clock tower make it a fitting and memorable setting for this occasion.
At the reception, guests will be welcomed by the City of Copenhagen and enjoy a relaxed afternoon of networking with fellow conference participants. Refreshments will be served alongside the city’s much-loved specialty, the traditional Town Hall Pancake. We will hear a speech by an official from the City Council, followed by an interesting technical talk by Lars Risbo, CTO of Purifi, who will share a brilliant example of the Danish contributions to the audio world:

Unreasonable Audio Innovation
The HiFi community is split into two camps we call “subjectivist” and “objectivist”. Subjectivists reject all measurements and only trust their ears. No explanation is too absurd so long as it doesn’t involve actual data. One of our products once drew this comment from a reviewer: “sounds surprisingly good for something that measures this well”. Objectivists obsess over spot measurements and double-blinded trials. If they are to be believed, almost nothing is audible. In spite of which they mindlessly seek to improve a handful of fixed metrics that too often are bad surrogate markers: “we’ve a recipe for this measurement, so that’s what we measure,” no matter the relevance to the end point of how it sounds. The human ear is amazing and complex: to some defects it is nearly deaf while to others it is mind-bendingly sensitive. Standard metrics do not cover that complexity. The two camps are so entrenched that neither is open to new ideas. This is even recognised in patent law: “technical prejudice” means you can prove an invention is not “obvious” because it goes against common but flawed beliefs. Shall we still depend on G. B. Shaw’s unreasonable man for making any progress or can we do better? To the subjectivist, audio is art. To the objectivist, it is science. We propose it is neither. Audio is engineering. Our task as engineers is building equipment and doing so in a rational manner. Doing a full blown DBT is only rational if the decision that’s at stake is an expensive one. It’s often cheaper and faster to fix a defect than to prove it’s audible. Standard measurements often miss glaring problems, so measurements must be designed with the specifics of the DUT in mind. It’s only after looking hard for bad news and not finding it that we can have some confidence that the news is good. This subtlety of approach can’t be arrived at simply by compromising between the objectivist and subjectivist positions. 
The pendulum must stop because the truth isn’t even in the middle. “The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.” George Bernard Shaw


Moderators

Jan Abildgaard Pedersen

Convention Chair, Audio Engineering Society
Jan Abildgaard Pedersen Consult offers a wide variety of services: Sound Tuning, Innovation Process, Audio DSP Algorithms, Solving impossible Audio Problems, Room Adaptation, Audio System Development, Audio Research, Audio Strategy Advisor, Patent Advice, White Papers, Scientific...
Speakers

Lars Risbo

CTO, Purifi

Brecht De Man

Head of Research, AES President
Brecht De Man is Head of Research at PXL-Music, guest lecturer at the Royal Conservatoire of The Hague, and author of Intelligent Music Production (Routledge 2019). He holds a PhD from the Centre for Digital Music at Queen Mary University of London, where he developed and evaluated...
Friday May 29, 2026 4:00pm - 5:30pm CEST
Copenhagen City Hall Rådhuspladsen 1, 1553 København, Danmark

6:00pm CEST

Self Organized Dinners In City Center of Copenhagen
Friday May 29, 2026 6:00pm - 9:00pm CEST
Self-organised dinners at restaurants in the heart of Copenhagen, and self-organised transport to your hotels.

We recommend that people find a group of interesting fellow participants during this reception and go for a dinner in one of the many restaurants around Copenhagen City Hall in the heart of Copenhagen (at your own expense).



 
Friday May 29, 2026 6:00pm - 9:00pm CEST
Copenhagen City Center

7:30pm CEST

Student Party - meet at "Christians Brygge 12, 1221 Copenhagen" at 7.30pm
Friday May 29, 2026 7:30pm - 11:30pm CEST
AES Student Social/Party:
Students are invited to the AES Student Social/Party on Friday evening!
It’s a FREE scenic boat cruise to Reffen, a bar, street-food and entertainment complex, where you can make new friends, get drinks and food, and check out the music the DJ spins. Some limited refreshments will be provided, and there are multiple different bars and street food options there for you to purchase more! For more information visit: www.reffen.dk (Street food is open until 21:30, bars until 01:00.)


This event is made possible through the generous sponsorship of Genelec and Interfacio.


YOUR AES Convention Student badge is your boat ticket – don’t forget to bring it!
Meet Friday, 19:30, at: Christians Brygge 12, 1221 Copenhagen
Boat departs at 20:00.


When it’s time to return to the city, walk 10-12 minutes to bus 2A, and it’s about a 20 minute ride to the city center. Make sure you have the Rejsebillet app on your phone to buy a single or day bus ticket, or exact cash fare for the bus driver. Bank cards are not accepted on the bus.

Speakers

Jesper Anderson

Head of Tonmeister Programme, AES
As a Grammy-nominated producer, engineer and pianist Jesper has recorded around 100 CDs and produced music for radio, TV, theatre, installations and performance. Jesper has also worked as a sound engineer/producer at the Danish Broadcasting Corporation.


A recent album-production i...

Ian Corbett

AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...
Friday May 29, 2026 7:30pm - 11:30pm CEST
Student Party Christians Brygge 12, 1221 Copenhagen, Danmark
 

