Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time


LIVESTREAMS : A and B


ON DEMAND VIDEOS (previous days)
 
Type: Audio Processing
Thursday, May 28
 

9:00am CEST

Deep Learning-Based Lower-Layer Upmixing
Thursday May 28, 2026 9:00am - 9:30am CEST
This paper introduces a novel approach for generating a
lower layer in multichannel audio upmixing, addressing a
gap in existing methods that primarily focus on mid and top
layers. Leveraging Harmonic-Percussive Separation (HPS),
the proposed framework dynamically adjusts key parameters
(separation factor, harmonic attenuation, and phase shift)
to enhance percussive components while diffusing harmonic
elements. We compared three neural network architectures
for this task: LSTM, TCN, and Transformer. Experimental
results show comparable perceptual quality and objective
metrics across all models, with the TCN being the most
balanced and suitable for deployment on edge devices.
Authors

Ema Souza-Blanes

Samsung Research America

Luis Madrid

Samsung Research Tijuana

Thaddeus Páez

Research Engineer, Samsung Research Tijuana
Research Engineer at Samsung Mexico.
Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Spectral Optimization for Automatic Multitrack Mixing Using Answer Set Programming
Thursday May 28, 2026 9:30am - 10:00am CEST
The mixing stage in music production involves a complex set
of interdependent technical and creative decisions aimed at
achieving a coherent and industry-level result. Intelligent
Music Production (IMP) is an emerging research area that
integrates Artificial Intelligence techniques into music
creation and post-production processes, spanning from
composition to mastering. Within this context, Answer Set
Programming (ASP), a declarative paradigm from Knowledge
Representation and Reasoning, has proven effective for
modeling and solving complex optimization problems. This
article presents frmixerr, an ASP-based intelligent system
designed to optimize the mixing process by automatically
generating balanced mixes. The system formulates mixing as
a combinatorial optimization problem and evaluates
candidate solutions against a reference spectral profile.
To assess its performance, a subjective listening test was
conducted comparing mixes generated by frmixerr with mixes
produced by human engineers with varying levels of
professional experience. The results indicate no
significant differences in perceived quality between
frmixerr mixes and those created by professionals, suggesting
that ASP constitutes a viable approach for intelligent
assistance in music mixing.
Authors

Carlos Benítez

Tec de Monterrey

Flavio Everardo

Tec de Monterrey, University of Potsdam
Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Experimental study of sound zone methods for indoor/outdoor active noise cancellation
Thursday May 28, 2026 10:00am - 10:30am CEST
The development of personal sound zone systems in recent
years shows great potential for low-frequency noise control
outside of noisy spaces. These approaches show promising
applications to manage noise pollution arising from
concerts in large venues or urban festivals. However, most
of the literature assumes that the created sound zones
exist in the same room or acoustic space as the noise
source. This premise hence discards all setups where the
disturbances occur outside of concert venues (e.g., in
neighboring houses). This paper presents a first
experimental study of the behavior of sound zone methods
for indoor sound zones and outdoor noise sources. These
initial results indicate good efficiency of these methods
in this edge case, opening new use cases for these
approaches.
Authors

Lucas Hocquette

L-Acoustics

Yves Pene

Research Engineer, L-Acoustics
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Beyond Species Identification: Real-Time Spatial Interaction Analysis in Avian Bioacoustics Using Microphone Arrays and Hybrid Beamforming on Edge Architectures
Thursday May 28, 2026 10:00am - 10:30am CEST
Conventional ornithological monitoring systems rely heavily
on single-channel recorders and deep learning classifiers
to identify "what" species is present, but fail to capture
"where" it is located or how individuals interact
spatially. This limitation hinders the study of complex
ecological behaviors, such as inter-specific spacing in
dense vegetation and predator-prey dynamics. We propose a
novel, dual-mode acoustic localization system designed to
unify semantic classification and spatial tracking.
Utilizing an economically scalable 16-channel Uniform
Rectangular Array (UMA-16) interfaced with edge-computing
platforms, we implement a hybrid spatial filtering pipeline
structured to balance real-time latency constraints with
achievable angular resolution. The first stage employs a
computationally efficient, noise-robust linear scanning
technique to generate an acoustic energy map and estimate
source multiplicity. This preliminary data initializes a
second-stage, super-resolution spectral estimation
algorithm predicated on signal-noise subspace
orthogonality, combining the noise robustness of
non-parametric beamforming methods with the precision of
parametric approaches. By integrating these spatial filters
with standard deep learning classifiers, the system
resolves overlapping vocalizations in "Cocktail Party"
scenarios and improves Signal-to-Noise Ratio (SNR) for
cryptic species detection. We address the physical
"Localization-Detection Range Disparity," demonstrating
that while detection is viable at long ranges, precise
localization is constrained by the array aperture to the
near-to-mid field. The system outputs real-time video
overlays of acoustic heatmaps for field observation and
generates autonomous volumetric territory maps in fixed
deployments, collectively providing ornithologists with a
robust capability for analyzing the spatial ecology of
avian vocalizations.
Authors

Emre Göktuğ AKTAŞ

Istanbul Technical University

Mesut Kartal

Istanbul Technical University
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Distortion Measurements; Can We Measure What We Hear?
Thursday May 28, 2026 10:00am - 11:00am CEST
There are many different types of distortion that can be
measured, from linear to non-linear distortion. Often the
two are convoluted together and the linear distortion
influences the non-linear distortion. Distortion is also
highly signal- and level-dependent, and it is hard to compare
one type of distortion measurement to another. There are
many types of non-linear distortion metrics, with THD, THD+N,
and IMD being the most classic ones, using sine tones as the
test signal. But how can we measure distortion with real
signals such as speech and music or even noise and compare
the results to audibility? This tutorial discusses a wide
range of distortion measurements, what is audible,
and what distortion sounds like.
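As a point of reference for the sine-tone metrics named above, the following is a minimal sketch of a THD calculation, assuming the common definition (root-sum-square of the harmonic amplitudes divided by the fundamental amplitude). The function names, the number of harmonics, and the synthetic test signal are illustrative assumptions and are not taken from the tutorial.

```python
import cmath
import math


def harmonic_amplitude(x, fs, freq):
    """Amplitude of the sinusoidal component of x at freq (single-bin DFT).

    Assumes x spans an integer number of periods of freq, so no window is needed.
    """
    acc = sum(v * cmath.exp(-2j * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


def thd(x, fs, f0, n_harmonics=5):
    """THD as a ratio: sqrt(sum of squared harmonic amplitudes) / fundamental."""
    fundamental = harmonic_amplitude(x, fs, f0)
    harmonics = [harmonic_amplitude(x, fs, k * f0) for k in range(2, n_harmonics + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Hypothetical test signal: 100 Hz tone with 2nd and 3rd harmonics added.
fs, f0, n = 8000, 100, 800  # 800 samples = exactly 10 periods of 100 Hz
signal = [
    math.sin(2 * math.pi * f0 * i / fs)
    + 0.10 * math.sin(2 * math.pi * 2 * f0 * i / fs)
    + 0.05 * math.sin(2 * math.pi * 3 * f0 * i / fs)
    for i in range(n)
]
thd_ratio = thd(signal, fs, f0)  # sqrt(0.1**2 + 0.05**2), about 0.112
```

A sketch like this only covers a steady sine stimulus; it says nothing about distortion with speech or music, which is the harder question the abstract raises.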
Speakers

Steve Temme

Listen Inc.
Steve Temme is founder and President of Listen, Inc., manufacturer of the SoundCheck audio test system. Steve founded the company in 1995, and for the past 30 years the company has remained on the cutting edge of research into audio measurement, regularly introducing new measurement...
Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

The Early Electronic Orchestra: The Analogue Circuits Behind Electronic Keyboards Before Digital Came Along.
Thursday May 28, 2026 10:00am - 11:00am CEST
Before digital signal processing took over electronic
keyboard instruments, they were implemented using analogue
circuits that used tubes/valves, transistors, and even neon
lightbulbs! Yet using these components keyboards were
developed that could mimic string and brass ensembles,
pianos and harpsichords and many other instruments. How did
they do it?

The purpose of this tutorial is to look at both the
architecture and the circuitry of these instruments, and to
show how amazing results could be achieved using
comparatively simple electronic circuitry. It will look at:

1. The basic architecture of these instruments,
2. How they generated the right notes,
3. How they generated the desired envelope,
4. How they imposed it on the waveform,
5. How they simulated the effect of many instruments playing
together.

It will also look at how, if it was required, touch
sensitivity could be achieved, such as in electronic
pianos. Where possible there will be audio examples
demonstrating the sounds that could be achieved.

For many people who have only ever experienced the digital
world it will be illuminating to see just how much could be
achieved by comparatively simple circuits.
In those days electronic components were expensive so
considerable ingenuity was expended in minimising the total
number of components required.

These instruments are part of our musical and audio
heritage and the circuit techniques they used are in danger
of being forgotten so this tutorial will be a timely
reminder of what used to be done.
It may also provide useful information to people who are
attempting to model these instruments using modern digital
methods.

The tutorial will be accessible to everyone; you will not
have to be an electronic engineer to understand the
principles behind these unique pieces of audio engineering
history.
Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...
Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Input-output linearization of loudspeaker dynamics via automatic differentiation
Thursday May 28, 2026 11:00am - 11:30am CEST
Input-output linearization is a technique for compensating
nonlinear distortion in loudspeakers. To apply it to
complex loudspeaker models, we describe an end-to-end
framework for estimating model parameters from data and
deriving the linearizing control laws using automatic
differentiation. The parameter estimation approach combines
frequency-domain linear parameter estimation with a
time-domain prediction-error method for the nonlinear
parameters. The linearization approach supports non-linear
reference systems and stabilization of the control law
using trajectory tracking. We implement the framework in
dynax, an open-source Python package based on JAX, and
validate it experimentally as a feed-forward controller on
a closed-box loudspeaker. Results demonstrate validation
errors of 1–5% NRMSE and total harmonic distortion
reductions of 6–12 dB. The framework enables researchers
and engineers to rapidly prototype and validate complex
loudspeaker models for distortion compensation without
manual symbolic derivations.
Authors

Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Joint Neural Translation and Classification of Videos for Audio Processing
Thursday May 28, 2026 1:30pm - 2:00pm CEST
A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve ≈90% classification
accuracy on a test set using a combined translation and
classification model (with only ~22K parameters),
demonstrating a globally-scalable approach for robust
classification on the edge.
Authors

Alejandro Cajica

Samsung Research Mexico

Sunil Bharitkar

Samsung Research America

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

New Paths for Immersive Music Streaming: Channel-based and High Resolution
Thursday May 28, 2026 1:30pm - 3:00pm CEST
Streaming of immersive audio is known to western audiences
almost exclusively in the object-based format, Atmos,
developed by Dolby and employing lossy codecs to limit bit
rates. Other object-based formats like Sony 360 have had
limited success, and until recently there were no
channel-based streamed versions. But this situation is changing,
as it has already done in Japan.

Responding to growing interest in very high quality
immersive music for both on-demand streaming and live
broadcast, two new services are now active that offer,
first, channel-based audio and second, audio streamed in
high res PCM. Binaural mixes, a range of PCM formats and
video are variously included, with extensions to portables,
loudspeakers, and home theater.

This workshop provides a forum for discussion of both the
genuine promise and the challenges in these new
initiatives. Included are the advantages of high
resolution over lossy; channel-based versus object-based;
the degree of adoption of transducers for multichannel;
adaptive bit rates; data sources; and the Japanese
approach; amongst others.
Speakers

Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Stefan Bock

Managing Director, msm-studios GmbH
Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991, where he was the Chief Mastering Engineer and General Manager.

He was leading msm-studios t...

Bert van Daele

CTO, Goer Dynamics BV
Bert Van Daele is CTO at NewAuro.
After graduating as an Engineer in Digital Electronics in 1997, he started out as an electronics designer at Philips Electronics, mainly working on digital products related to Surround Sound.
During a sabbatical leave, he worked at the Galaxy Studi...

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...

Vicki Melchior

Chair, AES Technical Committee - HRA; also: Independent Consultant, Audio DSP and Software
Thursday May 28, 2026 1:30pm - 3:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation
Thursday May 28, 2026 1:30pm - 3:30pm CEST
We present Binaspect, an open-source Python library for
binaural audio analysis, visualization, and feature
generation. Binaspect generates interpretable “azimuth
maps” by calculating modified interaural time and level
difference spectrograms, and clustering those
time-frequency (TF) bins into stable time-azimuth histogram
representations. This allows multiple active sources to
appear as distinct azimuthal clusters, while degradations
manifest as broadened, diffused, or shifted distributions.
Crucially, Binaspect operates blindly on audio, requiring
no prior knowledge of head models. These visualizations
enable researchers and engineers to observe how binaural
cues are degraded by codec and renderer design choices,
among other downstream processes. We demonstrate the tool
on bitrate ladders, ambisonic rendering, and VBAP source
positioning, where degradations are clearly revealed. In
addition to their diagnostic value, the proposed
representations can be exported as structured features
suitable for training machine learning models in quality
prediction, spatial audio classification, and other
binaural tasks. Binaspect is released under an open-source
license with full reproducibility scripts at: (link removed
for blind review)
Authors

Alessandro Ragano

University College Dublin

Dan Barry

University College Dublin

Davoud Shariat Panah

University College Dublin

Jan Skoglund

Google

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Lightweight Real-time Spatial Audio Interpolation for Standalone VR using Hand Claps
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Realistic spatial audio consistent with visual information
is essential for providing high immersion in Augmented
Reality (AR) environments. However, conventional
high-precision real-time acoustic simulations require
significant computational power, limiting their
implementation on standalone mobile VR devices such as the
Meta Quest. This study proposes a practical method to
enhance reverb realism using solely a standalone VR HMD,
without the need for additional external equipment. By
measuring impulse responses using a few hand claps in the
physical space, we interpolate room acoustic
parameters—specifically RT60 and early/late energy
ratios—to reflect the environment's unique characteristics.
These extracted parameters are then applied to the VR
engine's built-in reverb effects, enabling dynamic,
location-aware real-time rendering with minimal
computational load. The proposed method demonstrates that a
brief calibration period of 3 to 5 minutes yields
significantly improved realism compared to static reverb
templates, offering an efficient and practical spatial
audio solution for mobile AR environments.
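To illustrate the two room acoustic parameters named in the abstract, here is a minimal sketch that estimates RT60 and an early/late energy ratio from a single impulse response. It assumes Schroeder backward integration with a line fit between -5 dB and -25 dB extrapolated to 60 dB, and a 50 ms early/late split. These choices, the function names, and the synthetic decay are assumptions for illustration; the abstract does not state how the authors extract the parameters or interpolate them between positions.

```python
import math


def schroeder_decay_db(ir):
    """Energy decay curve in dB via backward integration of the squared IR."""
    edc = [0.0] * len(ir)
    energy = 0.0
    for i in range(len(ir) - 1, -1, -1):
        energy += ir[i] * ir[i]
        edc[i] = energy
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0 else -math.inf for e in edc]


def rt60_from_ir(ir, fs, start_db=-5.0, end_db=-25.0):
    """Fit a line to the decay between start_db and end_db, extrapolate to 60 dB."""
    edc_db = schroeder_decay_db(ir)
    pts = [(i / fs, d) for i, d in enumerate(edc_db) if end_db <= d <= start_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope


def early_late_ratio_db(ir, fs, split_ms=50.0):
    """Ratio of energy before and after split_ms, in dB (split time is an assumption)."""
    k = int(fs * split_ms / 1000.0)
    early = sum(v * v for v in ir[:k])
    late = sum(v * v for v in ir[k:])
    return 10.0 * math.log10(early / late)


# Synthetic exponential decay whose amplitude drops 60 dB in 0.5 s.
fs = 8000
decay = 3.0 * math.log(10.0) / 0.5
ir = [math.exp(-decay * i / fs) for i in range(fs)]
rt60 = rt60_from_ir(ir, fs)
elr_db = early_late_ratio_db(ir, fs)
```

A recorded hand clap is not an ideal impulse, so a real measurement would also need onset detection and noise-floor handling, which this sketch omits.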
Authors

Minsu Kim

Seoul National University
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Personalized VR for hearing research with embedded devices
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded Artificial Intelligence hardware, and
hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.
Authors

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation
Thursday May 28, 2026 2:00pm - 2:30pm CEST
This work presents a perceptual model based on a complex
IIR filterbank. The filterbank with a frequency resolution
of 4 bands per Bark consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are used to obtain masking thresholds
with the following post-processing. To obtain reasonable
masking thresholds from the spreading outputs, a
postmasking stage is required. Here, we propose a
comodulation-dependent adaptation of the postmasking decay
to model Comodulation Masking Release (CMR) effects. This
approach explicitly considers the dip-listening effect known
from the literature. The final masking thresholds are obtained
by weighting the postmasking outputs by a tonality-dependent
gain, controlled using spectral flatness estimation. A
listening test compares the proposed method to an already
known approach using direct CMR-based modification of the
masking threshold gains.
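The spectral flatness estimate mentioned above can be sketched as follows, assuming the standard definition (geometric mean of the power spectrum divided by its arithmetic mean). The function name, the small `eps` floor, and the example spectra are hypothetical, and the mapping from flatness to the tonality-dependent gain is not specified in the abstract and is not attempted here.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum, in (0, 1].

    Values near 1 indicate a noise-like (flat) spectrum, values near 0 a tonal one.
    eps guards against log(0) and is an implementation assumption.
    """
    vals = [p + eps for p in power]
    log_mean = sum(math.log(v) for v in vals) / len(vals)
    return math.exp(log_mean) / (sum(vals) / len(vals))


# Hypothetical band powers: a flat spectrum and a single dominant component.
noise_like = [1.0, 1.0, 1.0, 1.0]
tonal = [1e-6] * 7 + [1.0]
flat_noise = spectral_flatness(noise_like)
flat_tonal = spectral_flatness(tonal)
```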
Authors

Bernd Edler

International Audio Laboratories Erlangen, Germany

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Sound source localization and identity tracking are
fundamental tasks in acoustic scene analysis, enabling
machines to determine what produces sound events, where,
and when. While deep attractor-based networks have
demonstrated improved performance under an unknown number
of sources, maintaining continuous source tracking over
long-form audio remains challenging due to memory
limitations and permutation ambiguities across adjacent
segments. In this paper, we propose a Recursive Attractor
Network (RANet) for long-form sound source localization and
identity tracking with a variable number of sources. RANet
explicitly represents source attractors as transferable
embeddings and recursively propagates them across adjacent
audio segments using an LSTM-based model, thereby preserving
source identity continuity over time. Experimental results
on simulated datasets demonstrate that RANet achieves
robust long-form sound source localization and consistent
source identity tracking, outperforming baseline approaches
under variable and dynamic source conditions.
Authors

Jiaqi Du

Peking University

Tianshu Qu

Peking University

Xihong Wu

Peking University
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which
deep-learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the
framework provides a scalable, perceptually grounded
approach to interpreting listener-preference data when
developing headphone target curves.
Authors

Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology
▪  Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...

Julian Villegas

University of Aizu
Japan
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions—including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration—none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
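The QDI definition given above can be written out as a short sketch. Only the band edges and the ratio itself come from the abstract. The function name, the input format (a power spectrum given as parallel lists of frequencies and powers), and the example values are hypothetical, and details such as windowing, averaging, or how each direction's spectrum is obtained from the ambisonic recordings are not specified.

```python
# Band edges in Hz, as stated in the abstract.
QUINTINA_BANDS = ((600.0, 750.0), (1200.0, 1400.0))


def quintina_directionality_index(freqs, power, bands=QUINTINA_BANDS):
    """Energy inside the given bands divided by total spectral energy (0 to 1)."""
    total = sum(power)
    if total == 0:
        return 0.0
    in_band = sum(
        p for f, p in zip(freqs, power)
        if any(lo <= f <= hi for lo, hi in bands)
    )
    return in_band / total


# Hypothetical power spectrum for one direction or microphone position.
freqs = [100.0, 650.0, 1000.0, 1300.0, 2000.0]
power = [1.0, 2.0, 1.0, 3.0, 3.0]
qdi = quintina_directionality_index(freqs, power)  # (2 + 3) / 10 = 0.5
```

Evaluating the same function on spectra taken from different directions or positions would give the direction-dependent index the abstract describes.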
Authors

Felicita Brusoni

PhD candidate Musikhögskolan i Malmö, Lund University

Luca Frigo

Conservatorio G. Nicolini Piacenza

Martino Sarolli

Conservatorio Paganini Genova

Riccardo Dapelo

Conservatorio Nicolini Piacenza
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Center Extraction GAN
Thursday May 28, 2026 3:30pm - 4:00pm CEST
This paper presents a method for extracting a center signal
from two-channel stereo signals for upmixing and
reproduction with additional center loudspeakers.
It uses a generative adversarial network with a generator
trained with multiple reconstruction losses and adversarial
losses obtained from a discriminator.
The processing has low computational complexity, is causal,
and can be configured for latencies down to one audio frame
of 46 ms length.
The paper describes how training data are created using only
publicly available signals and how the generation of target
data makes it possible to control the attenuation of diffuse
signals and direct signals panned off-center.
An evaluation with a listening test and the computational
metrics SI-SDR and F2 measure is presented.
It shows an advantage compared to methods based on
classical signal processing in terms of computational
metrics for source separation and listener preference.
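For readers unfamiliar with SI-SDR, one of the metrics named above, the following is a minimal sketch of its commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name and the toy signals are assumptions for illustration; this is not the authors' evaluation code.

```python
import math


def si_sdr_db(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                      # optimal scaling of the reference
    target = [scale * r for r in reference]       # part of the estimate explained by it
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


# Toy example: reference plus a small error that is orthogonal to it.
reference = [1.0, 0.0, 1.0, 0.0]
estimate = [1.0, 0.1, 1.0, -0.1]
score = si_sdr_db(estimate, reference)            # energy ratio 2 / 0.02, i.e. 20 dB
score_scaled = si_sdr_db([3.0 * v for v in estimate], reference)
```

Rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from a plain SDR.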
Authors

Andreas Walther

Fraunhofer IIS


Christian Uhle

Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS
Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen.
He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and...

Julian Klapp

Fraunhofer Institute for Integrated Circuits IIS

Pablo Panter

Fraunhofer Institute for Integrated Circuits IIS
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Audio engineering music for listeners with hearing loss
Thursday May 28, 2026 3:30pm - 4:30pm CEST
Audio engineering often implicitly assumes a uniformity in
hearing across listeners; this is an assumption that does
not reflect real-world diversity. How could technologies
and practices in production, mixing, and reproduction be
adapted to create music that is more inclusive? While the
AES has a conference series on Audio and Music Induced
Hearing Disorders, this has focused on the causes of
hearing loss with little on audio engineering for listeners
who have a hearing loss.

In western countries, about one in three adults are deaf,
have hearing loss or suffer from tinnitus. Hearing loss can
lead to many challenges with music such as: inaudibility of
quieter passages, distortion, degraded pitch perception,
and difficulty in identifying and picking out lyrics and
instruments. The most common intervention for mild to
moderately severe hearing loss is hearing aids. But while
many of these devices have music programs, their efficacy
is mixed, to the point that many opt not to use them. With
the rise of machine learning within Audio Engineering,
there are opportunities to better personalise music, and
therefore address issues listeners face. Consumer devices
are also increasingly having audio accessibility features
added, but the usefulness of these lacks independent
testing. This workshop will consider opportunities for
making music more accessible.

The workshop will start by exploring how hearing loss harms
the experience of listening to music and how this varies
between people. This will lead to discussion of why no
technology can fully ‘correct’ music to achieve a ‘perfect’
listening experience for those with hearing loss. There is
no technology to recreate a ‘golden-ears’ experience. This
leads to a key research question: what is the best
rendition of a piece of music for someone who has hearing
loss? What do listeners want from music, and how can we get
closest to achieving that?

We will bring in findings from research projects and
listening tests to explore what is known, and also to
highlight that there are significant gaps in knowledge that
require further research. We will then explore the
state of the art in wearables such as hearing aids and
sound reproduction systems. This will include the current
Cadenza project, which has been running a series of machine
learning challenges to improve music for those with hearing
loss.

Throughout, we will encourage questions and engagement from
delegates. We want to hear about lived experience of
hearing difference and how that has changed professional
practice and personal lives. We are also keen to hear
suggestions from delegates on what approaches might be used
to improve music for those with hearing loss.

We aim to raise awareness of the importance of considering
diverse audiences in Audio Engineering practice. Where
possible, the workshop will provide practical guidance for
audio engineers, highlighting techniques and emerging
technologies that can better support listeners with diverse
hearing profiles.

The workshop will be organised by the Cadenza Project team
(https://cadenzachallenge.org/), a large UK-funded project
about improving music for those with hearing loss.
Speakers

Josh Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...

Trevor Cox

University of Salford

Sara Madsen

GN Store Nord

Adam Steed

Contact Theatre, Manchester
Thursday May 28, 2026 3:30pm - 4:30pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices
Thursday May 28, 2026 4:30pm - 5:00pm CEST
This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The proposed method originally consists of four stages:
high-resolution anatomical scanning of the listener’s upper
body, including the pinnae, individualized HRTF computation
using the boundary element method, selective removal of
spatial encoding components to derive a personalized
reference target response curve (PR-TRC), and perceptual
optimization using a listener-specific weighting
coefficient grounded in auditory reference fidelity rather
than preference. In this paper, each stage is simplified
and automated using smartphone-based scanning and
AI-assisted processing, enabling end users to complete the
entire personalization process via a smartphone connected
to a cloud-based server. The resulting personalized target
response curve is implemented within the computational and
memory constraints of the DSP pipeline of commercial
consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.
Authors

Atsushi Hara

final Inc.

Haruto Hirai

final Inc.

Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Mitsuru Hosoo

final Inc.

Nao Tojo

final Inc.

Shun Saito

final Inc./post-doc

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking
Thursday May 28, 2026 4:30pm - 5:00pm CEST
While Neural Audio Codecs (NAC) have revolutionized
monaural audio compression, achieving high-fidelity
dual-channel coding at low bitrates remains a significant
challenge. Existing approaches often rely on naive
independent channel quantization, leading to phase
incoherence, or entangled latent modeling, which sacrifices
spatial precision for spectral energy. This paper proposes
a novel dual-channel coding framework based on
content-spatial disentanglement. Reframing spatial
reconstruction as an informed source separation task, our
architecture synergizes a frozen, pre-trained DAC encoder
for robust mono content preservation with a
parameter-efficient side information encoder that predicts
fine-grained time-frequency masks. To ensure precise
spatial imaging, we introduce explicit physical constraints
into the end-to-end training. Experimental results indicate
that at low bitrates of 9 and 11 kbps, the proposed method
outperforms state-of-the-art dual-mono neural baselines and
industry standards in both objective spatial metrics and
subjective MUSHRA evaluations.
Authors

Qingbo Huang

MMLab, ByteDance

Tianshu Qu

Peking University

Yihan Wang

Peking University

Yufan Qian

Peking University
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Friday, May 29
 

9:00am CEST

A method to synchronize dynamic media stream on heterogenous media playback devices
Friday May 29, 2026 9:00am - 9:30am CEST
Audio synchronization across heterogeneous media playback
devices is essential for delivering immersive sound
experiences in applications such as speaker group play and
multi-room audio playback. Existing synchronization
techniques predominantly rely on tightly coupled network
infrastructures and often embed media sequence and
timestamp information in the media packet at the
transmitting source end, which restricts flexibility in
selecting the transmitting source and also compromises
robustness under dynamic network conditions. This paper
proposes a network- and source-independent audio
synchronization framework that eliminates the dependency on
embedded media sequence and timestamp information. The
proposed system employs an audio fingerprinting-based media
sequencing algorithm among the media playback devices
that does not rely on the type of transmitting source or on
network availability. A novel audio synchronization
algorithm is proposed which first determines common
sequence start information given a dynamic media stream
from the transmitting source and then communicates the
fingerprint and timestamp among the media playback
devices without modifying the original audio packet
structure. Experimental results demonstrate that the
proposed approach achieves an audio-audio
synchronization error of less than 10 ms across media playback
devices in a no-network environment, thereby extending the
scope of immersive audio applications irrespective of the
transmitting source.
Authors

Avinash Singh

Samsung Research Institute, Delhi (SRID)

Mohit Singh

Samsung Research Institute, Delhi (SRID)

Natasha Meena

Samsung Research Institute, Delhi (SRID)
I work as a software developer at Samsung Research Institute India - Delhi and am responsible for the development of features related to Samsung sound devices.
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring 2D Ambisonics by Amplitudes and Phases
Friday May 29, 2026 9:00am - 9:30am CEST
We present a spectral-like reformulation of 2D ambisonics,
enabling an alternative representation of the sound field
in terms of amplitudes and phases. We hypothesise that it
simplifies the representation and creative manipulation of
2D ambisonics, beyond encoded directional point sources.

In 2D high-order ambisonics (HOA) of order N, a sound field
can be represented as a 2π-periodic angular function, that
is, as a combination of circular harmonics (Y_m) weighted by
the coefficients (a_m) with m ∈ [-N, N]. This representation
can be reformulated in terms of N+1 amplitudes and N
phases, similarly to a Fourier decomposition.

A simple example of this representation is the ambisonic
encoder at an angle theta. The phases are then multiples of
a phase phi = theta/2π, just as frequencies are multiples of
a fundamental in harmonic sounds. The
amplitude-phase approach can therefore draw on the field of
sound synthesis, between harmonic and inharmonic modelling.
Operations on ambisonic vectors in amplitude-phase form also
rely on the Fourier representation, namely the spectral
convolution of two vectors (element-wise products of the
amplitudes, element-wise sums of the phases). Spectral
convolution has vast potential in ambisonics, allowing
all the usual spatial operations (geometric and
transformative) to be represented in a simple manner.
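
As a rough illustration of the amplitude-phase view, the
sketch below encodes a point source and rotates it by
spectral convolution. It assumes complex circular-harmonic
coefficients a_m = exp(-i·m·theta) stored for m = 0..N (for a
real sound field the negative orders are complex
conjugates), and it works in radians rather than the
normalised phase phi. The function names are hypothetical,
and this is not the authors' Faust implementation.

```python
import cmath


def encode(theta, order):
    """Coefficients a_m, m = 0..order, of a point source at angle theta.

    Assumed convention (not taken from the paper): a_m = exp(-1j * m * theta).
    """
    return [cmath.exp(-1j * m * theta) for m in range(order + 1)]


def to_amp_phase(coeffs):
    """Split complex coefficients into amplitudes and phases (radians)."""
    return [abs(c) for c in coeffs], [cmath.phase(c) for c in coeffs]


def from_amp_phase(amps, phases):
    """Rebuild complex coefficients from amplitudes and phases."""
    return [a * cmath.exp(1j * p) for a, p in zip(amps, phases)]


def spectral_convolve(x, y):
    """Element-wise product of amplitudes, element-wise sum of phases."""
    amps_x, phases_x = to_amp_phase(x)
    amps_y, phases_y = to_amp_phase(y)
    amps = [a * b for a, b in zip(amps_x, amps_y)]
    phases = [p + q for p, q in zip(phases_x, phases_y)]
    return from_amp_phase(amps, phases)


def rotate(coeffs, alpha):
    """Rotate a sound field by alpha: convolve it with an encoder at alpha."""
    return spectral_convolve(coeffs, encode(alpha, len(coeffs) - 1))
```

Under these assumptions, an encoded source has unit
amplitudes and phases that are integer multiples of -theta,
and `rotate(encode(theta, N), alpha)` matches
`encode(theta + alpha, N)` up to rounding.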

To test this approach, we are currently developing an
ambisonic synthesiser based on Faust functions running in
the Max environment. We are evaluating the scope of this
representation, both theoretical and compositional, and
will then attempt to extend this approach to 3D ambisonics.
Authors

Alain Bonardi

Professor in Computer Science and Music Creation, University of Paris 8
Alain Bonardi is a Professor of Computer Science and Music Creation at Paris 8 University, where he is based in the Music Department and is a member of the Musidanse laboratory.
There, he co-directs the CICM (Center for Research in Computer Science and Music Creation) with Anne...

Axel Chemla-Romeu-Santos

University of Paris 8

Emma Frid

University of Paris 8

Paul Goutmann

University of Paris 8
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Voice-Based Fatigue Detection for Military Personnel: A Multi-Modal Machine Learning Framework with Acoustic Feature Emphasis
Friday May 29, 2026 9:00am - 11:00am CEST
This study presents a voice-centered machine learning
framework for detecting mental fatigue in military
personnel, integrating acoustic analysis with physiological
biosensors to enhance detection robustness. Mental fatigue
poses critical safety and performance challenges in
military operations, yet cultural stigma often prevents
self-reporting. We collected multi-modal data from 23
participants across two fatigue states, extracting
comprehensive acoustic features including sound pressure
level (SPL), formants, mel-frequency cepstral coefficients
(MFCCs), jitter, shimmer, harmonic-to-noise ratio (HNR),
and temporal speech characteristics. These voice features
were combined with electroencephalography (EEG),
photoplethysmography (PPG), and temperature data to train
multiple machine learning classifiers. The voice-based
models achieved accuracies between 82% and 85%, with support
vector machines (SVM) and long short-term memory (LSTM)
networks demonstrating superior performance. When acoustic
features were combined with physiological markers,
classification accuracy improved to 92%, with
Classification and Regression Trees (CART) and Linear
Discriminant Analysis (LDA) emerging as top performers.
Statistical analysis identified SPL and formant variance as
the most discriminative voice features, while Lempel-Ziv
Complexity (LZC) and theta/beta ratio proved most reliable
for EEG. Evaluation on new participants yielded 67%
accuracy, revealing model generalization challenges that
inform future research directions. This work demonstrates
that voice-based machine learning systems, when augmented
with physiological data, offer a promising non-invasive
approach to real-time fatigue monitoring in operational
military environments.
Authors

Claire Courchene

Applied Perception Associate Engineer, GN
I’m a creative technologist and interaction designer exploring how sound, technology, and human experience meet. With an MScEng in Sound & Music Computing, I prototype audio interactions, build ML‑driven tools, and design experiments around perception. My background spans music...
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals
Friday May 29, 2026 9:00am - 11:00am CEST
Current deep learning approaches to speech enhancement rely
heavily on objective measures like mean squared error or
scale-invariant signal-to-distortion ratio as both training
objectives and evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, the inconsistent integration of metrics like
Short-Term Objective Intelligibility or Perceptual
Evaluation of Speech Quality into training and evaluation
pipelines leaves a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modeling. Our study
explores two distinct methods to characterise enhanced
signals. On one hand, we employ a perceptual approach based
on the Cambridge loudness model to assess the preservation
of spectral excitation patterns and perceived intensity. On
the other hand, we adopt a biophysical approach by
utilising CoNNear, a convolutional model of the human
auditory periphery. This allows us to simulate
representations of responses at different stages of the
auditory periphery to observe how speech enhancement
processing affects the physiological representation of
speech. We analyse pre-trained speech enhancement models
using automatic speech recognition and Short-Term Objective
Intelligibility as additional proxies for human
intelligibility. By mapping automatic speech recognition
performance against loudness and peripheral response
patterns, we investigate the extent to which current
enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than purely
signal-based objective functions.
Authors

François Effa

Université de Lorraine, CNRS, Inria, Loria, Nancy, France

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Objective Quality Models for Decision-Making in Speech Coding
Friday May 29, 2026 9:00am - 11:00am CEST
Objective quality evaluation is widely used in speech
coding, yet objective estimates often show limited
agreement with subjective listening-test results. Rather
than focusing on absolute score accuracy, this paper
evaluates objective speech quality models from a
decision-making perspective, defined as their ability to
support comparative judgments between speech codecs or
codec configurations. A formal ITU-T P.800 Absolute
Category Rating (ACR) listening test was conducted with 30
listeners across 24 conditions, covering conventional and
neural monophonic speech codecs operating under
clear-channel conditions at sampling frequencies from 16 to
48 kHz and bit rates ranging from below 1 kbps to above 16
kbps. The speech material consisted of internally recorded,
clean French-language speech that was not used in the
development or training of any of the evaluated codecs or
objective quality models. Seven objective quality models,
namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA,
UTMOS, and DistillMOS, were evaluated on the same material.
Decision-making performance was assessed by comparing
subjective and objective rankings using Kendall’s rank
correlation coefficient and by analyzing pairwise codec
comparisons using t-tests at a 95% confidence level. The
results show that some objective quality models are
effective for comparing bit rate variations within a given
speech coding technology, provided that all other codec
parameters remain unchanged (e.g., sampling frequency).
However, all models exhibit limitations, including
tendencies toward over- or underestimation for certain
technologies, as well as reduced reliability when applied
across different sampling frequencies. Despite its
conventional origins, PESQ remains capable of supporting
decision-making even when applied to neural speech codecs.
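
To make the ranking comparison concrete, the sketch below
computes Kendall's rank correlation between a list of
subjective scores and a list of objective scores. It assumes
the tau-a variant, in which tied pairs count as neither
concordant nor discordant; the abstract does not say which
variant was used. The function name and the scores are
made up for illustration and are not data from the paper.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equally long score lists.

    A pair of items is concordant when both lists order it the same way
    and discordant when they disagree; tied pairs count as neither.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up scores for four hypothetical codec conditions (not from the paper).
subjective_mos = [1.8, 2.9, 3.6, 4.3]
objective_score = [2.1, 3.4, 3.0, 4.0]
tau = kendall_tau_a(subjective_mos, objective_score)
```

In this toy example the objective model swaps the order of
the second and third conditions, so one of the six pairs is
discordant and tau falls below 1.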
Authors

Clémence Lamballe

Universite de Sherbrooke

Philippe Gournay

Universite de Sherbrooke

Roch Lefebvre

Universite de Sherbrooke
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

The Ambisonic Denoising Paradox: U-Net Processing Degrades ASR Transcription Quality for Medical Speech
Friday May 29, 2026 9:00am - 11:00am CEST
Spatial audio recording using higher-order Ambisonics
offers rich directional information for medical speech
capture, yet challenging hospital acoustic environments
motivate preprocessing with neural denoising algorithms.
This study investigates whether U-Net-based denoising of
third-order ambisonic recordings improves automatic speech
recognition (ASR) quality for medical applications. We
developed the Medical Immersive Audio Corpus (MIAC),
comprising 1,759 utterances (6.43 hours) of Polish medical
speech recorded with a Zylia ZM-1 microphone in
uncontrolled hospital environments, capturing 16-channel
third-order Ambisonics across multiple specializations
including thyroid ultrasonography, surgical procedures, and
general diagnostics. We applied a U-Net architecture with
dual attention mechanisms trained using the Noise2Noise
paradigm to denoise the corpus, then evaluated
transcription quality using ten Whisper ASR models ranging
from 39 million to 1.55 billion parameters, including
domain-adapted medical variants. Surprisingly, we
discovered a "noise reduction paradox" where denoising
degraded transcription quality for seven of ten models,
with statistically significant increases in Word Error Rate
(WER) and Character Error Rate (CER) for general-purpose
base, small, and medium models. Only the domain-adapted
whisper-medium-68000-abbr model showed statistically
significant improvement (p=0.0008), while large-scale
models (large-v2, large-v3) exhibited robustness with
negligible changes. Effect sizes remained small (Cohen's d
< 0.2) across all models. These counterintuitive findings
suggest modern ASR systems implicitly utilize background
noise characteristics as informative features, and that
preprocessing pipelines should be reconsidered for
domain-specific applications. Our results provide practical
guidance for medical speech processing system design.
Authors

Bartlomiej Mroz

Assistant Professor, Gdańsk University of Technology
PhD, Spatial Audio & Immersive Media Researcher, Recording Engineer, Statistics enthusiast

Szymon Zaporowski

Gdańsk University of Technology
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Detecting Bandwidth Variation Artifacts in Perceptual Audio Coding
Friday May 29, 2026 10:00am - 10:30am CEST
Accurate identification of audio coding artifacts is
instrumental in encoder design, audio post-processing, and
perceptual quality assessment. This paper addresses the
detection of artifacts arising from changes in the
effective bandwidth of coded audio signals caused by coarse
spectral quantization. Such bandwidth variations give rise
to two prominent artifact types: bandwidth limitation (BL)
and birdies, also referred to as spectral islands (SI).
Blind detection methods, requiring no reference signal, are
presented for both artifact types. Bandwidth limitation
is detected by analyzing variations in the zero-crossing
count across time-domain subband signals, enabling
estimation of both fixed and time-varying cutoff
frequencies. Spectral islands are identified through
analysis of the spectrogram by detecting clusters of
isolated components in the time–frequency domain,
characterized by their temporal and spectral extents. The
proposed methods are evaluated using audio material from
the ODAQ and USAC verification datasets. Results show that
the BL detection method achieves an average bandwidth
estimation error of approximately 160 Hz and demonstrates
robustness to noisy bandwidth-limited signals. In addition,
the detected birdie artifacts are perceptually validated
through listening tests, indicating an improvement in
perceived quality following detection and subsequent
suppression of the birdie artifacts.
Authors

Andreas Niedermeier

Fraunhofer IIS, Erlangen


Bernd Edler

International Audio Laboratories Erlangen, Germany

Dipanjan Datta Roy

International Audio Labs, Erlangen

Sascha Dick

Fraunhofer IIS, Erlangen, Germany
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

10:30am CEST

Spatial Estimation of Room Acoustic Parameters using Sound Field Reconstruction Methods
Friday May 29, 2026 10:30am - 11:00am CEST
The acoustic characterisation of indoor spaces is crucial
for a wide range of applications. While global metrics
provide convenient descriptors of a room's overall
behaviour, a more spatially detailed analysis offers deeper
insight into the spatio-temporal structure of the sound
field, albeit at a higher experimental cost. This paper
proposes a methodology that leverages the predictive
capabilities of sound field reconstruction methods to
estimate room acoustic parameters as a function of
position. The approach is experimentally evaluated in an
auditorium, where it achieves accurate estimation of
temporal and energetic room acoustic parameters across the
entire audience area. In addition, the reconstructed field
yields higher intelligibility indices compared to the raw
measurements. Overall, these results highlight the
potential of sound field reconstruction techniques as a
practical tool for room acoustic characterisation and for
supporting assistive listening technologies.
Authors

Antonio Figueroa-Duran

Universidad Politécnica de Madrid

Efren Fernandez-Grande

Universidad Politécnica de Madrid
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Lossless Audio Coding revisited
Friday May 29, 2026 10:30am - 11:00am CEST
MPEG-4 SLS (scalable lossless coding) was published more
than 20 years ago. In the meantime, several tools to improve
coding efficiency and flexibility have been invented.
Currently, in MPEG WG6 (audio coding) there are two
standardization activities on lossless audio coding: Audio
Coding for Machines (ACoM) and Biomedical and general
waveform signal coding (BWC).
ACoM phase 1 originally was targeted only towards lossless
storage formats for training of machine listening schemes,
but additional use cases like “user generated content
analysis”, “live stream content analysis”, and “artistic
creation” have been added. The focus was extended to the
transmission of audio data from microphone (arrays) to
central processing units.
BWC is a joint activity with ITU-T SG21. While ACoM started
with a large number of use cases and includes the
specification of a rich set of metadata, BWC started with a
focus on medical data like electroencephalogram (EEG) and
electrocardiogram (ECG). However, BWC can be used for audio
signals, too, and medical data coding is on the list of use
cases for ACoM.
The call for proposals (CfP) for ACoM was completed in
January 2025. Two proposals, both outperforming MPEG-4 SLS,
were submitted. Both proposals reused and optimized
core codecs from BWC. Currently, MPEG audio is investigating
how the ACoM proposals can be merged into BWC. This merge
process must be completed by the end of April 2026.
The presentation will give details about ACoM use cases,
the ACoM CfP process, the results of the CfP, and results
from the merge process.
Authors

Thomas Sporer

Deputy Director IDMT / Convenor MPEG audio, Fraunhofer IDMT
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Obsidian Neural: Open-Source VST3 for Real-Time Generative AI – Architecting the AI as a Live Performance Instrument
Friday May 29, 2026 11:00am - 12:00pm CEST
Obsidian Neural is a novel, open-source VST3 plugin that
addresses the technical challenges of integrating
generative AI models directly into a low-latency digital
audio workstation (DAW) environment. This workshop will
provide a deep dive into the architecture designed to use
AI as a real-time performance instrument. We will cover the
C++/DSP strategies necessary for minimizing latency during
the asynchronous generation of audio loops via models like
Stable Audio Open. Crucially, we will detail the system's
ability to maintain musical coherence during a live mix,
achieved through an internal LLM "Brain" that processes
contextual session data (BPM, key, existing tracks) to
enrich generation prompts. Furthermore, we will explore the
technical solutions implemented for seamless integration
with the live mixing paradigm: quantized MIDI triggering,
multi-output routing, and the novel "Draw-to-Sound"
feature, which employs a Vision Language Model (VLM) to
translate visual input into musical parameters. This work
demonstrates a robust framework for generative AI to
function as an instantaneous, adaptable partner within
professional audio engineering workflows.
Speakers

Anthony Charretier

Independent Developer
Friday May 29, 2026 11:00am - 12:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

12:00pm CEST

Saul Walker Student Design Competition
Friday May 29, 2026 12:00pm - 1:30pm CEST
The Saul Walker Student Design Competition is a long-running event of the Audio Engineering Society that highlights practical and creative work in audio design. It brings together experienced judges and a wide range of strong student submissions each year.

During this session, students from around the world will present their projects and bring their hardware designs for hands-on inspection by the judges. The format encourages open discussion, giving attendees a chance to hear how ideas are evaluated and improved in a professional setting.

Sponsored by API, the competition includes cash prizes for the winners. More importantly, it offers students valuable feedback and the opportunity to connect with people working in the industry. The session is open to everyone—students and non-students alike—who are interested in seeing what participants have created and learning more about current work in audio design.
Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio...

Ewa Łukasik

Poznan University of Technology, Institute of Computing Science
Authors

Sascha Disch

Fraunhofer IIS
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...
Friday May 29, 2026 12:00pm - 1:30pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Spatial Quality Measure for Mixed-phase Impulse Response Equalization
Friday May 29, 2026 12:30pm - 1:00pm CEST
Mixed-phase impulse response equalization can improve
magnitude and phase response, but conventional objectives
such as mean-squared error (MSE) can favor solutions that
introduce objectionable temporal artifacts, including
pre-echo and extended post-echo ringing. This paper
proposes a Spatial Equalization Quality Measure (SEQM) to
select a mixed-phase equalization filter that better
controls these artifacts while remaining computationally
simple and applicable across multiple listening positions.
SEQM combines (i) a temporal-domain metric that penalizes
energy preceding the main pulse of an impulse response and
energy persisting after it, while also accounting for the
decay rate of the post-response tail, with (ii) a spatial
aggregation rule that summarizes quality across measurement
positions. We use SEQM to select the modeling delay for
mixed-phase finite-impulse-response (FIR) equalization and
to compare mixed-phase FIR designs with minimum-phase FIR
and IIR alternatives under a common multi-position
measurement framework. Experiments using semi-anechoic
measurements across 34 spatial positions for two
loudspeakers show that SEQM consistently selects
substantially shorter delays than MSE-based selection;
yields impulse responses with reduced pre-echo; faster
post-response decay, while maintaining comparable
frequency-response equalization. These results suggest that
SEQM is a practical objective tool for designing
multi-position mixed-phase equalization filters.
Authors
BD

Bill Decanio

Samsung Electronics
avatar for Sunil Bharitkar

Sunil Bharitkar

Samsung Research America

Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

12:30pm CEST

Innovative Measurement of Speech Intelligibility – Applications of Listening Effort in Research & Practice
Friday May 29, 2026 12:30pm - 2:00pm CEST
Speech intelligibility is a key factor in successful
communication across various domains, including research,
post-production for film and television, live sound
reinforcement, and audio production. Traditional assessment
methods often lack objectivity or fail to capture the
listener’s experience in real-world scenarios. In this
workshop, we introduce an innovative approach to measuring
speech intelligibility based on the concept of “Listening
Effort.” We will present the underlying technology, share
practical examples from different application areas, and
demonstrate how this method can be integrated into
workflows to optimize intelligibility. Attendees will have
the opportunity to participate in a hands-on demonstration
and discuss potential use cases relevant to their own work.
This session is designed for professionals and researchers
seeking reliable and actionable tools for evaluating and
improving speech intelligibility in diverse environments.
In this workshop, we present a new technology for measuring
speech intelligibility (“Listening Effort”). The method is
used in research, post-production (film/TV), live sound,
and audio production. The session is aimed at professionals
from both academia and industry who are interested in
objectively assessing and optimizing speech intelligibility.

Participants will be able to join a short demo/exercise and
ask questions.

Introduction & Relevance: Overview of the importance of
speech intelligibility across different fields
Technology & Methodology: Presentation of the measurement
method and underlying concepts
Practical Examples: Case studies from research,
post-production (film/TV), live sound, and production
Live Demo / Interactive Exercise: Practical demonstration
and opportunity for active participation
Discussion & Outlook: Q&A, exchange of ideas, and future
perspectives
Speakers
HB

Hannah Baumgartner

Fraunhofer IDMT
JR

Jan Rennies-Hochmuth

Fraunhofer IDMT
Friday May 29, 2026 12:30pm - 2:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Systematization of Multiplier-less Convolution for 1-bit Audio Signal
Friday May 29, 2026 1:00pm - 1:30pm CEST
High-speed 1-bit signals generated by oversampling are
widely used in audio applications as they allow simple
demodulation via low-pass filtering while preserving
in-band spectral characteristics with high accuracy.
However, conventional FIR filtering of such signals
generally requires conversion to a multi-bit representation
at a common sampling frequency, which increases
computational cost and complicates the overall processing
flow. This paper addresses the convolution of high-speed
1-bit audio signals with multi-bit FIR impulse responses
and presents a systematic formulation of a multiplier-less
convolution approach. Based on a mathematical
reinterpretation of convolution, the proposed formulation
describes how time shifting and amplitude weighting can be
expressed through structured rearranging of 1-bit samples
without arithmetic operations. This provides a theoretical
description of previously reported 1-bit convolution
methods; however, its validity has not been fully
formalized. We examine the spectral characteristics of the
proposed convolution method and compare them with those
obtained by multi-bit convolution followed by ΔΣ
modulation. Experiments are conducted by convolving 1-bit
input signals with FIR filters having multi-band frequency
responses. Spectral analysis shows that the proposed method
achieves extremely high agreement with the standard
approach within the audible band while the differences
appear primarily at much higher frequencies outside the
audible range. These results demonstrate that convolution
of high-speed 1-bit audio signals can be achieved without
multipliers, suggesting the potential for highly efficient
hardware-oriented signal processing architectures.
Authors
IS

Iori Sakurai

Waseda University
TS

Tomohiro Sakaguchi

Doctoral student, Waseda University
YO

Yasuhiro Oikawa

Waseda University

YG

Yuta Gomi

Waseda University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

1:00pm CEST

Geometry Sensitivity in Low-Count Virtual Microphone Arrays: From Tetrahedral Baselines to Stochastic Spherical Layouts
Friday May 29, 2026 1:00pm - 3:00pm CEST
Virtual Microphone Array techniques are being investigated
by the authors to support room acoustics optimisation in
live sound environments. In our recent AES paper, “Room
Acoustics Optimisation Using Virtual Microphone Arrays”, a
notable outcome was that a compact four-microphone
tetrahedral array performed strongly relative to its low
sensor count. Recent virtual sensing and Remote Microphone
Technique research treats microphone placement as an
explicit design variable. It reports improved remote
estimation performance when microphone layouts are
deliberately chosen for the task, rather than adopted as
fixed, standard configurations.
This submission builds on our prior VMA work by focusing on
the four-microphone case, where geometry choices are
especially constrained. We compare a tetrahedral baseline
with an ensemble of stochastically generated spherical
layouts at the same array aperture using Monte Carlo
simulation. We apply a consistent evaluation protocol
across multiple listening-region offsets and standard
beamforming estimators to isolate variability due to
geometry alone. The central proposition is that, for
low-count VMAs, geometry is a first-order design parameter.
Tetrahedral remains a credible baseline, but lightweight
stochastic exploration can reveal alternative layouts that
are competitive and, in some cases, superior without
increasing channel count.
Authors
avatar for Brian de Brit

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering... Read More →
DD

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Clustered Virtual Microphone Arrays for Listener-Level Monitoring and Room-Correction in Live Sound
Friday May 29, 2026 1:00pm - 3:00pm CEST
This paper introduces clustered virtual microphone arrays
as a step toward improving listener-level virtual
microphone estimation for live sound. Multiple compact
microphone sub-arrays are placed around a nominal overhead
position. Each sub-array produces a virtual microphone
estimate, and the estimates are fused. The aim is to attack
the estimation problem from multiple viewpoints and reduce
sensitivity to any one array placement or geometry.
The work builds on our earlier paper, “Room Acoustics
Optimisation Using Virtual Microphone Arrays”. That paper
proposed virtual microphones estimated from an overhead
array as a measurement layer for live sound optimisation.
It also highlighted a key limitation: in its initial form,
virtual microphone estimation quality was not yet strong
enough for reliable use across positions. The present paper
targets that limitation. We outline the clustered array
idea and treat cluster count and inter-cluster spacing as
design parameters. Virtual microphones are estimated using
beamforming and combined using simple fusion. Performance
is assessed with objective signal measures, including SNR
and frequency- and phase-related error measures, across
multiple listener-level target positions. The results
support further refinement under more realistic room
conditions and further study of the link between improved
estimation quality and FIR-based correction outcomes.
Authors
avatar for Brian de Brit

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering... Read More →
DD

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Time–Frequency Integrated Framework for Frequency-Invariant Beamforming in Loudspeaker Arrays
Friday May 29, 2026 1:00pm - 3:00pm CEST
Loudspeaker array beamforming technology has been widely
used; however, current frequency-domain and time-domain
design methods for calculating FIR filters face challenges,
including the need for modeling delay and high
computational complexity. To address these issues, this
paper proposes a time–frequency integrated framework. This
framework supports both pressure matching and amplitude
matching methods, enabling not only the realization of
traditional superdirective beams but also the design of
frequency-invariant beams. For the nonlinear optimization
problem in amplitude matching, an efficient solving
algorithm based on the Alternating Direction Method of
Multipliers (ADMM) is introduced. Experimental results
demonstrate that the proposed method combines the
advantages of existing frequency-domain and time-domain
approaches, directly computing FIR filter coefficients
without delay modeling while maintaining high computational
efficiency. This provides an effective solution for beam
control in loudspeaker arrays.
Authors
JY

Jianbin Yang

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
KP

Keyu Pan

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
NC

Ning Cong

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
XT

Xing Tian

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

The Impact of Frequency Gradient on Nonlinear Pulse Distribution in the Farina Technique
Friday May 29, 2026 1:00pm - 3:00pm CEST
The Exponential Sine Sweep (ESS) technique, popularized by
Angelo Farina, has become a cornerstone of modern
electroacoustic measurement due to its unique capability to
simultaneously extract a system’s linear impulse response
and its individual harmonic distortion components. Standard
implementation of this method almost exclusively utilizes a
low-to-high (upward) exponential sine sweep. However,
during a technical Q&A session at the AES Europe 2025
Convention in Warsaw, a question was raised: what are the
practical consequences of reversing the sweep direction?
This inquiry is particularly relevant given that several
industry-standard measurement platforms often employ
high-to-low (downward) sweeps to optimize the mechanical
and thermal stability of the device under test (DUT) while
performing stepped or swept sinusoidal analysis.
This paper provides an investigation into the temporal
behavior of nonlinearities when the frequency gradient of
an exponential sweep is inverted. Through formal
mathematical derivation and numerical simulations, the study
proves that while the spacing between distortion orders
remains identical in magnitude, the polarity and time
distribution of these impulses is reversed. Specifically,
we demonstrate that in a downward sweep, the distortion
products shift from the "pre-causal" negative time region
to the "post-causal" positive time region. This shift
causes harmonic distortion pulses to emerge within the
reverberant tail of the impulse response, leading to
significant contamination of decay measurements and
energy-time curves. By contrasting the "tracking filter"
paradigm with "time-domain deconvolution," this work
clarifies why sweep direction is a critical parameter that
must be aligned with the specific goals of the measurement
protocol.
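The sign reversal described above can be illustrated with the standard exponential-sweep relation, in which the N-th harmonic impulse is offset from the linear impulse response by T·ln(N)/ln(f_end/f_start). The following is a minimal sketch under that textbook relation, not the paper's own derivation; the function name and the example sweep parameters are hypothetical.

```python
import math


def harmonic_offsets(duration_s, f_start_hz, f_end_hz, max_order=5):
    """Offsets (seconds) of harmonic-distortion impulses relative to the
    linear impulse response after deconvolving an exponential sine sweep.

    Negative values fall before the linear response, positive values after it.
    """
    # ln(f_end / f_start) is positive for an upward sweep, negative for a downward one.
    log_ratio = math.log(f_end_hz / f_start_hz)
    return {
        order: -duration_s * math.log(order) / log_ratio
        for order in range(2, max_order + 1)
    }


if __name__ == "__main__":
    upward = harmonic_offsets(10.0, 20.0, 20000.0)    # low-to-high sweep
    downward = harmonic_offsets(10.0, 20000.0, 20.0)  # high-to-low sweep
    for order in upward:
        print(order, round(upward[order], 4), round(downward[order], 4))
```

Under this relation the offsets of the two sweep directions have equal magnitude and opposite sign, so the distortion impulses of a downward sweep land after the linear response, where a reverberant tail would be.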
Authors
avatar for Daniele Ponteggia

Daniele Ponteggia

Materiacustica Srl
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

An Extended Multichannel Frequency-Domain FxLMS Algorithm for Real-Time Full-Band Adaptive Transaural Reproduction
Friday May 29, 2026 1:30pm - 2:00pm CEST
This paper presents a multichannel adaptive filtering
algorithm for real-time full-band adaptive transaural
reproduction on general-purpose hardware. It is based on a
multichannel frequency-domain FxLMS algorithm using an
overlap-save framework for both filtering and adaptation,
and is extended with (i) online plant identification for
fully adaptive operation, (ii) frequency-dependent
normalization for faster convergence, and (iii)
frequency-dependent regularization to stabilize adaptation.
The proposed algorithm is implemented in C on a
standard desktop PC and evaluated on a 4x2 transaural
configuration running in real time at 48 kHz with 2048-tap
control filters. Two evaluation tests are conducted. The
first test consists of reproducing two uncorrelated
white-noise signals at the ears of a manikin using
crosstalk cancellation as the performance metric. An
average crosstalk cancellation of 32 dB over 100 Hz–20 kHz
is demonstrated. The second experiment considers binaural
signal reproduction as a more realistic use case of the
algorithm. In both cases, performance is assessed for both
a static listener and a moving listener scenario,
demonstrating the algorithm’s ability to rapidly re-adapt.
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Real-Time Implementation of Personal Sound Zones Using Partitioned Convolution in Purr Data
Friday May 29, 2026 2:00pm - 2:30pm CEST
Personal sound zones aim to reproduce distinct audio
contents in separate spatial regions using loudspeaker
arrays, while minimizing acoustic interference between
zones. Although well established theoretically, their
real-time implementation remains challenging due to the
long impulse responses involved and the latency constraints
of audio processing systems.
This work presents a real-time implementation of personal
sound zones based on the pressure matching method in a
static context, i.e. transfer functions between the
loudspeakers and the zones are assumed to remain constant.
Sound zone filters are computed in the frequency domain
from experimentally measured impulse responses between an
array of 18 loudspeakers and two microphone arrays of 9
microphones defining a bright zone and a dark zone. The
system performance is then evaluated in terms of acoustic
contrast, reproduction error, and effective frequency
range. To meet real-time constraints, a fast partitioned
convolution algorithm has been used, namely the
Uniformly-Partitioned Overlap Save (UPOLS). This method
has been implemented in C++ as an external block for the
Purr Data real-time audio environment. Experimental
results, obtained in a semi-anechoic environment,
demonstrate that it enables stable real-time multichannel
convolution with negligible numerical error compared to
offline convolution. The proposed system results in a
functional real-time sound zones demonstrator, suitable for
experimental and interactive spatial audio applications.
The codes are shared in a GitHub repository so that the
scientific community can benefit from them.
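For readers unfamiliar with uniformly partitioned overlap-save convolution, the following is a minimal single-channel NumPy sketch of the general technique. It is not the authors' C++ external for Purr Data; the function name, block size, and test signals are assumptions chosen for illustration.

```python
import numpy as np


def upols_convolve(x, h, block=64):
    """Uniformly partitioned overlap-save convolution of x with h."""
    n_parts = -(-len(h) // block)  # ceil(len(h) / block)
    # Split the filter into equal partitions and transform each once (zero-padded to 2*block).
    h_padded = np.zeros(n_parts * block)
    h_padded[: len(h)] = h
    H = np.fft.rfft(h_padded.reshape(n_parts, block), n=2 * block, axis=1)

    out_len = len(x) + len(h) - 1
    n_blocks = -(-out_len // block)
    x_padded = np.zeros(n_blocks * block)
    x_padded[: len(x)] = x

    # Frequency-domain delay line: row p holds the input spectrum from p blocks ago.
    fdl = np.zeros((n_parts, block + 1), dtype=complex)
    window = np.zeros(2 * block)  # previous block followed by current block
    y = np.empty(n_blocks * block)

    for b in range(n_blocks):
        window[:block] = window[block:]
        window[block:] = x_padded[b * block : (b + 1) * block]
        fdl = np.roll(fdl, 1, axis=0)
        fdl[0] = np.fft.rfft(window)
        # One inverse FFT per block: multiply-accumulate across all partitions first.
        acc = (fdl * H).sum(axis=0)
        y[b * block : (b + 1) * block] = np.fft.irfft(acc, n=2 * block)[block:]

    return y[:out_len]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    h = rng.standard_normal(300)
    print(np.max(np.abs(upols_convolve(x, h) - np.convolve(x, h))))
```

In this scheme the input-to-output latency is one block regardless of filter length, which is the property that makes the approach attractive for long measured impulse responses.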
Authors
GP

Guilhem Pagès

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
JB

Jean Beuchet

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
avatar for Manuel Melon

Manuel Melon

Professor, LAUM / LE MANS Université


TL

Titouan Lefrancois

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Audio Design Roundtable
Friday May 29, 2026 2:00pm - 3:00pm CEST
Join us for a panel discussion about audio design featuring some of the industry’s leading audio designers and educators. This session is meant to inspire upcoming designers and encourage dialogue with established audio designers.
 
The panelists will give a brief overview of their designs, their roles in the AES, and how and why educators and students should participate in the various design competitions that the AES has to offer. The panel discussion is followed by a Q&A session that allows for questions and exchange with the panelists.

Speakers
avatar for Jamie Angus-Whiteoak

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti... Read More →
avatar for George Massenburg

George Massenburg

Associate Professor of Sound Recording, Massenburg Design Works
George Y. Massenburg is a Grammy award-winning recording engineer and inventor. Working principally in Baltimore, Los Angeles, Nashville, and Macon, Georgia, Massenburg is widely known for submitting a paper to the Audio Engineering Society in 1972 regarding the parametric equali... Read More →
avatar for Christoph Thompson

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio... Read More →
Friday May 29, 2026 2:00pm - 3:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Saturday, May 30
 

9:00am CEST

Adaptive Deesser Application
Saturday May 30, 2026 9:00am - 9:30am CEST
High-fidelity vocal processing is frequently compromised by
sibilance, a phenomenon characterized by stochastic
high-frequency energy that presents unique dynamic range
challenges. While traditional de-essing techniques often
rely on static frequency bands, they fail to account for
inter-speaker variability and changing dynamics. This
project presents an adaptive real-time de-essing
application, developed using the JUCE framework, which
automatically detects and suppresses sibilant frequencies.
The proposed methodology integrates a derivative-based
frequency tracking algorithm to estimate the spectral
centroid without the computational overhead of the Fast
Fourier Transform (FFT). This is coupled with a dual-path
envelope detection system and a relative threshold logic to
distinguish sibilance from the wideband signal.
Additionally, a dynamic harmonic exciter is implemented to
restore high-frequency presence during non-sibilant
periods. Objective spectral analysis confirms the system's
ability to selectively attenuate energy in the 6–11 kHz
range while maintaining spectral transparency and
minimizing artifacts.
Authors
CE

Cumhur Erkut

Aalborg University
Cumhur Erkut (M.Sc. 1997, D.Sc. 2002) has received a PhD in acoustics and audio signal processing from Helsinki University of Technology, Finland. During his post-doctoral period, he has contributed to national and international projects (EU FP5 and 6). Between 2007 and 2012, he has conducted i... Read More →
SB

Stefanos Biliousis

Aalborg University
Saturday May 30, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

When Excellence Fails the Mix: Non-Compensatory Relationships in Mix Preparation for Music Production
Saturday May 30, 2026 9:30am - 10:00am CEST
Mix preparation, the foundational stage encompassing
technical, musical, and organisational tasks preceding
creative mixing, remains under-examined despite
professional acknowledgement. This study investigated
whether preparatory effectiveness operates through
compensatory relationships, where excellence in one
dimension offsets weakness in another, or through threshold
requirements demanding adequacy across all dimensions
simultaneously.

Nine professional audio practitioners each prepared three
sessions from a pool of nine multitrack recordings spanning
diverse genres. Nine engineers (with partial overlap) then
evaluated the resulting twenty-seven preparations across
five dimensions derived from Phase 1 practitioner
interviews: Session Organisation, Signal Integrity, Musical
Refinement, Processing Boundaries, and Workflow
Facilitation. Professional 'adequacy' was established at a
4.0 threshold based on practitioner consensus regarding
preparations they would 'work with' versus 'send back'.

Results revealed consistent non-compensatory patterns:
exceptional performance in isolated dimensions failed to
compensate for failures elsewhere. One practitioner
achieved perfect Workflow Facilitation (5.00) yet overall
inadequacy (3.43) due to Signal Integrity failure (2.50).
Another achieved strong Musical Refinement (4.75) whilst
Workflow Facilitation collapse (1.75) produced a
below-threshold outcome (3.49). These patterns held across
all inadequate sessions. No track produced exclusively
adequate or inadequate outcomes, confirming source material
did not determine success.

The findings challenge three assumptions: that
practitioners can specialise and compensate, that education
can sequence skills for later integration, and that
intelligent systems can optimise tasks independently.
Preparatory adequacy requires meeting threshold standards
across all dimensions concurrently, with implications for
professional hiring, curriculum design, and AI-assisted
tool development.
Authors
AA

Ashour Ahmed

University of West London - London College of Music
Saturday May 30, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Low-Frequency Limits of Cross-Talk Cancellation Systems Under Robustness Constraints
Saturday May 30, 2026 10:00am - 10:30am CEST
The low-frequency performance of cross-talk cancellation
(CTC) systems is fundamentally limited by the condition
number of the plant matrix, which indicates the robustness
of the inverse system in the absence of regularisation.
This condition number, in turn, depends on the relationship
between loudspeaker spacing, listener distance, and
acoustic wavelength.
This paper derives a simple approximate expression for the
low-frequency limit of CTC performance, defined for a given
maximum affordable condition number as a function of these
parameters. The increase in condition number is also shown
to be directly related to the increase in array effort
relative to the minimum achievable array effort. The
formulation is derived for a centered listener and can be
extended to the case of off-center listener positions,
demonstrating the method's applicability to
listener-position-adaptive cross-talk cancellation systems.
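The dependence of the condition number on geometry and frequency can be explored numerically. The sketch below assumes a free-field point-source model with two loudspeakers placed symmetrically in front of a centered listener; it does not reproduce the paper's approximate expression, and the function name, default ear spacing, and example geometry are hypothetical.

```python
import numpy as np


def ctc_condition_number(freq_hz, span_m, distance_m, ear_spacing_m=0.18, c=343.0):
    """Condition number of a 2x2 free-field plant matrix (loudspeakers to ears).

    span_m is the loudspeaker spacing, distance_m the distance from the
    loudspeaker baseline to the listener's interaural axis.
    """
    k = 2.0 * np.pi * freq_hz / c  # acoustic wavenumber
    # Path lengths from one loudspeaker to the same-side and opposite-side ear.
    r_ipsi = np.hypot(distance_m, (span_m - ear_spacing_m) / 2.0)
    r_contra = np.hypot(distance_m, (span_m + ear_spacing_m) / 2.0)

    def green(r):
        return np.exp(-1j * k * r) / (4.0 * np.pi * r)

    plant = np.array(
        [[green(r_ipsi), green(r_contra)],
         [green(r_contra), green(r_ipsi)]]
    )
    return np.linalg.cond(plant)


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0):
        print(f, ctc_condition_number(f, span_m=0.2, distance_m=0.5))
```

For this example geometry the condition number grows as frequency falls, because the path-length difference becomes small relative to the wavelength and the two columns of the plant matrix become nearly identical.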
Speakers
FF

Filippo Fazi

Chief Scientist, Audioscenic
Authors
FF

Filippo Fazi

Chief Scientist, Audioscenic
FV

Francesco Veronesi

University of Southampton
Saturday May 30, 2026 10:00am - 10:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

What Are You Doing With That Compressor?
Saturday May 30, 2026 10:00am - 11:30am CEST
Dynamic range controllers, or compressors, have long been
central to music production. The loudness standards now
adopted by major streaming platforms have further
heightened the importance of skilled and intentional
compression in both mixing and mastering. Pop and rock
productions with extremely limited dynamic range are
routinely attenuated during playback, while highly dynamic
material—such as classical orchestral works and film
scores—risks gain and possibly undesirable soft clipping
being applied when delivered masters are normalized.
This panel will address best practices for the use of
compression across a wide range of applications, from
individual instruments to full-program material, in both
stereo and immersive formats. Panelists will present
established methodologies alongside innovative techniques
drawn from their current professional workflows. Different
types of compression will be examined and compared,
including their application in mastering for vinyl release.
Audience engagement is an integral component of the
workshop, and ample time will be reserved for questions and
discussion with conference attendees.
Speakers
avatar for George Massenburg

George Massenburg

Associate Professor of Sound Recording, Massenburg Design Works
George Y. Massenburg is a Grammy award-winning recording engineer and inventor. Working principally in Baltimore, Los Angeles, Nashville, and Macon, Georgia, Massenburg is widely known for submitting a paper to the Audio Engineering Society in 1972 regarding the parametric equali... Read More →
avatar for Richard King

Richard King

McGill University
Montreal
avatar for Stefan Bock

Stefan Bock

Managing Director, msm-studios GmbH
Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991 where he was the Chief Mastering Engineer and General Manager.

He was leading msm-studios t... Read More →
ML

Margaret Luthar

Dark Sky Mastering

Saturday May 30, 2026 10:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Optimising Sound Effects to Enhance Dialogue Perception in Audio Mixes Using Selective Auditory Attention
Saturday May 30, 2026 11:00am - 11:30am CEST
Dialogue intelligibility is a fundamental aspect of audio
post-production. Ensuring speech clarity in complex sound
mixes remains challenging across different playback
systems. Selective auditory attention plays a central role
in how listeners track dialogue in busy mixes, so small
changes in spectral or spatial structure can influence
perceived clarity in unexpected ways. This study
investigates the effectiveness of psychoacoustically
informed techniques, equalisation and spatialisation, in
reducing auditory masking and improving the clarity of
dialogue. The listening test was completed on participants’
own playback systems, which reflects typical domestic
viewing conditions and aligns the study with real-world
listening environments. The techniques were tested
individually and in combination to assess their impact.
Results show that equalisation was more effective than
spatialisation in reducing masking, while their combination
produced a significant improvement in intelligibility,
clarity, and reduced interference. The effectiveness of
these methods varied between the two groups of clips,
suggesting that their application should be adapted to the
specific acoustic context of each scene.
Authors
avatar for Federico Aramini

Federico Aramini

Edinburgh Napier University
Dialogue and sound editor with 3+ years' experience and 30+ credits in film across feature film, animation, documentary and TV series. Contributed to award-winning and festival-recognised productions, including films screened at the Venice Film Festival and the David di Donatello Awards... Read More →
IM

Iain McGregor

Edinburgh Napier University
RS

Rod Selfridge

Edinburgh Napier University
Saturday May 30, 2026 11:00am - 11:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

George Massenburg: 3D Masterclass
Saturday May 30, 2026 12:30pm - 1:30pm CEST
George plays high resolution stereo, 5.1 and 3D recordings
from his fabulous back catalogue, commenting on production
tools and techniques, including his own excellent dynamics
processor.

This masterclass series, featuring remarkable recording
artists, is a chance to hear 3D audio at its best; as we
discuss qualities that make it truly worth the effort.

In each masterclass, we explore the new spatial
possibilities in recording and production, detailing also
this specific listening room, regarding ITU-R BS.1116
compliance and auditory envelopment (AEV) transparency.
Seats are limited to keep playback variation at bay.
Speakers
avatar for George Massenburg

George Massenburg

Associate Professor of Sound Recording, Massenburg Design Works
George Y. Massenburg is a Grammy award-winning recording engineer and inventor. Working principally in Baltimore, Los Angeles, Nashville, and Macon, Georgia, Massenburg is widely known for submitting a paper to the Audio Engineering Society in 1972 regarding the parametric equali... Read More →
avatar for Thomas Lund

Thomas Lund

Genelec Oy
Denmark
Saturday May 30, 2026 12:30pm - 1:30pm CEST
Aud 31 Technical University of Denmark Asmussens Alle, Building 306 DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Investigations on Nonlinearity in a Gammatone Filter Bank Based Perceptual Model
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Perceptual models are playing an important role in
effectively balancing the data compression and fidelity in
audio encoders by leveraging the masking effects in human
auditory perception. To derive suitable masking
thresholds, considering tonality is important. In this
study, a novel filter bank is proposed, which uses narrow
complex-valued all-pole gammatone filters followed by a
non-linear spectral spreading processing. With an
appropriate non-linear mapping before spreading, and
inverse non-linear mapping afterwards, differences between
masking strengths of tonal and noise-like maskers can be
directly obtained without explicit tonality estimation.
By employing a suitable non-linearity, level-dependency of
spectral spreading in the human auditory system can also be
modeled. The performance of the proposed approach is
evaluated through subjective listening tests, which include
comparisons with results obtained using partial spectral
flatness measures.
Authors
BE

Bernd Edler

International Audio Laboratories Erlangen, Germany
FS

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany
PE

Paul Emil Meier

International Audio Laboratories Erlangen

PS

Paula Schäfer

Fraunhofer-Institut für Integrierte Schaltungen IIS
YH

Yaqiong Hou

PhD student, International Audio Laboratories Erlangen
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Poster

1:00pm CEST

Measurement and Analysis of Perceptual Characteristics of Binaural Cues
Saturday May 30, 2026 1:00pm - 3:00pm CEST
The application of binaural cue perception mechanisms to
multichannel audio compression technology can reduce
spatial parameter redundancy and effectively lower the
encoding bitrate. Binaural cues play a critical role in
sound source localization, and their frequency-dependent
characteristics yield varied perceptual localization
effects. However, current understanding of the specific
behavior of binaural cues at low frequencies, as well as
the similarities and differences between interaural time
difference (ITD) and interaural level difference (ILD),
remains incomplete. To explore the relationship between
ITD-based and ILD-based azimuth perception, this study
non-uniformly selected nine ITD values and twelve ILD
values within the 300–1480 Hz frequency range to test ITD
and ILD perceptual azimuths, respectively. The experimental
method involved using fixed binaural cue stimuli while
varying the audio with known horizontal azimuth angles to
approach the target binaural cue stimulus. Test results
indicate that both ITD and ILD perceptual effects are
significantly influenced by frequency, with the minimum
perceptual azimuth values for both ITD and ILD observed at
700 Hz, suggesting that binaural cue perception azimuths
are closer to the median plane at this frequency.
Furthermore, surface fitting was applied to the perceptual
azimuths of ITD and ILD, revealing relatively similar
patterns. Based on experimental findings, this paper
analyzes the explorable perceptual correlation between
ITD-based and ILD-based azimuth perception. The application
of data in spatial audio coding contributes to the
efficient transmission and fidelity preservation of audio
signals. This study provides valuable insights for
optimizing binaural cue-based compression techniques,
ultimately supporting high-fidelity spatial audio
reproduction.
Authors
HW

Heng Wang

Wuhan Polytechnic University
MG

Mingyan Gao

Wuhan Polytechnic University
YX

Yiming Xu

Wuhan Polytechnic University, Wuhan, China
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

From DSP to AI Audio Engineering: The Heritage and the Future of Physical Modeling Sound Synthesis
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Digital Audio Signal Processing has long enabled precise
analysis of musical instrument behavior, supporting digital
sound synthesis. In parallel, physical modeling has evolved
into a mature synthesis and simulation technology capable
of running in real time, coupling vibro-acoustic models
with perceptual control interfaces. Over the last decade,
advances in machine learning have begun to transform both
ends of this pipeline. Instead of relying solely on
analytical DSP methods, we are increasingly able to learn
impulse and frequency responses, infer parameters, and
drive synthesis models directly from data. This broader
transition from classical DSP to *AI Audio Engineering*
brings not only new algorithms but also new workflows,
evaluation practices, and deployment contexts for musical
acoustics.

Two demonstrators illustrate this shift. *First*,
measurement-driven studies of musical instruments can
constrain model architectures and reduce parameter search
spaces. The measurement-derived priors can inform both
classical modeling and data-driven neural surrogates.
*Second*, real-time physical modeling integrated into XR
environments highlights how haptic control, perceptual
feedback, and spatial audio can create convincing virtual
instruments suitable for experimentation, pedagogy, and
performance.

These demonstrators motivate an AI Audio Engineering
workflow in which measurement, modeling, learning, and
perceptual evaluation form a continuous loop, to enable
immersive XR experiences, rapid prototyping of novel
instruments, and new modes of digital lutherie. The
approach invites collaboration across acoustics, DSP,
spatial audio, and AI Audio Engineering: an emerging
discipline that considers audio models as deployable,
maintainable, and continuously improvable artifacts
governed by data, inference, evaluation, and lifecycle
operations.
Authors
CE

Cumhur Erkut

Aalborg University
Cumhur Erkut (M.Sc. 1997, D.Sc. 2002) received a PhD in acoustics and audio signal processing from Helsinki University of Technology, Finland. During his post-doctoral period, he contributed to national and international projects (EU FP5 and 6). Between 2007 and 2012, he has conducted i...
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

A Study on Uncertainty of Sound Pressure Measurements in Cars
Saturday May 30, 2026 1:30pm - 2:00pm CEST
Accurate and efficient measurement of sound pressure levels
around the ears of occupants in cars is essential for
objective evaluation of basic sound quality and automotive
audio features such as personal sound zones and active
noise control. In this paper, the uncertainties of sound
pressure measurements obtained with 5 commonly used methods
are compared: the AES 6-microphone method, the
single-microphone method, the two-microphone method with
occupants present, the head-and-torso simulator method,
and the human binaural method. Measurements were conducted
in the front-right seat of a 4-door electric sedan, using
either all car body loudspeakers or a pair of headrest
loudspeakers driven by two-channel uncorrelated pink
noise to generate an average sound pressure level of 70 dBA
in the seat. Each method underwent 3 complete
install–measure–remove cycles, a total of 54 recordings
were collected, and the standard deviation of the measured
average sound pressure levels was adopted to quantify
measurement uncertainty. The test results show that all
5 methods have good repeatability and low uncertainty below
200 Hz and above 15 kHz, but have large uncertainty between
200 Hz and 15 kHz. The AES 6-microphone method demonstrates
the best repeatability with the lowest uncertainty across
most frequency resolutions, and its maximum uncertainty in
1/3 octave bands is less than 2.0 dB for sound pressure
measurements in the car. Therefore, the AES 6-microphone
method is recommended for use in engineering comparison and
reporting.
Authors
JT

Jiancheng Tao

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University
RC

Ruoyan Chen

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University
Xiaojun Qiu

Yinwang Intelligent Technology Co., Ltd, Shanghai, China
ZZ

Zhou Zhou

Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Knowledge-Driven Optimization of Reverberation Parameters Using Declarative Audio Constraints
Saturday May 30, 2026 2:00pm - 2:30pm CEST
Artificial reverberation is a fundamental process in music
production and audio post-production. However, the large
and highly interdependent parameter spaces of modern
reverberation algorithms make the identification of
perceptually optimal configurations difficult, particularly
when attempting to minimize audible artifacts. This paper
presents a knowledge-driven framework for reverberation
parameter optimization that evaluates candidate
configurations using rule-based audio quality constraints
derived from perceptual and signal-processing principles.
The system automatically detects and prevents common
artifacts including spectral obfuscation, clipping, spatial
collapse, and ringing phenomena. Instead of relying on
data-driven training procedures, the proposed approach
employs declarative reasoning to model audio engineering
knowledge and systematically constrain parameter
exploration. Experimental evaluation demonstrates that the
framework successfully reduces artifact occurrence across
diverse audio material while maintaining computational
feasibility. The results suggest that knowledge-based
reasoning can provide an interpretable and controllable
alternative to data-driven optimization strategies in audio
signal processing.
Authors
FE

Flavio Everardo

Tec de Monterrey, University of Potsdam
NH

Noah Haussmann

TU Berlin, University of Potsdam
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Optimal levels and measurement time for separation of nonlinear components
Saturday May 30, 2026 2:00pm - 2:30pm CEST
Linear loudspeaker parameters are often estimated via
fitting of transfer functions, under the assumption of
linearity. This paper investigates the corruption of the
measurement caused by nonlinearities in the system and
presents a new and improved method for separating the true
linear response from the nonlinear components by analyzing
a sequence of measurements done at different levels. The
method is improved by analyzing the influence of the chosen
measurement levels as well as the measurement time at each
level, and numerically optimal values are presented for the
most typical cases of nonlinear behaviour. While the
influence of noise and nonlinear distortion can be
eliminated completely in the case of finite orders of
nonlinearities in the system, the method is also shown to
provide improved accuracy in the more realistic case where
all orders are present but only a finite number of them
dominate.
Authors
Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 

