Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

Type: Audio Processing
Thursday, May 28
 

9:00am CEST

Deep Learning-Based Lower-Layer Upmixing
Thursday May 28, 2026 9:00am - 9:30am CEST
This paper introduces a novel approach for generating a
lower layer in multichannel audio upmixing, addressing a
gap in existing methods that primarily focus on mid and top
layers. Leveraging Harmonic-Percussive Separation (HPS),
the proposed framework dynamically adjusts key parameters
(separation factor, harmonic attenuation, and phase shift)
to enhance percussive components while diffusing harmonic
elements. We compared three neural network architectures
for this task: LSTM, TCN, and Transformer. Experimental
results show comparable perceptual quality and objective
metrics across all models, with the TCN being the most
balanced and suitable for deployment on edge devices.
Authors
Ema Souza-Blanes

Samsung Research America

Luis Madrid

Samsung Research Tijuana

Thaddeus Páez

Research Engineer, Samsung Research Tijuana
Research Engineer at Samsung Mexico.
Thursday May 28, 2026 9:00am - 9:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:30am CEST

Spectral Optimization for Automatic Multitrack Mixing Using Answer Set Programming
Thursday May 28, 2026 9:30am - 10:00am CEST
The mixing stage in music production involves a complex set
of interdependent technical and creative decisions aimed at
achieving a coherent and industry-level result. Intelligent
Music Production (IMP) is an emerging research area that
integrates Artificial Intelligence techniques into music
creation and post-production processes, spanning from
composition to mastering. Within this context, Answer Set
Programming (ASP), a declarative paradigm from Knowledge
Representation and Reasoning, has proven effective for
modeling and solving complex optimization problems. This
article presents frmixerr, an ASP-based intelligent system
designed to optimize the mixing process by automatically
generating balanced mixes. The system formulates mixing as
a combinatorial optimization problem and evaluates
candidate solutions against a reference spectral profile.
To assess its performance, a subjective listening test was
conducted comparing mixes generated by frmixerr with mixes
produced by human engineers with varying levels of
professional experience. The results indicate no
significant differences in perceived quality between
frmixerr mixes and those created by professionals, suggesting
that ASP constitutes a viable approach for intelligent
assistance in music mixing.
Authors
Carlos Benítez

Tec de Monterrey

Flavio Everardo

Tec de Monterrey, University of Potsdam
Thursday May 28, 2026 9:30am - 10:00am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Experimental study of sound zone methods for indoor/outdoor active noise cancellation
Thursday May 28, 2026 10:00am - 10:30am CEST
The development of personal sound zone systems in recent
years shows great potential for low-frequency noise control
outside of noisy spaces. These approaches show promising
applications for managing noise pollution arising from
concerts in large venues or urban festivals. However, most
of the literature has assumed that the created sound zones
would exist in the same room or acoustic space as the noise
source. This premise hence discards all setups where the
disturbances would occur outside of concert venues (e.g., in
neighboring houses). This paper presents a first
experimental study of the behavior of sound zone methods
for indoor sound zones and outdoor noise sources. These
initial results indicate good efficiency of these methods
in this edge case, opening new use cases for these
approaches.
Authors
Lucas Hocquette

L-Acoustics

Yves Pene

Research Engineer, L-Acoustics
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Beyond Species Identification: Real-Time Spatial Interaction Analysis in Avian Bioacoustics Using Microphone Arrays and Hybrid Beamforming on Edge Architectures
Thursday May 28, 2026 10:00am - 10:30am CEST
Conventional ornithological monitoring systems rely heavily
on single-channel recorders and deep learning classifiers
to identify "what" species is present, but fail to capture
"where" it is located or how individuals interact
spatially. This limitation hinders the study of complex
ecological behaviors, such as inter-specific spacing in
dense vegetation and predator-prey dynamics. We propose a
novel, dual-mode acoustic localization system designed to
unify semantic classification and spatial tracking.
Utilizing an economically scalable 16-channel Uniform
Rectangular Array (UMA-16) interfaced with edge-computing
platforms, we implement a hybrid spatial filtering pipeline
structured to balance real-time latency constraints with
achievable angular resolution. The first stage employs a
computationally efficient, noise-robust linear scanning
technique to generate an acoustic energy map and estimate
source multiplicity. This preliminary data initializes a
second-stage, super-resolution spectral estimation
algorithm predicated on signal-noise subspace
orthogonality, combining the noise robustness of
non-parametric beamforming methods with the precision of
parametric approaches. By integrating these spatial filters
with standard deep learning classifiers, the system
resolves overlapping vocalizations in "Cocktail Party"
scenarios and improves Signal-to-Noise Ratio (SNR) for
cryptic species detection. We address the physical
"Localization-Detection Range Disparity," demonstrating
that while detection is viable at long ranges, precise
localization is constrained by the array aperture to the
near-to-mid field. The system outputs real-time video
overlays of acoustic heatmaps for field observation and
generates autonomous volumetric territory maps in fixed
deployments, collectively providing ornithologists with a
robust capability for analyzing the spatial ecology of
avian vocalizations.
Authors
Emre Göktuğ AKTAŞ

Istanbul Technical University

Mesut Kartal

Istanbul Technical University
Thursday May 28, 2026 10:00am - 10:30am CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Distortion Measurements; Can We Measure What We Hear?
Thursday May 28, 2026 10:00am - 11:00am CEST
Many types of distortion can be measured, ranging from
linear to non-linear. The two are often intertwined, with
the linear distortion influencing the non-linear
distortion. Distortion is also highly signal- and
level-dependent, and it is hard to compare one type of
distortion measurement to another. There are many types of
non-linear distortion metrics, with THD, THD+N, and IMD
being the most classic ones, using sine tones as the test
signal. But how can we measure distortion with real
signals such as speech and music, or even noise, and compare
the results to audibility? This tutorial discusses a wide
range of distortion measurements, what is audible,
and what distortion sounds like.
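
For readers unfamiliar with the sine-tone metrics named above, THD is commonly computed as the ratio of the RMS sum of the harmonic amplitudes to the amplitude of the fundamental. The following is a minimal sketch of that definition only, not the tutorial's measurement method; the function names, sample rate, number of harmonics, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of one sinusoidal component, via a single-bin DFT."""
    n = len(signal)
    re = 0.0
    im = 0.0
    for i, x in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * i / sample_rate
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n

def thd(signal, fundamental_hz, sample_rate, num_harmonics=5):
    """Ratio of RMS-summed harmonic amplitudes (2nd and up) to the fundamental."""
    fundamental = tone_amplitude(signal, fundamental_hz, sample_rate)
    harmonic_power = 0.0
    for k in range(2, num_harmonics + 2):
        harmonic_power += tone_amplitude(signal, k * fundamental_hz, sample_rate) ** 2
    return math.sqrt(harmonic_power) / fundamental

# Hypothetical device output: a 1 kHz tone plus a 2nd harmonic at 1 %
# and a 3rd harmonic at 0.5 % of the fundamental amplitude.
SAMPLE_RATE = 48000
FUNDAMENTAL_HZ = 1000.0
test_signal = [
    math.sin(2.0 * math.pi * FUNDAMENTAL_HZ * i / SAMPLE_RATE)
    + 0.01 * math.sin(2.0 * math.pi * 2 * FUNDAMENTAL_HZ * i / SAMPLE_RATE)
    + 0.005 * math.sin(2.0 * math.pi * 3 * FUNDAMENTAL_HZ * i / SAMPLE_RATE)
    for i in range(4800)  # 0.1 s, a whole number of periods of every harmonic
]
thd_ratio = thd(test_signal, FUNDAMENTAL_HZ, SAMPLE_RATE)
print(f"THD = {100.0 * thd_ratio:.2f} %")  # about 1.12 %
```

The analysis window here holds a whole number of periods, so no windowing is needed; with real signals such as speech or music that assumption does not hold.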
Speakers
Steve Temme

Listen Inc.
Steve Temme is founder and President of Listen, Inc., manufacturer of the SoundCheck audio test system. Steve founded the company in 1995, and for the past 30 years the company has remained on the cutting edge of research into audio measurement, regularly introducing new measurement...
Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

The Early Electronic Orchestra: The Analogue Circuits Behind Electronic Keyboards Before Digital Came Along.
Thursday May 28, 2026 10:00am - 11:00am CEST
Before digital signal processing took over electronic
keyboard instruments, they were implemented using analogue
circuits that used tubes/valves, transistors, and even neon
lightbulbs! Yet using these components keyboards were
developed that could mimic string and brass ensembles,
pianos and harpsichords and many other instruments. How did
they do it?

The purpose of this tutorial is to look at both the
architecture and the circuitry of these instruments, and to
show how amazing results could be achieved using
comparatively simple electronic circuitry. It will look at:

1. The basic architecture of these instruments,
2. How they generated the right notes,
3. How they produced the desired envelopes,
4. How they imposed them on the waveform,
5. How they simulated the effect of many instruments playing
together.

It will also look at how, if it was required, touch
sensitivity could be achieved, such as in electronic
pianos. Where possible there will be audio examples
demonstrating the sounds that could be achieved.

For many people who have only ever experienced the digital
world it will be illuminating to see just how much could be
achieved by comparatively simple circuits.
In those days electronic components were expensive so
considerable ingenuity was expended in minimising the total
number of components required.

These instruments are part of our musical and audio
heritage and the circuit techniques they used are in danger
of being forgotten so this tutorial will be a timely
reminder of what used to be done.
It may also provide useful information to people who are
attempting to model these instruments using modern digital
methods.

The tutorial will be accessible to everyone; you will not
have to be an electronic engineer to understand the
principles behind these unique pieces of audio engineering
history.
Speakers
Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...
Thursday May 28, 2026 10:00am - 11:00am CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Input-output linearization of loudspeaker dynamics via automatic differentiation
Thursday May 28, 2026 11:00am - 11:30am CEST
Input-output linearization is a technique for compensating
nonlinear distortion in loudspeakers. To apply it to
complex loudspeaker models, we describe an end-to-end
framework for estimating model parameters from data and
deriving the linearizing control laws using automatic
differentiation. The parameter estimation approach combines
frequency-domain linear parameter estimation with a
time-domain prediction-error method for the nonlinear
parameters. The linearization approach supports non-linear
reference systems and stabilization of the control law
using trajectory tracking. We implement the framework in
dynax, an open-source Python package based on JAX, and
validate it experimentally as a feed-forward controller on
a closed-box loudspeaker. Results demonstrate validation
errors of 1–5% NRMSE and total harmonic distortion
reductions of 6–12 dB. The framework enables researchers
and engineers to rapidly prototype and validate complex
loudspeaker models for distortion compensation without
manual symbolic derivations.
Authors
Finn Agerkvist

Technical University of Denmark
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.

Thursday May 28, 2026 11:00am - 11:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Joint Neural Translation and Classification of Videos for Audio Processing
Thursday May 28, 2026 1:30pm - 2:00pm CEST
A low-parameter-count machine-learning model for
classifying streaming video can enable content-aware
audio/video processing on consumer edge devices with
latency, computational, and battery constraints. In this
paper, we propose a low-compute classification technique
that uses only text metadata from the streaming file
header, enabling near-instantaneous inference without
decoding and analyzing audio or video signals as is
traditionally done. In particular, to support multilingual
platforms such as YouTube, we first apply neural machine
translation as a pre-processing step for the text metadata
and optimize a lightweight neural classifier for a
three-class audio-centric classification taxonomy (movie,
music, dialog/other). Experiments on a mixed-language
YouTube dataset achieve approximately 90% classification
accuracy on a test set using a combined translation and a
classification model (with only about 22K parameters),
demonstrating a globally-scalable approach for robust
classification on the edge.
Authors
Alejandro Cajica

Samsung Research Mexico

Sunil Bharitkar

Samsung Research America

Thursday May 28, 2026 1:30pm - 2:00pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

New Paths for Immersive Music Streaming: Channel-based and High Resolution
Thursday May 28, 2026 1:30pm - 3:00pm CEST
Streaming of immersive audio is known to western audiences
almost exclusively in the object-based format, Atmos,
developed by Dolby and employing lossy codecs to limit bit
rates. Other object-based formats like Sony 360 have had
limited success, and until recently there were no channel
based streamed versions. But this situation is changing,
as it has already done in Japan.

Responding to growing interest in very high quality
immersive music for both on-demand streaming and live
broadcast, two new services are now active that offer,
first, channel-based audio and second, audio streamed in
high res PCM. Binaural mixes, a range of PCM formats and
video are variously included, with extensions to portables,
loudspeakers, and home theater.

This workshop provides a forum for discussion of both the
genuine promise and the challenges in these new
initiatives. Included are the advantages of high
resolution over lossy; channel-based versus object-based;
the degree of adoption of transducers for multichannel;
adaptive bit rates; data sources; and the Japanese
approach; amongst others.
Speakers
Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Stefan Bock

Managing Director, msm-studios GmbH
Stefan Bock, born 20.08.1964 in southern Germany was starting his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991 where he was the Chief Mastering Engineer and General Manager.

He was leading msm-studios t...

Bert van Daele

CTO, Goer Dynamics BV
Bert Van Daele is CTO at NewAuro.
After graduating as an Engineer in Digital Electronics in 1997, he started out as an electronics designer at Philips Electronics, mainly working on digital products related to Surround Sound.
During a sabbatical leave, he worked at the Galaxy Studi...

Morten Lindberg

Engineer and Producer, 2L (Lindberg Lyd)
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...

Vicki Melchior

Chair, AES Technical Committee - HRA; also: Independent Consultant, Audio DSP and Software
Thursday May 28, 2026 1:30pm - 3:00pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation
Thursday May 28, 2026 1:30pm - 3:30pm CEST
We present Binaspect, an open-source Python library for
binaural audio analysis, visualization, and feature
generation. Binaspect generates interpretable “azimuth
maps” by calculating modified interaural time and level
difference spectrograms, and clustering those
time-frequency (TF) bins into stable time-azimuth histogram
representations. This allows multiple active sources to
appear as distinct azimuthal clusters, while degradations
manifest as broadened, diffused, or shifted distributions.
Crucially, Binaspect operates blindly on audio, requiring
no prior knowledge of head models. These visualizations
enable researchers and engineers to observe how binaural
cues are degraded by codec and renderer design choices,
among other downstream processes. We demonstrate the tool
on bitrate ladders, ambisonic rendering, and VBAP source
positioning, where degradations are clearly revealed. In
addition to their diagnostic value, the proposed
representations can be exported as structured features
suitable for training machine learning models in quality
prediction, spatial audio classification, and other
binaural tasks. Binaspect is released under an open-source
license with full reproducibility scripts at: (link removed
for blind review)
Authors
Alessandro Ragano

University College Dublin

Dan Barry

University College Dublin

Davoud Shariat Panah

University College Dublin

Jan Skoglund

Google

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Lightweight Real-time Spatial Audio Interpolation for Standalone VR using Hand Claps
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Realistic spatial audio consistent with visual information
is essential for providing high immersion in Augmented
Reality (AR) environments. However, conventional
high-precision real-time acoustic simulations require
significant computational power, limiting their
implementation on standalone mobile VR devices such as the
Meta Quest. This study proposes a practical method to
enhance reverb realism using solely a standalone VR HMD,
without the need for additional external equipment. By
measuring impulse responses using a few hand claps in the
physical space, we interpolate room acoustic
parameters (specifically RT60 and early/late energy
ratios) to reflect the environment's unique characteristics.
These extracted parameters are then applied to the VR
engine's built-in reverb effects, enabling dynamic,
location-aware real-time rendering with minimal
computational load. The proposed method demonstrates that a
brief calibration period of 3 to 5 minutes yields
significantly improved realism compared to static reverb
templates, offering an efficient and practical spatial
audio solution for mobile AR environments.
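
The abstract does not say how RT60 is extracted from a clap recording. One common approach, shown here purely as an assumption, is Schroeder backward integration of the squared impulse response followed by a straight-line fit to the decay curve. The function names, the -5 dB to -25 dB fit range, and the synthetic impulse response below are hypothetical choices for illustration.

```python
import math

def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB via backward (Schroeder) integration."""
    remaining = []
    acc = 0.0
    for x in reversed(impulse_response):
        acc += x * x
        remaining.append(acc)
    remaining.reverse()
    total = remaining[0]
    # Remaining energy is non-increasing, so zeros can only occur at the tail.
    return [10.0 * math.log10(r / total) for r in remaining if r > 0.0]

def estimate_rt60(impulse_response, sample_rate, fit_start_db=-5.0, fit_end_db=-25.0):
    """Fit a line to the decay curve and extrapolate to a 60 dB drop."""
    decay = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate, d) for i, d in enumerate(decay)
              if fit_end_db <= d <= fit_start_db]
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    return -60.0 / slope

# Synthetic stand-in for a measured clap response: an exponential decay
# whose amplitude falls by 60 dB after TRUE_RT60 seconds.
SAMPLE_RATE = 1000
TRUE_RT60 = 0.5
impulse_response = [
    (-1.0) ** i * math.exp(-math.log(1000.0) * (i / SAMPLE_RATE) / TRUE_RT60)
    for i in range(SAMPLE_RATE)
]
rt60_estimate = estimate_rt60(impulse_response, SAMPLE_RATE)
print(f"Estimated RT60: {rt60_estimate:.3f} s")
```

A real hand clap is not an ideal impulse and real rooms add noise floors, so a practical implementation would need band filtering and noise-floor handling that this sketch omits.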
Authors
Minsu Kim

Seoul National University
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Personalized VR for hearing research with embedded devices
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Deep learning has significantly improved speech enhancement
performance in controlled laboratory conditions, yet these
advances rarely translate into robust real-world benefit
for hearing aid users. Current algorithms are trained and
evaluated in simplified acoustic scenarios, neglecting
multimodal cues, user interaction, environmental dynamics,
and the strict latency and power constraints of embedded
devices. As a result, a persistent gap remains between
algorithmic performance and everyday listening experience.
This position paper reviews recent progress in speech
enhancement, embedded Artificial Intelligence hardware, and
hearing aid systems, and argues for a shift toward
ecologically valid evaluation and hardware-aware design. We
propose virtual reality as a reproducible, multisensory
benchmarking platform enabling joint assessment of human
perception and algorithmic processing. This perspective
outlines a research roadmap toward adaptive, context-aware,
and practically deployable hearing technologies.
Authors
Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Stefania Serafin

Department of Engineering Technology and Didactics, Technical University of Denmark
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Perceptual Model Considering Comodulation Masking Release by Postmasking Adaptation
Thursday May 28, 2026 2:00pm - 2:30pm CEST
This work presents a perceptual model based on a complex
IIR filterbank. The filterbank with a frequency resolution
of 4 bands per Bark consists of 104 filters whose slopes
are designed to take spectral masking effects into account.
The filter outputs are used to obtain masking thresholds
with the following post processing. To obtain reasonable
masking thresholds from the spreading outputs, a post
masking stage is required. Here, we propose a comodulation
dependent adaptation of the postmasking decay to model
Comodulation Masking Release (CMR) effects. This approach
explicitly considers the dip-listening effect known from
literature. The final masking thresholds are obtained by
weighting the postmasking outputs by a tonality dependent
gain, controlled using spectral flatness estimation. A
listening test compares the proposed method to an already
known approach using direct CMR based modification of the
masking threshold gains.
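
The abstract mentions a tonality-dependent gain controlled by spectral flatness estimation. As background, spectral flatness is usually defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum: values near 1 indicate a noise-like spectrum, values near 0 a tonal one. The sketch below shows only that standard measure; how the paper maps it to a gain is not stated, and the function name, the floor value, and the two example spectra are assumptions.

```python
import math

def spectral_flatness(power_spectrum, floor=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    values = [max(p, floor) for p in power_spectrum]  # avoid log(0)
    log_mean = sum(math.log(v) for v in values) / len(values)
    arithmetic_mean = sum(values) / len(values)
    return math.exp(log_mean) / arithmetic_mean

# Hypothetical band powers: a flat (noise-like) spectrum and a spectrum
# dominated by a single strong component (tone-like).
noise_like = [1.0] * 64
tone_like = [1e-6] * 63 + [1.0]

flatness_noise = spectral_flatness(noise_like)
flatness_tone = spectral_flatness(tone_like)
print(f"noise-like: {flatness_noise:.3f}, tone-like: {flatness_tone:.6f}")
```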
Authors
Bernd Edler

International Audio Laboratories Erlangen, Germany

Fabian Schaller

Fraunhofer IIS, Erlangen, Germany
Thursday May 28, 2026 2:00pm - 2:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:30pm CEST

A Recursive Attractor Network for Long-Form Sound Source Localization and Identity Tracking with a Variable Number of Sources
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Sound source localization and identity tracking are
fundamental tasks in acoustic scene analysis, enabling
machines to determine what produces sound events, and
where and when. While deep attractor-based networks have
demonstrated improved performance under an unknown number
of sources, maintaining continuous source tracking over
long-form audio remains challenging due to memory
limitations and permutation ambiguities across adjacent
segments. In this paper, we propose a Recursive Attractor
Network (RANet) for long-form sound source localization and
identity tracking with a variable number of sources. RANet
explicitly represents source attractors as transferable
embeddings and recursively propagates them across adjacent
audio segments using an LSTM-based model, thereby preserving
source identity continuity over time. Experimental results
on simulated datasets demonstrate that RANet achieves
robust long-form sound source localization and consistent
source identity tracking, outperforming baseline approaches
under variable and dynamic source conditions.
Authors
Jiaqi Du

Peking University

Tianshu Qu

Peking University

Xihong Wu

Peking University
Thursday May 28, 2026 2:30pm - 3:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Deep-Learning-Driven Sensory Profiling of Headphone Target Curves with Adaptive Listening Test Validation
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Identifying robust headphone target curves is challenging
when preference data from untrained listeners are
interpreted without explicit perceptual structure. This
work presents a methodological framework in which
deep-learning-driven sensory-profile analysis serves as the
primary interpretive layer for listening data.
Candidate target curves are generated using an Interactive
Differential Evolution (IDE) listening experiment that
combines paired comparisons with a second-stage
absolute-rating task, enabling continuous exploration of the
perceptually relevant tuning space while reducing cognitive
load. Converged gain sets are analyzed using a Virtual
Listener Panel (VLP), a Deep Learning (DL) model trained on
large-scale expert evaluations to predict perceptual
attributes from rendered musical material. Predicted
attributes are reported as relative scores along key sensory
dimensions, including bass strength, timbral balance, and
brilliance, enabling exploration of sensory clusters,
perceptual trade-offs, and potential families of target
tunings.
Adaptive listening data from three culturally distinct
listener panels (Denmark, Japan, and Colombia; 20
participants per site) support the DL-based interpretation.
Convergence is quantified as a reduction in population
variance, and cross-site analyses assess the similarity of
clustering structures and the consistency of relationships
between preference and sensory attributes. Overall, the
framework provides a scalable, perceptually grounded
approach to interpreting listener-preference data when
developing headphone target curves.
Authors
Gabriele Ravizza

Perceptual Audio Evaluation Specialist, FORCE Technology
▪ Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry.
▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...

Julian Villegas

University of Aizu
Japan
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:00pm CEST

Emergence and Spatial Directionality of Sa Quintina in the Sacred Vocal Tradition of Castelsardo, Sardinia, Italy: An Early-Stage Sonological–Acoustical Study
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Sa quintina is a distinctive emergent vocal phenomenon
almost exclusively associated with the sacred polyphonic
singing tradition of Castelsardo, perceived as an
autonomous “fifth voice” arising during collective
performance by four male singers. Although widely
acknowledged in ethnomusicological literature, its
formation mechanisms remain only partially explored within
audio engineering and acoustical research.
This paper presents an early-stage, descriptive sonological
case study proposing new hypotheses on the formation and
spatial reinforcement of sa quintina. The phenomenon is
interpreted as a physically grounded, measurable outcome of
harmonic fusion and spatial interference, observable
through spectral energy distribution and coherence. It is
hypothesized to emerge from a converging set of
conditions (including non-tempered harmonic textures,
differentiated vocal emission techniques, intentional
formant tuning, and circular spatial configuration), none of
which is assumed to be strictly sufficient in isolation.
Building upon previous spectral coherence analyses, the
study introduces a Quintina Directionality Index (QDI) to
quantify the spatial dimension of the phenomenon. QDI is
defined as the ratio between spectral energy in two
frequency bands associated with sa quintina (600–750 Hz and
1200–1400 Hz) and total spectral energy. The index is
evaluated as a function of direction using ambisonic
recordings in an anechoic chamber and as a function of
microphone position in a controlled field setting.
Preliminary observations suggest that sa quintina
corresponds to localized regions of enhanced spectral
coherence and energy reinforcement, supporting its
interpretation as an emergent physical phenomenon that
precedes and enables its perceptual salience, rather than a
purely auditory illusion.
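
The QDI definition above (energy in the two sa quintina bands divided by total spectral energy) can be illustrated for a single signal. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the direct DFT, the sample rate, and the three-tone test signal are hypothetical, and the directional evaluation over ambisonic recordings is not reproduced.

```python
import cmath
import math

def power_spectrum(signal):
    """One-sided power spectrum via a direct DFT (O(N^2), fine for a short demo)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        # Interior bins also stand in for their negative-frequency mirror.
        weight = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        spectrum.append(weight * abs(acc) ** 2)
    return spectrum

def band_energy_ratio(signal, sample_rate, bands=((600.0, 750.0), (1200.0, 1400.0))):
    """Energy inside the given bands divided by total spectral energy."""
    spectrum = power_spectrum(signal)
    n = len(signal)
    in_band = 0.0
    for k, power in enumerate(spectrum):
        freq = k * sample_rate / n
        if any(lo <= freq <= hi for lo, hi in bands):
            in_band += power
    return in_band / sum(spectrum)

# Hypothetical signal: tones at 300 Hz (outside the bands), 700 Hz and
# 1300 Hz (inside), with amplitudes 1.0, 1.0 and 0.5.
SAMPLE_RATE = 8000
test_signal = [
    math.sin(2.0 * math.pi * 300.0 * i / SAMPLE_RATE)
    + math.sin(2.0 * math.pi * 700.0 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2.0 * math.pi * 1300.0 * i / SAMPLE_RATE)
    for i in range(400)  # 20 Hz bin spacing, all tones fall on exact bins
]
qdi_value = band_energy_ratio(test_signal, SAMPLE_RATE)
print(f"QDI-style ratio: {qdi_value:.4f}")  # (1 + 0.25) / (1 + 1 + 0.25), about 0.5556
```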
Authors
Felicita Brusoni

PhD candidate Musikhögskolan i Malmö, Lund University

Luca Frigo

Conservatorio G. Nicolini Piacenza

Martino Sarolli

Conservatorio Paganini Genova

Riccardo Dapelo

Conservatorio Nicolini Piacenza
Thursday May 28, 2026 3:00pm - 3:30pm CEST
Aud 43 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Center Extraction GAN
Thursday May 28, 2026 3:30pm - 4:00pm CEST
This paper presents a method for extracting a center signal
from two-channel stereo signals for upmixing and
reproduction with additional center loudspeakers.
It uses a generative adversarial network with a generator
trained with multiple reconstruction losses and adversarial
losses obtained from a discriminator.
The processing has low computational complexity, is causal,
and can be configured for latencies down to one audio frame
of 46 ms length.
The paper describes how training data are created using only
publicly available signals and how the generation of target
data enables control of the attenuation of diffuse signals
and direct signals panned off-center.
An evaluation with a listening test and the computational
metrics SI-SDR and F2 measure is presented.
It shows an advantage over methods based on
classical signal processing in terms of computational
metrics for source separation and listener preference.
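
For context on one of the metrics named above, SI-SDR (scale-invariant signal-to-distortion ratio) projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The sketch below follows this widely used definition; it is not taken from the paper, and the function name and the synthetic signals are assumptions for illustration.

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    reference_energy = sum(r * r for r in reference)
    scale = dot / reference_energy           # optimal scaling of the reference
    target = [scale * r for r in reference]  # part of the estimate explained by it
    residual_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    return 10.0 * math.log10(target_energy / residual_energy)

# Hypothetical signals: the estimate is the reference at half level plus an
# orthogonal interferer at one tenth of the reference amplitude.
N = 1000
reference = [math.sin(2.0 * math.pi * 10.0 * i / N) for i in range(N)]
interferer = [0.1 * math.cos(2.0 * math.pi * 10.0 * i / N) for i in range(N)]
estimate = [0.5 * r + d for r, d in zip(reference, interferer)]

score = si_sdr(estimate, reference)
score_rescaled = si_sdr([2.0 * e for e in estimate], reference)
print(f"SI-SDR: {score:.2f} dB, after rescaling the estimate: {score_rescaled:.2f} dB")
```

Rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.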
Authors
Andreas Walther

Fraunhofer IIS

Christian Uhle

Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS
Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen.
He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and...

Julian Klapp

Fraunhofer Institute for Integrated Circuits IIS

Pablo Panter

Fraunhofer Institute for Integrated Circuits IIS
Thursday May 28, 2026 3:30pm - 4:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

3:30pm CEST

Audio engineering music for listeners with hearing loss
Thursday May 28, 2026 3:30pm - 4:30pm CEST
Audio engineering often implicitly assumes a uniformity in
hearing across listeners; this is an assumption that does
not reflect real-world diversity. How could technologies
and practices in production, mixing, and reproduction be
adapted to create music that is more inclusive? While the
AES has a conference series on Audio and Music Induced
Hearing Disorders, this has focused on the causes of
hearing loss with little on audio engineering for listeners
who have a hearing loss.

In western countries, about one in three adults are deaf,
have hearing loss or suffer from tinnitus. Hearing loss can
lead to many challenges with music such as: inaudibility of
quieter passages, distortion, degraded pitch perception,
and difficulty in identifying and picking out lyrics and
instruments. The most common intervention for mild to
moderately severe hearing loss is hearing aids. But while
many of these devices have music programs, their efficacy
is mixed, to the point that many opt not to use them. With
the rise of machine learning within Audio Engineering,
there are opportunities to better personalise music, and
therefore address issues listeners face. Consumer devices
are also increasingly having audio accessibility features
added, but the usefulness of these lacks independent
testing. This workshop will consider opportunities for
making music more accessible.

The workshop will start by exploring how hearing loss harms
the experience of listening to music and how this varies
between people. This will lead to discussion of why no
technology can fully ‘correct’ music to achieve a ‘perfect’
listening experience for those with hearing loss. There is
no technology to recreate a ‘golden-ears’ experience. This
leads to a key research question: what is the best
rendition of a piece of music for someone who has hearing
loss? What do listeners want from music, and how can we get
closest to achieving that?

We will bring in findings from research projects and
listening tests to explore what is known, and also to
highlight that there are significant gaps in knowledge that
require further research. We will then explore the
state of the art in wearables such as hearing aids and
sound reproduction systems. This will include the current
Cadenza project, which has been running a series of machine
learning challenges to improve music for those with hearing
loss.

Throughout, we will encourage questions and engagement from
delegates. We want to hear about lived experience of
hearing difference and how that has changed professional
practice and personal lives. We are also keen to hear
suggestions from delegates on what approaches might be used
to improve music for those with hearing loss.

We aim to raise awareness of the importance of considering
diverse audiences in Audio Engineering practice. Where
possible, the workshop will provide practical guidance for
audio engineers, highlighting techniques and emerging
technologies that can better support listeners with diverse
hearing profiles.

The workshop will be organised by the Cadenza Project Team
(https://cadenzachallenge.org/). Cadenza is a large UK-funded
project about improving music for those with hearing loss.
Speakers

Josh Reiss

Professor, Queen Mary University of London
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...

Trevor Cox

University of Salford

Sara Madsen

GN Store Nord

Adam Steed

Contact Theatre, Manchester
Thursday May 28, 2026 3:30pm - 4:30pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

Personalized Timbre Optimization for Stereophonic Sound Reproduction via Earphones: Part 2 – Practical Implementation and Validation on Consumer TWS Devices
Thursday May 28, 2026 4:30pm - 5:00pm CEST
This paper presents Part 2 of our study on personalized
timbre optimization for stereophonic sound reproduction via
earphones, following our previous work presented at the AES
International Conference on Headphone Technology in 2025.
While Part 1 established a novel auditory-model-based
framework for reproducing a listener’s natural timbre
reference and demonstrated its perceptual validity under
controlled conditions, the present study focuses on the
practical implementation and validation of this approach
for real-world use with consumer True Wireless Stereo (TWS)
earphones.

Conventional headphone and earphone personalization
techniques primarily target spatial audio reproduction or
rely on preference-based equalization, often overlooking
the accurate reproduction of natural timbre in stereophonic
content. Our approach explicitly addresses this limitation
by isolating and optimizing perceptually relevant timbral
cues while excluding spatial encoding components, thereby
improving timbral fidelity without degrading stereo imaging.

The proposed method originally consists of four stages:
(1) high-resolution anatomical scanning of the listener’s
upper body, including the pinnae; (2) individualized HRTF
computation using the boundary element method; (3) selective
removal of spatial encoding components to derive a
personalized reference target response curve (PR-TRC); and
(4) perceptual optimization using a listener-specific
weighting coefficient grounded in auditory reference
fidelity rather than preference. In this paper, each stage
is simplified and automated using smartphone-based scanning
and AI-assisted processing, enabling end users to complete
the entire personalization process via a smartphone
connected to a cloud-based server. The resulting
personalized target response curve is implemented within the
computational and memory constraints of the DSP pipeline of
commercial consumer TWS earphones.

A subjective evaluation using the Semantic Differential
Method was conducted to assess the perceptual impact of the
simplified implementation. Twenty-four listeners evaluated
personalized target curves generated by both the original
and simplified methods, as well as two non-personalized
target curves commonly used in commercial TWS earphones.
The results show that both personalized methods
consistently outperform non-personalized conditions in
overall sound quality and listener preference. Importantly,
no statistically significant degradation in perceived
timbral naturalness was observed between the simplified and
original methods.

These findings demonstrate that auditory-model-based
personalized timbre optimization can be effectively
translated into a practical, consumer-ready technology. The
proposed approach represents a foundational contribution to
future audio personalization and has broad applicability
across headphone and earphone systems for stereophonic
sound reproduction.
Authors

Atsushi Hara

final Inc.

Haruto Hirai

final Inc.

Kimio Hamasaki

President, Artsridge LLC
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...

Mitsuru Hosoo

final Inc.

Nao Tojo

final Inc.

Shun Saito

final Inc./post-doc

Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

4:30pm CEST

A Parametric Dual-Channel Audio Coding via Learned Time-Frequency Masking
Thursday May 28, 2026 4:30pm - 5:00pm CEST
While Neural Audio Codecs (NAC) have revolutionized
monaural audio compression, achieving high-fidelity
dual-channel coding at low bitrates remains a significant
challenge. Existing approaches often rely on naive
independent channel quantization, leading to phase
incoherence, or entangled latent modeling, which sacrifices
spatial precision for spectral energy. This paper proposes
a novel dual-channel coding framework based on
content-spatial disentanglement. Reframing spatial
reconstruction as an informed source separation task, our
architecture synergizes a frozen, pre-trained DAC encoder
for robust mono content preservation with a
parameter-efficient side information encoder that predicts
fine-grained time-frequency masks. To ensure precise
spatial imaging, we introduce explicit physical constraints
into the end-to-end training. Experimental results indicate
that at low bitrates of 9 and 11 kbps, the proposed method
outperforms state-of-the-art dual-mono neural baselines and
industry standards in both objective spatial metrics and
subjective MUSHRA evaluations.
Authors

Qingbo Huang

MMLab, ByteDance

Tianshu Qu

Peking University

Yihan Wang

Peking University

Yufan Qian

Peking University
Thursday May 28, 2026 4:30pm - 5:00pm CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark