AES Europe 2026: Full Schedule

Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

9:00am CEST

Music generation model based on global emotional feature perception

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

The rapid development of artificial intelligence
composition technology has brought innovation to music
creation. However, current deep learning music generation
models often neglect the global correlation of emotional
features, resulting in fragmented emotional expression in
generated works and insufficient alignment with human
emotional perception, which makes it difficult to meet the
core demand for emotional conveyance in diverse music
creation. This study proposes a music generation method that
integrates a global perception mechanism for emotional
features. Taking the preprocessed EMOPIA and VGMIDI
datasets as the research objects, an improved model based
on EMelodyGen (EMelodyGen-PPO) is constructed: a GLU
network layer is introduced in the feature extraction stage
to enhance the model's ability to filter and represent
emotion-related features; an improved PPO-Clip algorithm is
integrated in the training process; and a multi-dimensional
emotional reward function is designed to achieve global
dynamic perception and optimization of emotional features.
Experimental results show that the music21 parsing rate of
the EMelodyGen-PPO model on the target datasets is 3% and 4%
higher than that of the baseline model, respectively. An
automated quality assessment system based on fluency,
rhythm stability, harmony richness, melodic smoothness, and
structural integrity verifies that the comprehensive score
of the model's generated works is significantly better than
that of the comparative model. This study provides an
efficient technical path for emotion-oriented music
generation, which can empower grassroots cultural workers
and independent musicians at low cost, facilitate diverse
music creation practices and emotional audio content
dissemination, and align with the diversity and innovative
development concept of the AES audio community.

Authors

Chen Li

Wuhan Polytechnic University

Heng Wang

Wuhan Polytechnic University

Lingzhi Chen

Wuhan Polytechnic University

Mingyan Gao

Wuhan Polytechnic University

Xueting Wang

Wuhan Polytechnic University

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Poster | AI and Machine Learning in Audio, Poster | Audio Applications and Technologies, Poster

Presentation Type Poster

9:00am CEST

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

In this paper, we analyze two main factors, Bonafide
Resource (BR) and AI-based Generator (AG), which affect the
performance and the generality of a Deepfake Speech
Detection (DSD) model. To this end, we first propose a
deep-learning based model, referred to as the baseline.
Then, we conduct experiments on the baseline to show
how the BR and AG factors affect the threshold score used to
detect fake or bonafide input audio in the inference
process. Given the experimental results, we propose a
dataset that re-uses public DSD datasets and balances BR
and AG. We then train various deep-learning based models on
the proposed dataset and conduct cross-dataset evaluation
on different benchmark datasets. The cross-dataset
evaluation results show that the balance of BR and AG is
the key factor in training a general DSD model.

Authors

Alexander Schindler

Dat Tran

FPT University

David Fischinger

Austrian Institute of Technology

Davide Antonutti

Austrian Institute of Technology

Ian McLoughlin

Singapore Institute of Technology

Khoi Vu

FPT University

Lam Pham

Austrian Institute of Technology

Marcel Hasenbalg

Austrian Institute of Technology

Martin Boyer

Austrian Institute of Technology

Simon Freitter

Austrian Institute of Technology

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Applications and Technologies, Poster

Presentation Type Poster

9:00am CEST

Semantic Audio Encoders from EQ Parameters Alone: Effects of Training Data Composition on Limited-Data Learning

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

We investigate how training data composition influences
semantic audio encoders that learn perceptual descriptors
such as "warm," "bright," and "muddy" from equalization
(EQ) parameter datasets without labeled audio examples.
Using the SAFE-DB dataset of 1,369 labeled EQ settings, we
train audio encoders via an inverse problem formulation in
which labeled EQ parameters are applied to source audio and
the encoder is trained to recognize the resulting semantic
characteristics. Three training configurations are
compared, varying both class sampling strategy (uniform
versus balanced) and source audio type (pink noise versus
real music). Despite severe class imbalance in SAFE-DB,
where 76 percent of examples are labeled "bright" or
"warm," balanced class sampling combined with mixed-source
training (50 percent pink noise and 50 percent FMA music)
successfully learns physically meaningful semantic-spectral
relationships: "warm" and "muddy" show negative correlation
with spectral centroid (r = -0.56), while "bright" and
"thin" show positive correlation (r = +0.49). However,
prediction confidence decreases substantially (from 0.96 to
0.76 to 0.86), and top-1 predictions remain dominated by
the "bright" class across all evaluated music genres,
reflecting inherent dataset bias rather than training
failure. These results demonstrate that training data
composition significantly affects model calibration but
cannot fully overcome fundamental bias in the underlying
label distribution, highlighting key challenges for
semantic audio understanding systems.
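
As an illustration of the spectral centroid correlations reported in this abstract, the following is a minimal sketch, not the author's pipeline. It computes a magnitude-weighted spectral centroid with a naive DFT and a Pearson correlation in plain Python. The helper names (`spectral_centroid`, `pearson_r`, `sine`), the test tones, and the "bright" scores are all hypothetical and are not taken from the paper or from SAFE-DB.

```python
import cmath
import math


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(signal)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sine(freq_hz, sample_rate=8000, n=256):
    """Hypothetical test tone standing in for EQ-processed audio."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Toy data: tones with rising centroid paired with made-up "bright" scores.
centroids = [spectral_centroid(sine(f), 8000) for f in (500, 1000, 2000)]
bright_scores = [0.2, 0.5, 0.9]  # illustrative values, not from the paper
print(centroids, pearson_r(centroids, bright_scores))
```

A positive coefficient here corresponds to the "bright" and "thin" case in the abstract; a descriptor whose score falls as the centroid rises would give a negative one, as reported for "warm" and "muddy".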

Authors

Daniel Dutulescu

UCL

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Applications and Technologies, Poster

Presentation Type Poster

9:00am CEST

Voice-Based Fatigue Detection for Military Personnel: A Multi-Modal Machine Learning Framework with Acoustic Feature Emphasis

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

This study presents a voice-centered machine learning
framework for detecting mental fatigue in military
personnel, integrating acoustic analysis with physiological
biosensors to enhance detection robustness. Mental fatigue
poses critical safety and performance challenges in
military operations, yet cultural stigma often prevents
self-reporting. We collected multi-modal data from 23
participants across two fatigue states, extracting
comprehensive acoustic features including sound pressure
level (SPL), formants, mel-frequency cepstral coefficients
(MFCCs), jitter, shimmer, harmonic-to-noise ratio (HNR),
and temporal speech characteristics. These voice features
were combined with electroencephalography (EEG),
photoplethysmography (PPG), and temperature data to train
multiple machine learning classifiers. The voice-based
models achieved accuracies of 82-85%, with support
vector machines (SVM) and long short-term memory (LSTM)
networks demonstrating superior performance. When acoustic
features were combined with physiological markers,
classification accuracy improved to 92%, with
Classification and Regression Trees (CART) and Linear
Discriminant Analysis (LDA) emerging as top performers.
Statistical analysis identified SPL and formant variance as
the most discriminative voice features, while Lempel-Ziv
Complexity (LZC) and theta/beta ratio proved most reliable
for EEG. Evaluation on new participants yielded 67%
accuracy, revealing model generalization challenges that
inform future research directions. This work demonstrates
that voice-based machine learning systems, when augmented
with physiological data, offer a promising non-invasive
approach to real-time fatigue monitoring in operational
military environments.

Authors

Claire Courchene

Applied Perception Associate Engineer, GN

I’m a creative technologist and interaction designer exploring how sound, technology, and human experience meet. With an MScEng in Sound & Music Computing, I prototype audio interactions, build ML‑driven tools, and design experiments around perception. My background spans music...

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Applications and Technologies, Poster | Audio Processing, Poster | Cross-Disciplinary Sound Studies, Poster

Presentation Type Poster

9:00am CEST

Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

Current deep learning approaches to speech enhancement rely
heavily on objective measures like mean squared error or
scale-invariant signal-to-distortion ratio as both training
objectives and evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, the inconsistent integration of metrics like
Short-Term Objective Intelligibility or Perceptual
Evaluation of Speech Quality into training and evaluation
pipelines leaves a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modeling. Our study
explores two distinct methods to characterise enhanced
signals. On one hand, we employ a perceptual approach based
on the Cambridge loudness model to assess the preservation
of spectral excitation patterns and perceived intensity. On
the other hand, we adopt a biophysical approach by
utilising CoNNear, a convolutional model of the human
auditory periphery. This allows us to simulate
representations of responses at different stages of the
auditory periphery to observe how speech enhancement
processing affects the physiological representation of
speech. We analyse pre-trained speech enhancement models
using automatic speech recognition and Short-Term Objective
Intelligibility as an additional proxy for human
intelligibility. By mapping automatic speech recognition
performance against loudness and peripheral response
patterns, we investigate the extent to which current
enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than purely
signal-based objective functions.

Authors

François Effa

Université de Lorraine, CNRS, Inria, Loria, Nancy, France

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Processing, Poster | Perception, Poster

Presentation Type Poster

9:00am CEST

Objective Quality Models for Decision-Making in Speech Coding

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

Objective quality evaluation is widely used in speech
coding, yet objective estimates often show limited
agreement with subjective listening-test results. Rather
than focusing on absolute score accuracy, this paper
evaluates objective speech quality models from a
decision-making perspective, defined as their ability to
support comparative judgments between speech codecs or
codec configurations. A formal ITU-T P.800 Absolute
Category Rating (ACR) listening test was conducted with 30
listeners across 24 conditions, covering conventional and
neural monophonic speech codecs operating under
clear-channel conditions at sampling frequencies from 16 to
48 kHz and bit rates ranging from below 1 kbps to above 16
kbps. The speech material consisted of internally recorded,
clean French-language speech that was not used in the
development or training of any of the evaluated codecs or
objective quality models. Seven objective quality models,
namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA,
UTMOS, and DistillMOS, were evaluated on the same material.
Decision-making performance was assessed by comparing
subjective and objective rankings using Kendall’s rank
correlation coefficient and by analyzing pairwise codec
comparisons using t-tests at a 95% confidence level. The
results show that some objective quality models are
effective for comparing bit rate variations within a given
speech coding technology, provided that all other codec
parameters remain unchanged (e.g., sampling frequency).
However, all models exhibit limitations, including
tendencies toward over- or underestimation for certain
technologies, as well as reduced reliability when applied
across different sampling frequencies. Despite its
conventional origins, PESQ remains capable of supporting
decision-making even when applied to neural speech codecs.
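
The ranking comparison described in this abstract can be sketched as follows. This is an illustrative implementation of Kendall's tau-a (the simplest variant, without tie correction) in plain Python. The function name `kendall_tau_a` and the score values are hypothetical, and the abstract does not state which tau variant the authors used.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant; no tie
    correction is applied, which is an assumption of this sketch.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four codec conditions (not data from the paper).
subjective_mos = [4.2, 3.1, 2.5, 3.8]
objective_score = [3.9, 3.3, 2.0, 3.0]
print(kendall_tau_a(subjective_mos, objective_score))
```

A value near 1 means the objective model orders the conditions the same way listeners do, which is the property that matters for comparative decisions rather than absolute score accuracy.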

Authors

Clémence Lamballe

Universite de Sherbrooke

Philippe Gournay

Universite de Sherbrooke

Roch Lefebvre

Universite de Sherbrooke

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Processing, Poster | Perception, Poster

Presentation Type Poster

9:00am CEST

The Ambisonic Denoising Paradox: U-Net Processing Degrades ASR Transcription Quality for Medical Speech

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

Spatial audio recording using higher-order Ambisonics
offers rich directional information for medical speech
capture, yet challenging hospital acoustic environments
motivate preprocessing with neural denoising algorithms.
This study investigates whether U-Net-based denoising of
third-order ambisonic recordings improves automatic speech
recognition (ASR) quality for medical applications. We
developed the Medical Immersive Audio Corpus (MIAC),
comprising 1,759 utterances (6.43 hours) of Polish medical
speech recorded with a Zylia ZM-1 microphone in
uncontrolled hospital environments, capturing 16-channel
third-order Ambisonics across multiple specializations
including thyroid ultrasonography, surgical procedures, and
general diagnostics. We applied a U-Net architecture with
dual attention mechanisms trained using the Noise2Noise
paradigm to denoise the corpus, then evaluated
transcription quality using ten Whisper ASR models ranging
from 39 million to 1.55 billion parameters, including
domain-adapted medical variants. Surprisingly, we
discovered a "noise reduction paradox" where denoising
degraded transcription quality for seven of ten models,
with statistically significant increases in Word Error Rate
(WER) and Character Error Rate (CER) for general-purpose
base, small, and medium models. Only the domain-adapted
whisper-medium-68000-abbr model showed statistically
significant improvement (p=0.0008), while large-scale
models (large-v2, large-v3) exhibited robustness with
negligible changes. Effect sizes remained small (Cohen's d
< 0.2) across all models. These counterintuitive findings
suggest modern ASR systems implicitly utilize background
noise characteristics as informative features, and that
preprocessing pipelines should be reconsidered for
domain-specific applications. Our results provide practical
guidance for medical speech processing system design.
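
For readers unfamiliar with the Word Error Rate metric used in this abstract, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the English example sentences are hypothetical. The study itself evaluates Polish speech, and its text normalisation steps are not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Uses a word-level Levenshtein distance with whitespace tokenisation;
    no case folding or punctuation handling is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical transcripts: one deletion and one substitution over five words.
print(word_error_rate("the thyroid gland appears normal",
                      "the thyroid appears abnormal"))
```

Character Error Rate follows the same recipe with characters in place of words.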

Authors

Bartlomiej Mroz

Assistant Professor, Gdańsk University of Technology

PhD, Spatial Audio & Immersive Media Researcher, Recording Engineer, Statistics enthusiast

Szymon Zaporowski

Gdańsk University of Technology

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Audio Processing, Poster | Recording Production and Reproduction, Poster

Presentation Type Poster

9:00am CEST

A perceptual evaluation of various commercial models of music source separation, with a focus on model performance against non-traditional source material

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

Music source separation (MSS) systems are commonly used in
production, remixing, and audio analysis work, yet
questions arise regarding the extent to which objective
evaluations of model performance align with human
perceptual evaluations, particularly when tasked with
non-traditional source material (in this case, heavily
processed electronic music). This study seeks to set a
framework for an evaluation of three machine learning
approaches to MSS: a spectrogram-domain model (Spleeter), a
waveform-domain model (Demucs v2), and a hybrid-domain
model (HTDemucs). Subjective evaluations of model
performance were collected via a MUSHRA-style listening
test, while objective evaluations were assessed using
signal-to-distortion ratio (SDR) and Fréchet Audio Distance
(FAD). Results showed consistent agreement across objective
metrics, with the hybrid-domain model outperforming the
two single-domain models. Perceptual ratings also
favored the hybrid model; notably, listeners occasionally
rated the model output as equal to or better in quality
than the original reference. Preliminary analysis
indicates moderate but statistically non-significant
correlations between the two assessment paths, reinforcing
concerns about relying solely on numerical evaluations when
discussing MSS model performance. Implications for model
design and future evaluation procedures are discussed.
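
To make the objective metric in this abstract concrete, the sketch below computes a basic signal-to-distortion ratio in decibels as the ratio of reference energy to error energy. This is the simplest textbook form and is an assumption here: the abstract does not say which SDR variant was used, and toolkit implementations such as BSS Eval differ in detail. The function name and sample values are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Basic SDR in dB: 10 * log10(sum(s^2) / sum((s - s_hat)^2))."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference with a non-silent estimate
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical stem and a separated estimate with 10 percent amplitude error.
reference_stem = [1.0, -1.0, 1.0, -1.0]
separated_stem = [0.9, -0.9, 0.9, -0.9]
print(sdr_db(reference_stem, separated_stem))
```

A metric of this kind rewards sample-accurate reconstruction, which is one reason it may diverge from listener ratings on heavily processed material.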

Authors

Sahan Wijewardane

University of Miami

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Perception, Poster

Presentation Type Poster

9:00am CEST

Automating sound design for adaptive video game narration

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

HAMLET is a research project that investigates the
integration of Artificial Intelligence and co-creation
practices within the creative industries. The project
proposes AI-driven enablers to support artists through
collaborative workflows between creative practitioners and
technology providers. This work focuses on an automated
sound design framework for text-based role-playing games,
where the game narration is dynamically generated through
player textual interaction with an LLM. To address this
unpredictability, the proposed system generates adaptive
soundscapes automatically from textual scene descriptions.
An LLM identifies semantically relevant sound sources,
which are then matched to audio libraries through metadata
alignment. The files are assessed for quality and fed
to an automated mixing module. The framework addresses
challenges related to semantic alignment, audio quality,
aesthetic balance, and file size constraints.

Authors

Charalampos Dimoulas

Aristotle University of Thessaloniki

George Kalliris

Aristotle University of Thessaloniki

Lazaros Vrysis

Aristotle University of Thessaloniki

Marina Eirini Stamatiadou

Aristotle University of Thessaloniki

Nikolaos Vryzas

Aristotle University of Thessaloniki

Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering in the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master degrees on Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...

Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

AI and Machine Learning in Audio, Poster | Sound Design, Poster

Presentation Type Poster

1:00pm CEST

Geometry Sensitivity in Low-Count Virtual Microphone Arrays: From Tetrahedral Baselines to Stochastic Spherical Layouts

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

Virtual Microphone Array techniques are being investigated
by the authors to support room acoustics optimisation in
live sound environments. In our recent AES paper, “Room
Acoustics Optimisation Using Virtual Microphone Arrays”, a
notable outcome was that a compact four-microphone
tetrahedral array performed strongly relative to its low
sensor count. Recent virtual sensing and Remote Microphone
Technique research treats microphone placement as an
explicit design variable. It reports improved remote
estimation performance when microphone layouts are
deliberately chosen for the task, rather than adopted as
fixed, standard configurations.
This submission builds on our prior VMA work by focusing on
the four-microphone case, where geometry choices are
especially constrained. We compare a tetrahedral baseline
with an ensemble of stochastically generated spherical
layouts at the same array aperture using Monte Carlo
simulation. We apply a consistent evaluation protocol
across multiple listening-region offsets and standard
beamforming estimators to isolate variability due to
geometry alone. The central proposition is that, for
low-count VMAs, geometry is a first-order design parameter.
Tetrahedral remains a credible baseline, but lightweight
stochastic exploration can reveal alternative layouts that
are competitive and, in some cases, superior without
increasing channel count.

Authors

Brian de Brit

Lecturer, Technological University Dublin

Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...

David Dorran

Technological University Dublin

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Poster | Audio Processing, Poster

Presentation Type Poster

1:00pm CEST

Clustered Virtual Microphone Arrays for Listener-Level Monitoring and Room-Correction in Live Sound

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

This paper introduces clustered virtual microphone arrays
as a step toward improving listener-level virtual
microphone estimation for live sound. Multiple compact
microphone sub-arrays are placed around a nominal overhead
position. Each sub-array produces a virtual microphone
estimate, and the estimates are fused. The aim is to attack
the estimation problem from multiple viewpoints and reduce
sensitivity to any one array placement or geometry.
The work builds on our earlier paper, “Room Acoustics
Optimisation Using Virtual Microphone Arrays”. That paper
proposed virtual microphones estimated from an overhead
array as a measurement layer for live sound optimisation.
It also highlighted a key limitation: in its initial form,
virtual microphone estimation quality was not yet strong
enough for reliable use across positions. The present paper
targets that limitation. We outline the clustered array
idea and treat cluster count and inter-cluster spacing as
design parameters. Virtual microphones are estimated using
beamforming and combined using simple fusion. Performance
is assessed with objective signal measures, including SNR
and frequency- and phase-related error measures, across
multiple listener-level target positions. The results
support further refinement under more realistic room
conditions and further study of the link between improved
estimation quality and FIR-based correction outcomes.

Authors

Brian de Brit

Lecturer, Technological University Dublin

David Dorran

Technological University Dublin

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Acoustics of Music Rooms, Poster | Audio Processing, Poster | Recording Production and Reproduction, Poster

Presentation Type Poster

1:00pm CEST

A Time–Frequency Integrated Framework for Frequency-Invariant Beamforming in Loudspeaker Arrays

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

Loudspeaker array beamforming technology has been widely
used; however, current frequency-domain and time-domain
design methods for calculating FIR filters face challenges,
including the need for modeling delay and high
computational complexity. To address these issues, this
paper proposes a time–frequency integrated framework. This
framework supports both pressure matching and amplitude
matching methods, enabling not only the realization of
traditional superdirective beams but also the design of
frequency-invariant beams. For the nonlinear optimization
problem in amplitude matching, an efficient solving
algorithm based on the Alternating Direction Method of
Multipliers (ADMM) is introduced. Experimental results
demonstrate that the proposed method combines the
advantages of existing frequency-domain and time-domain
approaches, directly computing FIR filter coefficients
without delay modeling while maintaining high computational
efficiency. This provides an effective solution for beam
control in loudspeaker arrays.

Authors

Jianbin Yang

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Keyu Pan

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Ning Cong

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Xing Tian

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Audio Equipment, Poster | Audio Processing, Poster

Presentation Type Poster

1:00pm CEST

The Impact of Frequency Gradient on Nonlinear Pulse Distribution in the Farina Technique

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

The Exponential Sine Sweep (ESS) technique, popularized by
Angelo Farina, has become a cornerstone of modern
electroacoustic measurement due to its unique capability to
simultaneously extract a system’s linear impulse response
and its individual harmonic distortion components. Standard
implementation of this method almost exclusively utilizes a
low-to-high (upward) exponential sine sweep. However,
during a technical Q&A session at the AES Europe 2025
Convention in Warsaw, a question was raised: what are the
practical consequences of reversing the sweep direction?
This inquiry is particularly relevant given that several
industry-standard measurement platforms often employ
high-to-low (downward) sweeps to optimize the mechanical
and thermal stability of the device under test (DUT) while
performing stepped or swept sinusoidal analysis.
This paper provides an investigation into the temporal
behavior of nonlinearities when the frequency gradient of
an exponential sweep is inverted. Through formal
mathematical derivation and numerical simulations, the study
proves that while the spacing between distortion orders
remains identical in magnitude, the polarity and time
distribution of these impulses are reversed. Specifically,
we demonstrate that in a downward sweep, the distortion
products shift from the "pre-causal" negative time region
to the "post-causal" positive time region. This shift
causes harmonic distortion pulses to emerge within the
reverberant tail of the impulse response, leading to
significant contamination of decay measurements and
energy-time curves. By contrasting the "tracking filter"
paradigm with "time-domain deconvolution," this work
clarifies why sweep direction is a critical parameter that
must be aligned with the specific goals of the measurement
protocol.
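
The time shift described in this abstract can be illustrated with the commonly cited offset for exponential sweeps. After deconvolution, the response of the N-th harmonic is displaced from the linear impulse response by T * ln(N) / ln(f_end / f_start), where T is the sweep duration. The sketch below evaluates the signed offset for both sweep directions. It is a sketch of the textbook relation only, not the paper's derivation, and the function name and sweep parameters are hypothetical.

```python
import math


def harmonic_offset_s(order, duration_s, f_start_hz, f_end_hz):
    """Signed arrival time of the order-th harmonic response relative to
    the linear impulse response, for a deconvolved exponential sweep.

    Negative values fall before the linear response, positive values
    after it. The sign follows from ln(f_end / f_start), which is
    positive for an upward sweep and negative for a downward sweep.
    """
    if order < 1:
        raise ValueError("harmonic order must be >= 1")
    if f_start_hz <= 0 or f_end_hz <= 0 or f_start_hz == f_end_hz:
        raise ValueError("sweep needs two distinct positive frequencies")
    return -duration_s * math.log(order) / math.log(f_end_hz / f_start_hz)


# Hypothetical 10 s sweep between 20 Hz and 20 kHz, in both directions.
for n in (2, 3, 4):
    upward = harmonic_offset_s(n, 10.0, 20.0, 20000.0)
    downward = harmonic_offset_s(n, 10.0, 20000.0, 20.0)
    print(f"order {n}: upward {upward:+.3f} s, downward {downward:+.3f} s")
```

Under these assumptions the offsets have equal magnitude and opposite sign, which is consistent with the abstract's statement that a downward sweep places distortion products after the linear response, inside the decay tail.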

Authors

Daniele Ponteggia

Materiacustica Srl

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Audio Processing, Poster

Presentation Type Poster

1:00pm CEST

Real-Time Heart Rate Sonification Using Spectral Filtering of Preferred Music for Running Training

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

The purpose of this study was to evaluate a sonification
system that maps live heart rate data to real-time spectral
filtering of a runner's preferred music. Assessed using a
within-subjects design (n = 13), the system employs
high-pass and low-pass filters to indicate deviations from
target heart rate zones, providing instantaneous
biofeedback without requiring visual attention.
Quantitative analysis revealed no statistically significant
differences in target zone accuracy or response time
between auditory, visual, and combined conditions. However,
qualitative thematic analysis identified a clear division
in user preference. Participants favouring the auditory
condition demonstrated faster mean response times to audio
biofeedback. Findings suggest that while sonification
promotes environmental focus and "gamifies" training, its
efficacy is highly dependent on individual processing
styles and music familiarity.

Authors

Duncan Williams

Senior Lecturer, Acoustics Research Centre, University of Salford

Jay Steel

Acoustics Research Centre, University of Salford

Nicholas Ripley

School of Health and Society, University of Salford

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Cross-Disciplinary Sound Studies, Poster | Perception, Poster | Sound Design, Poster

Presentation Type Poster

1:00pm CEST

A Psychoacoustic Framework for In-Vehicle Audio-Light Mapping

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

This paper proposes a psychoacoustic-based audio-visual
mapping framework for intelligent vehicle cabins to enhance
immersion and stabilize spatial auditory perception. By
establishing mappings between auditory descriptors (such as
Direction of Arrival (DOA), spectral centroid, and temporal
envelope) and ambient lighting parameters, the framework
leverages "ambient vision" to augment the perceptual
experience without increasing the driver's cognitive load.
Theoretical analysis based on Stevens’ Power Law indicates
that the proposed mapping strategies effectively
synchronize audio-visual intensities and mitigate
perceptual fatigue, providing a conceptual reference for
future multisensory HMI design.
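
Stevens' Power Law, which this abstract cites, states that perceived magnitude grows as a power of physical intensity: psi = k * I^a. The sketch below shows one way such a law could be used to pick a light intensity whose perceived magnitude matches that of a sound. It is an illustration only: the function names, the constants k, and the exponents are placeholder assumptions, not values from the paper.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: psi = k * intensity ** exponent."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return k * intensity ** exponent


def matched_intensity(source_intensity, source_exponent, target_exponent,
                      k_source=1.0, k_target=1.0):
    """Target-modality intensity with the same perceived magnitude as the
    source stimulus, found by inverting the target's power law."""
    psi = perceived_magnitude(source_intensity, source_exponent, k_source)
    return (psi / k_target) ** (1.0 / target_exponent)


# Placeholder exponents (0.6 for audio, 0.3 for light), chosen for illustration.
light_level = matched_intensity(4.0, source_exponent=0.6, target_exponent=0.3)
print(light_level)
```

Because the target exponent here is half the source exponent, the matched light intensity grows as the square of the audio intensity, which shows why a linear audio-to-light mapping would not keep the two percepts in step.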

Authors

Kangwei Wang

Acoustic System Engineer, GoerDynamics Lab2

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Audio Applications and Technologies, Poster | Immersive Audio, Poster | Perception, Poster

Presentation Type Poster

1:00pm CEST

Sound field creation with a cube-like loudspeaker array designed using Lamé function based on virtual sound source distribution

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

The diversification of audio content production has
increased the demand for realistic, immersive sound field
reproduction. Conventional methods struggle to separate
direct and reflected sounds, limiting accuracy. To address
this issue, this study proposes a method for sound field
reproduction that identifies the arrival directions of
reflected sounds based on the virtual sound source
distribution. In this study, the virtual sound source
distribution was calculated using the closely located
four-point microphone method. Assuming that spherical waves
emitted from distant virtual sound sources arrive as plane
waves within the listening area, the target sound field is
generated through plane wave synthesis, enabling more
accurate and flexible sound field generation. Furthermore,
considering practical systems and typical room shapes, we
investigated the reproducibility of plane wave sound fields
using not only a spherical array but also a cube-like
loudspeaker array configured by the Lamé function, which
allows continuous geometric transformation from a sphere to
a cube-like form. In this study, the ideal plane wave sound
field derived from the wave equation was regarded as the
reference, and the sound fields generated by the
loudspeaker arrays were evaluated and compared using mean
square error (MSE). Furthermore, the evaluation was
extended beyond a single time instant, enabling assessment
that also accounts for temporal variations. The results
indicated that changing the order of the Lamé function
maintained the desired level of reproducibility.
Consequently, it was confirmed that cube-like loudspeaker
arrays can achieve a level of reproducibility equivalent to
that of the spherical array.
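
The sphere-to-cube transformation described above can be sketched as follows. The abstract does not give the exact Lamé function used, so this assumes the common superellipsoid form |x|^p + |y|^p + |z|^p = r^p, where order p = 2 gives a sphere and larger p approaches a cube. The helper names `lame_point` and `mean_square_error` are hypothetical, and the MSE here is taken over real-valued field samples for simplicity.

```python
def lame_point(direction, order, radius=1.0):
    """Scale a 3-D direction so it lies on the assumed Lamé surface
    |x|^p + |y|^p + |z|^p = radius^p, with p = order."""
    norm_p = sum(abs(c) ** order for c in direction) ** (1.0 / order)
    if norm_p == 0:
        raise ValueError("direction must be non-zero")
    return tuple(radius * c / norm_p for c in direction)


def mean_square_error(reference, reproduced):
    """Mean square error between two equally long lists of field samples."""
    if len(reference) != len(reproduced):
        raise ValueError("sample lists must have the same length")
    squared = ((a - b) ** 2 for a, b in zip(reference, reproduced))
    return sum(squared) / len(reference)


if __name__ == "__main__":
    diagonal = (1.0, 1.0, 1.0)
    # The diagonal point moves outward toward the cube corner as the order grows.
    for order in (2, 4, 20):
        print(order, lame_point(diagonal, order))
```

Points along the coordinate axes stay at the same distance for every order, while diagonal points move toward the cube corners, which is why a single order parameter can morph the array shape continuously.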

Authors

Tomohiro Sakaguchi

Doctoral student, Waseda University

Yasuhiro Oikawa

Waseda University

Yuzuki Eriguchi

Waseda University

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Recording Production and Reproduction, Poster

Presentation Type Poster

1:00pm CEST

Spatial Sound Field Reproduction Systems for Cabin Noise in Rail Vehicles: Performance Evaluation Based on Sound Quality Indices

Friday May 29, 2026 1:00pm - 3:00pm CEST

Foyer Building 303A

Innovative railway vehicle systems such as high-speed rail,
maglev, and emerging transportation concepts are expected
to reduce conventional noise sources related to wheel–rail
and aerodynamic interactions. As these changes alter the
acoustic characteristics inside railway cabins, reliable
laboratory reproduction of interior noise becomes
increasingly important for evaluating passenger acoustic
comfort and guiding sound design during vehicle
development. The study focuses on practical methods for
assessing reproduction accuracy. Conventional validation of
reproduced sound fields typically relies on sound pressure
level and spectral matching; however, these metrics alone
may not fully reflect perceptually relevant differences
between in-situ and reproduced environments. In this work,
sound quality indices are employed as complementary
evaluation metrics to examine whether reproduced sound
fields maintain perceptually meaningful characteristics of
the original cabin noise. Comparisons between in-situ
recordings and reproduced sound fields were conducted in
terms of overall sound pressure level, frequency
characteristics, and selected sound quality indices. In
addition, the influence of loudspeaker number and spatial
configuration on reproduction performance was examined. The
results show that sound quality–based evaluation provides
useful additional information for assessing perceptual
fidelity and for optimizing spatial sound reproduction
systems for railway cabin noise. The proposed reproduction
platform supports laboratory-based assessment of interior
railway noise and provides a practical framework for
perceptually informed acoustic evaluation and noise control
during the design of next-generation railway vehicles.

Authors

Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

Recording Production and Reproduction, Poster

Presentation Type Poster
