Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time




 
Type: Perception
Thursday, May 28
 

1:30pm CEST

Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation
Thursday May 28, 2026 1:30pm - 3:30pm CEST
We present Binaspect, an open-source Python library for
binaural audio analysis, visualization, and feature
generation. Binaspect generates interpretable “azimuth
maps” by calculating modified interaural time and level
difference spectrograms, and clustering those
time-frequency (TF) bins into stable time-azimuth histogram
representations. This allows multiple active sources to
appear as distinct azimuthal clusters, while degradations
manifest as broadened, diffused, or shifted distributions.
Crucially, Binaspect operates blindly on audio, requiring
no prior knowledge of head models. These visualizations
enable researchers and engineers to observe how binaural
cues are degraded by codec and renderer design choices,
among other downstream processes. We demonstrate the tool
on bitrate ladders, ambisonic rendering, and VBAP source
positioning, where degradations are clearly revealed. In
addition to their diagnostic value, the proposed
representations can be exported as structured features
suitable for training machine learning models in quality
prediction, spatial audio classification, and other
binaural tasks. Binaspect is released under an open-source
license with full reproducibility scripts at: (link removed
for blind review)
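As a rough illustration of the two binaural cues named in this abstract, the sketch below estimates a broadband interaural level difference (ILD) and an interaural time difference (ITD) for a toy stereo signal. It is not Binaspect's API or algorithm: the function names are hypothetical, and the library is described as working on modified per-time-frequency-bin spectrograms rather than on broadband values.

```python
import math

def ild_db(left, right):
    """Broadband interaural level difference in dB (positive = left louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)

def itd_samples(left, right, max_lag):
    """Lag (in samples) that maximises the cross-correlation of the two ears.

    A positive result means the right channel lags the left, i.e. the
    sound reached the left ear first.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy signal (assumption): a click that reaches the left ear 3 samples
# earlier and at twice the amplitude of the right ear.
left = [0.0] * 32
right = [0.0] * 32
left[10] = 1.0
right[13] = 0.5

print(ild_db(left, right), itd_samples(left, right, max_lag=8))
```

Applying the same two measurements per frequency band and per time frame would give the kind of cue spectrograms the abstract refers to.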
Authors

Alessandro Ragano

University College Dublin

Dan Barry

University College Dublin

Davoud Shariat Panah

University College Dublin

Jan Skoglund

Google

Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Perceptual Evaluation of the MPEG-I Immersive Audio Standard
Thursday May 28, 2026 1:30pm - 3:30pm CEST
The recently finalized ISO international standard (IS) on
MPEG-I immersive audio enables interactive
six-degrees-of-freedom (6DoF) audio rendering for a
multitude of virtual-reality and augmented-reality (VR/AR)
acoustic scenarios and applications with comprehensive
modeling of room acoustics and intricate acoustic
phenomena, including e.g. occlusion, reflection,
transmission and diffraction caused by sound obstacles,
Doppler effect, and dynamic environment changes triggered
by user interactivity. This paper describes concept,
methodology and results of the final verification test of
this standard. In the verification test, the perceptual
quality of the renderer was assessed in an interactive
listening test using different in- and outdoor acoustic
scenes, testing the above-mentioned features of the
standard. More than 50 listeners participated in the test
distributed across six labs using the ITU‑R BS.2132 [1]
multi‑stimulus method on a 100‑point scale for three
conditions (IS, mid- and low anchor) in 10 VR scenes plus
two repetitions. The results of several anchor processing
configurations are presented. The selected mid and low
anchors have demonstrated stable quality across diverse
scenes with progressive timbre and spatial degradations.
The listening test results show a clear separation of the
conditions (IS > mid > low); the low anchor was stable
(around 16 points median value) while the mid anchor varied
by scene (around 47 points). The IS is rated with a median
of 84 points among all labs, which is the “excellent”
region of the scale. The individual scenes are rated
differently, and the quartile range for some scenes reaches
20 points. The median IS rating also varied between labs,
with some labs being more critical than others.
Authors

Andreas Silzle

Fraunhofer IIS
Germany

Leon Terentiv

Dolby
Germany

Pablo Delgado

Fraunhofer IIS
Erlangen, DE

Sam Jelfs

Philips

Sascha Disch

Fraunhofer IIS
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

Capturing Immersive Sound in Concert Halls: A Comparative Analysis of PCMA-3D and Decca Cuboid Recording Techniques
Thursday May 28, 2026 1:30pm - 3:30pm CEST
This paper presents a comparative analysis of two immersive
recording techniques for classical music: the PCMA-3D
(Perspective Control Microphone Array) and the Decca
Cuboid. While the Decca Cuboid relies primarily on
time-of-arrival differences to generate spatial
impressions, the PCMA-3D utilises intensity differences and
separates ambience from direct sound. A recording session
was conducted in a concert hall using a classical guitar
soloist and two distinct folk music ensembles to capture
performances simultaneously with both arrays. Subjective
evaluation was performed using a MUSHRA listening test with
18 participants, assessing parameters such as sensation of
space, localisation precision, and sound quality.
Statistical analysis reveals that while both systems
provide high-quality immersive experiences, the PCMA-3D
scored significantly higher in the sensation of space (p
Authors

Zechen Wang

University of York
Thursday May 28, 2026 1:30pm - 3:30pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Friday, May 29
 

9:00am CEST

Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals
Friday May 29, 2026 9:00am - 11:00am CEST
Current deep learning approaches to speech enhancement rely
heavily on objective measures like mean squared error or
scale-invariant signal-to-distortion ratio as both training
objectives and evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, the inconsistent integration of metrics like
Short-Term Objective Intelligibility or Perceptual
Evaluation of Speech Quality into training and evaluation
pipelines leaves a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modeling. Our study
explores two distinct methods to characterise enhanced
signals. On one hand, we employ a perceptual approach based
on the Cambridge loudness model to assess the preservation
of spectral excitation patterns and perceived intensity. On
the other hand, we adopt a biophysical approach by
utilising CoNNear, a convolutional model of the human
auditory periphery. This allows us to simulate
representations of responses at different stages of the
auditory periphery to observe how speech enhancement
processing affects the physiological representation of
speech. We analyse pre-trained speech enhancement models
using automatic speech recognition and Short-Term Objective
Intelligibility as an additional proxy for human
intelligibility. By mapping automatic speech recognition
performance against loudness and peripheral response
patterns, we investigate the extent to which current
enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than purely
signal-based objective functions.
Authors

François Effa

Université de Lorraine, CNRS, Inria, Loria, Nancy, France

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Objective Quality Models for Decision-Making in Speech Coding
Friday May 29, 2026 9:00am - 11:00am CEST
Objective quality evaluation is widely used in speech
coding, yet objective estimates often show limited
agreement with subjective listening-test results. Rather
than focusing on absolute score accuracy, this paper
evaluates objective speech quality models from a
decision-making perspective, defined as their ability to
support comparative judgments between speech codecs or
codec configurations. A formal ITU-T P.800 Absolute
Category Rating (ACR) listening test was conducted with 30
listeners across 24 conditions, covering conventional and
neural monophonic speech codecs operating under
clear-channel conditions at sampling frequencies from 16 to
48 kHz and bit rates ranging from below 1 kbps to above 16
kbps. The speech material consisted of internally recorded,
clean French-language speech that was not used in the
development or training of any of the evaluated codecs or
objective quality models. Seven objective quality models,
namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA,
UTMOS, and DistillMOS, were evaluated on the same material.
Decision-making performance was assessed by comparing
subjective and objective rankings using Kendall’s rank
correlation coefficient and by analyzing pairwise codec
comparisons using t-tests at a 95% confidence level. The
results show that some objective quality models are
effective for comparing bit rate variations within a given
speech coding technology, provided that all other codec
parameters remain unchanged (e.g., sampling frequency).
However, all models exhibit limitations, including
tendencies toward over- or underestimation for certain
technologies, as well as reduced reliability when applied
across different sampling frequencies. Despite its
conventional origins, PESQ remains capable of supporting
decision-making even when applied to neural speech codecs.
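To illustrate the ranking comparison mentioned in this abstract, here is a minimal sketch of Kendall's rank correlation (the tau-a variant, which assumes no tied scores). The scores are invented for illustration and are not data from the paper; the paper's exact variant and tie handling are not stated.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a for paired scores, assuming no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both measures order the pair the same way
        elif product < 0:
            discordant += 1   # the measures disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for four codec conditions (illustrative only).
subjective_mos = [1.8, 2.9, 3.6, 4.3]
objective_score = [2.1, 2.0, 3.4, 4.0]

tau = kendall_tau(subjective_mos, objective_score)
print(round(tau, 3))
```

A tau of 1 means the objective model ranks every pair of conditions the same way as the listeners; here one of the six pairs is swapped, so tau is 4/6.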
Authors

Clémence Lamballe

Universite de Sherbrooke

Philippe Gournay

Universite de Sherbrooke

Roch Lefebvre

Universite de Sherbrooke
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

A perceptual evaluation of various commercial models of music source separation, with a focus on model performance against non-traditional source material
Friday May 29, 2026 9:00am - 11:00am CEST
Music source separation (MSS) systems are commonly used in
production, remixing, and audio analysis work, yet
questions arise regarding the extent to which objective
evaluations of model performance align with human
perceptual evaluations, particularly when tasked with
non-traditional source material (in this case, heavily
processed electronic music). This study seeks to set a
framework for an evaluation of 3 machine learning
approaches to MSS: a spectrogram-domain model (spleeter), a
waveform-domain model (Demucs v2), and a hybrid-domain
model (HTDemucs). Subjective evaluations of model
performance were accumulated via a MUSHRA-style listening
test, while objective evaluations were assessed using
signal-to-distortion ratio (SDR) and Frechet Audio Distance
(FAD). Results showed consistent agreement across objective
metrics, with the hybrid-domain model outperforming the
other single-domain models. Perceptual ratings also
favored the hybrid model; interestingly, listeners
occasionally rated the model output as equal to or better
in quality than the original reference. Preliminary analysis
indicates some moderate but insignificant correlations
between the two assessment paths, reinforcing concerns
about relying solely on numerical evaluations when
discussing MSS model performance. Implications for model
design and future evaluation procedures are discussed.
Authors

Sahan Wijewardane

University of Miami
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Real-Time Heart Rate Sonification Using Spectral Filtering of Preferred Music for Running Training
Friday May 29, 2026 1:00pm - 3:00pm CEST
The purpose of this study was to evaluate a sonification
system that maps live heart rate data to real-time spectral
filtering of a runner's preferred music. Assessed using a
within-subjects design (n = 13), the system employs
high-pass and low-pass filters to indicate deviations from
target heart rate zones, providing instantaneous
biofeedback without requiring visual attention.
Quantitative analysis revealed no statistically significant
differences in target zone accuracy or response time
between auditory, visual, and combined conditions. However,
qualitative thematic analysis identified a clear division
in user preference. Participants favouring the auditory
condition demonstrated faster mean response times to audio
biofeedback. Findings suggest that while sonification
promotes environmental focus and "gamifies" training, its
efficacy is highly dependent on individual processing
styles and music familiarity.
Authors

Duncan Williams

Senior Lecturer, Acoustics Research Centre, University of Salford

Jay Steel

Acoustics Research Centre, University of Salford

Nicholas Ripley

School of Health and Society, University of Salford
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Psychoacoustic Framework for In-Vehicle Audio-Light Mapping
Friday May 29, 2026 1:00pm - 3:00pm CEST
This paper proposes a psychoacoustic-based audio-visual
mapping framework for intelligent vehicle cabins to enhance
immersion and stabilize spatial auditory perception. By
establishing mappings between auditory descriptors—such as
Direction of Arrival (DOA), spectral centroid, and temporal
envelope—and ambient lighting parameters, the framework
leverages "ambient vision" to augment the perceptual
experience without increasing the driver's cognitive load.
Theoretical analysis based on Stevens’ Power Law indicates
that the proposed mapping strategies effectively
synchronize audio-visual intensities and mitigate
perceptual fatigue, providing a conceptual reference for
future multisensory HMI design.
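Stevens' Power Law relates perceived magnitude to physical intensity as psi = k * intensity ** exponent. The sketch below shows one way such a law could be used to match a light level to a sound level so that both are perceived as equally strong. The exponents (0.67 for loudness versus sound pressure, 0.33 for brightness) are commonly cited textbook values used here as assumptions, k is set to 1 for both senses, and the function names are hypothetical; the abstract does not specify the paper's actual mapping.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' Power Law: psi = k * intensity ** exponent."""
    return k * intensity ** exponent

def matched_light_intensity(sound_pressure, a_sound=0.67, b_light=0.33):
    """Relative light intensity whose perceived magnitude equals that of the
    sound, assuming k = 1 for both modalities (an illustrative assumption)."""
    # Solve sound_pressure ** a_sound == light ** b_light for light.
    return sound_pressure ** (a_sound / b_light)

# Doubling the relative sound pressure calls for roughly a fourfold
# increase in relative light intensity under these assumed exponents.
light = matched_light_intensity(2.0)
print(round(light, 2))
```

Because the brightness exponent is smaller than the loudness exponent, the light level has to change faster than the sound level to keep the two perceptually aligned.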
Authors

Kangwei Wang

Acoustic System Engineer, GoerDynamics Lab2
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 
Saturday, May 30
 

9:00am CEST

A Longitudinal Dataset for Guitar String Ageing
Saturday May 30, 2026 9:00am - 11:00am CEST
String ageing is a familiar and perceptually important
phenomenon for guitarists and players of other stringed
instruments. From the moment a new set of strings is
installed, the sound they produce when excited begins to
change due to a combination of chemical degradation,
corrosion, and mechanical wear arising from playing.
Musicians commonly report that aged strings sound dull,
lack sustain, and feel less responsive compared to new
strings. String ageing is a function of both elapsed time
and accumulated playing time, with repeated playing
accelerating degradation through contamination and repeated
mechanical stress.

Previous studies have investigated individual aspects of
string ageing by artificially accelerating wear and
performing controlled acoustic measurements, identifying
effects such as increased damping of higher partials and
increased inharmonicity. While these approaches provide
valuable physical insight, the tightly constrained
experimental conditions differ significantly from
real-world playing conditions.

This paper presents a dataset of audio recordings of guitar
playing over a four-week period, starting from the point of
new strings being installed.
Audio performance data from different sets of electric
guitar strings is recorded daily throughout this period,
using strictly fixed musical exercises that are repeated
multiple times per session. By collecting many takes of
identical material at each stage of string age, the dataset
enables statistical analysis of ageing-related changes
while accounting for natural performance variability.

The dataset is intended to support exploratory machine
learning investigations into string ageing, including
questions of how ageing manifests over time and playing
duration, whether string age can be predicted from audio
alone, and which audio features or learned representations
capture perceptually relevant aspects of the ageing process.
Authors

Alec Wright

University of Edinburgh

Matthew Hamilton

University of Bologna

Thomas McKenzie

Lecturer in Acoustics, University of Edinburgh
Thomas McKenzie is a Lecturer in Acoustics and Architectural Acoustics at the Reid School of Music, Edinburgh College of Art, University of Edinburgh, UK. He completed a B.Sc. in Music, Multimedia, and Electronics at the University of Leeds, UK, in 2013, before completing his M.Sc...
Saturday May 30, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

JoyCam: Blending Facial Recognition with Neural Activity measurement for Real-time Estimation of Listener Emotion
Saturday May 30, 2026 1:00pm - 3:00pm CEST
The ability to objectively measure listener emotion is a
critical frontier for adaptive audio systems, healthcare,
and personalized music therapy. While music is a powerful
driver of affect, traditional self-reporting is often
intrusive or inaccessible for users in wellbeing settings
who may struggle to articulate their mood. This paper
introduces JoyCam, a multimodal system that estimates
subtle moments of joyful engagement by blending lightweight
brain-wave monitoring (wearable EEG) with facial-expression
sensing. By capturing physiological reactions that occur
below the threshold of conscious awareness, the system
creates a more stable emotional profile than
single-modality methods. In our system, facial joy is
estimated via MediaPipe landmark analysis, focusing on
normalized mouth-width deviations. Simultaneously,
neurological engagement is tracked through Frontal Alpha
Asymmetry (FAA) using an OpenBCI Cyton system. To address
the sensitivity of EEG to movement, a dynamic artefact
index down-weights neural signals during high-frequency
interference. The system was tested in a pilot study with
five participants. Preliminary results indicate that
baseline-corrected physiological scores align closely with
self-reported music impact and valence ratings across
joyful and sad conditions. These findings suggest that
JoyCam offers a robust framework for responsive musical
companions that can adjust playlists or production
parameters based on a listener’s real-time physiological
state.
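As a hedged illustration of the two ingredients described in this abstract, the sketch below computes Frontal Alpha Asymmetry using the common convention ln(right alpha power) - ln(left alpha power) and then blends a facial score with a neural score whose weight shrinks as an artefact index rises. The blending rule, the 0 to 1 range of the artefact index, and all names are hypothetical; the abstract does not describe JoyCam's actual fusion formula.

```python
import math

def frontal_alpha_asymmetry(alpha_power_left, alpha_power_right):
    """FAA under the common convention ln(right) - ln(left)."""
    return math.log(alpha_power_right) - math.log(alpha_power_left)

def blended_score(facial_score, neural_score, artefact_index):
    """Weighted mean of the two modalities (hypothetical rule).

    artefact_index is assumed to lie in [0, 1]; at 1 the EEG contribution
    is ignored entirely, at 0 both modalities count equally.
    """
    neural_weight = 1.0 - artefact_index
    return (facial_score + neural_weight * neural_score) / (1.0 + neural_weight)

# Illustrative values only: equal alpha power gives zero asymmetry.
faa = frontal_alpha_asymmetry(alpha_power_left=2.0, alpha_power_right=2.0)
clean = blended_score(facial_score=0.8, neural_score=0.2, artefact_index=0.0)
noisy = blended_score(facial_score=0.8, neural_score=0.2, artefact_index=1.0)
print(faa, clean, noisy)
```

In practice the two scores would need to be normalised to a shared scale before blending; that step is omitted here.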
Authors

Duncan Williams

Senior Lecturer, Acoustics Research Centre, University of Salford
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Smartphone-based tinnitus matching: Implementation and Validation
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Tinnitus has been described as ‘the conscious awareness of
a tonal or composite noise for which there is no
identifiable corresponding external sound source’ and is
experienced by ~15% of the European population. Tinnitus
may be experienced in one ear, both ears, or perceived as
originating from within the head. It can present as tonal
sounds, noise-like sounds, or a combination of both. The
perception can lead to emotional and/or cognitive
dysfunction, autonomic arousal, behavioural changes, and/or
functional disability (DeRidder 2021, Biswas 2022, Jarach
2022). There is no standard test for tinnitus in the
medical literature; audiologists typically test pitch (to
within half an octave) and perceived loudness of the tone
using standard clinical equipment for testing hearing loss.
The underlying causes of tinnitus are not yet fully
understood, and the most effective treatments not yet
identified. We present the first release of an extended
Tinnitus matching app that includes a highly
individualizable tinnitus tone-matching tool and a
comprehensive questionnaire for mobile health tracking. The
app facilitates large data collection on tinnitus sounds
across aetiologies, co-occurring symptoms, and
demographics. Our intentions are threefold: 1) to provide
those experiencing tinnitus with a way to communicate what
they hear more precisely, 2) to understand how tinnitus sounds
vary across demographics and how these relate to co-occurring
symptoms, and eventually 3) to provide a means of
individualising any sound-based approach to symptom
amelioration. We present the approach and validation of the
tinnitus matching tool against common clinical measures.
Authors

Cheol-Ho Jeong

Acoustic Technology, Department of Electrical and Photonics Engineering, DTU

Izabela Ossowska

Hearing Systems, DTU HealthTech

Mark Bo Jensen

Department of Engineering Technology and Didactics, DTU

Mie Lærkegård Jørgensen

Hearing Systems, DTU HealthTech


Mikkel Brunstedt Nørgaard

Department of Engineering Technology and Didactics, DTU
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Measurement and Analysis of Perceptual Characteristics of Binaural Cues
Saturday May 30, 2026 1:00pm - 3:00pm CEST
The application of binaural cue perception mechanisms to
multichannel audio compression technology can reduce
spatial parameter redundancy and effectively lower the
encoding bitrate. Binaural cues play a critical role in
sound source localization, and their frequency-dependent
characteristics yield varied perceptual localization
effects. However, current understanding of the specific
behavior of binaural cues at low frequencies, as well as
the similarities and differences between interaural time
difference (ITD) and interaural level difference (ILD),
remains incomplete. To explore the relationship between
ITD-based and ILD-based azimuth perception, this study
non-uniformly selected nine ITD values and twelve ILD
values within the 300–1480 Hz frequency range to test ITD
and ILD perceptual azimuths, respectively. The experimental
method involved using fixed binaural cue stimuli while
varying the audio with known horizontal azimuth angles to
approach the target binaural cue stimulus. Test results
indicate that both ITD and ILD perceptual effects are
significantly influenced by frequency, with the minimum
perceptual azimuth values for both ITD and ILD observed at
700 Hz, suggesting that binaural cue perception azimuths
are closer to the median plane at this frequency.
Furthermore, surface fitting was applied to the perceptual
azimuths of ITD and ILD, revealing relatively similar
patterns. Based on experimental findings, this paper
analyzes the explorable perceptual correlation between
ITD-based and ILD-based azimuth perception. The application
of data in spatial audio coding contributes to the
efficient transmission and fidelity preservation of audio
signals. This study provides valuable insights for
optimizing binaural cue-based compression techniques,
ultimately supporting high-fidelity spatial audio
reproduction.
Authors

Heng Wang

Wuhan Polytechnic University

Mingyan Gao

Wuhan Polytechnic University

Yiming Xu

Wuhan Polytechnic University, Wuhan, China
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Subjective Evaluation of Stereo Width Shrinkage Method Using Semantic Differential Method and Scheffé’s Paired Comparison
Saturday May 30, 2026 1:00pm - 3:00pm CEST
The authors proposed a stereo-width shrinkage method for
headphone reproduction, in which crosstalk from loudspeaker
reproduction is added to the original stereo sources. In
this study, we investigate the sound quality of
stereo-width-shrunken sources with different parameter
settings. A Semantic Differential method is employed to
quantify the subjective characteristics with five adjective
pairs, and the naturalness of the stereo-width-shrunken
sources is evaluated in detail with Scheffé’s paired
comparison. The results of the Semantic Differential method
comprehensively rank the sound sources. Interestingly, the
results of the paired comparison are not reversed in the
natural and unnatural evaluations, whereas the negative
evaluation yields reasonable results. These results provide
valuable insights for practical sound-quality evaluation.
Authors

Matsumoto Arisa

Kyushu Institute of Technology

Mitsunori Mizumachi

Professor, Kyushu Institute of Technology
Mitsunori Mizumachi graduated from the Department of Acoustic Design, Kyushu Institute of Design, in 1995 and received his Ph.D. degree in Information Science from Japan Advanced Institute of Science and Technology in 2000. From 2000 to 2004, he worked as a researcher at Advanced...
Saturday May 30, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 

