Name: Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals
Start: 2026-05-29T09:00:00+0200
End: 2026-05-29T11:00:00+0200

Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time

Friday May 29, 2026 9:00am - 11:00am CEST

Foyer Building 303A

Current deep learning approaches to speech enhancement rely
heavily on objective measures such as mean squared error or
scale-invariant signal-to-distortion ratio, both as training
objectives and as evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, metrics such as Short-Term Objective
Intelligibility or Perceptual Evaluation of Speech Quality
are inconsistently integrated into training and evaluation
pipelines, leaving a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modelling. Our study
explores two distinct methods to characterise enhanced
signals. On the one hand, we employ a perceptual approach
based on the Cambridge loudness model to assess the
preservation of spectral excitation patterns and perceived
intensity. On the other hand, we adopt a biophysical
approach using CoNNear, a convolutional model of the human
auditory periphery, which allows us to simulate responses
at different stages of the auditory periphery and observe
how speech enhancement processing affects the physiological
representation of speech. We analyse pre-trained speech
enhancement models using automatic speech recognition and
Short-Term Objective Intelligibility as additional proxies
for human intelligibility. By mapping automatic speech
recognition performance against loudness and peripheral
response patterns, we investigate the extent to which
current enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than for purely
signal-based objective functions.
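The abstract contrasts signal-level objectives such as scale-invariant signal-to-distortion ratio (SI-SDR) with recognition-based proxies for intelligibility. As a minimal illustration of that contrast, the sketch below computes both in dependency-free Python. It is not the authors' evaluation code: the abstract does not say which SI-SDR variant, ASR system, or text normalisation they use, so the zero-mean convention, the plain whitespace tokenisation, and the function names `si_sdr` and `word_error_rate` are illustrative assumptions.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    if not reference or len(estimate) != len(reference):
        raise ValueError("signals must be non-empty and of equal length")
    # Zero-mean both signals (a common convention, assumed here).
    mu_e = sum(estimate) / len(estimate)
    mu_r = sum(reference) / len(reference)
    est = [x - mu_e for x in estimate]
    ref = [x - mu_r for x in reference]
    # Project the estimate onto the reference: alpha = <est, ref> / ||ref||^2.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10 * math.log10((target_energy + eps) / (distortion_energy + eps))


def word_error_rate(hypothesis: str, reference: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    # Whitespace tokenisation with no text normalisation (an assumption).
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Word-level Levenshtein distance, keeping one row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # hypothesis word inserted
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    clean = [math.sin(0.05 * n) for n in range(1000)]
    enhanced = [x + 0.05 * math.cos(0.37 * n) for n, x in enumerate(clean)]
    print(f"SI-SDR: {si_sdr(enhanced, clean):.1f} dB")
    print(word_error_rate("the bat sat", "the cat sat"))  # 0.3333333333333333
```

Because SI-SDR projects the estimate onto the reference, rescaling the enhanced signal leaves the score unchanged, whereas word error rate only reflects what a recogniser transcribes. The two measures can therefore rank the same outputs differently.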

Authors

François Effa

Université de Lorraine, CNRS, Inria, Loria, Nancy, France

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Foyer, Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark

AI and Machine Learning in Audio, Poster | Audio Processing, Poster | Perception, Poster

Presentation Type: Poster

AES Europe 2026
