Schedule as of May 16, 2022 - subject to change

Default Time Zone is CEST - Central European Summer Time


Type: Audio Processing
Friday, May 29
 

9:00am CEST

A method to synchronize dynamic media stream on heterogenous media playback devices
Friday May 29, 2026 9:00am - 9:30am CEST
Audio synchronization across heterogeneous media playback
devices is essential for delivering immersive sound
experiences in applications such as speaker group play and
multi-room audio playback. Existing synchronization
techniques predominantly rely on tightly coupled network
infrastructures and often embed media sequence and
timestamp information in the media packet at the
transmitting source end, which restricts flexibility in
selecting the transmitting source and also compromises
robustness under dynamic network conditions. This paper
proposes a network- and source-independent audio
synchronization framework that eliminates the dependency on
embedded media sequences and timestamps. The proposed
system employs an audio-fingerprinting-based media
sequencing algorithm among the media playback devices
that does not rely on the type of transmitting source or on
network availability. A novel audio synchronization
algorithm is proposed which first determines common
sequence start information given a dynamic media stream
from the transmitting source and then communicates the
fingerprint and timestamp among the media playback
devices without modifying the original audio packet
structure. Experimental results demonstrate that the
proposed approach achieves audio-to-audio
synchronization within 10 ms across media playback
devices in a no-network environment, thereby extending the
scope of immersive audio applications irrespective of the
transmitting source.
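
The idea of measuring the offset between two playback devices can be illustrated with a minimal sketch. It swaps in plain cross-correlation of captured audio for the fingerprinting-based sequencing described in the abstract, whose details are not given here. The names `estimate_lag` and `lag_to_ms` are hypothetical and chosen only for illustration.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    This is a brute-force stand-in, not the paper's fingerprint matching.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += r * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag_samples, sample_rate_hz):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate_hz
```

At an assumed sample rate of 48 kHz, the 10 ms bound quoted in the abstract corresponds to a lag of 480 samples.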
Authors
AS

Avinash Singh

Samsung Research Institute, Delhi (SRID)
MS

Mohit Singh

Samsung Research Institute, Delhi (SRID)

Natasha Meena

Samsung Research Institute, Delhi (SRID)
I work as a software developer at Samsung Research Institute India - Delhi and am responsible for developing features related to Samsung sound device’s
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring 2D Ambisonics by Amplitudes and Phases
Friday May 29, 2026 9:00am - 9:30am CEST
We present a spectral-like reformulation of 2D ambisonics,
enabling an alternative representation of the sound field
in terms of amplitudes and phases. We hypothesise that it
simplifies the representation and creative manipulation of
2D ambisonics, beyond encoded directional point sources.

In 2D high-order ambisonics (HOA) of order N, a sound field
can be represented as a 2π-periodic angular function, a
combination of circular harmonics (Y_m) weighted by the
coefficients (a_m) with m ∈ [-N, N]. This representation
can be reformulated in terms of N+1 amplitudes and N
phases, similarly to a Fourier decomposition.

A simple example of this representation is the ambisonic
encoder at an angle theta. Phases are then multiples of a
phase phi = theta/2π, just as frequencies are multiples of a
fundamental in harmonic sounds. Therefore, the
amplitude-phase approach can draw on the field of sound
synthesis, between harmonic and inharmonic modelling.
Operations on ambisonic vectors in amplitude-phase form also
rely on the Fourier representation, namely the spectral
convolution of two vectors (element-wise products of the
amplitudes, element-wise sums of the phases). Spectral
convolution has vast potential in ambisonics, making it
possible to represent all the usual spatial operations
(geometric and transformative) in a simple manner.
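
A minimal sketch of the amplitude-phase form and of spectral convolution as described above. It assumes a complex convention in which a source at angle theta has coefficients a_m = exp(-i m theta) for m = 0..N (negative orders follow by conjugate symmetry for a real field), and it keeps phases in radians rather than the normalised phi = theta/2π. The function names are illustrative and are not taken from the authors' Faust code.

```python
import cmath


def encode_2d(theta, order):
    """Coefficients a_0..a_N of a point source at angle theta (assumed convention)."""
    return [cmath.exp(-1j * m * theta) for m in range(order + 1)]


def to_amp_phase(coeffs):
    """Split complex coefficients into N+1 amplitudes and phases in radians.

    For a real field the phase of order 0 is zero, leaving N free phases.
    """
    return [abs(c) for c in coeffs], [cmath.phase(c) for c in coeffs]


def from_amp_phase(amps, phases):
    """Rebuild complex coefficients from amplitudes and phases."""
    return [a * cmath.exp(1j * p) for a, p in zip(amps, phases)]


def spectral_convolution(first, second):
    """Element-wise product of amplitudes, element-wise sum of phases."""
    (amps_a, phases_a), (amps_b, phases_b) = first, second
    amps = [x * y for x, y in zip(amps_a, amps_b)]
    phases = [x + y for x, y in zip(phases_a, phases_b)]
    return amps, phases
```

Under this convention, convolving the encoder at angle theta with the encoder at angle alpha gives the encoder at theta + alpha, which is one way to see a rotation as a spectral convolution.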

To test this approach, we are currently developing an
ambisonic synthesiser based on Faust functions running in
the Max environment. We are evaluating the scope of this
representation, both theoretical and compositional, and
will then attempt to expand this approach to 3D ambisonics.
Authors

Alain Bonardi

Professor in Computer Science and Music Creation, University of Paris 8
Alain Bonardi is a Professor of Computer Science and Music Creation at Paris 8 University, where he is based in the Music Department and is a member of the Musidanse laboratory.
There, he co-directs the CICM (Center for Research in Computer Science and Music Creation) with Anne...
A

Axel Chemla-Romeu-Santos

University of Paris 8
EF

Emma Frid

University of Paris 8
PG

Paul Goutmann

University of Paris 8
Friday May 29, 2026 9:00am - 9:30am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Voice-Based Fatigue Detection for Military Personnel: A Multi-Modal Machine Learning Framework with Acoustic Feature Emphasis
Friday May 29, 2026 9:00am - 11:00am CEST
This study presents a voice-centered machine learning
framework for detecting mental fatigue in military
personnel, integrating acoustic analysis with physiological
biosensors to enhance detection robustness. Mental fatigue
poses critical safety and performance challenges in
military operations, yet cultural stigma often prevents
self-reporting. We collected multi-modal data from 23
participants across two fatigue states, extracting
comprehensive acoustic features including sound pressure
level (SPL), formants, mel-frequency cepstral coefficients
(MFCCs), jitter, shimmer, harmonic-to-noise ratio (HNR),
and temporal speech characteristics. These voice features
were combined with electroencephalography (EEG),
photoplethysmography (PPG), and temperature data to train
multiple machine learning classifiers. The voice-based
models achieved accuracies of 82-85%, with support
vector machines (SVM) and long short-term memory (LSTM)
networks demonstrating superior performance. When acoustic
features were combined with physiological markers,
classification accuracy improved to 92%, with
Classification and Regression Trees (CART) and Linear
Discriminant Analysis (LDA) emerging as top performers.
Statistical analysis identified SPL and formant variance as
the most discriminative voice features, while Lempel-Ziv
Complexity (LZC) and theta/beta ratio proved most reliable
for EEG. Evaluation on new participants yielded 67%
accuracy, revealing model generalization challenges that
inform future research directions. This work demonstrates
that voice-based machine learning systems, when augmented
with physiological data, offer a promising non-invasive
approach to real-time fatigue monitoring in operational
military environments.
Authors
CC

Claire Courchene

Applied Perception Associate Engineer, GN
I’m a creative technologist and interaction designer exploring how sound, technology, and human experience meet. With an MScEng in Sound & Music Computing, I prototype audio interactions, build ML‑driven tools, and design experiments around perception. My background spans music...
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Exploring Perceptual and Physiological Auditory Models for Assessing Speech Intelligibility in Enhanced Signals
Friday May 29, 2026 9:00am - 11:00am CEST
Current deep learning approaches to speech enhancement rely
heavily on objective measures like mean squared error or
scale-invariant signal-to-distortion ratio as both training
objectives and evaluation metrics. While analytically
convenient, these benchmarks often fail to capture the
nuances of human perception or actual intelligibility.
Furthermore, the inconsistent integration of metrics like
Short-Term Objective Intelligibility or Perceptual
Evaluation of Speech Quality into training and evaluation
pipelines leaves a gap between algorithmic performance and
perceptual reality. This paper proposes a transition
towards evaluation methodologies grounded in
psychoacoustics and audiological modeling. Our study
explores two distinct methods to characterise enhanced
signals. On one hand, we employ a perceptual approach based
on the Cambridge loudness model to assess the preservation
of spectral excitation patterns and perceived intensity. On
the other hand, we adopt a biophysical approach by
utilising CoNNear, a convolutional model of the human
auditory periphery. This allows us to simulate
representations of responses at different stages of the
auditory periphery to observe how speech enhancement
processing affects the physiological representation of
speech. We analyse pre-trained speech enhancement models
using automatic speech recognition and Short-Term Objective
Intelligibility as an additional proxy for human
intelligibility. By mapping automatic speech recognition
performance against loudness and peripheral response
patterns, we investigate the extent to which current
enhancement strategies maintain the perceptual and
physiological integrity of the speech signal. This work
aims to identify features predictive of intelligibility,
providing a foundation for speech enhancement systems
optimised for the human listener rather than purely
signal-based objective functions.
Authors
FE

François Effa

Université de Lorraine, CNRS, Inria, Loria, Nancy, France
RS

Romain Serizel

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

Objective Quality Models for Decision-Making in Speech Coding
Friday May 29, 2026 9:00am - 11:00am CEST
Objective quality evaluation is widely used in speech
coding, yet objective estimates often show limited
agreement with subjective listening-test results. Rather
than focusing on absolute score accuracy, this paper
evaluates objective speech quality models from a
decision-making perspective, defined as their ability to
support comparative judgments between speech codecs or
codec configurations. A formal ITU-T P.800 Absolute
Category Rating (ACR) listening test was conducted with 30
listeners across 24 conditions, covering conventional and
neural monophonic speech codecs operating under
clear-channel conditions at sampling frequencies from 16 to
48 kHz and bit rates ranging from below 1 kbps to above 16
kbps. The speech material consisted of internally recorded,
clean French-language speech that was not used in the
development or training of any of the evaluated codecs or
objective quality models. Seven objective quality models,
namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA,
UTMOS, and DistillMOS, were evaluated on the same material.
Decision-making performance was assessed by comparing
subjective and objective rankings using Kendall’s rank
correlation coefficient and by analyzing pairwise codec
comparisons using t-tests at a 95% confidence level. The
results show that some objective quality models are
effective for comparing bit rate variations within a given
speech coding technology, provided that all other codec
parameters remain unchanged (e.g., sampling frequency).
However, all models exhibit limitations, including
tendencies toward over- or underestimation for certain
technologies, as well as reduced reliability when applied
across different sampling frequencies. Despite its
conventional origins, PESQ remains capable of supporting
decision-making even when applied to neural speech codecs.
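
The rank comparison mentioned above can be sketched as follows. This is a plain implementation of Kendall's tau-a (ties counted as neither concordant nor discordant), given as an assumption about the variant since the abstract does not specify one. The function name and any scores passed to it are placeholders, not data from the paper.

```python
from itertools import combinations


def kendall_tau(subjective, objective):
    """Kendall's tau-a between two score lists for the same conditions.

    Returns (concordant - discordant) / total number of pairs.
    """
    if len(subjective) != len(objective) or len(subjective) < 2:
        raise ValueError("need two equally long lists with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(subjective)), 2):
        sign = (subjective[i] - subjective[j]) * (objective[i] - objective[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(subjective) * (len(subjective) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A value of 1 means the objective model orders all conditions exactly as the listeners did, and -1 means the order is fully reversed.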
Authors
CL

Clémence Lamballe

Universite de Sherbrooke
PG

Philippe Gournay

Universite de Sherbrooke
RL

Roch Lefebvre

Universite de Sherbrooke
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

9:00am CEST

The Ambisonic Denoising Paradox: U-Net Processing Degrades ASR Transcription Quality for Medical Speech
Friday May 29, 2026 9:00am - 11:00am CEST
Spatial audio recording using higher-order Ambisonics
offers rich directional information for medical speech
capture, yet challenging hospital acoustic environments
motivate preprocessing with neural denoising algorithms.
This study investigates whether U-Net-based denoising of
third-order ambisonic recordings improves automatic speech
recognition (ASR) quality for medical applications. We
developed the Medical Immersive Audio Corpus (MIAC),
comprising 1,759 utterances (6.43 hours) of Polish medical
speech recorded with a Zylia ZM-1 microphone in
uncontrolled hospital environments, capturing 16-channel
third-order Ambisonics across multiple specializations
including thyroid ultrasonography, surgical procedures, and
general diagnostics. We applied a U-Net architecture with
dual attention mechanisms trained using the Noise2Noise
paradigm to denoise the corpus, then evaluated
transcription quality using ten Whisper ASR models ranging
from 39 million to 1.55 billion parameters, including
domain-adapted medical variants. Surprisingly, we
discovered a "noise reduction paradox" where denoising
degraded transcription quality for seven of ten models,
with statistically significant increases in Word Error Rate
(WER) and Character Error Rate (CER) for general-purpose
base, small, and medium models. Only the domain-adapted
whisper-medium-68000-abbr model showed statistically
significant improvement (p=0.0008), while large-scale
models (large-v2, large-v3) exhibited robustness with
negligible changes. Effect sizes remained small (Cohen's d
< 0.2) across all models. These counterintuitive findings
suggest modern ASR systems implicitly utilize background
noise characteristics as informative features, and that
preprocessing pipelines should be reconsidered for
domain-specific applications. Our results provide practical
guidance for medical speech processing system design.
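
For readers unfamiliar with the two metrics, a minimal sketch of WER and CER using the standard edit-distance definition follows. The function names are illustrative, and the abstract does not state which text normalisation or scoring tool the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both rates can exceed 1 when the hypothesis contains many insertions, which is worth keeping in mind when comparing models.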
Authors

Bartlomiej Mroz

Assistant Professor, Gdańsk University of Technology
PhD, Spatial Audio & Immersive Media Researcher, Recording Engineer, Statistics enthusiast
SZ

Szymon Zaporowski

Gdańsk University of Technology
Friday May 29, 2026 9:00am - 11:00am CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:00am CEST

Detecting Bandwidth Variation Artifacts in Perceptual Audio Coding
Friday May 29, 2026 10:00am - 10:30am CEST
Accurate identification of audio coding artifacts is
instrumental in encoder design, audio post-processing, and
perceptual quality assessment. This paper addresses the
detection of artifacts arising from changes in the
effective bandwidth of coded audio signals caused by coarse
spectral quantization. Such bandwidth variations give rise
to two prominent artifact types: bandwidth limitation (BL)
and birdies, also referred to as spectral islands (SI).
Blind detection methods, requiring no reference signal, are
presented for both artifact types. Bandwidth limitation
is detected by analyzing variations in the zero-crossing
count across time-domain subband signals, enabling
estimation of both fixed and time-varying cutoff
frequencies. Spectral islands are identified through
analysis of the spectrogram by detecting clusters of
isolated components in the time–frequency domain,
characterized by their temporal and spectral extents. The
proposed methods are evaluated using audio material from
the ODAQ and USAC verification datasets. Results show that
the BL detection method achieves an average bandwidth
estimation error of approximately 160 Hz and demonstrates
robustness to noisy bandwidth-limited signals. In addition,
the detected birdie artifacts are perceptually validated
through listening tests, indicating an improvement in
perceived quality following detection and subsequent
suppression of the birdie artifacts.
Authors
AN

Andreas Niedermeier

Fraunhofer IIS, Erlangen

BE

Bernd Edler

International Audio Laboratories Erlangen, Germany
DD

Dipanjan Datta Roy

International Audio Labs, Erlangen

Sascha Dick

Fraunhofer IIS, Erlangen, Germany
Friday May 29, 2026 10:00am - 10:30am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

10:30am CEST

Spatial Estimation of Room Acoustic Parameters using Sound Field Reconstruction Methods
Friday May 29, 2026 10:30am - 11:00am CEST
The acoustic characterisation of indoor spaces is crucial
for a wide range of applications. While global metrics
provide convenient descriptors of a room's overall
behaviour, a more spatially detailed analysis offers deeper
insight into the spatio-temporal structure of the sound
field, albeit at a higher experimental cost. This paper
proposes a methodology that leverages the predictive
capabilities of sound field reconstruction methods to
estimate room acoustic parameters as a function of
position. The approach is experimentally evaluated in an
auditorium, where it achieves accurate estimation of
temporal and energetic room acoustic parameters across the
entire audience area. In addition, the reconstructed field
yields higher intelligibility indices compared to the raw
measurements. Overall, these results highlight the
potential of sound field reconstruction techniques as a
practical tool for room acoustic characterisation and for
supporting assistive listening technologies.
Authors

Antonio Figueroa-Duran

Universidad Politécnica de Madrid
EF

Efren Fernandez-Grande

Universidad Politécnica de Madrid
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 42 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

10:30am CEST

Lossless Audio Coding revisited
Friday May 29, 2026 10:30am - 11:00am CEST
MPEG-4 SLS (scalable lossless coding) was published more
than 20 years ago. In the meantime, several tools to improve
coding efficiency and flexibility have been invented.
Currently, in MPEG WG6 (audio coding) there are two
standardization activities on lossless audio coding: Audio
Coding for Machines (ACoM) and Biomedical and general
waveform signal coding (BWC).
ACoM phase 1 was originally targeted only at lossless
storage formats for training machine listening schemes,
but additional use cases like “user generated content
analysis”, “live stream content analysis”, and “artistic
creation” have been added. The focus was extended to the
transmission of audio data from microphone (arrays) to
central processing units.
BWC is a joint activity with ITU-T SG21. While ACoM started
with a large number of use cases and includes the
specification of a rich set of metadata, BWC started with a
focus on medical data like electroencephalogram (EEG) and
electrocardiogram (ECG). However, BWC can be used for audio
signals, too, and medical data coding is on the list of use
cases for ACoM.
The call for proposals (CfP) for ACoM was completed in
January 2025. Two proposals, both outperforming MPEG-4 SLS,
were submitted. Both proposals reused and optimized
core codecs from BWC. Currently, MPEG audio is investigating
how the ACoM proposals can be merged into BWC. This merge
process must be completed by the end of April 2026.
The presentation will give details about ACoM use cases,
the ACoM CfP process, the results of the CfP, and results
from the merge process.
Authors

Thomas Sporer

Deputy Director IDMT / Convenor MPEG audio, Fraunhofer IDMT
Friday May 29, 2026 10:30am - 11:00am CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

11:00am CEST

Obsidian Neural: Open-Source VST3 for Real-Time Generative AI – Architecting the AI as a Live Performance Instrument
Friday May 29, 2026 11:00am - 12:00pm CEST
Obsidian Neural is a novel, open-source VST3 plugin that
addresses the technical challenges of integrating
generative AI models directly into a low-latency digital
audio workstation (DAW) environment. This workshop will
provide a deep dive into the architecture designed to use
AI as a real-time performance instrument. We will cover the
C++/DSP strategies necessary for minimizing latency during
the asynchronous generation of audio loops via models like
Stable Audio Open. Crucially, we will detail the system's
ability to maintain musical coherence during a live mix,
achieved through an internal LLM "Brain" that processes
contextual session data (BPM, key, existing tracks) to
enrich generation prompts. Furthermore, we will explore the
technical solutions implemented for seamless integration
with the live mixing paradigm: quantized MIDI triggering,
multi-output routing, and the novel "Draw-to-Sound"
feature, which employs a Vision Language Model (VLM) to
translate visual input into musical parameters. This work
demonstrates a robust framework for generative AI to
function as an instantaneous, adaptable partner within
professional audio engineering workflows.
Speakers
AC

Anthony Charretier

Independent Developer
Friday May 29, 2026 11:00am - 12:00pm CEST
Building 302, 2nd floor Technical University of Denmark Asmussens Alle, Building 302 DK-2800 Kgs. Lyngby Denmark

12:00pm CEST

Saul Walker Student Design Competition
Friday May 29, 2026 12:00pm - 1:30pm CEST
The Saul Walker Student Design Competition is a long-running event of the Audio Engineering Society that highlights practical and creative work in audio design. It brings together experienced judges and a wide range of strong student submissions each year.

During this session, students from around the world will present their projects and bring their hardware designs for hands-on inspection by the judges. The format encourages open discussion, giving attendees a chance to hear how ideas are evaluated and improved in a professional setting.

Sponsored by API, the competition includes cash prizes for the winners. More importantly, it offers students valuable feedback and the opportunity to connect with people working in the industry. The session is open to everyone—students and non-students alike—who are interested in seeing what participants have created and learning more about current work in audio design.
Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio...
EL

Ewa Łukasik

Poznan University of Technology, Institute of Computing Science
Authors

Sascha Disch

Fraunhofer IIS
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...
Friday May 29, 2026 12:00pm - 1:30pm CEST
Aud 49 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

12:30pm CEST

Spatial Quality Measure for Mixed-phase Impulse Response Equalization
Friday May 29, 2026 12:30pm - 1:00pm CEST
Mixed-phase impulse response equalization can improve
magnitude and phase response, but conventional objectives
such as mean-squared error (MSE) can favor solutions that
introduce objectionable temporal artifacts, including
pre-echo and extended post-echo ringing. This paper
proposes a Spatial Equalization Quality Measure (SEQM) to
select a mixed-phase equalization filter that better
controls these artifacts while remaining computationally
simple and applicable across multiple listening positions.
SEQM combines (i) a temporal-domain metric that penalizes
energy preceding the main pulse of an impulse response and
energy persisting after it, while also accounting for the
decay rate of the post-response tail, with (ii) a spatial
aggregation rule that summarizes quality across measurement
positions. We use SEQM to select the modeling delay for
mixed-phase finite-impulse-response (FIR) equalization and
to compare mixed-phase FIR designs with minimum-phase FIR
and IIR alternatives under a common multi-position
measurement framework. Experiments using semi-anechoic
measurements across 34 spatial positions for two
loudspeakers show that SEQM consistently selects
substantially shorter delays than MSE-based selection and
yields impulse responses with reduced pre-echo and faster
post-response decay, while maintaining comparable
frequency-response equalization. These results suggest that
SEQM is a practical objective tool for designing
multi-position mixed-phase equalization filters.
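
The kind of temporal quantity such a measure penalizes can be illustrated with a rough sketch. This is not the SEQM formula, which the abstract does not give. It only computes the energy before and after the main pulse of an impulse response, relative to total energy, and it omits the decay-rate term and the spatial aggregation. The function name and the `guard` parameter are hypothetical.

```python
import math


def pre_post_energy_db(impulse_response, guard=0):
    """Return (pre, post) energy in dB relative to the total energy.

    The main pulse is taken as the sample with the largest magnitude;
    `guard` samples on each side of it are excluded from both sums.
    """
    total = sum(v * v for v in impulse_response)
    if total == 0:
        raise ValueError("impulse response has no energy")
    peak = max(range(len(impulse_response)),
               key=lambda n: abs(impulse_response[n]))
    pre = sum(v * v for v in impulse_response[:max(peak - guard, 0)])
    post = sum(v * v for v in impulse_response[peak + guard + 1:])

    def to_db(energy):
        return 10 * math.log10(energy / total) if energy > 0 else float("-inf")

    return to_db(pre), to_db(post)
```

A lower first value indicates less pre-echo energy, so a delay-selection loop could, for example, compare this pair across candidate modeling delays.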
Authors
BD

Bill Decanio

Samsung Electronics

Sunil Bharitkar

Samsung Research America

Friday May 29, 2026 12:30pm - 1:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

12:30pm CEST

Innovative Measurement of Speech Intelligibility – Applications of Listening Effort in Research & Practice
Friday May 29, 2026 12:30pm - 2:00pm CEST
Speech intelligibility is a key factor in successful
communication across various domains, including research,
post-production for film and television, live sound
reinforcement, and audio production. Traditional assessment
methods often lack objectivity or fail to capture the
listener’s experience in real-world scenarios. In this
workshop, we introduce an innovative approach to measuring
speech intelligibility based on the concept of “Listening
Effort.” We will present the underlying technology, share
practical examples from different application areas, and
demonstrate how this method can be integrated into
workflows to optimize intelligibility. Attendees will have
the opportunity to participate in a hands-on demonstration
and discuss potential use cases relevant to their own work.
This session is designed for professionals and researchers
seeking reliable and actionable tools for evaluating and
improving speech intelligibility in diverse environments.
In this workshop, we present a new technology for measuring
speech intelligibility (“Listening Effort”). The method is
used in research, post-production (film/TV), live sound,
and audio production. The session is aimed at professionals
from both academia and industry who are interested in
objectively assessing and optimizing speech intelligibility.

Participants will be able to join a short demo/exercise and
ask questions.

Introduction & Relevance: Overview of the importance of
speech intelligibility across different fields
Technology & Methodology: Presentation of the measurement
method and underlying concepts
Practical Examples: Case studies from research,
post-production (film/TV), live sound, and production
Live Demo / Interactive Exercise: Practical demonstration
and opportunity for active participation
Discussion & Outlook: Q&A, exchange of ideas, and future
perspectives
Speakers
HB

Hannah Baumgartner

Fraunhofer IDMT
JR

Jan Rennies-Hochmuth

Fraunhofer IDMT
Friday May 29, 2026 12:30pm - 2:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Systematization of Multiplier-less Convolution for 1-bit Audio Signal
Friday May 29, 2026 1:00pm - 1:30pm CEST
High-speed 1-bit signals generated by oversampling are
widely used in audio applications as they allow simple
demodulation via low-pass filtering while preserving
in-band spectral characteristics with high accuracy.
However, conventional FIR filtering of such signals
generally requires conversion to a multi-bit representation
at a common sampling frequency, which increases
computational cost and complicates the overall processing
flow. This paper addresses the convolution of high-speed
1-bit audio signals with multi-bit FIR impulse responses
and presents a systematic formulation of a multiplier-less
convolution approach. Based on a mathematical
reinterpretation of convolution, the proposed formulation
describes how time shifting and amplitude weighting can be
expressed through structured rearranging of 1-bit samples
without arithmetic operations. This provides a theoretical
description of previously reported 1-bit convolution
methods; however, its validity has not been fully
formalized. We examine the spectral characteristics of the
proposed convolution method and compare them with those
obtained by multi-bit convolution followed by ΔΣ
modulation. Experiments are conducted by convolving 1-bit
input signals with FIR filters having multi-band frequency
responses. Spectral analysis shows that the proposed method
achieves extremely high agreement with the standard
approach within the audible band while the differences
appear primarily at much higher frequencies outside the
audible range. These results demonstrate that convolution
of high-speed 1-bit audio signals can be achieved without
multipliers, suggesting the potential for highly efficient
hardware-oriented signal processing architectures.
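
Why multipliers are not strictly needed can be seen from a simple observation: when every input sample is +1 or -1, each product in the convolution sum is just the filter tap with its sign kept or flipped. The sketch below shows only this sign-selection view. It is not the sample-rearranging formulation the abstract describes, which avoids arithmetic on the 1-bit stream altogether, and the function names are hypothetical.

```python
def conv_1bit(bits, taps):
    """Convolve a 1-bit stream (True = +1, False = -1) with multi-bit taps.

    Uses only additions and subtractions of the taps, no multiplications.
    """
    out = []
    for n in range(len(bits) + len(taps) - 1):
        acc = 0
        for k, tap in enumerate(taps):
            i = n - k
            if 0 <= i < len(bits):
                if bits[i]:
                    acc += tap
                else:
                    acc -= tap
        out.append(acc)
    return out


def multiply_accumulate(samples, taps):
    """Ordinary convolution with multiplications, used as a reference."""
    out = []
    for n in range(len(samples) + len(taps) - 1):
        acc = 0
        for k, tap in enumerate(taps):
            i = n - k
            if 0 <= i < len(samples):
                acc += tap * samples[i]
        out.append(acc)
    return out
```

Mapping the bits to +1 and -1 and running the reference convolution gives the same output, which makes the equivalence easy to check.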
Authors
IS

Iori Sakurai

Waseda University
TS

Tomohiro Sakaguchi

Doctoral student, Waseda University
YO

Yasuhiro Oikawa

Waseda University

YG

Yuta Gomi

Waseda University
Friday May 29, 2026 1:00pm - 1:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
  Audio Processing, Lecture

1:00pm CEST

Geometry Sensitivity in Low-Count Virtual Microphone Arrays: From Tetrahedral Baselines to Stochastic Spherical Layouts
Friday May 29, 2026 1:00pm - 3:00pm CEST
Virtual Microphone Array techniques are being investigated
by the authors to support room acoustics optimisation in
live sound environments. In our recent AES paper, “Room
Acoustics Optimisation Using Virtual Microphone Arrays”, a
notable outcome was that a compact four-microphone
tetrahedral array performed strongly relative to its low
sensor count. Recent virtual sensing and Remote Microphone
Technique research treats microphone placement as an
explicit design variable. It reports improved remote
estimation performance when microphone layouts are
deliberately chosen for the task, rather than adopted as
fixed, standard configurations.
This submission builds on our prior VMA work by focusing on
the four-microphone case, where geometry choices are
especially constrained. We compare a tetrahedral baseline
with an ensemble of stochastically generated spherical
layouts at the same array aperture using Monte Carlo
simulation. We apply a consistent evaluation protocol
across multiple listening-region offsets and standard
beamforming estimators to isolate variability due to
geometry alone. The central proposition is that, for
low-count VMAs, geometry is a first-order design parameter.
Tetrahedral remains a credible baseline, but lightweight
stochastic exploration can reveal alternative layouts that
are competitive and, in some cases, superior without
increasing channel count.
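
The layout-generation step can be sketched as follows, assuming points drawn uniformly on a sphere by normalising Gaussian vectors. The function names are assumptions, and the minimum-spacing score is only a stand-in geometry descriptor, not the beamforming-based evaluation used by the authors.

```python
import math
import random


def tetrahedron(radius):
    """Vertices of a regular tetrahedron inscribed in a sphere of given radius."""
    s = radius / math.sqrt(3)
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]


def random_layout(radius, count, rng):
    """Draw `count` points uniformly on a sphere (normalised Gaussian vectors)."""
    points = []
    while len(points) < count:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm < 1e-12:
            continue  # avoid dividing by a near-zero norm
        points.append(tuple(radius * c / norm for c in v))
    return points


def min_spacing(points):
    """Smallest distance between any two microphones in a layout."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])
```

An ensemble is then a list such as `[random_layout(0.05, 4, random.Random(seed)) for seed in range(1000)]`, with each layout passed through the same evaluation as the tetrahedral baseline. The 0.05 m radius here is a placeholder value.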
Authors

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...
DD

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

Clustered Virtual Microphone Arrays for Listener-Level Monitoring and Room-Correction in Live Sound
Friday May 29, 2026 1:00pm - 3:00pm CEST
This paper introduces clustered virtual microphone arrays
as a step toward improving listener-level virtual
microphone estimation for live sound. Multiple compact
microphone sub-arrays are placed around a nominal overhead
position. Each sub-array produces a virtual microphone
estimate, and the estimates are fused. The aim is to attack
the estimation problem from multiple viewpoints and reduce
sensitivity to any one array placement or geometry.
The work builds on our earlier paper, “Room Acoustics
Optimisation Using Virtual Microphone Arrays”. That paper
proposed virtual microphones estimated from an overhead
array as a measurement layer for live sound optimisation.
It also highlighted a key limitation: in its initial form,
virtual microphone estimation quality was not yet strong
enough for reliable use across positions. The present paper
targets that limitation. We outline the clustered array
idea and treat cluster count and inter-cluster spacing as
design parameters. Virtual microphones are estimated using
beamforming and combined using simple fusion. Performance
is assessed with objective signal measures, including SNR
and frequency- and phase-related error measures, across
multiple listener-level target positions. The results
support further refinement under more realistic room
conditions and further study of the link between improved
estimation quality and FIR-based correction outcomes.
Authors

Brian de Brit

Lecturer, Technological University Dublin
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...

David Dorran

Technological University Dublin
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

A Time–Frequency Integrated Framework for Frequency-Invariant Beamforming in Loudspeaker Arrays
Friday May 29, 2026 1:00pm - 3:00pm CEST
Loudspeaker array beamforming technology has been widely
used; however, current frequency-domain and time-domain
design methods for calculating FIR filters face challenges,
including the need for modeling delay and high
computational complexity. To address these issues, this
paper proposes a time–frequency integrated framework. This
framework supports both pressure matching and amplitude
matching methods, enabling not only the realization of
traditional superdirective beams but also the design of
frequency-invariant beams. For the nonlinear optimization
problem in amplitude matching, an efficient solving
algorithm based on the Alternating Direction Method of
Multipliers (ADMM) is introduced. Experimental results
demonstrate that the proposed method combines the
advantages of existing frequency-domain and time-domain
approaches, directly computing FIR filter coefficients
without delay modeling while maintaining high computational
efficiency. This provides an effective solution for beam
control in loudspeaker arrays.
Authors

Jianbin Yang

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Keyu Pan

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Ning Cong

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark

Xing Tian

Dynaudio Lab, Gammel Lundtoftevej 3B, Copenhagen, Denmark
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:00pm CEST

The Impact of Frequency Gradient on Nonlinear Pulse Distribution in the Farina Technique
Friday May 29, 2026 1:00pm - 3:00pm CEST
The Exponential Sine Sweep (ESS) technique, popularized by
Angelo Farina, has become a cornerstone of modern
electroacoustic measurement due to its unique capability to
simultaneously extract a system’s linear impulse response
and its individual harmonic distortion components. Standard
implementation of this method almost exclusively utilizes a
low-to-high (upward) exponential sine sweep. However,
during a technical Q&A session at the AES Europe 2025
Convention in Warsaw, a question was raised: what are the
practical consequences of reversing the sweep direction?
This inquiry is particularly relevant given that several
industry-standard measurement platforms often employ
high-to-low (downward) sweeps to optimize the mechanical
and thermal stability of the device under test (DUT) while
performing stepped or swept sinusoidal analysis.
This paper provides an investigation into the temporal
behavior of nonlinearities when the frequency gradient of
an exponential sweep is inverted. Through formal
mathematical derivation and numerical simulations, the study
proves that while the spacing between distortion orders
remains identical in magnitude, the polarity and time
distribution of these impulses are reversed. Specifically,
we demonstrate that in a downward sweep, the distortion
products shift from the "pre-causal" negative time region
to the "post-causal" positive time region. This shift
causes harmonic distortion pulses to emerge within the
reverberant tail of the impulse response, leading to
significant contamination of decay measurements and
energy-time curves. By contrasting the "tracking filter"
paradigm with "time-domain deconvolution," this work
clarifies why sweep direction is a critical parameter that
must be aligned with the specific goals of the measurement
protocol.
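The sign reversal described above can be illustrated with the standard exponential-sweep relation for where the Nth-order distortion impulse lands relative to the linear impulse response after deconvolution: offset = -T * ln(N) / ln(f_end / f_start). The sketch below is not taken from the paper. The function name and the sweep parameters (10 s, 20 Hz to 20 kHz) are illustrative assumptions.

```python
import math

def harmonic_offset(order, duration, f_start, f_end):
    """Time offset (s) of the order-N distortion impulse relative to the
    linear impulse response, for an exponential sweep from f_start to f_end.

    Negative values fall before the linear response, positive values after it.
    """
    return -duration * math.log(order) / math.log(f_end / f_start)

# Illustrative sweep parameters (assumed, not from the paper).
T = 10.0
for n in (2, 3, 4):
    up = harmonic_offset(n, T, 20.0, 20000.0)    # upward sweep
    down = harmonic_offset(n, T, 20000.0, 20.0)  # downward sweep
    print(f"order {n}: upward {up:+.3f} s, downward {down:+.3f} s")
```

With these parameters the offsets have the same magnitude for both directions, but the upward sweep places them at negative time while the downward sweep places them at positive time, inside the decay of the linear response.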
Authors

Daniele Ponteggia

Materiacustica Srl
Friday May 29, 2026 1:00pm - 3:00pm CEST
Foyer Building 303A Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

1:30pm CEST

An Extended Multichannel Frequency-Domain FxLMS Algorithm for Real-Time Full-Band Adaptive Transaural Reproduction
Friday May 29, 2026 1:30pm - 2:00pm CEST
This paper presents a multichannel adaptive filtering
algorithm for real-time full-band adaptive transaural
reproduction on general-purpose hardware. It is based on a
multichannel frequency-domain FxLMS algorithm using an
overlap-save framework for both filtering and adaptation,
and is extended with (i) online plant identification for
fully adaptive operation, (ii) frequency-dependent
normalization for faster convergence, and (iii)
frequency-dependent regularization to stabilize adaptation.
The proposed algorithm is implemented in C on a
standard desktop PC and evaluated on a 4x2 transaural
configuration running in real time at 48 kHz with 2048-tap
control filters. Two evaluation tests are conducted. The
first test consists of reproducing two uncorrelated
white-noise signals at the ears of a manikin using
crosstalk cancellation as the performance metric. An
average crosstalk cancellation of 32 dB over 100 Hz–20 kHz
is demonstrated. The second experiment considers binaural
signal reproduction as a more realistic use case of the
algorithm. In both cases, performance is assessed for both
a static listener and a moving listener scenario,
demonstrating the algorithm’s ability to rapidly re-adapt.
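For readers unfamiliar with FxLMS, the following is a deliberately simplified sketch: a single-channel, time-domain filtered-x LMS loop with a known secondary path. It is not the multichannel frequency-domain overlap-save algorithm of the paper, and it omits online plant identification, normalization, and regularization. All names, path coefficients, and the step size are assumptions chosen for illustration.

```python
import random

def fir(x, h):
    """Causal FIR filtering of sequence x with coefficients h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fxlms(x, d, s_hat, s_true, taps, mu):
    """Single-channel time-domain FxLMS.

    x: reference signal, d: desired signal at the error sensor,
    s_hat: model of the secondary path, s_true: actual secondary path.
    Returns the final control filter and the error signal e = d - s_true * y.
    """
    w = [0.0] * taps
    x_buf = [0.0] * taps          # reference history, newest first
    fx_buf = [0.0] * taps         # filtered-reference history
    xs_buf = [0.0] * len(s_hat)   # reference history for the path model
    y_buf = [0.0] * len(s_true)   # control output history
    errors = []
    for xn, dn in zip(x, d):
        x_buf = [xn] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        y_buf = [y] + y_buf[:-1]
        e = dn - sum(si * yi for si, yi in zip(s_true, y_buf))
        xs_buf = [xn] + xs_buf[:-1]
        fx = sum(si * xi for si, xi in zip(s_hat, xs_buf))
        fx_buf = [fx] + fx_buf[:-1]
        # Gradient step using the reference filtered through the path model.
        w = [wi + mu * e * fi for wi, fi in zip(w, fx_buf)]
        errors.append(e)
    return w, errors

# Hypothetical paths: the target response has more delay than the secondary path.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
target = [0.0, 0.0, 0.5, 0.3, -0.1]
secondary = [0.0, 0.8, 0.2]
d = fir(x, target)
w, e = fxlms(x, d, s_hat=secondary, s_true=secondary, taps=16, mu=0.01)
```

The frequency-domain formulation in the paper performs the same kind of update per frequency bin, which is what makes frequency-dependent normalization and regularization possible.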
Friday May 29, 2026 1:30pm - 2:00pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Real-Time Implementation of Personal Sound Zones Using Partitioned Convolution in Purr Data
Friday May 29, 2026 2:00pm - 2:30pm CEST
Personal sound zones aim to reproduce distinct audio
contents in separate spatial regions using loudspeaker
arrays, while minimizing acoustic interference between
zones. Although well established theoretically, their
real-time implementation remains challenging due to the
long impulse responses involved and the latency constraints
of audio processing systems.
This work presents a real-time implementation of personal
sound zones based on the pressure matching method in a
static context, i.e. transfer functions between the
loudspeakers and the zones are assumed to remain constant.
Sound zone filters are computed in the frequency domain
from experimentally measured impulse responses between an
array of 18 loudspeakers and two microphone arrays of 9
microphones defining a bright zone and a dark zone. The
system performance is then evaluated in terms of acoustic
contrast, reproduction error, and effective frequency
range. To meet real-time constraints, a fast partitioned
convolution algorithm has been used, namely the
Uniformly-Partitioned Overlap Save (UPOLS). This method
has been implemented in C++ as an external block for the
Purr Data real-time audio environment. Experimental
results, obtained in a semi-anechoic environment,
demonstrate that it enables stable real-time multichannel
convolution with negligible numerical error compared to
offline convolution. The proposed system results in a
functional real-time sound zones demonstrator, suitable for
experimental and interactive spatial audio applications.
The code is shared in a GitHub repository so that the
scientific community can benefit from them.
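The idea behind uniformly partitioned overlap-save convolution can be sketched in a few lines. This is not the authors' C++ external: it is a minimal single-channel Python illustration that uses a naive DFT so that it stays dependency-free, and all function names are hypothetical. The filter is split into equal-length partitions, each input block is transformed once, and a frequency-domain delay line lets every partition reuse earlier input spectra.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def direct_convolve(x, h):
    """Reference time-domain linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def upols_convolve(signal, h, block):
    """Uniformly partitioned overlap-save; returns len(signal) output samples."""
    n_fft = 2 * block
    h = list(h)
    parts = [h[i:i + block] for i in range(0, len(h), block)]
    # Spectrum of each zero-padded filter partition.
    H = [dft(p + [0.0] * (n_fft - len(p))) for p in parts]
    # Frequency-domain delay line: most recent input spectrum first.
    fdl = [[0j] * n_fft for _ in parts]
    prev = [0.0] * block
    padded = list(signal) + [0.0] * (-len(signal) % block)
    out = []
    for start in range(0, len(padded), block):
        cur = padded[start:start + block]
        fdl = [dft(prev + cur)] + fdl[:-1]
        acc = [0j] * n_fft
        for X, Hp in zip(fdl, H):
            for k in range(n_fft):
                acc[k] += X[k] * Hp[k]
        y = dft(acc, inverse=True)
        out.extend(v.real for v in y[block:])  # keep the alias-free half
        prev = cur
    return out[:len(signal)]
```

Comparing `upols_convolve(sig, h, block)` against `direct_convolve(sig, h)[:len(sig)]` gives the kind of numerical-error check the abstract mentions. A real-time version would replace the naive DFT with an FFT and process one block per audio callback.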
Authors

Guilhem Pagès

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613

Jean Beuchet

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613

Manuel Melon

Professor, LAUM / LE MANS Université



Titouan Lefrancois

Laboratoire d'Acoustique de l'Université du Mans (LAUM), UMR 6613
Friday May 29, 2026 2:00pm - 2:30pm CEST
Aud 44 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark

2:00pm CEST

Audio Design Roundtable
Friday May 29, 2026 2:00pm - 3:00pm CEST
Join us for a panel discussion about audio design featuring some of the industry’s leading audio designers and educators. This session is meant to inspire upcoming designers and encourage dialogue with established audio designers.
 
The panelists will give a brief overview of their designs, their roles in the AES, and how and why educators and students should participate in the various design competitions that the AES has to offer. The panel discussion is followed by a Q&A session that allows for questions and exchange with the panelists.

Speakers

Jamie Angus-Whiteoak

Emeritus Professor/Consultant/VP-Northern Europe, AES
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.

Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...

George Massenburg

Associate Professor of Sound Recording, Massenburg Design Works
George Y. Massenburg is a Grammy award-winning recording engineer and inventor. Working principally in Baltimore, Los Angeles, Nashville, and Macon, Georgia, Massenburg is widely known for submitting a paper to the Audio Engineering Society in 1972 regarding the parametric equali...

Christoph Thompson

Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio...
Friday May 29, 2026 2:00pm - 3:00pm CEST
Aud 41 Technical University of Denmark Asmussens Alle, Building 303A DK-2800 Kgs. Lyngby Denmark
 

