Audio synchronization across heterogeneous media playback devices is essential for delivering immersive sound experiences in applications such as speaker group play and multi-room audio playback. Existing synchronization techniques predominantly rely on tightly coupled network infrastructures and often embed media sequence and timestamp information in the media packet at the transmitting source. This restricts the choice of transmitting source and compromises robustness under dynamic network conditions. This paper proposes a network- and source-independent audio synchronization framework that eliminates the dependency on embedded media sequence and timestamp information. The proposed system employs an audio-fingerprinting-based media sequencing algorithm among the media playback devices. The algorithm does not depend on the type of transmitting source or on network availability. A novel audio synchronization algorithm is proposed that first determines common sequence start information from a dynamic media stream delivered by the transmitting source. It then communicates the fingerprint and timestamp among the media playback devices without modifying the original audio packet structure. Experimental results demonstrate that the proposed approach achieves audio-to-audio synchronization within 10 ms across media playback devices in an environment without a network. This extends the scope of immersive audio applications irrespective of the transmitting source.
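The abstract does not specify the fingerprinting algorithm. The sketch below therefore swaps in a simpler, well-known technique: brute-force cross-correlation, which estimates the relative lag between two devices' captures of the same audio and converts it to milliseconds. It is not the authors' method. The function names and the 48 kHz sample rate are assumptions. At 48 kHz, the 10 ms figure in the abstract corresponds to 480 samples.

```python
import random

def estimate_lag(ref, other, max_lag):
    """Return the lag (samples) maximizing sum(ref[n] * other[n + lag]).

    A positive result means `other` is delayed relative to `ref`.
    Brute-force cross-correlation: a stand-in for fingerprint-based
    sequencing, not a reproduction of the paper's algorithm.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += value * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def lag_to_ms(lag, sample_rate=48_000):
    """Convert a lag in samples to milliseconds (48 kHz assumed by default)."""
    return 1000.0 * lag / sample_rate

# Usage: device B hears the same noise-like signal 5 samples after device A.
rng = random.Random(1)
device_a = [rng.uniform(-1.0, 1.0) for _ in range(400)]
device_b = [0.0] * 5 + device_a[:-5]
lag = estimate_lag(device_a, device_b, max_lag=20)
```

Correlating raw samples scales poorly and assumes both devices capture nearly identical signals. Fingerprints are usually preferred because they are compact and robust to playback and room differences.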
I work as a software developer at Samsung Research Institute India - Delhi and am responsible for developing features related to Samsung sound devices.
Friday May 29, 2026 9:00am - 9:30am CEST, Aud 44, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
We present a spectral-like reformulation of 2D ambisonics, enabling an alternative representation of the sound field in terms of amplitudes and phases. We hypothesise that it simplifies the representation and creative manipulation of 2D ambisonics, beyond encoded directional point sources.
In 2D higher-order ambisonics (HOA) of order N, a sound field can be represented as a 2π-periodic angular function: a combination of circular harmonics (Y_m) weighted by coefficients (a_m), with m ∈ [-N, N]. This representation can be reformulated in terms of N+1 amplitudes and N phases, similarly to a Fourier decomposition.
A simple example of this representation is the ambisonic encoder at an angle theta. Its phases are then multiples of a phase phi = theta/2π, just as the frequencies of a harmonic sound are multiples of a fundamental. The amplitude-phase approach can therefore draw on the field of sound synthesis, spanning harmonic and inharmonic modelling. Operations on ambisonic vectors in amplitude-phase form also rely on the Fourier representation, namely the spectral convolution of two vectors (element-wise products of the amplitudes, element-wise sums of the phases). Spectral convolution has vast potential in ambisonics. It allows all the usual spatial operations (geometric and transformative) to be represented in a simple manner.
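The abstract does not state its harmonic convention. This sketch assumes real circular harmonics Y_0 = 1, Y_m = cos(mθ) for m > 0 and Y_-m = sin(mθ), with normalization omitted. Each pair (a_m, a_-m) becomes one amplitude and one phase, giving N+1 amplitudes and N phases. Phases are kept in radians (mθ) rather than the normalized phi = theta/2π used in the text. All function names are hypothetical. Spectral convolution with an encoder vector at angle α adds mα to every phase, which rotates the field by α.

```python
import math

def encode(theta, order):
    """2D encoder at angle theta (radians) as {m: a_m}, m in [-order, order].

    Assumed convention: Y_0 = 1, Y_m = cos(m*t) for m > 0, Y_-m = sin(m*t);
    normalization factors are omitted.
    """
    a = {0: 1.0}
    for m in range(1, order + 1):
        a[m] = math.cos(m * theta)
        a[-m] = math.sin(m * theta)
    return a

def to_amp_phase(a, order):
    """Return (order + 1) amplitudes and `order` phases (radians)."""
    amps = [a[0]] + [math.hypot(a[m], a[-m]) for m in range(1, order + 1)]
    phases = [math.atan2(a[-m], a[m]) for m in range(1, order + 1)]
    return amps, phases

def from_amp_phase(amps, phases):
    """Inverse of to_amp_phase: back to circular-harmonic coefficients."""
    a = {0: amps[0]}
    for m in range(1, len(phases) + 1):
        a[m] = amps[m] * math.cos(phases[m - 1])
        a[-m] = amps[m] * math.sin(phases[m - 1])
    return a

def spectral_convolve(x, y):
    """Element-wise product of amplitudes, element-wise sum of phases."""
    (amps_x, ph_x), (amps_y, ph_y) = x, y
    amps = [u * v for u, v in zip(amps_x, amps_y)]
    phases = [p + q for p, q in zip(ph_x, ph_y)]
    return amps, phases

# Usage: convolving a source encoded at 0.3 rad with an encoder at 0.5 rad
# rotates the source to 0.8 rad.
N = 3
source = to_amp_phase(encode(0.3, N), N)
rotator = to_amp_phase(encode(0.5, N), N)
rotated = from_amp_phase(*spectral_convolve(source, rotator))
```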
To test this approach, we are currently developing an ambisonic synthesiser based on Faust functions running in the Max environment. We are evaluating the scope of this representation, both theoretically and compositionally, and will then attempt to extend this approach to 3D ambisonics.
Professor in Computer Science and Music Creation, University of Paris 8
Alain Bonardi is a Professor of Computer Science and Music Creation at Paris 8 University, where he is based in the Music Department and is a member of the Musidanse laboratory.
There, he co-directs the CICM (Center for Research in Computer Science and Music Creation) with Anne...
This study presents a voice-centered machine learning framework for detecting mental fatigue in military personnel, integrating acoustic analysis with physiological biosensors to enhance detection robustness. Mental fatigue poses critical safety and performance challenges in military operations, yet cultural stigma often prevents self-reporting. We collected multimodal data from 23 participants across two fatigue states. The extracted acoustic features include sound pressure level (SPL), formants, mel-frequency cepstral coefficients (MFCCs), jitter, shimmer, harmonic-to-noise ratio (HNR), and temporal speech characteristics. These voice features were combined with electroencephalography (EEG), photoplethysmography (PPG), and temperature data to train multiple machine learning classifiers. The voice-based models achieved accuracies between 82% and 85%, with support vector machines (SVM) and long short-term memory (LSTM) networks performing best. When acoustic features were combined with physiological markers, classification accuracy improved to 92%, with Classification and Regression Trees (CART) and Linear Discriminant Analysis (LDA) emerging as the top performers. Statistical analysis identified SPL and formant variance as the most discriminative voice features, while Lempel-Ziv Complexity (LZC) and the theta/beta ratio proved most reliable for EEG. Evaluation on new participants yielded 67% accuracy, revealing model generalization challenges that inform future research directions. This work demonstrates that voice-based machine learning systems, when augmented with physiological data, offer a promising non-invasive approach to real-time fatigue monitoring in operational military environments.
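Jitter and shimmer are commonly computed with Praat's "local" definitions. Local jitter is the mean absolute difference between consecutive cycle periods divided by the mean period. Local shimmer applies the same formula to per-cycle peak amplitudes. The abstract does not say which variant the authors used. This sketch assumes the periods and amplitudes have already been extracted by a pitch tracker, and the input values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by their mean.

    Applied to cycle periods this gives local jitter; applied to per-cycle
    peak amplitudes it gives local shimmer (Praat-style definitions).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

periods_s = [0.0100, 0.0110, 0.0100, 0.0105]   # hypothetical cycle periods (s)
amplitudes = [0.80, 0.76, 0.82, 0.79]          # hypothetical peak amplitudes
jitter_local = local_perturbation(periods_s)
shimmer_local = local_perturbation(amplitudes)
```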
I’m a creative technologist and interaction designer exploring how sound, technology, and human experience meet. With an MScEng in Sound & Music Computing, I prototype audio interactions, build ML‑driven tools, and design experiments around perception. My background spans music...
Friday May 29, 2026 9:00am - 11:00am CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Current deep learning approaches to speech enhancement rely heavily on objective measures such as mean squared error or scale-invariant signal-to-distortion ratio as both training objectives and evaluation metrics. While analytically convenient, these benchmarks often fail to capture the nuances of human perception or actual intelligibility. Furthermore, metrics such as Short-Time Objective Intelligibility or Perceptual Evaluation of Speech Quality are integrated inconsistently into training and evaluation pipelines. This leaves a gap between algorithmic performance and perceptual reality. This paper proposes a transition towards evaluation methodologies grounded in psychoacoustics and audiological modeling. Our study explores two distinct methods to characterise enhanced signals. On the one hand, we employ a perceptual approach based on the Cambridge loudness model to assess the preservation of spectral excitation patterns and perceived intensity. On the other hand, we adopt a biophysical approach using CoNNear, a convolutional model of the human auditory periphery. This allows us to simulate responses at different stages of the auditory periphery and to observe how speech enhancement processing affects the physiological representation of speech. We analyse pre-trained speech enhancement models using automatic speech recognition and Short-Time Objective Intelligibility as additional proxies for human intelligibility. By mapping automatic speech recognition performance against loudness and peripheral response patterns, we investigate the extent to which current enhancement strategies maintain the perceptual and physiological integrity of the speech signal. This work aims to identify features predictive of intelligibility, providing a foundation for speech enhancement systems optimised for the human listener rather than for purely signal-based objective functions.
Objective quality evaluation is widely used in speech coding, yet objective estimates often show limited agreement with subjective listening-test results. Rather than focusing on absolute score accuracy, this paper evaluates objective speech quality models from a decision-making perspective, defined as their ability to support comparative judgments between speech codecs or codec configurations. A formal ITU-T P.800 Absolute Category Rating (ACR) listening test was conducted with 30 listeners across 24 conditions. The test covered conventional and neural monophonic speech codecs operating under clear-channel conditions, at sampling frequencies from 16 to 48 kHz and bit rates ranging from below 1 kbps to above 16 kbps. The speech material consisted of internally recorded, clean French-language speech that was not used in the development or training of any of the evaluated codecs or objective quality models. Seven objective quality models, namely PESQ, VISQOL Speech, VISQOL Audio, WARP-Q, NISQA, UTMOS, and DistillMOS, were evaluated on the same material. Decision-making performance was assessed in two ways: by comparing subjective and objective rankings using Kendall’s rank correlation coefficient, and by analyzing pairwise codec comparisons using t-tests at a 95% confidence level. The results show that some objective quality models are effective for comparing bit-rate variations within a given speech coding technology, provided that all other codec parameters (e.g., sampling frequency) remain unchanged. However, all models exhibit limitations, including tendencies toward over- or underestimation for certain technologies and reduced reliability when applied across different sampling frequencies. Despite its conventional origins, PESQ remains capable of supporting decision-making even when applied to neural speech codecs.
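The rank comparison can be illustrated with a minimal Kendall's tau-a, which counts concordant and discordant condition pairs. This sketch applies no tie correction, and the abstract does not say which variant was used. In practice `scipy.stats.kendalltau`, which computes tau-b by default, would be the usual choice. The scores below are hypothetical and are not data from the paper.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (tied pairs count as neither)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical per-condition scores: one objective model swaps conditions 3 and 4.
mos_subjective = [1.8, 2.9, 3.4, 4.1, 4.5]
mos_objective = [2.0, 2.7, 3.9, 3.6, 4.4]
tau = kendall_tau(mos_subjective, mos_objective)   # 9 concordant, 1 discordant pair
```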
Spatial audio recording using higher-order Ambisonics offers rich directional information for medical speech capture, yet challenging hospital acoustic environments motivate preprocessing with neural denoising algorithms. This study investigates whether U-Net-based denoising of third-order ambisonic recordings improves automatic speech recognition (ASR) quality for medical applications. We developed the Medical Immersive Audio Corpus (MIAC), comprising 1,759 utterances (6.43 hours) of Polish medical speech recorded with a Zylia ZM-1 microphone in uncontrolled hospital environments. The corpus captures 16-channel third-order Ambisonics across multiple specializations, including thyroid ultrasonography, surgical procedures, and general diagnostics. We denoised the corpus with a U-Net architecture with dual attention mechanisms, trained using the Noise2Noise paradigm. We then evaluated transcription quality using ten Whisper ASR models ranging from 39 million to 1.55 billion parameters, including domain-adapted medical variants. Surprisingly, we discovered a "noise reduction paradox": denoising degraded transcription quality for seven of the ten models. Word Error Rate (WER) and Character Error Rate (CER) increased significantly for the general-purpose base, small, and medium models. Only the domain-adapted whisper-medium-68000-abbr model showed a statistically significant improvement (p = 0.0008), while the large-scale models (large-v2, large-v3) were robust, with negligible changes. Effect sizes remained small (Cohen's d < 0.2) across all models. These counterintuitive findings suggest that modern ASR systems implicitly use background noise characteristics as informative features. They also suggest that preprocessing pipelines should be reconsidered for domain-specific applications. Our results provide practical guidance for medical speech processing system design.
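For reference, a common form of Cohen's d divides the difference of sample means by the pooled sample standard deviation. By Cohen's conventions, 0.2 marks a "small" effect. The abstract does not state which variant the authors used; paired designs often use d_z, the mean difference divided by the standard deviation of the differences. This sketch uses the pooled form, and the WER values are hypothetical.

```python
import statistics

def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    pooled = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

# Hypothetical per-utterance WER values (not from the paper).
wer_raw = [0.12, 0.15, 0.10, 0.18, 0.14]
wer_denoised = [0.13, 0.16, 0.10, 0.19, 0.15]
d = cohens_d(wer_denoised, wer_raw)   # positive: denoising raised WER here
```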
Accurate identification of audio coding artifacts is instrumental in encoder design, audio post-processing, and perceptual quality assessment. This paper addresses the detection of artifacts arising from changes in the effective bandwidth of coded audio signals caused by coarse spectral quantization. Such bandwidth variations give rise to two prominent artifact types: bandwidth limitation (BL) and birdies, also referred to as spectral islands (SI). Blind detection methods, requiring no reference signal, are presented for both artifact types. Bandwidth limitation is detected by analyzing variations in the zero-crossing count across time-domain subband signals, enabling estimation of both fixed and time-varying cutoff frequencies. Spectral islands are identified by analyzing the spectrogram to detect clusters of isolated components in the time–frequency domain, characterized by their temporal and spectral extents. The proposed methods are evaluated using audio material from the ODAQ and USAC verification datasets. Results show that the BL detection method achieves an average bandwidth estimation error of approximately 160 Hz and is robust to noisy bandwidth-limited signals. In addition, the detected birdie artifacts are perceptually validated through listening tests, which indicate an improvement in perceived quality after detection and subsequent suppression of the birdie artifacts.
The acoustic characterisation of indoor spaces is crucial for a wide range of applications. While global metrics provide convenient descriptors of a room's overall behaviour, a more spatially detailed analysis offers deeper insight into the spatio-temporal structure of the sound field, albeit at a higher experimental cost. This paper proposes a methodology that leverages the predictive capabilities of sound field reconstruction methods to estimate room acoustic parameters as a function of position. The approach is evaluated experimentally in an auditorium, where it achieves accurate estimation of temporal and energetic room acoustic parameters across the entire audience area. In addition, the reconstructed field yields higher intelligibility indices than the raw measurements. Overall, these results highlight the potential of sound field reconstruction techniques as a practical tool for room acoustic characterisation and for supporting assistive listening technologies.
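Once an impulse response is available at a position, whether measured or reconstructed, temporal and energetic parameters follow from standard definitions such as those in ISO 3382-1. The sketch below derives two of them from a Schroeder backward-integrated decay curve: T30, fitted between -5 and -35 dB, and clarity C50. It does not implement the sound field reconstruction itself. It assumes a noise-free response with the direct sound at sample 0, and the test signal is synthetic.

```python
import math

def schroeder_edc_db(h):
    """Schroeder backward-integrated energy decay curve, in dB re its initial value."""
    edc, acc = [], 0.0
    for v in reversed(h):
        acc += v * v
        edc.append(acc)
    edc.reverse()
    return [10.0 * math.log10(max(e / edc[0], 1e-30)) for e in edc]

def t30(h, fs):
    """Reverberation time from a least-squares line fit of the EDC between -5 and -35 dB."""
    pts = [(n / fs, d) for n, d in enumerate(schroeder_edc_db(h)) if -35.0 <= d <= -5.0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))   # decay rate in dB per second
    return -60.0 / slope

def c50(h, fs):
    """Clarity C50: energy before 50 ms over energy after, in dB (direct sound at h[0])."""
    split = round(0.05 * fs)
    early = sum(v * v for v in h[:split])
    late = sum(v * v for v in h[split:])
    return 10.0 * math.log10(early / late)

# Synthetic exponential decay with a 0.5 s reverberation time, sampled at 1 kHz.
fs, rt = 1000, 0.5
h = [10.0 ** (-3.0 * n / (fs * rt)) for n in range(fs)]
```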
MPEG-4 SLS (Scalable Lossless Coding) was published more than 20 years ago. Since then, several tools to improve coding efficiency and flexibility have been invented. MPEG WG6 (audio coding) currently has two standardization activities on lossless audio coding: Audio Coding for Machines (ACoM), and biomedical and general waveform signal coding (BWC). ACoM phase 1 was originally targeted only at lossless storage formats for training machine listening schemes. Additional use cases such as “user generated content analysis”, “live stream content analysis”, and “artistic creation” have since been added. The focus was also extended to the transmission of audio data from microphones (or microphone arrays) to central processing units. BWC is a joint activity with ITU-T SG21. ACoM started with a large number of use cases and includes the specification of a rich set of metadata. BWC, by contrast, started with a focus on medical data such as electroencephalogram (EEG) and electrocardiogram (ECG) signals. However, BWC can also be used for audio signals, and medical data coding is on the list of use cases for ACoM. The call for proposals (CfP) for ACoM was completed in January 2025. Two proposals were submitted, both outperforming MPEG-4 SLS, and both reused and optimized core codecs from BWC. MPEG audio is currently investigating how the ACoM proposals can be merged into BWC; this merge process must be completed by the end of April 2026. The presentation will give details about the ACoM use cases, the ACoM CfP process, the results of the CfP, and results from the merge process.
Obsidian Neural is a novel, open-source VST3 plugin that addresses the technical challenges of integrating generative AI models directly into a low-latency digital audio workstation (DAW) environment. This workshop will provide a deep dive into the architecture designed to use AI as a real-time performance instrument. We will cover the C++/DSP strategies necessary for minimizing latency during the asynchronous generation of audio loops via models like Stable Audio Open. Crucially, we will detail the system's ability to maintain musical coherence during a live mix, achieved through an internal LLM "Brain" that processes contextual session data (BPM, key, existing tracks) to enrich generation prompts. Furthermore, we will explore the technical solutions implemented for seamless integration with the live mixing paradigm: quantized MIDI triggering, multi-output routing, and the novel "Draw-to-Sound" feature, which employs a Vision Language Model (VLM) to translate visual input into musical parameters. This work demonstrates a robust framework for generative AI to function as an instantaneous, adaptable partner within professional audio engineering workflows.
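One common way to implement quantized triggering is to defer a clip's start to the next bar or beat boundary derived from the session BPM. The helper below is hypothetical. It assumes a constant tempo with the transport starting at t = 0. The abstract does not describe Obsidian Neural's actual scheduling code.

```python
import math

def next_quantized_time(now_s, bpm, beats_per_bar=4, grid_beats=None):
    """Return the next grid boundary (seconds) at or after now_s.

    The grid defaults to one bar. Hypothetical helper, not Obsidian Neural's API;
    assumes a constant tempo and a transport that started at t = 0.
    """
    beats = grid_beats if grid_beats is not None else beats_per_bar
    grid = beats * 60.0 / bpm                  # grid length in seconds
    return math.ceil(now_s / grid) * grid

# At 120 BPM in 4/4, a bar lasts 2 s: a trigger at 3.1 s launches at 4.0 s,
# or at 3.5 s when quantized to single beats.
launch_bar = next_quantized_time(3.1, 120)
launch_beat = next_quantized_time(3.1, 120, grid_beats=1)
```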
Friday May 29, 2026 11:00am - 12:00pm CEST, Building 302, 2nd floor, Technical University of Denmark, Asmussens Alle, Building 302, DK-2800 Kgs. Lyngby, Denmark
The Saul Walker Student Design Competition is a long-running event of the Audio Engineering Society that highlights practical and creative work in audio design. It brings together experienced judges and a wide range of strong student submissions each year.
During this session, students from around the world will present their projects and bring their hardware designs for hands-on inspection by the judges. The format encourages open discussion, giving attendees a chance to hear how ideas are evaluated and improved in a professional setting.
Sponsored by API, the competition includes cash prizes for the winners. More importantly, it offers students valuable feedback and the opportunity to connect with people working in the industry. The session is open to everyone—students and non-students alike—who are interested in seeing what participants have created and learning more about current work in audio design.
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.
Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...
Director of Music Media Production, AES Education Committee, Ball State University
Christoph Thompson is vice-chair of the AES audio education committee. He is the chair of the AES Student Design Competition and the Matlab Plugin Design Competition. He is the director of the music media production program at Ball State University. His research topics include audio...
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Since then, he has been working in research and development of perceptual...
Friday May 29, 2026 12:00pm - 1:30pm CEST, Aud 49, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Mixed-phase impulse response equalization can improve both magnitude and phase response, but conventional objectives such as mean-squared error (MSE) can favor solutions that introduce objectionable temporal artifacts, including pre-echo and extended post-echo ringing. This paper proposes a Spatial Equalization Quality Measure (SEQM) to select a mixed-phase equalization filter that better controls these artifacts while remaining computationally simple and applicable across multiple listening positions. SEQM combines two components. The first is a temporal-domain metric that penalizes energy preceding the main pulse of an impulse response and energy persisting after it, while also accounting for the decay rate of the post-response tail. The second is a spatial aggregation rule that summarizes quality across measurement positions. We use SEQM to select the modeling delay for mixed-phase finite-impulse-response (FIR) equalization. We also use it to compare mixed-phase FIR designs with minimum-phase FIR and IIR alternatives under a common multi-position measurement framework. Experiments use semi-anechoic measurements across 34 spatial positions for two loudspeakers. They show that SEQM consistently selects substantially shorter delays than MSE-based selection and yields impulse responses with reduced pre-echo and faster post-response decay, while maintaining comparable frequency-response equalization. These results suggest that SEQM is a practical objective tool for designing multi-position mixed-phase equalization filters.
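The abstract does not give SEQM's formula. The sketch below only illustrates its structure under stated assumptions: a per-position temporal score, a spatial aggregation rule, and delay selection by minimum score. Pre-echo energy, taken as energy more than `guard` samples before the largest peak, is weighted by a hypothetical `pre_weight`. Post-peak energy counts once, and the worst position dominates. The tail decay-rate term is not modeled, and all names are hypothetical.

```python
def temporal_penalty(h, guard, pre_weight=4.0):
    """Hypothetical SEQM-style temporal score for one impulse response (lower is better).

    Energy more than `guard` samples before the main peak (pre-echo) is weighted by
    `pre_weight`; energy more than `guard` samples after it counts once. The tail
    decay-rate term described in the abstract is not modeled here.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    energy = [v * v for v in h]
    total = sum(energy)
    pre = sum(energy[:max(0, peak - guard)])
    post = sum(energy[peak + guard + 1:])
    return (pre_weight * pre + post) / total

def spatial_score(responses, guard, pre_weight=4.0):
    """Hypothetical spatial aggregation: the worst measurement position dominates."""
    return max(temporal_penalty(h, guard, pre_weight) for h in responses)

def select_delay(candidates, guard):
    """Pick the modeling delay whose equalized responses score lowest.

    `candidates` maps delay (samples) -> list of equalized impulse responses,
    one per measurement position.
    """
    return min(candidates, key=lambda d: spatial_score(candidates[d], guard))

# Usage with toy responses: a small pre-echo is penalized more than an equal post-echo.
pre_echo = [0.0, 0.0, 0.1, 0.0, 1.0, 0.0, 0.0, 0.0]
post_echo = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.1, 0.0]
chosen = select_delay({8: [pre_echo, post_echo], 12: [post_echo, post_echo]}, guard=1)
```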
Speech intelligibility is a key factor in successful communication across various domains, including research, post-production for film and television, live sound reinforcement, and audio production. Traditional assessment methods often lack objectivity or fail to capture the listener’s experience in real-world scenarios. In this workshop, we introduce an innovative approach to measuring speech intelligibility based on the concept of “Listening Effort.” We will present the underlying technology, share practical examples from different application areas, and demonstrate how this method can be integrated into workflows to optimize intelligibility. Attendees will have the opportunity to participate in a hands-on demonstration and discuss potential use cases relevant to their own work. This session is designed for professionals and researchers from both academia and industry who are seeking reliable and actionable tools for objectively evaluating and improving speech intelligibility in diverse environments.
Participants will be able to join a short demo/exercise and ask questions.
Introduction & Relevance: Overview of the importance of speech intelligibility across different fields
Technology & Methodology: Presentation of the measurement method and underlying concepts
Practical Examples: Case studies from research, post-production (film/TV), live sound, and production
Live Demo / Interactive Exercise: Practical demonstration and opportunity for active participation
Discussion & Outlook: Q&A, exchange of ideas, and future perspectives
High-speed 1-bit signals generated by oversampling are widely used in audio applications, as they allow simple demodulation via low-pass filtering while preserving in-band spectral characteristics with high accuracy. However, conventional FIR filtering of such signals generally requires conversion to a multi-bit representation at a common sampling frequency, which increases computational cost and complicates the overall processing flow. This paper addresses the convolution of high-speed 1-bit audio signals with multi-bit FIR impulse responses and presents a systematic formulation of a multiplier-less convolution approach. Based on a mathematical reinterpretation of convolution, the proposed formulation describes how time shifting and amplitude weighting can be expressed through structured rearrangement of 1-bit samples without arithmetic operations. This provides a theoretical description of previously reported 1-bit convolution methods, whose validity had not yet been fully formalized. We examine the spectral characteristics of the proposed convolution method and compare them with those obtained by multi-bit convolution followed by ΔΣ modulation. Experiments are conducted by convolving 1-bit input signals with FIR filters having multi-band frequency responses. Spectral analysis shows that the proposed method agrees extremely closely with the standard approach within the audible band, while differences appear primarily at much higher frequencies outside the audible range. These results demonstrate that convolution of high-speed 1-bit audio signals can be achieved without multipliers, suggesting the potential for highly efficient hardware-oriented signal processing architectures.
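The paper's rearrangement-based formulation is not reproduced here. A simpler, well-known observation already removes the multipliers. When input samples are restricted to ±1, each product h[k]·x[n-k] is just +h[k] or -h[k], so the convolution reduces to signed additions. The sketch assumes bit 1 maps to +1 and bit 0 maps to -1. It checks the result against ordinary multiply-accumulate convolution, and the function names are hypothetical.

```python
def one_bit_fir(bits, h):
    """FIR-filter a 1-bit stream (bits in {0, 1}, read as -1/+1) without multiplications.

    Each output sample accumulates +h[k] or -h[k] depending on the input bit.
    """
    out = []
    for n in range(len(bits)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k < 0:
                break                      # no input before the start of the stream
            acc = acc + coeff if bits[n - k] else acc - coeff
        out.append(acc)
    return out

def reference_fir(bits, h):
    """Conventional multiply-accumulate on the +/-1 representation, for comparison."""
    x = [1.0 if b else -1.0 for b in bits]
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
h = [0.25, -0.5, 0.125, 0.0625]           # hypothetical multi-bit FIR coefficients
y = one_bit_fir(bits, h)
```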
The authors are investigating Virtual Microphone Array (VMA) techniques to support room acoustics optimisation in live sound environments. In our recent AES paper, “Room Acoustics Optimisation Using Virtual Microphone Arrays”, a notable outcome was that a compact four-microphone tetrahedral array performed strongly relative to its low sensor count. Recent research on virtual sensing and the Remote Microphone Technique treats microphone placement as an explicit design variable. It reports improved remote estimation performance when microphone layouts are deliberately chosen for the task rather than adopted as fixed, standard configurations. This submission builds on our prior VMA work by focusing on the four-microphone case, where geometry choices are especially constrained. Using Monte Carlo simulation, we compare a tetrahedral baseline with an ensemble of stochastically generated spherical layouts at the same array aperture. We apply a consistent evaluation protocol across multiple listening-region offsets and standard beamforming estimators to isolate variability due to geometry alone. The central proposition is that, for low-count VMAs, geometry is a first-order design parameter. The tetrahedral layout remains a credible baseline, but lightweight stochastic exploration can reveal alternative layouts that are competitive and, in some cases, superior, without increasing the channel count.
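The Monte Carlo step can be sketched as drawing random four-capsule layouts on a sphere of fixed radius, the same aperture as the tetrahedral baseline. Uniform sampling uses normalized Gaussian vectors. The scoring function here is a placeholder (minimum capsule spacing) and the 5 cm radius is hypothetical; the paper scores layouts with beamforming estimation error. For four points on a sphere, the regular tetrahedron maximizes the minimum spacing (the Tammes problem). A spacing-only score can therefore never favor a random layout, so any advantage must come from a task-specific criterion.

```python
import itertools
import math
import random

def tetrahedron(radius):
    """Regular tetrahedral layout of four capsules on a sphere of the given radius."""
    s = radius / math.sqrt(3.0)
    return [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]

def random_spherical_layout(radius, rng, count=4):
    """Capsules placed uniformly at random on the sphere (normalized Gaussian vectors)."""
    layout = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        layout.append(tuple(radius * c / norm for c in v))
    return layout

def min_spacing(layout):
    """Smallest capsule-to-capsule distance: a placeholder geometry score."""
    return min(math.dist(p, q) for p, q in itertools.combinations(layout, 2))

rng = random.Random(7)
radius = 0.05   # hypothetical 5 cm aperture radius
trials = [random_spherical_layout(radius, rng) for _ in range(1000)]
best = max(trials, key=min_spacing)
```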
Brian de Brit is a lecturer in the School of Electrical and Electronic Engineering at Technological University Dublin. He holds a B.Sc. in Mathematical Physics (University College Dublin), an M.Phil. in Music and Media Technologies (Trinity College Dublin), and a Master of Engineering...
This paper introduces clustered virtual microphone arrays as a step toward improving listener-level virtual microphone estimation for live sound. Multiple compact microphone sub-arrays are placed around a nominal overhead position. Each sub-array produces a virtual microphone estimate, and the estimates are fused. The aim is to attack the estimation problem from multiple viewpoints and reduce sensitivity to any one array placement or geometry. The work builds on our earlier paper, “Room Acoustics Optimisation Using Virtual Microphone Arrays”, which proposed virtual microphones estimated from an overhead array as a measurement layer for live sound optimisation. That paper also highlighted a key limitation: in its initial form, virtual microphone estimation quality was not yet strong enough for reliable use across positions. The present paper targets that limitation. We outline the clustered array idea and treat cluster count and inter-cluster spacing as design parameters. Virtual microphones are estimated using beamforming and combined using simple fusion. Performance is assessed across multiple listener-level target positions with objective signal measures, including SNR and frequency- and phase-related error measures. The results support further refinement under more realistic room conditions and further study of the link between improved estimation quality and FIR-based correction outcomes.
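The abstract does not define its "simple fusion" or its exact SNR measure. This sketch assumes a sample-wise mean of the sub-array estimates and an SNR defined as reference energy over error energy. It shows how fusing estimates whose errors partly cancel can raise the SNR above that of any single sub-array. All signals are hypothetical.

```python
import math

def fuse_mean(estimates):
    """Assumed simple fusion: sample-wise mean of the sub-array virtual-microphone estimates."""
    return [sum(samples) / len(samples) for samples in zip(*estimates)]

def snr_db(reference, estimate):
    """Estimation SNR in dB: reference energy over error energy."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)

# Hypothetical data: two sub-array estimates with partly opposing errors.
reference = [1.0, -1.0, 1.0, -1.0]
err = [0.1, 0.1, -0.1, -0.1]
est_a = [r + e for r, e in zip(reference, err)]
est_b = [r - 0.5 * e for r, e in zip(reference, err)]
fused = fuse_mean([est_a, est_b])
```

With these values, est_a scores 20 dB and the fused estimate roughly 32 dB. Real sub-array errors are rarely this complementary, which is why the abstract treats cluster count and spacing as design parameters.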
Loudspeaker array beamforming technology is widely used; however, current frequency-domain and time-domain design methods for calculating FIR filters face challenges, including the need for a modeling delay and high computational complexity. To address these issues, this paper proposes an integrated time–frequency framework. This framework supports both pressure matching and amplitude matching methods, enabling not only the realization of traditional superdirective beams but also the design of frequency-invariant beams. For the nonlinear optimization problem in amplitude matching, an efficient solution algorithm based on the Alternating Direction Method of Multipliers (ADMM) is introduced. Experimental results demonstrate that the proposed method combines the advantages of existing frequency-domain and time-domain approaches, computing FIR filter coefficients directly without delay modeling while maintaining high computational efficiency. This provides an effective solution for beam control in loudspeaker arrays.
The Exponential Sine Sweep (ESS) technique, popularized by Angelo Farina, has become a cornerstone of modern electroacoustic measurement due to its unique capability to simultaneously extract a system’s linear impulse response and its individual harmonic distortion components. Standard implementations of this method almost exclusively use a low-to-high (upward) exponential sine sweep. However, during a technical Q&A session at the AES Europe 2025 Convention in Warsaw, a question was raised: what are the practical consequences of reversing the sweep direction? This question is particularly relevant because several industry-standard measurement platforms employ high-to-low (downward) sweeps to optimize the mechanical and thermal stability of the device under test (DUT) while performing stepped or swept sinusoidal analysis. This paper investigates the temporal behavior of nonlinearities when the frequency gradient of an exponential sweep is inverted. Through formal mathematical derivation and numerical simulations, the study proves that while the spacing between distortion orders remains identical in magnitude, the polarity and time distribution of these impulses are reversed. Specifically, we demonstrate that in a downward sweep, the distortion products shift from the "pre-causal" negative-time region to the "post-causal" positive-time region. This shift causes harmonic distortion pulses to emerge within the reverberant tail of the impulse response, leading to significant contamination of decay measurements and energy-time curves. By contrasting the "tracking filter" paradigm with "time-domain deconvolution," this work clarifies why sweep direction is a critical parameter that must be aligned with the specific goals of the measurement protocol.
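The sign flip follows from Farina's standard result. For an exponential sweep of duration T from f1 to f2, with instantaneous frequency f(t) = f1·(f2/f1)^(t/T), the order-N harmonic impulse appears offset from the linear response by T·ln(N)/ln(f2/f1). In an upward sweep the frequency N·f is reached that long after f, so deconvolution places the harmonic before t = 0. In a downward sweep N·f was reached earlier, which places it after t = 0. The sketch below encodes this with a signed log-frequency rate; the function names are hypothetical.

```python
import math

def harmonic_offset_s(order, duration_s, f_start, f_end):
    """Time offset (s) of the order-N harmonic impulse after ESS deconvolution.

    Upward sweeps (f_end > f_start) give a negative offset (before the linear IR);
    downward sweeps give a positive offset (inside the IR tail).
    """
    rate = math.log(f_end / f_start) / duration_s   # signed ln-frequency change per second
    return -math.log(order) / rate

def inst_freq(t, duration_s, f_start, f_end):
    """Instantaneous frequency of the exponential sweep at time t."""
    return f_start * (f_end / f_start) ** (t / duration_s)

# A 10 s sweep between 20 Hz and 20 kHz: the 2nd harmonic sits about 1.0 s
# before t = 0 for the upward sweep and about 1.0 s after it for the downward one.
T, f1, f2 = 10.0, 20.0, 20000.0
up = harmonic_offset_s(2, T, f1, f2)
down = harmonic_offset_s(2, T, f2, f1)
```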
This paper presents a multichannel adaptive filtering algorithm for real-time, full-band adaptive transaural reproduction on general-purpose hardware. It is based on a multichannel frequency-domain FxLMS algorithm that uses an overlap-save framework for both filtering and adaptation. The algorithm is extended with (i) online plant identification for fully adaptive operation, (ii) frequency-dependent normalization for faster convergence, and (iii) frequency-dependent regularization to stabilize adaptation. The proposed algorithm is implemented in C on a standard desktop PC. It is evaluated on a 4x2 transaural configuration running in real time at 48 kHz with 2048-tap control filters. Two evaluation tests are conducted. The first test consists of reproducing two uncorrelated white-noise signals at the ears of a manikin, using crosstalk cancellation as the performance metric; an average crosstalk cancellation of 32 dB over 100 Hz–20 kHz is demonstrated. The second test considers binaural signal reproduction as a more realistic use case of the algorithm. In both cases, performance is assessed for a static-listener and a moving-listener scenario, demonstrating the algorithm’s ability to re-adapt rapidly.
Personal sound zones aim to reproduce distinct audio content in separate spatial regions using loudspeaker arrays, while minimizing acoustic interference between zones. Although well established theoretically, their real-time implementation remains challenging due to the long impulse responses involved and the latency constraints of audio processing systems. This work presents a real-time implementation of personal sound zones based on the pressure matching method in a static context, i.e., the transfer functions between the loudspeakers and the zones are assumed to remain constant. Sound zone filters are computed in the frequency domain from experimentally measured impulse responses. These responses are measured between an array of 18 loudspeakers and two arrays of 9 microphones, which define a bright zone and a dark zone. The system performance is then evaluated in terms of acoustic contrast, reproduction error, and effective frequency range. To meet real-time constraints, a fast partitioned convolution algorithm is used, namely Uniformly Partitioned Overlap-Save (UPOLS). This method has been implemented in C++ as an external block for the Purr Data real-time audio environment. Experimental results, obtained in a semi-anechoic environment, demonstrate that it enables stable real-time multichannel convolution with negligible numerical error compared to offline convolution. The proposed system results in a functional real-time sound zones demonstrator, suitable for experimental and interactive spatial audio applications. The code is shared in a GitHub repository so that the scientific community can benefit from it.
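UPOLS splits the filter into equal partitions of B taps and transforms each one zero-padded to 2B. It keeps a frequency-domain delay line of recent input spectra and sums the bin-wise products before one inverse FFT per block. The sketch below is a plain-Python reference, built on a textbook radix-2 FFT, for checking the idea against direct convolution. It is not the authors' C++ Purr Data external, and the class and method names are hypothetical. The block size B must be a power of two.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]

class UniformPartitionedOLS:
    """Uniformly partitioned overlap-save convolver (block size B, FFT size 2B)."""

    def __init__(self, h, block):
        self.block = block
        parts = max(1, -(-len(h) // block))          # ceil(len(h) / block)
        self.filter_spectra = []
        for p in range(parts):
            seg = list(h[p * block:(p + 1) * block])
            seg += [0.0] * (2 * block - len(seg))    # zero-pad each partition to 2B
            self.filter_spectra.append(fft(seg))
        # Frequency-domain delay line: spectra of the most recent input windows.
        self.fdl = [[0j] * (2 * block) for _ in range(parts)]
        self.previous = [0.0] * block

    def process(self, x_block):
        """Filter one block of B input samples and return B output samples."""
        B = self.block
        window = self.previous + list(x_block)       # sliding 2B-sample input window
        self.previous = list(x_block)
        self.fdl.insert(0, fft(window))              # newest spectrum first
        self.fdl.pop()                               # drop the oldest spectrum
        acc = [0j] * (2 * B)
        for X, H in zip(self.fdl, self.filter_spectra):
            for k in range(2 * B):
                acc[k] += X[k] * H[k]
        y = ifft(acc)
        return [v.real for v in y[B:]]               # first B samples are circularly aliased

def direct_convolution(x, h):
    """Reference: linear convolution truncated to len(x) samples."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

# Usage: a 5-tap filter split into two partitions of B = 4, fed block by block.
h = [0.5, -0.25, 0.125, 1.0, 0.3]
x = [float((i * 7) % 5 - 2) for i in range(12)]
conv = UniformPartitionedOLS(h, 4)
y = []
for i in range(0, len(x), 4):
    y += conv.process(x[i:i + 4])
```

Latency is one block of B samples regardless of filter length, which is what makes uniform partitioning attractive for the long sound-zone filters described above.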
Join us for a panel discussion about audio design featuring some of the industry’s leading audio designers and educators. This session is meant to inspire upcoming designers and encourage dialogue with established audio designers.
The panelists will give a brief overview of their designs, their roles in the AES, and how and why educators and students should participate in the various design competitions that the AES has to offer. The panel discussion is followed by a Q&A session that allows for questions and exchange with the panelists.
Friday May 29, 2026 2:00pm - 3:00pm CEST, Aud 41, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark