This paper introduces a novel approach for generating a lower layer in multichannel audio upmixing, addressing a gap in existing methods that primarily focus on mid and top layers. Leveraging Harmonic-Percussive Separation (HPS), the proposed framework dynamically adjusts key parameters (separation factor, harmonic attenuation, and phase shift) to enhance percussive components while diffusing harmonic elements. We compared three neural network architectures for this task: LSTM, TCN, and Transformer. Experimental results show comparable perceptual quality and objective metrics across all models, with the TCN being the most balanced and suitable for deployment on edge devices.
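As a rough illustration of the separation step, the sketch below applies a median-filter form of HPS to a toy magnitude spectrogram. The median-filter formulation, the function names, and the use of `separation_factor` as a margin between the two estimates are assumptions made for illustration; the abstract does not state how the framework applies its parameters.

```python
from statistics import median

def median_filter_1d(values, width):
    """Median-filter a list with an odd window; the window is clamped at the edges."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def hps_masks(spec, width=3, separation_factor=1.0):
    """Return (harmonic_mask, percussive_mask) for a spectrogram spec[freq][time].

    Harmonic energy is smooth along time; percussive energy is smooth along frequency.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Filter each frequency row along time to estimate the harmonic part.
    harm = [median_filter_1d(row, width) for row in spec]
    # Filter each time column along frequency to estimate the percussive part.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], width)
            for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    # A bin is assigned to one component only if it exceeds the other by the margin.
    h_mask = [[harm[f][t] > separation_factor * perc[f][t] for t in range(n_time)]
              for f in range(n_freq)]
    p_mask = [[perc[f][t] > separation_factor * harm[f][t] for t in range(n_time)]
              for f in range(n_freq)]
    return h_mask, p_mask

# Toy spectrogram: a steady tone in frequency bin 2 and a click in time frame 2.
spec = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
h_mask, p_mask = hps_masks(spec)
```

In this toy case the tone row is marked harmonic and the click column percussive, while the bin where they cross is left unassigned because neither estimate exceeds the other.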
Acoustic lenses are structures that enable the focusing of acoustic waves, with increasing applications in audio devices like loudspeakers to concentrate energy toward a listening position. While typically employed at higher frequencies, achieving effective performance within the audible frequency range remains a significant challenge due to long acoustic wavelengths, which necessitate structures of substantially larger dimensions. This paper addresses the design of an acoustic lens dedicated to operation in the audible range. The proposed lens is composed of periodically arranged acoustic unit cells, enabling precise control over both the sound transmission coefficient and the phase delay. A parametric analysis of a single acoustic unit cell was performed, followed by global optimization of the complete lens structure using the Particle Swarm Optimization (PSO) algorithm. The outcome of the study is an acoustic lens design with predefined properties that demonstrate the desired directional characteristics. The findings highlight the potential of this approach for effectively manipulating the acoustic wave field and the directivity of sound sources within the audible frequency range.
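To make the role of the per-cell phase delay concrete, the sketch below computes the delay each unit cell of a flat lens would need so that all paths reach a focal point in phase. This is textbook path-length compensation, not the paper's optimized design; the speed of sound, cell positions, focal length, and frequency are assumed values, and the PSO stage is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def focusing_phase_delays(positions, focal_length, frequency):
    """Phase delay (radians) for each cell so that all paths reach the focus in phase.

    positions: lateral cell offsets from the lens axis in metres.
    Delays are given relative to the cell with the longest path, which needs none.
    """
    wavenumber = 2 * math.pi * frequency / SPEED_OF_SOUND
    paths = [math.hypot(x, focal_length) for x in positions]
    longest = max(paths)
    # Cells closer to the axis have shorter paths and must delay the wave more.
    return [wavenumber * (longest - p) for p in paths]

positions = [-0.2, -0.1, 0.0, 0.1, 0.2]   # hypothetical cell centres in metres
delays = focusing_phase_delays(positions, focal_length=0.5, frequency=2000.0)
```

The resulting profile is symmetric and largest on the axis, which is the target a cell-level parametric study would then have to realise.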
The proposed workshop/tutorial serves as a prequel to the presentation on the history of dynamic loudspeakers given at the 158th Convention (Warsaw, 2025). It focuses on the earliest phase of consumer loudspeaker technology in the 1920s, prior to the widespread adoption of dynamic loudspeakers in the mass market.
Loudspeakers had been in use since the mid-1910s for public address applications, and the rapid global expansion of broadcast radio soon brought loudspeakers into domestic use. The 1920s constituted a period of rapid innovation in loudspeaker design, preceding the introduction of the dynamic loudspeaker, which achieved significant commercial impact only in the latter part of the decade.
The workshop/tutorial will examine consumer loudspeaker technologies of the 1920s, the concurrent advancements in audio electronics and signal sources that enabled subsequent developments, and the earliest efforts in systematic loudspeaker theory and measurement.
Two loudspeaker types dominated this period: horn loudspeakers driven by electromagnetic drivers similar to those used in headphones and telephone receivers (with headphones, particularly Baldwin models, also serving as the basis for do-it-yourself loudspeakers), and open-baffle cone loudspeakers, frequently actuated by electromagnetic reed drivers.
Although these transducer technologies were rapidly superseded during the following decade, the electromagnetic loudspeaker era already featured multi-way loudspeakers employing passive crossovers. Early measurements exposed deficiencies in frequency response, leading to the introduction of equalisation techniques, including notch filters, to correct these responses.
Developments in amplification were equally significant. The 1920s saw the introduction of push-pull amplifiers (described at the time as “distortionless”) and, as a key contributor to improved bandwidth and reduced distortion, new audio transformers derived from Bell Labs’ telephone research. Amplifier power limitations nevertheless remained a dominant constraint in loudspeaker design, resulting in the widespread use of strong resonances to achieve high sensitivity. Improvements in signal source quality from the mid-1920s onwards — including advances in radio transmission and the introduction of electrical disc recording and playback — further increased the demand for improved loudspeaker performance, ultimately contributing to the development of dynamic loudspeakers. In contrast, headphone technology appears to have undergone relatively little development during this period.
The tutorial will conclude with a brief overview of the loudspeaker manufacturing landscape of the era, noting that only a small proportion of manufacturers survived the transition to dynamic loudspeaker technology.
Come and meet fellow student peers and AES leadership from around the world. Attendees will gain an overview of student-focused events at the Convention, other upcoming student events and competitions organized by AES, and learn about the finalists in the Student Recording Competition.
Participants will have the opportunity to introduce themselves and their local student sections. The short session encourages international connection and collaboration among students, fostering a global network of future audio professionals.
AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...
Thursday May 28, 2026, 9:00am - 10:00am CEST, Aud 41, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
The mixing stage in music production involves a complex set of interdependent technical and creative decisions aimed at achieving a coherent, industry-level result. Intelligent Music Production (IMP) is an emerging research area that integrates Artificial Intelligence techniques into music creation and post-production processes, spanning from composition to mastering. Within this context, Answer Set Programming (ASP), a declarative paradigm from Knowledge Representation and Reasoning, has proven effective for modeling and solving complex optimization problems. This article presents frmixerr, an ASP-based intelligent system designed to optimize the mixing process by automatically generating balanced mixes. The system formulates mixing as a combinatorial optimization problem and evaluates candidate solutions against a reference spectral profile. To assess its performance, a subjective listening test was conducted comparing mixes generated by frmixerr with mixes produced by human engineers with varying levels of professional experience. The results indicate no significant differences in perceived quality between frmixerr mixes and those created by professionals, suggesting that ASP constitutes a viable approach for intelligent assistance in music mixing.
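The idea of treating mixing as a combinatorial search against a reference spectral profile can be sketched with a brute-force stand-in. This is plain Python enumeration, not ASP and not the frmixerr encoding; the band layout, the discrete gain levels, the squared-error cost, and all names are assumptions for illustration only.

```python
import itertools

def mix_profile(track_profiles, gains):
    """Band energies of the mix: per-band sum of gain-weighted track energies."""
    n_bands = len(track_profiles[0])
    return [sum(g * p[b] for g, p in zip(gains, track_profiles)) for b in range(n_bands)]

def best_gains(track_profiles, reference, gain_levels):
    """Try every combination of discrete gains and keep the one closest to the reference."""
    def cost(gains):
        mixed = mix_profile(track_profiles, gains)
        return sum((m - r) ** 2 for m, r in zip(mixed, reference))
    return min(itertools.product(gain_levels, repeat=len(track_profiles)), key=cost)

# Hypothetical per-track energy in three bands (low, mid, high).
tracks = [
    [1.0, 0.0, 0.0],   # bass
    [0.0, 1.0, 0.0],   # vocal
    [0.0, 0.0, 1.0],   # cymbals
]
reference = [1.0, 0.5, 0.25]      # assumed target spectral profile
levels = (0.25, 0.5, 1.0)         # assumed discrete energy gains per track
gains = best_gains(tracks, reference, levels)
```

Exhaustive enumeration grows exponentially with the number of tracks, which is one reason a declarative solver is attractive for the real problem.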
Some sound reinforcement systems at today’s live and electronic music events use horn-loaded bass cabinets for the low-end section. For electronic music in particular, the PA system is often designed around one or more clusters of bass cabinets that deliver the required SPL and impact in the low-frequency range. Although they are large and heavy, horn-loaded bass speakers offer advantages such as efficiency and directivity, which make them a strong option for electronic music. Enthusiasts also describe them as projecting sound further than bass-reflex units. When used in clusters, bass horns exhibit mutual coupling because of the larger combined mouth area and the underlying physics. This effect changes the operating parameters in a way that benefits sound reproduction and is clearly noticeable at high levels. It increases the output near the lower edge of the frequency response and changes the directivity pattern. A cluster of four or six double 18” horn-loaded bass bins placed in the middle at the front of a dance area provides strong impact, described as a “punchy” sound, which is highly acclaimed in the electronic music party scene. In this paper I describe an investigation of the mutual coupling between horn cabinets, using electrical and acoustical measurements to reveal this mechanism. Electrical impedance measurements, together with SPL and frequency response measurements in coupled and uncoupled scenarios, are used to describe and demystify the mutual coupling phenomenon.
Sound system design and calibration engineer. I am running a company providing professional sound systems and DJ equipment rental. Sound system setup design, numerical simulations and technical support are included in the portfolio. Horn speakers and Vacuum tube amplifiers enthus...
Thursday May 28, 2026, 9:30am - 10:00am CEST, Aud 44, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
The development of personal sound zone systems in recent years shows great potential for low-frequency noise control outside of noisy spaces. These approaches have promising applications in managing noise pollution from concerts in large venues or urban festivals. However, most of the literature assumes that the created sound zones exist in the same room or acoustic space as the noise source. This premise excludes all setups where the disturbance occurs outside the concert venue (e.g., in neighboring houses). This paper presents a first experimental study of the behavior of sound zone methods for indoor sound zones and outdoor noise sources. These initial results indicate that the methods remain efficient in this edge case, opening new use cases for these approaches.
Conventional ornithological monitoring systems rely heavily on single-channel recorders and deep learning classifiers to identify "what" species is present, but fail to capture "where" it is located or how individuals interact spatially. This limitation hinders the study of complex ecological behaviors, such as inter-specific spacing in dense vegetation and predator-prey dynamics. We propose a novel, dual-mode acoustic localization system designed to unify semantic classification and spatial tracking. Utilizing an economically scalable 16-channel Uniform Rectangular Array (UMA-16) interfaced with edge-computing platforms, we implement a hybrid spatial filtering pipeline structured to balance real-time latency constraints with achievable angular resolution. The first stage employs a computationally efficient, noise-robust linear scanning technique to generate an acoustic energy map and estimate source multiplicity. This preliminary data initializes a second-stage, super-resolution spectral estimation algorithm predicated on signal-noise subspace orthogonality, combining the noise robustness of non-parametric beamforming methods with the precision of parametric approaches. By integrating these spatial filters with standard deep learning classifiers, the system resolves overlapping vocalizations in "Cocktail Party" scenarios and improves Signal-to-Noise Ratio (SNR) for cryptic species detection. We address the physical "Localization-Detection Range Disparity," demonstrating that while detection is viable at long ranges, precise localization is constrained by the array aperture to the near-to-mid field. The system outputs real-time video overlays of acoustic heatmaps for field observation and generates autonomous volumetric territory maps in fixed deployments, collectively providing ornithologists with a robust capability for analyzing the spatial ecology of avian vocalizations.
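A first-stage scan of the kind described above can be sketched as a narrowband delay-and-sum beamformer. The sketch assumes that the "linear scanning technique" is delay-and-sum, simplifies the rectangular array to a four-microphone line, and uses an assumed microphone pitch and speed of sound; the second-stage subspace method is not shown.

```python
import cmath
import math

def steering_vector(angle_deg, n_mics, spacing, wavelength):
    """Narrowband steering vector of a uniform linear array for a far-field source."""
    phase_step = 2 * math.pi * spacing * math.sin(math.radians(angle_deg)) / wavelength
    return [cmath.exp(-1j * phase_step * m) for m in range(n_mics)]

def das_power(snapshot, angle_deg, spacing, wavelength):
    """Delay-and-sum output power when the array is steered to angle_deg."""
    weights = steering_vector(angle_deg, len(snapshot), spacing, wavelength)
    output = sum(w.conjugate() * x for w, x in zip(weights, snapshot)) / len(snapshot)
    return abs(output) ** 2

def scan(snapshot, spacing, wavelength, angles):
    """Return the scan angle with the highest delay-and-sum power."""
    return max(angles, key=lambda a: das_power(snapshot, a, spacing, wavelength))

wavelength = 343.0 / 4000.0   # 4 kHz tone, assumed speed of sound of 343 m/s
spacing = 0.042               # assumed microphone pitch in metres (under half a wavelength)
# Noise-free snapshot of a single source at +20 degrees.
snapshot = steering_vector(20.0, 4, spacing, wavelength)
estimate = scan(snapshot, spacing, wavelength, range(-90, 91))
```

Evaluating `das_power` over a grid of angles (or over azimuth and elevation for a planar array) yields the kind of energy map that a heatmap overlay could display.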
Since 2021, 7.1.4 musical content has transitioned from a niche specialty to a mainstream commercial deliverable within major streaming ecosystems. However, industry discourse indicates a disparity in how the immersive stage is utilized across different production tiers. This paper presents a targeted quantitative study of thirty 7.1.4 tracks (N = 30 total; 15 per category; 2021–2026), employing a matched-pair sampling strategy driven by the availability of 'Established Excellence' (Grammy Award-winning/nominated immersive albums) against genre-equivalent 'Market Dominance' (top-charting streaming tracks). The study utilizes a multi-parameter measurement methodology, including Inter-Channel Cross-Correlation, hemispheric symmetry, and spatial width analysis. Furthermore, vertical spectral centroid distribution and channel occupancy (Center and LFE) are analyzed to identify recurring structural immersive design markers. Preliminary findings suggest a consistent forward-facing bias and lower activity in select channels in charting commercial releases compared to award-recognized counterparts. By documenting these technical indicators, such as quarter-sphere correlation and LFE handling differences, this study establishes a benchmark for current immersive mixing practices and highlights the technical indicators that may limit the transition from enhanced stereo to true immersive envelopment.
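One of the listed measures, inter-channel cross-correlation, can be sketched as a zero-lag normalized correlation between two channels. Whether the study uses zero lag, a lag search, or windowed analysis is not stated, so the mean removal and the zero-lag choice below are assumptions.

```python
import math

def interchannel_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length channels (-1 to 1)."""
    if len(a) != len(b):
        raise ValueError("channels must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    # Silent (constant) channels have no defined correlation; report 0.0 for them.
    return num / den if den else 0.0

n = 64
tone = [math.sin(2 * math.pi * k / n) for k in range(n)]
inverted = [-s for s in tone]
quadrature = [math.cos(2 * math.pi * k / n) for k in range(n)]

same = interchannel_correlation(tone, tone)            # fully correlated
opposite = interchannel_correlation(tone, inverted)    # polarity-inverted
unrelated = interchannel_correlation(tone, quadrature) # 90 degrees apart
```

Values near +1 indicate near-identical channel content, values near 0 indicate decorrelated content, and negative values indicate polarity conflicts.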
Many different types of distortion can be measured, from linear to non-linear. The two are often entangled, and the linear distortion influences the non-linear distortion. Distortion is also highly signal- and level-dependent, and it is hard to compare one type of distortion measurement with another. There are many non-linear distortion metrics, of which THD, THD+N, and IMD are the most classic, using sine tones as the test signal. But how can we measure distortion with real signals such as speech and music, or even noise, and compare the results to audibility? This tutorial covers a wide range of distortion measurements and discusses what is audible and what distortion sounds like.
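For the classic sine-tone case, THD is the ratio of the combined harmonic amplitude to the fundamental amplitude. The sketch below measures it on a synthetic tone using single-bin DFT sums; the test signal, the number of harmonics, and the function names are illustrative assumptions, and the analysis window is assumed to hold a whole number of periods.

```python
import math

def harmonic_amplitude(signal, cycles):
    """Amplitude of the component that completes `cycles` periods within the signal."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles * k / n) for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles * k / n) for k, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

def thd(signal, fundamental_cycles, n_harmonics=5):
    """THD as a ratio: RMS sum of harmonics 2..n_harmonics over the fundamental."""
    fundamental = harmonic_amplitude(signal, fundamental_cycles)
    harmonics = [harmonic_amplitude(signal, fundamental_cycles * h)
                 for h in range(2, n_harmonics + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental

n = 1024
# Synthetic test tone: 8 cycles of the fundamental plus a 1 % third harmonic.
signal = [math.sin(2 * math.pi * 8 * k / n) + 0.01 * math.sin(2 * math.pi * 24 * k / n)
          for k in range(n)]
ratio = thd(signal, 8)
```

Multiplying `ratio` by 100 gives THD in percent. The measure only applies to a sine stimulus, which is why real signals such as speech and music need different metrics.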
Steve Temme is founder and President of Listen, Inc., manufacturer of the SoundCheck audio test system. Steve founded the company in 1995, and for the past 30 years the company has remained on the cutting edge of research into audio measurement, regularly introducing new measurement...
Thursday May 28, 2026, 10:00am - 11:00am CEST, Aud 49, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Accurate characterization of the three-dimensional sound radiation of outdoor public-address (PA) systems is essential for sound system engineering, environmental noise assessment, neighbourhood protection, and the validation of prediction models. In current practice, field measurements around performance stages are typically restricted to receiver heights below 5 m, limiting insight into sound radiation at elevated positions and towards the surrounding environment. This tutorial presents a measurement approach using an unmanned aerial vehicle (UAV) as a platform for Class 1 sound level measurements, enabling in-situ, three-dimensional characterization of the sound radiation of large-scale PA systems. A controlled case study was conducted at an open-air festival site in Belgium, where the sound radiation of a professional line-array PA system was measured at heights of 2 m and 30 m using both conventional ground-based measurements and a drone-mounted sound level meter. To ensure compatibility with standard sound engineering and environmental noise practice, strict Class 1 methodology was applied, including the use of an omnidirectional microphone, broadband excitation signals, and background noise correction in accordance with ISO 1996-2. Drone self-noise was quantified under operational conditions, and measurement data not meeting signal-to-noise validity criteria were excluded. The results show that reliable drone-based measurements are achievable in the low-frequency range from 25 to 315 Hz, which is of primary relevance for outdoor music systems and for community noise impact and disturbance. Directivity indices derived at elevated height reveal weaker low-frequency directivity than ground-level measurements. This provides new insight into the vertical sound radiation behaviour of festival PA systems. A comparison between measured and modelled sound levels demonstrates good agreement in terms of angular distribution and relative level differences.
The proposed drone-based measurement approach enables three-dimensional sound field characterization of outdoor PA systems that is not attainable using conventional techniques. The method provides valuable data for sound system engineering leading to validation of prediction models and environmental noise assessment. This three-dimensional decibel measurement represents a step towards standardized UAV-based measurement methodologies for large-scale outdoor sound reinforcement systems. This tutorial will describe in detail the protocol to operate a measurement drone flight. After the presentation a practical demonstration of the drone platform will be held outside of the building.
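The background noise correction and validity screening mentioned in the abstract above can be sketched as an energetic subtraction of the background level (here, for example, drone self-noise) from the total measured level. The 3 dB validity margin is an assumed parameter for illustration; the abstract does not state the criterion that was applied.

```python
import math

def background_corrected_level(total_db, background_db, min_margin_db=3.0):
    """Subtract background noise energetically from a measured level.

    Returns the corrected level in dB, or None when the total exceeds the
    background by less than min_margin_db (treated here as an invalid measurement).
    """
    if total_db - background_db < min_margin_db:
        return None
    return 10 * math.log10(10 ** (total_db / 10) - 10 ** (background_db / 10))

# Hypothetical band levels: PA plus drone noise, and drone noise alone.
corrected = background_corrected_level(90.0, 80.0)
rejected = background_corrected_level(82.0, 80.0)
```

With a 10 dB margin the correction is under half a decibel; as the margin shrinks the correction grows quickly, which is why low-margin data are excluded rather than corrected.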
Thursday May 28, 2026, 10:00am - 11:00am CEST, Building 302, 2nd floor, Technical University of Denmark, Asmussens Alle, Building 302, DK-2800 Kgs. Lyngby, Denmark
Before digital signal processing took over electronic keyboard instruments, they were implemented using analogue circuits that used tubes/valves, transistors, and even neon lightbulbs! Yet using these components keyboards were developed that could mimic string and brass ensembles, pianos and harpsichords and many other instruments. How did they do it?
The purpose of this tutorial is to look at both the architecture and the circuitry of these instruments, and to show how amazing results could be achieved using comparatively simple electronic circuitry. It will look at:
1. The basic architecture of these instruments
2. How they generated the right notes
3. How they generated the desired envelopes
4. How they imposed those envelopes on the waveform
5. How they simulated the effect of many instruments playing together
It will also look at how, if it was required, touch sensitivity could be achieved, such as in electronic pianos. Where possible there will be audio examples demonstrating the sounds that could be achieved.
For many people who have only ever experienced the digital world it will be illuminating to see just how much could be achieved by comparatively simple circuits. In those days electronic components were expensive so considerable ingenuity was expended in minimising the total number of components required.
These instruments are part of our musical and audio heritage and the circuit techniques they used are in danger of being forgotten so this tutorial will be a timely reminder of what used to be done. It may also provide useful information to people who are attempting to model these instruments using modern digital methods.
The tutorial will be accessible to everyone. You will not have to be an electronic engineer to understand the principles behind these unique pieces of audio engineering history.
Jamie Angus-Whiteoak is Emeritus Professor of Audio Technology at Salford University and VP for Northern Europe.
Her interest in audio was crystallized aged 11 when she visited the WOR studios, NYC, in 1967 on a school trip. After this she was hooked, and spent much of her free ti...
Thursday May 28, 2026, 10:00am - 11:00am CEST, Aud 41, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
The ECHO Project (Exploring the Cinematic Hemisphere for Orchestra) is a collaborative initiative investigating 3D microphone array techniques for orchestral recording. Building on the 3D-MARCo initiative, the project provides a platform for sound engineers, composers, researchers, and students to explore and experiment with immersive recording approaches. As part of this effort, an open-access database of high-quality orchestral recordings was created from sessions at AIR Studios, London, featuring Oscar-winning composer Volker Bertelmann and the London Contemporary Orchestra. The ECHO database contains recordings of four musical pieces captured using up to 143 microphone capsules, including seven expert-designed microphone arrays, spot microphones, a dummy head, and a higher-order spherical microphone system. The database enables comparison of different recording techniques and supports experimentation with microphone mixing, making it a valuable resource for research, teaching, and immersive audio production. This workshop will introduce the microphone arrays, describe the recording process and immersive compositional approach, and showcase selected recordings in 7.1.4.
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...
Thursday May 28, 2026, 10:00am - 11:00am CEST, Aud 31, Technical University of Denmark, Asmussens Alle, Building 306, DK-2800 Kgs. Lyngby, Denmark
This work presents the results of a perceptual study investigating the influence on musicians of a virtual acoustics system installed in the live room of a professional recording studio. The study focused on analyzing relationships between a selection of objective acoustic parameters (T30, STLate, LJ) and the subjective perceptions of 19 solo musicians performing under 11 different acoustic conditions. The experiment was conducted using the VAT (Virtual Acoustic Technology) system and the VAT Suite software developed at the Immersive Media Laboratory (IMLab) in the Sound Recording Department at McGill University. Correlations between quantitative and qualitative analyses show that musicians’ preferences converge on conditions with T30 ≈ 1 s, and that late and lateral energy increases the perception of spatiality, providing a positive balance between clarity and acoustic support. However, longer reverberation reduces comfort and executive control.
Audio event-classification models trained on AudioSet are widely adopted and form a central component of the state of the art in machine listening, yet their behavior when deployed in complex, open acoustic environments remains largely unexplored. In this study, we evaluate several widely adopted AudioSet-pretrained architectures (particularly models from the PANNs family, including MobileNetV2 and Wavegram, and the Transformer-based PaSST model) when applied to a real operational scenario at the commercial Port of Valencia, Spain. We observed a recurring and systematic unexpected behavior: the models frequently assigned disproportionately high probability to the class Music for non-musical industrial and transportation sounds. These mislabeled events included train-wheel squealing, motorcycle acceleration, emergency sirens, and reversing beeps, all sound categories that are common in port logistics environments but acoustically different from music. By analyzing the probability distributions output by the models, we demonstrate that this erroneous Music activation is not an isolated failure but a pervasive pattern across several architectures. Our findings highlight a critical gap in the robustness and domain generalization of AudioSet-derived models and emphasize the need for targeted adaptation techniques when deploying them in real industrial settings.
Damping in viscoelastic materials such as rubbers is often desirable, especially in loudspeaker suspensions. Under high strain loads, however, viscoelastic materials can also exhibit a hysteretic stiffness behavior, causing a stiffness decrease with amplitude. In this study, we examine the viscoelastic rubber suspension of a loudspeaker, using the loudspeaker motor system as actuator and sensor. From measurements we observe the hysteretic force-displacement behavior and pronounced odd-order harmonic distortion even at low amplitudes, in accordance with the literature. We further explore a macro-thermodynamic plastic flow model to model the stiffness of viscoelastic materials. The results show that the plastic flow suspension model explains and replicates the observed nonlinear hysteretic behavior. We also show that a fitted time-domain loudspeaker model including plastic flow matches the measured distortion profile. In contrast, models with polynomial stiffness and viscous damping fail to explain the observed amplitude dependencies such as odd-order harmonic levels. The experiments demonstrate that viscoelastic hysteresis occurs not only at high but also at low amplitudes, where the elastic stiffness is approximately linear.
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.
This work addresses the problem of frame drum (bendir) stroke technique recognition in simulated real-world conditions. The traditional frame drum technique includes three discrete strokes that are used to create rhythmic patterns: dum, tek, and slap. In the presented work, audio data augmentation is investigated on a dataset containing recordings of instruments with various construction attributes. The techniques are selected to help classification generalize to real-world conditions. Moreover, the mixing of the frame drum samples with accompanying guitar chords is introduced, simulating the more complicated problem of hit technique recognition when playing in a duo. Applying this data augmentation yields several different datasets for training and testing. Two convolutional neural network architectures (one- and two-dimensional) are considered, trained on waveforms and mel-scale spectrograms of the different subsets, respectively.
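The duo-style augmentation described above can be sketched as mixing a drum stroke with an accompaniment signal at a chosen level ratio. The SNR-based scaling, the function names, and the toy signals are assumptions for illustration; the abstract does not state the mixing levels or procedure used.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def mix_at_snr(target, interferer, snr_db):
    """Add the interferer to the target, scaled so their RMS ratio equals snr_db.

    Returns the mixture and the gain that was applied to the interferer.
    """
    gain = rms(target) / (rms(interferer) * 10 ** (snr_db / 20))
    mixture = [t + gain * i for t, i in zip(target, interferer)]
    return mixture, gain

# Toy signals standing in for a drum stroke and a guitar chord.
drum = [1.0, -1.0, 1.0, -1.0]
guitar = [0.5, 0.5, -0.5, -0.5]
mixture, gain = mix_at_snr(drum, guitar, snr_db=20.0)
achieved_snr = 20 * math.log10(rms(drum) / rms([gain * g for g in guitar]))
```

Sweeping `snr_db` over a range of values is one way such an augmentation could produce several training and testing subsets of increasing difficulty.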
Dr. Nikolaos Vryzas was born in Thessaloniki in 1990. He studied Electrical & Computer Engineering at the Aristotle University of Thessaloniki (AUTh). After graduating, he received his master's degrees in Information and Communication Audio Video Technologies for Education & Production from the Interdepartme...
Thursday May 28, 2026, 11:00am - 11:30am CEST, Aud 43, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Input-output linearization is a technique for compensating nonlinear distortion in loudspeakers. To apply it to complex loudspeaker models, we describe an end-to-end framework for estimating model parameters from data and deriving the linearizing control laws using automatic differentiation. The parameter estimation approach combines frequency-domain linear parameter estimation with a time-domain prediction-error method for the nonlinear parameters. The linearization approach supports non-linear reference systems and stabilization of the control law using trajectory tracking. We implement the framework in dynax, an open-source Python package based on JAX, and validate it experimentally as a feed-forward controller on a closed-box loudspeaker. Results demonstrate validation errors of 1–5% NRMSE and total harmonic distortion reductions of 6–12 dB. The framework enables researchers and engineers to rapidly prototype and validate complex loudspeaker models for distortion compensation without manual symbolic derivations.
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.
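The two figures of merit quoted above can be computed as in the following sketch. NRMSE has several common normalizations; dividing by the RMS of the measured signal is an assumption here, and neither function is taken from the dynax package.

```python
import math

def nrmse(measured, predicted):
    """RMS prediction error divided by the RMS of the measured signal (assumed normalization)."""
    n = len(measured)
    error_rms = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    reference_rms = math.sqrt(sum(m * m for m in measured) / n)
    return error_rms / reference_rms

def thd_reduction_db(thd_before, thd_after):
    """Reduction in THD, in dB, when THD is given as an amplitude ratio."""
    return 20 * math.log10(thd_before / thd_after)

# Hypothetical data: a model that underestimates the measurement by 5 %.
measured = [1.0, -1.0, 1.0, -1.0]
predicted = [0.95, -0.95, 0.95, -0.95]
error = nrmse(measured, predicted)
# Halving the THD ratio corresponds to roughly 6 dB of reduction.
reduction = thd_reduction_db(0.02, 0.01)
```

On this scale, the reported 6 to 12 dB range corresponds to reducing the harmonic amplitude ratio to between about one half and one quarter of its uncompensated value.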
This study introduces a fourth-order Ambisonics-based decoding system to reproduce railway cabin running noise in a studio environment, enabling enhanced spatial impression and detailed sound field variation. Real-world operational noise was recorded using a multichannel fourth-order Ambisonics microphone (Eigenmike® EM32, mh acoustics LLC, USA), and the reproduced sound field was implemented through a multichannel loudspeaker system. The reproduced signals were quantitatively compared with the original operational noise in terms of spectral variation and waveform distortion.
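As a small aid to the channel bookkeeping, the sketch below computes how many signals a full-sphere Ambisonics stream of a given order carries and where a given spherical-harmonic component sits in it. The ACN channel ordering is an assumption; the abstract does not state which convention the system uses.

```python
def ambisonic_channel_count(order):
    """Number of spherical-harmonic signals in a full-sphere stream of the given order."""
    return (order + 1) ** 2

def acn_index(degree, index):
    """Ambisonic Channel Number for spherical-harmonic degree l and index m (-l <= m <= l)."""
    if abs(index) > degree:
        raise ValueError("index must satisfy -degree <= index <= degree")
    return degree * (degree + 1) + index

channels = ambisonic_channel_count(4)   # fourth order
last_channel = acn_index(4, 4)          # highest component of a fourth-order stream
```

A fourth-order stream therefore needs 25 signals, which fits within the 32 capsules of the microphone named above.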
Yonghee Lee, Ph.D., Mechanical Engineering. Ultrasonic, Acoustic, SHM, NDE, fNIRS, and Bio-medical engineering. Contact: [email protected] Institute: Changwon National University, South Korea
Thursday May 28, 2026, 11:00am - 11:30am CEST, Aud 42, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Kseniya Kawko, a Munich- and London-based Tonmeister and recording engineer specializing in classical music and jazz, shares selections from her recent live and studio recording and mixing projects, featuring leading orchestras and jazz ensembles, and provides an introduction to the artistic and production considerations behind immersive formats.
This masterclass series, featuring remarkable recording artists, is a chance to hear 3D audio at its best, as we discuss qualities that make it truly worth the effort.
In each masterclass, we explore the new spatial possibilities in recording and production, detailing also this specific listening room, regarding ITU-R BS.1116 compliance and auditory envelopment (AEV) transparency. Seats are limited to keep playback variation at bay.
Kseniya Kawko is a producer and recording engineer specialized in classical music and jazz. She holds Master of Music degrees from two world-renowned audio programs: Sound Recording, McGill University (Montréal, Canada) and Musikregie / Tonmeister, Hochschule für Musik Detmold (Germany...
Immersive music is at a critical point in its development. While production tools, workflows, and distribution models have begun to stabilise, the market remains fragile, and long-term adoption is far from guaranteed.
New immersive audio formats are now entering a field where creators, labels, and platforms have only recently started to commit resources and build confidence. This raises a fundamental question: does the introduction of additional formats strengthen immersive music, or does it increase uncertainty at a time when the market can least afford it?
This panel-based workshop focuses on immersive audio formats for music and explores whether current challenges are best addressed through new formats, or through innovation and improvement within existing ones.
Topics for discussion include:
- What are the most pressing problems facing immersive music today?
- Do emerging formats solve these problems, or risk fragmenting production, distribution, and listening experiences?
- How does format uncertainty affect investment, release strategies, and creative willingness, especially in smaller markets?
- What are the potential consequences if industry stakeholders decide that immersive music is too complex or too risky to prioritise?
- How do issues such as translation between loudspeaker-based and headphone listening fit into this broader picture?
The session is designed as an open, moderated discussion with panelists from production, research, mastering, education, and technology development.
Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991, where he was the Chief Mastering Engineer and General Manager.
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...
With expertise in Dolby Atmos and immersive sound, Lars Tirsbæk leads the way in teaching studio production at Sonic College. His innovative approach combines the best of both studio and live sound, focusing on efficient workflows, technical tools, and the creative process. Additionally...
Thursday May 28, 2026, 11:00am - 12:00pm CEST, Aud 41, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Multichannel audio formats require attention to inter-channel correlation and sometimes a special approach. In this workshop, we continue the discussion started at AES Show 2025 in LA and show how different measurement tools can be used to avoid certain problems in the final mix. Examples include the mutual influence between the upper and main beds in an immersive layout, problems in the LFE channel, and how to check the mix for correlation issues outside the sweet spot.
Taking the premiere and reperformance of the sci-tech symphonic suite Symphonic Coding as a case study, this paper discusses audio system organization, sound diffusion, and cross-venue migration in the co-performance of symphonic and electronic music. Given the challenges of diverse live inputs, real-time control of the electronic music part, concurrent recording and live streaming, and varying acoustic conditions, the article analyzes how a single workflow handles traditional miking, electronic music generation and control, live spatial diffusion, and multi-purpose distribution. The study is structured across four levels: system design requirements, signal organization, dual-venue implementation, and engineering discussion. It illustrates the development of an interconnected workflow comprising Content, Rendering, and Distribution Layers through mixing console organization, immersive rendering, and AoIP distribution. Results indicate that the significance of this work lies not in the reproduction of the listening experience of the entire performance, but in enabling the spatial presentation of the electronic music part to remain valid across different environments based on a consistent reference. Furthermore, the project enhances reperformance capability and production flexibility through the separation of functions, roles, and systems.
Mechanical overload remains a primary limitation in high-output loudspeaker operation, particularly at low frequencies where large coil excursions are required. Conventional mechanical protection strategies are typically implemented as signal-domain limiters or filters, which act indirectly on the loudspeaker’s mechanical state and may introduce discontinuities, spectral modification, or unnecessary attenuation.
This paper proposes a methodological framework for mechanical loudspeaker protection based on the virtualization of admissible system behavior. The approach is formulated within a nonlinear wave digital loudspeaker model and realized using a direct–inverse–direct architecture. Mechanical protection is embedded directly into the virtual loudspeaker dynamics by shaping the nonlinear suspension compliance as a function of voice-coil displacement. As the excursion approaches a prescribed admissible limit, the virtual compliance is progressively reduced using a smooth raised-cosine law, resulting in a continuous increase of the virtual mechanical stiffness. Excessive excursion is therefore prevented as a consequence of the system dynamics, without explicit limiting, clipping, or signal-domain intervention.
The proposed framework is evaluated through numerical simulations using steady-state low-frequency sinusoids and low-frequency sine bursts under free-air loading. Results are compared against an unprotected loudspeaker and a fixed high-pass filter configured to meet the same excursion constraint. The simulations verify that the proposed method enforces a soft excursion ceiling without discontinuities, preserves low-frequency output in the near-limit operating region, and exhibits stable and immediate recovery following transient excitation. Distortion behavior is characterized and shown to increase smoothly as a result of the introduced mechanical nonlinearity.
The results demonstrate that mechanical protection can be realized as an emergent property of a virtual loudspeaker model rather than as an external control action. The proposed approach provides a physically interpretable and numerically robust foundation for virtualization-based loudspeaker protection.
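The raised-cosine compliance shaping described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact law, so the onset displacement `x_onset`, the limit `x_limit`, the residual compliance `c_floor`, and both function names are assumptions made for illustration only; this is not the authors' wave digital implementation.

```python
import math


def virtual_compliance(x, c0, x_onset, x_limit, c_floor):
    """Displacement-dependent compliance with a raised-cosine taper.

    All parameters are hypothetical: c0 is the small-signal compliance,
    x_onset is where the taper starts, x_limit is the admissible excursion,
    and c_floor is the residual compliance at and beyond the limit.
    """
    ax = abs(x)
    if ax <= x_onset:
        return c0
    if ax >= x_limit:
        return c_floor
    # t runs from 0 at the onset to 1 at the limit.
    t = (ax - x_onset) / (x_limit - x_onset)
    # Raised-cosine weight: 1 at t = 0, 0 at t = 1, with zero slope at both ends.
    w = 0.5 * (1.0 + math.cos(math.pi * t))
    return c_floor + (c0 - c_floor) * w


def virtual_stiffness(x, c0, x_onset, x_limit, c_floor):
    """Stiffness is the reciprocal of compliance, so it rises as compliance falls."""
    return 1.0 / virtual_compliance(x, c0, x_onset, x_limit, c_floor)
```

Because the weight has zero slope at both ends of the taper, the compliance joins its small-signal value and its floor without a kink, which is one way to obtain the continuous stiffness increase the abstract describes.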
A “phantom image” is the illusion of an independent sound source created by two or more loudspeakers. Most often created by manipulating level differences between stereophonic channels (aka, “panning”), the effect is used to create a sense of auditory space between loudspeakers and is largely taken for granted. In recent years, surround and immersive audio systems have attempted to utilize phantom image processing to render audio objects in desired positions across multiple loudspeaker arrays. This research examined the efficacy of phantom image perception horizontally and vertically from an active listener perspective. After listening to a target loudspeaker, listeners (n = 442) were asked to move a phantom sound to a position matching that of the target loudspeaker. The listener’s phantom placement was then compared to the target, and subjects were allowed to “correct” their phantom position. The horizontal experiment was based on a standard stereophonic 60° loudspeaker array with the target loudspeaker at 15° off center. The vertical experiment utilized elevated loudspeakers in a 60° arc with the target loudspeaker elevated 10° above the horizon (lower loudspeaker). Results show nearly universal “undershoot” in horizontal placement error on first attempts, with gradual improvement over trials that coalesced around the projected target location. However, after repeated tries, final perceptual image locations were spread over 2/3 of the sound field around the target loudspeaker. In the vertical trials, perceptual locations were spread across the entire sound field in all three trials and failed to show any patterns of coalescence around the target loudspeaker.
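For readers unfamiliar with level-difference panning, a common textbook model is the stereophonic tangent law. The abstract does not say which panning law or control mapping the experiment used, so the sketch below is only an illustration of how a 15° target in a 60° array translates into channel gains; the function names and the constant-power normalization are assumptions.

```python
import math


def tangent_law_gains(target_deg, half_angle_deg=30.0):
    """Constant-power gains (near, far) from the stereophonic tangent law.

    tan(target) / tan(half_angle) = (g_near - g_far) / (g_near + g_far),
    where the target angle is measured from the array center toward the
    "near" loudspeaker. Valid only for |target| < half_angle.
    """
    if abs(target_deg) >= half_angle_deg:
        raise ValueError("target must lie strictly between the loudspeakers")
    ratio = math.tan(math.radians(target_deg)) / math.tan(math.radians(half_angle_deg))
    g_near = 1.0 + ratio
    g_far = 1.0 - ratio
    # Normalize so that g_near**2 + g_far**2 == 1 (constant power).
    norm = math.hypot(g_near, g_far)
    return g_near / norm, g_far / norm


def level_difference_db(g_near, g_far):
    """Inter-channel level difference in dB."""
    return 20.0 * math.log10(g_near / g_far)


g_near, g_far = tangent_law_gains(15.0, 30.0)
print(round(level_difference_db(g_near, g_far), 1))
```

Under this model, the 15° target corresponds to an inter-channel level difference of roughly 8.7 dB. The tangent law is an idealized prediction, and the spread of listener placements reported above suggests perceived positions need not follow it closely.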
Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."
Brecht De Man is Head of Research at PXL-Music, guest lecturer at the Royal Conservatoire of The Hague, and author of Intelligent Music Production (Routledge 2019). He holds a PhD from the Centre for Digital Music at Queen Mary University of London, where he developed and evaluated...
With expertise in Dolby Atmos and immersive sound, Lars Tirsbæk leads the way in teaching studio production at Sonic College. His innovative approach combines the best of both studio and live sound, focusing on efficient workflows, technical tools, and the creative process. Additionally...
My interests are loudspeakers (measurements, modelling, (nonlinear) parameter estimation, nonlinear compensation), active noise control, and indoor and outdoor sound field control.
Thursday May 28, 2026 12:00pm - 1:30pm CEST, Aud 42, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Target curves for the sound signature of headphones are a helpful design target during the development process. While a lot of attention has been paid to finding target curves that match the listening preference of consumers, equivalents for studio headphones date back to the 90’s. In the context of music production, a mutual target or even a standard is essential to make mixing and mastering more gear-independent. This becomes even more important as the main tool for sound engineers shifts from loudspeakers in professional environments such as acoustically treated studios to headphones, often additionally equipped with virtualization algorithms. This enables them to be more flexible and to rely less on potentially expensive loudspeaker setups. The diffuse field target curve, which is currently still the only standardized target curve for studio headphones, is often reported not to match a real loudspeaker equivalent of studio environments. In this paper, we aim to find a new standard target curve for studio headphones emulating the frequency response of a loudspeaker setup in modern studio environments. For this, we give an overview of current target curves and match them to their equivalent loudspeaker setups. Based on that, we propose a new methodology for a measurement-based target curve incorporating typical panning paradigms of music signals based on measurements inside multiple control rooms. To verify the results, we conduct listening tests with professionals in multiple studio environments.
A low-parameter-count machine-learning model for classifying streaming video can enable content-aware audio/video processing on consumer edge devices with latency, computational, and battery constraints. In this paper, we propose a low-compute classification technique that uses only text metadata from the streaming file header, enabling near-instantaneous inference without decoding and analyzing audio or video signals as is traditionally done. In particular, to support multilingual platforms such as YouTube, we first apply neural machine translation as a pre-processing step for the text metadata and optimize a lightweight neural classifier for a three-class audio-centric classification taxonomy (movie, music, dialog/other). Experiments on a mixed-language YouTube dataset achieve approximately 90% classification accuracy on a test set using translation combined with a classification model (with only about 22K parameters), demonstrating a globally scalable approach for robust classification on the edge.
Headphones have become the dominant device for music playback, and their design appears to have reached a certain level of technical maturity. This workshop presents an overview of the current state of the art in headphone design and examines potential directions for future technological development, addressing both acoustic aspects—including transducer design—and signal-processing approaches.
The workshop establishes a common foundation by introducing the fundamentals of headphone acoustics and design principles, together with a brief overview of the historical development of headphones and the main headphone types in use today.
Based on this foundation, the workshop addresses current challenges and future development potential in headphone technology, including:
- Transducer and acoustic development potential: materials, design methodologies and simulation techniques, and advances in measurement technology
- Characteristics of a high-quality headphone: What differentiates an excellent headphone from a good one? To what extent can headphone performance be characterized using current measurement techniques, and what additional metrics, target criteria, or perceptual considerations may be required? What is the role of mechanical quality?
- Signal processing potential: from advanced noise cancellation and augmented hearing to spatial audio processing
- Challenges in realistic spatial reproduction: interaction between auditory and visual environments
- Emerging wireless technologies: technologies such as UWB and Bluetooth 6 offer not only increased bandwidth and reduced latency but also the capability to localize the playback device. What are the implications for conventional headphone performance and for spatial audio applications?
- Changes in studio workflows: professional practice has evolved from loudspeakers as the primary monitoring tools, with headphones mainly used for detailed analysis, toward headphones playing a central role in the early stages of recording and mixing. What are the consequences of this shift for headphone design and signal processing?
- Technically feasible but not yet commercialized solutions: advanced headphone concepts that are achievable with current technology but have not yet been adopted due to economic or practical constraints
Streaming of immersive audio is known to western audiences almost exclusively in the object-based format, Atmos, developed by Dolby and employing lossy codecs to limit bit rates. Other object-based formats like Sony 360 have had limited success, and until recently there were no channel based streamed versions. But this situation is changing, as it has already done in Japan.
Responding to growing interest in very high quality immersive music for both on-demand streaming and live broadcast, two new services are now active that offer, first, channel-based audio and second, audio streamed in high res PCM. Binaural mixes, a range of PCM formats and video are variously included, with extensions to portables, loudspeakers, and home theater.
This workshop provides a forum for discussion of both the genuine promise and the challenges in these new initiatives. Included are the advantages of high resolution over lossy; channel-based versus object-based; the degree of adoption of transducers for multichannel; adaptive bit rates; data sources; and the Japanese approach; amongst others.
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...
Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991, where he was the Chief Mastering Engineer and General Manager.
Bert Van Daele is CTO at NewAuro. After graduating as an Engineer in Digital Electronics in 1997, he started out as an electronics designer at Philips Electronics, mainly working on digital products related to Surround Sound. During a sabbatical leave, he worked at the Galaxy Studi...
Recording Producer and Balance Engineer with 50 GRAMMY-nominations, 42 of these in craft categories Best Engineered Album, Best Surround Sound Album, Best Immersive Audio Album and Producer of the Year. Founder and CEO of the record label 2L. Grammy Award-winner 2020 and 2026. Immersive...
We present Binaspect, an open-source Python library for binaural audio analysis, visualization, and feature generation. Binaspect generates interpretable “azimuth maps” by calculating modified interaural time and level difference spectrograms, and clustering those time-frequency (TF) bins into stable time-azimuth histogram representations. This allows multiple active sources to appear as distinct azimuthal clusters, while degradations manifest as broadened, diffused, or shifted distributions. Crucially, Binaspect operates blindly on audio, requiring no prior knowledge of head models. These visualizations enable researchers and engineers to observe how binaural cues are degraded by codec and renderer design choices, among other downstream processes. We demonstrate the tool on bitrate ladders, ambisonic rendering, and VBAP source positioning, where degradations are clearly revealed. In addition to their diagnostic value, the proposed representations can be exported as structured features suitable for training machine learning models in quality prediction, spatial audio classification, and other binaural tasks. Binaspect is released under an open-source license with full reproducibility scripts at: (link removed for blind review)
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Realistic spatial audio consistent with visual information is essential for providing high immersion in Augmented Reality (AR) environments. However, conventional high-precision real-time acoustic simulations require significant computational power, limiting their implementation on standalone mobile VR devices such as the Meta Quest. This study proposes a practical method to enhance reverb realism using solely a standalone VR HMD, without the need for additional external equipment. By measuring impulse responses using a few hand claps in the physical space, we interpolate room acoustic parameters, specifically RT60 and early/late energy ratios, to reflect the environment's unique characteristics. These extracted parameters are then applied to the VR engine's built-in reverb effects, enabling dynamic, location-aware real-time rendering with minimal computational load. The proposed method demonstrates that a brief calibration period of 3 to 5 minutes yields significantly improved realism compared to static reverb templates, offering an efficient and practical spatial audio solution for mobile AR environments.
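The two parameters named in this abstract, RT60 and an early/late energy ratio, can be estimated from a measured impulse response with standard room-acoustics processing. The sketch below uses Schroeder backward integration with a line fit between -5 dB and -25 dB, and a 50 ms early/late split. The abstract does not state the authors' estimator, so the function names, fit range, and split time are all assumptions.

```python
import math


def schroeder_decay_db(ir):
    """Energy decay curve in dB via Schroeder backward integration."""
    edc = [0.0] * len(ir)
    running = 0.0
    for i in range(len(ir) - 1, -1, -1):
        running += ir[i] * ir[i]
        edc[i] = running
    ref = edc[0]
    return [10.0 * math.log10(e / ref) if e > 0.0 else -math.inf for e in edc]


def rt60_from_ir(ir, fs, start_db=-5.0, end_db=-25.0):
    """Fit a line to the decay curve between start_db and end_db and
    extrapolate to 60 dB of decay (a T20-style estimate)."""
    decay = schroeder_decay_db(ir)
    pts = [(i / fs, d) for i, d in enumerate(decay) if end_db <= d <= start_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_d = sum(d for _, d in pts) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = num / den  # dB per second, negative for a decaying response
    return -60.0 / slope


def early_late_ratio_db(ir, fs, split_ms=50.0):
    """Ratio of energy before and after split_ms, in dB (split time assumed)."""
    k = int(fs * split_ms / 1000.0)
    early = sum(s * s for s in ir[:k])
    late = sum(s * s for s in ir[k:])
    return 10.0 * math.log10(early / late)


# Synthetic check: an ideal exponential decay with a 0.5 s reverberation time.
fs = 8000
ir = [10.0 ** (-3.0 * n / (fs * 0.5)) for n in range(fs)]
print(round(rt60_from_ir(ir, fs), 2))
```

A real hand-clap recording would additionally need onset detection and noise-floor handling before these estimates are meaningful. Those steps are omitted here.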
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
The recently finalized ISO international standard (IS) on MPEG-I immersive audio enables interactive six-degrees-of-freedom (6DoF) audio rendering for a multitude of virtual-reality and augmented-reality (VR/AR) acoustic scenarios and applications, with comprehensive modeling of room acoustics and intricate acoustic phenomena, including e.g. occlusion, reflection, transmission and diffraction caused by sound obstacles, Doppler effect, and dynamic environment changes triggered by user interactivity. This paper describes the concept, methodology, and results of the final verification test of this standard. In the verification test, the perceptual quality of the renderer was assessed in an interactive listening test using different indoor and outdoor acoustic scenes, testing the above-mentioned features of the standard. More than 50 listeners, distributed across six labs, participated in the test, which used the ITU‑R BS.2132 [1] multi‑stimulus method on a 100‑point scale for three conditions (IS, mid and low anchor) in 10 VR scenes plus two repetitions. The results of several anchor processing configurations are presented. The selected mid and low anchors have demonstrated stable quality across diverse scenes with progressive timbre and spatial degradations. The listening test results show a clear separation of the conditions (IS > mid > low); the low anchor was stable (around 16 points median value) while the mid anchor varied by scene (around 47 points). The IS is rated with a median of 84 points among all labs, which is in the “excellent” region of the scale. The individual scenes are rated differently, and the quartile range for some scenes can reach 20 points. The median value for the IS also varied between labs, with some being a bit more critical than others.
Sascha Disch received his Dipl.-Ing. degree in electrical engineering from the Technical University Hamburg-Harburg (TUHH) in 1999 and joined the Fraunhofer Institute for Integrated Circuits (IIS) the same year. Ever since he has been working in research and development of perceptual...
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
The significance of individual versus generic HRTFs in Virtual Audio can be difficult to ascertain given the variety of scenarios and tasks related to the spatial listening experience. Are we working on the most significant 80% of the success or fine-tuning the last 5% of the sound quality? When the VR users are blind, it is fair to assume that the quality of the spatial audio becomes a critical and more important factor. This is the challenge as we see it. In the present project, we will investigate options for powerful game components relying on spatialized sound, using effects that are natural for the blind gamer. As a first step, we have implemented a test platform where different options for HRTFs will exist, and where the on-boarding process shall reveal the optimal solution for the given user. The test scenario is inspired by a “classical” scenario of shooting down sound sources, where we will vary e.g. the task definition and success criteria (hit zone, number of attempts, and elapsed time) as well as eavesdrop on game-internal parameters of a more complex nature (e.g. navigation trajectories). The results will display the variation in normally sighted listeners and produce normative data for later comparisons with blind participants. The platform also includes options for simple mirror-image room models and standardized reverberation, which will be used in later tests to learn whether the room acoustics may play a stronger role in blind gamers’ navigation and source identification than for normally sighted listeners.
Department of Engineering Technology and Didactics, Technical University of Denmark
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
This study investigates the relationship between the robustness of crosstalk cancellation and the symmetry of system configuration. Analytical results show that, when the positions of the sound sources are fixed, increasing asymmetry caused by deviations in the listener’s head position or orientation leads to a reduction in system robustness, whereas optimal performance is consistently achieved in symmetric layouts. For asymmetric configurations, we propose a method to optimize the axial angles of the sound sources. This method leverages source directivity patterns to adjust level differences along the acoustic propagation paths, thereby improving system robustness. Experiments confirm the effectiveness of the proposed method in asymmetric crosstalk cancellation systems, demonstrating enhanced robustness and yielding higher binaural channel separation under slight listener head movements.
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
This paper presents a comparative analysis of two immersive recording techniques for classical music: the PCMA-3D (Perspective Control Microphone Array) and the Decca Cuboid. While the Decca Cuboid relies primarily on time-of-arrival differences to generate spatial impressions, the PCMA-3D utilises intensity differences and separates ambience from direct sound. A recording session was conducted in a concert hall using a classical guitar soloist and two distinct folk music ensembles to capture performances simultaneously with both arrays. Subjective evaluation was performed using a MUSHRA listening test with 18 participants, assessing parameters such as sensation of space, localisation precision, and sound quality. Statistical analysis reveals that while both systems provide high-quality immersive experiences, the PCMA-3D scored significantly higher in the sensation of space (p
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
- Binaspect: A Python Library for Binaural Audio Analysis, Visualization & Feature Generation
- Lightweight Real-time Spatial Audio Interpolation for Standalone VR using Hand Claps
- Perceptual Evaluation of the MPEG-I Immersive Audio Standard
- Can the individual winner HRTFs be determined in a shooting task during onboarding for an Audio Only VR?
- Exploiting Source Directivity for Robust Asymmetric Crosstalk Cancellation
- Capturing Immersive Sound in Concert Halls: A Comparative Analysis of PCMA-3D and Decca Cuboid Recording Techniques
Thursday May 28, 2026 1:30pm - 3:30pm CEST, Foyer Building 303A Posters, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Deep learning has significantly improved speech enhancement performance in controlled laboratory conditions, yet these advances rarely translate into robust real-world benefit for hearing aid users. Current algorithms are trained and evaluated in simplified acoustic scenarios, neglecting multimodal cues, user interaction, environmental dynamics, and the strict latency and power constraints of embedded devices. As a result, a persistent gap remains between algorithmic performance and everyday listening experience. This position paper reviews recent progress in speech enhancement, embedded Artificial Intelligence hardware, and hearing aid systems, and argues for a shift toward ecologically valid evaluation and hardware-aware design. We propose virtual reality as a reproducible, multisensory benchmarking platform enabling joint assessment of human perception and algorithmic processing. This perspective outlines a research roadmap toward adaptive, context-aware, and practically deployable hearing technologies.
Few studies exist on the perception and measurement of nonlinear distortion in headphones. This paper reports the detection thresholds and perceived sound quality from real distortion in headphones. Five different distortion measurements were made on the headphones to determine how well they predict audibility and quality. Music samples were binaurally recorded on six headphones at playback levels ranging from 85 to 110 dBA in 3 dB increments. The recordings were reproduced at a normal playback level (83 dBA) through a reference headphone with low distortion. The headphone recordings were post-processed to remove both level and frequency response differences so that only nonlinear distortions and residual noise remained. In a second test, listeners rated the similarity in quality of headphones relative to an undistorted reference and a hidden version of it. The results provide evidence that audible distortion in headphones with music occurs at significantly higher playback levels (104 to 112 dBA SPL) than what is considered typical and safe. The percentage of measured THD in the headphone had the highest correlation with the detection thresholds, while the non-coherent distortion with music best predicted the similarity ratings. We discuss the results and the practical implications they might have for future headphone design, testing, and measurement.
This work presents a perceptual model based on a complex IIR filterbank. The filterbank, with a frequency resolution of 4 bands per Bark, consists of 104 filters whose slopes are designed to take spectral masking effects into account. The filter outputs are used to obtain masking thresholds through the following post-processing. To obtain reasonable masking thresholds from the spreading outputs, a postmasking stage is required. Here, we propose a comodulation-dependent adaptation of the postmasking decay to model Comodulation Masking Release (CMR) effects. This approach explicitly considers the dip-listening effect known from the literature. The final masking thresholds are obtained by weighting the postmasking outputs by a tonality-dependent gain, controlled using spectral flatness estimation. A listening test compares the proposed method to an already known approach using direct CMR-based modification of the masking threshold gains.
Florian details the design of his brilliant and durable Double-Ufix 3D mic array, capable of high resolution outdoor recording. Attendees are treated to memorable listening examples from natural and rural environments in Austria and the Nordics.
This masterclass series, featuring remarkable recording artists, is a chance to hear 3D audio at its best, as we discuss qualities that make it truly worth the effort.
In each masterclass, we explore the new spatial possibilities in recording and production, detailing also this specific listening room, regarding ITU-R BS.1116 compliance and auditory envelopment (AEV) transparency. Seats are limited to keep playback variation at bay.
Join us to hear the finalists selected for this category of the Student Recording Competition. We will hear their presentations and recordings, and comments and feedback from the judges. Award and prize placements will be announced on the last day of the convention.
AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates “off-beat-open-hats LLC”, providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active...
Thursday May 28, 2026 2:00pm - 3:00pm CEST, Building 302, 2nd floor, Technical University of Denmark, Asmussens Alle, Building 302, DK-2800 Kgs. Lyngby, Denmark
Sound source localization and identity tracking are fundamental tasks in acoustic scene analysis, enabling machines to determine what produces sound events, where, and when. While deep attractor-based networks have demonstrated improved performance under an unknown number of sources, maintaining continuous source tracking over long-form audio remains challenging due to memory limitations and permutation ambiguities across adjacent segments. In this paper, we propose a Recursive Attractor Network (RANet) for long-form sound source localization and identity tracking with a variable number of sources. RANet explicitly represents source attractors as transferable embeddings and recursively propagates them across adjacent audio segments using an LSTM-based model, thereby preserving source identity continuity over time. Experimental results on simulated datasets demonstrate that RANet achieves robust long-form sound source localization and consistent source identity tracking, outperforming baseline approaches under variable and dynamic source conditions.
There are three architectural approaches to microelectromechanical systems (MEMS) microphones, miniature devices used in a wide range of products. Capacitive MEMS microphones are embedded in billions of consumer electronics. Solder-compatible and providing tight part-to-part sensitivity matching, all in a small footprint, capacitive MEMS microphones have demonstrated improved performance in recent years. State-of-the-art digital capacitive MEMS microphones can now achieve up to 72 dB signal-to-noise ratio (SNR), with a 22 dBA noise floor and an overall dynamic range on the order of 106 dB.
However, capacitive MEMS microphone technology has now reached the limits of its architecture, which constrains the key audio performance metrics: SNR and acoustic overload point (AOP).
Piezoelectric MEMS microphones have not demonstrated SNR performance exceeding 65 dB, and require new materials to be developed to increase their performance. Optical MEMS microphones, a new architectural approach that combines a laser optical subsystem, a MEMS, and advanced CMOS circuit design, have exceeded the limits of capacitive technology. With 80 dB SNR supporting a 14 dBA noise floor, 132 dB dynamic range, and a 146 dB AOP, optical MEMS microphones achieve studio-quality performance in a tiny form factor that supports semiconductor-level yields in high-volume manufacturing.
This presentation will explain the architectural advancements of optical MEMS microphones in comparison to capacitive MEMS microphones. It will provide example use cases of high-SNR and high-AOP microphones in high-volume applications.
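The figures quoted in this abstract fit together under the usual microphone datasheet convention, in which SNR is referenced to a 94 dB SPL (1 Pa) test tone and dynamic range is the span from the noise floor to the AOP. A short worked check follows. The function names are illustrative, and the AOP derived for the capacitive case is implied by the quoted numbers rather than stated in the abstract.

```python
REFERENCE_SPL_DB = 94.0  # 1 Pa reference tone used for microphone SNR ratings


def noise_floor_from_snr(snr_db):
    """Equivalent input noise level implied by an SNR rated at 94 dB SPL."""
    return REFERENCE_SPL_DB - snr_db


def dynamic_range(aop_db_spl, noise_floor_db):
    """Span between the acoustic overload point and the noise floor, in dB."""
    return aop_db_spl - noise_floor_db


def implied_aop(dynamic_range_db, noise_floor_db):
    """AOP implied by a stated dynamic range and noise floor."""
    return dynamic_range_db + noise_floor_db


# Capacitive figures from the abstract: 72 dB SNR, 106 dB dynamic range.
capacitive_floor = noise_floor_from_snr(72.0)         # 22.0
capacitive_aop = implied_aop(106.0, capacitive_floor)  # 128.0 (derived, not quoted)

# Optical figures from the abstract: 80 dB SNR, 146 dB AOP.
optical_floor = noise_floor_from_snr(80.0)             # 14.0
optical_range = dynamic_range(146.0, optical_floor)    # 132.0
```

The 8 dB SNR improvement therefore lowers the noise floor by the same 8 dB, and the remaining gain in dynamic range comes from the higher overload point.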
EMORSION is an exploratory study examining how film audio design shapes audience emotion and immersion. It was conducted using scenes from four films in the horror (2) and drama (2) genres, with two mainstream and two independent productions. For each scene, multiple alternative audio mixes were created by systematically manipulating three core aspects of audio design: frequency (pitch), dynamics (loudness), and directionality (spatial placement). Three audience groups were exposed to the scenes in a cinema setting, with each group experiencing either one manipulated audio mix or a control mix. Audience responses were assessed through a multimodal framework combining self-reported emotion and immersion via a questionnaire, and physiological measures, including heart rate monitoring and video-based motion tracking. Results show that subtle changes in audio design significantly affect emotional perception and immersion. Unconventional mixes produced greater variability in interpretation, while conventional immersive mixes led to stronger agreement across audiences. Notably, participants often reported perceived visual changes despite no alterations to the visual content.
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...
I'm Nelly Garcia. I'm an engineer in communications and electronics with the specialty in acoustics. Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London. My main interest is sound design, ways to create sounds from scratch, optimize the workflow of a sound designer and innovative ways to label, categorise or access samples...
Have you ever wondered how AES works? Let's meet up and talk about the benefits of volunteering and the path to leadership in AES! You could be our next Chair, Vice President, or even AES President!
With expertise in Dolby Atmos and immersive sound, Lars Tirsbæk leads the way in teaching studio production at Sonic College. His innovative approach combines the best of both studio and live sound, focusing on efficient workflows, technical tools, and the creative process. Additionally... Read More →
Brecht De Man is Head of Research at PXL-Music, guest lecturer at the Royal Conservatoire of The Hague, and author of Intelligent Music Production (Routledge 2019). He holds a PhD from the Centre for Digital Music at Queen Mary University of London, where he developed and evaluated... Read More →
Thursday May 28, 2026 2:30pm - 3:30pm CEST, Aud 41, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
Boundary conditions are a critical part of room acoustic simulations. In the case of ray tracing, absorption coefficients of nearly all materials are measured and provided. However, wave-based simulations face several issues. The first one is the variety of boundary conditions used. Depending on the method, surface impedance or admittance might be needed, either in the frequency or in the time domain, as an angle-dependent or averaged variable. This limitation hinders the development of a standard measured quantity for boundary conditions in wave-based simulations. In turn, this leads to the second issue encountered, which is the lack of widely available data to describe the characteristics of the different materials commonly found in rooms. In this study, a deep neural network has been trained to estimate the material properties of porous absorbers from their absorption coefficient in octave bands. These estimated material properties can then be used to calculate any boundary condition needed. This method thus makes it possible to characterize the boundary conditions for any type of room acoustic simulation from the most commonly available data. Moreover, it provides a new tool to identify the sound absorber corresponding to a desired absorption profile during the design phase of a project. The training dataset in this study was generated from finite element method simulations. The poroelastic properties of the material, the sample thickness, as well as the depth of the air cavity backing the material were varied to create the training dataset.
Identifying robust headphone target curves is challenging when preference data from untrained listeners are interpreted without explicit perceptual structure. This work presents a methodological framework in which deep-learning-driven sensory-profile analysis serves as the primary interpretive layer for listening data. Candidate target curves are generated using an Interactive Differential Evolution (IDE) listening experiment that combines paired comparisons with a second-stage absolute-rating task, enabling continuous exploration of the perceptually relevant tuning space while reducing cognitive load. Converged gain sets are analyzed using a Virtual Listener Panel (VLP), a Deep Learning (DL) model trained on large-scale expert evaluations to predict perceptual attributes from rendered musical material. Predicted attributes are reported as relative scores along key sensory dimensions, including bass strength, timbral balance, and brilliance, enabling exploration of sensory clusters, perceptual trade-offs, and potential families of target tunings. Adaptive listening data from three culturally distinct listener panels (Denmark, Japan, and Colombia; 20 participants per site) support the DL-based interpretation. Convergence is quantified as a reduction in population variance, and cross-site analyses assess the similarity of clustering structures and the consistency of relationships between preference and sensory attributes. Overall, the framework provides a scalable, perceptually grounded approach to interpreting listener-preference data when developing headphone target curves.
Perceptual Audio Evaluation Specialist, FORCE Technology
▪ Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry. ▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe... Read More →
Sa quintina is a distinctive emergent vocal phenomenon almost exclusively associated with the sacred polyphonic singing tradition of Castelsardo, perceived as an autonomous “fifth voice” arising during collective performance by four male singers. Although widely acknowledged in ethnomusicological literature, its formation mechanisms remain only partially explored within audio engineering and acoustical research. This paper presents an early-stage, descriptive sonological case study proposing new hypotheses on the formation and spatial reinforcement of sa quintina. The phenomenon is interpreted as a physically grounded, measurable outcome of harmonic fusion and spatial interference, observable through spectral energy distribution and coherence. It is hypothesized to emerge from a converging set of conditions (including non-tempered harmonic textures, differentiated vocal emission techniques, intentional formant tuning, and circular spatial configuration), none of which is assumed to be strictly sufficient in isolation. Building upon previous spectral coherence analyses, the study introduces a Quintina Directionality Index (QDI) to quantify the spatial dimension of the phenomenon. QDI is defined as the ratio between spectral energy in two frequency bands associated with sa quintina (600–750 Hz and 1200–1400 Hz) and total spectral energy. The index is evaluated as a function of direction using ambisonic recordings in an anechoic chamber and as a function of microphone position in a controlled field setting. Preliminary observations suggest that sa quintina corresponds to localized regions of enhanced spectral coherence and energy reinforcement, supporting its interpretation as an emergent physical phenomenon that precedes and enables its perceptual salience, rather than a purely auditory illusion.
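The QDI definition in the abstract above can be illustrated with a minimal Python sketch. It assumes the energies of the two bands are summed before dividing by the total, and that a power spectrum is already available as parallel lists of bin frequencies and bin powers; the function names and the synthetic spectra are hypothetical and are not taken from the paper.

```python
def band_energy(freqs, power, lo, hi):
    """Sum the power of all bins whose frequency lies in [lo, hi] Hz."""
    return sum(p for f, p in zip(freqs, power) if lo <= f <= hi)


def qdi(freqs, power, bands=((600.0, 750.0), (1200.0, 1400.0))):
    """Ratio of in-band spectral energy to total spectral energy.

    Assumption: the two band energies are added together; the abstract
    does not spell out how the bands are combined.
    """
    total = sum(power)
    if total == 0:
        return 0.0
    in_band = sum(band_energy(freqs, power, lo, hi) for lo, hi in bands)
    return in_band / total


# Hypothetical spectra on a 50 Hz grid from 0 to 2000 Hz (41 bins).
freqs = [50.0 * k for k in range(41)]

# Flat spectrum: 9 of the 41 bins fall inside the two bands.
flat = [1.0] * 41
qdi_flat = qdi(freqs, flat)

# Same grid with the in-band bins boosted, as one direction might show.
boosted = [3.0 if (600 <= f <= 750 or 1200 <= f <= 1400) else 1.0 for f in freqs]
qdi_boosted = qdi(freqs, boosted)
```

Evaluating `qdi` once per ambisonic look direction or per microphone position would give the direction-dependent or position-dependent index that the abstract describes.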
Jim and Ulrike have been recording in and for immersive audio for broadcast, film and audiophile staples for decades. They specialize in turning traditional acoustic New York Studio recordings into vast spatial experiences. The audiences will be experiencing the breathtaking virtuosity of the likes of Jane Ira Bloom, the Secret Trio, Donald Vega and large format ensembles under Franco Ambrosetti and Jim Pugh.
This masterclass series, featuring remarkable recording artists, is a chance to hear 3D audio at its best; as we discuss qualities that make it truly worth the effort.
In each masterclass, we explore the new spatial possibilities in recording and production, detailing also this specific listening room, regarding ITU-R BS.1116 compliance and auditory envelopment (AEV) transparency. Seats are limited to keep playback variation at bay.
Jim has been the President of the AES Educational Foundation since 2020 and is a professor of recorded music with the Clive Davis Institute of Recorded Music in the Tisch School of the Arts at New York University. Jim was the Institute’s Chair from 2004 – 2008. A graduate of the... Read More →
AES Technical Committee on "NETWORK AUDIO SYSTEMS"
The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole. Connect and engage!
Eclipsa Audio, based on the Immersive Audio Model and Format (IAMF) specification developed by members of the Alliance for Open Media, represents an open and royalty-free approach to immersive audio creation and delivery. Eclipsa Audio provides a growing ecosystem for producing and distributing spatial audio content, with hardware integration and streaming platform support, including YouTube, actively being rolled out. This panel brings together practitioners, researchers, and engineers directly involved in the development of IAMF and Eclipsa Audio to inform the audio engineering community about the current state of the format and its evolving toolkit. Presenters will provide an overview of the specification's design principles, discuss the collaborative research and development effort behind the Open Audio Renderer (OAR) and Open Audio Codec (OAC), introduce the content creation tools currently available within the Eclipsa Audio ecosystem, and propose practical workflows for immersive audio production and delivery. The session will include presentations followed by an open discussion addressing format interoperability, integration with existing production environments, listener experience considerations, and future directions for development. Audience participation is encouraged.
Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects... Read More →
With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives... Read More →
Join us to hear the finalists selected for this category of the Student Recording Competition. We will hear their presentations and recordings, and comments and feedback from the judges. Award and prize placements will be announced on the last day of the convention.
Kseniya Kawko is a producer and recording engineer specialized in classical music and jazz. She holds Master of Music degrees from two world-renowned audio programs: Sound Recording, McGill University (Montréal, Canada) and Musikregie / Tonmeister, Hochschule für Musik Detmold (Germany... Read More →
AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active... Read More →
CUNY LaGuardia Community College, CUNY LaGuardia Community College
New York City
Thursday May 28, 2026 3:00pm - 4:00pm CEST, Building 302, 2nd floor, Technical University of Denmark, Asmussens Alle, Building 302, DK-2800 Kgs. Lyngby, Denmark
This paper presents a method for extracting a center signal from two-channel stereo signals for upmixing and reproduction with additional center loudspeakers. It uses a generative adversarial network with a generator trained with multiple reconstruction losses and adversarial losses obtained from a discriminator. The processing is of low computational complexity, is causal, and can be configured for latencies down to one audio frame of 46 ms length. The paper describes how training data are created using only publicly available signals and how the generation of target data makes it possible to control the attenuation of diffuse signals and direct signals panned off-center. An evaluation with a listening test and the computational metrics SI-SDR and F2 measure is presented. It shows an advantage compared to methods based on classical signal processing in terms of computational metrics for source separation and listener preference.
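For readers unfamiliar with SI-SDR, one of the computational metrics named above, the following is a minimal Python sketch of the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The short signals are hypothetical, and the paper's own evaluation code may differ in details such as mean removal or framing.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")
    # Optimal scaling of the reference that best explains the estimate.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf      # estimate is a scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf     # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


# Hypothetical four-sample signals, for illustration only.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]

score = si_sdr(estimate, reference)
# Rescaling the estimate leaves the score unchanged (scale invariance).
louder = si_sdr([3.0 * x for x in estimate], reference)
```

Scale invariance matters for a center-extraction task because an overall gain difference between the extracted and the target signal should not be counted as distortion.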
Chief Scientist, Fraunhofer Institute for Integrated Circuits IIS
Christian Uhle is chief scientist in the Audio and Media Technologies division of the Fraunhofer IIS, Erlangen, Germany, and in the International Audio Laboratories Erlangen. He received the Dipl.-Ing. and PhD degrees from the Technical University of Ilmenau, Germany, in 1997 and... Read More →
This work presents a measurement uncertainty evaluation of the free-field sensitivity of a MEMS microphone using a substitution comparison method. The measurement setup is based on principles used in secondary microphone calibration, with sensitivity determined relative to a calibrated reference microphone. The uncertainty analysis follows the Guide to the Expression of Uncertainty in Measurement (GUM), where Type A and Type B uncertainty evaluations are propagated through a defined measurement model to obtain the final measurement result. The MEMS microphone sensitivity is estimated together with an expanded uncertainty, where the calibration uncertainty of the reference microphone is identified as the dominant contributor. Broadband results show that the measured sensitivity is close to the typical manufacturer sensitivity over a wide frequency range and follows a similar frequency trend. The proposed approach enables reproducible estimation of the free-field sensitivity of MEMS microphones and provides a clear framework for uncertainty evaluation.
Live music environments can be simulated and evaluated through spatial audio and augmented reality (AR) technology. However, conducting perceptual studies on AR environments can be challenging, as multiple design considerations and uncontrolled variables come into play. Hence, we developed Naviqual, a tool to create a spatial audio quality map for a virtual live music environment. We generated objective quality contour and polar maps to predict the quality of experience (QoE) across listener locations and directions, respectively. We found that these maps strongly aligned with perceptual evaluations by normal-hearing listeners through listening tests. We also found that binaural objective metrics and signal-to-noise ratio both strongly predict QoE across listener translations, with the former outperforming the latter in predicting QoE across listener directions. Overall, Naviqual provides a QoE map for virtual live music environments robust across various listener locations and directions, noise locations, music content, and room acoustics.
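A GUM-style uncertainty budget of the kind described in the MEMS microphone abstract can be sketched in a few lines of Python. The sketch assumes a simple substitution model in decibels (device sensitivity equals reference sensitivity plus the measured level difference), uncorrelated inputs with unit sensitivity coefficients, and a coverage factor of k = 2. All numerical values and names are hypothetical and are not taken from the paper.

```python
import math
import statistics


def type_a(readings):
    """Type A: experimental standard deviation of the mean."""
    return statistics.stdev(readings) / math.sqrt(len(readings))


def type_b_rectangular(half_width):
    """Type B: standard uncertainty of a rectangular distribution."""
    return half_width / math.sqrt(3)


def combined(*components):
    """Root-sum-of-squares of uncorrelated standard uncertainties."""
    return math.sqrt(sum(u * u for u in components))


# Hypothetical inputs, all in dB.
s_ref_db = -26.0                                   # reference mic, dB re 1 V/Pa
level_diff_db = [-12.1, -11.9, -12.0, -12.2, -11.8]  # repeated DUT-minus-reference levels

# Assumed measurement model: S_dut = S_ref + mean(level difference).
sensitivity_db = s_ref_db + statistics.mean(level_diff_db)

u_a = type_a(level_diff_db)          # repeatability (Type A)
u_ref = 0.3 / 2.0                    # certificate states U = 0.3 dB at k = 2 (Type B)
u_res = type_b_rectangular(0.05)     # analyzer resolution, +/-0.05 dB (Type B)

u_c = combined(u_a, u_ref, u_res)    # combined standard uncertainty
U = 2.0 * u_c                        # expanded uncertainty, k = 2

print(f"S = {sensitivity_db:.2f} dB re 1 V/Pa, U = {U:.2f} dB (k = 2)")
```

With these made-up numbers the reference calibration term is the largest component, which mirrors the abstract's finding that the reference microphone dominates the budget; a real budget would list more contributors and their sensitivity coefficients.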
Audio engineering often implicitly assumes a uniformity in hearing across listeners; this is an assumption that does not reflect real-world diversity. How could technologies and practices in production, mixing, and reproduction be adapted to create music that is more inclusive? While the AES has a conference series on Audio and Music Induced Hearing Disorders, this has focused on the causes of hearing loss with little on audio engineering for listeners who have a hearing loss.
In western countries, about one in three adults are deaf, have hearing loss or suffer from tinnitus. Hearing loss can lead to many challenges with music such as: inaudibility of quieter passages, distortion, degraded pitch perception, and difficulty in identifying and picking out lyrics and instruments. The most common intervention for mild to moderately severe hearing loss is hearing aids. But while many of these devices have music programs, their efficacy is mixed, to the point that many opt not to use them. With the rise of machine learning within Audio Engineering, there are opportunities to better personalise music, and therefore address issues listeners face. Consumer devices are also increasingly having audio accessibility features added, but the usefulness of these lacks independent testing. This workshop will consider opportunities for making music more accessible.
The workshop will start by exploring how hearing loss harms the experience of listening to music and how this varies between people. This will lead to discussion of why no technology can fully ‘correct’ music to achieve a ‘perfect’ listening experience for those with hearing loss. There is no technology to recreate a ‘golden-ears’ experience. This leads to a key research question: what is the best rendition of a piece of music for someone who has hearing loss? What do listeners want from music, and how can we get closest to achieving that?
We will bring in findings from research projects and listening tests to explore what is known, and also to highlight that there are significant gaps in knowledge that require further research. We will then explore state-of-the-art in wearables such as hearing aids and sound reproduction systems. This will include the current Cadenza project, which has been running a series of machine learning challenges to improve music for those with hearing loss.
Throughout, we will encourage questions and engagement from delegates. We want to hear about lived experience of hearing difference and how that has changed professional practice and personal lives. We are also keen to hear suggestions from delegates on what approaches might be used to improve music for those with hearing loss.
We aim to raise awareness of the importance of considering diverse audiences in Audio Engineering practice. Where possible, the workshop will provide practical guidance for audio engineers, highlighting techniques and emerging technologies that can better support listeners with diverse hearing profiles.
The workshop will be organised by the Cadenza Project team (https://cadenzachallenge.org/), a large UK-funded project about improving music for those with hearing loss.
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured... Read More →
Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional perceptual objective while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations demonstrate improved performance over strong model-based baselines in both signal fidelity and spatial accuracy metrics. Subjective listening tests further confirm that Flow-HOA yields higher overall sound quality with reduced artifacts.
This paper presents an improved method for characterizing integrated microphone arrays for Device‑Related Transfer Function (DRTF) synthesis. A probe‑array extension of the IMPro technique is introduced to measure all device microphones simultaneously, eliminating unknown timing offsets that arise in asynchronous device–probe recordings. A custom four‑element probe array and a modular test jig were developed to evaluate relative inter‑channel propagation delay (RIPD) accuracy across varied microphone‑port geometries. Hybrid free‑field DRTFs were synthesized by combining IMPro data with Boundary Element Method (BEM) acoustic scattering simulations, demonstrating that the probe‑array measurements capture small delay variations essential for precise spatial‑audio modeling. The extended IMPro method offers a practical, scalable alternative to anechoic‑chamber measurements for modern multi‑microphone devices.
The phenomenon in which listeners’ impressions of music are unintentionally altered even when the same sound source is played back remains an important issue. Previous research has shown that the state and combination of audio equipment affect the characteristics of nonlinear distortion in music playback. Hence, we conducted a subjective evaluation of auditory and musical impressions using sound sources with various nonlinear distortions. However, the subjective evaluation was unstable and difficult to assess. The reason was that the sound change was perceived emotionally as a slight change in sound image and musicality, and the interpretation of evaluation terms varies widely among subjects due to the difficulty of verbalizing the impression. Therefore, we evaluated the change in listeners’ stress caused by nonlinear distortion in music playback using photoplethysmography (PPG). In this study, we conducted a follow-up experiment with improved accuracy. In the experiment, 41 subjects listened to sound sources with even-order harmonic distortion at 2.69% THD, odd-order harmonic distortion at 2.69% THD, and no distortion. The musical piece used for the sound sources is an original composition, chosen to eliminate familiarity with and bias toward existing music. We evaluated changes in subjects’ stress states using the mean pulse-pulse interval (PPI) and the root mean square of successive differences (RMSSD), computed from the PPG signal, as indicators of stress. These results reconfirm that nonlinear distortion in music playback affects listeners’ vital responses, as evidenced by significant differences in both mean PPI and RMSSD, as assessed by Cochran's Q test at the 5% significance level.
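The two stress indicators named in the abstract above, mean PPI and RMSSD, follow standard definitions and can be sketched in Python as below. The sketch assumes that pulse peaks have already been detected in the PPG signal and are given as timestamps in seconds; the peak times are hypothetical, and peak detection and artifact rejection are out of scope.

```python
import math


def pulse_intervals_ms(peak_times_s):
    """Successive pulse-to-pulse intervals in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]


def mean_ppi(intervals_ms):
    """Mean pulse-pulse interval in milliseconds."""
    if not intervals_ms:
        raise ValueError("need at least one interval")
    return sum(intervals_ms) / len(intervals_ms)


def rmssd(intervals_ms):
    """Root mean square of successive interval differences, in milliseconds."""
    if len(intervals_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical PPG peak times in seconds (intervals of 800, 810, 790, 805 ms).
peaks = [0.0, 0.8, 1.61, 2.4, 3.205]
ppi = pulse_intervals_ms(peaks)

avg_interval = mean_ppi(ppi)   # about 801.25 ms
variability = rmssd(ppi)       # about 15.55 ms
```

In a study like the one described, these two values would be computed per subject and per listening condition before any statistical comparison across conditions.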
Stefan reports from the front lines of recording, mixing, and live streaming immersive music, highlighting the technical and creative challenges of delivering three-dimensional sound in real time. He shares practical insights into spatial mixing, format compatibility, and the realities of reliable immersive streaming across diverse playback environments.
This masterclass series, featuring remarkable recording artists, is a chance to hear 3D audio at its best; as we discuss qualities that make it truly worth the effort.
In each masterclass, we explore the new spatial possibilities in recording and production, detailing also this specific listening room, regarding ITU-R BS.1116 compliance and auditory envelopment (AEV) transparency. Seats are limited to keep playback variation at bay.
Stefan Bock, born 20.08.1964 in southern Germany, started his career in 1987 as an audio engineer. After freelancing in different facilities in Munich, he co-founded msm-studios in 1991, where he was the Chief Mastering Engineer and General Manager.
The AES Technical Committees (TC) lead the Society's involvement in science and technology, and are a hub of networking, knowledge and expertise. Each TC specializes in a specific area of audio, and helps forge links between each of these areas and the society as a whole. Connect and engage!
The demand for wireless audio expands constantly, while the available RF spectrum over recent decades has shrunk and become more crowded. This session will explore strategies for making wireless audio work cleanly and reliably, essential information for live production, as well as TV and film production.
These sessions are an opportunity for AES student members to receive feedback on their mixes from a panel of industry professionals, in a live, non-competitive setting. Join us to hear mixes by other students, and get tips, tricks, and advice to push your skills to the next level! Mixes can be submitted in advance by following the instructions posted at: https://www.aesstudents.org/competitions/student-mix-critiques/ Very limited on-site submission may also be possible. Maybe one of your mixes can be featured!
AES / Kansas City Kansas Community College / off-beat-open-hats LLC, AES
Dr. Ian Corbett is the Coordinator and Professor of Audio Engineering and Music Technology at Kansas City Kansas Community College. He also owns and operates "off-beat-open-hats LLC", providing live sound, audio production, and recording services to clients in the Kansas City area. Highly active... Read More →
Thursday May 28, 2026 4:00pm - 5:00pm CEST, Building 302, 2nd floor, Technical University of Denmark, Asmussens Alle, Building 302, DK-2800 Kgs. Lyngby, Denmark
This paper presents Part 2 of our study on personalized timbre optimization for stereophonic sound reproduction via earphones, following our previous work presented at the AES International Conference on Headphone Technology in 2025. While Part 1 established a novel auditory-model-based framework for reproducing a listener’s natural timbre reference and demonstrated its perceptual validity under controlled conditions, the present study focuses on the practical implementation and validation of this approach for real-world use with consumer True Wireless Stereo (TWS) earphones.
Conventional headphone and earphone personalization techniques primarily target spatial audio reproduction or rely on preference-based equalization, often overlooking the accurate reproduction of natural timbre in stereophonic content. Our approach explicitly addresses this limitation by isolating and optimizing perceptually relevant timbral cues while excluding spatial encoding components, thereby improving timbral fidelity without degrading stereo imaging.
The proposed method originally consists of four stages: high-resolution anatomical scanning of the listener’s upper body, including the pinnae; individualized HRTF computation using the boundary element method; selective removal of spatial encoding components to derive a personalized reference target response curve (PR-TRC); and perceptual optimization using a listener-specific weighting coefficient grounded in auditory reference fidelity rather than preference. In this paper, each stage is simplified and automated using smartphone-based scanning and AI-assisted processing, enabling end users to complete the entire personalization process via a smartphone connected to a cloud-based server. The resulting personalized target response curve is implemented within the computational and memory constraints of the DSP pipeline of commercial consumer TWS earphones.
A subjective evaluation using the Semantic Differential Method was conducted to assess the perceptual impact of the simplified implementation. Twenty-four listeners evaluated personalized target curves generated by both the original and simplified methods, as well as two non-personalized target curves commonly used in commercial TWS earphones. The results show that both personalized methods consistently outperform non-personalized conditions in overall sound quality and listener preference. Importantly, no statistically significant degradation in perceived timbral naturalness was observed between the simplified and original methods.
These findings demonstrate that auditory-model-based personalized timbre optimization can be effectively translated into a practical, consumer-ready technology. The proposed approach represents a foundational contribution to future audio personalization and has broad applicability across headphone and earphone systems for stereophonic sound reproduction.
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic... Read More →
While Neural Audio Codecs (NAC) have revolutionized monaural audio compression, achieving high-fidelity dual-channel coding at low bitrates remains a significant challenge. Existing approaches often rely on naive independent channel quantization, leading to phase incoherence, or entangled latent modeling, which sacrifices spatial precision for spectral energy. This paper proposes a novel dual-channel coding framework based on content-spatial disentanglement. Reframing spatial reconstruction as an informed source separation task, our architecture synergizes a frozen, pre-trained DAC encoder for robust mono content preservation with a parameter-efficient side information encoder that predicts fine-grained time-frequency masks. To ensure precise spatial imaging, we introduce explicit physical constraints into the end-to-end training. Experimental results indicate that at low bitrates of 9 and 11 kbps, the proposed method outperforms state-of-the-art dual-mono neural baselines and industry standards in both objective spatial metrics and subjective MUSHRA evaluations.
Audio engineering standards often present as objective, yet they frequently rely on a systemic data bias which Perez characterises as the 'default male bias' [1]. This paper examines the hegemony of the male ear, a system of norms that privileges masculine modes of hearing by prioritizing technical structure and text over affective experience and timbre [2]. By transitioning from a visual-centric auditory gaze toward an embodied sonic gnosis, researchers can recover haptic and physiological ways of knowing sound. Drawing on the feminist listening praxis of the Female Ear [3], this work explores the recording studio as an analytical space where sonic microaggressions [4] enforce rigid technical standards. The author argues for a new audio praxis that centers ear pleasures [5], validating subjective and affective sensory data as legitimate engineering input. This approach seeks to dismantle the regulatory fiction [6] of a universal hearing standard, promoting a pluralistic understanding of musicking [7] that is inclusive of non-normative perspectives.
This year's famous Richard Heyser Memorial Lecture will be given by Professor Dorte Hammershøi from Aalborg University.
Throughout a distinguished academic career, the lecturer’s work in measuring outer ear transfer functions and headphone characteristics served not only to develop and refine methods for binaural recording and reproduction, but eventually provided a stepping stone into the field of technical audiology and hearing-aid rehabilitation. In 2026, an earphone is rarely just a sound reproduction device, and a hearing aid is rarely just a medical device. The talk will give highlights from 36 years of work in the field, and discuss what the presenter considers to be the contemporary challenges when earphones become hearing aids and vice versa. Finally, the presenter may address the challenges of creating audio-only virtual reality for blind gamers.
Technical Council Vice Chair, Audio Engineering Society
Worked in various fields of audio – digital mixer design at Wheatstone (broadcast), DSP at Motorola (consumer, professional), R&D and product development at THX (amplification, line arrays, automotive sound), engineering strategy as CTO of Audio Precision (test & measurement); worked... Read More →
This is the social start of the convention, following directly after the famous "Richard Heyser Memorial Lecture" given by Professor Dorte Hammershøi with the title: "From head-related transfer functions to risk of damage and hearing rehabilitation"
There will be drinks, snacks, and live music with the vocal ensemble "Tonika"!
Come and join us - catch up with your connections and make new connections!
Thursday May 28, 2026 6:30pm - 7:30pm CEST, Foyer Building 303A, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark