This work presents the results of a perceptual study investigating how a virtual acoustics system installed in the live room of a professional recording studio influences musicians. The study analyzed relationships between a selection of objective acoustic parameters (T30, STLate, LJ) and the subjective perceptions of 19 solo musicians performing under 11 different acoustic conditions. The experiment was conducted using the VAT (Virtual Acoustic Technology) system and the VAT Suite software developed at the Immersive Media Laboratory (IMLab) in the Sound Recording Department at McGill University. Correlations between quantitative and qualitative analyses show that musicians’ preferences converge on conditions with T30 ≈ 1 s, and that late and lateral energy increases the perception of spatiality, providing a positive balance between clarity and acoustic support. However, longer reverberation reduces comfort and executive control.
A “phantom image” is the illusion of an independent sound source created by two or more loudspeakers. Most often created by manipulating level differences between stereophonic channels (aka “panning”), the effect is used to create a sense of auditory space between loudspeakers and is largely taken for granted. In recent years, surround and immersive audio systems have attempted to use phantom image processing to render audio objects in desired positions across multiple loudspeaker arrays. This research examined the efficacy of phantom image perception horizontally and vertically from an active listener perspective. After listening to a target loudspeaker, listeners (n = 442) were asked to move a phantom sound to a position matching that of the target loudspeaker. The listener’s phantom placement was then compared to the target, and subjects were allowed to “correct” their phantom position. The horizontal experiment was based on a standard stereophonic 60° loudspeaker array with the target loudspeaker at 15° off center. The vertical experiment used elevated loudspeakers in a 60° arc with the target loudspeaker elevated 10° above the horizon (lower loudspeaker). Results show nearly universal “undershoot” in horizontal placement error on first attempts, with gradual improvement over trials that coalesced around the projected target location. However, after repeated tries, final perceptual image locations were spread over 2/3 of the sound field around the target loudspeaker. In the vertical trials, perceptual locations were spread across the entire sound field in all three trials and failed to show any pattern of coalescence around the target loudspeaker.
Associate Professor of Audio Engineering Technology, interested in the perception and cognition of music and sound, especially timbre and attention. An amateur historical keyboardist. And my first name sounds like "song-he" as in "The song he sang was beautiful."
Target curves for the sound signature of headphones are a helpful design target during the development process. While much attention has been paid to finding target curves that match the listening preferences of consumers, equivalents for studio headphones date back to the 1990s. In the context of music production, a mutual target or even a standard is essential to make mixing and mastering more gear-independent. This becomes even more important as the main tool for sound engineers shifts from loudspeakers in professional environments, such as acoustically treated studios, to headphones, often additionally equipped with virtualization algorithms. This enables them to be more flexible and to rely less on potentially expensive loudspeaker setups. The diffuse-field target curve, which is currently still the only standardized target curve for studio headphones, is often reported not to match a real loudspeaker equivalent in studio environments. In this paper, we aim to find a new standard target curve for studio headphones that emulates the frequency response of a loudspeaker setup in modern studio environments. For this, we give an overview of current target curves and match them to their equivalent loudspeaker setups. Based on that, we propose a new methodology for a measurement-based target curve incorporating typical panning paradigms of music signals, based on measurements inside multiple control rooms. To verify the results, we conduct listening tests with professionals in multiple studio environments.
Deep learning has significantly improved speech enhancement performance in controlled laboratory conditions, yet these advances rarely translate into robust real-world benefit for hearing aid users. Current algorithms are trained and evaluated in simplified acoustic scenarios, neglecting multimodal cues, user interaction, environmental dynamics, and the strict latency and power constraints of embedded devices. As a result, a persistent gap remains between algorithmic performance and everyday listening experience. This position paper reviews recent progress in speech enhancement, embedded Artificial Intelligence hardware, and hearing aid systems, and argues for a shift toward ecologically valid evaluation and hardware-aware design. We propose virtual reality as a reproducible, multisensory benchmarking platform enabling joint assessment of human perception and algorithmic processing. This perspective outlines a research roadmap toward adaptive, context-aware, and practically deployable hearing technologies.
Few studies exist on the perception and measurement of nonlinear distortion in headphones. This paper reports the detection thresholds and perceived sound quality from real distortion in headphones. Five different distortion measurements were made on the headphones to determine how well they predict audibility and quality. Music samples were binaurally recorded on six headphones at playback levels ranging from 85 to +110 dBA at 3 dB increments. The recordings were reproduced at a normal playback level (83 dBA) through a reference headphone with low distortion. The headphone recordings were post-processed to remove both level and frequency response differences so that only nonlinear distortions and residual noise remained. In a second test, listeners rated the similarity in quality of headphones relative to an undistorted reference and a hidden version of it. The results provide evidence that audible distortion in headphones with music occurs at significantly higher playback levels (104 to 112 dBA SPL) than what is considered typical and safe. The percentage of measured THD in the headphone had the highest correlation with the detection thresholds, while the non-coherent distortion with music best predicted the similarity ratings. We discuss the results and the practical implications they might have for future headphone design, testing, and measurement.
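As a rough illustration of the THD percentage mentioned in this abstract, the sketch below computes THD relative to the fundamental from harmonic amplitudes. The abstract does not say which THD definition or measurement procedure the authors used; the fundamental-referenced form and the function name `thd_percent` are assumptions made for this example only.

```python
import math
from typing import Sequence


def thd_percent(fundamental: float, harmonics: Sequence[float]) -> float:
    """THD in percent, referenced to the fundamental (an assumed definition).

    fundamental: RMS amplitude of the fundamental component.
    harmonics:   RMS amplitudes of the 2nd, 3rd, ... harmonics.
    """
    if fundamental <= 0.0:
        raise ValueError("fundamental amplitude must be positive")
    # Root-sum-square of the harmonic amplitudes, divided by the fundamental.
    harmonic_rss = math.sqrt(sum(h * h for h in harmonics))
    return 100.0 * harmonic_rss / fundamental


if __name__ == "__main__":
    # Hypothetical amplitudes, not data from the study.
    print(round(thd_percent(1.0, [0.03, 0.04]), 3))
```

Other definitions divide by the total RMS level instead of the fundamental; at low distortion the two give nearly the same number.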
This work presents a perceptual model based on a complex IIR filterbank. The filterbank, with a frequency resolution of 4 bands per Bark, consists of 104 filters whose slopes are designed to take spectral masking effects into account. The filter outputs are used to obtain masking thresholds with the following post-processing. To obtain reasonable masking thresholds from the spreading outputs, a postmasking stage is required. Here, we propose a comodulation-dependent adaptation of the postmasking decay to model Comodulation Masking Release (CMR) effects. This approach explicitly considers the dip-listening effect known from the literature. The final masking thresholds are obtained by weighting the postmasking outputs by a tonality-dependent gain, controlled using spectral flatness estimation. A listening test compares the proposed method to an already known approach that uses direct CMR-based modification of the masking threshold gains.
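The abstract mentions spectral flatness estimation as the control for the tonality-dependent gain. A minimal sketch of the standard spectral flatness measure (geometric mean over arithmetic mean of a power spectrum) is given below. How the authors map flatness to a gain is not stated, so that step is left out; the function name and the `eps` floor are assumptions of this example.

```python
import math
from typing import Sequence


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a noise-like (flat) spectrum; values near 0
    indicate a tonal (peaky) spectrum. `eps` floors the bins to avoid log(0).
    """
    if not power:
        raise ValueError("power spectrum must not be empty")
    floored = [max(p, eps) for p in power]
    # Geometric mean computed in the log domain for numerical stability.
    log_mean = sum(math.log(p) for p in floored) / len(floored)
    arithmetic_mean = sum(floored) / len(floored)
    return math.exp(log_mean) / arithmetic_mean


if __name__ == "__main__":
    # Hypothetical spectra, not data from the paper.
    print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))     # flat spectrum
    print(spectral_flatness([1e-6, 1e-6, 1.0, 1e-6]))  # single strong peak
```

In a band-wise model the measure would typically be evaluated per band or per frame, but the granularity used in the paper is not given in the abstract.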
EMORSION is an exploratory study examining how film audio design shapes audience emotion and immersion. It was conducted using scenes from four films in the horror (2) and drama (2) genres, with two mainstream and two independent productions. For each scene, multiple alternative audio mixes were created by systematically manipulating three core aspects of audio design: frequency (pitch), dynamics (loudness), and directionality (spatial placement). Three audience groups were exposed to the scenes in a cinema setting, with each group experiencing either one manipulated audio mix or a control mix. Audience responses were assessed through a multimodal framework combining self-reported emotion and immersion via a questionnaire, and physiological measures, including heart rate monitoring and video-based motion tracking. Results show that subtle changes in audio design significantly affect emotional perception and immersion. Unconventional mixes produced greater variability in interpretation, while conventional immersive mixes led to stronger agreement across audiences. Notably, participants often reported perceived visual changes despite no alterations to the visual content.
Josh Reiss is Professor of Audio Engineering with the Centre for Digital Music at Queen Mary University of London. He has published more than 200 scientific papers (including over 50 in premier journals and 6 best paper awards) and co-authored two books. His research has been featured...
I'm Nelly Garcia. I'm an engineer in communications and electronics with the specialty in acoustics. Now, I'm a PhD Researcher at the Centre for Digital Music (C4DM) at Queen Mary University of London. My main interest is sound design, ways to create sounds from scratch, optimize the workflow of a sound designer and innovative ways to label, categorise or access samples...
Identifying robust headphone target curves is challenging when preference data from untrained listeners are interpreted without explicit perceptual structure. This work presents a methodological framework in which deep-learning-driven sensory-profile analysis serves as the primary interpretive layer for listening data. Candidate target curves are generated using an Interactive Differential Evolution (IDE) listening experiment that combines paired comparisons with a second-stage absolute-rating task, enabling continuous exploration of the perceptually relevant tuning space while reducing cognitive load. Converged gain sets are analyzed using a Virtual Listener Panel (VLP), a Deep Learning (DL) model trained on large-scale expert evaluations to predict perceptual attributes from rendered musical material. Predicted attributes are reported as relative scores along key sensory dimensions, including bass strength, timbral balance, and brilliance, enabling exploration of sensory clusters, perceptual trade-offs, and potential families of target tunings. Adaptive listening data from three culturally distinct listener panels (Denmark, Japan, and Colombia; 20 participants per site) support the DL-based interpretation. Convergence is quantified as a reduction in population variance, and cross-site analyses assess the similarity of clustering structures and the consistency of relationships between preference and sensory attributes. Overall, the framework provides a scalable, perceptually grounded approach to interpreting listener-preference data when developing headphone target curves.
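The abstract states that convergence is quantified as a reduction in population variance. One plausible reading, sketched below, averages the per-band variance of the gain vectors in a population and reports the relative drop between an initial and a final generation. The data layout (one gain vector in dB per individual), the averaging across bands, and both function names are assumptions of this example, not details taken from the paper.

```python
from statistics import pvariance
from typing import Sequence

GainSet = Sequence[float]  # one gain per equalizer band, in dB (assumed layout)


def mean_band_variance(population: Sequence[GainSet]) -> float:
    """Average, over bands, of the variance of gains across the population."""
    n_bands = len(population[0])
    if any(len(ind) != n_bands for ind in population):
        raise ValueError("all gain sets must have the same number of bands")
    per_band = [pvariance([ind[b] for ind in population]) for b in range(n_bands)]
    return sum(per_band) / n_bands


def variance_reduction(initial: Sequence[GainSet], final: Sequence[GainSet]) -> float:
    """Relative reduction in population variance (1.0 means full convergence)."""
    v_initial = mean_band_variance(initial)
    if v_initial == 0:
        raise ValueError("initial population has zero variance")
    return 1.0 - mean_band_variance(final) / v_initial


if __name__ == "__main__":
    # Hypothetical two-band populations, not data from the study.
    first_gen = [[-6, 0], [6, 0], [0, 6], [0, -6]]
    last_gen = [[-3, 0], [3, 0], [0, 3], [0, -3]]
    print(variance_reduction(first_gen, last_gen))
```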
Perceptual Audio Evaluation Specialist, FORCE Technology
▪ Acoustics, psychoacoustics, product development, and digital communication as an Audio Engineer in the consumer electronics industry. ▪ Currently employed as a specialist at FORCE Technology's SenseLab department, contributing to enhancing sound quality in a wide range of consumer electronics products, collaborating with audio companies from across the globe...
Sa quintina is a distinctive emergent vocal phenomenon almost exclusively associated with the sacred polyphonic singing tradition of Castelsardo, perceived as an autonomous “fifth voice” arising during collective performance by four male singers. Although widely acknowledged in ethnomusicological literature, its formation mechanisms remain only partially explored within audio engineering and acoustical research. This paper presents an early-stage, descriptive sonological case study proposing new hypotheses on the formation and spatial reinforcement of sa quintina. The phenomenon is interpreted as a physically grounded, measurable outcome of harmonic fusion and spatial interference, observable through spectral energy distribution and coherence. It is hypothesized to emerge from a converging set of conditions (including non-tempered harmonic textures, differentiated vocal emission techniques, intentional formant tuning, and circular spatial configuration), none of which is assumed to be strictly sufficient in isolation. Building upon previous spectral coherence analyses, the study introduces a Quintina Directionality Index (QDI) to quantify the spatial dimension of the phenomenon. QDI is defined as the ratio between the spectral energy in two frequency bands associated with sa quintina (600–750 Hz and 1200–1400 Hz) and the total spectral energy. The index is evaluated as a function of direction using ambisonic recordings in an anechoic chamber, and as a function of microphone position in a controlled field setting. Preliminary observations suggest that sa quintina corresponds to localized regions of enhanced spectral coherence and energy reinforcement, supporting its interpretation as an emergent physical phenomenon that precedes and enables its perceptual salience, rather than a purely auditory illusion.
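The QDI definition in this abstract (energy in the 600–750 Hz and 1200–1400 Hz bands divided by total spectral energy) can be sketched as follows. The sketch assumes a power spectrum has already been computed for one direction or microphone position. Treating the band edges as inclusive, and the function name itself, are assumptions of this example; the paper may compute the spectrum and the bands differently.

```python
from typing import Sequence, Tuple

# Band edges in Hz as given in the abstract; inclusive edges are an assumption.
QUINTINA_BANDS_HZ: Tuple[Tuple[float, float], ...] = ((600.0, 750.0), (1200.0, 1400.0))


def quintina_directionality_index(
    freqs_hz: Sequence[float],
    power: Sequence[float],
    bands: Sequence[Tuple[float, float]] = QUINTINA_BANDS_HZ,
) -> float:
    """Ratio of spectral energy inside the given bands to total spectral energy."""
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0.0:
        raise ValueError("total spectral energy must be positive")
    in_band = sum(
        p for f, p in zip(freqs_hz, power)
        if any(low <= f <= high for low, high in bands)
    )
    return in_band / total


if __name__ == "__main__":
    # Hypothetical five-bin spectrum, not data from the study.
    freqs = [100.0, 650.0, 1000.0, 1300.0, 2000.0]
    spectrum = [1.0, 2.0, 1.0, 3.0, 3.0]
    print(quintina_directionality_index(freqs, spectrum))
```

Evaluating the function once per steering direction or microphone position would give the direction- or position-dependent index described in the abstract.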
Live music environments can be simulated and evaluated through spatial audio and augmented reality (AR) technology. However, conducting perceptual studies on AR environments can be challenging, as multiple design considerations and uncontrolled variables come into play. Hence, we developed Naviqual, a tool to create a spatial audio quality map for a virtual live music environment. We generated objective quality contour and polar maps to predict the quality of experience (QoE) across listener locations and directions, respectively. We found that these maps strongly aligned with perceptual evaluations by normal-hearing listeners in listening tests. We also found that binaural objective metrics and signal-to-noise ratio both strongly predict QoE across listener translations, with the former outperforming the latter in predicting QoE across listener directions. Overall, Naviqual provides a QoE map for virtual live music environments that is robust across various listener locations and directions, noise locations, music content, and room acoustics.
The phenomenon in which listeners’ impressions of music are unintentionally altered even when the same sound source is played back remains an important issue. Previous research has shown that the state and combination of audio equipment affect the characteristics of nonlinear distortion in music playback. Hence, we conducted a subjective evaluation of auditory and musical impressions using sound sources with various nonlinear distortions. However, the subjective evaluation was unstable and difficult to assess. The reason was that the sound change was perceived emotionally, as a slight change in sound image and musicality, and that the interpretation of evaluation terms varied widely among subjects because the impression is difficult to verbalize. Therefore, we evaluated the change in listeners’ stress caused by nonlinear distortion in music playback using photoplethysmography (PPG). In this study, we conducted a follow-up experiment with improved accuracy. In the experiment, 41 subjects listened to sound sources with even-order harmonic distortion at 2.69% THD, odd-order harmonic distortion at 2.69% THD, and no distortion. The musical piece used for the sound sources is an original composition, chosen to eliminate familiarity with and bias toward existing music. We evaluated changes in subjects’ stress states using the mean pulse-pulse interval (PPI) and the root mean square of successive differences (RMSSD), computed from the PPG signal, as indicators of stress. These results reconfirm that nonlinear distortion in music playback affects listeners’ vital responses, as evidenced by significant differences in both mean PPI and RMSSD, as assessed by Cochran's Q test at the 5% significance level.
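The two stress indicators named in this abstract, mean PPI and RMSSD, follow standard definitions and can be sketched as below. The sketch assumes the pulse-to-pulse intervals have already been extracted from the PPG signal, in milliseconds; peak detection and artifact rejection are not shown, and the function names are chosen for this example.

```python
import math
from typing import Sequence


def mean_ppi(ppi_ms: Sequence[float]) -> float:
    """Mean pulse-to-pulse interval in milliseconds."""
    if not ppi_ms:
        raise ValueError("at least one interval is required")
    return sum(ppi_ms) / len(ppi_ms)


def rmssd(ppi_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between adjacent intervals."""
    if len(ppi_ms) < 2:
        raise ValueError("at least two intervals are required")
    # Differences between each interval and the one that follows it.
    diffs = [later - earlier for earlier, later in zip(ppi_ms, ppi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical interval series in ms, not data from the study.
    intervals = [800.0, 810.0, 790.0, 800.0]
    print(mean_ppi(intervals), round(rmssd(intervals), 3))
```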
This paper presents Part 2 of our study on personalized timbre optimization for stereophonic sound reproduction via earphones, following our previous work presented at the AES International Conference on Headphone Technology in 2025. While Part 1 established a novel auditory-model-based framework for reproducing a listener’s natural timbre reference and demonstrated its perceptual validity under controlled conditions, the present study focuses on the practical implementation and validation of this approach for real-world use with consumer True Wireless Stereo (TWS) earphones.
Conventional headphone and earphone personalization techniques primarily target spatial audio reproduction or rely on preference-based equalization, often overlooking the accurate reproduction of natural timbre in stereophonic content. Our approach explicitly addresses this limitation by isolating and optimizing perceptually relevant timbral cues while excluding spatial encoding components, thereby improving timbral fidelity without degrading stereo imaging.
The proposed method originally consists of four stages: high-resolution anatomical scanning of the listener’s upper body, including the pinnae; individualized HRTF computation using the boundary element method; selective removal of spatial encoding components to derive a personalized reference target response curve (PR-TRC); and perceptual optimization using a listener-specific weighting coefficient grounded in auditory reference fidelity rather than preference. In this paper, each stage is simplified and automated using smartphone-based scanning and AI-assisted processing, enabling end users to complete the entire personalization process via a smartphone connected to a cloud-based server. The resulting personalized target response curve is implemented within the computational and memory constraints of the DSP pipeline of commercial consumer TWS earphones.
A subjective evaluation using the Semantic Differential Method was conducted to assess the perceptual impact of the simplified implementation. Twenty-four listeners evaluated personalized target curves generated by both the original and simplified methods, as well as two non-personalized target curves commonly used in commercial TWS earphones. The results show that both personalized methods consistently outperform non-personalized conditions in overall sound quality and listener preference. Importantly, no statistically significant degradation in perceived timbral naturalness was observed between the simplified and original methods.
These findings demonstrate that auditory-model-based personalized timbre optimization can be effectively translated into a practical, consumer-ready technology. The proposed approach represents a foundational contribution to future audio personalization and has broad applicability across headphone and earphone systems for stereophonic sound reproduction.
Kimio Hamasaki, an AES Fellow, is a producer and balance engineer for music recordings, a researcher in spatial audio, an educator in audio engineering and acoustics, and a consultant in audio engineering. He has recorded and produced numerous orchestral and operatic works with the Vienna Philharmonic...
Audio engineering standards often present as objective, yet they frequently rely on a systemic data bias which Perez characterises as the 'default male bias' [1]. This paper examines the hegemony of the male ear, a system of norms that privileges masculine modes of hearing by prioritizing technical structure and text over affective experience and timbre [2]. By transitioning from a visual-centric auditory gaze toward an embodied sonic gnosis, researchers can recover haptic and physiological ways of knowing sound. Drawing on the feminist listening praxis of the Female Ear [3], this work explores the recording studio as an analytical space where sonic microaggressions [4] enforce rigid technical standards. The author argues for a new audio praxis that centers ear pleasures [5], validating subjective and affective sensory data as legitimate engineering input. This approach seeks to dismantle the regulatory fiction [6] of a universal hearing standard, promoting a pluralistic understanding of musicking [7] that is inclusive of non-normative perspectives.
Recent advances in large-scale multichannel loudspeaker systems have enabled immersive concert formats that extend spatial control beyond conventional stereo and small multichannel configurations. High-density loudspeaker arrays (HDLAs) allow sound to be distributed across complex architectural spaces, challenging established distinctions between composition, performance, and live sound practice. In live contexts, however, the realization of spatial attributes is often constrained by system complexity, limited rehearsal time, and the lack of artist-facing spatial control interfaces. As a result, spatial realization and sound diffusion are frequently delegated to sound engineers, who translate artistic material to the acoustic and architectural conditions of the venue in real time.
This paper examines three immersive concerts presented during Sonic Days 2025 in Denmark, realized on both large-scale and small-scale multichannel loudspeaker systems. The concerts represent contrasting production contexts, including a site-specific spatial composition conceived explicitly for a high-density loudspeaker array and performances by artists whose practices are typically oriented toward stereo or small multichannel formats. Across these cases, spatialization functioned variously as compositional material, interpretive layer, and adaptive live-mixing practice.
The paper analyzes how control over spatial attributes is negotiated between artists and sound engineers in live immersive concert settings, and how this negotiation affects the interpretation of artistic intent and audience experience. Particular attention is given to the role of sound engineers as active mediators whose decisions shape spatial form, listening perspective, and the relationship between sound and architecture. The findings suggest that immersive concert formats redistribute creative agency across artists, technicians, and technological infrastructures, and point toward the need for revised conceptual frameworks for authorship, performance, and listening in large-scale spatial audio environments.
This presentation develops a conceptual framework for understanding how visitors cognize sound in museum exhibitions. While sound increasingly features in museum practice, research has focused primarily on measuring visitor enjoyment and engagement rather than examining the specific meanings sound generates. This gap reflects the absence of a framework conceptualizing sound's meaning-making capacities to guide empirical investigation. Drawing on scholarship from music studies, semiotics, phenomenology, and embodied cognition, I propose a seven-component spectrum identifying distinct yet interrelated meanings that sound can convey in museums: aesthetic, representational, emotional, sensorial, imaginative, social, and political. These meanings can be apprehended independently or in combination, typically through emergent, pre-conscious perception rather than deliberate awareness. The spectrum builds on the premise that museum sound meaning-making unfolds through dynamics internalized from early childhood as we attune to the world sonically. It draws on the notion of sound as a "sonic aggregate" (Grimshaw and Garner 2015), encompassing social, contextual, temporal, and embodied experiences, rather than reducing sound to wave phenomena. Visitors actively co-produce meanings by drawing on their moods, memories, knowledge, and imagination during exhibition encounters. Each meaning category is illustrated with exhibition case studies, demonstrating the spectrum's applicability across diverse sound-based multimodal museum practices, from popular music exhibitions to sound art installations. The spectrum aims to catalyze research through varied methodological approaches and to establish analytical standards for studying sound in museums, with potential adoption by international standardization bodies.
Sound Studies Researcher, INET-md | NOVA University Lisbon
A PhD in ethnomusicology and museum studies and a curator, I am committed to exploring the diverse meaning-making capabilities of sound when exhibited in museums, encompassing the representational, emotional, sensorial, and social, as well as its ability to foster imagination and...
Friday May 29, 2026 10:30am - 11:00am CEST, Aud 43, Technical University of Denmark, Asmussens Alle, Building 303A, DK-2800 Kgs. Lyngby, Denmark
This paper presents the perceptual evaluation of the Open Binaural Renderer (OBR), an open-source library developed for headphone-based rendering of Immersive Audio Model and Formats (IAMF) content. The evaluation followed an iterative framework in which findings from a pilot listening study informed the tuning of rendering profiles, and the resulting renderer was benchmarked against established proprietary solutions. In the pilot study, 19 expert listeners rated the Overall Listening Experience (OLE) of the initial prototype (OBRv1) and five external renderers across diverse audio content. Qualitative feedback was analysed using inductive coding to identify salient perceptual dimensions. The pilot revealed content-dependent performance and showed that a single default profile was inadequate, yielding mixed responses in both the numerical scale and the qualitative feedback and motivating the development of multiple rendering profiles in OBRv2. The main study evaluated two OBRv2 profiles targeting different reverberation characteristics (Direct and Ambient) alongside three top-performing external renderers. A total of 39 participants, divided into expert and non-expert groups, rated five perceptual attributes: Voice Quality, Envelopment, Externalisation, Overall Listening Experience, and Timbral Balance. Mixed-design ANOVA revealed significant main effects of renderer condition on all attributes. Pairwise comparisons showed that the OBRv2 Ambient profile achieved significantly higher OLE ratings than one proprietary renderer and reached statistical parity with the remaining two, representing a measurable improvement over the prototype. A trade-off between Voice Quality and Externalisation was observed, driven by the level of reverberation in each renderer. The results demonstrate that iterative, perceptually informed tuning can yield competitive binaural rendering quality in an open-source framework.
Professor of Audio Engineering, University of York
Gavin Kearney graduated from Dublin Institute of Technology in 2002 with an Honors degree in Electronic Engineering and has since obtained MSc and PhD degrees in Audio Signal Processing from Trinity College Dublin. He joined the University of York as Lecturer in Sound Design in January...
With 25+ years of media industry product development, Jani Huoponen is a seasoned expert in developing cutting-edge audio and video technologies for consumer devices and streaming systems. Joining Google in 2010, he’s served as a product manager across key multimedia initiatives...
Despite the growing number of hearing-impaired workers wearing hearing aids in occupational settings, understanding speech in multi-talker situations remains challenging. This difficulty is particularly pronounced in open-plan offices, where simultaneous talkers and room reverberation are prone to degrade speech intelligibility. While spatial cues are essential for segregating target speech from competing sources, hearing-aid signal processing may alter the binaural information that supports spatial hearing. Accurate evaluation of hearing-aid performance is therefore crucial. Objective speech intelligibility metrics offer an efficient alternative to time-consuming listening tests; however, their validity in complex spatial scenarios involving hearing-impaired listeners remains unclear. Monaural metrics such as HASPI account for individual hearing loss but neglect spatial information, whereas binaural metrics such as MBSTOI incorporate spatial cues but are primarily designed for normal-hearing listeners. This study evaluates the ability of existing objective metrics to predict speech intelligibility for hearing-aid users in multi-talker spatial environments. Listening tests are conducted on 20 hearing-impaired participants fitted with binaural hearing aids. Four types of multi-talker auditory scenes representative of open-plan offices are reproduced using a loudspeaker array. They involve a target speech signal combined with diffuse noise and a localized competing speech source. Objective measurements are performed using an acoustic mannequin fitted with the participants’ hearing aids. HASPI and MBSTOI values are computed from the binaural signals recorded at the eardrums, incorporating individual hearing losses. Objective predictions are compared with subjective intelligibility scores, and an ablation analysis is conducted to distinguish the effects of hearing loss modeling from those of binaural processing.
Situational awareness is a multisensory ability that enables individuals to perceive and appropriately take into account their immediate environment. This perception of the world through our senses is carried out continuously and unconsciously throughout the day. When auditory perception is degraded, an individual may no longer correctly perceive a doorbell, a water leak, or an alarm signal, which negatively affects quality of life and may lead to dangerous situations. Auditory perception can in particular be degraded by hearing loss, a common and widespread condition. The most common treatment consists of wearing hearing aids, which are mainly designed to improve speech intelligibility, especially in noisy environments. Feedback from hearing-impaired people and hearing-aid users indicates that, although auditory situational awareness has been recognised as an essential component of well-being, it remains insufficiently studied and requires further investigation. There is currently no standard method for assessing the extent to which one's situational awareness is affected by hearing impairment and the use of hearing aids. This is a complex process that requires assessing the perception of relevant sound events within a continuous stream of multisensory information, by individuals who have different subjective preferences. Most existing methods are limited to evaluating only a subset of the problem, such as identification and localisation of non-speech sound events. The rise of new technologies, such as virtual reality, enables the development of assessment methods within more realistic yet controlled environments. This study aims to review existing methods in order to highlight their limitations in addressing the issue at hand.
Headphone listening has become an integral part of everyday life, spanning music consumption, communication, online media, and increasingly, computer gaming. These diverse listening contexts make individual sound exposure highly variable and difficult to quantify. While music listening and occupational headphone use have been widely studied, sound exposure from gaming remains comparatively undocumented. This study investigated the relationship between self‑reported exposure through headphones and cochlear function assessed using transient evoked otoacoustic emissions (TEOAE). Forty‑one university students completed a detailed questionnaire on listening habits, and TEOAEs were recorded in both ears across five half‑octave frequency bands. Estimated weekly exposure levels were derived from participants’ reported durations and contexts of use. TEOAE amplitude, signal‑to‑noise ratio (SNR), and reproducibility showed clear frequency‑dependent patterns and small ear asymmetries, consistent with typical OAE behaviour. Only limited associations were found between self‑reported exposure and TEOAE measures, with significant effects emerging primarily for SNR and reproducibility in the highest‑exposure group. No consistent differences were observed between long‑term gamers and non‑gamers. These findings suggest that self‑reported exposure alone may be insufficient to detect subtle cochlear changes in young adults, and underscore the need for more precise exposure‑monitoring methods when evaluating recreational sound exposure risks.
Binaural rendering is typically assessed via timbre and localization accuracy, while its intrinsic spatial resolution is rarely quantified. This paper proposes a perceptual evaluation method based on Minimum Audible Angle (MAA) measurements to estimate the azimuthal just-noticeable difference (JND) introduced by binaural rendering algorithms. We systematically compared several rendering algorithms across eight reference azimuths using two participant-allocation paradigms. The results show that spatial resolution is significantly influenced by Ambisonic order and the choice of rendering algorithm, with MAA thresholds systematically decreasing as the truncation order increases. Furthermore, the proposed method successfully captures physiological spatial characteristics and identifies resolution limits imposed by reference angles. While both participant-allocation paradigms yield consistent qualitative trends, the repeated-measures design provides superior data stability. These findings demonstrate that the proposed MAA-based method is an effective tool for quantifying the spatial resolution of binaural rendering algorithms.
This study evaluates three Next-Generation Audio (NGA) rendering systems through listening tests using real-life audio content. The testing paradigm prioritized subjective preference over adherence to a ground-truth reference. Participants assessed perceptual spatial audio attributes in both 5.1 and 7.1.4 loudspeaker setups. The findings suggest that strict adherence to the rendering algorithm used during content creation is not mandatory in terms of listener preference. While not advocating disregarding artistic intent without consideration, this study proposes that such flexibility in reproduction can be an acceptable compromise.
Toni Hirvonen studied acoustics at the Helsinki University of Technology (now Aalto University), where he obtained a PhD in audio signal processing and spatial audio. After a position as a Marie Curie fellow, he has worked internationally in the audio industry since 2010. His projects...
Historically, music has developed primarily as a frontal phenomenon, thus limiting the expressive and perceptual potential related to sound space. The recent development of immersive audio systems opens new creative possibilities by expanding the artistic action space from a narrow frontal area to a complete sphere around the listener. The Ambisonic system (Scene-Based Audio), together with Object-Based formats and hybrid solutions, represents fertile ground for creative experimentation and the redefinition of workflows in the field of spatialized sound. In this new context, what is the role of the sound engineer, as an electroacoustic interpreter, in immersive musical artistic creation? The research is based on a multidisciplinary analysis that combines an in-depth study of current immersive audio technologies and their performance with observations of existing compositional and production approaches. Additionally, a comparative study is conducted on the design choices of the sound engineer as an interpreter, investigating workflows, emerging musical semantics, available tools, and the recovery of the historical repertoire. Particular attention is paid to an experiment aimed at investigating a correlation between the position of a sound and an emotional trigger in the listener. New directions emerge in the creative role of the sound engineer, who goes beyond the mere technical aspect to become an integral part of the compositional and interpretative process, harmonizing the relationship between technique and art.
Mashup is a distinctive form of music composition which integrates elements from existing songs to create a cohesive audio experience. The digital music landscape, with various audio processing tools and sharing platforms, has facilitated the creation and propagation of mashups by musicians, remixers, audio engineers, and automated systems. While most prior studies focus on mashups created by combining elements from individual audio tracks, typically using pop songs, there exist other types of mashups, for example those that incorporate phrases from base melodies into a new arrangement. In this study, we examined listener enjoyment ratings for this type of mashup, utilizing well-known Western classical melodies. A listening test was conducted to assess whether variations in pitch, tempo, and familiarity with the source material correlate with enhanced enjoyment. This paper presents our preliminary findings, with plans for future studies and additional survey responses to strengthen the results and uncover insights for crafting more engaging classical mashups.
Dialogue intelligibility is a fundamental aspect of audio post-production. Ensuring speech clarity in complex sound mixes remains challenging across different playback systems. Selective auditory attention plays a central role in how listeners track dialogue in busy mixes, so small changes in spectral or spatial structure can influence perceived clarity in unexpected ways. This study investigates the effectiveness of psychoacoustically informed techniques, equalisation and spatialisation, in reducing auditory masking and improving the clarity of dialogue. The listening test was completed on participants’ own playback systems, which reflects typical domestic viewing conditions and aligns the study with real-world listening environments. The techniques were tested individually and in combination to assess their impact. Results show that equalisation was more effective than spatialisation in reducing masking, while their combination produced a significant improvement in intelligibility, clarity, and reduced interference. The effectiveness of these methods varied between the two groups of clips, suggesting that their application should be adapted to the specific acoustic context of each scene.
Dialogue and sound editor with 3+ years' experience and 30+ credits in film across feature film, animation, documentary and TV series. Contributed to award-winning and festival-recognised productions, including films screened at the Venice Film Festival and the David di Donatello Awards...
Sound plays a critical role in virtual reality (VR), shaping attention, narrative comprehension, emotional engagement, and experiential plausibility under conditions of embodiment and user agency. Although a growing body of research addresses VR audio techniques, perceptual effects, and sound taxonomies, existing approaches remain fragmented and largely descriptive. In particular, they do not provide a unifying, VR-specific account of how sound meaning and emotional intent are operationally linked to user agency and non-linear narrative progression. This paper presents a narrative review of selected literature spanning game audio frameworks, immersive sound design, narrative theory, and plausibility-related research in games and VR. Through synthesis of these perspectives, the review identifies a conceptual gap in current research, namely the absence of a VR-specific, agency-coupled sound design framework for structuring sound meaning and emotional intent in support of experiential plausibility as users actively shape events in interactive VR environments.
Senior Lecturer, Music Technology & Popular Music, The University of Queensland, School of Music
Dr Eve Klein is a lecturer in music technology at the University of Queensland, Australia. She is also an operatic mezzo soprano, a composer, and an Ableton Live Certified Trainer. Eve's research is concentrated on music technology, recording cultures and contemporary music. Her current...