Professor Enzo De Sena

Professor and Director of the Institute of Sound Recording (IoSR)

BSc, MSc, PhD, FHEA

e.desena@surrey.ac.uk

http://www.desena.org

07 BC 03

About

Biography

Enzo De Sena received the PhD degree in Electronic Engineering from King’s College London, UK, in 2013. Between 2013 and 2016 he was a postdoctoral researcher at KU Leuven, Belgium. He held visiting researcher positions at Stanford University, Aalborg University, Imperial College London and King’s College London. He is a Senior Member of the IEEE and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee. He currently serves as Associate Editor for the EURASIP Journal on Audio, Speech and Music Processing and IEEE/ACM Transactions on Audio, Speech and Language Processing. He is a recipient of an EPSRC New Investigator Award and a co-recipient of best paper awards at WASPAA-21 and AVAR-22. His research has been showcased at a number of venues, including at the National Gallery, Royal Society Science Exhibition, Bell Labs, WOMAD, and others.

University roles and responsibilities

Director of the Institute of Sound Recording (IoSR)

News

31 JUL 2025

World-first audio research facilities coming to Surrey thanks to £2.2m EPSRC funding

10 OCT 2023

Ultimate in surround-sound technology

Research

Research interests

His current research interests include room acoustics modelling, multichannel audio, microphone beamforming and binaural modelling.

Publications

Joshua John Mannall, Paul Calamia, Lauri Savioja, Annika Neidhardt, Russell David Mason, Enzo De Sena (2024) Assessing Diffraction Perception Under Reverberant Conditions in Virtual Reality

When a sound source is occluded, diffraction replaces direct sound as the first wavefront arrival and can influence important aspects of perception such as localisation. Few experiments have investigated how diffraction modelling influences the perceived plausibility of an acoustic simulation. In this paper, an experiment was run to investigate the plausibility of an acoustic simulation with and without diffraction in an L-shaped room in VR. The rendering was carried out using a real-time 6DOF geometrical acoustics and feedback-delay-network hybrid model, and diffraction was modelled using the physically accurate Biot-Tolstoy-Medwin model. The results show that diffraction increases the perceived plausibility of the acoustic simulation. In addition, the study compared diffraction of the direct sound alone and diffraction of both direct and reflected sound. A significant increase in plausibility was found by the addition of diffracted reflection paths, but only in the so-called shadow zone.

Joshua Mannall, Lauri Savioja, Paul Calamia, Russell David Mason, Enzo De Sena (2023) Efficient diffraction modelling using neural networks and infinite impulse response filters, In: Journal of the Audio Engineering Society 71(9) pp. 566-576 Audio Engineering Society

DOI: 10.17743/jaes.2022.0107

Creating plausible geometric acoustic simulations in complex scenes requires the inclusion of diffraction modelling. Current real-time diffraction implementations use the Uniform Theory of Diffraction (UTD) which assumes all edges are infinitely long. We utilise recent advances in machine learning to create an efficient infinite impulse response model trained on data generated using the physically accurate Biot-Tolstoy-Medwin model. We propose an approach to data generation that allows our model to be applied to higher-order diffraction. We show that our model is able to approximate the Biot-Tolstoy-Medwin model with a mean absolute level difference of 1.0 dB for 1st-order diffraction while maintaining a higher computational efficiency than the current state of the art using UTD.

Marcela Rada, Russell David Mason, Enzo De Sena (2025) Immersive Music Production Workflows: An Ethnographic Study of Current Practices, In: 158th Audio Engineering Society Convention Audio Engineering Society

This study presents an ethnographic analysis of current immersive music production workflows, examining industry trends, tools, and methodologies. Through interviews and participant observations with professionals across various sectors, the research identifies common patterns, effective strategies, and persistent obstacles in immersive audio production. Key findings highlight the ongoing struggle for standardized workflows, the financial and technological barriers faced by independent artists, and the critical role of collaboration between engineers and creatives. Despite the growing adoption of immersive formats, workflows still follow stereo conventions, treating spatialization as an afterthought and complicating the translation of mixes across playback systems. Additionally, the study explores the evolving influence of object-based and bed-based mixing techniques, monitoring inconsistencies across playback systems, and the need for improved accessibility to immersive production education. By synthesizing qualitative insights, this paper contributes to the broader discourse on immersive music production, offering recommendations for future research and industry-wide best practices to ensure the sustainable integration of spatial audio technologies.

Thomas Dietzen, Enzo De Sena, Toon van Waterschoot (2022) Low-Complexity Steered Response Power Mapping Based on Nyquist-Shannon Sampling, In: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) pp. 206-210 Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/WASPAA52581.2021.9632774

The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.
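The two observations in the abstract above, that TDOAs are physically bounded and that GCCs are bandlimited, can be illustrated with a short sketch. This is not the authors' implementation (the abstract points to their own MATLAB and Python code). The function names, the speed of sound, the microphone positions and the 8 kHz bandwidth are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def tdoa(candidate, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Time-difference of arrival (s) between two microphones for one candidate location."""
    return (math.dist(candidate, mic_a) - math.dist(candidate, mic_b)) / c


def max_tdoa(mic_a, mic_b, c=SPEED_OF_SOUND):
    """Physical bound on |TDOA|: the microphone spacing divided by the speed of sound."""
    return math.dist(mic_a, mic_b) / c


def nyquist_lag_count(mic_a, mic_b, bandwidth_hz, c=SPEED_OF_SOUND):
    """Number of lags k / (2 * bandwidth) that fall inside [-max_tdoa, +max_tdoa]."""
    k_max = math.floor(2.0 * bandwidth_hz * max_tdoa(mic_a, mic_b, c))
    return 2 * k_max + 1


# Hypothetical microphone pair 10 cm apart and a signal bandlimited to 8 kHz.
mic_a, mic_b = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
lag_count = nyquist_lag_count(mic_a, mic_b, 8000.0)
```

For this hypothetical pair only a handful of GCC lags lie within the physically possible TDOA range, however dense the grid of candidate locations is. The method described in the abstract may use additional samples around this interval for interpolation, so the count here is only a rough lower-end illustration.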

Thomas Potter, Zoran Cvetkovic, Enzo De Sena (2022) On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion, In: Front. Signal Process. - Audio and Acoustic Signal Processing 2 Frontiers Media

DOI: 10.3389/frsip.2022.904866

A study was performed using a virtual environment to investigate the relative importance of spatial audio fidelity and video resolution on perceived audiovisual quality and immersion. Subjects wore a head-mounted display and headphones and were presented with a virtual environment featuring music and speech stimuli using three levels each of spatial audio quality and video resolution. Spatial audio was rendered monaurally, binaurally with head-tracking, and binaurally with head-tracking and room acoustic rendering. Video was rendered at resolutions of 0.5 megapixels per eye, 1.5 megapixels per eye, and 2.5 megapixels per eye. Results showed that both video resolution and spatial audio rendering had a statistically significant effect on both immersion and audiovisual quality. Most strikingly, the results showed that under the conditions that were tested in the experiment, the addition of room acoustic rendering to head-tracked binaural audio had the same improvement on immersion as increasing the video resolution five-fold, from 0.5 megapixels per eye to 2.5 megapixels per eye.

Juan Franco, Bogdan Bǎcilǎ, Tim Brookes, Enzo De Sena (2022) A multi-angle, multi-distance dataset of microphone impulse responses, In: Journal of the Audio Engineering Society 70(10) pp. 882-883

DOI: 10.17743/jaes.2022.0027

A new publicly available dataset of microphone impulse responses (IRs) has been generated. The dataset covers 25 microphones, including a Class-1 measurement microphone, plus polar pattern variations for 7 of the microphones. Microphones were included having: omnidirectional, cardioid, supercardioid and bidirectional polar patterns; condenser, moving-coil and ribbon transduction types; single and dual diaphragms; multiple body and head basket shapes; small and large diaphragms; and end-address and side-address designs. Using a custom-developed computer-controlled precision turntable, IRs were captured quasi-anechoically at incident angles from 0° to 355° in steps of 5°, and at source-to-microphone distances of 0.5 m, 1.25 m and 5 m. The resulting dataset is suitable for perceptual and objective studies related to the incident-angle-dependent response of microphones, as well as for the development of tools for predicting and emulating on- and off-axis microphone characteristics. The captured IRs allow generation of frequency response plots with a degree of detail not commonly available in manufacturer-supplied data sheets, and are also particularly well suited to harmonic distortion analysis.

Brian Fitzpatrick, Enzo De Sena, Toon Van Waterschoot (2021) On the convergence of the multipole expansion method, In: SIAM Journal on Numerical Analysis 59(5) pp. 2473-2499 Society for Industrial and Applied Mathematics

DOI: 10.1137/20M1370914

The multipole expansion method (MEM) is a spatial discretization technique that is widely used in applications that feature scattering of waves from circular cylinders. Moreover, it also serves as a key component in several other numerical methods in which scattering computations involving arbitrarily shaped objects are accelerated by enclosing the objects in artificial cylinders. A fundamental question is that of how fast the approximation error of the MEM converges to zero as the truncation number goes to infinity. Despite the fact that the MEM was introduced in 1913, and has been in widespread usage as a numerical technique since as far back as 1955, a precise characterization of the asymptotic rate of convergence of the MEM has not been obtained. In this work, we provide a resolution to this issue. While our focus in this paper is on the Dirichlet scattering problem, this is merely for convenience and our results actually establish convergence rates that hold for all MEM formulations irrespective of the specific boundary conditions or boundary integral equation solution representation chosen.

Timucin Berk Atalay, Zuhre Su Gul, Enzo De Sena, Zoran Cvetkovic, Huseyin Hacihabiboglu (2022) Scattering Delay Network Simulator of Coupled Volume Acoustics, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 pp. 582-593 Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/TASLP.2022.3143697

Artificial reverberators provide a computationally viable alternative to full-scale room acoustics simulation methods for deployment in interactive, immersive systems. Scattering delay network (SDN) is an artificial reverberator that allows direct parametric control over the geometry of a simulated cuboid enclosure, as well as the directional characteristics of the simulated sound sources and microphones. This paper extends the concept of SDN reverberators to multiple enclosures coupled via an aperture. The extension allows independent control of the acoustical properties of the coupled enclosures and the size of the connecting aperture. Transfer functions of the coupled-volume SDN are derived. The effectiveness of the proposed method is evaluated in terms of rendered energy decay curves in comparison to full-scale ray-tracing models and scale model measurements.

Thomas Dietzen, Enzo De Sena, Toon van Waterschoot (2024) Scalable-Complexity Steered Response Power based on Low-Rank and Sparse Interpolation, In: IEEE/ACM transactions on audio, speech, and language processing 32 pp. 1-16 IEEE

DOI: 10.1109/TASLP.2024.3496317

The steered response power (SRP) is a popular approach to compute a map of the acoustic scene, typically used for acoustic source localization. The SRP map is obtained as the frequency-weighted output power of a beamformer steered towards a grid of candidate locations. Due to the exhaustive search over a fine grid at all frequency bins, conventional frequency domain-based SRP (conv. FD-SRP) results in a high computational complexity. Time domain-based SRP (conv. TD-SRP) implementations reduce computational complexity at the cost of accuracy using the inverse fast Fourier transform (iFFT). In this paper, to enable a more favourable complexity-performance trade-off as compared to conv. FD-SRP and conv. TD-SRP, we consider the problem of constructing a fine SRP map over the entire search space at scalable computational cost. We propose two approaches to this problem. Expressing the conv. FD-SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While sampling can be implemented by the iFFT, we propose to use optimal low-rank or sparse approximations of the interpolation matrix for complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in various localization scenarios with speech as source signals and compared to the state-of-the-art. The results indicate that SSPI-SRP performs better if large array apertures are used, while SLRI-SRP performs better at small array apertures or a large number of microphones. In comparison to conv. FD-SRP, two to three orders of magnitude of complexity reduction can be achieved, oftentimes enabling a more favourable complexity-performance trade-off as compared to conv. TD-SRP. A MATLAB implementation is available online.

Sebastian J. Schlecht, Matteo Scerbo, Enzo De Sena, Vesa Valimaki (2024) Modal Excitation in Feedback Delay Networks, In: IEEE signal processing letters 31 pp. 2690-2694 IEEE

DOI: 10.1109/LSP.2024.3466790

Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by input and output signals. Previously, the Ehrlich-Aberth method was used to find modes in large FDNs. Here, the method is extended to the corresponding eigenvectors indicating the modal shape. In particular, the computational complexity of the proposed analysis method does not depend on the delay-line lengths and is thus suitable for large FDNs, such as artificial reverberators. We show the relation between the compact generalized eigenvectors in the delay state space and the spatially extended modal shapes in the state space. We illustrate this method with an example FDN in which the suggested modal excitation control does not increase the computational cost. The modal shapes can help optimize input and output gains. This letter teaches how selecting the input and output points along the delay lines of an FDN adjusts the spectral shape of the system output.
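For readers unfamiliar with the structure discussed in the abstract above, the following is a minimal sketch of a generic feedback delay network, not the modal analysis method of the letter. The delay lengths, the scaled Hadamard feedback matrix, the loop gain of 0.9 and the unit input and output gains are all assumptions chosen for illustration.

```python
def fdn_impulse_response(delays, feedback, gain, num_samples):
    """Impulse response of a basic FDN with unit input and output gains."""
    n = len(delays)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    positions = [0] * n
    output = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse at the input
        # Read the oldest sample from each delay line and sum to the output.
        taps = [buffers[i][positions[i]] for i in range(n)]
        output.append(sum(taps))
        # Mix the taps through the feedback matrix, attenuate, add the input
        # and write the result back into each delay line.
        for i in range(n):
            mixed = sum(feedback[i][j] * taps[j] for j in range(n))
            buffers[i][positions[i]] = x + gain * mixed
            positions[i] = (positions[i] + 1) % delays[i]
    return output


# 4 x 4 Hadamard matrix scaled by 0.5, which makes it orthogonal.
HADAMARD = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

# Hypothetical delay-line lengths in samples.
impulse_response = fdn_impulse_response([7, 11, 13, 17], HADAMARD, 0.9, 200)
```

With an orthogonal feedback matrix and a loop gain below one, the recirculating energy decays over time. The first output sample appears after the shortest delay line has been traversed once.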

Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini (2024) Data-driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines, In: EURASIP journal on audio, speech, and music processing 2024(1) 51 pp. 51-20 Springer International Publishing

DOI: 10.1186/s13636-024-00371-5

Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a feedback delay network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics and outperforms existing methods based on genetic algorithms and analytical FDN design.

Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic (2013) Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction, In: IEEE transactions on audio, speech, and language processing 21(8) 6508825 pp. 1653-1665 IEEE

DOI: 10.1109/TASL.2013.2260152

This paper presents a systematic framework for the analysis and design of circular multichannel surround sound systems. Objective analysis based on the concept of active intensity fields shows that for stable rendition of monochromatic plane waves it is beneficial to render each such wave by no more than two channels. Based on that finding, we propose a methodology for the design of circular microphone arrays, in the same configuration as the corresponding loudspeaker system, which aims to capture inter-channel time and intensity differences that ensure accurate rendition of the auditory perspective. The methodology is applicable to regular and irregular microphone/speaker layouts, and a wide range of microphone array radii, including the special case of coincident arrays which corresponds to intensity-based systems. Several design examples, involving first and higher-order microphones are presented. Results of formal listening tests suggest that the proposed design methodology achieves a performance comparable to prior art in the center of the loudspeaker array and a more graceful degradation away from the center.

Jessica Camilleri, Neofytos Kaplanis, Enzo De Sena (2019) Evaluation of Car Cabin Acoustics Using Auralisation over Headphones Audio Engineering Society

The auralization schemes in the domain of automotive audio have primarily utilized dummy head recordings in the past. Recently, spatial reproduction allowed the auralization of cabin acoustics over large loudspeaker arrays. Yet no direct comparisons between those methods exist. In this study, the efficacy of headphone presentation is explored in this context. Six acoustical conditions were presented over headphones to experienced assessors (n=23), who were asked to compare them over six elicited perceptual attributes. In 24 out of 36 cases, the results indicate an agreement between headphone- and loudspeaker-based auralisation of identical stimuli sets. It is concluded that, when compared to loudspeakers-based rendering, headphones-based rendering reveals similar judgment on timbral attributes, while certain spatial attributes should be assessed with caution.

Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic (2011) A generalized design method for directivity patterns of spherical microphone arrays, In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5946344 pp. 125-128 IEEE

DOI: 10.1109/ICASSP.2011.5946344

Spherical microphone arrays provide a flexible solution to obtaining higher-order directivity patterns, which are useful in audio recording and reproduction. A general systematic approach to the design of directivity patterns for spherical microphone arrays is introduced in this paper. The directivity patterns are obtained by optimizing a cost function which is a convex combination of a front-back energy ratio and a smoothness term. Most of the standard directivity patterns, i.e. omnidirectional, cardioid, subcardioid, hypercardioid and supercardioid, are particular solutions of this optimization problem with specific values of two free parameters: the angle of the frontal sector, and the convex combination factor. By varying these two parameters, more general solutions of practical use are obtained.

Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic, Julius O. Smith (2015) Efficient Synthesis of Room Acoustics via Scattering Delay Networks, In: IEEE/ACM transactions on audio, speech, and language processing 23(9) 7113826 pp. 1478-1492 IEEE

DOI: 10.1109/TASLP.2015.2438547

An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of the IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network and its memory requirements are negligible.
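The Sabine and Eyring predictions mentioned in the abstract above are standard closed-form estimates of reverberation time. The sketch below evaluates both for a hypothetical room. The room dimensions and the uniform absorption coefficient are assumptions, and the code is unrelated to the reverberator itself.

```python
import math


def sabine_rt60(volume_m3, surface_m2, mean_absorption):
    """Sabine reverberation time in seconds (metric constant 0.161 s/m)."""
    return 0.161 * volume_m3 / (surface_m2 * mean_absorption)


def eyring_rt60(volume_m3, surface_m2, mean_absorption):
    """Eyring reverberation time in seconds."""
    return 0.161 * volume_m3 / (-surface_m2 * math.log(1.0 - mean_absorption))


# Hypothetical 5 m x 4 m x 3 m room with a uniform absorption coefficient of 0.3.
length, width, height = 5.0, 4.0, 3.0
volume = length * width * height
surface = 2.0 * (length * width + length * height + width * height)
t_sabine = sabine_rt60(volume, surface, 0.3)
t_eyring = eyring_rt60(volume, surface, 0.3)
```

The Eyring estimate is always shorter than the Sabine estimate for the same room, and the two converge as the mean absorption coefficient approaches zero.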

Enzo De Sena, Niccolo Antonello, Marc Moonen, Toon van Waterschoot (2015) On the Modeling of Rectangular Geometries in Room Acoustic Simulations, In: IEEE/ACM transactions on audio, speech, and language processing 23(4) 774 pp. 774-786 IEEE

DOI: 10.1109/TASLP.2015.2405476

This paper is concerned with an acoustical phenomenon called sweeping echo, which manifests itself in a room impulse response as a distinctive, continuous pitch increase. In this paper, it is shown that sweeping echoes are present (although to greatly varying degrees) in all perfectly rectangular rooms. The theoretical analysis is based on the rigid-wall image solution of the wave equation. Sweeping echoes are found to be caused by the orderly time-alignment of high-order reflections arriving from directions close to the three axial directions. While sweeping echoes have been previously observed in real rooms with a geometry very similar to the rectangular model (e.g., a squash court), they are not perceived in commonly encountered rooms. Room acoustic simulators such as the image method (IM) and finite difference time-domain (FDTD) correctly predict the presence of this phenomenon, which means that rectangular geometries should be used with caution when the objective is to model commonly encountered rooms. Small out-of-square asymmetries in the room geometry are shown to reduce the phenomenon significantly. Randomization of the image sources' position is shown to remove sweeping echoes without the need to model an asymmetrical geometry explicitly. Finally, the performance of three speech and audio processing algorithms is shown to be sensitive to strong sweeping echoes, thus highlighting the need to avoid their occurrence.

Gustavo Marfia, Giovanni Pau, Enzo De Sena, Eugenio Giordano, Mario Gerla (2007) Evaluating Vehicle Network Strategies for Downtown Portland: Opportunistic Infrastructure and the Importance of Realistic Mobility Models, In: MOBIOPP'07 - PROCEEDINGS OF THE FIRST INTERNATIONAL MOBISYS WORKSHOP ON MOBILE OPPORTUNISTIC NETWORKING pp. 47-51 Association for Computing Machinery

DOI: 10.1145/1247694.1247704

In an urban environment, vehicles can opportunistically exploit infrastructure through open Access Points (APs) to efficiently communicate with other vehicles. This is to avoid long wireless ad hoc paths, and to alleviate congestion in the wireless grid. Analytic and simulation models are used to optimize the communications and networking strategies. For realistic results, one important challenge is the accurate representation of traffic mobility patterns. In this paper we introduce realistic vehicular mobility traces of downtown Portland, Oregon, obtained from extremely detailed large scale traffic simulations performed at the Los Alamos National Laboratories (LANL). To the best of our knowledge, these are among the most accurate synthetic motion traces available for study, with the exception of actual car trace measurements. The new mobility model is used to evaluate AODV [1] in flat and opportunistic infrastructure routing. To assess the importance of a realistic mobility model for this evaluation, we compare these results with those obtained with CORSIM [2] traces. The paper makes the following contributions: (a) introduction of efficient, opportunistic strategies for extending the AP infrastructure to use vehicle to vehicle paths, and (b) assessment of different mobility models - CORSIM traces and LANL's realistic vehicular traces - in the modeling of different routing strategies.

Gustavo Marfia, Giovanni Pau, Eugenio Giordano, Enzo De Sena, Mario Gerla (2007) VANET: On mobility scenarios and urban infrastructure. A case study, In: 2007 MOBILE NETWORKING FOR VEHICULAR ENVIRONMENTS 4300800 pp. 31-36 IEEE

DOI: 10.1109/MOVE.2007.4300800

In [1] we show how vehicles can opportunistically exploit infrastructure through open Access Points (APs) to efficiently communicate with other vehicles. We also highlight the importance of using a correct mobility model, since the advantages that may derive from the use of an infrastructure may not be appreciated because of a lack of accuracy. We continue our study based on realistic vehicular mobility traces of downtown Portland, Oregon, obtained from extremely detailed large scale traffic simulations performed at the Los Alamos National Laboratories (LANL). This mobility model is used to evaluate both flat and opportunistic infrastructure routing. We here build upon [1] and extend that work to: (a) assess the impact of a range of mobility models on network performance; and (b) discuss the performance trend we may expect during the day, as urban mobility patterns change. We here compare results obtained with CORSIM [2] traces and Random Waypoint (RWP) [3] to the results obtained with realistic mobility traces.

Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic (2012) On the Design and Implementation of Higher Order Differential Microphones, In: IEEE transactions on audio, speech, and language processing 20(1) 5872011 pp. 162-174 IEEE

DOI: 10.1109/TASL.2011.2159204

A novel systematic approach to the design of directivity patterns of higher order differential microphones is proposed. The directivity patterns are obtained by optimizing a cost function which is a convex combination of a front-back energy ratio and uniformity within a frontal sector of interest. Most of the standard directivity patterns (omnidirectional, cardioid, subcardioid, hypercardioid, supercardioid) are particular solutions of this optimization problem with specific values of two free parameters: the angular width of the frontal sector and the convex combination factor. More general solutions of practical use are obtained by varying these two parameters. Many of these optimal directivity patterns are trigonometric polynomials with complex roots. A new differential array structure that enables the implementation of general higher order directivity patterns, with complex or real roots, is then proposed. The effectiveness of the proposed design framework and the implementation structure are illustrated by design examples, simulations, and measurements.
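As a rough illustration of the front-back energy ratio that appears in the cost function above, the sketch below evaluates it for first-order patterns of the textbook form a + (1 - a) cos θ. It assumes the frontal sector is the full front hemisphere and integrates over the sphere, which is a simplification of the paper's adjustable frontal sector. The function name and the chosen pattern parameters are illustrative only.

```python
import math


def front_back_ratio(a):
    """Front-to-back energy ratio of the pattern a + (1 - a) * cos(theta).

    Energy is integrated over the sphere with u = cos(theta), so the front
    hemisphere is u in [0, 1] and the back hemisphere is u in [-1, 0].
    """
    b = 1.0 - a
    front = a * a + a * b + b * b / 3.0  # integral of (a + b*u)^2 over [0, 1]
    back = a * a - a * b + b * b / 3.0   # integral of (a + b*u)^2 over [-1, 0]
    return front / back


cardioid = front_back_ratio(0.5)
hypercardioid = front_back_ratio(0.25)
supercardioid = front_back_ratio((math.sqrt(3.0) - 1.0) / 2.0)
```

Under these assumptions the supercardioid attains the largest ratio of the three, which is consistent with its usual definition as the first-order pattern that maximises front-to-back energy.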

Timucin Berk Atalay, Zuhre Su Gul, Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacıhabiboğlu (2022) Scattering Delay Network Simulator of Coupled Volume Acoustics IEEE

Artificial reverberators provide a computationally viable alternative to full-scale room acoustics simulation methods for deployment in interactive, immersive systems. Scattering delay network (SDN) is an artificial reverberator that allows direct parametric control over the geometry of a simulated cuboid enclosure as well as the directional characteristics of the simulated sound sources and microphones. This paper extends the concept of SDN reverberators to multiple enclosures coupled via an aperture. The extension allows independent control of the acoustical properties of the coupled enclosures and the size of the connecting aperture. The transfer function of the coupled-volume SDN system is derived. The effectiveness of the proposed method is evaluated in terms of rendered energy decay curves in comparison to full-scale ray-tracing models and scale model measurements.

Giacomo Vairetti, Enzo De Sena, Michael Catrysse, Soren Holdt Jensen, Marc Moonen, Toon van Waterschoot (2016) MULTICHANNEL IDENTIFICATION OF ROOM ACOUSTIC SYSTEMS WITH ADAPTIVE FILTERS BASED ON ORTHONORMAL BASIS FUNCTIONS, In: 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS 2016-7471628 pp. 16-20 IEEE

DOI: 10.1109/ICASSP.2016.7471628

Many acoustic signal enhancement applications require adaptive filters with a long impulse response, but with a small number of filter parameters. Fixed-poles infinite impulse response (IIR) adaptive filters based on orthonormal basis functions (OBFs) present advantages over finite impulse response filters and other IIR filters, assuring stability and fast global convergence in the adaptation of the filter parameters. A scalable algorithm is introduced for the estimation of the poles of an adaptive OBF filter from multichannel input-output data. The set of poles, common to all the acoustic channels considered, is estimated in parallel to the adaptation of the linear filter parameters. It will be shown that the result of the identification with common poles is quite robust to variations in the room transfer function, suggesting the possibility that poles may be kept fixed after estimation.

H. Hacihabiboglu, E. De Sena, Z. Cvetkovic (2011) Frequency-Domain Scattering Delay Networks for Simulating Room Acoustics in Virtual Environments, In: 2011 Seventh International Conference on Signal Image Technology & Internet-Based Systems 6120647 pp. 180-187 IEEE

DOI: 10.1109/SITIS.2011.41

Modelling, simulation and auralisation of room acoustics play an important role in computer games and virtual reality applications by increasing the level of realism. Accurate simulation of room acoustics is computationally costly and is often replaced by artificial reverberators, which provide a computationally simpler alternative. However, such systems lack accuracy and are generally unable to simulate important aspects of room acoustics such as early reflections, source/microphone directivity, and frequency-dependent absorption. A new type of interactive and scalable room simulator named the scattering delay network (SDN) was recently proposed by the authors. A frequency-domain analysis and implementation of that simulator is presented in this paper. Numerical simulation examples which demonstrate the utility of the proposed system are provided.

Enzo De Sena, Zoran Cvetkovic (2013) A COMPUTATIONAL MODEL FOR THE ESTIMATION OF LOCALISATION UNCERTAINTY, In: 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 6637675 pp. 388-392 IEEE

DOI: 10.1109/ICASSP.2013.6637675

A computational model for prediction of localisation uncertainty of phantom auditory sources is proposed. The interaural level and time difference pairs due to point sources in free field are used as a reference. The mismatch between these "natural" pairs and interaural time and level difference pairs elicited by phantom sources is quantified by means of the 0.5-norm distance, which is justified on psychoacoustic grounds. The model is validated by results of subjective listening tests, achieving a high level of correlation with experimental data.
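The 0.5-norm distance named in the abstract above can be sketched as a generic p-norm distance with p = 0.5. The cue values below are hypothetical and assumed to be already normalised to comparable scales. How the published model scales or weights the interaural time and level differences is not stated in the abstract, so this is not a reproduction of the model.

```python
def p_norm_distance(u, v, p=0.5):
    """Generic p-norm distance between two equal-length cue vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Hypothetical normalised (level difference, time difference) pairs.
natural_pair = (0.0, 0.0)
mismatch_in_both_cues = (1.0, 1.0)
mismatch_in_one_cue = (2.0, 0.0)

d_both = p_norm_distance(natural_pair, mismatch_in_both_cues)
d_one = p_norm_distance(natural_pair, mismatch_in_one_cue)
```

For p below one this quantity is not a true norm, because it does not satisfy the triangle inequality. In this example a mismatch spread across both cues yields a larger distance than the same total mismatch concentrated in a single cue.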

Stojan Djordjević, Hüseyin Hacıhabiboğlu, Zoran Cvetkovic, Enzo De Sena (2020)Evaluation of the Perceived Naturalness of Artificial Reverberation Algorithms

Listening tests were carried out using a modified MUSHRA method to compare the perceived naturalness of reverberation generated using scattering delay networks (SDNs), feedback delay networks (FDNs), CATT-Acoustic modelling, and convolution with recorded room impulse responses. The difference in naturalness ratings achieved by reverberation generated using FDNs and SDNs was statistically significant, with the mean rating being 12% higher for SDN stimuli than for FDN stimuli. It was also found that CATT-Acoustic models which had been simplified to a bare rectangular room received lower ratings than models that included furniture or irregular room shaping, suggesting that the scattering and mixing effects of irregularities cause improvements in perceived naturalness of the generated reverberation.

P.J. Dawson, E. De Sena, P. A. Naylor (2018) An acoustic image-source characterisation of surface profiles, In: Proceedings 2018 26th European Signal Processing Conference (EUSIPCO), pp. 2130-2134, IEEE

DOI: 10.23919/EUSIPCO.2018.8553206

The image-source method models the specular reflection from a plane by means of a secondary source positioned at the source’s reflected image. The method has been widely used in acoustics to model the reverberant field of rectangular rooms, but can also be used for general-shaped rooms and non-flat reflectors. This paper explores the relationship between the physical properties of a non-flat reflector and the statistical properties of the associated cloud of image-sources. It is shown here that the standard deviation of the image-sources is strongly correlated with the ratio between depth and width of the reflector’s spatial features.
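The basic construction described in this abstract, mirroring a source across a reflecting plane, can be sketched in a few lines of Python. The function name and the example geometry are assumptions for illustration only; the sketch covers a single flat reflector and not the statistical analysis of non-flat reflectors carried out in the paper.

```python
def image_source(source, plane_point, plane_normal):
    """Mirror a 3-D point across the plane through plane_point with the given normal."""
    norm_sq = sum(n * n for n in plane_normal)
    if norm_sq == 0:
        raise ValueError("plane_normal must be non-zero")
    # Projection of (source - plane_point) onto the normal, divided by |n|^2.
    scale = sum(
        (s - p) * n for s, p, n in zip(source, plane_point, plane_normal)
    ) / norm_sq
    return tuple(s - 2.0 * scale * n for s, n in zip(source, plane_normal))


# Hypothetical wall in the plane x = 0, with a source 2 m in front of it.
image = image_source((2.0, 1.0, 1.5), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(image)  # (-2.0, 1.0, 1.5)
```

For a reflector approximated by several planar facets, one could call the function once per facet to obtain a cloud of image sources.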

G Vairetti, Enzo De Sena, M Catrysse, SH Jensen, M Moonen, T van Waterschoot (2017) A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses with Orthonormal Basis Functions, In: IEEE/ACM Trans. Audio, Speech and Language Processing, 25(7), pp. 1547-1561, IEEE

DOI: 10.1109/TASLP.2017.2700940

Parametric modeling of room acoustics aims at representing room transfer functions (RTFs) by means of digital filters and finds application in many acoustic signal enhancement algorithms. In previous work by other authors, the use of orthonormal basis functions (OBFs) for modeling room acoustics has been proposed. Some advantages of OBF models over all-zero and pole-zero models have been illustrated, mainly focusing on the fact that OBF models typically require fewer model parameters to provide the same model accuracy. In this paper, it is shown that the orthogonality of the OBF model brings several additional advantages, which can be exploited if a suitable algorithm for identifying the OBF model parameters is applied. Specifically, the orthogonality of OBF models not only leads to improved model efficiency (as pointed out in previous work), but also to improved model scalability and model stability. Its appealing scalability property derives from a previously unexplored interpretation of the OBF model as an approximation to a solution of the inhomogeneous acoustic wave equation. Following this interpretation, a novel identification algorithm is proposed that takes advantage of the OBF model orthogonality to deliver efficient, scalable and stable OBF model estimates, which is not necessarily the case for nonlinear estimation techniques that are normally applied.

Leslie Gaston-Bird, Russell David Mason, Enzo De Sena (2021) Inclusivity in Immersive Audio: Current Participation and Barriers to Entry

Media and entertainment companies have embraced immersive audio technology for cinema, television, games, and music. Meanwhile, in recent years there has been a rise in the number of organizations welcoming underrepresented groups to the field of audio. However, although some disciplines such as music recording are seeing an increase in participation, others are not keeping pace. Immersive and spatial audio are disciplines in which diversity is measurably lacking. Audio-based mixed-gender social media groups are comprised of less than 10% women and minorities, and groups dedicated to immersive audio exhibit poorer representation. Barriers to entry are societal as well as economic; however, outreach, networking opportunities, mentoring, and affordable education are remedies that have been shown to be effective for related industries and should be adopted by the immersive audio industry.

Stephan Weiss, Sebastian J. Schlecht, Orchisama Das, Enzo De Sena (2023)Polynomial Procrustes Problem: Paraunitary Approximation of Matrices of Analytic Functions

DOI: 10.23919/EUSIPCO58844.2023.10289958

In the narrowband case, the best least squares approximation of a matrix by a unitary one is given by the Procrustes problem. In this paper, we expand this idea to matrices of analytic functions, and characterise a broadband equivalent to the narrowband case: the polynomial Procrustes problem. Its solution is based on an analytic singular value decomposition, and for the case of spectrally majorised, distinct singular values, we demonstrate the application of a suitable algorithm to three problems (time delay estimation, paraunitary matrix completion, and general paraunitary approximations) in simulations.

Timuçin B. Atalay, Zühre Sü Gül, Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacıhabiboğlu (2021) Simulation of coupled volume acoustics with coupled volume scattering delay network models, In: The Journal of the Acoustical Society of America, 149(4), pp. A117-A117

DOI: 10.1121/10.0004699

Simulation of the acoustics of coupled rooms is an important problem not only in architectural acoustics but also in immersive audio applications that require acoustic simulation at interactive rates. Requirements for such applications are less demanding for accuracy but more demanding for computational cost. Scattering delay network (SDN) is a real-time, interactive room acoustics simulator for cuboid rooms. SDN affords an exact simulation of first-order early reflections, a gracefully degrading simulation of second and higher-order specular reflections and an accurate simulation of the statistical properties of the late reverberation. We propose coupled-volume SDN (CV-SDN) as an extension of the SDN model to simulate acoustics of coupled volumes. The proposed model retains the desirable characteristics of the original SDN model while allowing the simulation of double-slope decays with direct control over the simulated aperture size. The double-slope characteristics of room impulse responses simulated with CV-SDN agree well with those of measured impulse responses from a scale model and state-of-the-art room acoustics simulation software.

Benjamin Burnett, Annika Neidhardt, Zoran Cvetkovic, Huseyin Hacihabiboglu, Enzo De Sena (2023) User Expectation of Room Acoustic Parameters in Virtual Reality Environments, In: 2023 Immersive and 3D Audio: from Architecture to Automotive (I3DA), pp. 1-10, IEEE

DOI: 10.1109/I3DA57090.2023.10289314

This paper explores how visual attributes of a VR scene affect user expectations of room reverberation. A psychoacoustic experiment was run wherein subjects wore a VR headset and adjusted two unlabelled sliders controlling the reverberation time (T60) and the acoustic room size until the reverberant response was closest to their expectation of how the room they were seeing should sound. Different visual characteristics, in particular, room type and size, surface material, and furnishing were modified to determine how these might affect their expectations of the reverberant response. Results showed that visual room size had a significant effect on both the expected T60, in agreement with previous literature, and on the expected acoustic room size. Both relations seem to be well-described by a simple sublinear power law model, which could be used, for instance, to design reverberation time (T60) and acoustic room size values that align well with listeners' expectation for a given visual room volume. Differences in visual surface materials were found to have a statistically significant effect on the expected T60. The level of visual furnishing, on the other hand, only had a marginally significant effect on the expected T60. The results also indicate considerable subjective differences in individual expectations.
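A power-law relation of the kind mentioned in this abstract can be fitted by ordinary least squares in the log-log domain. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the data are synthetic values generated from an arbitrary power law, not results from the experiment.

```python
import math


def fit_power_law(volumes, values):
    """Least-squares fit of values = a * volumes**b in the log-log domain."""
    xs = [math.log(v) for v in volumes]
    ys = [math.log(t) for t in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic (not measured) data generated from T60 = 0.2 * V**0.3.
volumes = [30.0, 100.0, 300.0, 1000.0]        # visual room volume, m^3
t60s = [0.2 * v ** 0.3 for v in volumes]      # expected T60, s
a, b = fit_power_law(volumes, t60s)
```

An exponent b below 1 corresponds to the sublinear behaviour described in the abstract; the fitted pair (a, b) could then be used to choose a T60 for a given visual room volume.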

Leny Vinceslas, Matteo Scerbo, Huseyin Hacihabiboglu, Zoran Cvetkovic, Enzo De Sena (2023) Low-Complexity Higher Order Scattering Delay Networks, In: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1-5, IEEE

DOI: 10.1109/WASPAA58266.2023.10248064

Room acoustic models are used in immersive applications to create convincing virtual environments. The computational cost of full-scale physical models remains prohibitive for most real-time auralisation systems. Scattering delay network (SDN) is a perceptually-motivated and computationally efficient modelling concept that renders the line of sight and first-order reflections accurately, whilst approximating higher order reflections with progressively coarser spatio-temporal resolution. This paper develops a generalised SDN framework capable of rendering reflections up to a selected order accurately. The generalisation requires two issues to be considered: i) spherical spreading attenuation, and ii) placement of scattering nodes, both of which are addressed here. Simulations demonstrate that the proposed model has a similar energy behaviour to that of the image source method (ISM) and improves over standard SDN in terms of normalised echo density and in terms of accuracy of delay, attenuation and direction of early reflections, whilst maintaining the same complexity as standard SDN.

Matteo Scerbo, Sebastian J Schlecht, Randall Ali, Lauri Savioja, Enzo De Sena (2024)A Common-Slopes Late Reverberation Model Based on Acoustic Radiance Transfer, In: Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24)

In rooms with complex geometry and uneven distribution of energy losses, late reverberation depends on the positions of sound sources and listeners. More precisely, the decay of energy is characterised by a sum of exponential curves with position-dependent amplitudes and position-independent decay rates (hence the name common slopes). The amplitude of different energy decay components is a particularly important perceptual aspect that requires efficient modeling in applications such as virtual reality and video games. Acoustic Radiance Transfer (ART) is a room acoustics model focused on late reverberation, which uses a pre-computed acoustic transfer matrix based on the room geometry and materials, and allows interactive changes to source and listener positions. In this work, we present an efficient common-slopes approximation of the ART model. Our technique extracts common slopes from ART using modal decomposition, retaining only the non-oscillating energy modes. Leveraging the structure of ART, changes to the positions of sound sources and listeners only require minimal processing. Experimental results show that even very few slopes are sufficient to capture the positional dependency of late reverberation, reducing model complexity substantially.

Orchisama Das, Enzo De Sena (2023) The Complex Image Method for Simulating Wave Scattering in Room Acoustics, In: 2023 Immersive and 3D Audio: from Architecture to Automotive (I3DA), pp. 1-7, IEEE

DOI: 10.1109/I3DA57090.2023.10289275

The Image Method (IM) has become increasingly popular for small-room acoustics simulations. While it gives an exact solution of the wave equation in shoebox rooms with rigid walls, the assumption of rigidity is not valid in real rooms. Based on spherical wave reflection from an infinite wall, several authors have independently developed what is known as the Complex Image Method (CIM). However, its adoption in room acoustics has been rare, although it has been shown to give performance equivalent to the boundary element method in shoebox rooms with soft walls. In this paper, we review the theory behind CIM and provide a Python implementation to study directional scattering patterns as a function of wall impedance. For a highly symmetrical room, room impulse responses simulated with CIM are shown to have fewer so-called "sweeping echoes" than those simulated by IM.

G Vairetti, N Kaplanis, Enzo De Sena, SH Jensen, S Bech, M Moonen, T van Waterschoot (2017) The Subwoofer Room Impulse Response (SUBRIR) database, In: Journal of the Audio Engineering Society, 65(5), pp. 389-401, Audio Engineering Society

This report introduces a new database of room impulse responses (RIRs) measured in an empty rectangular room using subwoofers as sound sources. The purpose of this database, publicly available for download, is to provide acoustic measurements within the frequency region of modal resonances. Performing acoustic measurements at low frequencies presents many difficulties, mainly related to ambient noise and to unavoidable nonlinearities of the subwoofer. In this report, it is shown that these issues can be addressed and partially solved by means of the exponential sine-sweep technique and a careful calibration of the measurement equipment. A procedure for estimating the reverberation time at very low frequencies is proposed, which uses a cosine-modulated filterbank and an approximation of the RIRs using parametric models in order to reduce problems related to low signal-to-noise ratio and to the length of typical band-pass filter responses.
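The exponential sine-sweep excitation mentioned in this abstract has a standard closed form, sketched below in Python. The function name and the sweep parameters in the example are assumptions for illustration and are not the settings used for the database; the deconvolution and calibration steps are not shown.

```python
import math


def exponential_sine_sweep(f_start, f_end, duration, sample_rate):
    """Return samples of a sine sweep whose frequency rises exponentially.

    The instantaneous frequency is f_start * (f_end / f_start) ** (t / duration).
    """
    n_samples = int(round(duration * sample_rate))
    log_ratio = math.log(f_end / f_start)
    k = 2.0 * math.pi * f_start * duration / log_ratio
    return [
        math.sin(k * (math.exp(n / sample_rate * log_ratio / duration) - 1.0))
        for n in range(n_samples)
    ]


# Hypothetical low-frequency sweep: 10 Hz to 200 Hz over 2 s at 1 kHz sampling.
sweep = exponential_sine_sweep(10.0, 200.0, 2.0, 1000)
```

A room impulse response would then typically be obtained by playing the sweep through the loudspeaker and deconvolving the recorded signal with it.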

Randall Ali, Thomas Dietzen, Matteo Scerbo, Leny Vinceslas, Toon van Waterschoot, Enzo De Sena (2023)Relating Wave-based and Geometric Acoustics using a Stationary Phase Approximation European Acoustics Association

Room acoustic simulations using physically motivated sound propagation models are typically separated into wave-based methods and geometric methods. While each of these methods has been extensively studied, the question of when to transition from a wave-based to a geometric method still remains somewhat unclear. Towards building greater understanding of the links between wave-based and geometric methods, this paper investigates the transition question by using the method of stationary phase. As a starting point, we consider an elementary scenario with a geometrically interpretable analytic solution, namely that of an infinite rigid boundary mirroring a single monopole sound source, and apply the stationary phase approximation (SPA) to the wave-based boundary integral equation (BIE). The results of the analysis demonstrate how net boundary contributions give rise to the geometric interpretation offered by the SPA and provide the conditions when the SPA is asymptotically equal to the analytical solution in this elementary scenario. Although the results are unsurprising and intuitive, the insights gained from this analysis pave the way for investigating relations between wave-based and geometric methods in more complicated room acoustics scenarios.

Amal Emthyas, Sebastià V. Amengual Garí, Enzo De Sena (2024) Binaural Room Transfer Function Interpolation Via System Inversion, In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024) - Proceedings, pp. 616-620, Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/ICASSP48485.2024.10448457

This paper is concerned with the spatial interpolation of Binaural Room Transfer Functions (BRTFs). The proposed method is a binaural extension of Room Transfer Function (RTF) interpolation methods framed as inverse problems, and is based on a parametric representation of the sound field using either Plane Waves (PWs) or Equivalent Sources (ESs). Once the parameters are obtained via system inversion, the BRTFs can be synthesised at any other position. Four combinations of acoustic models (PW or ES) and regularisation functions (Tikhonov and L1-norm) are tested. The proposed method is shown to have a good performance below 1 kHz, comparable to standard pressure-based RTF interpolation. Using PWs with L1-norm regularisation produces optimal solutions, resulting in a Normalised Mean Squared Error (NMSE) of −15 dB at 700 Hz using 25 BRTF measurements. The important case where the listener's Head-Related Transfer Function (HRTF) is unknown is also tested, revealing that using a different HRTF for inversion and synthesis did not yield a significant drop in performance. It is hypothesised that this is due to the small variations between individual HRTF magnitudes within the operating frequency range of this class of models.

D Pelegrín-García, Enzo De Sena, T van Waterschoot, M Rychtarikova, C Glorieux (2018) Localization of a Virtual Wall by Means of Active Echolocation by Untrained Sighted Persons, In: Applied Acoustics, 139, pp. 82-92, Elsevier

DOI: 10.1016/j.apacoust.2018.04.018

The active sensing and perception of the environment by auditory means is typically known as echolocation and it can be acquired by humans, who can profit from it in the absence of vision. We investigated the ability of twenty-one untrained sighted participants to use echolocation with self-generated oral clicks for aligning themselves within the horizontal plane towards a virtual wall, emulated with an acoustic virtual reality system, at distances between 1 and 32 m, in the absence of background noise and reverberation. Participants were able to detect the virtual wall on 61% of the trials, although with large differences across individuals and distances. The use of louder and shorter clicks led to an increased performance, whereas the use of clicks with lower frequency content allowed for the use of interaural time differences to improve the accuracy of reflection localization at very long distances. The distance of 2 m was the most difficult to detect and localize, whereas the furthest distances of 16 and 32 m were the easiest ones. Thus, echolocation may be used effectively to identify large distant environmental landmarks such as buildings.

Thomas Dietzen, Enzo De Sena, Toon van Waterschoot Low-Complexity Steered Response Power Mapping based on Low-Rank and Sparse Interpolation

DOI: 10.48550/arxiv.2306.08514

For acoustic source localization, a map of the acoustic scene as obtained by the steered response power (SRP) approach can be employed. In SRP, the frequency-weighted output power of a beamformer steered towards a set of candidate locations is obtained from generalized cross-correlations (GCCs). Due to the dense grid of candidate locations, conventional SRP exhibits a high computational complexity. While a number of low-complexity SRP-based localization approaches using non-exhaustive spatial search have been proposed, few studies aim to construct a full SRP map at reduced computational cost. In this paper, we propose two scalable approaches to this problem. Expressing the SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While the sampling operation can be implemented efficiently by the inverse fast Fourier transform (iFFT), we propose to use optimal low-rank or sparse approximations of the interpolation matrix for further complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in a near-field (NF) and a far-field (FF) localization scenario and compared to a state-of-the-art low-rank-based SRP approach (LR-SRP). The results indicate that SSPI-SRP outperforms both SLRI-SRP and LR-SRP over a wide complexity range in terms of approximation error and localization accuracy, achieving a complexity reduction of two to three orders of magnitude as compared to conventional SRP. A MATLAB implementation is available online.

Niccolo Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot (2018) Joint source localization and dereverberation by sound field interpolation using sparse regularization, In: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018, pp. 6892-6896, Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/ICASSP.2018.8462451

In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem consists of a large-scale optimization problem that is solved using a first order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.

L Lightburn, Enzo De Sena, A Moore, PA Naylor, M Brookes (2017)Improving the perceptual quality of ideal binary masked speech, In: Proceedings of ICASSP 2017 IEEE

DOI: 10.1109/ICASSP.2017.7952238

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
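For reference, a conventional oracle binary mask of the kind this abstract starts from can be sketched as follows. This shows only the standard thresholding of local SNR, not the paper's method of feeding the mask to a speech enhancer as prior information; the function name, the 0 dB default threshold, and the toy power values are assumptions for illustration.

```python
def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask that is 1 where the local SNR exceeds threshold_db.

    Both inputs are 2-D lists (frames x frequency bins) of non-negative
    power values; an oracle mask needs the clean speech and noise separately.
    """
    factor = 10.0 ** (threshold_db / 10.0)
    return [
        [1 if s > factor * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


# Toy time-frequency power values (hypothetical, two frames by two bins).
speech = [[4.0, 0.1], [0.5, 2.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
print(mask)  # [[1, 0], [0, 1]]
```

In conventional binary masking the mask would be multiplied directly with the noisy time-frequency representation, which is the step the abstract describes replacing.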

Matteo Scerbo, Orchisama Das, Patrick Friend, Enzo De Sena (2022) Higher-order scattering delay networks for artificial reverberation, In: Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22), Vienna, Austria, September 2022, Universität für Musik und darstellende Kunst Wien

Computer simulations of room acoustics suffer from an efficiency vs accuracy trade-off, with highly accurate wave-based models being highly computationally expensive, and delay-network-based models lacking in physical accuracy. The Scattering Delay Network (SDN) is a highly efficient recursive structure that renders first order reflections exactly while approximating higher order ones. With the purpose of improving the accuracy of SDNs, in this paper, several variations on SDNs are investigated, including appropriate node placement for exact modeling of higher order reflections, redesigned scattering matrices for physically-motivated scattering, and pruned network connections for reduced computational complexity. The results of these variations are compared to state-of-the-art geometric acoustic models for different shoebox room simulations. Objective measures (Normalized Echo Densities (NEDs) and Energy Decay Curves (EDCs)) showed a close match between the proposed methods and the references. A formal listening test was carried out to evaluate differences in perceived naturalness of the synthesized Room Impulse Responses. Results show that increasing SDNs' order and adding directional scattering in a fully-connected network improves perceived naturalness, and higher-order pruned networks give similar performance at a much lower computational cost.

Joshua John Mannall, Orchisama Das, Paul Calamia, Enzo De Sena (2022) Perceptual evaluation of low-complexity diffraction models from a single edge, In: Proceedings of the 2022 International Conference on Audio for Virtual and Augmented Reality, 2022 August 15–17, Redmond, WA, USA, Audio Engineering Society (AES)

Geometric acoustic models have a lower computational complexity than wave-based methods due to the assumption that sound propagates as rays; however, this fails to consider the wave-like properties of sound such as diffraction. Historically, the Biot-Tolstoy-Medwin (BTM) model and the Uniform Theory of Diffraction (UTD) have been used to augment geometric acoustic models with diffraction. Computational efficiency is essential for real-time application, and recently two more efficient models, the Volumetric Diffraction and Transmission (VDaT) model and an infinite impulse response filter (IIR) approximation, were proposed to approximate these solutions. A higher-order IIR filter approximation is proposed in this paper. An experiment is carried out to evaluate the perceived naturalness of these approximations compared to the more accurate analytical solutions. Stationary and moving receivers were considered in simple geometries with a single edge. The results suggest that the higher order IIR approximation is perceptually similar to the BTM model. VDaT and the low order IIR approximation were found to be less natural in some cases, while in dynamic scenes VDaT was found to be significantly more natural than the other models. The experiment was limited in scope by the simplicity of the scenes considered; however, the results suggest the models are perceptually similar. Improvements to the higher-order IIR approximation are suggested and a recommendation is made for future perceptual evaluations.

Amal Emthyas, Annika Neidhardt, Sebastià V. Amengual Garí, Enzo De Sena (2024)Spatial interpolation and extrapolation of binaural room impulse responses via system inversion

This paper presents a method for sound field interpolation/extrapolation from a spatially sparse set of binaural room impulse responses (BRIRs). The method focuses on the direct component and early reflections, and is framed as an inverse problem seeking the weight signals of an acoustic model based on the time-domain equivalent source (TES). Once the weight signals are estimated, the (continuous) sound field can be reconstructed and BRIRs can be synthesised at any position and orientation in a source-free volume bounded by the TESs. The L1-norm, sum of L2-norm, and Tikhonov regularisation functions were tested, with L1-norm (imposing spatio-temporal sparsity) performing the best. Simulations exhibit lower normalised mean squared error (NMSE) compared to a nearest-neighbour approach, which uses the spatially closest BRIR measurement for rendering. Results show good temporal alignment of direct sound and reflections, even when a non-individualised head-related impulse response (HRIR) was used for system inversion and BRIR synthesis. The performance is also assessed using an objective measure of perceived coloration called the predicted binaural coloration (PBC) model, which reveals a good perceptual match between interpolated/extrapolated and true BRIRs.

Orchisama Das, Sebastian J. Schlecht, Enzo De Sena (2023) Grouped Feedback Delay Networks With Frequency-Dependent Coupling, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, pp. 2004-2015, IEEE

DOI: 10.1109/TASLP.2023.3277368

Feedback Delay Networks are one of the most popular and efficient means of generating artificial reverberation. Recently, we proposed the Grouped Feedback Delay Network (GFDN), which couples multiple FDNs while maintaining system stability. The GFDN can be used to model reverberation in coupled spaces that exhibit multi-stage decay. The block feedback matrix determines the inter- and intra-group coupling. In this article, we expand on the design of the block feedback matrix to include frequency-dependent coupling among the various FDN groups. We show how paraunitary feedback matrices can be designed to emulate diffraction at the aperture connecting rooms. Several methods for the construction of nearly paraunitary matrices are investigated. The proposed method supports the efficient rendering of virtual acoustics for complex room topologies in games and XR applications.

Jessica Camilleri, Neofytos Kaplanis, Enzo De Sena (2019)Evaluation of Car Cabin Acoustics Using Auralisation over Headphones, In: Tony Tew, Duncan Williams (eds.), Proceedings 2019 AES International Conference on Immersive and Interactive Audio Audio Engineering Society

DOI: 10.17743/aesconf.2019.978-1-942220-27-5

H Hacıhabiboglu, Enzo De Sena, Z Cvetkovic, J Johnston, JO Smith (2017) Perceptual Spatial Audio Recording, Simulation, and Rendering: An overview of spatial-audio techniques based on psychoacoustics, In: IEEE Signal Processing Magazine, 34(3), pp. 36-54, IEEE

DOI: 10.1109/MSP.2017.2666081

Developments in immersive audio technologies have been evolving in two directions: physically motivated and perceptually motivated systems. Physically motivated techniques aim to reproduce a physically accurate approximation of desired sound fields by employing a very high equipment load and sophisticated computationally intensive algorithms. Perceptually motivated techniques, on the other hand, aim to render only the perceptually relevant aspects of the sound scene by means of modest computational and equipment load. This article presents an overview of perceptually motivated techniques, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.

Will J Cassidy, Enzo De Sena (2023) Perceptual Evaluation and Genre-Specific Training of Deep Neural Network Models of a High-Gain Guitar Amplifier

Modelling of analogue devices via deep neural networks (DNNs) has gained popularity recently, but their performance is usually measured using accuracy measures alone. This paper aims to assess the performance of DNN models of a high-gain vacuum-tube guitar amplifier using additional subjective measures, including preference and realism. Furthermore, the paper explores how the performance changes when genre-specific training data is used. In five listening tests, subjects rated models of a popular high-gain guitar amplifier, the Peavey 6505, in terms of preference, realism and perceptual accuracy. Two DNN models were used: a long short-term memory recurrent neural network (LSTM-RNN) and a WaveNet-based convolutional neural network (CNN). The LSTM-RNN model was shown to be more accurate when trained with genre-specific data, to the extent that it could not be distinguished from the real amplifier in ABX tests. Despite minor perceptual inaccuracies, subjects found all models to be as realistic as the target in MUSHRA-like experiments, and there was no evidence to suggest that the real amplifier was preferred to any of the models in a mix. Finally, it was observed that a low-gain excerpt was more difficult to emulate, and was therefore useful to reveal differences between the models.

Enzo De Sena, Zoran Cvetkovic, Huseyin Hacıhabiboglu, Marc Moonen, Toon van Waterschoot (2020)Localization Uncertainty in Time-Amplitude Stereophonic Reproduction, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing IEEE

DOI: 10.1109/TASLP.2020.2975419

This paper studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated with free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.

N Antonello, Enzo De Sena, M Moonen, PA Naylor, T van Waterschoot (2017) Room impulse response interpolation using a sparse spatio-temporal representation of the sound field, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(10) pp. 1929-1941 IEEE

DOI: 10.1109/TASLP.2017.2730284

Room Impulse Responses (RIRs) are typically measured using a set of microphones and a loudspeaker. When RIRs spanning a large volume are needed, many microphone measurements must be used to spatially sample the sound field. In order to reduce the number of microphone measurements, RIRs can be spatially interpolated. In the present study, RIR interpolation is formulated as an inverse problem. This inverse problem relies on a particular acoustic model capable of representing the measurements. Two different acoustic models are compared: the plane wave decomposition model and a novel time-domain model that consists of a collection of equivalent sources creating spherical waves. These acoustic models can both approximate any reverberant sound field created by a far field sound source. In order to produce an accurate RIR interpolation, sparsity regularization is employed when solving the inverse problem. In particular, by combining different acoustic models with different sparsity promoting regularizations, spatial sparsity, spatio-spectral sparsity and spatio-temporal sparsity are compared. The inverse problem is solved using a matrix-free large scale optimization algorithm. Simulations show that the best RIR interpolation is obtained when combining the novel time-domain acoustic model with the spatio-temporal sparsity regularization, outperforming the results of the plane wave decomposition model even when far fewer microphone measurements are available.
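
The idea of solving an inverse problem with sparsity regularization can be illustrated with a small generic sketch. This is not the algorithm from the paper, which uses specific acoustic models and a matrix-free large-scale solver. It is a plain iterative soft-thresholding (ISTA) loop on an l1-regularized least-squares problem. The random dictionary `A` stands in for an acoustic model, and all names (`ista`, `soft_threshold`, `objective`, `lam`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage towards zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, x, y, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    r = A @ x - y
    return 0.5 * float(r @ r) + lam * float(np.sum(np.abs(x)))

def ista(A, y, lam, n_iter=200):
    """Minimise the objective above by iterative soft-thresholding."""
    # Step size 1/L, where L (squared largest singular value of A) is the
    # Lipschitz constant of the gradient of the least-squares term.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Toy problem (assumption): 20 "measurements", 60 candidate "sources",
# only three of which are active.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 60))
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = ista(A, y, lam=0.1)
```

The l1 penalty favours solutions in which few dictionary elements are active, which is the role sparsity plays when the number of measurements is smaller than the number of unknowns.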

Ege Erdem, Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic (2019) Perceptual Soundfield Reconstruction in Three Dimensions via Sound Field Extrapolation, In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 8023-8027 IEEE

DOI: 10.1109/ICASSP.2019.8682726

Perceptual sound field reconstruction (PSR) is a spatial audio recording and reproduction method based on the application of stereophonic panning laws in microphone array design. PSR allows rendering a perceptually veridical and stable auditory perspective in the horizontal plane of the listener, and involves recording using near-coincident microphone arrays. This paper extends the PSR concept to three dimensions using sound field extrapolation carried out in the spherical-harmonic domain. Sound field rendering is performed using a two-level loudspeaker rig. An active-intensity-based analysis of the rendered sound field shows that the proposed approach can accurately render the direction of monochromatic plane waves.

Niccolò Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot (2019) Joint acoustic localization and dereverberation through plane wave decomposition and sparse regularization, In: IEEE Transactions on Audio, Speech and Language Processing 27(12) pp. 1893-1905 IEEE

DOI: 10.1109/TASLP.2019.2933047

Acoustic source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists of the approximation of the sound field measured by a set of microphones. The recorded sound pressure is matched with that of a particular acoustic model based on a collection of plane waves arriving from different directions at the microphone positions. In order to achieve meaningful results, spatial and spatio-spectral sparsity can be promoted in the weight signals controlling the plane waves. The large-scale optimization problem resulting from the inverse problem formulation is solved using a first-order optimization algorithm combined with a weighted overlap-add procedure. It is shown that once the weight signals capable of effectively approximating the sound field are obtained, they can be readily used to localize a moving sound source in terms of direction of arrival (DOA) and to perform dereverberation in a highly reverberant environment. Results from simulation experiments and from real measurements show that the proposed algorithm is robust against both localized and diffuse noise, exhibiting a noise reduction in the dereverberated signals.

Giacomo Vairetti, Enzo De Sena, Michael Catrysse, Soren Holdt Jensen, Marc Moonen, Toon Van Waterschoot (2018) An Automatic Design Procedure for Low-order IIR Parametric Equalizers, In: Journal of the Audio Engineering Society 66(11) pp. 935-952 Audio Engineering Society

DOI: 10.17743/jaes.2018.0049

Parametric equalization of an acoustic system aims to compensate for the deviations of its response from a desired target response using parametric digital filters. An optimization procedure is presented for the automatic design of a low-order equalizer using parametric infinite impulse response (IIR) filters, specifically second-order peaking filters and first-order shelving filters. The proposed procedure minimizes the sum of square errors (SSE) between the system and the target complex frequency responses, instead of the commonly used difference in magnitudes, and exploits a previously unexplored orthogonality property of one particular type of parametric filter. This brings a series of advantages over the state-of-the-art procedures, such as an improved mathematical tractability of the equalization problem, with the possibility of computing analytical expressions for the gradients, an improved initialization of the parameters, including the global gain of the equalizer, the incorporation of shelving filters in the optimization procedure, and a more accentuated focus on the equalization of the more perceptually relevant frequency peaks. Examples of loudspeaker and room equalization are provided, as well as a note about extending the procedure to multi-point equalization and transfer function modeling.
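
The difference between a complex-response error and a magnitude-only error can be made concrete with a short sketch. It is an illustration under stated assumptions, not the procedure from the paper. The peaking filter uses the widely known "Audio EQ Cookbook" biquad form, which may differ from the parametric filter type the paper relies on. The error is taken between the equalized system response and the target, which is one reading of the abstract. All function names are hypothetical.

```python
import numpy as np

def peaking_biquad(f0, gain_db, q, fs):
    """Second-order peaking filter (Audio EQ Cookbook form), a[0] normalised to 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def freq_response(b, a, freqs, fs):
    """Complex frequency response of a biquad at the given frequencies (Hz)."""
    z1 = np.exp(-1j * 2.0 * np.pi * np.asarray(freqs) / fs)  # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1**2) / (a[0] + a[1] * z1 + a[2] * z1**2)

def complex_sse(h_system, h_eq, h_target):
    """Sum of squared errors between complex responses (phase is included)."""
    return float(np.sum(np.abs(h_system * h_eq - h_target) ** 2))

def magnitude_sse(h_system, h_eq, h_target):
    """Sum of squared errors between magnitudes only (phase is ignored)."""
    return float(np.sum((np.abs(h_system * h_eq) - np.abs(h_target)) ** 2))

# Toy example (assumption): the "system" is a 6 dB peak at 1 kHz, the target
# is flat, and the candidate equalizer is a 3 dB cut at the same frequency.
fs = 48000.0
freqs = np.linspace(20.0, 20000.0, 256)
h_system = freq_response(*peaking_biquad(1000.0, 6.0, 2.0, fs), freqs, fs)
h_eq = freq_response(*peaking_biquad(1000.0, -3.0, 2.0, fs), freqs, fs)
h_target = np.ones_like(h_system)
err_complex = complex_sse(h_system, h_eq, h_target)
err_magnitude = magnitude_sse(h_system, h_eq, h_target)
```

By the reverse triangle inequality, the complex error is never smaller than the magnitude error for the same responses, since it also penalizes phase deviations.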

Enzo De Sena, Mike Brookes, Patrick A. Naylor, Toon van Waterschoot (2017) Localization experiments with reporting by head orientation: statistical framework and case study, In: Journal of the Audio Engineering Society 65(12) pp. 982-996 Audio Engineering Society

DOI: 10.17743/jaes.2017.0038

This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.

Matteo Scerbo, Lauri Savioja, Enzo De Sena (2024) Room Acoustic Rendering Networks With Control of Scattering and Early Reflections, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 32 pp. 3745-3758 IEEE

DOI: 10.1109/TASLP.2024.3436702

Room acoustic synthesis can be used in virtual reality (VR), augmented reality (AR) and gaming applications to enhance listeners' sense of immersion, realism and externalisation. A common approach is to use geometrical acoustics (GA) models to compute impulse responses at interactive speed, and fast convolution methods to apply said responses in real time. Alternatively, delay-network-based models are capable of modeling certain aspects of room acoustics, but with a significantly lower computational cost. In order to bridge the gap between these classes of models, recent work introduced delay network designs that approximate Acoustic Radiance Transfer (ART), a GA model that simulates the transfer of acoustic energy between discrete surface patches in an environment. This paper presents two key extensions of such designs. The first extension involves a new physically-based and stability-preserving design of the feedback matrices, enabling more accurate control of scattering and, more generally, of late reverberation properties. The second extension allows an arbitrary number of early reflections to be modeled with high accuracy, meaning the network can be scaled at will between computational cost and early reverberation precision. The proposed extensions are compared to the baseline ART-approximating delay network as well as two reference GA models. The evaluation is based on objective measures of perceptually-relevant features, including frequency-dependent reverberation times, echo density build-up, and early decay time. Results show that the proposed extensions yield a significant improvement over the baseline model, especially for the case of non-convex geometries or the case of unevenly distributed wall absorption, both scenarios of broad practical interest.

Additional publications

E. De Sena, H. Hacıhabiboğlu, Z. Cvetković, and J. O. Smith III, "Efficient Synthesis of Room Acoustics via Scattering Delay Networks," IEEE/ACM Trans. on Audio Speech Language Process., vol. 23, no. 9, pp. 1478-1492, Sept. 2015.
E. De Sena, N. Antonello, M. Moonen, and T. van Waterschoot, "On the modeling of rectangular geometries in room acoustic simulations," IEEE/ACM Trans. on Audio Speech Language Process., vol. 23, no. 4, Apr. 2015.
E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković - “Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction,” IEEE Trans. on Audio, Speech and Language Process., vol. 21, no. 8, pp. 1653-1665, Aug. 2013.
E. De Sena, H. Hacihabiboglu and Z. Cvetkovic - "On the design and implementation of higher-order differential microphones," IEEE Trans. on Audio, Speech and Language Process., vol. 20, no. 1, pp 162-174, Jan. 2012.
G. Vairetti, E. De Sena, M. Catrysse, S. H. Jensen, M. Moonen, and T. van Waterschoot, “Multichannel Identification of Room Acoustic Systems with Adaptive Filters based on Orthonormal Basis Functions,” IEEE Int. Conf. on Acoust. Speech and Signal Process. (ICASSP-16), Mar. 2016.
N. Antonello, E. De Sena, M. Moonen, P. A. Naylor, T. van Waterschoot, "Sound field control in a reverberant room using the Finite Difference Time Domain method" in AES 60th Int. Conf., Leuven, Belgium, Feb. 2016.
G. Vairetti, E. De Sena, M. Catrysse, S. H. Jensen, M. Moonen, T. van Waterschoot, "Room acoustic system identification using orthonormal basis function models," in AES 60th Int. Conf., Leuven, Belgium, Feb. 2016.
C. S. J. Doire, M. Brookes, P. A. Naylor, E. De Sena, T. van Waterschoot, S. H. Jensen, “Acoustic Environment Control: Implementation of a Reverberation Enhancement System,” in AES 60th Int. Conf., Leuven, Belgium, Feb. 2016.
E. De Sena, N. Kaplanis, P. A. Naylor, T. van Waterschoot, “Large-scale auralised sound localisation experiment,” in AES 60th Int. Conf., Leuven, Belgium, Feb. 2016.
G. Vairetti, E. De Sena, T. van Waterschoot, M. Moonen, M. Catrysse, N. Kaplanis and S.H. Jensen, "A Physically-motivated Parametric Model for Compact Representation of Room Impulse Responses based on Orthonormal Basis Functions," in Proc. 10th European Congress and Exposition on Noise Control Engineering Maastricht, The Netherlands, June 2015.
E. De Sena and Z. Cvetković, "A Computational Model for the Estimation of Localisation Uncertainty," IEEE Int. Conf. on Acoust. Speech and Signal Process. (ICASSP-13), May 2013, Vancouver, Canada.
H. Hacıhabiboğlu, E. De Sena, and Z. Cvetković, "Frequency-Domain Scattering Delay Networks for Simulating Room Acoustics in Virtual Environments," in proceedings of the 7th ACM/IEEE International Conference on Signal Image Tech. and Internet-based Syst. (SITIS'11), Dijon, France, November 2011.
E. De Sena, H. Hacıhabiboğlu and Z. Cvetković - “A Generalized Design Method for Directivity Patterns of Spherical Microphone Arrays”, in proceedings of IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP-11), May 2011, Prague, Czech Republic.
E. De Sena, H. Hacıhabiboğlu and Z. Cvetković - “Scattering Delay Network: an Interactive Reverberator for Computer Games”, in AES 41st Int. Conf., February 2011, London, UK.
H. Hacıhabiboğlu, E. De Sena and Z. Cvetković - “Design of a Circular Microphone Array for Panoramic Audio Recording and Reproduction: Microphone Directivity”, presented at the 128th Audio Engineering Society Convention, May 2010, London, UK.
E. De Sena, H. Hacıhabiboğlu and Z. Cvetković - “Design of a Circular Microphone Array for Panoramic Audio Recording and Reproduction: Array Radius”, presented at the 128th Audio Engineering Society Convention, May 2010, London, UK.
E. De Sena, H. Hacıhabiboğlu and Z. Cvetković - “Perceptual Evaluation of a Circularly Symmetric Microphone Array for Panoramic Recording of Audio”, in proceedings of the 2nd Int. Symposium on Ambisonics and Spherical Acoustics, May 2010, Paris, France.
E. Giordano, E. De Sena, G. Pau and M. Gerla - “Vergilius: A Scenario Generator for Vanet”, in proceedings of IEEE 71st Vehicular Technology Conference (VTC), May 2010, Taipei, Taiwan.
G. Marfia, G. Pau, E. Giordano, E. De Sena, M. Gerla – “VANET: On Mobility Scenarios and Urban Infrastructure. A Case Study”, in proceedings of MOVE Workshop in conjunction with IEEE INFOCOM 2007, May 2007, Alaska, USA.
G. Marfia, G. Pau, E. De Sena, E. Giordano, M. Gerla – “Evaluating Vehicle Network Strategies for Downtown Portland: opportunistic infrastructure and the importance of realistic mobility models”, in proceedings of the First International Workshop on Mobile Opportunistic Networking ACM/SIGMOBILE MobiOpp 2007, in conjunction with MobiSys 2007, June 2007, Puerto Rico, USA.
E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, inventors; King's College London, assignee, "Electronic Device with Digital Reverberator and Method", USPTO Patent n. 8,908,875, filed 2/2/2012, granted 09/12/2014.
H. Hacıhabiboğlu, E. De Sena, and Z. Cvetković, inventors; King's College London, assignee, "Microphone array", USPTO Patent n. 8,976,977, filed 15/10/2010, granted 10/3/2015.