Enzo De Sena

Dr Enzo De Sena


Associate Professor in Audio, Head of Research (Institute of Sound Recording), International Relations Officer (Department of Music and Media)
BSc, MSc, PhD, FHEA

About

University roles and responsibilities

  • International Relations Officer for the Department of Music and Media
  • Head of Research of the Institute of Sound Recording (IoSR)

    Research

    Research interests

    Publications

    For acoustic source localization, a map of the acoustic scene as obtained by the steered response power (SRP) approach can be employed. In SRP, the frequency-weighted output power of a beamformer steered towards a set of candidate locations is obtained from generalized cross-correlations (GCCs). Due to the dense grid of candidate locations, conventional SRP exhibits a high computational complexity. While a number of low-complexity SRP-based localization approaches using non-exhaustive spatial search have been proposed, few studies aim to construct a full SRP map at reduced computational cost. In this paper, we propose two scalable approaches to this problem. Expressing the SRP map as a matrix transform of frequency-domain GCCs, we decompose the SRP matrix into a sampling matrix and an interpolation matrix. While the sampling operation can be implemented efficiently by the inverse fast Fourier transform (iFFT), we propose to use optimal low-rank or sparse approximations of the interpolation matrix for further complexity reduction. The proposed approaches, referred to as sampling + low-rank interpolation-based SRP (SLRI-SRP) and sampling + sparse interpolation-based SRP (SSPI-SRP), are evaluated in a near-field (NF) and a far-field (FF) localization scenario and compared to a state-of-the-art low-rank-based SRP approach (LR-SRP). The results indicate that SSPI-SRP outperforms both SLRI-SRP and LR-SRP over a wide complexity range in terms of approximation error and localization accuracy, achieving a complexity reduction of two to three orders of magnitude as compared to conventional SRP. A MATLAB implementation is available online.
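
As a rough illustration of the building block mentioned in the abstract above, the sketch below computes a single generalized cross-correlation with an inverse FFT and reads off a time delay. It is not the SLRI-SRP or SSPI-SRP method and not the MATLAB implementation the abstract refers to. The PHAT weighting, the function names `gcc_phat` and `estimate_delay`, and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def gcc_phat(x_ref, x_other, n_fft):
    """PHAT-weighted GCC of two real signals, evaluated with an inverse FFT."""
    spec_ref = np.fft.rfft(x_ref, n_fft)
    spec_other = np.fft.rfft(x_other, n_fft)
    # Cross-power spectrum; the phase encodes the delay of x_other w.r.t. x_ref.
    cross = spec_other * np.conj(spec_ref)
    # PHAT weighting (an assumption here): keep the phase, discard the magnitude.
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    return np.fft.irfft(cross, n_fft)


def estimate_delay(x_ref, x_other, n_fft):
    """Delay of x_other relative to x_ref, in samples (negative if it leads)."""
    cc = gcc_phat(x_ref, x_other, n_fft)
    lag = int(np.argmax(cc))
    # Indices in the upper half of the circular correlation are negative lags.
    if lag > n_fft // 2:
        lag -= n_fft
    return lag


# Synthetic example: x_other is x_ref delayed by 5 samples.
rng = np.random.default_rng(0)
x_ref = rng.standard_normal(256)
x_other = np.concatenate((np.zeros(5), x_ref))
delay = estimate_delay(x_ref, x_other, n_fft=512)
```

A full SRP map would combine such correlations over all microphone pairs and candidate locations, which is where the cost the abstract describes comes from.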

    Timuçin B. Atalay, Zühre Sü Gül, Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacıhabiboğlu (2021) Simulation of coupled volume acoustics with coupled volume scattering delay network models, In: The Journal of the Acoustical Society of America 149(4) pp. A117-A117

    Simulation of the acoustics of coupled rooms is an important problem not only in architectural acoustics but also in immersive audio applications that require acoustic simulation at interactive rates. Requirements for such applications are less demanding for accuracy but more demanding for computational cost. Scattering delay network (SDN) is a real-time, interactive room acoustics simulator for cuboid rooms. SDN affords an exact simulation of first-order early reflections, a gracefully degrading simulation of second and higher-order specular reflections and an accurate simulation of the statistical properties of the late reverberation. We propose coupled-volume SDN (CV-SDN) as an extension of the SDN model to simulate acoustics of coupled volumes. The proposed model retains the desirable characteristics of the original SDN model while allowing the simulation of double-slope decays with direct control over the simulated aperture size. The double-slope characteristics of room impulse responses simulated with CV-SDN agree well with those of measured impulse responses from a scale model and state-of-the-art room acoustics simulation software.

    Leny Vinceslas, Matteo Scerbo, Huseyin Hacihabiboglu, Zoran Cvetkovic, Enzo De Sena (2023) Low-Complexity Higher Order Scattering Delay Networks, In: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) pp. 1-5 IEEE

    Room acoustic models are used in immersive applications to create convincing virtual environments. The computational cost of full-scale physical models remains prohibitive for most real-time auralisation systems. Scattering delay network (SDN) is a perceptually-motivated and computationally efficient modelling concept that renders the line of sight and first-order reflections accurately, whilst approximating higher order reflections with progressively coarser spatio-temporal resolution. This paper develops a generalised SDN framework capable of rendering reflections up to a selected order accurately. The generalisation requires two issues to be considered: i) spherical spreading attenuation, and ii) placement of scattering nodes, both of which are addressed here. Simulations demonstrate that the proposed model has a similar energy behaviour to that of the image source method (ISM) and improves over standard SDN in terms of normalised echo density and in terms of accuracy of delay, attenuation and direction of early reflections, whilst maintaining the same complexity as standard SDN.

    Joshua Mannall, Lauri Savioja, Paul Calamia, Russell Mason, Enzo De Sena (2023) Efficient Diffraction Modeling Using Neural Networks and Infinite Impulse Response Filters, In: Journal of the Audio Engineering Society 71(9) pp. 566-576 Audio Engineering Society

    Creating plausible geometric acoustic simulations in complex scenes requires the inclusion of diffraction modeling. Current real-time diffraction implementations use the Uniform Theory of Diffraction, which assumes all edges are infinitely long. The authors utilize recent advances in machine learning to create an efficient infinite impulse response model trained on data generated using the physically accurate Biot-Tolstoy-Medwin model. The authors propose an approach to data generation that allows their model to be applied to higher-order diffraction. They show that their model is able to approximate the Biot-Tolstoy-Medwin model with a mean absolute level difference of 1.0 dB for first-order diffraction while maintaining a higher computational efficiency than the current state of the art using the Uniform Theory of Diffraction.

    Orchisama Das, Sebastian J. J. Schlecht, Enzo De Sena (2023) Grouped Feedback Delay Networks With Frequency-Dependent Coupling, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 pp. 2004-2015 IEEE

    Feedback Delay Networks are one of the most popular and efficient means of generating artificial reverberation. Recently, we proposed the Grouped Feedback Delay Network (GFDN), which couples multiple FDNs while maintaining system stability. The GFDN can be used to model reverberation in coupled spaces that exhibit multi-stage decay. The block feedback matrix determines the inter- and intra-group coupling. In this article, we expand on the design of the block feedback matrix to include frequency-dependent coupling among the various FDN groups. We show how paraunitary feedback matrices can be designed to emulate diffraction at the aperture connecting rooms. Several methods for the construction of nearly paraunitary matrices are investigated. The proposed method supports the efficient rendering of virtual acoustics for complex room topologies in games and XR applications.

    Orchisama Das, Enzo De Sena (2023) The Complex Image Method for Simulating Wave Scattering in Room Acoustics, In: 2023 Immersive and 3D Audio: from Architecture to Automotive (I3DA) pp. 1-7 IEEE

    The Image Method (IM) has become increasingly popular for small-room acoustics simulations. While it gives an exact solution of the wave equation in shoebox rooms with rigid walls, the assumption of rigidity is not valid in real rooms. Based on spherical wave reflection from an infinite wall, several authors have independently developed what is known as the Complex Image Method (CIM). However, its adoption in room acoustics has been rare, although it has been shown to give performance equivalent to the boundary element method in shoebox rooms with soft walls. In this paper, we review the theory behind CIM and provide a Python implementation to study directional scattering patterns as a function of wall impedance. For a highly symmetrical room, room impulse responses simulated with CIM are shown to have fewer so-called "sweeping echoes" than those simulated by IM.

    Stephan Weiss, Sebastian J. Schlecht, Orchisama Das, Enzo De Sena (2023) Polynomial Procrustes Problem: Paraunitary Approximation of Matrices of Analytic Functions

    In the narrowband case, the best least squares approximation of a matrix by a unitary one is given by the Procrustes problem. In this paper, we expand this idea to matrices of analytic functions, and characterise a broadband equivalent to the narrowband case: the polynomial Procrustes problem. Its solution is based on an analytic singular value decomposition, and for the case of spectrally majorised, distinct singular values, we demonstrate the application of a suitable algorithm to three problems (time delay estimation, paraunitary matrix completion, and general paraunitary approximations) in simulations.
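
The narrowband case mentioned in the abstract above has a well-known closed form: if A = U Σ Vᴴ is a singular value decomposition, the unitary matrix closest to A in the Frobenius norm is U Vᴴ. The sketch below illustrates only that standard narrowband result. It is not the polynomial (broadband) algorithm of the paper, and the function name `nearest_unitary` and the random test matrix are assumptions made for this example.

```python
import numpy as np


def nearest_unitary(a):
    """Unitary matrix closest to the square matrix `a` in the Frobenius norm."""
    # SVD: a = u @ diag(s) @ vh; replacing the singular values by ones
    # gives the least-squares unitary approximation u @ vh.
    u, _, vh = np.linalg.svd(a)
    return u @ vh


# Random complex test matrix (hypothetical data for illustration only).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
q = nearest_unitary(a)

# Compare against another unitary matrix, here the Q factor of a QR decomposition.
other_unitary, _ = np.linalg.qr(rng.standard_normal((4, 4)))
err_q = np.linalg.norm(a - q)
err_other = np.linalg.norm(a - other_unitary)
```

Here `err_q` is never larger than `err_other`, since no unitary matrix can be closer to `a` than `q` in this norm.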

    Benjamin Burnett, Annika Neidhardt, Zoran Cvetkovic, Huseyin Hacihabiboglu, Enzo De Sena (2023) User Expectation of Room Acoustic Parameters in Virtual Reality Environments, In: 2023 Immersive and 3D Audio: from Architecture to Automotive (I3DA) pp. 1-10 IEEE

    This paper explores how visual attributes of a VR scene affect user expectations of room reverberation. A psychoacoustic experiment was run wherein subjects wore a VR headset and adjusted two unlabelled sliders controlling the reverberation time (T60) and the acoustic room size until the reverberant response was closest to their expectation of how the room they were seeing should sound. Different visual characteristics, in particular, room type and size, surface material, and furnishing were modified to determine how these might affect their expectations of the reverberant response. Results showed that visual room size had a significant effect on both the expected T60, in agreement with previous literature, and on the expected acoustic room size. Both relations seem to be well-described by a simple sublinear power law model, which could be used, for instance, to design reverberation time (T60) and acoustic room size values that align well with listeners' expectation for a given visual room volume. Differences in visual surface materials were found to have a statistically significant effect on the expected T60. The level of visual furnishing, on the other hand, only had a marginally significant effect on the expected T60. The results also indicate considerable subjective differences in individual expectations.

    Thomas Potter, Zoran Cvetkovic, Enzo De Sena (2022) On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion, In: Front. Signal Process. - Audio and Acoustic Signal Processing Frontiers Media

    A study was performed using a virtual environment to investigate the relative importance of spatial audio fidelity and video resolution on perceived audiovisual quality and immersion. Subjects wore a head-mounted display and headphones and were presented with a virtual environment featuring music and speech stimuli using three levels each of spatial audio quality and video resolution. Spatial audio was rendered monaurally, binaurally with head-tracking, and binaurally with head-tracking and room acoustic rendering. Video was rendered at resolutions of 0.5 megapixels per eye, 1.5 megapixels per eye, and 2.5 megapixels per eye. Results showed that both video resolution and spatial audio rendering had a statistically significant effect on both immersion and audiovisual quality. Most strikingly, the results showed that under the conditions that were tested in the experiment, the addition of room acoustic rendering to head-tracked binaural audio had the same improvement on immersion as increasing the video resolution five-fold, from 0.5 megapixels per eye to 2.5 megapixels per eye.

    Timuçin Berk Atalay, Zühre Sü Gül, Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacıhabiboğlu (2021) Scattering Delay Network Simulator of Coupled Volume Acoustics, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, a publication of the Signal Processing Society IEEE

    Artificial reverberators provide a computationally viable alternative to full-scale room acoustics simulation methods for deployment in interactive, immersive systems. Scattering delay network (SDN) is an artificial reverberator that allows direct parametric control over the geometry of a simulated cuboid enclosure, as well as the directional characteristics of the simulated sound sources and microphones. This paper extends the concept of SDN reverberators to multiple enclosures coupled via an aperture. The extension allows independent control of the acoustical properties of the coupled enclosures and the size of the connecting aperture. Transfer functions of the coupled-volume SDN are derived. The effectiveness of the proposed method is evaluated in terms of rendered energy decay curves in comparison to full-scale ray-tracing models and scale model measurements.

    Stojan Djordjević, Hüseyin Hacıhabiboğlu, Zoran Cvetkovic, Enzo De Sena (2020) Evaluation of the Perceived Naturalness of Artificial Reverberation Algorithms

    Listening tests were carried out using a modified MUSHRA method to compare the perceived naturalness of reverberation generated using scattering delay networks (SDNs), feedback delay networks (FDNs), CATT-Acoustic modelling, and convolution with recorded room impulse responses. The difference in naturalness ratings achieved by reverberation generated using FDNs and SDNs was statistically significant, with the mean rating being 12% higher for SDN stimuli than for FDN stimuli. It was also found that CATT-Acoustic models which had been simplified to a bare rectangular room received lower ratings than models that included furniture or irregular room shaping, suggesting that the scattering and mixing effects of irregularities cause improvements in perceived naturalness of the generated reverberation.

    Timucin Berk Atalay, Zuhre Su Gul, Enzo De Sena, Zoran Cvetkovic, Huseyin Hacihabiboglu (2022) Scattering Delay Network Simulator of Coupled Volume Acoustics, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 pp. 582-593 IEEE

    D Pelegrín-García, Enzo De Sena, T van Waterschoot, M Rychtarikova, C Glorieux (2018) Localization of a Virtual Wall by Means of Active Echolocation by Untrained Sighted Persons, In: Applied Acoustics 139 pp. 82-92 Elsevier

    The active sensing and perception of the environment by auditory means is typically known as echolocation and it can be acquired by humans, who can profit from it in the absence of vision. We investigated the ability of twenty-one untrained sighted participants to use echolocation with self-generated oral clicks for aligning themselves within the horizontal plane towards a virtual wall, emulated with an acoustic virtual reality system, at distances between 1 and 32 m, in the absence of background noise and reverberation. Participants were able to detect the virtual wall on 61% of the trials, although with large differences across individuals and distances. The use of louder and shorter clicks led to an increased performance, whereas the use of clicks with lower frequency content allowed for the use of interaural time differences to improve the accuracy of reflection localization at very long distances. The distance of 2 m was the most difficult to detect and localize, whereas the furthest distances of 16 and 32 m were the easiest ones. Thus, echolocation may be used effectively to identify large distant environmental landmarks such as buildings.

    H Hacıhabiboglu, Enzo De Sena, Z Cvetkovic, J Johnston, JO Smith (2017) Perceptual Spatial Audio Recording, Simulation, and Rendering: An overview of spatial-audio techniques based on psychoacoustics, In: IEEE Signal Processing Magazine 34(3) pp. 36-54 IEEE

    Immersive audio technologies have been evolving in two directions: physically motivated and perceptually motivated systems. Physically motivated techniques aim to reproduce a physically accurate approximation of desired sound fields by employing a very high equipment load and sophisticated computationally intensive algorithms. Perceptually motivated techniques, on the other hand, aim to render only the perceptually relevant aspects of the sound scene by means of modest computational and equipment load. This article presents an overview of perceptually motivated techniques, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.

    N Antonello, Enzo De Sena, M Moonen, PA Naylor, T van Waterschoot (2017) Room impulse response interpolation using a sparse spatio-temporal representation of the sound field, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(10) pp. 1929-1941 IEEE

    Room Impulse Responses (RIRs) are typically measured using a set of microphones and a loudspeaker. When RIRs spanning a large volume are needed, many microphone measurements must be used to spatially sample the sound field. In order to reduce the number of microphone measurements, RIRs can be spatially interpolated. In the present study, RIR interpolation is formulated as an inverse problem. This inverse problem relies on a particular acoustic model capable of representing the measurements. Two different acoustic models are compared: the plane wave decomposition model and a novel time-domain model that consists of a collection of equivalent sources creating spherical waves. These acoustic models can both approximate any reverberant sound field created by a far field sound source. In order to produce an accurate RIR interpolation, sparsity regularization is employed when solving the inverse problem. In particular, by combining different acoustic models with different sparsity promoting regularizations, spatial sparsity, spatio-spectral sparsity and spatio-temporal sparsity are compared. The inverse problem is solved using a matrix-free large scale optimization algorithm. Simulations show that the best RIR interpolation is obtained when combining the novel time-domain acoustic model with the spatio-temporal sparsity regularization, outperforming the results of the plane wave decomposition model even when far fewer microphone measurements are available.

    Enzo De Sena, Zoran Cvetkovic, Huseyin Hacıhabiboglu, Marc Moonen, Toon van Waterschoot (2020) Localization Uncertainty in Time-Amplitude Stereophonic Reproduction, In: IEEE/ACM Transactions on Audio, Speech, and Language Processing IEEE

    This paper studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated with free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.

    G Vairetti, Enzo De Sena, M Catrysse, SH Jensen, M Moonen, T van Waterschoot (2017) A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses with Orthonormal Basis Functions, In: IEEE/ACM Trans. Audio, Speech and Language Processing 25(7) pp. 1547-1561 IEEE

    Parametric modeling of room acoustics aims at representing room transfer functions (RTFs) by means of digital filters and finds application in many acoustic signal enhancement algorithms. In previous work by other authors, the use of orthonormal basis functions (OBFs) for modeling room acoustics has been proposed. Some advantages of OBF models over all-zero and pole-zero models have been illustrated, mainly focusing on the fact that OBF models typically require fewer model parameters to provide the same model accuracy. In this paper, it is shown that the orthogonality of the OBF model brings several additional advantages, which can be exploited if a suitable algorithm for identifying the OBF model parameters is applied. Specifically, the orthogonality of OBF models not only leads to improved model efficiency (as pointed out in previous work), but also leads to improved model scalability and model stability. Its appealing scalability property derives from a previously unexplored interpretation of the OBF model as an approximation to a solution of the inhomogeneous acoustic wave equation. Following this interpretation, a novel identification algorithm is proposed that takes advantage of the OBF model orthogonality to deliver efficient, scalable and stable OBF model estimates, which is not necessarily the case for nonlinear estimation techniques that are normally applied.

    Niccolò Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot (2019) Joint acoustic localization and dereverberation through plane wave decomposition and sparse regularization, In: IEEE Transactions on Audio, Speech and Language Processing 27(12) pp. 1893-1905 IEEE

    Acoustic source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists of the approximation of the sound field measured by a set of microphones. The recorded sound pressure is matched with that of a particular acoustic model based on a collection of plane waves arriving from different directions at the microphone positions. In order to achieve meaningful results, spatial and spatio-spectral sparsity can be promoted in the weight signals controlling the plane waves. The large-scale optimization problem resulting from the inverse problem formulation is solved using a first-order optimization algorithm combined with a weighted overlap-add procedure. It is shown that once the weight signals capable of effectively approximating the sound field are obtained, they can be readily used to localize a moving sound source in terms of direction of arrival (DOA) and to perform dereverberation in a highly reverberant environment. Results from simulation experiments and from real measurements show that the proposed algorithm is robust against both localized and diffuse noise, exhibiting a noise reduction in the dereverberated signals.

    Giacomo Vairetti, Enzo De Sena, Michael Catrysse, Soren Holdt Jensen, Marc Moonen, Toon Van Waterschoot (2018) An Automatic Design Procedure for Low-order IIR Parametric Equalizers, In: Journal of the Audio Engineering Society 66(11) pp. 935-952 Audio Engineering Society

    Parametric equalization of an acoustic system aims to compensate for the deviations of its response from a desired target response using parametric digital filters. An optimization procedure is presented for the automatic design of a low-order equalizer using parametric infinite impulse response (IIR) filters, specifically second-order peaking filters and first-order shelving filters. The proposed procedure minimizes the sum of square errors (SSE) between the system and the target complex frequency responses, instead of the commonly used difference in magnitudes, and exploits a previously unexplored orthogonality property of one particular type of parametric filter. This brings a series of advantages over the state-of-the-art procedures, such as an improved mathematical tractability of the equalization problem, with the possibility of computing analytical expressions for the gradients, an improved initialization of the parameters, including the global gain of the equalizer, the incorporation of shelving filters in the optimization procedure, and a more accentuated focus on the equalization of the more perceptually relevant frequency peaks. Examples of loudspeaker and room equalization are provided, as well as a note about extending the procedure to multi-point equalization and transfer function modeling.

    Enzo De Sena, Mike Brookes, Patrick A. Naylor, Toon van Waterschoot (2017) Localization experiments with reporting by head orientation: statistical framework and case study, In: Journal of the Audio Engineering Society 65(12) pp. 982-996 Audio Engineering Society

    This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.

    Joshua Mannall, Lauri Savioja, Paul Calamia, Russell David Mason, Enzo De Sena (2023) Efficient diffraction modelling using neural networks and infinite impulse response filters, In: Journal of the Audio Engineering Society

    Creating plausible geometric acoustic simulations in complex scenes requires the inclusion of diffraction modelling. Current real-time diffraction implementations use the Uniform Theory of Diffraction (UTD) which assumes all edges are infinitely long. We utilise recent advances in machine learning to create an efficient infinite impulse response model trained on data generated using the physically accurate Biot-Tolstoy-Medwin model. We propose an approach to data generation that allows our model to be applied to higher-order diffraction. We show that our model is able to approximate the Biot-Tolstoy-Medwin model with a mean absolute level difference of 1.0 dB for 1st-order diffraction while maintaining a higher computational efficiency than the current state of the art using UTD.

    Randall Ali, Thomas Dietzen, Matteo Scerbo, Leny Vinceslas, Toon van Waterschoot, Enzo De Sena (2023) Relating Wave-based and Geometric Acoustics using a Stationary Phase Approximation European Acoustics Association

    Room acoustic simulation using physically motivated sound propagation models is typically separated into wave-based methods and geometric methods. While each of these methods has been extensively studied, the question of when to transition from a wave-based to a geometric method still remains somewhat unclear. Towards building greater understanding of the links between wave-based and geometric methods, this paper investigates the transition question by using the method of stationary phase. As a starting point, we consider an elementary scenario with a geometrically interpretable analytic solution, namely that of an infinite rigid boundary mirroring a single monopole sound source, and apply the stationary phase approximation (SPA) to the wave-based boundary integral equation (BIE). The results of the analysis demonstrate how net boundary contributions give rise to the geometric interpretation offered by the SPA and provide the conditions when the SPA is asymptotically equal to the analytical solution in this elementary scenario. Although the results are unsurprising and intuitive, the insights gained from this analysis pave the way for investigating relations between wave-based and geometric methods in more complicated room acoustics scenarios.

    Joshua John Mannall, Orchisama Das, Paul Calamia, Enzo De Sena (2022) Perceptual evaluation of low-complexity diffraction models from a single edge, In: Proceedings of the 2022 International Conference on Audio for Virtual and Augmented Reality, 2022 August 15–17, Redmond, WA, USA, Audio Engineering Society (AES)

    Geometric acoustic models have a lower computational complexity than wave-based methods due to the assumption that sound propagates as rays; however, this fails to consider the wave-like properties of sound such as diffraction. Historically, the Biot-Tolstoy-Medwin (BTM) model and the Uniform Theory of Diffraction (UTD) have been used to augment geometric acoustic models with diffraction. Computational efficiency is essential for real-time application and recently two more efficient models, the Volumetric Diffraction and Transmission (VDaT) model and an infinite impulse response (IIR) filter approximation, were proposed to approximate these solutions. A higher-order IIR filter approximation is proposed in this paper. An experiment is carried out to evaluate the perceived naturalness of these approximations compared to the more accurate analytical solutions. Stationary and moving receivers were considered in simple geometries with a single edge. The results suggest that the higher-order IIR approximation is perceptually similar to the BTM model. VDaT and the low-order IIR approximation were found to be less natural in some cases, while in dynamic scenes VDaT was found to be significantly more natural than the other models. The experiment was limited in scope by the simplicity of the scenes considered; however, the results suggest the models are perceptually similar. Improvements to the higher-order IIR approximation are suggested and a recommendation is made for future perceptual evaluations.

    Modelling of analogue devices via deep neural networks (DNNs) has gained popularity recently, but their performance is usually measured using accuracy measures alone. This paper aims to assess the performance of DNN models of a high-gain vacuum-tube guitar amplifier using additional subjective measures, including preference and realism. Furthermore, the paper explores how the performance changes when genre-specific training data is used. In five listening tests, subjects rated models of a popular high-gain guitar amplifier, the Peavey 6505, in terms of preference, realism and perceptual accuracy. Two DNN models were used: a long short-term memory recurrent neural network (LSTM-RNN) and a WaveNet-based convolutional neural network (CNN). The LSTM-RNN model was shown to be more accurate when trained with genre-specific data, to the extent that it could not be distinguished from the real amplifier in ABX tests. Despite minor perceptual inaccuracies, subjects found all models to be as realistic as the target in MUSHRA-like experiments, and there was no evidence to suggest that the real amplifier was preferred to any of the models in a mix. Finally, it was observed that a low-gain excerpt was more difficult to emulate, and was therefore useful to reveal differences between the models.

    P.J. Dawson, E. De Sena, P. A. Naylor (2018) An acoustic image-source characterisation of surface profiles, In: Proceedings 2018 26th European Signal Processing Conference (EUSIPCO) pp. 2130-2134 IEEE

    The image-source method models the specular reflection from a plane by means of a secondary source positioned at the source’s reflected image. The method has been widely used in acoustics to model the reverberant field of rectangular rooms, but can also be used for general-shaped rooms and non-flat reflectors. This paper explores the relationship between the physical properties of a non-flat reflector and the statistical properties of the associated cloud of image-sources. It is shown here that the standard deviation of the image-sources is strongly correlated with the ratio between depth and width of the reflector’s spatial features.
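
    As an illustration of the basic construction described above (not the paper's code), the image of a point source across a flat reflector can be sketched in a few lines. The function names and the example geometry are assumptions made for this sketch; the plane is described by any point on it and a normal vector.

```python
from math import dist, sqrt


def image_source(source, plane_point, plane_normal):
    """Mirror `source` across the plane through `plane_point` with normal `plane_normal`."""
    norm = sqrt(sum(c * c for c in plane_normal))
    n = [c / norm for c in plane_normal]  # unit normal
    # Signed distance from the source to the plane along the unit normal.
    d = sum((s - p) * c for s, p, c in zip(source, plane_point, n))
    # Move the source twice that distance towards (and through) the plane.
    return tuple(s - 2.0 * d * c for s, c in zip(source, n))


def reflection_path_length(source, receiver, plane_point, plane_normal):
    """Length of the specular path source -> plane -> receiver.

    It equals the straight-line distance from the image source to the receiver.
    """
    return dist(image_source(source, plane_point, plane_normal), receiver)


# Hypothetical example: a source 3 m above a floor lying in the plane z = 0.
floor_image = image_source((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
path = reflection_path_length((1.0, 2.0, 3.0), (1.0, 2.0, 1.0),
                              (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

    For a non-flat reflector, one plausible reading of the abstract is that each locally flat patch yields its own image, giving the "cloud" of image-sources whose spread is studied in the paper; the sketch covers only the single-plane case.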

    Juan Franco, Bogdan Bǎcilǎ, Tim Brookes, Enzo De Sena (2022) A multi-angle, multi-distance dataset of microphone impulse responses, In: Journal of the Audio Engineering Society

    A new publicly available dataset of microphone impulse responses (IRs) has been generated. The dataset covers 25 microphones, including a Class-1 measurement microphone, plus polar pattern variations for 7 of the microphones. Microphones were included having: omnidirectional, cardioid, supercardioid and bidirectional polar patterns; condenser, moving-coil and ribbon transduction types; single and dual diaphragms; multiple body and head basket shapes; small and large diaphragms; and end-address and side-address designs. Using a custom-developed computer-controlled precision turntable, IRs were captured quasi-anechoically at incident angles from 0° to 355° in steps of 5°, and at source-to-microphone distances of 0.5 m, 1.25 m and 5 m. The resulting dataset is suitable for perceptual and objective studies related to the incident-angle-dependent response of microphones, as well as for the development of tools for predicting and emulating on- and off-axis microphone characteristics. The captured IRs allow generation of frequency response plots with a degree of detail not commonly available in manufacturer-supplied data sheets, and are also particularly well suited to harmonic distortion analysis.

    Enzo De Sena, Brian Fitzpatrick, Toon Van Waterschoot (2021) On the Convergence of the Multipole Expansion Method, In: SIAM Journal on Numerical Analysis Society for Industrial and Applied Mathematics

    The multipole expansion method (MEM) is a spatial discretization technique that is widely used in applications that feature scattering of waves from circular cylinders. Moreover, it also serves as a key component in several other numerical methods in which scattering computations involving arbitrarily shaped objects are accelerated by enclosing the objects in artificial cylinders. A fundamental question is that of how fast the approximation error of the MEM converges to zero as the truncation number goes to infinity. Despite the fact that the MEM was introduced in 1913, and has been in widespread usage as a numerical technique since as far back as 1955, a precise characterization of the asymptotic rate of convergence of the MEM has not been obtained. In this work, we provide a resolution to this issue. While our focus in this paper is on the Dirichlet scattering problem, this is merely for convenience and our results actually establish convergence rates that hold for all MEM formulations irrespective of the specific boundary conditions or boundary integral equation solution representation chosen.

    Thomas Dietzen, Enzo De Sena, Toon Van Waterschoot (2021) Low-complexity steered response power mapping based on Nyquist-Shannon sampling

    The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are band-limited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. A MATLAB implementation is available online.
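
    The sampling-plus-interpolation idea in this abstract can be illustrated with a toy example. This is only a sketch under stated assumptions and not the authors' MATLAB implementation: `toy_gcc` is a made-up band-limited function standing in for a GCC, the lag interval and bandwidth are arbitrary, and plain truncated sinc interpolation is used.

```python
from math import pi, sin


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else sin(pi * x) / (pi * x)


def sample_critically(func, bandwidth_hz, lag_min, lag_max):
    """Sample `func` at the Nyquist interval T = 1 / (2 * bandwidth) over [lag_min, lag_max]."""
    T = 1.0 / (2.0 * bandwidth_hz)
    n_first = int(lag_min // T)    # floor(lag_min / T)
    n_last = -int(-lag_max // T)   # ceil(lag_max / T)
    lags = [n * T for n in range(n_first, n_last + 1)]
    return T, lags, [func(t) for t in lags]


def interpolate(T, lags, samples, tau):
    """Approximate the band-limited function at lag `tau` by truncated sinc interpolation."""
    return sum(x * sinc((tau - t) / T) for t, x in zip(lags, samples))


def toy_gcc(t):
    """Hypothetical stand-in for a GCC: sinc(t)**2 is band-limited to 1 Hz."""
    return sinc(t) ** 2


# Sample once on a coarse critical grid, then evaluate at an off-grid lag.
T, lags, samples = sample_critically(toy_gcc, bandwidth_hz=1.0,
                                     lag_min=-25.0, lag_max=25.0)
approx = interpolate(T, lags, samples, tau=0.2)
exact = toy_gcc(0.2)
```

    The saving comes from computing the samples once and reusing them for every candidate lag; because the sum is truncated to a finite interval, the interpolated values are approximate rather than exact.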

    Joshua Mannall, Orchisama Das, Paul Calamia, Enzo De Sena (2022) Perceptual evaluation of low-complexity diffraction models from a single edge

    Geometric acoustic models have a lower computational complexity than wave-based methods because they assume that sound propagates as rays; however, this assumption fails to capture wave-like properties of sound such as diffraction. Historically, the Biot-Tolstoy-Medwin (BTM) model and the Uniform Theory of Diffraction (UTD) have been used to augment geometric acoustic models with diffraction. Computational efficiency is essential for real-time applications, and two more efficient models, the Volumetric Diffraction and Transmission (VDaT) model and an infinite impulse response (IIR) filter approximation, were recently proposed to approximate these solutions. A higher-order IIR filter approximation is proposed in this paper. An experiment is carried out to evaluate the perceived naturalness of these approximations compared to the more accurate analytical solutions. Stationary and moving receivers were considered in simple geometries with a single edge. The results suggest that the higher-order IIR approximation is perceptually similar to the BTM model. VDaT and the low-order IIR approximation were found to be less natural in some cases, while in dynamic scenes VDaT was found to be significantly more natural than the other models. The experiment was limited in scope by the simplicity of the scenes considered; however, the results suggest the models are perceptually similar. Improvements to the higher-order IIR approximation are suggested and a recommendation is made for future perceptual evaluations.

    Matteo Scerbo, Orchisama Das, Patrick Friend, Enzo De Sena (2022) Higher-order scattering delay networks for artificial reverberation, In: Proceedings of the 25th International Conference on Digital Audio Effects (DAFx20in22), Vienna, Austria, September 2022 Universität für Musik und darstellende Kunst Wien

    Computer simulations of room acoustics suffer from an efficiency vs accuracy trade-off, with highly accurate wave-based models being highly computationally expensive, and delay-network-based models lacking in physical accuracy. The Scattering Delay Network (SDN) is a highly efficient recursive structure that renders first-order reflections exactly while approximating higher-order ones. With the purpose of improving the accuracy of SDNs, in this paper, several variations on SDNs are investigated, including appropriate node placement for exact modeling of higher-order reflections, redesigned scattering matrices for physically-motivated scattering, and pruned network connections for reduced computational complexity. The results of these variations are compared to state-of-the-art geometric acoustic models for different shoebox room simulations. Objective measures (Normalized Echo Densities (NEDs) and Energy Decay Curves (EDCs)) showed a close match between the proposed methods and the references. A formal listening test was carried out to evaluate differences in perceived naturalness of the synthesized Room Impulse Responses. Results show that increasing SDNs' order and adding directional scattering in a fully-connected network improves perceived naturalness, and higher-order pruned networks give similar performance at a much lower computational cost.
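
    One of the objective measures named above, the Energy Decay Curve, is commonly computed by Schroeder backward integration of the squared impulse response. A minimal sketch follows; the function name and the two-sample example response are assumptions for illustration and are not taken from the paper.

```python
from math import log10


def energy_decay_curve_db(rir):
    """Schroeder backward integration of a room impulse response.

    EDC[n] is the energy remaining from sample n onwards, returned in dB
    relative to the total energy (so the curve starts at 0 dB).
    """
    energy = 0.0
    edc = []
    for sample in reversed(rir):  # accumulate energy from the tail backwards
        energy += sample * sample
        edc.append(energy)
    edc.reverse()
    if not edc or edc[0] == 0.0:
        raise ValueError("impulse response must contain some energy")
    total = edc[0]
    return [10.0 * log10(e / total) if e > 0.0 else float("-inf") for e in edc]


# Hypothetical two-sample response: remaining energy is 1.25, then 0.25.
edc_db = energy_decay_curve_db([1.0, 0.5])
```

    Comparing two simulated responses then amounts to comparing their curves sample by sample, which is one way a "close match" between methods can be quantified.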

    L Lightburn, Enzo De Sena, A Moore, PA Naylor, M Brookes (2017) Improving the perceptual quality of ideal binary masked speech, In: Proceedings of ICASSP 2017 IEEE

    It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
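
    For readers unfamiliar with the oracle mask mentioned above, the conventional definition of an ideal binary mask can be sketched as follows: a time-frequency cell is set to 1 when its local signal-to-noise ratio exceeds a threshold. The 0 dB default threshold, the function name and the small power grids are assumptions for illustration. The sketch shows only how the mask itself is formed, not the paper's method of feeding it to a speech enhancer as prior information.

```python
from math import log10


def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR exceeds `threshold_db`.

    Both inputs are lists of rows (time frames) of per-frequency-bin powers,
    which an oracle obtains from the separate clean speech and noise signals.
    """
    mask = []
    for speech_row, noise_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_row, noise_row):
            if s <= 0.0:
                row.append(0)  # no speech energy in this cell
            elif n <= 0.0:
                row.append(1)  # speech present and no noise
            else:
                snr_db = 10.0 * log10(s / n)
                row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Hypothetical 2-frame, 2-bin example with local SNRs of about 6, -10, 3 and -3 dB.
mask = ideal_binary_mask([[4.0, 0.1], [1.0, 1.0]],
                         [[1.0, 1.0], [0.5, 2.0]])
```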

    Jessica Camilleri, Neofytos Kaplanis, Enzo De Sena (2019) Evaluation of Car Cabin Acoustics Using Auralisation over Headphones, In: Tony Tew, Duncan Williams (eds.), Proceedings 2019 AES International Conference on Immersive and Interactive Audio Audio Engineering Society

    Auralisation schemes in automotive audio have in the past relied primarily on dummy head recordings. Recently, spatial reproduction has allowed the auralisation of cabin acoustics over large loudspeaker arrays. Yet no direct comparisons between those methods exist. In this study, the efficacy of headphone presentation is explored in this context. Six acoustical conditions were presented over headphones to experienced assessors (n=23), who were asked to compare them over six elicited perceptual attributes. In 24 out of 36 cases, the results indicate an agreement between headphone- and loudspeaker-based auralisation of identical stimuli sets. It is concluded that, when compared to loudspeaker-based rendering, headphone-based rendering reveals similar judgment on timbral attributes, while certain spatial attributes should be assessed with caution.

    Niccolo Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot (2018) Joint source localization and dereverberation by sound field interpolation using sparse regularization, In: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing 2018 pp. 6892-6896 Institute of Electrical and Electronics Engineers (IEEE)

    In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem is a large-scale optimization problem that is solved using a first-order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.

    Ege Erdem, Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic (2019) Perceptual Soundfield Reconstruction in Three Dimensions via Sound Field Extrapolation, In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 8023-8027 IEEE

    Perceptual sound field reconstruction (PSR) is a spatial audio recording and reproduction method based on the application of stereophonic panning laws in microphone array design. PSR allows rendering a perceptually veridical and stable auditory perspective in the horizontal plane of the listener, and involves recording using near-coincident microphone arrays. This paper extends the PSR concept to three dimensions using sound field extrapolation carried out in the spherical-harmonic domain. Sound field rendering is performed using a two-level loudspeaker rig. An active-intensity-based analysis of the rendered sound field shows that the proposed approach can accurately render the direction of monochromatic plane waves.

    Leslie Gaston-Bird, Russell David Mason, Enzo De Sena (2021) Inclusivity in Immersive Audio: Current Participation and Barriers to Entry

    Media and entertainment companies have embraced immersive audio technology for cinema, television, games, and music. Meanwhile, in recent years there has been a rise in the number of organizations welcoming underrepresented groups to the field of audio. However, although some disciplines such as music recording are seeing an increase in participation, others are not keeping pace. Immersive and spatial audio are disciplines in which diversity is measurably lacking. Audio-based mixed-gender social media groups comprise less than 10% women and minorities, and groups dedicated to immersive audio exhibit poorer representation. Barriers to entry are societal as well as economic; however, outreach, networking opportunities, mentoring, and affordable education are remedies that have been shown to be effective in related industries and should be adopted by the immersive audio industry.

    G Vairetti, N Kaplanis, Enzo De Sena, SH Jonsen, S Bech, M Moonen, T van Waterschoot (2017) The Subwoofer Room Impulse Response (SUBRIR) database, In: Journal of the Audio Engineering Society 65(5) pp. 389-401 Audio Engineering Society

    This report introduces a new database of room impulse responses (RIRs) measured in an empty rectangular room using subwoofers as sound sources. The purpose of this database, publicly available for download, is to provide acoustic measurements within the frequency region of modal resonances. Performing acoustic measurements at low frequencies presents many difficulties, mainly related to ambient noise and to unavoidable nonlinearities of the subwoofer. In this report, it is shown that these issues can be addressed and partially solved by means of the exponential sine-sweep technique and a careful calibration of the measurement equipment. A procedure for estimating the reverberation time at very low frequencies is proposed, which uses a cosine-modulated filterbank and an approximation of the RIRs using parametric models in order to reduce problems related to low signal-to-noise ratio and to the length of typical band-pass filter responses.

    Additional publications