Enzo De Sena

Dr Enzo De Sena


Lecturer in Audio, International Relations Officer (Department of Music and Media)
BSc, MSc, PhD

Biography

University roles and responsibilities

  • International Relations Officer for the Department of Music and Media

Research

Research interests

Publications

Lightburn L, De Sena E, Moore A, Naylor P, Brookes M (2017) Improving the perceptual quality of ideal binary masked speech, Proceedings of ICASSP 2017 IEEE
It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
Hacıhabiboğlu H, De Sena E, Cvetkovic Z, Johnston J, Smith J (2017) Perceptual Spatial Audio Recording, Simulation, and Rendering: An overview of spatial-audio techniques based on psychoacoustics, IEEE Signal Processing Magazine 34 (3) pp. 36-54 IEEE
Developments in immersive audio technologies have been evolving in two directions: physically motivated and perceptually motivated systems. Physically motivated techniques aim to reproduce a physically accurate approximation of desired sound fields by employing a very high equipment load and sophisticated, computationally intensive algorithms. Perceptually motivated techniques, on the other hand, aim to render only the perceptually relevant aspects of the sound scene by means of modest computational and equipment load. This article presents an overview of perceptually motivated techniques, with a focus on multichannel audio recording and reproduction, audio source and reflection culling, and artificial reverberators.
Vairetti G, De Sena E, Catrysse M, Jensen S, Moonen M, van Waterschoot T (2017) A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses with Orthonormal Basis Functions, IEEE/ACM Trans. Audio, Speech and Language Processing 25 (7) pp. 1547-1561 IEEE
Parametric modeling of room acoustics aims at representing room transfer functions (RTFs) by means of digital filters and finds application in many acoustic signal enhancement algorithms. In previous work by other authors, the use of orthonormal basis functions (OBFs) for modeling room acoustics has been proposed. Some advantages of OBF models over all-zero and pole-zero models have been illustrated, mainly focusing on the fact that OBF models typically require fewer model parameters to provide the same model accuracy. In this paper, it is shown that the orthogonality of the OBF model brings several additional advantages, which can be exploited if a suitable algorithm for identifying the OBF model parameters is applied. Specifically, the orthogonality of OBF models leads not only to improved model efficiency (as pointed out in previous work), but also to improved model scalability and model stability. Its appealing scalability property derives from a previously unexplored interpretation of the OBF model as an approximation to a solution of the inhomogeneous acoustic wave equation. Following this interpretation, a novel identification algorithm is proposed that takes advantage of the OBF model orthogonality to deliver efficient, scalable and stable OBF model estimates, which is not necessarily the case for nonlinear estimation techniques that are normally applied.
Vairetti G, Kaplanis N, De Sena E, Jensen S, Bech S, Moonen M, van Waterschoot T (2017) The Subwoofer Room Impulse Response (SUBRIR) database, Journal of the Audio Engineering Society 65 (5) pp. 389-401 Audio Engineering Society
This report introduces a new database of room impulse responses (RIRs) measured in an empty rectangular room using subwoofers as sound sources. The purpose of this database, publicly available for download, is to provide acoustic measurements within the frequency region of modal resonances. Performing acoustic measurements at low frequencies presents many difficulties, mainly related to ambient noise and to unavoidable nonlinearities of the subwoofer. In this report, it is shown that these issues can be addressed and partially solved by means of the exponential sine-sweep technique and a careful calibration of the measurement equipment. A procedure for estimating the reverberation time at very low frequencies is proposed, which uses a cosine-modulated filterbank and an approximation of the RIRs using parametric models in order to reduce problems related to low signal-to-noise ratio and to the length of typical band-pass filter responses.
Antonello N, De Sena E, Moonen M, Naylor P, van Waterschoot T (2017) Room impulse response interpolation using a sparse spatio-temporal representation of the sound field, IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (10) pp. 1929-1941 IEEE
Room Impulse Responses (RIRs) are typically measured using a set of microphones and a loudspeaker. When RIRs spanning a large volume are needed, many microphone measurements must be used to spatially sample the sound field. In order to reduce the number of microphone measurements, RIRs can be spatially interpolated. In the present study, RIR interpolation is formulated as an inverse problem. This inverse problem relies on a particular acoustic model capable of representing the measurements. Two different acoustic models are compared: the plane wave decomposition model and a novel time-domain model that consists of a collection of equivalent sources creating spherical waves. These acoustic models can both approximate any reverberant sound field created by a far field sound source. In order to produce an accurate RIR interpolation, sparsity regularization is employed when solving the inverse problem. In particular, by combining different acoustic models with different sparsity promoting regularizations, spatial sparsity, spatio-spectral sparsity and spatio-temporal sparsity are compared. The inverse problem is solved using a matrix-free large scale optimization algorithm. Simulations show that the best RIR interpolation is obtained when combining the novel time-domain acoustic model with the spatio-temporal sparsity regularization, outperforming the results of the plane wave decomposition model even when far fewer microphone measurements are available.
De Sena E, Brookes M, Naylor P, van Waterschoot T (2017) Localization experiments with reporting by head orientation: statistical framework and case study, Journal of the Audio Engineering Society 65 (12) pp. 982-996 Audio Engineering Society
This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.
Antonello Niccolo, De Sena Enzo, Moonen Marc, Naylor Patrick A., van Waterschoot Toon (2018) Joint source localization and dereverberation by sound field interpolation using sparse regularization, Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing 2018 pp. 6892-6896 Institute of Electrical and Electronics Engineers (IEEE)
In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem consists of a large-scale optimization problem that is solved using a first order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.
Pelegrín-García D, De Sena E, van Waterschoot T, Rychtarikova M, Glorieux C (2018) Localization of a Virtual Wall by Means of Active Echolocation by Untrained Sighted Persons, Applied Acoustics 139 pp. 82-92 Elsevier
The active sensing and perception of the environment by auditory means is typically known as echolocation and it can be acquired by humans, who can profit from it in the absence of vision. We investigated the ability of twenty-one untrained sighted participants to use echolocation with self-generated oral clicks for aligning themselves within the horizontal plane towards a virtual wall, emulated with an acoustic virtual reality system, at distances between 1 and 32 m, in the absence of background noise and reverberation. Participants were able to detect the virtual wall on 61% of the trials, although with large differences across individuals and distances. The use of louder and shorter clicks led to an increased performance, whereas the use of clicks with lower frequency content allowed for the use of interaural time differences to improve the accuracy of reflection localization at very long distances. The distance of 2 m was the most difficult to detect and localize, whereas the furthest distances of 16 and 32 m were the easiest ones. Thus, echolocation may be used effectively to identify large distant environmental landmarks such as buildings.
Vairetti Giacomo, De Sena Enzo, Catrysse Michael, Jensen Soren Holdt, Moonen Marc, Van Waterschoot Toon (2018) An Automatic Design Procedure for Low-order IIR Parametric Equalizers, Journal of the Audio Engineering Society Audio Engineering Society
Parametric equalization of an acoustic system aims to compensate for the deviations of its response from a desired target response using parametric digital filters. An optimization procedure is presented for the automatic design of a low-order equalizer using parametric infinite impulse response (IIR) filters, specifically second-order peaking filters and first-order shelving filters. The proposed procedure minimizes the sum of square errors (SSE) between the system and the target complex frequency responses, instead of the commonly used difference in magnitudes, and exploits a previously unexplored orthogonality property of one particular type of parametric filter. This brings a series of advantages over the state-of-the-art procedures, such as an improved mathematical tractability of the equalization problem, with the possibility of computing analytical expressions for the gradients, an improved initialization of the parameters, including the global gain of the equalizer, the incorporation of shelving filters in the optimization procedure, and a more accentuated focus on the equalization of the more perceptually relevant frequency peaks. Examples of loudspeaker and room equalization are provided, as well as a note about extending the procedure to multi-point equalization and transfer function modeling.

Additional publications