Dominic Ward

Research Software Developer

+44 (0)1483 684731

dominic.ward@surrey.ac.uk

08 BB 01

Publications

Ward Dominic, Wierstorf Hagen, Mason Russell, Plumbley Mark, Hummersone Christopher (2017) Estimating the loudness balance of musical mixtures using audio source separation, Proceedings of the 3rd Workshop on Intelligent Music Production (WIMP 2017)

To assist with the development of intelligent mixing systems, it would be useful to be able to extract the loudness balance of sources in an existing musical mixture. The relative-to-mix loudness level of four instrument groups was predicted using the sources extracted by 12 audio source separation algorithms. The predictions were compared with the ground truth loudness data of the original unmixed stems obtained from a recent dataset involving 100 mixed songs. It was found that the best source separation system could predict the relative loudness of each instrument group with an average root-mean-square error of 1.2 LU, with superior performance obtained on vocals.
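
The error measure described above can be illustrated with a minimal sketch. It assumes the loudness of each source and of the mix has already been measured in LUFS by some loudness meter; the function names and the numbers below are hypothetical and are not taken from the paper.

```python
import math


def relative_loudness(source_lufs, mix_lufs):
    """Relative-to-mix loudness in LU: source loudness minus mix loudness."""
    return source_lufs - mix_lufs


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length lists of LU values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values for four instrument groups in one song (not real data).
mix_lufs = -14.0
reference_stems_lufs = [-17.0, -22.0, -24.0, -20.0]   # original unmixed stems
separated_lufs = [-16.0, -23.0, -25.0, -21.0]         # separated estimates

reference = [relative_loudness(s, mix_lufs) for s in reference_stems_lufs]
predicted = [relative_loudness(s, mix_lufs) for s in separated_lufs]
error_lu = rmse(predicted, reference)
```

Because both lists are referenced to the same mix loudness, the error depends only on how far each separated source's loudness is from that of its original stem.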

Wierstorf H, Ward D, Mason R, Girgis E, Hummersone C, Plumbley M (2017) Perceptual Evaluation of Source Separation for Remixing Music, 143rd AES Convention, Paper No. 9880, Audio Engineering Society

Music remixing is difficult when the original multitrack recording is not available. One solution is to estimate the elements of a mixture using source separation. However, existing techniques suffer from imperfect separation and perceptible artifacts on single separated sources. To investigate their influence on a remix, five state-of-the-art source separation algorithms were used to remix six songs by increasing the level of the vocals. A listening test was conducted to assess the remixes in terms of loudness balance and sound quality. The results show that some source separation algorithms are able to increase the level of the vocals by up to 6 dB at the cost of introducing a small but perceptible degradation in sound quality.
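
The remixing operation described above (raising the vocal level in a mix rebuilt from separated sources) can be sketched as follows. This is an illustrative assumption about the signal arithmetic only: the function names are hypothetical, the signals are plain lists of samples, and nothing here reproduces the algorithms or the listening test used in the paper.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)


def remix(vocals, accompaniment, vocal_boost_db):
    """Sum the separated vocals (scaled by the boost) with the accompaniment."""
    if len(vocals) != len(accompaniment):
        raise ValueError("signals must have the same number of samples")
    gain = db_to_gain(vocal_boost_db)
    return [gain * v + a for v, a in zip(vocals, accompaniment)]


# Hypothetical three-sample signals standing in for separated estimates.
vocals_estimate = [0.10, -0.20, 0.05]
accompaniment_estimate = [0.30, 0.10, -0.40]
boosted = remix(vocals_estimate, accompaniment_estimate, vocal_boost_db=6.0)
```

A boost of 6 dB corresponds to roughly doubling the vocal amplitude, so any interference or artifacts left in the separated vocal estimate are amplified by the same factor.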

Ward Dominic, Wierstorf Hagen, Mason Russell, Grais Emad M., Plumbley Mark (2018) BSS Eval or PEASS? Predicting the perception of singing-voice separation, Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 596-600, Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.1109/ICASSP.2018.8462194

There is some uncertainty as to whether objective metrics for predicting the perceived quality of audio source separation are sufficiently accurate. This issue was investigated by employing a revised experimental methodology to collect subjective ratings of sound quality and interference of singing-voice recordings that have been extracted from musical mixtures using state-of-the-art audio source separation. A correlation analysis between the experimental data and the measures of two objective evaluation toolkits, BSS Eval and PEASS, was performed to assess their performance. The artifacts-related perceptual score of the PEASS toolkit had the strongest correlation with the perception of artifacts and distortions caused by singing-voice separation. Both the source-to-interference ratio of BSS Eval and the interference-related perceptual score of PEASS showed comparable correlations with the human ratings of interference.
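
A correlation analysis of this kind can be sketched in a few lines. The abstract does not state which correlation coefficient was used, so the choice of Pearson and Spearman coefficients here is an assumption, and the ratings and metric values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean listener ratings and objective scores for five stimuli.
listener_ratings = [20.0, 35.0, 50.0, 70.0, 90.0]
objective_scores = [2.1, 3.0, 2.8, 6.5, 9.0]
r_linear = pearson(listener_ratings, objective_scores)
r_rank = spearman(listener_ratings, objective_scores)
```

The rank-based coefficient only asks whether the metric orders the stimuli as listeners do, whereas the Pearson coefficient also rewards a linear relationship between metric and rating.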

Grais Emad M, Wierstorf Hagen, Ward Dominic, Plumbley Mark D (2018) Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation, Proceedings of LVA/ICA 2018 (Lecture Notes in Computer Science) 10891, pp. 340-350, Springer Verlag

DOI: 10.1007/978-3-319-93764-9_32

In deep neural networks with convolutional layers, all the neurons in each layer typically have receptive fields (RFs) of the same size and the same resolution. Convolutional layers with neurons that have large RFs capture global information from the input features, while layers with neurons that have small RFs capture local details with high resolution from the input features. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCN), where each layer has a range of neurons with different RF sizes to extract multi-resolution features that capture the global and local information from its input features. The proposed MR-FCN is applied to separate the singing voice from mixtures of music sources. Experimental results show that using MR-FCN improves the performance compared to feedforward deep neural networks (DNNs) and single resolution deep fully convolutional neural networks (FCNs) on the audio source separation problem.
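
The idea of a single layer containing filters with several receptive-field sizes can be illustrated with a toy one-dimensional sketch. It is not the MR-FCN architecture from the paper: the kernel sizes, the fixed averaging kernels (a real network would learn its filters), and the function names are all assumptions made for illustration.

```python
def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over the signal with zero padding.

    The output has the same length as the input. The kernel is applied
    without flipping, as is conventional in neural-network layers.
    """
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        output.append(acc)
    return output


def multi_resolution_layer(signal, kernel_sizes=(3, 5, 9)):
    """Apply parallel filters with different receptive-field sizes.

    Returns one feature map per kernel size, all the same length as the
    input, so they can be stacked as channels for a following layer.
    """
    feature_maps = []
    for size in kernel_sizes:
        if size % 2 == 0:
            raise ValueError("kernel sizes must be odd")
        kernel = [1.0 / size] * size  # placeholder for a learned filter
        feature_maps.append(conv1d_same(signal, kernel))
    return feature_maps


# Hypothetical constant input of 16 samples.
feature_maps = multi_resolution_layer([1.0] * 16)
```

The small kernel responds to local detail while the large kernel summarises a wider context, and keeping every feature map the same length lets the layer pass both views on together.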

Grais Emad M, Ward Dominic, Plumbley Mark D (2018) Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders, Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), pp. 1577-1581, Institute of Electrical and Electronics Engineers (IEEE)

DOI: 10.23919/EUSIPCO.2018.8553571

Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multiresolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multiresolution features for separating the singing-voice from stereo music. Our experimental results show that the proposed method can achieve multi-channel audio source separation without the need for hand-crafted features or any pre- or post-processing.

Ward, Dominic (2018) Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Guildford, UK, July 2–5, 2018, Proceedings, 10891, Springer International Publishing

DOI: 10.1007/978-3-319-93764-9

This book constitutes the proceedings of the 14th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2018, held in Guildford, UK, in July 2018. The 52 full papers were carefully reviewed and selected from 62 initial submissions. The papers cover a wide range of research topics, including general mixtures of latent variable models as well as theories and tools drawn from a great variety of disciplines: structured tensor decompositions and applications; matrix and tensor factorizations; ICA methods; nonlinear mixtures; audio data and methods; signal separation evaluation campaign; deep learning and data-driven methods; advances in phase retrieval and applications; sparsity-related methods; and biomedical data and methods.

Ward Dominic, Mason Russell D., Kim Ryan Chungeun, Stöter Fabian-Robert, Liutkus Antoine, Plumbley Mark D. (2018) SiSEC 2018: state of the art in musical audio source separation - Subjective selection of the best algorithm, Proceedings of the 4th Workshop on Intelligent Music Production, Huddersfield, UK, 14 September 2018, University of Huddersfield

The Signal Separation Evaluation Campaign (SiSEC) is a large-scale regular event aimed at evaluating current progress in source separation through a systematic and reproducible comparison of the participants' algorithms, providing the source separation community with an invaluable glimpse of recent achievements and open challenges. This paper focuses on the music separation task from SiSEC 2018, which compares algorithms aimed at recovering instrument stems from a stereo mix. In this context, we conducted a subjective evaluation whereby 34 listeners picked which of six competing algorithms, with high objective performance scores, best separated the singing-voice stem from 13 professionally mixed songs. The subjective results reveal strong differences between the algorithms, and highlight the presence of song-dependent performance for state-of-the-art systems. Correlations between the subjective results and the scores of two popular performance metrics are also presented.