Dr Chris Hummersone

Lecturer in Sound Recording, MSR Admissions Officer

Qualifications: BMus (Tonmeister), PhD (Surrey), MAES, MIEEE, FHEA

Email:
Phone: Work: 01483 68 6167
Room no: 05 BC 03

Further information

Biography

I graduated from the Tonmeister course in June 2007 and joined the IoSR as a research student in October 2007. I completed my thesis, entitled "A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments", in September 2010 and joined the IoSR as a lecturer in October 2010. In my spare time I enjoy playing the saxophone, cycling and running, having completed the London Marathon in 2007 and 2012.

Research Interests

My research interests include modelling the precedence effect and binaural localisation, audio quality in time–frequency processing, computational auditory scene analysis, and machine listening for the automated evaluation of audio quality.

Publications

Highlights

  • Hummersone C, Stokes T, Brookes T. (2014) 'On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis'. in Naik GR, Wang W (eds.) Blind Source Separation: Advances in Theory, Algorithms and Applications. Berlin/Heidelberg: Springer, Article number 12, pp. 349-368.

    Abstract

    The ideal binary mask (IBM) is widely considered to be the benchmark for time–frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.

Journal articles

  • Hummersone C, Mason RD, Brookes TS. (2013) 'A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments'. Journal of the Audio Engineering Society, 61 (7/8 (July/August)), pp. 508-520.

    Abstract

    Reverberation is a problem for source separation algorithms. Because the precedence effect allows human listeners to suppress the perception of reflections arising from room boundaries, numerous computational models have incorporated the precedence effect. However, relatively little work has been done on using the precedence effect in source separation algorithms. This paper compares several precedence models and their influence on the performance of a baseline separation algorithm. The models were tested in a variety of reverberant rooms and with a range of mixing parameters. Although there was a large difference in performance among the models, the one that was based on interaural coherence and onset-based inhibition produced the greatest performance improvement. There is a trade-off between selecting reliable cues that correspond closely to free-field conditions and maximizing the proportion of the input signals that contributes to localization. For optimal source separation performance, it is necessary to adapt the dynamic component of the precedence model to the acoustic conditions of the room.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The computational prediction of masking thresholds for ecologically valid interference scenarios'. Proceedings of Meetings on Acoustics, 19

    Abstract

    Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. The goal of developing an accurate computational model which can predict masking thresholds for such scenarios is still incomplete. While some sophisticated, physiologically inspired, masking prediction models exist, they are rarely tested with ecologically valid programmes (such as music and speech). In order to test the accuracy of model predictions human listener data were required. To that end a masking threshold experiment was conducted for a variety of target and interferer programmes. The results were analysed alongside predictions made by the computational auditory signal processing and prediction model of (Jepsen et al. 2008). Masking thresholds were predicted to within 3.6 dB root mean squared error with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those in (Glasberg and Moore 2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature. © 2013 Acoustical Society of America.

  • Hummersone C, Mason R, Brookes T. (2011) 'Ideal Binary Mask Ratio: a novel metric for assessing binary-mask-based sound source separation algorithms'. IEEE Transactions on Audio, Speech and Language Processing, 19 (7), pp. 2039-2045.

    Abstract

    A number of metrics has been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR–based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternate metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric—dubbed the IBM ratio (IBMR)—is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.

  • Hummersone C, Mason R, Brookes T. (2010) 'Dynamic precedence effect modeling for source separation in reverberant environments'. IEEE Transactions on Audio, Speech and Language Processing, 18 (7), pp. 1867-1871.

    Abstract

    Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. The precedence effect is one of these mechanisms; it aids our ability to localize sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The precedence model inhibitory time constant and inhibitory gain are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The paper concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the initial time delay gap and the direct-to-reverberant ratio are important factors when considering this adaptation. © 2010 IEEE.

Conference papers

  • Simpson AJR, Roma G, Grais EM, Mason R, Hummersone C, Plumbley MD. (2017) 'Psychophysical Evaluation of Audio Source Separation Methods'. Springer LNCS, Grenoble, France: 13th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2017)
    [ Status: Accepted ]

    Abstract

    Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.

  • Hermes K, Brookes TS, Hummersone C. (2016) 'The harmonic centroid as a predictor of string instrument timbral clarity'. Audio Engineering Society proceedings, Paris, France: 140th Convention of the Audio Engineering Society

    Abstract

    Spectrum is an important factor in determining timbral clarity. An experiment where listeners rate the changes in timbral clarity resulting from spectral equalisation (EQ) can provide insight into the relationship between EQ and the clarity of string instruments. Overall, higher frequencies contribute to clarity more positively than lower ones, but the relationship is programme-item-dependent. Fundamental frequency and spectral slope both appear to be important. Change in harmonic centroid (or dimensionless spectral centroid) correlates well with change in clarity, more so than octave band boosted/cut, harmonic number boosted/cut, or other variations on the spectral centroid.

  • Simpson AJR, Roma G, Grais EM, Mason RD, Hummersone C, Liutkus A, Plumbley MD. (2016) 'Evaluation of Audio Source Separation Models Using Hypothesis-Driven Non-Parametric Statistical Methods'. Budapest: European Signal Processing Conference (EUSIPCO) 2016
    [ Status: Accepted ]

    Abstract

    Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using nonparametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.

  • Hermes K, Brookes TS, Hummersone C. (2015) 'The influence of dumping bias on timbral clarity ratings'. Audio Engineering Society 139th International AES Convention papers, New York, USA: 139th International AES Convention

    Abstract

    When listening test subjects are required to rate changes in a single attribute, but also hear changes in other attributes, their ratings can become skewed by “dumping bias.” To assess the influence of dumping bias on timbral “clarity” ratings, listeners were asked to rate stimuli: (i) in terms of clarity only; and (ii) in terms of clarity, warmth, fullness, and brightness. Clarity ratings of type (i) showed (up to 20%) larger interquartile ranges than those of type (ii). It is concluded that in single-attribute timbral rating experiments, statistical noise—potentially resulting from dumping bias—can be reduced by allowing listeners to rate additional attributes either simultaneously or beforehand.

  • Stokes T, Hummersone C, Brookes T, Mason A. (2014) 'Perceptual quality of audio separated using sigmoidal masks'. 137th Audio Engineering Society Convention 2014, pp. 167-173.

    Abstract

    Separation of underdetermined audio mixtures is often performed in the Time-Frequency (TF) domain by masking each TF element according to its target-to-mixture ratio. This work uses sigmoidal functions to map the target-to-mixture ratio to mask values. The series of functions used encompasses the ratio mask and an approximation of the binary mask. Mixtures are chosen to represent a range of different amounts of TF overlap, then separated and evaluated using objective measures. PEASS results show improved interferer suppression and artifact scores can be achieved using softer masking than that applied by binary or ratio masks. The improvement in these scores gives an improved overall perceptual score; this observation is repeated at multiple TF resolutions.

  • Baykaner K, Hummersone C, Mason RD, Bech S. (2014) 'The acceptability of speech with interfering radio programme material'. Audio Engineering Society Preprint, Berlin: 136th Audio Engineering Society Convention 9020

    Abstract

    A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programmes featuring speech as the target. 21 subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial, whether the listening scenario was acceptable or unacceptable. Stimuli pairs were set to randomly selected SNRs ranging from 0 to 45 dB. Results showed no significant difference between subjects according to listening experience. A logistic regression to acceptability was carried out based on SNR. The model had accuracy R² = 0.87, RMSE = 14%, and RMSE* = 7%. By accounting for the presence of background audio in the target programme, 90% of the variance could be explained.

  • Baykaner K, Hummersone C, Mason RD, Bech S. (2013) 'Selection of temporal windows for the computational prediction of masking thresholds'. Vancouver: IEEE International Conference on Acoustics, Speech, and Signal Processing

    Abstract

    In the field of auditory masking threshold predictions an optimal method for buffering a continuous, ecologically valid programme combination into discrete temporal windows has yet to be determined. An investigation was carried out into the use of a variety of temporal window durations, shapes, and steps, in order to discern the resultant effect upon the accuracy of various masking threshold prediction models. Selection of inappropriate temporal windows can triple the prediction error in some cases. Overlapping windows were found to produce the lowest errors provided that the predictions were smoothed appropriately. The optimal window shape varied across the tested models. The most accurate variant of each model resulted in root mean squared errors of 2.3, 3.4, and 4.2 dB.

  • Stokes T, Hummersone C, Brookes TS. (2013) 'Reducing Binary Masking Artefacts in Blind Audio Source Separation'. Audio Engineering Society Proceedings of the 134th Audio Engineering Society Convention, Rome, Italy: 134th Audio Engineering Society Convention paper 8853

    Abstract

    Binary masking is a common technique for separating target audio from an interferer. Its use is often justified by the high signal-to-noise ratio achieved. The mask can introduce musical noise artefacts, limiting its perceptual performance and that of techniques that use it. Three mask-processing techniques, involving adding noise or cepstral smoothing, are tested and the processed masks are compared to the ideal binary mask using the perceptual evaluation for audio source separation (PEASS) toolkit. Each processing technique's parameters are optimised before the comparison is made. Each technique is found to improve the overall perceptual score of the separation. Results show a trade-off between interferer suppression and artefact reduction.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The prediction of the acceptability of auditory interference based on audibility'. Proceedings of the AES International Conference, pp. 162-168.

    Abstract

    In order to evaluate the ability of sound field control methods to generate independent listening zones within domestic and automotive environments, it is useful to be able to predict, without listening tests, the acceptability of auditory interference scenarios. It was considered likely that a relationship would exist between masking thresholds and acceptability thresholds, thus a listening test was carried out to gather acceptability thresholds to compare with existing masking data collected under identical listening conditions. An analysis of the data revealed that a linear regression model could be used to predict acceptability thresholds, from only masking thresholds, with RMSE = 2.6 dB and R = 0.86. The same linear regression model was used to predict acceptability thresholds but with masking threshold predictions as the input. The results had RMSE = 4.2 dB and R = 0.88. Copyright © (2013) by the Audio Engineering Society.

  • Stokes T, Brookes TS, Hummersone C. (2012) 'Improving the Quality of Separated Audio: What Works?'. Salford UK: 1st Anniversary Celebration for the BBC Audio Research Partnership
  • Hummersone C, Mason R, Brookes T. (2010) 'A comparison of computational precedence models for source separation in reverberant environments'. Audio Engineering Society Preprint, London, UK: 128th Audio Engineering Society Convention 7981
  • Zielinski S, Hardisty P, Hummersone C, Rumsey F. (2007) 'Potential biases in MUSHRA listening tests'. Audio Engineering Society Preprint, New York: 123rd Audio Engineering Society Convention 7179

    Abstract

    The method described in the ITU-R BS.1534-1 standard, commonly known as MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors), is widely used for the evaluation of systems exhibiting intermediate quality levels, in particular low-bit rate codecs. This paper demonstrates that this method, despite its popularity, is not immune to biases. In two different experiments designed to investigate potential biases in the MUSHRA test, systematic discrepancies in the results were observed with a magnitude up to 22%. The data indicates that these discrepancies could be attributed to the stimulus spacing and range equalizing biases.

Book chapters

  • Hummersone C, Stokes T, Brookes T. (2014) 'On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis'. in Naik GR, Wang W (eds.) Blind Source Separation: Advances in Theory, Algorithms and Applications. Berlin/Heidelberg: Springer, Article number 12, pp. 349-368.

Posters

  • Brookes T, Hummersone C. (2010) Machine Listening for Sound Quality Evaluation. Machine Listening Workshop 2010, Queen Mary University of London
  • Hummersone C, Mason R, Brookes T. (2010) A perceptually-inspired approach to machine sound source separation in real rooms. University of Surrey Postgraduate Research Conference

Theses and dissertations

  • Hummersone C. (2011) A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments.

    Abstract

    Reverberation continues to present a major problem for sound source separation algorithms, due to its corruption of many of the acoustical cues on which these algorithms rely. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. This thesis therefore considers the research question: can the reverberation–performance of existing psychoacoustic engineering approaches to machine source separation be improved? The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The algorithm included a precedence model, which was replaced with the other precedence models during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed Ideal Binary Mask Ratio, is shown to be robust to the effects of reverberation and facilitates meaningful and direct comparison between algorithms across different acoustic conditions. Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm. The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which the model is utilised. This effect is analogous to the perceptual Clifton effect, which is a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. 
However, no work has been carried out on adapting a precedence model to the acoustic conditions under test. Specifically, although the necessity for such a component has been suggested in the literature, neither its necessity nor benefit has been formally validated. Consequently, a further study was conducted in which parameters of each of the previously compared precedence models were varied in each room in order to identify if, and to what extent, the separation performance varied with these parameters. The results showed tha

Teaching

My teaching duties include:

  • Year 1 Acoustics and Computer Audio Systems
  • Year 3 Audio Programming
  • Year 3 Technical Project (Audio Research Seminars)

I have previously taught:

  • Year 1 Technical Ear Training
  • Year 2/3 Video Engineering

Departmental Duties

I am the Admissions Officer for the Tonmeister programme.

Downloads

Some of the software I have written, and other digital resources, are available from the IoSR Software webpage. My Mathworks profile contains a number of Matlab functions related to acoustics, signal processing, plotting, and statistics.
