Dr Chris Hummersone

Lecturer in Sound Recording, MSR Admissions Officer

Qualifications: BMus (Tonmeister), PhD (Surrey), MAES, MIEEE, FHEA

Phone (work): 01483 68 6167
Room no: 05 BC 03

Further information

Biography

I graduated from the Tonmeister course in June 2007 and joined the IoSR as a research student in October 2007. I completed my thesis, entitled "A Psychoacoustic Engineering Approach to Machine Sound Source Separation in Reverberant Environments", in September 2010 and joined the IoSR as a lecturer in October 2010. In my spare time I enjoy playing the saxophone, cycling and running, having completed the London Marathon in 2007 and 2012.

Research Interests

My research interests include modelling the precedence effect and binaural localisation, audio quality in time–frequency processing, computational auditory scene analysis, and machine listening for the automated evaluation of audio quality.

Publications

Highlights

  • Hummersone C, Stokes T, Brookes T. (2014) 'On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis'. In Naik GR, Wang W (eds.) Blind Source Separation: Advances in Theory, Algorithms and Applications. Berlin/Heidelberg: Springer, Article number 12, pp. 349-368.

    Abstract

    The ideal binary mask (IBM) is widely considered to be the benchmark for time–frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.

Journal articles

  • Hummersone C, Mason RD, Brookes TS. (2013) 'A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments'. Journal of the Audio Engineering Society, 61 (7/8 (July/August)), pp. 508-520.

    Abstract

    Reverberation is a problem for source separation algorithms. Because the precedence effect allows human listeners to suppress the perception of reflections arising from room boundaries, numerous computational models have incorporated the precedence effect. However, relatively little work has been done on using the precedence effect in source separation algorithms. This paper compares several precedence models and their influence on the performance of a baseline separation algorithm. The models were tested in a variety of reverberant rooms and with a range of mixing parameters. Although there was a large difference in performance among the models, the one that was based on interaural coherence and onset-based inhibition produced the greatest performance improvement. There is a trade-off between selecting reliable cues that correspond closely to free-field conditions and maximizing the proportion of the input signals that contributes to localization. For optimal source separation performance, it is necessary to adapt the dynamic component of the precedence model to the acoustic conditions of the room.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The computational prediction of masking thresholds for ecologically valid interference scenarios'. Proceedings of Meetings on Acoustics, 19

    Abstract

    Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. The goal of developing an accurate computational model which can predict masking thresholds for such scenarios is still incomplete. While some sophisticated, physiologically inspired, masking prediction models exist, they are rarely tested with ecologically valid programmes (such as music and speech). In order to test the accuracy of model predictions human listener data were required. To that end a masking threshold experiment was conducted for a variety of target and interferer programmes. The results were analysed alongside predictions made by the computational auditory signal processing and prediction model of (Jepsen et al. 2008). Masking thresholds were predicted to within 3.6 dB root mean squared error with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those in (Glasberg and Moore 2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature. © 2013 Acoustical Society of America.

  • Hummersone C, Mason R, Brookes T. (2011) 'Ideal Binary Mask Ratio: a novel metric for assessing binary-mask-based sound source separation algorithms'. IEEE Transactions on Audio, Speech and Language Processing, 19 (7), pp. 2039-2045.

    Abstract

    A number of metrics has been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR–based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternate metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric—dubbed the IBM ratio (IBMR)—is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.

  • Hummersone C, Mason R, Brookes T. (2010) 'Dynamic precedence effect modeling for source separation in reverberant environments'. IEEE Transactions on Audio, Speech and Language Processing, 18 (7), pp. 1867-1871.

    Abstract

    Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. The precedence effect is one of these mechanisms; it aids our ability to localize sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The precedence model inhibitory time constant and inhibitory gain are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The paper concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the initial time delay gap and the direct-to-reverberant ratio are important factors when considering this adaptation. © 2010 IEEE.

Conference papers

  • Wierstorf H, Ward D, Mason RD, Grais E, Hummersone C, Plumbley MD. (2017) 'Perceptual Evaluation of Source Separation for Remixing Music'. Audio Engineering Society 143rd AES Convention Paper No 9880, New York: 143rd AES Convention 2017

    Abstract

    Music remixing is difficult when the original multitrack recording is not available. One solution is to estimate the elements of a mixture using source separation. However, existing techniques suffer from imperfect separation and perceptible artifacts on single separated sources. To investigate their influence on a remix, five state-of-the-art source separation algorithms were used to remix six songs by increasing the level of the vocals. A listening test was conducted to assess the remixes in terms of loudness balance and sound quality. The results show that some source separation algorithms are able to increase the level of the vocals by up to 6 dB at the cost of introducing a small but perceptible degradation in sound quality.

  • Ward D, Wierstorf H, Mason RD, Plumbley MD, Hummersone C. (2017) 'Estimating the loudness balance of musical mixtures using audio source separation'. Proceedings of the 3rd Workshop on Intelligent Music Production (WIMP 2017), Salford, UK: 3rd Workshop on Intelligent Music Production (WIMP 2017)

    Abstract

    To assist with the development of intelligent mixing systems, it would be useful to be able to extract the loudness balance of sources in an existing musical mixture. The relative-to-mix loudness level of four instrument groups was predicted using the sources extracted by 12 audio source separation algorithms. The predictions were compared with the ground truth loudness data of the original unmixed stems obtained from a recent dataset involving 100 mixed songs. It was found that the best source separation system could predict the relative loudness of each instrument group with an average root-mean-square error of 1.2 LU, with superior performance obtained on vocals.

  • Simpson A, Roma G, Grais E, Mason RD, Hummersone C, Plumbley MD. (2017) 'Psychophysical Evaluation of Audio Source Separation Methods'. Springer LNCS: Latent Variable Analysis and Signal Separation, Grenoble, France: 13th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2017) 10169, pp. 211-221.

    Abstract

    Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.

  • Hermes K, Brookes TS, Hummersone C. (2016) 'The harmonic centroid as a predictor of string instrument timbral clarity'. Audio Engineering Society proceedings, Paris, France: 140th Convention of Audio Engineering Society

    Abstract

    Spectrum is an important factor in determining timbral clarity. An experiment where listeners rate the changes in timbral clarity resulting from spectral equalisation (EQ) can provide insight into the relationship between EQ and the clarity of string instruments. Overall, higher frequencies contribute to clarity more positively than lower ones, but the relationship is programme-item-dependent. Fundamental frequency and spectral slope both appear to be important. Change in harmonic centroid (or dimensionless spectral centroid) correlates well with change in clarity, more so than octave band boosted/cut, harmonic number boosted/cut, or other variations on the spectral centroid.

  • Simpson A, Roma G, Grais E, Mason RD, Hummersone C, Liutkus A, Plumbley MD. (2016) 'Evaluation of Audio Source Separation Models Using Hypothesis-Driven Non-Parametric Statistical Methods'. European Signal Processing Conference (EUSIPCO) 2016, Budapest: European Signal Processing Conference (EUSIPCO) 2016

    Abstract

    Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using nonparametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.

  • Hermes K, Brookes TS, Hummersone C. (2015) 'The influence of dumping bias on timbral clarity ratings'. Audio Engineering Society 139th International AES Convention papers, New York, USA: 139th International AES Convention

    Abstract

    When listening test subjects are required to rate changes in a single attribute, but also hear changes in other attributes, their ratings can become skewed by “dumping bias.” To assess the influence of dumping bias on timbral “clarity” ratings, listeners were asked to rate stimuli: (i) in terms of clarity only; and (ii) in terms of clarity, warmth, fullness, and brightness. Clarity ratings of type (i) showed (up to 20%) larger interquartile ranges than those of type (ii). It is concluded that in single-attribute timbral rating experiments, statistical noise—potentially resulting from dumping bias—can be reduced by allowing listeners to rate additional attributes either simultaneously or beforehand.

  • Stokes T, Hummersone C, Brookes T, Mason A. (2014) 'Perceptual quality of audio separated using sigmoidal masks'. 137th Audio Engineering Society Convention 2014, pp. 167-173.

    Abstract

    Separation of underdetermined audio mixtures is often performed in the Time-Frequency (TF) domain by masking each TF element according to its target-to-mixture ratio. This work uses sigmoidal functions to map the target-to-mixture ratio to mask values. The series of functions used encompasses the ratio mask and an approximation of the binary mask. Mixtures are chosen to represent a range of different amounts of TF overlap, then separated and evaluated using objective measures. PEASS results show improved interferer suppression and artifact scores can be achieved using softer masking than that applied by binary or ratio masks. The improvement in these scores gives an improved overall perceptual score; this observation is repeated at multiple TF resolutions.

  • Baykaner K, Hummersone C, Mason RD, Bech S. (2014) 'The acceptability of speech with interfering radio programme material'. Audio Engineering Society Preprint, Berlin: 136th Audio Engineering Society Convention 9020

    Abstract

    A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programmes featuring speech as the target. 21 subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial, whether the listening scenario was acceptable or unacceptable. Stimuli pairs were set to randomly selected SNRs ranging from 0 to 45 dB. Results showed no significant difference between subjects according to listening experience. A logistic regression to acceptability was carried out based on SNR. The model had accuracy R^2 = 0.87, RMSE = 14%, and RMSE* = 7%. By accounting for the presence of background audio in the target programme, 90% of the variance could be explained.

  • Stokes T, Hummersone C, Brookes TS. (2013) 'Reducing Binary Masking Artefacts in Blind Audio Source Separation'. Audio Engineering Society Proceedings of the 134th Audio Engineering Society Convention, Rome, Italy: 134th Audio Engineering Society Convention paper 8853

    Abstract

    Binary masking is a common technique for separating target audio from an interferer. Its use is often justified by the high signal-to-noise ratio achieved. The mask can introduce musical noise artefacts, limiting its perceptual performance and that of techniques that use it. Three mask-processing techniques, involving adding noise or cepstral smoothing, are tested and the processed masks are compared to the ideal binary mask using the perceptual evaluation for audio source separation (PEASS) toolkit. Each processing technique's parameters are optimised before the comparison is made. Each technique is found to improve the overall perceptual score of the separation. Results show a trade-off between interferer suppression and artefact reduction.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The prediction of the acceptability of auditory interference based on audibility'. Proceedings of the AES International Conference, pp. 162-168.

    Abstract

    In order to evaluate the ability of sound field control methods to generate independent listening zones within domestic and automotive environments, it is useful to be able to predict, without listening tests, the acceptability of auditory interference scenarios. It was considered likely that a relationship would exist between masking thresholds and acceptability thresholds, thus a listening test was carried out to gather acceptability thresholds to compare with existing masking data collected under identical listening conditions. An analysis of the data revealed that a linear regression model could be used to predict acceptability thresholds, from only masking thresholds, with RMSE = 2.6 dB and R = 0.86. The same linear regression model was used to predict acceptability thresholds but with masking threshold predictions as the input. The results had RMSE = 4.2 dB and R = 0.88. Copyright © (2013) by the Audio Engineering Society.

  • Stokes T, Brookes TS, Hummersone C. (2012) 'Improving the Quality of Separated Audio: What Works?'. Salford UK: 1st Anniversary Celebration for the BBC Audio Research Partnership
  • Hummersone C, Mason R, Brookes T. (2010) 'A comparison of computational precedence models for source separation in reverberant environments'. Audio Engineering Society Preprint, London, UK: 128th Audio Engineering Society Convention 7981
  • Zielinski S, Hardisty P, Hummersone C, Rumsey F. (2007) 'Potential biases in MUSHRA listening tests'. Audio Engineering Society Preprint, New York: 123rd Audio Engineering Society Convention 7179

    Abstract

    The method described in the ITU-R BS.1534-1 standard, commonly known as MUSHRA (MUltiple Stimulus with Hidden Reference and Anchors), is widely used for the evaluation of systems exhibiting intermediate quality levels, in particular low-bit rate codecs. This paper demonstrates that this method, despite its popularity, is not immune to biases. In two different experiments designed to investigate potential biases in the MUSHRA test, systematic discrepancies in the results were observed with a magnitude up to 22%. The data indicates that these discrepancies could be attributed to the stimulus spacing and range equalizing biases.

Book chapters

  • Hummersone C, Stokes T, Brookes T. (2014) 'On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis'. In Naik GR, Wang W (eds.) Blind Source Separation: Advances in Theory, Algorithms and Applications. Berlin/Heidelberg: Springer, Article number 12, pp. 349-368.

Posters

  • Brookes T, Hummersone C. (2010) Machine Listening for Sound Quality Evaluation. Machine Listening Workshop 2010, Queen Mary University of London
  • Hummersone C, Mason R, Brookes T. (2010) A perceptually–inspired approach to machine sound source separation in real rooms. University of Surrey Postgraduate Research Conference

Theses and dissertations

  • Koya D. (2017) Predicting the overall spatial quality of automotive audio systems.
    [ Status: Approved ]

    Abstract

    The spatial quality of automotive audio systems is often compromised due to their unideal listening environments. Automotive audio systems need to be developed quickly due to industry demands. A suitable perceptual model could evaluate the spatial quality of automotive audio systems with similar reliability to formal listening tests but take less time. Such a model is developed in this research project by adapting an existing model of spatial quality for automotive audio use. The requirements for the adaptation were investigated in a literature review. A perceptual model called QESTRAL was reviewed, which predicts the overall spatial quality of domestic multichannel audio systems. It was determined that automotive audio systems are likely to be impaired in terms of the spatial attributes that were not considered in developing the QESTRAL model, but metrics are available that might predict these attributes. To establish whether the QESTRAL model in its current form can accurately predict the overall spatial quality of automotive audio systems, MUSHRA listening tests using headphone auralisation with head tracking were conducted to collect results to be compared against predictions by the model. Based on guideline criteria, the model in its current form could not accurately predict the overall spatial quality of automotive audio systems. To improve prediction performance, the QESTRAL model was recalibrated and modified using existing metrics of the model, those that were proposed from the literature review, and newly developed metrics. The most important metrics for predicting the overall spatial quality of automotive audio systems included those that were interaural cross-correlation (IACC) based, relate to localisation of the frontal audio scene, and account for the perceived scene width in front of the listener. Modifying the model for automotive audio systems did not invalidate its use for domestic audio systems. 
    The resulting model predicts the overall spatial quality of 2- and 5-channel automotive audio systems with a cross-validation performance of R^2 = 0.85 and root-mean-square error (RMSE) = 11.03%.

  • Hermes K. (2017) Towards measuring music mix quality: the factors contributing to the spectral clarity of single sounds.
    [ Status: Approved ]

    Abstract

    Mixing music is the process of combining tracks of recorded audio to an overall piece. This is a complicated process and, hence, automatic mixing or metering tools would be useful. The aim of the current research project was to work towards measuring the perceived quality of music mixes by establishing predictors for one important perceptual attribute of high-quality mixes (spectral clarity). A review of academic and non-academic literature revealed that the high-level parameters that are responsible for determining the perceived quality of a music mix are ‘clarity and separation’, ‘balance’, ‘impact and interest’ and ‘freedom from technical faults’, alongside context-specific parameters. A further in-depth literature review established that clarity and separation—the chosen focus for this research—depend on spectral, spatial and intensity factors, and temporal changes in these factors. Spectral factors play an important role across all areas of literature consulted (namely timbral clarity, clarity in concert halls, masking, loudness, auditory scene analysis and speech intelligibility), and so the impact of mix EQ on spectral clarity was investigated in a series of experiments. These experiments determined that two important factors contribute to the spectral clarity of single sounds. These are the harmonic centroid (spectral centroid divided by the sound’s average fundamental frequency) and mid-range spectral peakiness (related to sharp peaks in the frequency spectrum). For sounds modified by simple spectral filtering, these two factors are sufficient to model clarity changes with a Spearman correlation ranging from 0.631 (bass and vocal stimuli) to 0.848 (string stimuli). For sounds in a mix, however, other factors become important. Adding a peak audibility measure proved useful. This measure determined whether the audibility of peaks in the spectra of the target sounds was increased or decreased through EQ.
    Target and overall mix harmonic centroids and mid-range spectral peakiness, combined with peak audibility, correlated positively with target spectral clarity (r=0.568). Findings could contribute to the development of marketable products such as a piece of software able to judge the overall sound quality of a mix, automatic mixers or sonically improved music production software. Further work will allow a more comprehensive and generalizable model to be developed.

Teaching

My teaching duties include:

  • Year 1 Acoustics and Computer Audio Systems
  • Year 3 Audio Programming
  • Year 3 Technical Project (Audio Research Seminars)

I have previously taught:

  • Year 1 Technical Ear Training
  • Year 2/3 Video Engineering


Departmental Duties

I am the Admissions Officer for the Tonmeister programme.

Downloads

Some of the software I have written, and other digital resources, are available from the IoSR Software webpage. My Mathworks profile contains a number of Matlab functions related to acoustics, signal processing, plotting, and statistics.

Page Owner: ch0022
Page Created: Monday 21 November 2016 09:33:40 by rxserver
Last Modified: Monday 21 November 2016 10:36:52 by pj0010