Dr Russell Mason

Senior Lecturer (IoSR), Tonmeister Programme Director

Qualifications: BMus(Tonmeister), PhD (Surrey)

Email:
Phone: Work: 01483 68 6535
Room no: 08A BC 03

Further information

Biography

Dr Russell Mason is a Senior Lecturer in the Institute of Sound Recording (IoSR) at the University of Surrey. He is Programme Director for the Tonmeister programme, teaches Audio Engineering to students in all years of this programme, and conducts research into aspects of perception and measurement of audio.

Russell graduated from the Tonmeister Music and Sound Recording degree course at Surrey in 1998, and was offered the opportunity to continue his studies with a PhD jointly in the IoSR and the Department of Psychology, and sponsored by BBC Research and Development. This project, in collaboration with Bang & Olufsen, Genelec and Nokia among others, examined the perception of spatial impression detail, including creating new methods for synthesising spatial sound and a binaural hearing model capable of predicting the perceived effect. Following his PhD, Russell gained funding from the Engineering and Physical Sciences Research Council (EPSRC) to continue the work, which resulted in an advanced binaural hearing model that can successfully predict the perceived width and location of sound sources in the horizontal plane, and the perceived width of acoustical environments.

In 2004, Russell gained a lectureship in the IoSR, and in 2009 was promoted to Senior Lecturer. During this time, Russell has taught a wide range of modules, including Electronics Practicals, Audio Laboratory, Audio Research Seminars, Computer Audio Systems, Technical Projects, and Audio Engineering 1, 2 & 3. He has also taught Master's level Speech and Audio Processing for the Department of Electrical and Electronic Engineering.

Russell has also undertaken a number of critical administrative roles for the Tonmeister programme. These include serving as Admissions Tutor for 7 years, during which he revised the admission procedure, creating an online interview booking system and new entrance tests. He is currently Programme Director for the Tonmeister programme, which includes responsibility for the direction of the programme, managing changes and updates to the programme, managing the IoSR budgets, overseeing upgrades and management of facilities, and mentoring junior staff.

Russell has continued his research into psychoacoustic engineering, specifically the perception and measurement of audio. This research has involved the development of novel subjective testing methods and stimulus synthesis techniques, as well as developing new computational algorithms to mimic human auditory perception and furthering understanding of the human auditory system. Recent projects have investigated the role of head movement in spatial perception, optimal design of microphone arrays for localisation around the horizontal plane, computational auditory scene analysis based on psychoacoustic cues, and perception and modelling of spectral magnitude distortions. He is currently working on further development of measurement techniques of spatial attributes of sound, such as integration of results across frequency and perception of elevation cues. He is also applying the techniques he has developed to the fields of auditory masking and modelling of quality in the presence of interfering signals.

So far, Russell has been involved in funding bids that have brought nearly £900,000 to Surrey (approximately half of this has been as principal investigator) from a range of sources including EPSRC and industry. This has included collaboration with a wide range of industrial partners such as Bang & Olufsen, Adrian James Acoustics, Harman-Becker Automotive, Wolfson Microelectronics, and BBC Research and Development.

Outside the University, Russell is an active member of the Audio Engineering Society. He was a member of the committee of the British section for many years, during which he created the section website. He has also contributed towards the organisation of a number of British section conferences, and many international conferences in the UK and abroad. He acted as papers co-chair for the Audio Metadata conference, has chaired many paper sessions, and has given invited presentations to conferences in Japan, Australia, France, Germany and the USA.

Russell also has a strong musical background, having performed music from the age of 4 and studied trumpet, piano and music theory for many years in both jazz and classical styles. In his spare time, Russell is an active musician, playing trumpet in the department Trad Band and in the department Big Band that he co-founded as a final year undergraduate. He also appreciates a good curry.

Research Collaborations

Current research projects:

  • Perceptually optimised sound zones
  • Perception of loudspeaker directivity
  • Perception and measurement of auditory elevation cues
  • Adaptation of spatial audio quality measures to the automotive environment

Past research projects:

  • The role of head movement in the analysis of spatial impression - Funded by EPSRC (EP/D049253/1)
  • Automated separation of sound sources in reverberant environments using spatial cues - Funded by EPSRC (EP/P503892/1)
  • Perceptual sound field reconstruction and coherent emulation - Funded by EPSRC (EP/E064507/1)
  • Perceptually motivated measurement of spatial sound attributes for audio-based information systems - Funded by EPSRC (GR/R55528/01)
  • Eureka 1653 Multichannel enhancement of domestic user stereo applications (MEDUSA) - in collaboration with Bang & Olufsen, Genelec, Nokia and the University of Lulea-Pitea

Publications

Journal articles

  • Francombe J, Brookes T, Mason R. (2017) 'Evaluation of Spatial Audio Reproduction Methods (Part 1): Elicitation of Perceptual Differences'. Journal of the Audio Engineering Society, 65 (3), pp. 198-211.

    Abstract

    There are a wide variety of spatial audio reproduction systems available, from a single loudspeaker to many spatially distributed loudspeakers. An important factor in the selection, development, or optimization of such systems is listener preference, and the important perceptual characteristics that contribute to this. An experiment was performed to determine the attributes that contribute to listener preference for a range of spatial audio reproduction methods. Experienced and inexperienced listeners made preference ratings for combinations of seven program items replayed over eight reproduction systems, and reported the reasons for their judgments. Automatic text clustering reduced redundancy in the responses by approximately 90%, facilitating subsequent group discussions that produced clear attribute labels, descriptions, and scale end-points. Twenty-seven and twenty-four attributes contributed to preference for the experienced and inexperienced listeners respectively. The two sets of attributes contain a degree of overlap (ten attributes from the two sets were closely related); however, the experienced listeners used more technical terms whilst the inexperienced listeners used more broad descriptive categories.

  • Francombe J, Brookes T, Mason R, Woodcock J. (2017) 'Evaluation of Spatial Audio Reproduction Methods (Part 2): Analysis of Listener Preference'. Journal of the Audio Engineering Society, 65 (3), pp. 212-225.

    Abstract

    It is desirable to determine which of the many different spatial audio reproduction systems listeners prefer, and the perceptual attributes that are most important to listener experience, so that future systems can be perceptually optimized. A paired comparison preference rating experiment was performed alongside a free elicitation task for eight reproduction methods (consumer and professional systems with a wide range of expected quality) and seven program items (representative of potential broadcast material). The experiment was performed by groups of experienced and inexperienced listeners. Thurstone Case V modeling was used to produce preference scales. Both listener groups preferred systems with increased spatial content; nine- and five-channel systems were most preferred. The use of elicited attributes was analyzed alongside the preference ratings, resulting in an approximate hierarchy of attribute importance: three attributes (amount of distortion, output quality, and bandwidth) were found to be important for differentiating systems where there was a large preference difference; sixteen were always important (most notably enveloping and horizontal width); and seven were used alongside small preference differences.

  • Simon-Galvez MF, Menzies D, Mason RD, Fazi FM. (2016) 'Object-Based Audio Reproduction using a Listener-Position Adaptive Stereo System'. Journal of the Audio Engineering Society, 64 (10), pp. 740-751.

    Abstract

    This work introduces a listener-position adaptive stereo reproduction system that allows for the reproduction of 2D object-based audio and for a more accurate localisation when the listener is located outside the sweet spot. The adaptation is composed of two parts; a compensation system that updates the loudspeakers feeds so that the loudspeaker input signals are delivered to the listener with the same magnitude and phase independently of the listening position, as it would occur in a symmetric listening configuration, and an object-based rendering system using conventional panning algorithms. Robustness simulations show that an accurate localisation is possible when the audio objects are panned in the angle seen between the listener and the two loudspeakers. This has been further assessed by objective and subjective localisation experiments.

  • Pearce A, Brookes T, Dewhirst M, Mason R. (2016) 'Eliciting the most prominent perceived differences between microphones'. Journal of the Acoustical Society of America, 139 (5), pp. 2970-2981.
  • Francombe J, Mason R, Dewhirst M, Bech S. (2015) 'A model of distraction in an audio-on-audio interference situation with music program material'. Journal of the Audio Engineering Society, 63 (1-2), pp. 63-77.

    Abstract

    Audio-on-audio interference situations are a common occurrence in everyday life; they may be naturally occurring or be a side-effect of a non-ideal personal sound zone system. In order to evaluate and optimize such situations in a perceptually relevant manner, it is desirable to develop a model of listener experience. Distraction ratings were collected for 100 randomly created audio-on-audio interference situations with music target and interferer programs. A large set of features was also extracted from the audio; the feature extraction was motivated by a qualitative analysis of subject responses. An iterative linear regression procedure was used to develop a predictive model. The selected features were related to the overall loudness, loudness ratio, perceptual evaluation of audio source separation (PEASS) toolbox interference-related perceptual score, and frequency content of the interferer. The model was found to predict accurately for the training and validation data sets (RMSE of approximately 10%), with the exception of a small number of outlying stimuli.

  • Baykaner K, Coleman P, Mason R, Jackson PJB, Francombe J, Olik M, Bech S. (2015) 'The relationship between target quality and interference in sound zones'. Journal of the Audio Engineering Society, 63 (1/2), pp. 78-89.

    Abstract

    Sound zone systems aim to produce regions within a room where listeners may consume separate audio programs with minimal acoustical interference. Often, there is a trade-off between the acoustic contrast achieved between the zones, and the fidelity of the reproduced audio program (the target quality). An open question is whether reducing contrast (i.e. allowing greater interference) can improve target quality. The planarity control sound zoning method can be used to improve spatial reproduction, though at the expense of decreased contrast. Hence, this can be used to investigate the relationship between target quality (which is affected by the spatial presentation) and distraction (which is related to the perceived effect of interference). An experiment was conducted investigating target quality and distraction, and examining their relationship with overall quality within sound zones. Sound zones were reproduced using acoustic contrast control, planarity control and pressure matching applied to a circular loudspeaker array. Overall quality was related to target quality and distraction, each having a similar magnitude of effect; however, the result was dependent upon program combination. The highest mean overall quality was a compromise between distraction and target quality, with energy arriving from up to 15 degrees either side of the target direction.

  • Francombe J, Mason R, Dewhirst M, Bech S. (2014) 'Elicitation of attributes for the evaluation of audio-on-audio interference'. Journal of the Acoustical Society of America, 136 (5), pp. 2630-2641.

    Abstract

    An experiment to determine the perceptual attributes of the experience of listening to a target audio program in the presence of an audio interferer was performed. The first stage was a free elicitation task in which a total of 572 phrases were produced. In the second stage, a consensus vocabulary procedure was used to reduce these phrases into a comprehensive set of attributes. Groups of experienced and inexperienced listeners determined nine and eight attributes, respectively. These attribute sets were combined by the listeners to produce a final set of 12 attributes: masking, calming, distraction, separation, confusion, annoyance, environment, chaotic, balance and blend, imagery, response to stimuli over time, and short-term response to stimuli. In the third stage, a simplified ranking procedure was used to select only the most useful and relevant attributes. Four attributes were selected: distraction, annoyance, balance and blend, and confusion. Ratings using these attributes were collected in the fourth stage, and a principal component analysis performed. This suggested two dimensions underlying the perception of an audio-on-audio interference situation: The first dimension was labeled "distraction" and accounted for 89% of the variance; the second dimension, accounting for 10% of the variance, was labeled "balance and blend."

  • Ashby T, Brookes T, Mason R. (2014) 'Towards a head-movement-aware spatial localisation model: Elevation'. 21st International Congress on Sound and Vibration 2014, ICSV 2014, 4, pp. 2808-2815.

    Abstract

    A multiple-microphone-sphere-based localisation model has been developed that predicts source location by modelling the cues given by head movement. In order to inform improvements to this model, a series of experiments was devised to investigate the impact of head movement cues on the localisation response accuracy of human listeners. It was shown that head movements improve elevation localisation response accuracy for noise sources. When pinna cues are impaired the significance of head movement cues increases. The improved localisation resulting from head movement is due to dynamic cues available during the period of movement, and not to improved static cues available once the head is turned to face the sound source. Head movements improve elevation localisation to a similar degree for band-limited sources with differing centre frequencies (500 Hz, 2 kHz and 6 kHz), which indicates that both dynamic ILDs and dynamic ITDs are used. Head movements do not improve elevation response accuracy for programme items with less than an octave bandwidth. Head movements improve elevation response accuracy to a greater degree for sources further away from the equatorial plane.

  • Hummersone C, Mason RD, Brookes TS. (2013) 'A Comparison of Computational Precedence Models for Source Separation in Reverberant Environments'. Journal of the Audio Engineering Society, 61 (7/8 (July/August)), pp. 508-520.

    Abstract

    Reverberation is a problem for source separation algorithms. Because the precedence effect allows human listeners to suppress the perception of reflections arising from room boundaries, numerous computational models have incorporated the precedence effect. However, relatively little work has been done on using the precedence effect in source separation algorithms. This paper compares several precedence models and their influence on the performance of a baseline separation algorithm. The models were tested in a variety of reverberant rooms and with a range of mixing parameters. Although there was a large difference in performance among the models, the one that was based on interaural coherence and onset-based inhibition produced the greatest performance improvement. There is a trade-off between selecting reliable cues that correspond closely to free-field conditions and maximizing the proportion of the input signals that contributes to localization. For optimal source separation performance, it is necessary to adapt the dynamic component of the precedence model to the acoustic conditions of the room.

  • Kim C, Mason R, Brookes TS. (2013) 'Head movements made by listeners in experimental and real-life listening activities'. Journal of the Audio Engineering Society, 61 (6 (June)), pp. 425-438.

    Abstract

    Understanding the way in which listeners move their heads must be part of any objective model for evaluating and reproducing the sonic experience of space. Head movement is part of the listening experience because it allows for sensing the spatial distribution of parameters. In the first experiment, the head positions of subjects were recorded when they were asked to evaluate perceived source location, apparent source width, envelopment, and timbre of synthesis stimuli. Head motion was larger when judging source width than when judging direction or timbre. In the second experiment, head movement was observed in natural listening activities such as concerts, movies, and video games. Because the statistics of movement were similar to those observed in the first experiment, laboratory results can be used as the basis of an objective model of spatial behavior. The results were based on 10 subjects.

  • Francombe J, Mason R, Dewhirst M, Bech S. (2013) 'Modeling listener distraction resulting from audio-on-audio interference'. Journal of the Acoustical Society of America, 133 (5)

    Abstract

    As devices that produce audio become more commonplace and increasingly portable, situations in which two competing audio programs are present occur more regularly. In order to support the design of systems intended to mitigate the effects of interfering audio (including sound field control, noise cancelation or source separation systems), it is desirable to model the perceived distraction in such situations. Distraction ratings were collected for a range of audio-on-audio interference situations including various target and interferer programs at three interferer levels, with and without road noise. Time-frequency target-to-interferer ratio (TIR) maps of the stimuli were created using a simple auditory model. A number of feature sets were extracted from the TIR maps, including combinations of mean, standard deviation, minimum and maximum TIR taken across the duration of the program item. In order to predict distraction ratings from the features, linear regression models were produced. The models were evaluated for goodness-of-fit (RMSE) and generalizability (using a K-fold cross-validation procedure). The best model performed well, with almost all predictions falling within the 95% confidence intervals of the perceptual data. A validation data set was used to test the model, suggesting areas for future improvement.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The computational prediction of masking thresholds for ecologically valid interference scenarios'. Journal of the Acoustical Society of America, 133 (5)

    Abstract

    Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. The goal of developing an accurate computational model which can predict masking thresholds for such scenarios is still incomplete. While some sophisticated, physiologically inspired, masking prediction models exist, they are rarely tested with ecologically valid programs (such as music and speech). In order to test the accuracy of model predictions human listener data is required. To that end a masking threshold experiment was conducted for a variety of target and interferer programs. The results were analyzed alongside predictions made by the computational auditory signal processing and prediction model described by Jepsen et al. (2008). Masking thresholds were predicted to within 3 dB root mean squared error with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those of the model by Glasberg and Moore (2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature.

  • Hummersone C, Mason R, Brookes T. (2011) 'Ideal Binary Mask Ratio: a novel metric for assessing binary-mask-based sound source separation algorithms'. IEEE Transactions on Audio, Speech and Language Processing, 19 (7), pp. 2039-2045.

    Abstract

    A number of metrics have been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR-based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternate metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric, dubbed the IBM ratio (IBMR), is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.

  • Kim C, Mason RD, Brookes T. (2011) 'Head-movement-aware signal capture for evaluation of spatial acoustics'. Building Acoustics, 18 (1), pp. 207-226.

    Abstract

    This research incorporates the nature of head movement made in listening activities into the development of a quasi-binaural acoustical measurement technique for the evaluation of spatial impression. A listening test was conducted where head movements were tracked whilst the subjects rated the perceived source width, envelopment, source direction and timbre of a number of stimuli. It was found that the extent of head movements was larger when evaluating source width and envelopment than when evaluating source direction and timbre. It was also found that the locus of ear positions corresponding to these head movements formed a bounded sloped path, higher towards the rear and lower towards the front. This led to the concept of a signal capture device comprising a torso-mounted sphere with multiple microphones. A prototype was constructed and used to measure three binaural parameters related to perceived spatial impression - interaural time and level differences (ITD and ILD) and interaural cross-correlation coefficient (IACC). Comparison of the prototype measurements to those made with a rotating Head and Torso Simulator (HATS) showed that the prototype could be perceptually accurate for the prediction of source direction using ITD and ILD, and for the prediction of perceived spatial impression using IACC. Further investigation into parameter derivation and interpolation methods indicated that 21 pairs of discretely spaced microphones were sufficient to measure the three binaural parameters across the sloped range of ear positions identified in the listening test.

  • Hummersone C, Mason R, Brookes T. (2010) 'Dynamic precedence effect modeling for source separation in reverberant environments'. IEEE Transactions on Audio, Speech and Language Processing, 18 (7), pp. 1867-1871.

    Abstract

    Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation and many psychophysical and perceptual mechanisms are well documented. The precedence effect is one of these mechanisms; it aids our ability to localize sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The precedence model inhibitory time constant and inhibitory gain are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The paper concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the initial time delay gap and the direct-to-reverberant ratio are important factors when considering this adaptation.

  • Neher T, Brookes T, Mason R. (2006) 'Musically representative test signals for interaural cross-correlation coefficient measurement'. Acta Acustica united with Acustica, 92 (5), pp. 787-796.

    Abstract

    Typically, measurements that aim to predict perceived spatial impression of music signals in concert halls are performed by calculating the interaural cross-correlation coefficient (IACC) of a binaurally-recorded impulse response. Previous research, however, has shown that this can lead to results very different from those obtained if a musical input signal is used. The reasons for this discrepancy were investigated, and it was found that the overall duration of the source signal, its onset and offset times, and the magnitude and rate of any spectral fluctuations, have a very strong effect on the IACC. Two test signals, synthesised to be representative of a wide range of musical stimuli, can extend the external validity of traditional IACC-based measurements.

  • Mason R, Brookes T, Rumsey F. (2005) 'Frequency dependency of the relationship between perceived auditory source width and the interaural cross-correlation coefficient for time-invariant stimuli'. Journal of the Acoustical Society of America, 117 (3 Pt 1), pp. 1337-1350.

    Abstract

    Previous research has indicated that the relationship between the interaural cross-correlation coefficient (IACC) of a narrow-band sound and its perceived auditory source width is dependent on its frequency. However, this dependency has not been investigated in sufficient detail for researchers to be able to properly model it in order to produce a perceptually relevant IACC-based model of auditory source width. A series of experiments has therefore been conducted to investigate this frequency dependency in a controlled manner, and to derive an appropriate model. Three main factors were discovered in the course of these experiments. First, the nature of the frequency dependency of the perceived auditory source width of stimuli with an IACC of 1 was determined, and an appropriate mathematical model was derived. Second, the loss of perceived temporal detail at high frequencies, caused by the breakdown of phase locking in the ear, was found to be relevant, and the model was modified accordingly using rectification and a low-pass filter. Finally, it was found that there was a further frequency dependency at low frequencies, and a method for modeling this was derived. The final model was shown to predict the experimental data well.

  • Mason R, Brookes T, Rumsey F. (2005) 'The effect of various source signal properties on measurements of the interaural cross-correlation coefficient'. Acoustical Science and Technology, 26 (2), pp. 102-113.

    Abstract

    Measurements that attempt to predict the perceived spatial impression of musical signals in concert halls typically are conducted by calculating the interaural cross-correlation coefficient (IACC) of an impulse response. The causes of interaural decorrelation are investigated and it is found that this is affected by frequency dependent interaural time and level differences and variations in these over time. It is found that the IACC of impulsive and of narrowband tonal signals can be very different from each other in a wide range of acoustical environments, due to the differences in the spectral content and the duration of the signals. From this, it is concluded that measurements made of impulsive signals are unsuitable for attempting to predict the perceived spatial impression of musical signals. It is suggested that further work is required to develop a set of test signals that is representative of a wide range of musical stimuli.

  • Mason R, Ford N, Rumsey F, de Bruyn B. (2001) 'Verbal and non-verbal elicitation techniques in the subjective assessment of spatial sound reproduction'. Journal of the Audio Engineering Society, 49 (5), pp. 366-384.

    Abstract

    Current research into spatial audio has shown an increasing interest in the way subjective attributes of reproduced sound are elicited from listeners. The emphasis at present is on verbal semantics; however, studies suggest that nonverbal methods of elicitation could be beneficial. Research into the relative merits of these methods has found that nonverbal responses may result in different elicited attributes compared to verbal techniques. Nonverbal responses may be closer to the perception of the stimuli than the verbal interpretation of this perception. There is evidence that drawing is not as accurate as other nonverbal methods of elicitation when it comes to reporting the localization of auditory images. However, the advantage of drawing is its ability to describe the whole auditory space rather than a single dimension.

Conference papers

  • Simpson AJR, Roma G, Grais EM, Mason R, Hummersone C, Plumbley MD. (2017) 'Psychophysical Evaluation of Audio Source Separation Methods'. Springer LNCS, Grenoble, France: 13th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2017)
    [ Status: Accepted ]

    Abstract

    Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.

  • Rämö J, Marsh S, Bech S, Mason RD, Holdt Jensen S. (2016) 'Validation of a perceptual distraction model in a complex personal sound zone system'. Audio Engineering Society Los Angeles, USA: 141st Audio Engineering Society Convention (9665)

    Abstract

    This paper evaluates a previously proposed perceptual model predicting user’s perceived distraction caused by interfering audio programmes. The distraction model was originally trained using a simple sound reproduction system for music-on-music interference situations and it has not been formally tested using more complex sound systems. A listening experiment was conducted to evaluate the performance of the model, using music target and speech interferer reproduced by a complex personal sound-zone system. The model was found to successfully predict the perceived distraction of a more complex sound reproducing system with different target-interferer pairs than it was originally trained for. Thus, the model can be used as a tool for personal sound-zone evaluation and optimization tasks.

  • Simpson AJR, Roma G, Grais EM, Mason RD, Hummersone C, Liutkus A, Plumbley MD. (2016) 'Evaluation of Audio Source Separation Models Using Hypothesis-Driven Non-Parametric Statistical Methods'. Budapest: European Signal Processing Conference (EUSIPCO) 2016
    [ Status: Accepted ]

    Abstract

    Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using nonparametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.

  • Francombe J, Brookes T, Mason R, Woodcock J. (2016) 'Determining and Labeling the Preference Dimensions of Spatial Audio Replay'. IEEE Lisbon, Portugal: QoMEX 2016
    [ Status: Accepted ]

    Abstract

    There are many spatial audio reproduction systems currently in domestic use (e.g. mono, stereo, surround sound, sound bars, and headphones). In an experiment, pairwise preference magnitude ratings for a range of such systems were collected from trained and untrained listeners. The ratings were analysed using internal preference mapping to: (i) uncover the principal perceptual dimensions of listener preference; (ii) label the dimensions based on the important perceptual attributes; and (iii) observe differences between trained and untrained listeners. To aid with labelling the dimensions, perceptual attributes were elicited alongside the preference ratings and were analysed by: (i) considering a metric derived from the frequency of use of each attribute and the magnitude of the related preference judgements; and (ii) observing attribute use for comparisons between specific methods. The first preference dimension accounted for the vast majority of the variance in ratings; it was related to multiple important attributes, including those associated with spatial capability and freedom from distortion. All participants exhibited a preference for reproduction methods that were positively correlated with the first dimension (most notably 5-, 9-, and 22-channel surround sound). The second dimension accounted for only a very small proportion of the variance, and appeared to separate the headphone method from the other methods. The trained and untrained listeners generally showed opposite preferences in the second dimension, suggesting that trained listeners have a higher preference for headphone reproduction than untrained listeners.

  • Pearce A, Brookes TS, Mason R, Dewhirst M. (2016) 'Measurements to determine the ranking accuracy of perceptual models'. Audio Engineering Society Paris, France: 140th Convention of the Audio Engineering Society
    [ Status: Accepted ]

    Abstract

    Linear regression is commonly used in the audio industry to create objective measurement models that predict subjective data. For any model development, the measure used to evaluate the accuracy of the prediction is important. The most common measures assume a linear relationship between the subjective data and the prediction, though in the early stages of model development this is not always the case. Measures based on rank ordering (such as Spearman’s test), can alternatively be used. Spearman’s test, however, does not consider the variance of the subjective data. This paper presents a method of incorporating the subjective variance into the Spearman’s rank ordering test using Monte Carlo simulations, and shows how this can be beneficial in the development of predictive models.
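
    The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' published procedure: it assumes each stimulus has a subjective mean and standard deviation, redraws the subjective scores from normal distributions, and recomputes Spearman's rho on every draw. The function names, the normality assumption, and the example numbers are all hypothetical.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks; ties are broken by position (assumes continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def monte_carlo_spearman(means, sds, predictions, trials=2000, seed=0):
    """Redraw subjective scores from N(mean, sd) and collect rho for each draw."""
    rng = random.Random(seed)
    rhos = []
    for _ in range(trials):
        sample = [rng.gauss(m, s) for m, s in zip(means, sds)]
        rhos.append(spearman(sample, predictions))
    return rhos


if __name__ == "__main__":
    # Illustrative, made-up numbers: five stimuli with noisy subjective ratings.
    subjective_means = [1.0, 2.0, 3.0, 4.0, 5.0]
    subjective_sds = [0.8, 0.8, 0.8, 0.8, 0.8]
    model_predictions = [0.9, 2.4, 2.6, 4.1, 4.8]
    rhos = monte_carlo_spearman(subjective_means, subjective_sds, model_predictions)
    print("mean rho over draws:", statistics.fmean(rhos))
```

    The spread of rho across draws gives a sense of how much rank agreement could be expected given the uncertainty in the subjective data, which a single Spearman coefficient on the means does not show.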

  • Francombe J, Brookes TS, Mason R, Melchior F. (2015) 'Loudness matching multichannel audio programme material with listeners and predictive models'. 139th International AES Convention papers, New York, USA: 139th International AES Convention

    Abstract

    Loudness measurements are often necessary in psychoacoustic research and legally required in broadcasting. However, existing loudness models have not been widely tested with new multichannel audio systems. A trained listening panel used the method of adjustment to balance the loudnesses of eight reproduction methods: low-quality mono, mono, stereo, 5-channel, 9-channel, 22-channel, ambisonic cuboid, and headphones. Seven programme items were used, including music, sport, and a film soundtrack. The results were used to test loudness models including simple energy-based metrics, variants of ITU-R BS.1770, and complex psychoacoustically motivated models. The mean differences between the perceptual results and model predictions were statistically insignificant for all but the simplest model. However, some weaknesses in the model predictions were highlighted.

  • Francombe J, Brookes T, Mason RD. (2015) 'Perceptual evaluation of spatial quality: where next?'. Florence: 22nd International Congress on Sound and Vibration

    Abstract

    From the early days of reproduced sound, engineers have sought to reproduce the spatial properties of sound fields, leading to the development of a range of technologies. Two-channel stereo has been prevalent for many years; however, systems with a higher number of discrete channels (including rear and height loudspeakers) are becoming more common and, recently, there has been a move towards loudspeaker-agnostic methods using audio objects. Perceptual evaluation, and perceptually-informed objective measurement, of alternative reproduction systems can inform further development and steer future innovations. It is important, therefore, that any gaps in the field of perceptual evaluation and measurement are identified and that future work aims to fill those gaps. A standard research paradigm in the field is identification of the perceptual attributes of a stimulus set, facilitating controlled listening tests and leading to the development of predictive models. There have been numerous studies that aim to discover the perceptual attributes of reproduced spatial sound, leading to more than fifty descriptive terms. However, a literature review revealed the following key problems: (i) there is little agreement on exact definitions, nor on the relative importance of each attribute; (ii) there may be important attributes that have not yet been identified (e.g. attributes arising from differences between real and reproduced audio, or pertaining to new 3D or object-based methods); and (iii) there is no model of overall spatial quality based directly on the important attributes. 
    Consequently, the authors contend that future research should focus on: (i) ascertaining which attributes of reproduced spatial audio are most important to listeners; (ii) identifying any important attributes currently missing; (iii) determining the relationships between the important attributes and listener preference; (iv) modelling overall spatial quality in terms of the important perceptual attributes; and (v) modelling these perceptual attributes in terms of their physical correlates.

  • Francombe J, Brookes T, Mason RD. (2015) 'Elicitation of the differences between real and reproduced audio'. Audio Engineering Society Preprint, Warsaw: 138th Audio Engineering Society Convention 9307

    Abstract

    To improve the experience of listening to reproduced audio, it is beneficial to determine the differences between listening to a live performance and a recording. An experiment was performed in which three live performances (a jazz duet, a jazz-rock quintet, and a brass quintet) were captured and simultaneously replayed over a nine-channel with-height surround sound system. Experienced and inexperienced listeners moved freely between the live performance and the reproduction and described the difference in listening experience. In subsequent group discussions, the experienced listeners produced twenty-nine categories using some terms that are not commonly found in the current spatial audio literature. The inexperienced listeners produced five categories that overlapped with the experienced group terms but that were not as detailed.

  • Francombe J, Brookes T, Mason R, Flindt R, Coleman P, Liu Q, Jackson PJB. (2015) 'Production and reproduction of programme material for a variety of spatial audio formats'. Proc. AES 138th Int. Conv. (e-Brief), Warsaw: 138th Audio Engineering Society Convention, pp. 4-4.

    Abstract

    For subjective experimentation on 3D audio systems, suitable programme material is needed. A large-scale recording session was performed in which four ensembles were recorded with a range of existing microphone techniques (aimed at mono, stereo, 5.0, 9.0, 22.0, ambisonic, and headphone reproduction) and a novel 48-channel circular microphone array. Further material was produced by remixing and augmenting pre-existing multichannel content. To mix and monitor the programme items (which included classical, jazz, pop and experimental music, and excerpts from a sports broadcast and a film soundtrack), a flexible 3D audio reproduction environment was created. Solutions to the following challenges were found: level calibration for different reproduction formats; bass management; and adaptable signal routing from different software and file formats.

  • Pike C, Mason RD, Brookes T. (2014) 'Auditory compensation for spectral coloration'. Audio Engineering Society Preprint, New York: 137th Audio Engineering Society Convention 9138

    Abstract

    The “spectral compensation effect” (Watkins, 1991) describes a decrease in perceptual sensitivity to spectral modifications caused by the transmission channel (e.g., loudspeakers, listening rooms). Few studies have examined this effect: its extent and perceptual mechanisms are not confirmed. The extent to which compensation affects the perception of sounds colored by loudspeakers and other channels should be determined. This compensation has been mainly studied with speech. Evidence suggests that speech engages special perceptual mechanisms, so compensation might not occur with non-speech sounds. The current study provides evidence of compensation for spectrum in nonspeech tests: channel coloration was reduced by approximately 20%.

  • Ashby T, Mason RD, Brookes T. (2014) 'Elevation localisation response accuracy on vertical planes of differing azimuth'. Audio Engineering Society Preprint, Berlin, Germany: 136th Audio Engineering Society Convention 9046

    Abstract

    Head movement has been shown to significantly improve localisation response accuracy in elevation. It is unclear from previous research whether this is due to static cues created once the head has reached a new stationary position or dynamic cues created through the act of moving the head. In this experiment listeners were asked to report the location of loudspeakers placed on vertical planes at four different azimuth angles (0°, 36°, 72°, 108°) with no head movement. Static elevation responses were significantly more accurate for sources away from the median plane. This finding, combined with the statement that listeners orient to face the source when localising, suggests that dynamic cues are the cause of improved localisation through head movement.

  • Pike C, Mason RD, Brookes T. (2014) 'The effect of auditory memory on the perception of timbre'. Audio Engineering Society Preprint, Berlin: 136th Audio Engineering Society Convention 9028

    Abstract

    Listeners are more sensitive to timbral differences when comparing stimuli side-by-side than temporally-separated. The contributions of auditory memory and spectral compensation to this effect are unclear. A listening test examined the role of auditory memory in timbral discrimination, across retention intervals (RIs) of up to 40 s. For timbrally complex music stimuli discrimination accuracy was good across all RIs, but there was increased sensitivity to onset spectrum, which decreased with increasing RI. Noise stimuli showed no onset sensitivity but discrimination performance declined with RIs of 40 s. The difference between program types may suggest different onset sensitivity and memory encoding (categorical vs non-categorical). The onset bias suggests that memory effects should be measured prior to future investigation of spectral compensation.

  • Francombe J, Mason RD, Dewhirst M, Bech S. (2014) 'Investigation of a random radio sampling method for selecting ecologically valid music programme material'. Audio Engineering Society Preprint, Berlin: 136th Audio Engineering Society Convention 9029

    Abstract

    When performing subjective tests of an audio system, it is necessary to use appropriately selected programme material to excite that system. Programme material is often required to be wide-ranging and representative of commonly consumed audio, whilst having minimal selection bias. A random radio sampling procedure was investigated for its ability to produce such a stimulus set. Nine popular stations were sampled at six different times of day over a number of days to produce a 200-item pool. Musical and signal-based characteristics were examined; the items were found to span a wide range of genres and years, and physical similarities were found between items in the same genre. The proposed method is beneficial for collecting a wide and representative stimulus set.

  • Baykaner K, Hummersone C, Mason RD, Bech S. (2014) 'The acceptability of speech with interfering radio programme material'. Audio Engineering Society Preprint, Berlin: 136th Audio Engineering Society Convention 9020

    Abstract

    A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programmes featuring speech as the target. 21 subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial, whether the listening scenario was acceptable or unacceptable. Stimuli pairs were set to randomly selected SNRs ranging from 0 to 45 dB. Results showed no significant difference between subjects according to listening experience. A logistic regression to acceptability was carried out based on SNR. The model had accuracy R² = 0.87, RMSE = 14%, and RMSE* = 7%. By accounting for the presence of background audio in the target programme, 90% of the variance could be explained.

  • Olik M, Coleman P, Jackson PJB, Francombe J, Mason R, Olsen M, Møller M, Bech S. (2013) 'A comparative performance study of sound zoning methods in a reflective environment'. Proceedings of the 52nd AES International Conference, pp. 214-223.

    Abstract

    Whilst sound zoning methods have typically been studied under anechoic conditions, it is desirable to evaluate the performance of various methods in a real room. Three control methods were implemented (delay and sum, DS; acoustic contrast control, ACC; and pressure matching, PM) on two regular 24-element loudspeaker arrays (line and circle). The acoustic contrast between two zones was evaluated and the reproduced sound fields compared for uniformity of energy distribution. ACC generated the highest contrast, whilst PM produced a uniform bright zone. Listening tests were also performed using monophonic auralisations from measured system responses to collect ratings of perceived distraction due to the alternate audio programme. Distraction ratings were affected by control method and programme material. Copyright © (2013) by the Audio Engineering Society.

  • Francombe J, Baykaner K, Mason R, Dewhirst M, Coleman P, Olik M, Jackson PJB, Bech S, Pedersen JA. (2013) 'Perceptually optimised loudspeaker selection for the creation of personal sound zones'. Proceedings of the 52nd AES International Conference, pp. 169-178.

    Abstract

    Sound field control methods can be used to create multiple zones of audio in the same room. Separation achieved by such systems has classically been evaluated using physical metrics including acoustic contrast and target-to-interferer ratio (TIR). However, to optimise the experience for a listener it is desirable to consider perceptual factors. A search procedure was used to select 5 loudspeakers for production of 2 sound zones using acoustic contrast control. Comparisons were made between searches driven by physical (programme-independent TIR) and perceptual (distraction predictions from a statistical model) cost functions. Performance was evaluated on TIR and predicted distraction in addition to subjective ratings. The perceptual cost function showed some benefits over physical optimisation, although the model used needs further work. Copyright © (2013) by the Audio Engineering Society.

  • Baykaner K, Hummersone C, Mason RD, Bech S. (2013) 'Selection of temporal windows for the computational prediction of masking thresholds'. Vancouver: IEEE International Conference on Acoustics, Speech, and Signal Processing

    Abstract

    In the field of auditory masking threshold predictions an optimal method for buffering a continuous, ecologically valid programme combination into discrete temporal windows has yet to be determined. An investigation was carried out into the use of a variety of temporal window durations, shapes, and steps, in order to discern the resultant effect upon the accuracy of various masking threshold prediction models. Selection of inappropriate temporal windows can triple the prediction error in some cases. Overlapping windows were found to produce the lowest errors provided that the predictions were smoothed appropriately. The optimal window shape varied across the tested models. The most accurate variant of each model resulted in root mean squared errors of 2.3, 3.4, and 4.2 dB.
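
    As a rough illustration of what buffering a continuous signal into temporal windows involves, the sketch below splits a signal into frames of a chosen duration, shape, and step. It is only a generic framing routine under assumed parameter names (`window_ms`, `step_ms`, `shape`); the window settings examined in the paper, and the masking models fed by the frames, are not reproduced here.

```python
import math


def hann(length):
    """Symmetric Hann window of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1)) for i in range(length)]


def frame_signal(signal, fs, window_ms, step_ms, shape="hann"):
    """Split `signal` into windowed frames.

    A step shorter than the window duration gives overlapping frames.
    Samples at the end that do not fill a whole frame are dropped.
    """
    size = int(round(fs * window_ms / 1000.0))
    step = int(round(fs * step_ms / 1000.0))
    if size <= 0 or step <= 0:
        raise ValueError("window and step must each span at least one sample")
    window = hann(size) if shape == "hann" else [1.0] * size
    frames = []
    for start in range(0, len(signal) - size + 1, step):
        segment = signal[start:start + size]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


if __name__ == "__main__":
    # One second of a constant signal at an assumed 1 kHz sample rate.
    signal = [1.0] * 1000
    frames = frame_signal(signal, fs=1000, window_ms=400, step_ms=200)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

    A per-frame prediction would then be computed from each frame, and with overlapping frames the resulting sequence of predictions could be smoothed, as the abstract notes.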

  • Pike C, Brookes T, Mason R. (2013) 'Auditory adaptation to loudspeaker and listening room acoustics'. 135th Audio Engineering Society Convention 2013, pp. 116-125.

    Abstract

    Timbral qualities of loudspeakers and rooms are often compared in listening tests involving short listening periods. Outside the laboratory, listening occurs over a longer time course. In a study by Olive et al. (1995) smaller timbral differences between loudspeakers and between rooms were reported when comparisons were made over longer versus shorter time periods. This is a form of timbral adaptation, a decrease in sensitivity to timbre over time. The current study confirms this adaptation and establishes that it is not due to response bias but may be due to timbral memory, specific mechanisms compensating for transmission channel acoustics, or attentional factors. Modifications to listening tests may be required where tests need to be representative of listening outside of the laboratory.

  • Baykaner K, Hummersone C, Mason R, Bech S. (2013) 'The prediction of the acceptability of auditory interference based on audibility'. Proceedings of the 52nd AES International Conference, pp. 162-168.

    Abstract

    In order to evaluate the ability of sound field control methods to generate independent listening zones within domestic and automotive environments, it is useful to be able to predict, without listening tests, the acceptability of auditory interference scenarios. It was considered likely that a relationship would exist between masking thresholds and acceptability thresholds, thus a listening test was carried out to gather acceptability thresholds to compare with existing masking data collected under identical listening conditions. An analysis of the data revealed that a linear regression model could be used to predict acceptability thresholds, from only masking thresholds, with RMSE = 2.6 dB and R = 0.86. The same linear regression model was used to predict acceptability thresholds but with masking threshold predictions as the input. The results had RMSE = 4.2 dB and R = 0.88. Copyright © (2013) by the Audio Engineering Society.

  • Ashby T, Mason RD, Brookes T. (2013) 'Head movements in three-dimensional localisation'. Audio Engineering Society Preprint Proceedings of the 134th Audio Engineering Society Convention, Rome, Italy: 134th Audio Engineering Society Convention 8881

    Abstract

    Previous studies give contradicting evidence as to the importance of head movements in localisation. In this study head movements were shown to increase localisation response accuracy in elevation and azimuth. For elevation, it was found that head movement improved localisation accuracy in some cases and that when pinna cues were impeded the significance of head movement cues was increased. For azimuth localization, head movement reduced front-back confusions. There was also evidence that head movement can be used to enhance static cues for azimuth localisation. Finally, it appears that head movement can increase the accuracy of listeners’ responses by enabling an interaction between auditory and visual cues.

  • Francombe J, Mason RD, Dewhirst M, Bech S. (2012) 'Determining the threshold of acceptability for an interfering audio programme'. Audio Engineering Society Preprint, Budapest, Hungary: 132nd Audio Engineering Society Convention 8639

    Abstract

    An experiment was performed in order to establish the threshold of acceptability for an interfering audio programme on a target audio programme, varying the following physical parameters: target programme, interferer programme, interferer location, interferer spectrum, and road noise level. Factors were varied in three levels in a Box-Behnken fractional factorial design. The experiment was performed in three scenarios: information gathering, entertainment, and reading/working. Nine listeners performed a method of adjustment task to determine the threshold values. Produced thresholds were similar in the information and entertainment scenarios; however, there were significant differences between subjects, and factor levels also had a significant effect: interferer programme was the most important factor across the three scenarios, whilst interferer location was the least important.

  • Simon LSR, Mason R. (2011) 'Spaciousness rating of 8-channel stereophony-based microphone arrays'. London, UK : Audio Engineering Society Preprint, London, UK: 130th Audio Engineering Society Convention 8340

    Abstract

    In previous studies, the localisation accuracy and the spatial impression of 3-2 stereo microphone arrays were discussed. These showed that 3-2 stereo cannot produce stable images to the side and to the rear of the listener. An octagon loudspeaker array was therefore proposed. Microphone array design for this loudspeaker configuration was studied in terms of localisation accuracy, locatedness and sound image width. This paper describes an experiment conducted to evaluate the spaciousness of 10 different microphone arrays used in different acoustical environments. Spaciousness was analyzed as a function of sound signal, acoustical environment and microphone array’s characteristics. It showed that the height of the microphone array and the original acoustical environment are the two variables that have the most influence on the perceived spaciousness, but that microphone directivity and the position of sound sources are also important.

  • Ashby T, Mason R, Brookes T. (2011) 'Prediction of perceived elevation using multiple pseudo-binaural microphones'. London, UK : Audio Engineering Society Audio Engineering Society Preprint, London, UK: 130th Audio Engineering Society Convention

    Abstract

    Computational auditory models that predict the perceived location of sound sources in terms of azimuth are already available, yet little has been done to predict perceived elevation. Interaural time and level differences, the primary cues in horizontal localisation, do not resolve source elevation, resulting in the ‘Cone of Confusion’. In natural listening, listeners can make head movements to resolve such confusion. To mimic the dynamic cues provided by head movements, a multiple microphone sphere was created, and a hearing model was developed to predict source elevation from the signals captured by the sphere. The prototype sphere and hearing model proved effective in both horizontal and vertical localisation. The next stage of this research will be to rigorously test a more physiologically accurate capture device.

  • Simon LSR, Mason R. (2010) 'Verification of microphone techniques for a novel surround sound system'. Marseille, France : Marseille, France: Laboratory of Mechanics and Acoustics, Centre National de la Recherche Scientifique
  • Simon LSR, Mason R. (2010) 'Development of a novel surround sound format and associated microphone techniques'. King’s College, London :
  • Hummersone C, Mason R, Brookes T. (2010) 'A comparison of computational precedence models for source separation in reverberant environments'. Audio Engineering Society Audio Engineering Society Preprint, London, UK: 128th Audio Engineering Society Convention 7981
  • Simon LSR, Mason R. (2010) 'Time and level localisation curves for a regularly-spaced octagon loudspeaker array'. London, UK : Audio Engineering Society Audio Engineering Society Preprint, London, UK: 128th Audio Engineering Society Convention 8079

    Abstract

    Multichannel microphone array designs often use the localisation curves that have been derived for 2-0 stereophony. Previous studies showed that side and rear perception of phantom image locations require somewhat different curves. This paper describes an experiment conducted to determine localisation curves using an octagonal loudspeaker setup. Various signals with a range of interchannel time and level differences were produced between pairs of adjacent loudspeakers, and subjects were asked to evaluate the perceived sound event’s direction and its locatedness. The results showed that the curves for the side pairs of adjacent loudspeakers are significantly different to the front and rear pairs. The resulting curves can be used to derive suitable microphone techniques for this loudspeaker setup.

  • Kim C, Mason R, Brookes T. (2010) 'Investigation into and modelling of head movement for objective evaluation of the spatial impression of audio'. Boston, USA : Acoustical Society of America Journal of the Acoustical Society of America, Baltimore, USA: 159th Meeting of the Acoustical Society of America 127 (3), pp. 1886-1886.

    Abstract

    Research was undertaken to determine the nature of head movements made when judging spatial impression and to incorporate these into a system for measuring, in a perceptually relevant manner, the acoustic parameters which contribute to spatial impression: interaural time and level differences and interaural cross‐correlation coefficient. First, a subjective test was conducted that showed that (i) the amount of head movement was larger when evaluating source width and envelopment than when judging localization and timbre and (ii) the pattern of head movement resulted in ear positions that formed a sloped area. These findings led to the design of a binaural signal capture technique using a sphere with multiple microphones, mounted on a simulated torso. Evaluation of this technique revealed that it would be appropriate for the prediction of perceived spatial attributes including both source direction and aspects of spatial impression. Reliable derivation of these attributes across the range of ear positions determined from the earlier subjective test was shown to be possible with a limited number of microphones through an appropriate interpolation and calculation technique. A prototype capture system was suggested as a result, using a sphere with torso, with 21 omnidirectional microphones on each side. [Work supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, Grant No. EP/D049253.]

  • Kim C, Mason R, Brookes T. (2010) 'Validation of a simple spherical head model as a signal capture device for head-movement-aware prediction of perceived spatial impression'. Audio Engineering Society Proceedings of the 40th International AES Conference, Tokyo, Japan: AES 40th International Conference (Spatial Audio: Sense the Sound of Space)

    Abstract

    In order to take head movement into account in objective evaluation of perceived spatial impression (including source direction), a suitable binaural capture device is required. A signal capture system was suggested that consisted of a head-sized sphere containing multiple pairs of microphones which, in comparison to a rotating head and torso simulator (HATS), has the potential for improved measurement speed and the capability to measure time varying systems, albeit at the expense of some accuracy. The error introduced by using a relatively simple sphere compared to a more physically accurate HATS was evaluated in terms of three binaural parameters related to perceived spatial impression – interaural time and level differences (ITD and ILD) and interaural cross-correlation coefficient (IACC). It was found that whilst the error in the IACC measurements was perceptually negligible when the sphere was mounted on a torso, the differences in measured ITD and ILD values between the sphere-with-torso and HATS were not perceptually negligible. However, it was found that the sphere-with-torso could give accurate predictions of source location based on ITD and ILD, through the use of a look-up table created from known ITD-ILD-direction mappings. Therefore the validity of the multi-microphone sphere-with-torso as a binaural signal capture device for perceptually relevant measurements of source direction (based on ITD and ILD) and spatial impression (based on IACC) was demonstrated.
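
    For readers unfamiliar with the three binaural parameters named above, the following sketch computes them from a pair of ear signals using common textbook definitions: IACC as the maximum of the normalised interaural cross-correlation over lags within ±1 ms, ITD as the lag at that maximum, and ILD as the energy ratio in dB. This is an assumption-laden illustration, not the measurement code used in the paper; the function names and the sign convention (positive lag means the right-ear signal is delayed) are choices made here.

```python
import math


def iacc_and_itd(left, right, fs, max_lag_ms=1.0):
    """Return (IACC, ITD in seconds) from two equal-length ear signals.

    IACC is the largest absolute normalised cross-correlation within
    +/- max_lag_ms; the ITD estimate is the lag where that peak occurs.
    """
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0, 0.0
    n = len(left)
    best_value, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        value = abs(acc) / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag / fs


def ild_db(left, right):
    """Interaural level difference in dB (positive when the left ear is louder)."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(y * y for y in right)
    return 10.0 * math.log10(energy_left / energy_right)


if __name__ == "__main__":
    fs = 48000
    # Synthetic example: the right-ear signal is a delayed, attenuated copy.
    left = [math.sin(0.3 * i) * math.exp(-i / 100.0) for i in range(512)]
    right = [0.0] * 12 + [0.5 * x for x in left[:-12]]
    iacc, itd = iacc_and_itd(left, right, fs)
    print("IACC:", iacc, "ITD (s):", itd, "ILD (dB):", ild_db(left, right))
```

    The brute-force lag loop keeps the sketch dependency-free; for long signals an FFT-based cross-correlation would be the more practical choice.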

  • Kim C, Mason R, Brookes T. (2010) 'A quasi-binaural approach to head-movement-aware evaluation of spatial acoustics'. Sydney : The International Congress on Acoustics (ICA) Proceedings of the International Symposium on Room Acoustics, Melbourne, Australia: International Symposium on Room Acoustics. A Satellite of the International Congress on Acoustics. General papers 4 (1), pp. 292-300.

    Abstract

    This research incorporates the nature of head movement made in listening activities, into the development of a quasi-binaural acoustical measurement technique for the evaluation of spatial impression. A listening test was conducted where head movements were tracked whilst the subjects rated the perceived source width, envelopment, source direction and timbre of a number of stimuli. It was found that the extent of head movements was larger when evaluating source width and envelopment than when evaluating source direction and timbre. It was also found that the locus of ear positions corresponding to these head movements formed a bounded sloped path, higher towards the rear and lower towards the front. This led to the concept of a signal capture device comprising a torso-mounted sphere with multiple microphones. A prototype was constructed and used to measure three binaural parameters related to perceived spatial impression - interaural time and level differences (ITD and ILD) and interaural cross-correlation coefficient (IACC). Comparison of the prototype measurements to those made with a rotating Head and Torso Simulator (HATS) showed that the prototype could be perceptually accurate for the prediction of source direction using ITD and ILD, and for the prediction of perceived spatial impression using IACC. Further investigation into parameter derivation and interpolation methods indicated that 21 pairs of discretely spaced microphones were sufficient to measure the three binaural parameters across the sloped range of ear positions identified in the listening test.

  • Kim C, Mason R, Brookes T. (2010) 'Development of a head-movement-aware signal capture system for the prediction of acoustical spatial impression'. Sydney : International Congress on Acoustics (ICA) Proceedings of the 20th International Congress on Acoustics, Sydney, Australia: 20th International Congress on Acoustics 4, pp. 2768-2775.

    Abstract

    This research introduces a novel technique for capturing binaural signals for objective evaluation of spatial impression; the technique allows for simulation of the head movement that is typical in a range of listening activities. A subjective listening test showed that the amount of head movement made was larger when listeners were rating perceived source width and envelopment than when rating source direction and timbre, and that the locus of ear positions corresponding to the pattern of head movement formed a bounded sloped path – higher towards the rear and lower towards the front. Based on these findings, a signal capture system was designed comprising a sphere with multiple microphones, mounted on a torso. Evaluation of its performance showed that a perceptual model incorporating this capture system is capable of perceptually accurate prediction of source direction based on interaural time and level differences (ITD and ILD), and of spatial impression based on interaural cross-correlation coefficient (IACC). Investigation into appropriate parameter derivation and interpolation techniques determined that 21 pairs of spaced microphones were sufficient to measure ITD, ILD and IACC across the sloped range of ear positions.

  • Simon LSR, Mason R, Rumsey F. (2009) 'Localisation curves for a regularly-spaced octagon loudspeaker array'. New York, USA : Audio Engineering Society Audio Engineering Society Preprint, New York, USA: 127th Audio Engineering Society Convention 7915

    Abstract

    Multichannel microphone array designs often use the localisation curves that have been derived for 2-0 stereophony. Previous studies showed that side and rear perception of phantom image locations require somewhat different curves. This paper describes an experiment conducted to evaluate localisation curves using an octagonal loudspeaker setup. Interchannel level differences were produced between the loudspeaker pairs forming each of the segments of the loudspeaker array, one at a time, and subjects were asked to evaluate the perceived sound event’s direction and its locatedness. The results showed that the localisation curves derived for 2-0 stereophony are not directly applicable, and that different localisation curves are required for each loudspeaker pair.

  • Kim C, Mason RD, Brookes T. (2009) 'The role of head movement in the analysis of spatial impression'. Engineering and Physical Sciences Research Council London, UK: EPSRC People in Systems Theme Day
  • Kim C, Mason R, Brookes TS. (2009) 'The role of head movement in the analysis of spatial impression'. Institute of Hearing Research: European Career Workshop for PhD Students in Hearing Research
  • Mason R, Kim C, Brookes T. (2009) 'Perception of head-position-dependent variations in interaural cross-correlation coefficient'. Munich, Germany : Audio Engineering Society Audio Engineering Society Preprint, Munich, Germany: 126th Audio Engineering Society Convention 7729

    Abstract

    Experiments were undertaken to elicit the perceived effects of head-position-dependent variations in the interaural cross-correlation coefficient of a range of signals. A graphical elicitation experiment showed that the variations in the IACC strongly affected the perceived width and depth of the reverberant environment, as well as the perceived width and distance of the sound source. A verbal experiment gave similar results, and also indicated that the head-position-dependent IACC variations caused changes in the perceived spaciousness and envelopment of the stimuli.

  • Mason RD, Kim C, Brookes T. (2008) 'Taking head movements into account in measurement of spatial attributes'. Institute of Acoustics Proceedings of the Institute of Acoustics Reproduced Sound Conference, Brighton, UK: Institute of Acoustics 24th Reproduced Sound Conference 30 (6), pp. 239-246.

    Abstract

    Measurements of the spatial attributes of auditory environments or sound reproduction systems commonly only consider a single receiver position. However, it is known that humans make use of head movement to help to make sense of auditory scenes, especially when the physical cues are ambiguous. Results are summarised from a three-year research project which aims to develop a practical binaural-based measurement system that takes head movements into account. Firstly, the head movements made by listeners in various situations were investigated, which showed that a wide range of head movements are made when evaluating source width and envelopment, and minimal head movements are made when evaluating timbre. Secondly, the effect of using a simplified sphere model containing two microphones instead of a head and torso simulator was evaluated, and methods were derived to minimise the errors in measured cues for spatial perception that were caused by the simplification of the model. Finally, the results of the two earlier stages were combined to create a multi-microphone sphere that can be used to measure spatial attributes incorporating head movements in a perceptually-relevant manner, and which allows practical and rapid measurements to be made.

  • Kim C, Mason R, Brookes T. (2008) 'Improvements to a Spherical Binaural Capture Model for Objective Measurement of Spatial Impression with Consideration of Head Movements'. San Francisco, USA : Audio Engineering Society Audio Engineering Society Preprint, San Francisco, USA: 125th Audio Engineering Society Convention 7579

    Abstract

    This research aims, ultimately, to develop a system for the objective evaluation of spatial impression, incorporating the finding from a previous study that head movements are naturally made in its subjective evaluation. A spherical binaural capture model, comprising a head-sized sphere with multiple attached microphones, has been proposed. Research already conducted found significant differences in interaural time and level differences, and cross-correlation coefficient, between this spherical model and a head and torso simulator. It is attempted to lessen these differences by adding to the sphere a torso and simplified pinnae. Further analysis of the head movements made by listeners in a range of listening situations determines the range of head positions that needs to be taken into account. Analyses of these results inform the optimum positioning of the microphones around the sphere model.

  • Kim C, Mason R, Brookes T. (2008) 'Initial investigation of signal capture techniques for objective measurement of spatial impression considering head movement'. Audio Engineering Society Audio Engineering Society Preprint, Amsterdam, The Netherlands: 124th Audio Engineering Society Convention 7331

    Abstract

    In a previous study it was discovered that listeners normally make head movements when attempting to evaluate source width and envelopment as well as source location. To accommodate this finding in the development of an objective measurement model for spatial impression, two capturing models were introduced and designed in this research, based on binaural technique: 1) a rotating Head And Torso Simulator (HATS), and 2) a sphere with multiple microphones. As an initial study, measurements of interaural time difference (ITD), level difference (ILD) and cross-correlation coefficient (IACC) made with the HATS were compared with those made with a sphere containing two microphones. The magnitude of the differences was judged in a perceptually relevant manner by comparing them with the just-noticeable differences (JNDs) of these parameters. The results showed that the differences were generally not negligible, implying the necessity of enhancement of the sphere model, possibly by introducing equivalents of the pinnae or torso. An exception was the case of IACC, where the reference of JND specification affected the perceptual significance of its difference between the two models.

  • Lee H-K, Mason R, Rumsey F. (2007) 'Perceptually modelled effects of interchannel crosstalk in multichannel microphone technique'. New York, USA : Audio Engineering Society Audio Engineering Society Convention 123, New York, USA: 123rd Audio Engineering Society Convention 7200

    Abstract

    One of the most noticeable perceptual effects of interchannel crosstalk in multichannel microphone technique is an increase in perceived source width. The relationship between the perceived source-width-increasing effect and its physical causes was analysed using an IACC-based objective measurement model. A description of the measurement model is presented and the measured data obtained from stimuli created with crosstalk and those without crosstalk are analysed visually. In particular, frequency and envelope dependencies of the measured results and their relationship with the perceptual effect are discussed. The relationship between the delay time of the crosstalk signal and the effect of different frequency content on the perceived source width is also discussed in this paper.

  • Kim C, Mason R, Brookes T. (2007) 'An investigation into head movements made when evaluating various attributes of sound'. Vienna, Austria : Audio Engineering Society Audio Engineering Society Preprint, Vienna, Austria: 122nd Audio Engineering Society Convention 7031

    Abstract

    This research extends the study of head movements during listening by including various listening tasks where the listeners evaluate spatial impression and timbre, in addition to the more common task of judging source location. Subjective tests were conducted in which the listeners were allowed to move their heads freely whilst listening to various types of sound and asked to evaluate source location, apparent source width, envelopment, and timbre. The head movements were recorded with a head tracker attached to the listener’s head. From the recorded data, the maximum range of movement, mean position and speed, and maximum speed were calculated along each axis of translational and rotational movement. The effects of various independent variables, such as the attribute being evaluated, the stimulus type, the number of repetitions, and the simulated source location were examined through statistical analysis. The results showed that whilst there were differences between the head movements of individual subjects, across all listeners the range of movement was greatest when evaluating source width and envelopment, less when localising sources, and least when judging timbre. In addition, the range and speed of head movement were reduced for transient signals compared to longer musical or speech phrases. Finally, in most cases for the judgement of spatial attributes, head movement was towards the source direction.

  • Mason R, Harrington S. (2007) 'Perception and detection of auditory offsets with single simple musical stimuli in a reverberant environment'. Proceedings of the AES International Conference.

    Abstract

    It is apparent that little research has been undertaken into the perception and automated detection of auditory offsets compared to auditory onsets. A study was undertaken which took a perceptually motivated approach to the detection of auditory offsets. Firstly, a subjective experiment was completed that investigated the effect of: the sound source temporal properties; the presence or absence of reverberation; the direct-to-reverberant level; and the presence or absence of binaural cues on the perceived auditory offset time. It was found in this case that: the sound source temporal properties had a small effect; the presence of reverberation caused the perceived auditory offset to be later in most cases; the direct-to-reverberant ratio had no significant effect; and the binaural cues had no significant effect on the perceived offset times. Measurements were conducted which showed that the -30 dB threshold below the peak level of the slowest decaying frequency bands could be used as a reasonable predictor of the subjective results.

  • Mason R. (2006) 'Implementation and application of a binaural hearing model to the objective evaluation of spatial impression'. Pitea, Sweden : Sound: 28th Audio Engineering Society International Conference, pp. 331-342.

    Abstract

    A binaural hearing model has been developed over a number of years that predicts the perceived width and position of sounds, over frequency and over time. The most appropriate methods for applying this model to evaluations of spatial impression are considered, including suitable test signals. Examples of a range of measurements are shown in a range of situations.

  • Mason R. (2005) 'Preference-based testing and reference-based testing'. New York: Workshop on Automotive Sound Systems Part II: Considerations in Methodology and Sound Quality Attributes for Subjective Evaluations, 119th Audio Engineering Society Convention
  • Mason R, Rumsey F, Zielinski SK. (2005) 'PCA-based down-mixing'. New York: Workshop on 5.1 Downmix in Practice, 119th Audio Engineering Society Convention
  • Mason R, Brookes T, Rumsey F. (2004) 'Spatial impression: measurement and perception of concert hall acoustics and reproduced sound'. Hyogo, Japan : Proceedings of the International Symposium on Room Acoustics, Hyogo, Japan: International Symposium on Room Acoustics: Design and Science

    Abstract

    Auditory width measurements based on the interaural cross-correlation coefficient (IACC) are often used in the field of concert hall acoustics. However, there are a number of problems with such measurements, including large variations around the centre of a room and a limited range of values at low frequencies. This paper explores how some of these problems can be solved by applying the IACC in a more perceptually valid manner and using it as part of a more complete hearing model. It is proposed that measurements based on the IACC may match the perceived width of stimuli more accurately if a source signal is measured rather than an impulse response, and when factors such as frequency and loudness are taken into account. Further developments are considered, including methods to integrate the results calculated in different frequency bands, and the temporal response of spatial perception.

  • Mason R, Brookes T, Rumsey F. (2004) 'Evaluation of an auditory source width prediction model based on the interaural cross-correlation coefficient'. San Diego, California : Journal of the Acoustical Society of America, Sound: 148th meeting of the Acoustical Society of America 116

    Abstract

    A model based on the interaural cross-correlation coefficient (IACC) has been developed that aims to predict the perceived source width of a wide range of sounds. The following factors differentiate it from more commonly used IACC-based measurements: the use of a running measurement to quantify variations in width over time; half-wave rectification and low pass filtering of the input signal to mimic the breakdown of phase locking in the ear; compensation for the frequency and loudness dependency of perceived width; combination of a model of perceived location with a model of perceived width; and conversion of the results to an intuitive scale. Objective and subjective methods have been used to evaluate the accuracy and limitations of the resulting measurement model.

  • Mason R, Brookes T, Rumsey F. (2004) 'Integration of measurements of interaural cross-correlation coefficient and interaural time difference within a single model of perceived source width'. San Francisco, USA : Audio Engineering Society Preprint, San Francisco: 117th Audio Engineering Society Convention 6137

    Abstract

    A measurement model based on the interaural cross-correlation coefficient (IACC) that attempts to predict the perceived source width of a range of auditory stimuli is currently under development. It is necessary to combine the predictions of this model with measurements of interaural time difference (ITD) to allow the model to provide its output on a meaningful scale and to allow integration of results across frequency. A detailed subjective experiment was undertaken using narrow-band stimuli with a number of centre frequencies, IACCs and ITDs. Subjects were asked to indicate the perceived position of the left and right boundaries of a number of these stimuli by altering the ITD of a pair of white noise comparison stimuli. It is shown that an existing IACC-based model provides a poor prediction of the subjective results but that modifications to the model significantly increase its accuracy.

  • Brookes T, Mason R. (2004) 'Perceptually Motivated Measurement and Control of Digital Music'. York : Sound: The Future of Audio: Digital Music in 2010 (DMRN Conference)
  • Mason R, Brookes T. (2004) 'Perception, measurement and synthesis of spatial impression'. London : Sound: Audio Engineering Society British Section Lecture
  • Mason R, Brookes T, Rumsey F. (2004) 'Development of the interaural cross-correlation coefficient into a more complete auditory width prediction model'. Kyoto, Japan : International Congress on Acoustics Proceedings of the 18th International Congress on Acoustics, Kyoto, Japan: 18th International Congress on Acoustics IV, pp. 2453-2456.

    Abstract

    Auditory width measurements based on the interaural cross-correlation coefficient (IACC) are often used in the field of concert hall acoustics. However, there are a number of problems with such measurements, including large variations around the centre of a room and a limited range of values at low frequencies. This paper explores how some of these problems can be solved by applying the IACC in a more perceptually valid manner and using it as part of a more complete hearing model. It is proposed that measurements based on the IACC may match the perceived width of stimuli more accurately if a source signal is measured rather than an impulse response, and when factors such as frequency and loudness are taken into account. Further developments are considered, including methods to integrate the results calculated in different frequency bands, and the temporal response of spatial perception.

  • Mason R, Brookes T, Rumsey F. (2003) 'Creation and verification of a controlled experimental stimulus for investigating selected perceived spatial attributes'. Amsterdam : Audio Engineering Society Preprint, Amsterdam: 114th Audio Engineering Society Convention 5771

    Abstract

    In order to undertake controlled investigations into perceptual effects that relate to the interaural cross-correlation coefficient, experiment stimuli that meet a tight set of criteria are required. The requirements of each stimulus are that it is narrow band, normally has a constant cross-correlation coefficient over time, and can be altered to cover the full range of values of cross-correlation coefficient, including specified variations over time if required. Stimuli created using a technique based on amplitude modulation are found to meet these criteria, and their use in a number of subjective experiments is described.

  • Mason R, Rumsey F. (2002) 'A comparison of objective measurements for predicting selected subjective spatial attributes'. Audio Engineering Society Audio Engineering Society Preprint, Munich, Germany: 112th Audio Engineering Society Convention 5591

    Abstract

    A controlled subjective experiment was undertaken to evaluate the relative merits of objective measurement techniques for predicting selected perceived spatial attributes of reproduced sound. The stimuli consisted of a number of anechoic recordings of single sound sources that were reproduced in a simulated concert hall and captured using a number of simulated multichannel microphone techniques. These were reproduced in a listening room and the subjects were asked to judge the perceived source width and perceived environment width of each stimulus. A number of objective measurements were made at the listening position and these were then compared with the subjective judgements. The results showed that a perceptually-grouped measurement of the experimental stimuli using a technique based on the interaural cross-correlation coefficient matched the subjective judgements most accurately, though the difference between this measurement and a number of other types was small.

  • Mason R, Brookes T, Rumsey F. (2002) 'The perceptual relevance of extant techniques for the objective measurement of spatial impression'. London : Proceedings of the Institute of Acoustics, London: Auditorium Acoustics 2002 Conference 24
  • Mason R. (2001) 'Physical measures related to spatial attributes, Workshop on Evaluation of Spatial Sound Reproduction'. Amsterdam, The Netherlands : Audio Engineering Society Amsterdam, Netherlands: 110th Audio Engineering Society Convention
  • Mason R, Rumsey F, de Bruyn B. (2001) 'An investigation of interaural time difference fluctuations, part 2: dependence of the subjective effect on audio frequency'. Amsterdam : Audio Engineering Society Preprint, Amsterdam: 110th Audio Engineering Society Convention 5389

    Abstract

    The effect of the audio frequency of narrow-band noise signals with a sinusoidal ITD fluctuation was investigated. To examine this, a subjective experiment was carried out using a match to sample method and stimuli delivered over headphones. It was found that the magnitude of the subjective effect is dependent on audio frequency and that the relationship between the audio frequency and a constant subjective effect appears to be based on equal maximum phase difference fluctuations.

  • Mason R, Rumsey F. (2001) 'Interaural time difference fluctuations: their measurement, subjective perceptual effect and application in sound reproduction'. Schloss Elmau, Germany : Proceedings of the 19th International Audio Engineering Society Conference, Elmau, Germany: 19th International Audio Engineering Society Conference, pp. 252-271.

    Abstract

    Two objective measurement techniques have been proposed that relate the fluctuations in interaural time difference to one or more attributes of subjective spatial perception. This paper reviews these measurements, discusses how these fluctuations may be created in a real acoustical environment, summarises the experiments carried out to elicit the subjective effect of the fluctuations, and suggests ways in which this research can be applied to sound reproduction.

  • Mason R, Rumsey F, de Bruyn B. (2001) 'An investigation of interaural time difference fluctuations, part 1: the subjective spatial effect of fluctuations delivered over headphones'. Audio Engineering Society Audio Engineering Society Preprint, Amsterdam, Netherlands: 110th Audio Engineering Society Convention 5383

    Abstract

    The subjective spatial effect of noise signals with sinusoidal ITD fluctuations was investigated. Both verbal and non-verbal elicitation experiments were carried out to examine the subjective effect of the ITD fluctuations with a number of fluctuation frequencies and fluctuation magnitudes. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the perceived width of the sound.

  • Mason R, Rumsey F, de Bruyn B. (2001) 'An investigation of interaural time difference fluctuations, part 3: the subjective effect of fluctuations in continuous stimuli delivered over loudspeakers'. Audio Engineering Society Preprint, New York, USA: 111th Audio Engineering Society Convention 5457

    Abstract

    The subjective spatial effect of continuous noise signals with interaural time difference fluctuations was investigated. These fluctuations were created by sinusoidal interchannel time difference fluctuations between signals that were presented over loudspeakers. Both verbal and non-verbal elicitation techniques were applied to examine the subjective effect. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the apparent width of the perceived sound source.

  • Mason R, Rumsey F, de Bruyn B. (2001) 'An investigation of interaural time difference fluctuations, part 4: the subjective effect of fluctuations in decaying stimuli delivered over loudspeakers'. Audio Engineering Society Preprint, New York: 111th Audio Engineering Society Convention 5458

    Abstract

    The subjective spatial effect of decaying noise signals with interaural time difference fluctuations was investigated. These fluctuations were created by sinusoidal interchannel time difference fluctuations between signals which were presented over loudspeakers. Both verbal and non-verbal elicitation techniques were applied to examine the subjective effect. It was found that the predominant effect of increasing the fluctuation magnitude was an increase in the apparent width of the acoustical environment whilst the apparent size of the perceived sound source did not change.

  • Mason R, Rumsey F. (2000) 'An assessment of spatial performance of virtual home theatre algorithms by subjective and objective methods'. Audio Engineering Society Audio Engineering Society Preprint, Paris, France: 108th Audio Engineering Society Convention 5137

    Abstract

    A controlled subjective test was carried out to assess selected spatial qualities of three virtual home theatre processors. The subjective results were used to evaluate a number of objective measurements based on the interaural cross-correlation coefficient (IACC). A novel implementation of the IACC was found which appears to correlate well with the subjective data.

  • Mason R, Ford N, Rumsey F, de Bruyn B. (2000) 'Verbal and non-verbal elicitation techniques in the subjective assessment of spatial sound reproduction'. Audio Engineering Society Preprint, Los Angeles, USA: 109th Audio Engineering Society Convention 5225

    Abstract

    Current research into spatial audio has shown an increasing interest in the way subjective attributes of reproduced sound are elicited from listeners. The emphasis at present is on verbal semantics; however, studies suggest that non-verbal methods of elicitation could be beneficial. Research into the relative merits of these methods has found that non-verbal responses may result in different elicited attributes compared to verbal techniques. Non-verbal responses may be closer to the perception of the stimuli than the verbal interpretation of this perception. There is evidence that drawing is not as accurate as other non-verbal methods of elicitation when it comes to reporting the localisation of auditory images. However, the advantage of drawing is its ability to describe the whole auditory space rather than a single dimension.

  • Mason R, Rumsey F. (1999) 'An investigation of microphone techniques for ambient sound in surround sound systems'. Munich, Germany : Audio Engineering Society Audio Engineering Society Preprint, Munich, Germany: 106th Audio Engineering Society Convention 4912

    Abstract

    A controlled subjective test was carried out to assess selected qualities of three ambient microphone techniques for surround sound. The effects of signal delay and microphone distance were explored. The tests indicate that the perceived results are programme dependent, but that a compromise can be found using delayed close microphones, giving similar quality for the range of programme items used.

  • Mason R. (1999) 'Microphone techniques for multichannel surround sound'. London, UK : Audio Engineering Society Proceedings of the 1999 AES UK conference, Audio: the second century, London, UK: Audio Engineering Society UK Conference, Audio: the second century, pp. 15-24.

    Abstract

    A controlled subjective test was carried out to assess selected qualities of three microphone techniques for capturing the ambient sound of a concert hall for surround sound. The effects of signal delay between the front and rear channels and microphone distance were explored. The tests indicate that the perceived results are programme-dependent, but that a compromise can be found using delayed close microphones, giving similar quality for the range of programme items used.
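
Many of the abstracts above rely on the interaural cross-correlation coefficient (IACC). As a rough illustration only, the sketch below computes the textbook broadband form of the measure: the largest magnitude of the normalised cross-correlation between the two ear signals within a lag window of about ±1 ms. It is not the hearing model described in these papers, which adds running measurement, rectification and filtering, and frequency and loudness compensation. The function name `iacc`, its parameters and the 1 ms default are assumptions chosen for this example.

```python
import numpy as np


def iacc(left, right, fs, max_lag_ms=1.0):
    """Return (IACC, lag in seconds) for two equal-length ear signals.

    Illustrative sketch: the peak magnitude of the normalised
    cross-correlation within +/- max_lag_ms. The sign of the returned lag
    follows numpy.correlate's convention.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.ndim != 1 or left.shape != right.shape:
        raise ValueError("left and right must be 1-D arrays of equal length")

    # Normalise by the geometric mean of the two signal energies.
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if norm == 0.0:
        return 0.0, 0.0

    n = len(left)
    # Lag window in samples, clamped to the available range of lags.
    max_lag = min(int(round(fs * max_lag_ms / 1000.0)), n - 1)

    # Full cross-correlation; zero lag sits at index n - 1.
    full = np.correlate(left, right, mode="full")
    centre = n - 1
    window = full[centre - max_lag:centre + max_lag + 1] / norm

    idx = int(np.argmax(np.abs(window)))
    lag_samples = idx - max_lag
    return float(np.abs(window[idx])), float(lag_samples) / fs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 48000
    noise = rng.standard_normal(fs // 10)
    # Identical ear signals are fully correlated.
    print(iacc(noise, noise, fs))
    # Independent noise at each ear gives a much lower value.
    print(iacc(noise, rng.standard_normal(fs // 10), fs))
```

Values near 1 indicate nearly identical ear signals, while lower values indicate decorrelated ear signals; the abstracts above relate changes in this measure to perceived width and envelopment.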

Posters

  • Pike C, Mason RD, Brookes TS. (2014) Auditory adaptation to static spectra. UKSpeech Conference, Edinburgh

    Abstract

    Auditory adaptation is thought to reduce the perceptual impact of static spectral energy and increase sensitivity to spectral change. Research suggests that this adaptation helps listeners to extract stable speech cues across different talkers, despite inter-talker spectral variations caused by differing vocal tract acoustics. This adaptation may also be involved in compensation for distortions caused by transmission channels more generally (e.g. distortions caused by the room or loudspeaker through which a sound has passed). The magnitude of this adaptation and its ecological importance have not been established. The physiological and psychological mechanisms behind adaptation are also not well understood. The current research aimed to confirm that adaptation to transmission channel spectrum occurs when listening to speech produced through two types of transmission channel: loudspeakers and rooms. The loudspeaker is analogous to the vocal tract of a talker, imparting resonances onto a sound source which reaches the listener both directly and via reflections. The room-affected speech, however, reaches the listener only via reflections – there is no direct path. Larger adaptation to the spectrum of the room was found, compared to adaptation to the spectrum of the loudspeaker. It appears that when listening to speech, mechanisms of adaptation to room reflections, and adaptation to loudspeaker/vocal tract spectrum, may be different.

  • Evans W, Mason R, Brookes T. (2010) A system for the auralisation of synthetic sound scenes using headphones and head-tracking. University of Surrey Postgraduate Research Conference

    Abstract

    Auralisation is the process of rendering virtual sound fields. It is used in areas including: acoustic design, defence, gaming and audio research. As part of a PhD project concerned with the influence of loudspeaker directivity on the perception of reproduced sound, a fully-computed auralisation system has been developed. For this, acoustic modelling software is used to synthesise and extract binaural impulse responses of virtual rooms. The resulting audio is played over headphones and allows listeners to experience the excerpt being reproduced within the synthesised environment. The main advance with this system is that impulse responses are calculated for a number of head positions, which allows the listeners to move when listening to the recreated sounds. This allows for a much more realistic simulation, and makes it especially useful for conducting subjective experiments on sound reproduction systems and/or acoustical environments which are either not available or are even impractical to create. Hence, it greatly increases the range and type of experiments that can be undertaken at Surrey. The main components of the system are described, together with the results from a validation experiment which demonstrate that this system provides similar results to experiments conducted previously using loudspeakers in an anechoic chamber.

  • Simon LSR, Mason R. (2010) Development of a novel surround sound format and associated microphone techniques. University of Surrey Postgraduate Research Conference

    Abstract

    The most common surround sound format (often known as 5.1) does not enable accurate positioning of sounds to the side or the rear. Based on a detailed analysis of the binaural hearing cues used by humans, a new surround sound loudspeaker format has been developed using 8 loudspeakers arranged in a regular octagon. Listening tests have been conducted to demonstrate the superiority of this setup compared to 5.1 in terms of accurate sound positioning around a listener. In order to enable development of microphone techniques to capture soundfields for this reproduction system, localisation curves needed to be derived that map the relationship between a range of interchannel time and level differences of signals (ICTDs and ICLDs respectively) and the perceived sound location. Various signals with a range of ICLDs and ICTDs were produced between pairs of adjacent loudspeakers, and listeners were asked to evaluate the perceived sound's direction and its locatedness. The results showed that the curves for the side pairs of adjacent loudspeakers are significantly different to the front and rear pairs. The resulting curves have been used to derive suitable microphone techniques for this loudspeaker setup.

  • Hummersone C, Mason R, Brookes T. (2010) A perceptually-inspired approach to machine sound source separation in real rooms. University of Surrey Postgraduate Research Conference

Theses and dissertations

  • Mason R. (2002) Elicitation and measurement of auditory spatial attributes in reproduced sound.

    Abstract

    This thesis has investigated objective measurements that relate to the perceived spatial attributes of reproduced sound. Research has been conducted into extant measurements that aim to quantify the perceived spatial attributes of concert hall acoustics, and those that are most likely to be successful for measuring the properties of reproduced sound have been identified. A relatively new measurement technique that may relate to the spatial perception of reproduced sound has also been analysed. This measurement is based on the quantification of the magnitude of fluctuations in interaural time and level difference. This has been investigated in detail, and the subjective effect that this measurement relates to has been elicited in a number of subjective experiments. The experiments used various types of noise stimuli that contained a range of fluctuations in interaural time difference. It was found that when the fluctuations are contained within a part of the signal that is perceived to be a sound source, a variation in the magnitude of the fluctuations alters the perceived width of that source. When the fluctuations are contained within a part of the signal that is perceived to be reverberation, a variation in the magnitude of the fluctuations alters the perceived width of the acoustical environment. This research has been applied to the development of novel objective measurement techniques, and to the specification of the subjective attributes that relate to these techniques. A final evaluation experiment has found that listeners can relate to the attribute descriptors that have been elicited, and that the novel objective measurement techniques that have been developed match the subjective data at least as well as the extant measurement techniques.

Teaching

Current teaching responsibilities:

  • Audio Engineering 1
  • Audio Engineering 2
  • Audio Engineering 3
  • Technical Projects

Page Owner: mus1rm
Page Created: Monday 21 November 2016 09:33:52 by rxserver
Last Modified: Monday 21 November 2016 10:17:24 by pj0010