My research on the S3A project is focussed on the listener experience of spatial audio reproduction. I am working on determining the perceptual characteristics of spatial audio reproduction that are important to listeners. Knowledge of these important perceptual characteristics will enable the creation of relevant predictive models, which could be used by content producers in meters, system designers for perceptual evaluation, or as part of reproduction algorithms for producing perceptually optimal solutions.
My research on the POSZ project focussed on qualitative and quantitative evaluation of interference between multiple audio programmes. The project aimed to produce multiple listening regions (sound zones) over loudspeakers, and to evaluate and optimise these sound zones in a perceptually relevant manner. My work included determining the perceptually relevant attributes of the listener experience in audio-on-audio interference situations, and subsequently developing a predictive model of distraction. I also worked on integrating the distraction model into sound zone production algorithms.
Technical ear training (first-year Music and Sound Recording). I deliver small-group seminars, including theory and audio examples, covering areas such as frequency equalisation, audio artefacts, reverberation, recording balance, spatial audio, and digital audio.
Room: 09 BC 03
There is a wide variety of spatial audio reproduction systems available, from a single loudspeaker to many spatially distributed loudspeakers. An important factor in the selection, development, or optimization of such systems is listener preference, and the important perceptual characteristics that contribute to this. An experiment was performed to determine the attributes that contribute to listener preference for a range of spatial audio reproduction methods. Experienced and inexperienced listeners made preference ratings for combinations of seven program items replayed over eight reproduction systems, and reported the reasons for their judgments. Automatic text clustering reduced redundancy in the responses by approximately 90%, facilitating subsequent group discussions that produced clear attribute labels, descriptions, and scale end-points. Twenty-seven and twenty-four attributes contributed to preference for the experienced and inexperienced listeners respectively. The two sets of attributes overlap to a degree (ten attributes from the two sets were closely related); however, the experienced listeners used more technical terms, whilst the inexperienced listeners used broader descriptive categories.
It is desirable to determine which of the many different spatial audio reproduction systems listeners prefer, and the perceptual attributes that are most important to listener experience, so that future systems can be perceptually optimized. A paired comparison preference rating experiment was performed alongside a free elicitation task for eight reproduction methods (consumer and professional systems with a wide range of expected quality) and seven program items (representative of potential broadcast material). The experiment was performed by groups of experienced and inexperienced listeners. Thurstone Case V modeling was used to produce preference scales. Both listener groups preferred systems with increased spatial content; nine- and five-channel systems were most preferred. The use of elicited attributes was analyzed alongside the preference ratings, resulting in an approximate hierarchy of attribute importance: three attributes (amount of distortion, output quality, and bandwidth) were found to be important for differentiating systems where there was a large preference difference; sixteen were always important (most notably enveloping and horizontal width); and seven were used alongside small preference differences.
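Thurstone Case V scaling turns a matrix of paired-comparison proportions into an interval preference scale. The sketch below is a generic illustration, not the analysis code used in the study; the function name `thurstone_case_v` and the clipping constant `eps` are assumptions introduced here. Each proportion is mapped to a standard-normal z-score, and an item's scale value is the mean of its row of z-scores.

```python
from statistics import NormalDist


def thurstone_case_v(p, eps=0.01):
    """Return Case V scale values from a square matrix of choice proportions.

    p[i][j] is the proportion of trials on which item i was preferred to item j.
    Proportions are clipped to [eps, 1 - eps] so that unanimous judgements do
    not map to infinite z-scores (a common workaround, assumed here).
    """
    n = len(p)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # the diagonal contributes z = 0
            prop = min(max(p[i][j], eps), 1 - eps)
            z_sum += std_normal.inv_cdf(prop)
        # Averaging over all n columns (diagonal included) keeps the scale centred on 0
        scale.append(z_sum / n)
    return scale
```

For three hypothetical systems with proportions `[[0.5, 0.8, 0.9], [0.2, 0.5, 0.7], [0.1, 0.3, 0.5]]`, the function gives approximately `[0.708, -0.106, -0.602]`; the values sum to zero because the z-score matrix is antisymmetric.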
Sound zone systems aim to produce regions within a room where listeners may consume separate audio programs with minimal acoustical interference. Often, there is a trade-off between the acoustic contrast achieved between the zones and the fidelity of the reproduced audio program (the target quality). An open question is whether reducing contrast (i.e. allowing greater interference) can improve target quality. The planarity control sound zoning method can be used to improve spatial reproduction, though at the expense of decreased contrast. Hence, this can be used to investigate the relationship between target quality (which is affected by the spatial presentation) and distraction (which is related to the perceived effect of interference). An experiment was conducted investigating target quality and distraction, and examining their relationship with overall quality within sound zones. Sound zones were reproduced using acoustic contrast control, planarity control and pressure matching applied to a circular loudspeaker array. Overall quality was related to target quality and distraction, each having a similar magnitude of effect; however, the result was dependent upon program combination. The highest mean overall quality was a compromise between distraction and target quality, with energy arriving from up to 15 degrees either side of the target direction.
As devices that produce audio become more commonplace and increasingly portable, situations in which two competing audio programmes are present occur more regularly. In order to support the design of systems intended to mitigate the effects of interfering audio (including sound field control, noise cancellation or source separation systems), it is desirable to model the perceived distraction in such situations. Distraction ratings were collected for a range of audio-on-audio interference situations including various target and interferer programmes at three interferer levels, with and without road noise. Time-frequency target-to-interferer ratio (TIR) maps of the stimuli were created using a simple auditory model. A number of feature sets were extracted from the TIR maps, including combinations of mean, standard deviation, minimum and maximum TIR taken across the duration of the programme item. In order to predict distraction ratings from the features, linear regression models were produced. The models were evaluated for goodness-of-fit (RMSE) and generalizability (using a K-fold cross-validation procedure). The best model performed well, with almost all predictions falling within the 95% confidence intervals of the perceptual data. A validation data set was used to test the model, suggesting areas for future improvement. © 2013 Acoustical Society of America.
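The feature extraction and model evaluation described above can be sketched as follows. This is an illustrative assumption, not the published model: `tir_features`, `fit_linear`, `kfold_rmse`, and the choice to average across frequency bands before summarising over time are hypothetical, and the TIR maps are assumed to be already computed (in dB) with shape (bands, frames).

```python
import numpy as np


def tir_features(tir_map):
    """Mean, standard deviation, minimum and maximum TIR across the programme item.

    Assumption: tir_map has shape (n_bands, n_frames) in dB and is averaged
    across frequency bands for each frame before the summary statistics are taken.
    """
    per_frame = np.asarray(tir_map, dtype=float).mean(axis=0)  # shape (n_frames,)
    return np.array([per_frame.mean(), per_frame.std(), per_frame.min(), per_frame.max()])


def fit_linear(X, y):
    """Ordinary least-squares fit with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply a fitted linear model to a feature matrix."""
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    """Root-mean-square error, the goodness-of-fit measure."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def kfold_rmse(X, y, k=5, seed=0):
    """Mean held-out RMSE over k folds, as an estimate of generalisability."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # every stimulus not in the held-out fold
        coef = fit_linear(X[train], y[train])
        scores.append(rmse(y[fold], predict(coef, X[fold])))
    return float(np.mean(scores))
```

Comparing the training RMSE with the k-fold RMSE is one way to spot over-fitting: a feature set whose held-out error is much larger than its training error is unlikely to generalise to new stimuli.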
There are many spatial audio reproduction systems currently in domestic use (e.g. mono, stereo, surround sound, sound bars, and headphones). In an experiment, pairwise preference magnitude ratings for a range of such systems were collected from trained and untrained listeners. The ratings were analysed using internal preference mapping to: (i) uncover the principal perceptual dimensions of listener preference; (ii) label the dimensions based on the important perceptual attributes; and (iii) observe differences between trained and untrained listeners. To aid with labelling the dimensions, perceptual attributes were elicited alongside the preference ratings and were analysed by: (i) considering a metric derived from the frequency of use of each attribute and the magnitude of the related preference judgements; and (ii) observing attribute use for comparisons between specific methods. The first preference dimension accounted for the vast majority of the variance in ratings; it was related to multiple important attributes, including those associated with spatial capability and freedom from distortion. All participants exhibited a preference for reproduction methods that were positively correlated with the first dimension (most notably 5-, 9-, and 22-channel surround sound). The second dimension accounted for only a very small proportion of the variance, and appeared to separate the headphone method from the other methods. The trained and untrained listeners generally showed opposite preferences in the second dimension, suggesting that trained listeners have a higher preference for headphone reproduction than untrained listeners.
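Internal preference mapping is commonly carried out as a principal component analysis of a matrix of preference scores. The sketch below assumes that each listener's pairwise ratings have already been reduced to one score per system (that step is not shown and is an assumption); the function name `internal_preference_map` is hypothetical.

```python
import numpy as np


def internal_preference_map(scores):
    """Principal component analysis of a systems x listeners preference matrix.

    scores[s, l] is listener l's preference score for system s. Each listener's
    column is mean-centred, so a dimension describes how systems are ordered
    rather than each listener's overall rating level.
    """
    X = np.asarray(scores, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    system_coords = U * S            # systems placed in the preference space
    listener_loadings = Vt.T         # direction of each listener's preference
    explained = S**2 / np.sum(S**2)  # proportion of variance per dimension
    return system_coords, listener_loadings, explained
```

Listeners whose loadings point in opposite directions on a dimension prefer opposite ends of it, which is the kind of pattern described above for the second dimension and headphone reproduction.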
For many audio applications, the availability of recorded multi-channel room impulse responses (MC-RIRs) is fundamental. They enable development and testing of acoustic systems for reflective rooms. We present multiple MC-RIR datasets recorded in diverse rooms, using up to 60 loudspeaker positions and various uniform compact microphone arrays. These datasets complement existing RIR libraries and have dense spatial sampling of a listening position. To reveal the encapsulated spatial information, several state-of-the-art room visualization methods are presented. Results confirm the measurement fidelity and graphically depict the geometry of the recorded rooms. Further investigation of these recordings and visualization methods will facilitate object-based RIR encoding, integration of audio with other forms of spatial information, and meaningful extrapolation and manipulation of recorded compact microphone array RIRs.
For subjective experimentation on 3D audio systems, suitable programme material is needed. A large-scale recording session was performed in which four ensembles were recorded with a range of existing microphone techniques (aimed at mono, stereo, 5.0, 9.0, 22.0, ambisonic, and headphone reproduction) and a novel 48-channel circular microphone array. Further material was produced by remixing and augmenting pre-existing multichannel content. To mix and monitor the programme items (which included classical, jazz, pop and experimental music, and excerpts from a sports broadcast and a film soundtrack), a flexible 3D audio reproduction environment was created. Solutions to the following challenges were found: level calibration for different reproduction formats; bass management; and adaptable signal routing from different software and file formats.
Whilst sound zoning methods have typically been studied under anechoic conditions, it is desirable to evaluate the performance of various methods in a real room. Three control methods were implemented (delay and sum, DS; acoustic contrast control, ACC; and pressure matching, PM) on two regular 24-element loudspeaker arrays (line and circle). The acoustic contrast between two zones was evaluated and the reproduced sound fields compared for uniformity of energy distribution. ACC generated the highest contrast, whilst PM produced a uniform bright zone. Listening tests were also performed using monophonic auralisations from measured system responses to collect ratings of perceived distraction due to the alternate audio programme. Distraction ratings were affected by control method and programme material. Copyright © (2013) by the Audio Engineering Society.
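Acoustic contrast control at a single frequency is often formulated as maximising the ratio of mean bright-zone energy to mean dark-zone energy, which leads to an eigenvector problem. The sketch below follows that textbook formulation under stated assumptions: the transfer matrices `G_bright` and `G_dark` (microphones by loudspeakers) are given, and the regularisation value `delta` is a placeholder. It is not the implementation used in the paper.

```python
import numpy as np


def spatial_correlation(G):
    """Mean spatial correlation matrix G^H G / M for an M x L transfer matrix."""
    return G.conj().T @ G / G.shape[0]


def acc_weights(G_bright, G_dark, delta=1e-6):
    """Acoustic contrast control weights at one frequency.

    Maximises (w^H R_B w) / (w^H (R_D + delta I) w) by taking the principal
    eigenvector of (R_D + delta I)^-1 R_B. delta is an assumed regularisation value.
    """
    R_b = spatial_correlation(G_bright)
    R_d = spatial_correlation(G_dark)
    n = R_b.shape[0]
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(R_d + delta * np.eye(n), R_b))
    w = eigvecs[:, np.argmax(eigvals.real)]  # eigenvalues are real in theory
    return w / np.linalg.norm(w)


def contrast_db(w, G_bright, G_dark):
    """Acoustic contrast: mean bright-zone energy over mean dark-zone energy, in dB."""
    e_b = np.mean(np.abs(G_bright @ w) ** 2)
    e_d = np.mean(np.abs(G_dark @ w) ** 2)
    return 10 * np.log10(e_b / e_d)
```

The same `contrast_db` function can score any set of driving weights, so it also applies to delay-and-sum or pressure-matching filters when comparing methods by contrast.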
Sound field control methods can be used to create multiple zones of audio in the same room. Separation achieved by such systems has classically been evaluated using physical metrics including acoustic contrast and target-to-interferer ratio (TIR). However, to optimise the experience for a listener it is desirable to consider perceptual factors. A search procedure was used to select 5 loudspeakers for production of 2 sound zones using acoustic contrast control. Comparisons were made between searches driven by physical (programme-independent TIR) and perceptual (distraction predictions from a statistical model) cost functions. Performance was evaluated on TIR and predicted distraction in addition to subjective ratings. The perceptual cost function showed some benefits over physical optimisation, although the model used needs further work. Copyright © (2013) by the Audio Engineering Society.
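The loudspeaker selection step can be sketched as a greedy forward search driven by an arbitrary cost function. The source does not state which search procedure was used; greedy selection is an assumption chosen for brevity, and `greedy_select` and the toy cost below are hypothetical. The cost callable is where a physical (TIR-based) or perceptual (predicted distraction) criterion would plug in.

```python
def greedy_select(n_candidates, k, cost):
    """Greedy forward selection of k loudspeakers from n_candidates.

    cost(subset) returns a value to minimise for a tuple of loudspeaker indices;
    it stands in for either a physical cost (e.g. negative TIR) or a perceptual
    one (e.g. predicted distraction). Both are hypothetical here.
    """
    chosen = []
    for _ in range(k):
        remaining = [c for c in range(n_candidates) if c not in chosen]
        # Add the single loudspeaker that gives the lowest cost for the enlarged set
        best = min(remaining, key=lambda c: cost(tuple(chosen + [c])))
        chosen.append(best)
    return sorted(chosen)


# Toy additive cost: each loudspeaker has a fixed, made-up penalty
penalties = [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]
selection = greedy_select(10, 5, lambda s: sum(penalties[i] for i in s))
```

Greedy search evaluates far fewer subsets than exhaustive search (which, for an array of 24 elements, would mean 42,504 subsets of 5), at the cost of possibly missing the global optimum when the cost is not additive.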
An experiment was performed in order to establish the threshold of acceptability for an interfering audio programme on a target audio programme, varying the following physical parameters: target programme, interferer programme, interferer location, interferer spectrum, and road noise level. Factors were varied in three levels in a Box-Behnken fractional factorial design. The experiment was performed in three scenarios: information gathering, entertainment, and reading/working. Nine listeners performed a method of adjustment task to determine the threshold values. The resulting thresholds were similar in the information and entertainment scenarios; however, there were significant differences between subjects, and factor levels also had a significant effect: interferer programme was the most important factor across the three scenarios, whilst interferer location was the least important.
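A Box-Behnken design for the five factors listed above can be generated programmatically. This is a minimal sketch in coded levels (-1, 0, +1); the number of centre points is an assumed parameter, since the source does not report it, and `box_behnken` and `label_run` are hypothetical helpers.

```python
from itertools import combinations, product

# Factor order follows the list in the text
FACTORS = ["target programme", "interferer programme", "interferer location",
           "interferer spectrum", "road noise level"]


def box_behnken(n_factors, n_center=3):
    """Box-Behnken design in coded levels (-1, 0, +1).

    Covers 3 to 5 factors, where the standard design varies every pair of
    factors over their +/-1 levels with all other factors held at 0, then
    appends centre points (all factors at 0).
    """
    if not 3 <= n_factors <= 5:
        raise ValueError("pairwise construction only covers 3 to 5 factors")
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    runs.extend([[0] * n_factors for _ in range(n_center)])
    return runs


def label_run(run):
    """Map a coded run onto the factor names."""
    return dict(zip(FACTORS, run))
```

With five factors the pairwise construction yields 40 non-centre runs, so `box_behnken(5, n_center=6)` returns 46 runs; every factor sits at its middle level in most runs, which keeps the number of extreme-level combinations small.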