Dr Philip Coleman

Lecturer in Audio

Qualifications: MSc, PhD

Email:
Phone: Work: 01483 68 6622
Room no: 08A BC 03

Further information

Biography

I joined the Institute of Sound Recording as a Lecturer in 2017. Previously, I worked in the Centre for Vision, Speech and Signal Processing (CVSSP) as a Research Fellow on the S3A project: future spatial audio for an immersive listening experience at home. My research expertise is in the areas of loudspeaker/microphone array processing, sound zones, object-based audio, reverberation, and production tools for 3D spatial audio.

I received my PhD from CVSSP in 2014 under the supervision of Dr Philip Jackson, as part of the Perceptually Optimized Sound Zones (POSZ) project. Previously, I received an MSc in Multimedia Signal Processing and Communications from the University of Surrey, and a BEng degree in Electronic Engineering with Music Technology Systems at the University of York.

Publications

Highlights

  • Baykaner K, Coleman P, Mason R, Jackson PJB, Francombe J, Olik M, Bech S. (2015) 'The relationship between target quality and interference in sound zones'. Journal of the Audio Engineering Society, 63 (1/2), pp. 78-89.

    Abstract

    Sound zone systems aim to produce regions within a room where listeners may consume separate audio programs with minimal acoustical interference. Often, there is a trade-off between the acoustic contrast achieved between the zones, and the fidelity of the reproduced audio program (the target quality). An open question is whether reducing contrast (i.e. allowing greater interference) can improve target quality. The planarity control sound zoning method can be used to improve spatial reproduction, though at the expense of decreased contrast. Hence, this can be used to investigate the relationship between target quality (which is affected by the spatial presentation) and distraction (which is related to the perceived effect of interference). An experiment was conducted investigating target quality and distraction, and examining their relationship with overall quality within sound zones. Sound zones were reproduced using acoustic contrast control, planarity control and pressure matching applied to a circular loudspeaker array. Overall quality was related to target quality and distraction, each having a similar magnitude of effect; however, the result was dependent upon program combination. The highest mean overall quality was a compromise between distraction and target quality, with energy arriving from up to 15 degrees either side of the target direction.

  • Coleman P, Jackson P, Olik M, Pedersen JA. (2014) 'Personal audio with a planar bright zone'. Journal of the Acoustical Society of America, 136 (4), pp. 1725-1735.

    Abstract

    Reproduction of multiple sound zones, in which personal audio programs may be consumed without the need for headphones, is an active topic in acoustical signal processing. Many approaches to sound zone reproduction do not consider control of the bright zone phase, which may lead to self-cancellation problems if the loudspeakers surround the zones. Conversely, control of the phase in a least-squares sense comes at a cost of decreased level difference between the zones and frequency range of cancellation. Single-zone approaches have considered plane wave reproduction by focusing the sound energy into a point in the wavenumber domain. In this article, a planar bright zone is reproduced via planarity control, which constrains the bright zone energy to impinge from a narrow range of angles via projection into a spatial domain. Simulation results using a circular array surrounding two zones show the method to produce superior contrast to the least-squares approach, and superior planarity to the contrast maximization approach. Practical performance measurements obtained in an acoustically treated room verify the conclusions drawn under free-field conditions.

  • Coleman P, Jackson PJB, Olik M, Møller M, Olsen M, Pedersen JA. (2014) 'Acoustic contrast, planarity and robustness of sound zone methods using a circular loudspeaker array'. Journal of the Acoustical Society of America, 135 (4), pp. 1929-1940.

    Abstract

    Since the mid 1990s, acoustics research has been undertaken relating to the sound zone problem—using loudspeakers to deliver a region of high sound pressure while simultaneously creating an area where the sound is suppressed—in order to facilitate independent listening within the same acoustic enclosure. The published solutions to the sound zone problem are derived from areas such as wave field synthesis and beamforming. However, the properties of such methods differ and performance tends to be compared against similar approaches. In this study, the suitability of energy focusing, energy cancelation, and synthesis approaches for sound zone reproduction is investigated. Anechoic simulations based on two zones surrounded by a circular array show each of the methods to have a characteristic performance, quantified in terms of acoustic contrast, array control effort and target sound field planarity. Regularization is shown to have a significant effect on the array effort and achieved acoustic contrast, particularly when mismatched conditions are considered between calculation of the source weights and their application to the system.

Journal articles

  • Zhu Q, Coleman P, Wu M, Yang J. (2017) 'Robust reproduction of sound zones with local sound orientation'. The Journal of the Acoustical Society of America, 142 (1), pp. EL118-EL122.

    Abstract

    Pressure matching (PM) and planarity control (PC) methods can be used to reproduce local sound with a certain orientation at the listening zone, while suppressing the sound energy at the quiet zone. In this letter, regularized PM and PC, incorporating coarse error estimation, are introduced to increase the robustness in non-ideal reproduction scenarios. Facilitated by this, the interaction between regularization, robustness, (tuned) personal audio optimization and local directional performance is explored. Simulations show that under certain conditions, PC and weighted PM achieve comparable performance, while PC is more robust to a poorly selected regularization parameter.

  • Zhu Q, Coleman P, Wu M, Yang J. (2017) 'Robust Acoustic Contrast Control with Reduced In-situ Measurement by Acoustic Modelling'. Journal of the Audio Engineering Society, 65 (6), pp. 460-473.

    Abstract

    Personal audio systems generate a local sound field for a listener while attenuating the sound energy at pre-defined quiet zones. In practice, system performance is sensitive to errors in the acoustic transfer functions between the sources and the zones. Regularization is commonly used to improve robustness; however, selecting a regularization parameter is not always straightforward. In this paper, a design framework for robust reproduction is proposed, combining transfer function and error modelling. The framework allows a physical perspective on the regularization required for a system, based on the bound of assumed additive or multiplicative errors, which is obtained by acoustic modelling. Acoustic contrast control is separately combined with worst-case and probability-model optimization, exploiting limited knowledge of the potential error distribution. Monte-Carlo simulations show that these approaches give increased system robustness compared to the state-of-the-art approaches for regularization parameter estimation, and experimental results verify that robust sound zone control is achieved in the presence of loudspeaker gain errors. Furthermore, by applying the proposed framework, in-situ transfer function measurements were reduced to a single measurement per loudspeaker, per zone, with limited acoustic contrast degradation of less than 2 dB over 100–3000 Hz compared to the fully measured regularized case.

  • Coleman P, Franck A, Jackson P, Hughes R, Remaggi L, Melchior F. (2017) 'Object-Based Reverberation for Spatial Audio'. Journal of the Audio Engineering Society, 65 (1/2), pp. 66-77.

    Abstract

    Object-based audio is gaining momentum as a means for future audio content to be more immersive, interactive, and accessible. Recent standardization developments make recommendations for object formats; however, the capture, production and reproduction of reverberation is an open issue. In this paper, parametric approaches for capturing, representing, editing, and rendering reverberation over a 3D spatial audio system are reviewed. A framework is proposed for a Reverberant Spatial Audio Object (RSAO), which synthesizes reverberation inside an audio object renderer. An implementation example of an object scheme utilising the RSAO framework is provided, and supported with listening test results, showing that: the approach correctly retains the sense of room size compared to a convolved reference; editing RSAO parameters can alter the perceived room size and source distance; and, format-agnostic rendering can be exploited to alter listener envelopment.

  • Remaggi L, Jackson PJB, Coleman P, Wang W. (2017) 'Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods'. IEEE Transactions on Audio, Speech and Language Processing, 25 (2), pp. 296-309.

    Abstract

    Acoustic reflector localization is an important issue in audio signal processing, with direct applications in spatial audio, scene reconstruction, and source separation. Several methods have recently been proposed to estimate the 3D positions of acoustic reflectors given room impulse responses (RIRs). In this article, we categorize these methods as “image-source reversion”, which localizes the image source before finding the reflector position, and “direct localization”, which localizes the reflector without intermediate steps. We present five new contributions. First, an onset detector, called the clustered dynamic programming projected phase-slope algorithm, is proposed to automatically extract the time of arrival for early reflections within the RIRs of a compact microphone array. Second, we propose an image-source reversion method that uses the RIRs from a single loudspeaker. It is constructed by combining an image source locator (the image source direction and range (ISDAR) algorithm), and a reflector locator (using the loudspeaker-image bisection (LIB) algorithm). Third, two variants of it, exploiting multiple loudspeakers, are proposed. Fourth, we present a direct localization method, the ellipsoid tangent sample consensus (ETSAC), exploiting ellipsoid properties to localize the reflector. Finally, systematic experiments on simulated and measured RIRs are presented, comparing the proposed methods with the state-of-the-art. ETSAC generates errors lower than the alternative methods compared through our datasets. Nevertheless, the ISDAR-LIB combination performs well and has a run time 200 times faster than ETSAC.

  • Baykaner K, Coleman P, Mason R, Jackson PJB, Francombe J, Olik M, Bech S. (2015) 'The relationship between target quality and interference in sound zones'. Journal of the Audio Engineering Society, 63 (1/2), pp. 78-89.

  • Olik M, Jackson PJ, Coleman P, Pedersen JA. (2014) 'Optimal source placement for sound zone reproduction with first order reflections'. Journal of the Acoustical Society of America, 136 (6)

    Abstract

    The problem of delivering personal audio content to listeners sharing the same acoustic space has recently attracted attention. It has been shown that a perceptually acceptable level of acoustic separation between the listening zones is difficult to achieve with active control in non-anechoic conditions. A common problem of strong first order reflections has not been examined in detail for systems with practical constraints. Acoustic contrast maximization combined with optimization of source positions is identified as a potentially effective control strategy when strong individual reflections occur. An analytic study is carried out to describe the relationship between the performance of a 2 × 2 (two sources and two control sensors) system and its geometry in a single-reflection scenario. The expression for acoustic contrast is used to formulate guidelines for optimizing source positions, based on three distinct techniques: Null-Split, Far-Align, and Near-Align. The applicability of the techniques to larger systems with up to two reflections is demonstrated using numerical optimization. Simulation results show that optimized systems produce higher acoustic contrast than non-optimized source arrangements and an alternative method for reducing the impact of reflections (sound power minimization).

  • Coleman P, Jackson P, Olik M, Pedersen JA. (2014) 'Personal audio with a planar bright zone'. Journal of the Acoustical Society of America, 136 (4), pp. 1725-1735.

  • Coleman P, Jackson PJB, Olik M, Møller M, Olsen M, Pedersen JA. (2014) 'Acoustic contrast, planarity and robustness of sound zone methods using a circular loudspeaker array'. Journal of the Acoustical Society of America, 135 (4), pp. 1929-1940.

  • Coleman P, Møller M, Olsen M, Olik M, Jackson P, Pedersen JA. (2012) 'Performance of optimized sound field control techniques in simulated and real acoustic environments'. Journal of the Acoustical Society of America, 131 (4)

    Abstract

    It is of interest to create regions of increased and reduced sound pressure ('sound zones') in an enclosure such that different audio programs can be simultaneously delivered over loudspeakers, thus allowing listeners sharing a space to receive independent audio without physical barriers or headphones. Where previous comparisons of sound zoning techniques exist, they have been conducted under favorable acoustic conditions, utilizing simulations based on theoretical transfer functions or anechoic measurements. Outside of these highly specified and controlled environments, real-world factors including reflections, measurement errors, matrix conditioning and practical filter design degrade the realizable performance. This study compares the performance of sound zoning techniques when applied to create two sound zones in simulated and real acoustic environments. In order to compare multiple methods in a common framework without unduly hindering performance, an optimization procedure for each method is first used to select the best loudspeaker positions in terms of robustness, efficiency and the acoustic contrast deliverable to both zones. The characteristics of each control technique are then studied, noting the contrast and the impact of acoustic conditions on performance.

Conference papers

  • Blanco Galindo M, Jackson PJB, Coleman P, Remaggi L. (2017) 'Microphone array design for spatial audio object early reflection parametrisation from room impulse responses'. International Institute of Acoustics and Vibration (IIAV) ICSV 24 Proceedings, London, UK: 24th International Congress on Sound and Vibration (ICSV24)
    [ Status: Accepted ]

    Abstract

    Room Impulse Responses (RIRs) measured with microphone arrays capture spatial and nonspatial information, e.g. the early reflections’ directions and times of arrival, the size of the room and its absorption properties. The Reverberant Spatial Audio Object (RSAO) was proposed as a method to encode room acoustic parameters from measured array RIRs. As the RSAO is object-based audio compatible, its parameters can be rendered to arbitrary reproduction systems and edited to modify the reverberation characteristics, to improve the user experience. Various microphone array designs have been proposed for sound field and room acoustic analysis, but a comparative performance evaluation is not available. This study assesses the performance of five regular microphone array geometries (linear, rectangular, circular, dual-circular and spherical) to capture RSAO parameters for the direct sound and early reflections of RIRs. The image source method is used to synthesise RIRs at the microphone positions as well as at the centre of the array. From the array RIRs, the RSAO parameters are estimated and compared to the reference parameters at the centre of the array. A performance comparison among the five arrays is established as well as the effect of a rigid spherical baffle for the circular and spherical arrays. The effects of measurement uncertainties, such as microphone misplacement and sensor noise errors, are also studied. The results show that planar arrays achieve the most accurate horizontal localisation whereas the spherical arrays perform best in elevation. Arrays with smaller apertures achieve a higher number of detected reflections, which becomes more significant for the smaller room with higher reflection density.

  • Coleman P, Blanco Galindo M, Jackson PJB. (2017) 'Comparison of microphone array geometries for multi-point sound field reproduction'. International Institute of Acoustics and Vibration (IIAV) ICSV 24 Proceedings, London, UK: 24th International Congress on Sound and Vibration (ICSV24)
    [ Status: Accepted ]

    Abstract

    Multi-point approaches for sound field control generally sample the listening zone(s) with pressure microphones, and use these measurements as an input for an optimisation cost function. A number of techniques are based on this concept, for single-zone (e.g. least-squares pressure matching (PM), brightness control, planarity panning) and multi-zone (e.g. PM, acoustic contrast control, planarity control) reproduction. Accurate performance predictions are obtained when distinct microphone positions are employed for setup versus evaluation. While, in simulation, one can afford a dense sampling of virtual microphones, it is desirable in practice to have a microphone array which can be positioned once in each zone to measure the setup transfer functions between each loudspeaker and that zone. In this contribution, we present simulation results over a fixed dense set of evaluation points comparing the performance of several multi-point optimisation approaches for 2D reproduction with a 60 channel circular loudspeaker arrangement. Various regular setup microphone arrays are used to calculate the sound zone filters: circular grid, circular, dual-circular, and spherical arrays, each with different numbers of microphones. Furthermore, the effect of a rigid spherical baffle is studied for the circular and spherical arrangements. The results of this comparative study show how the directivity and effective frequency range of multi-point optimisation techniques depend on the microphone array used to sample the zones. In general, microphone arrays with dense spacing around the boundary give better angular discrimination, leading to more accurate directional sound reproduction, while those distributed around the whole zone enable more accurate prediction of the reproduced target sound pressure level.

  • Coleman P, Franck A, Menzies D, Jackson PJB. (2017) 'Object-based reverberation encoding from first-order Ambisonic RIRs'. Proceedings of the 142nd AES International Convention, Berlin, Germany
    [ Status: Accepted ]

    Abstract

    Recent work on a reverberant spatial audio object (RSAO) encoded spatial room impulse responses (RIRs) as object-based metadata which can be synthesized in an object-based renderer. Encoding reverberation into metadata presents new opportunities for end users to interact with and personalize reverberant content. The RSAO models an RIR as a set of early reflections together with a late reverberation filter. Previous work to encode the RSAO parameters was based on recordings made with a dense array of omnidirectional microphones. This paper describes RSAO parameterization from first-order Ambisonic (B-Format) RIRs, making the RSAO compatible with existing spatial reverb libraries. The object-based implementation achieves reverberation time, early decay time, clarity and interaural cross-correlation similar to direct Ambisonic rendering of 13 test RIRs.

  • Coleman P, Jackson PJB. (2017) 'Planarity analysis of room acoustics for object-based reverberation'. The International Institute of Acoustics and Vibration (IIAV) ICSV24 Proceedings, London, UK: 24th International Congress on Sound and Vibration
    [ Status: Accepted ]

    Abstract

    Recent work into 3D audio reproduction has considered the definition of a set of parameters to encode reverberation into an object-based audio scene. The reverberant spatial audio object (RSAO) describes the reverberation in terms of a set of localised, delayed and filtered (early) reflections, together with a late energy envelope modelling the diffuse late decay. The planarity metric, originally developed to evaluate the directionality of reproduced sound fields, is used to analyse a set of multichannel room impulse responses (RIRs) recorded at a microphone array. Planarity describes the spatial compactness of incident sound energy, which tends to decrease as the reflection density and diffuseness of the room response develop over time. Accordingly, planarity complements intensity-based diffuseness estimators, which quantify the degree to which the sound field at a discrete frequency within a particular time window is due to an impinging coherent plane wave. In this paper, we use planarity as a tool to analyse the sound field in relation to the RSAO parameters. Specifically, we use planarity to estimate two important properties of the sound field. First, as high planarity identifies the most localised reflections along the RIR, we estimate the most planar portions of the RIR, corresponding to the RSAO early reflection model and increasing the likelihood of detecting prominent specular reflections. Second, as diffuse sound fields give a low planarity score, we investigate planarity for data-based mixing time estimation. Results show that planarity estimates on measured multichannel RIR datasets represent a useful tool for room acoustics analysis and RSAO parameterisation.

  • Coleman P, Jackson PJB. (2016) 'Planarity-based sound field optimization for multi-listener spatial audio'. AES Sound Field Control Conference Proceedings, Guildford, UK: AES Sound Field Control Conference.

    Abstract

    Planarity panning (PP) and planarity control (PC) have previously been shown to be efficient methods for focusing directional sound energy into listening zones. In this paper, we consider sound field control for two listeners. First, PP is extended to create spatial audio for two listeners consuming the same spatial audio content. Then, PC is used to create highly directional sound and cancel interfering audio. Simulation results compare PP and PC against pressure matching (PM) solutions. For multiple listeners listening to the same content, PP creates directional sound at lower effort than the PM counterpart. When listeners consume different audio, PC produces greater acoustic contrast than PM, with excellent directional control except for frequencies where grating lobes generate problematic interference patterns.

  • Woodcock J, Pike C, Melchior F, Coleman P, Franck A, Hilton ADM. (2016) 'Presenting the S3A Object-Based Audio Drama dataset'. Audio Engineering Society AES E-library, Paris, France: 140th AES Convention

    Abstract

    This engineering brief reports on the production of 3 object-based audio drama scenes, commissioned as part of the S3A project. 3D reproduction and an object-based workflow were considered and implemented from the initial script commissioning through to the final mix of the scenes. The scenes are being made available as Broadcast Wave Format files containing all objects as separate tracks and all metadata necessary to render the scenes as an XML chunk in the header conforming to the Audio Definition Model specification (Recommendation ITU-R BS.2076 [1]). It is hoped that these scenes will find use in perceptual experiments and in the testing of 3D audio systems. The scenes are available via the following link: http://dx.doi.org/10.17866/rd.salford.3043921.

  • Coleman P, Franck A, Jackson PJB, Hughes R, Remaggi L, Melchior F. (2016) 'On object based audio with reverberation'. AES 60th International Conference, Leuven, Belgium, 3–5 February 2016

    Abstract

    Object-based audio is gaining momentum as a means for future audio productions to be format-agnostic and interactive. Recent standardization developments make recommendations for object formats; however, the capture, production and reproduction of reverberation is an open issue. In this paper, we review approaches for recording, transmitting and rendering reverberation over a 3D spatial audio system. Techniques include channel-based approaches where room signals intended for a specific reproduction layout are transmitted, and synthetic reverberators where the room effect is constructed at the renderer. We consider how each approach translates into an object-based context considering the end-to-end production chain of capture, representation, editing, and rendering. We discuss some application examples to highlight the implications of the various approaches.

  • Zhu Q, Coleman P, Wu M, Yang J. (2016) 'Robust personal audio reproduction based on acoustic transfer function modelling'. Audio Engineering Society AES Sound Field Control Conference Proceedings, Guildford, UK: AES Sound Field Control Conference
    [ Status: Accepted ]

    Abstract

    Personal audio systems generate a local sound field for a listener while attenuating the sound energy at pre-defined quiet zones. Their performance can be sensitive to errors in the acoustic transfer functions between the sources and the zones. In this paper, we model the acoustic transfer functions as a superposition of multipoles with a term to describe errors in the actual gain and phase. We then propose a design framework for robust reproduction, incorporating additional prior knowledge about the error distribution where available. We combine acoustic contrast control with worst-case and probability-model optimization, exploiting limited knowledge of the error distribution. Monte-Carlo simulations over 10000 test cases show that the method increases system robustness when errors are present in the assumed transfer functions.

  • Remaggi L, Jackson PJB, Coleman P, Francombe J. (2015) 'Visualization of compact microphone array room impulse responses'. Proc. AES 139th Int. Convention, New York, NY, USA, pp. 4-4.

    Abstract

    For many audio applications, availability of recorded multi-channel room impulse responses (MC-RIRs) is fundamental. They enable development and testing of acoustic systems for reflective rooms. We present multiple MC-RIR datasets recorded in diverse rooms, using up to 60 loudspeaker positions and various uniform compact microphone arrays. These datasets complement existing RIR libraries and have dense spatial sampling of a listening position. To reveal the encapsulated spatial information, several state-of-the-art room visualization methods are presented. Results confirm the measurement fidelity and graphically depict the geometry of the recorded rooms. Further investigation of these recordings and visualization methods will facilitate object-based RIR encoding, integration of audio with other forms of spatial information, and meaningful extrapolation and manipulation of recorded compact microphone array RIRs.

  • Remaggi L, Jackson PJB, Coleman P. (2015) 'Source, sensor and reflector position estimation from acoustical room impulse responses'. 22nd International Congress on Sound and Vibration, Florence, Italy

    Abstract

    The acoustic environment affects the properties of the audio signals recorded. Generally, given room impulse responses (RIRs), three sets of parameters have to be extracted in order to create an acoustic model of the environment: sources, sensors and reflector positions. In this paper, the cross-correlation based iterative sensor position estimation (CISPE) algorithm is presented, a new method to estimate a microphone configuration, together with source and reflector position estimators. A rough measurement of the microphone positions initializes the process; then a recursive algorithm is applied to improve the estimates, exploiting a delay-and-sum beamformer. Knowing where the microphones lie in the space, the dynamic programming projected phase slope algorithm (DYPSA) extracts the times of arrival (TOAs) of the direct sounds from the RIRs, and multiple signal classification (MUSIC) extracts the directions of arrival (DOAs). A triangulation technique is then applied to estimate the source positions. Finally, exploiting properties of 3D quadratic surfaces (namely, ellipsoids), reflecting planes are localized via a technique ported from image processing, by random sample consensus (RANSAC). Simulation tests were performed on measured RIR datasets acquired from three different rooms located at the University of Surrey, using either a uniform circular array (UCA) or uniform rectangular array (URA) of microphones. Results showed small improvements with CISPE pre-processing in almost every case.

  • Coleman P, Jackson PJB, Francombe J. (2015) 'Audio Object Separation Using Microphone Array Beamforming'. Audio Engineering Society Warsaw: 138th Convention of the Audio Engineering Society

    Abstract

    Audio production is moving toward an object-based approach, where content is represented as audio together with metadata that describe the sound scene. From current object definitions, it would usually be expected that the audio portion of the object is free from interfering sources. This poses a potential problem for object-based capture, if microphones cannot be placed close to a source. This paper investigates the application of microphone array beamforming to separate a mixture into distinct audio objects. Real mixtures recorded by a 48-channel microphone array in reflective rooms were separated, and the results were evaluated using perceptual models in addition to physical measures based on the beam pattern. The effect of interfering objects was reduced by applying the beamforming techniques.

  • Remaggi L, Jackson PJB, Coleman P. (2015) 'Estimation of Room Reflection Parameters for a Reverberant Spatial Audio Object'. Proc. AES 138th Int. Convention, Warsaw, Poland, Warsaw: 138th Convention of the Audio Engineering Society

    Abstract

    Estimating and parameterizing the early and late reflections of an enclosed space is an interesting topic in acoustics. With a suitable set of parameters, the current concept of a spatial audio object (SAO), which is typically limited to either direct (dry) sound or diffuse field components, could be extended to afford an editable spatial description of the room acoustics. In this paper we present an analysis/synthesis method for parameterizing a set of measured room impulse responses (RIRs). RIRs were recorded in a medium-sized auditorium, using a uniform circular array of microphones representing the perspective of a listener in the front row. During the analysis process, these RIRs were decomposed, in time, into three parts: the direct sound, the early reflections, and the late reflections. From the direct sound and early reflections, parameters were extracted for the length, amplitude, and direction of arrival (DOA) of the propagation paths by exploiting the dynamic programming projected phase-slope algorithm (DYPSA) and classical delay-and-sum beamformer (DSB). Their spectral envelope was calculated using linear predictive coding (LPC). Late reflections were modeled by frequency-dependent decays excited by band-limited Gaussian noise. The combination of these parameters for a given source position and the direct source signal represents the reverberant or “wet” spatial audio object. RIRs synthesized for a specified rendering and reproduction arrangement were convolved with dry sources to form reverberant components of the sound scene. The resulting signals demonstrated potential for these techniques, e.g., in SAO reproduction over a 22.2 surround sound system.

  • Francombe J, Brookes T, Mason R, Flindt R, Coleman P, Liu Q, Jackson PJB. (2015) 'Production and reproduction of programme material for a variety of spatial audio formats'. Proc. AES 138th Int. Conv. (e-Brief), Warsaw, Warsaw: 138th Audio Engineering Society Convention, pp. 4-4.

    Abstract

    For subjective experimentation on 3D audio systems, suitable programme material is needed. A large-scale recording session was performed in which four ensembles were recorded with a range of existing microphone techniques (aimed at mono, stereo, 5.0, 9.0, 22.0, ambisonic, and headphone reproduction) and a novel 48-channel circular microphone array. Further material was produced by remixing and augmenting pre-existing multichannel content. To mix and monitor the programme items (which included classical, jazz, pop and experimental music, and excerpts from a sports broadcast and a film soundtrack), a flexible 3D audio reproduction environment was created. Solutions to the following challenges were found: level calibration for different reproduction formats; bass management; and adaptable signal routing from different software and file formats.

  • Remaggi L, Jackson PJB, Coleman P, Wang W. (2014) 'Room boundary estimation from acoustic room impulse responses'. Edinburgh, UK: IEEE Proc. Sensor Signal Processing for Defence (SSPD 2014), Edinburgh: Sensor Signal Processing for Defence (SSPD 2014), pp. 1-5.

    Abstract

    Boundary estimation from an acoustic room impulse response (RIR), exploiting known sound propagation behavior, yields useful information for various applications: e.g., source separation, simultaneous localization and mapping, and spatial audio. The baseline method, an algorithm proposed by Antonacci et al., uses reflection times of arrival (TOAs) to hypothesize reflector ellipses. Here, we modify the algorithm for 3-D environments and for enhanced noise robustness: DYPSA and MUSIC for epoch detection and direction of arrival (DOA) respectively are combined for source localization, and numerical search is adopted for reflector estimation. Both methods, and other variants, are tested on measured RIR data; the proposed method performs best, reducing the estimation error by 30%.

  • Coleman P, Jackson PJB. (2014) 'Planarity panning for listener-centered spatial audio'. Proc. AES 55th Int. Conf., Helsinki, pp. 8-8.

    Abstract

    Techniques such as multi-point optimization, wave field synthesis and ambisonics attempt to create spatial effects by synthesizing a sound field over a listening region. In this paper, we propose planarity panning, which uses superdirective microphone array beamforming to focus the sound from the specified direction, as an alternative approach. Simulations compare performance against existing strategies, considering the cases where the listener is central and non-central in relation to a 60 channel circular loudspeaker array. Planarity panning requires low control effort and provides high sound field planarity over a large frequency range, when the zone positions match the target regions specified for the filter calculations. Future work should implement and validate the perceptual properties of the method.

  • Coleman P, Jackson PJB, Olik M, Pedersen JA. (2014) 'Stereophonic personal audio reproduction using planarity control optimization'. Beijing, China: International Congress on Sound and Vibration

    Abstract

    Sound field control to create multiple personal audio spaces (sound zones) in a shared listening environment is an active research topic. Typically, sound zones in the literature have aimed to reproduce monophonic audio programme material. The planarity control optimization approach can reproduce sound zones with high levels of acoustic contrast, while constraining the energy flux distribution in the target zone to impinge from a certain range of azimuths. Such a constraint has been shown to reduce problematic self-cancellation artefacts such as uneven sound pressure levels and complex phase patterns within the target zone. Furthermore, multichannel reproduction systems have the potential to reproduce spatial audio content at arbitrary listening positions (although most exclusively target a 'sweet spot'). By designing the planarity control to constrain the impinging energy rather tightly, a sound field approximating a plane-wave can be reproduced for a listener in an arbitrarily-placed target zone. In this study, the application of planarity control for stereo reproduction in the context of a personal audio system was investigated. Four solutions, to provide virtual left and right channels for two audio programmes, were calculated and superposed to achieve the stereo effect in two separate sound zones. The performance was measured in an acoustically treated studio using a 60 channel circular array, and compared against a least-squares pressure matching solution whereby each channel was reproduced as a plane wave field. Results demonstrate that planarity control achieved 6 dB greater mean contrast than the least-squares case over the range 250-2000 Hz. Based on the principal directions of arrival across frequency, planarity control produced azimuthal RMSE of 4.2/4.5 degrees for the left/right channels respectively (least-squares 2.8/3.6 degrees). Future work should investigate the perceived spatial quality of the implemented system with respect to a reference stereophonic setup.

  • Coleman P, Jackson PJB, Olik M, Pedersen JA. (2014) 'Numerical optimization of loudspeaker configuration for sound zone reproduction'. Beijing, China: International Congress on Sound and Vibration

    Abstract

    The topic of sound zone reproduction, whereby listeners sharing an acoustic space can receive personalized audio content, has been researched for a number of years. Recently, a number of sound zone systems have been realized, moving the concept towards becoming a practical reality. Current implementations of sound zone systems have relied upon conventional loudspeaker geometries such as linear and circular arrays. Line arrays may be compact, but do not necessarily give the system the opportunity to compensate for room reflections in real-world environments. Circular arrays give this opportunity, and also give greater flexibility for spatial audio reproduction, but typically require large numbers of loudspeakers in order to reproduce sound zones over an acceptable bandwidth. Therefore, one key area of research standing between the ideal capability and the performance of a physical system is that of establishing the number and location of the loudspeakers comprising the reproduction array. In this study, the topic of loudspeaker configurations was considered for two-zone reproduction, using a circular array of 60 loudspeakers as the candidate set for selection. A numerical search procedure was used to select a number of loudspeakers from the candidate set. The novel objective function driving the search comprised terms relating to the acoustic contrast between the zones, array effort, matrix condition number, and target zone planarity. The performance of the selected sets using acoustic contrast control was measured in an acoustically treated studio. Results demonstrate that the loudspeaker selection process has potential for maximising the contrast over frequency by increasing the minimum contrast over the frequency range 100-4000 Hz. The array effort and target planarity can also be optimised, depending on the formulation of the objective function. Future work should consider greater diversity of candidate locations.

  • Coleman P, Jackson PJB, Olik M, Pedersen JA. (2013) 'Optimizing the planarity of sound zones'. Proceedings of the AES International Conference, pp. 204-213.

    Abstract

    Reproduction of personal sound zones can be attempted by sound field synthesis, energy control, or a combination of both. Energy control methods can create an unpredictable pressure distribution in the listening zone. Sound field synthesis methods may be used to overcome this problem, but tend to produce a lower acoustic contrast between the zones. Here, we present a cost function to optimize the cancellation and the plane wave energy over a range of incoming azimuths, producing a planar sound field without explicitly specifying the propagation direction. Simulation results demonstrate the performance of the methods in comparison with the current state of the art. The method produces consistent high contrast and a consistently planar target sound zone across the frequency range 80-7000 Hz. Copyright © (2013) by the Audio Engineering Society.

  • Francombe J, Coleman P, Olik M, Baykaner K, Jackson PJB, Mason R, Dewhirst M, Bech S, Pedersen JA. (2013) 'Perceptually optimised loudspeaker selection for the creation of personal sound zones'. Proceedings of the AES International Conference, pp. 169-178.

    Abstract

    Sound field control methods can be used to create multiple zones of audio in the same room. Separation achieved by such systems has classically been evaluated using physical metrics including acoustic contrast and target-to-interferer ratio (TIR). However, to optimise the experience for a listener it is desirable to consider perceptual factors. A search procedure was used to select 5 loudspeakers for production of 2 sound zones using acoustic contrast control. Comparisons were made between searches driven by physical (programme-independent TIR) and perceptual (distraction predictions from a statistical model) cost functions. Performance was evaluated on TIR and predicted distraction in addition to subjective ratings. The perceptual cost function showed some benefits over physical optimisation, although the model used needs further work. Copyright © (2013) by the Audio Engineering Society.

  • Olik M, Francombe J, Coleman P, Jackson PJB, Olsen M, Møller M, Mason R, Bech S. (2013) 'A comparative performance study of sound zoning methods in a reflective environment'. Proceedings of the AES International Conference, pp. 214-223.

    Abstract

    Whilst sound zoning methods have typically been studied under anechoic conditions, it is desirable to evaluate the performance of various methods in a real room. Three control methods were implemented (delay and sum, DS; acoustic contrast control, ACC; and pressure matching, PM) on two regular 24-element loudspeaker arrays (line and circle). The acoustic contrast between two zones was evaluated and the reproduced sound fields compared for uniformity of energy distribution. ACC generated the highest contrast, whilst PM produced a uniform bright zone. Listening tests were also performed using monophonic auralisations from measured system responses to collect ratings of perceived distraction due to the alternate audio programme. Distraction ratings were affected by control method and programme material. Copyright © (2013) by the Audio Engineering Society.

  • Jackson PJ, Jacobsen F, Coleman P, Pedersen JA. (2013) 'Sound field planarity characterized by superdirective beamforming'. Proceedings of Meetings on Acoustics, 19

    Abstract

    The ability to replicate a plane wave represents an essential element of spatial sound field reproduction. In sound field synthesis, the desired field is often formulated as a plane wave and the error minimized; for other sound field control methods, the energy density or energy ratio is maximized. In all cases and further to the reproduction error, it is informative to characterize how planar the resultant sound field is. This paper presents a method for quantifying a region's acoustic planarity by superdirective beamforming with an array of microphones, which analyzes the azimuthal distribution of impinging waves and hence derives the planarity. Estimates are obtained for a variety of simulated sound field types, tested with respect to array orientation, wavenumber, and number of microphones. A range of microphone configurations is examined. Results are compared with delay-and-sum beamforming, which is equivalent to spatial Fourier decomposition. The superdirective beamformer provides better characterization of sound fields, and is effective with a moderate number of omni-directional microphones over a broad frequency range. Practical investigation of planarity estimation in real sound fields is needed to demonstrate its validity as a physical sound field evaluation measure. © 2013 Acoustical Society of America.

  • Coleman P, Jackson PJ, Olik M, Olsen M, Møller M, Pedersen JA. (2013) 'The influence of regularization on anechoic performance and robustness of sound zone methods'. Proceedings of Meetings on Acoustics, 19

    Abstract

    Recent attention to the problem of controlling multiple loudspeakers to create sound zones has been directed towards practical issues arising from system robustness concerns. In this study, the effects of regularization are analyzed for three representative sound zoning methods. Regularization governs the control effort required to drive the loudspeaker array, via a constraint in each optimization cost function. Simulations show that regularization has a significant effect on the sound zone performance, both under ideal anechoic conditions and when systematic errors are introduced between calculation of the source weights and their application to the system. Results are obtained for speed of sound variations and loudspeaker positioning errors with respect to the source weights calculated. Judicious selection of the regularization parameter is shown to be a primary concern for sound zone system designers - the acoustic contrast can be increased by up to 50 dB with proper regularization in the presence of errors. A frequency-dependent minimum regularization parameter is determined based on the conditioning of the matrix inverse. The regularization parameter can be further increased to improve performance depending on the control effort constraints, expected magnitude of errors, and desired sound field properties of the system. © 2013 Acoustical Society of America.

  • Olik M, Jackson PJ, Coleman P. (2013) 'Influence of low-order room reflections on sound zone system performance'. Proceedings of Meetings on Acoustics, 19

    Abstract

    Studies on sound field control methods able to create independent listening zones in a single acoustic space have recently been undertaken due to the potential of such methods for various practical applications, such as individual audio streams in home entertainment. Existing solutions to the problem have been shown to be effective in creating high and low sound energy regions under anechoic conditions. Although some case studies in a reflective environment can also be found, the capabilities of sound zoning methods in rooms have not been fully explored. In this paper, the influence of low-order (early) reflections on the performance of key sound zone techniques is examined. Analytic considerations for small-scale systems reveal strong dependence of performance on parameters such as source positioning with respect to zone locations and room surfaces, as well as the parameters of the receiver configuration. These dependencies are further investigated through numerical simulation to determine system configurations which maximize the performance in terms of acoustic contrast and array control effort. Design rules for source and receiver positioning are suggested, for improved performance under a given set of constraints such as the number of available sources, zone locations and the direction of the dominant reflection. © 2013 Acoustical Society of America.

Theses and dissertations

  • Coleman P. Loudspeaker array processing for personal sound zone reproduction.

    Abstract

    Sound zone reproduction facilitates listeners wishing to consume personal audio content within the same acoustic enclosure by filtering loudspeaker signals to create constructive and destructive interference in different spatial regions. Published solutions to the sound zone problem are derived from areas such as sound field synthesis and beamforming. The first contribution of this thesis is a comparative study of multi-point approaches. A new metric of planarity is adopted to analyse the spatial distribution of energy in the target zone, and the well-established metrics of acoustic contrast and control effort are also used. Simulations and experimental results demonstrate the advantages and disadvantages of the approaches. Energy cancellation produces good acoustic contrast but allows very little control over the target sound field; synthesis-derived approaches precisely control the target sound field but produce less contrast. Motivated by the limitations of the existing optimization methods, the central contribution of this thesis is a proposed optimization cost function ‘planarity control’, which maximizes the acoustic contrast between the zones while controlling sound field planarity by projecting the target zone energy into a spatial domain. Planarity control is shown to achieve good contrast and high target zone planarity over a large frequency range. The method also has potential for reproducing stereophonic material in the context of sound zones. The remaining contributions consider two further practical concerns. First, judicious choice of the regularization parameter is shown to have a significant effect on the contrast, effort and robustness. Second, attention is given to the problem of optimally positioning the loudspeakers via a numerical framework and objective function. The simulation and experimental results presented in this thesis represent a significant addition to the literature and will influence the future choices of control methods, regularization and loudspeaker placement for personal audio. Future systems may incorporate 3D rendering and listener tracking.
