Kim C, Kondoz A, Shi X (2013) Investigation into spatial audio quality of experience in the presence of accompanying video cues with spatial mismatch,
This study investigates metrics for spatial audio quality of experience (QoE) prediction, particularly considering the presence of video whose viewpoint may not match that of the auditory scene. Subjective tests were conducted for 5.1-channel audio quality evaluation following a previously developed testing and prediction methodology, with the addition of accompanying video cues and with the spatial correlation between audio and video as a new variable for the QoE prediction. The first experiment, using a synthesized visual cue, showed that the spatial mismatch between audio and video affects the perceived audio QoE. A prediction model of the QoE score was suggested through statistical analysis, using the audio-video angular mismatch as well as other known measurable low-level parameters. The model was validated through a further set of subjective tests using captured real-life audiovisual content.
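The abstract does not specify the form of the prediction model; as a rough illustration of the approach it describes, a linear QoE model could be fitted by ordinary least squares to the angular mismatch plus other low-level parameters. The feature layout, function names, and coefficients below are hypothetical, not taken from the paper:

```python
import numpy as np

def fit_qoe_model(features, scores):
    """Least-squares fit of QoE scores to a linear model with intercept.

    features : (n_trials, n_params) array; columns might be, e.g.,
               [angular_mismatch_deg, iacc, ild_db, ...] (illustrative)
    scores   : (n_trials,) array of subjective QoE ratings
    Returns the coefficient vector, intercept first.
    """
    X = np.column_stack([np.ones(len(scores)), features])
    coeffs, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coeffs

def predict_qoe(coeffs, features):
    """Predict QoE scores from fitted coefficients."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coeffs
```

In practice a model like this would be assessed by cross-validation against held-out subjective scores, as the validation with a second test set in the study suggests.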
Kim C, Mason R, Brookes TS (2009) The role of head movement in the analysis of spatial impression,
Kim C, Mason R, Brookes T (2007) An investigation into head movements made when evaluating various attributes of sound, Audio Engineering Society Preprint 7031, Audio Engineering Society
This research extends the study of head movements during listening by including various listening tasks where the listeners evaluate spatial impression and timbre, in addition to the more common task of judging source location. Subjective tests were conducted in which the listeners were allowed to move their heads freely whilst listening to various types of sound and asked to evaluate source location, apparent source width, envelopment, and timbre. The head movements were recorded with a head tracker attached to the listener's head. From the recorded data, the maximum range of movement, mean position and speed, and maximum speed were calculated along each axis of translational and rotational movement. The effects of various independent variables, such as the attribute being evaluated, the stimulus type, the number of repetitions, and the simulated source location, were examined through statistical analysis. The results showed that whilst there were differences between the head movements of individual subjects, across all listeners the range of movement was greatest when evaluating source width and envelopment, less when localising sources, and least when judging timbre. In addition, the range and speed of head movement were reduced for transient signals compared to longer musical or speech phrases. Finally, in most cases for the judgement of spatial attributes, head movement was towards the direction of the source.
In order to take head movement into account in objective evaluation of perceived spatial impression (including source direction), a suitable binaural capture device is required. A signal capture system was suggested that consisted of a head-sized sphere containing multiple pairs of microphones which, in comparison to a rotating head and torso simulator (HATS), has the potential for improved measurement speed and the capability to measure time-varying systems, albeit at the expense of some accuracy. The error introduced by using a relatively simple sphere compared to a more physically accurate HATS was evaluated in terms of three binaural parameters related to perceived spatial impression: interaural time and level differences (ITD and ILD) and interaural cross-correlation coefficient (IACC). It was found that whilst the error in the IACC measurements was perceptually negligible when the sphere was mounted on a torso, the differences in measured ITD and ILD values between the sphere-with-torso and HATS were not perceptually negligible. However, it was found that the sphere-with-torso could give accurate predictions of source location based on ITD and ILD, through the use of a look-up table created from known ITD-ILD-direction mappings. Therefore, the validity of the multi-microphone sphere-with-torso as a binaural signal capture device for perceptually relevant measurements of source direction (based on ITD and ILD) and spatial impression (based on IACC) was demonstrated.
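For readers unfamiliar with the three binaural parameters named above, a minimal textbook-style sketch of their estimation from a pair of ear signals follows. This is a deliberate simplification of what a measurement system like the one described would do (no critical-band filtering or time windowing), and the function name is ours:

```python
import numpy as np

def binaural_parameters(left, right, fs, max_lag_ms=1.0):
    """Estimate ITD (s), ILD (dB) and IACC from equal-length ear signals.

    Uses the normalized interaural cross-correlation over +/- 1 ms lag:
    IACC is its maximum absolute value, ITD the lag at that maximum,
    and ILD the broadband energy ratio in decibels.
    """
    n = len(left)
    max_lag = int(fs * max_lag_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    denom = np.sqrt(np.sum(left**2) * np.sum(right**2))
    # cross-correlation r(l) = sum_n left[n] * right[n + l], truncated at edges
    xcorr = np.array([
        np.sum(left[max(0, -l):n - max(0, l)] *
               right[max(0, l):n - max(0, -l)])
        for l in lags]) / denom
    iacc = np.max(np.abs(xcorr))
    itd = lags[np.argmax(np.abs(xcorr))] / fs
    ild = 10 * np.log10(np.sum(left**2) / np.sum(right**2))
    return itd, ild, iacc
```

A delayed copy of the left signal at the right ear yields an ITD equal to the delay, an ILD near 0 dB, and an IACC near 1, which matches the intuitive definitions of the three parameters.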
Kim C, Ahn SC, Kim I, Kim H (2005) 3-dimensional voice communication system for two user groups, The 7th International Conference on Advanced Communication Technology (ICACT 2005)
This paper proposes a 3-dimensional (3D) voice over IP (VoIP) system for two user groups with minimal device requirements, and presents design issues of the system along with some experience from its implementation. The proposed system requires only a desktop computer with a multi-channel soundcard installed, as many USB microphones as there are users, and a 3D sound rendering setup such as a 5.1-channel or 7.1-channel loudspeaker system. It not only enables multiple users to communicate with a remote group using a single desktop computer, but also enriches the voice of each user with a 3D spatial effect. Using the system, the participants can hear the voices of remote users through the 3D sound rendering setup, as if each remote user were speaking at his or her corresponding position. This system can be used for immersive teleconferencing.
This research introduces a novel technique for capturing binaural signals for objective evaluation of spatial impression; the technique allows for simulation of the head movement that is typical in a range of listening activities. A subjective listening test showed that the amount of head movement made was larger when listeners were rating perceived source width and envelopment than when rating source direction and timbre, and that the locus of ear positions corresponding to the pattern of head movement formed a bounded sloped path, higher towards the rear and lower towards the front. Based on these findings, a signal capture system was designed comprising a sphere with multiple microphones, mounted on a torso. Evaluation of its performance showed that a perceptual model incorporating this capture system is capable of perceptually accurate prediction of source direction based on interaural time and level differences (ITD and ILD), and of spatial impression based on interaural cross-correlation coefficient (IACC). Investigation into appropriate parameter derivation and interpolation techniques determined that 21 pairs of spaced microphones were sufficient to measure ITD, ILD and IACC across the sloped range of ear positions.
Mason R, Kim C, Brookes T (2008) Taking head movements into account in measurement of spatial attributes, Institute of Acoustics - 24th Reproduced Sound Conference 2008, Reproduced Sound 2008: Immersive Audio, Proceedings of the Institute of Acoustics 30(Part 6), pp. 239-246
Measurements of the spatial attributes of auditory environments or sound reproduction systems commonly only consider a single receiver position. However, it is known that humans make use of head movement to help to make sense of auditory scenes, especially when the physical cues are ambiguous. Results are summarised from a three-year research project which aims to develop a practical binaural-based measurement system that takes head movements into account. Firstly, the head movements made by listeners in various situations were investigated, which showed that a wide range of head movements are made when evaluating source width and envelopment, and minimal head movements made when evaluating timbre. Secondly, the effect of using a simplified sphere model containing two microphones instead of a head and torso simulator was evaluated, and methods were derived to minimise the errors in measured cues for spatial perception that were caused by the simplification of the model. Finally, the results of the two earlier stages were combined to create a multi-microphone sphere that can be used to measure spatial attributes incorporating head movements in a perceptually-relevant manner, and which allows practical and rapid measurements to be made.
Hartmann C, Weitnauer M, Kim C (2012) A Hybrid Acquisition Approach for the Recording of Object-Based Audio Scenes,
This research incorporates the nature of head movement made in listening activities into the development of a quasi-binaural acoustical measurement technique for the evaluation of spatial impression. A listening test was conducted where head movements were tracked whilst the subjects rated the perceived source width, envelopment, source direction and timbre of a number of stimuli. It was found that the extent of head movements was larger when evaluating source width and envelopment than when evaluating source direction and timbre. It was also found that the locus of ear positions corresponding to these head movements formed a bounded sloped path, higher towards the rear and lower towards the front. This led to the concept of a signal capture device comprising a torso-mounted sphere with multiple microphones. A prototype was constructed and used to measure three binaural parameters related to perceived spatial impression: interaural time and level differences (ITD and ILD) and interaural cross-correlation coefficient (IACC). Comparison of the prototype measurements to those made with a rotating Head and Torso Simulator (HATS) showed that the prototype could be perceptually accurate for the prediction of source direction using ITD and ILD, and for the prediction of perceived spatial impression using IACC. Further investigation into parameter derivation and interpolation methods indicated that 21 pairs of discretely spaced microphones were sufficient to measure the three binaural parameters across the sloped range of ear positions identified in the listening test.
This research aims, ultimately, to develop a system for the objective evaluation of spatial impression, incorporating the finding from a previous study that head movements are naturally made in its subjective evaluation. A spherical binaural capture model, comprising a head-sized sphere with multiple attached microphones, has been proposed. Research already conducted found significant differences in interaural time and level differences, and in the cross-correlation coefficient, between this spherical model and a head and torso simulator. An attempt is made to lessen these differences by adding a torso and simplified pinnae to the sphere. Further analysis of the head movements made by listeners in a range of listening situations determines the range of head positions that needs to be taken into account, and analyses of these results inform the optimum positioning of the microphones around the sphere model.
Weitnauer M, Kim C, Hartmann C A Hybrid 3D Audio Acquisition Approach for the Recording of Spatial Audio Scenes,
The present paper describes the conception and field trial of a recording system designed for three-dimensional audio acquisition of complex acoustic scenes. The pursued objective is to create audio signals which are generic enough to be used with highly diverse playback devices. To this end, within the scope of the EU-funded research project ROMEO, a hybrid audio recording system, based on familiar spatial recording techniques, was designed and implemented. The audio signals and metadata created by the system are versatile enough to be processed by and transmitted to various end-user terminals, such as mobile devices, set-top boxes and tablet computers. A practical test recording at the Bavarian broadcaster (BR) in Munich enabled a validation of the concept and at the same time provided audio test material for further ROMEO research work.
Experiments were undertaken to elicit the perceived effects of head-position-dependent variations in the interaural cross-correlation coefficient (IACC) of a range of signals. A graphical elicitation experiment showed that the variations in the IACC strongly affected the perceived width and depth of the reverberant environment, as well as the perceived width and distance of the sound source. A verbal experiment gave similar results, and also indicated that the head-position-dependent IACC variations caused changes in the perceived spaciousness and envelopment of the stimuli.
In a previous study it was discovered that listeners normally make head movements when attempting to evaluate source width and envelopment as well as source location. To accommodate this finding in the development of an objective measurement model for spatial impression, two capture models based on binaural techniques were introduced and designed in this research: 1) a rotating Head and Torso Simulator (HATS), and 2) a sphere with multiple microphones. As an initial study, measurements of interaural time difference (ITD), level difference (ILD) and cross-correlation coefficient (IACC) made with the HATS were compared with those made with a sphere containing two microphones. The magnitude of the differences was judged in a perceptually relevant manner by comparing them with the just-noticeable differences (JNDs) of these parameters. The results showed that the differences were generally not negligible, implying that the sphere model needs enhancement, possibly by introducing equivalents of the pinnae or torso. An exception was IACC, for which the choice of reference JND specification determined whether the difference between the two models was perceptually significant.
Kim C, Mason R, Brookes T (2010) Investigation into and modelling of head movement for objective evaluation of the spatial impression of audio, Journal of the Acoustical Society of America 127(3), pp. 1886-1886, Acoustical Society of America
Research was undertaken to determine the nature of head movements made when judging spatial impression and to incorporate these into a system for measuring, in a perceptually relevant manner, the acoustic parameters which contribute to spatial impression: interaural time and level differences and interaural cross-correlation coefficient. First, a subjective test was conducted that showed that (i) the amount of head movement was larger when evaluating source width and envelopment than when judging localization and timbre and (ii) the pattern of head movement resulted in ear positions that formed a sloped area. These findings led to the design of a binaural signal capture technique using a sphere with multiple microphones, mounted on a simulated torso. Evaluation of this technique revealed that it would be appropriate for the prediction of perceived spatial attributes including both source direction and aspects of spatial impression. Reliable derivation of these attributes across the range of ear positions determined from the earlier subjective test was shown to be possible with a limited number of microphones through an appropriate interpolation and calculation technique. A prototype capture system was suggested as a result, using a sphere with torso, with 21 omnidirectional microphones on each side. [Work supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, Grant No. EP/D049253.]
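The prediction of source direction from measured ITD and ILD via a look-up table, mentioned in several of the abstracts above, amounts to a nearest-neighbour match in a normalized ITD-ILD space. The sketch below illustrates the idea only; the helper names and scaling constants are our assumptions, not values from the papers:

```python
import numpy as np

def build_lookup(directions_deg, itds, ilds):
    """Pack known per-direction (ITD, ILD) pairs into a look-up table.
    In a real system these would come from measurements of the capture
    device for sources at known directions."""
    return np.asarray(directions_deg), np.column_stack([itds, ilds])

def predict_direction(table, itd, ild, itd_scale=1e-3, ild_scale=10.0):
    """Return the table direction whose (ITD, ILD) pair is nearest to the
    measurement, after scaling each axis to comparable ranges.
    itd_scale (s) and ild_scale (dB) are illustrative normalizers."""
    dirs, feats = table
    dist = np.hypot((feats[:, 0] - itd) / itd_scale,
                    (feats[:, 1] - ild) / ild_scale)
    return dirs[np.argmin(dist)]
```

With a table covering the horizontal plane, a measured pair such as (0.38 ms, 7.5 dB) would resolve to the nearest stored direction; finer angular resolution comes from a denser table or interpolation between entries.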
De Silva V, Kim C, Haddad N, Ekmekcioglu E, Dogan S, Kondoz A, Politis I, Kordelas A, Dagiuklas T, Weitnauer M, Hartmann C An End-to-End QoE Measurement Framework For Immersive 3D Media Delivery Systems,
The ROMEO project focuses on the delivery of multiview 3-Dimensional (3D) video enriched with spatial audio on a converged network architecture. Quality of Experience (QoE) modeling plays an important role in several aspects of the overall ROMEO architecture, such as in video compression, multicast tree formation in the P2P overlay, content adaptation and view synthesis. To address various use cases there will be several models of QoE that will be investigated and developed within the scope of the ROMEO project. This paper describes the various models of QoE that will be investigated, specifically, QoE modeling of compression artifacts, rendering artifacts, and packet loss effects on 3D multiview video and QoE factors related to Audio compression and rendering. It is expected that the QoE models developed within the ROMEO project will find important use cases in a wide range of advanced multimedia applications.
Kim C (2013) Object-based spatial audio: concept, advantages and challenges, In: Kondoz A, Dagiuklas T (eds.), 3D Future Internet Media Springer
This book describes recent innovations in 3D media and technologies, with coverage of 3D media capturing, processing, encoding, and adaptation, networking aspects for 3D Media, and quality of user experience (QoE).
The Signal Separation Evaluation Campaign (SiSEC) is a large-scale regular event aimed at evaluating current progress in source separation through a systematic and reproducible comparison of the participants' algorithms, providing the source separation community with an invaluable glimpse of recent achievements and open challenges. This paper focuses on the music separation task from SiSEC 2018, which compares algorithms aimed at recovering instrument stems from a stereo mix. In this context, we conducted a subjective evaluation whereby 34 listeners picked which of six competing algorithms, with high objective performance scores, best separated the singing-voice stem from 13 professionally mixed songs. The subjective results reveal strong differences between the algorithms, and highlight the presence of song-dependent performance for state-of-the-art systems. Correlations between the subjective results and the scores of two popular performance metrics are also presented.