Dr Hanne Stenzel

Postgraduate Research Student
+44 (0)1483 682257
09 BC 03


Jackson PJB, Stenzel HC, Francombe J (2017) Modeling horizontal audio-visual coherence with the psychometric function, AES Convention 142
Studies on perceived audio-visual spatial coherence in the literature have commonly employed continuous judgment scales. This method requires listeners to detect and quantify their perception of a given feature, which is a difficult task, particularly for untrained listeners. An alternative method is the quantification of a percept by conducting a simple forced-choice test with subsequent modeling of the psychometric function. An experiment to validate this alternative method for the perception of azimuthal audio-visual spatial coherence was performed. Furthermore, information on participant training and localization ability was gathered. The results are consistent with previous research and show that the proposed methodology is suitable for this kind of test. The main differences between participants result from the presence or absence of musical training.
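The psychometric-function modeling described above can be illustrated with a short sketch: a cumulative Gaussian is fitted to the proportion of "incoherent" responses at each audio-visual offset angle, yielding a threshold (50% point) and slope. The data values and parameter names here are illustrative assumptions, not figures from the paper.

```python
# Sketch: fitting a psychometric function to forced-choice spatial
# coherence judgments. Data are hypothetical, for illustration only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(offset_deg, threshold, slope):
    """Cumulative-Gaussian model of the probability that a listener
    reports an audio-visual offset as 'incoherent'."""
    return norm.cdf(offset_deg, loc=threshold, scale=slope)

# Hypothetical proportions of 'incoherent' responses per offset angle.
offsets = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
p_incoherent = np.array([0.02, 0.08, 0.35, 0.70, 0.90, 0.97, 0.99])

params, _ = curve_fit(psychometric, offsets, p_incoherent, p0=[15.0, 5.0])
threshold, slope = params
print(f"50% threshold ~ {threshold:.1f} deg, slope ~ {slope:.1f} deg")
```

The fitted threshold summarizes each participant's acceptable offset in a single number, which is what allows group comparisons such as the musical-training effect reported above.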
Stenzel HC, Francombe J, Jackson PJB (2019) Limits of Perceived Audio-Visual Spatial Coherence as Defined by Reaction Time Measurements, Frontiers in Neuroscience 13(451)
The ventriloquism effect describes the phenomenon of audio and visual signals with common features, such as a voice and a talking face, merging perceptually into one percept even if they are spatially misaligned. The boundaries of the fusion of spatially misaligned stimuli are of interest for the design of multimedia products to ensure a perceptually satisfactory product. They have mainly been studied using continuous judgment scales and forced-choice measurement methods, and the results vary greatly between studies. The current experiment aims to evaluate audio-visual fusion using reaction time (RT) measurements as an indirect method of measurement to overcome these large variances. A two-alternative forced-choice (2AFC) word recognition test was designed and tested with noise and multi-talker speech background distractors. Visual signals were presented centrally and audio signals were presented between 0° and 31° audio-visual offset in azimuth. RT data were analyzed separately for the underlying Simon effect and attentional effects. In the case of the attentional effects, three models were identified but no single model could explain the observed RTs for all participants, so data were grouped and analyzed accordingly. The results show that significant differences in RTs are measured from 5° to 10° onwards for the Simon effect. The attentional effect varied at the same audio-visual offset for two out of the three defined participant groups. In contrast with the prior research, these results suggest that, even for speech signals, small audio-visual offsets influence spatial integration subconsciously.
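The RT analysis described above rests on a simple contrast: responses should slow when the side of the audio offset conflicts with the response side (the Simon effect). A minimal sketch of that contrast, using simulated reaction times rather than the study's data, might look like this:

```python
# Sketch of a Simon-effect contrast on reaction times: compare trials
# where the audio offset side matches the response side (congruent)
# against trials where it conflicts (incongruent). All RTs here are
# simulated; means and spreads are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical RTs in milliseconds; incongruent trials drawn slower.
rt_congruent = rng.normal(550, 60, 200)
rt_incongruent = rng.normal(575, 60, 200)

t_stat, p_value = ttest_ind(rt_incongruent, rt_congruent)
simon_cost = rt_incongruent.mean() - rt_congruent.mean()
print(f"Simon-effect cost ~ {simon_cost:.0f} ms (p = {p_value:.3g})")
```

Because the effect is read off response latencies rather than explicit coherence judgments, it can reveal the subconscious influence of small offsets that direct judgment scales miss.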
In media reproduction, there are many situations in which audio and visual signals coming from the same object are presented with a spatial offset. When the offset is small enough, the spatial conflict is usually resolved by the brain, which merges the different information into one unified object; this is the so-called ventriloquism effect. With respect to evolving immersive technologies such as virtual and augmented reality, it is important to define the maximally acceptable offset angle to create a convincing environment. However, in the literature on the ventriloquism effect, values for the maximally acceptable offset angle vary greatly. Therefore, a series of experiments was devised to investigate the influencing factors leading to this great variation. First, the influence of participants' background and sensory training in hearing and vision was assessed. In a second step, the influence of stimulus properties such as their semantic category was examined. In both cases, a forced-choice yes/no experiment was conducted evaluating participants' thresholds in perceived spatial coherence. The third set of experiments strove to evaluate ventriloquism indirectly using reaction time measurements to circumvent the observed influencing factors. The results show that auditory sensory training greatly influences the measured offset angles, with a nearly doubled acceptable offset angle for untrained participants (19°) compared to musically trained ones (10°). The measured offset further depends on signal properties linked to localisation precision, with variations in the range of ±2°. Both findings can be explained within the current model of bimodal spatial integration. Compared to these results, the reaction time measurements reveal that offsets as small as 5° and less can influence human bimodal integration independent of sensory training.
The divergent results are discussed in the light of the brain's two-stream processing of semantic and spatial information, to derive recommendations for media reproduction that take into account the different use cases of various devices and reproduction methods.