Computers learn to spot unspoken meaning in human conversations
Thursday 22 October 2009
Researchers at the University of Surrey have created an automatic system to spot non-verbal social signals in natural conversation. This allows computers to better understand meaning in speech, which enables more intuitive computer interfaces. Social cues such as agreement, understanding, thinking and questioning are detected in continuous video. The findings were presented at the IEEE International Workshop on Human-Computer Interaction on 4 October in Kyoto, Japan.
The research is being led by Tim Sheerman-Chase, Dr Eng-Jon Ong and Dr Richard Bowden within the Centre for Vision, Speech and Signal Processing at the University. The project originated from an EPSRC study into lip reading, which identified the need to provide more than the literal words for useful understanding. Humans unconsciously use body gestures, emotions and gaze direction to understand the meaning of spoken language. The automatic recognition of communication signals provides a valuable tool for computer interfaces and the study of social situations.
Human conversation was recorded with minimal intervention from the experimenter. Interesting clips from these conversations were rated by 21 annotators in a web browser. This provided clear examples of ‘thinking’ and ‘not thinking’, along with positive and negative examples of the other non-verbal signals. A computer then learned which parts of the face could be used to identify each social signal in video.
Tim Sheerman-Chase comments: “This is a new direction in emotion recognition. Most previous work focused on actors or artificial social situations. The ability for computers to understand meaning in natural conversation is key to being able to use our innate communication skills to use computers.”
“Although the accuracy of the system is far from perfect, it is comparable to human performance for some types of social signals. The complexity of everyday conversations makes even humans disagree on what is happening.”
Recognition of communication signals has a range of uses, including making computer game characters interact in a more natural fashion, determining user experiences in real or virtual environments, and safety-critical applications. Future work will involve studying other social situations and cultural differences.

