Sign and gesture analysis

British Sign Language (BSL) is the primary form of communication for around 50,000 people in the UK. While speech recognition is now advanced to the point of being commercially available, the task of sign language recognition remains unsolved.

C-Lab has advanced several areas of this field by using linguistics to bolster machine learning techniques and weakly supervised learning algorithms to help overcome the lack of data. These techniques are adapted both for appearance data from a standard camera and for depth images from devices such as the Microsoft Kinect.

While some problems associated with Sign Language Recognition (SLR) are specific to the language model, there are many which overlap with the field of gesture recognition. For example, different users will vary in many ways, depending not only on their physical form but also on their fluency. A new user will often be slower, more deliberate and make larger manual gestures as opposed to a practiced user who will move more instinctively.

In addition, sign languages are complex: each has many thousands of signs, each differing from the next by minor changes in hand motion, shape or position. They also have complex grammars bearing little or no resemblance to spoken languages; for example, adverbs may be shown by manipulating the action sign rather than compounding signs together, creating many variations on a theme.

While sign languages are predominantly thought of as containing a series of manual gestures, the languages tend to be richer than this, incorporating facial features and body postures to disambiguate signs and contexts. For this reason SLR includes research into facial feature analysis.

Linguistically based SLR

Sign languages can be described using a linguistic nomenclature such as HamNoSys. This system splits a sign into its component parts: motion, location, handshape and so on. By learning features which relate to these phonemes we have been able to work with much larger lexicons of signs. This work has also created a more robust description of a manual gesture, which improves user independence.
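As a rough illustration of the idea, the sketch below scores a sign by combining the outputs of independent sub-unit classifiers for handshape, location and motion, rather than learning one whole-sign classifier. The classifier outputs, the component labels and the SignDescription structure are hypothetical placeholders for illustration, not the system described above.

    # Minimal sketch of phoneme-based sign scoring (illustrative only).
    from dataclasses import dataclass
    from typing import Dict, List

    import numpy as np


    @dataclass
    class SignDescription:
        """A sign expressed as HamNoSys-style components rather than a whole-sign label."""
        name: str
        handshape: str   # e.g. "flat", "fist"
        location: str    # e.g. "chest", "chin"
        motion: str      # e.g. "circular", "upwards"


    def score_sign(frame_probs: Dict[str, Dict[str, np.ndarray]],
                   sign: SignDescription) -> float:
        """Combine independent sub-unit classifier outputs into one sign score.

        frame_probs maps each sub-unit type ("handshape", "location", "motion")
        to a dict of per-frame probability sequences for each label over a clip.
        """
        scores = []
        for subunit, label in [("handshape", sign.handshape),
                               ("location", sign.location),
                               ("motion", sign.motion)]:
            # Average per-frame confidence that this component is present.
            scores.append(float(np.mean(frame_probs[subunit][label])))
        # Treat the components as independent evidence for the sign.
        return float(np.prod(scores))


    def classify(frame_probs: Dict[str, Dict[str, np.ndarray]],
                 lexicon: List[SignDescription]) -> SignDescription:
        """Pick the lexicon entry whose components best match the observed clip."""
        return max(lexicon, key=lambda s: score_sign(frame_probs, s))

Because new signs are described by reusing the same small set of sub-unit labels, the lexicon can grow without retraining a classifier per sign, which is the scaling benefit described above.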

Interactive sign dictionary

Combining this linguistic nomenclature research with skeleton data from depth images has allowed us to create a user-independent, real-time sign dictionary, shown in this video.
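As a hedged sketch of how such a lookup could work (this is not the published system), the example below normalises Kinect-style joint trajectories and ranks dictionary entries by a simple dynamic time warping distance. The joint layout, the feature choice and the function names are assumptions made for illustration.

    # Illustrative skeleton-based sign lookup: normalise joint trajectories,
    # then rank dictionary entries by dynamic time warping (DTW) distance.
    import numpy as np


    def normalise(skeleton_seq: np.ndarray) -> np.ndarray:
        """Make joints relative to the torso and scale by shoulder width.

        skeleton_seq: array of shape (frames, joints, 3); joint 0 is assumed to
        be the torso and joints 1/2 the shoulders for this example.
        """
        centred = skeleton_seq - skeleton_seq[:, 0:1, :]
        scale = np.linalg.norm(centred[:, 1, :] - centred[:, 2, :], axis=-1).mean()
        return centred / max(scale, 1e-6)


    def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
        """Plain dynamic time warping over per-frame joint configurations."""
        na, nb = len(a), len(b)
        cost = np.full((na + 1, nb + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, na + 1):
            for j in range(1, nb + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return float(cost[na, nb])


    def look_up(query: np.ndarray, dictionary: dict) -> list:
        """Return dictionary glosses ranked by similarity to the query clip."""
        q = normalise(query).reshape(len(query), -1)
        ranked = sorted(
            dictionary.items(),
            key=lambda kv: dtw_distance(q, normalise(kv[1]).reshape(len(kv[1]), -1)))
        return [gloss for gloss, _ in ranked]

Normalising against body position and size is one simple way to get the user independence mentioned above, since the lookup then depends on relative joint motion rather than on the signer's build.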

Weakly supervised learning

One of the main problems facing SLR researchers is the lack of annotated data of natural signing. In C-Lab we have used temporal mining techniques to align televised signing footage with accompanying subtitles.

The occurrence of a word in the subtitles does not guarantee that the corresponding sign will occur, nor does it offer a concrete idea of when it will occur. This led to an iterative solution incorporating contextual negatives, allowing us to find the desired sign while avoiding frequently co-occurring signs.
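A much-simplified sketch of that mining idea follows: subtitle hits give "positive" sequences that probably contain the sign somewhere, contextual negatives come from frequently co-occurring words, and the best-recurring window is selected and refined iteratively. The feature representation, window length and distance threshold are placeholder assumptions rather than the values used in the actual work.

    # Toy temporal mining with contextual negatives. Each sequence is an array
    # of per-frame feature vectors, shape (frames, feat_dim).
    import numpy as np


    def window_score(window, positives, negatives, threshold=1.0):
        """Count near-matches of a candidate window across the weakly labelled sets."""
        def hits(seqs):
            count = 0
            for seq in seqs:
                dists = [np.linalg.norm(window - seq[t:t + len(window)])
                         for t in range(len(seq) - len(window) + 1)]
                if dists and min(dists) < threshold:
                    count += 1
            return count
        # Reward recurrence in the positive footage, penalise contextual negatives.
        return hits(positives) - hits(negatives)


    def mine_sign(positives, negatives, window_len=20, iters=3):
        """Pick the best-scoring window, prune positives that lack it, repeat."""
        kept = list(positives)
        best = None
        for _ in range(iters):
            candidates = [seq[t:t + window_len]
                          for seq in kept
                          for t in range(0, len(seq) - window_len + 1,
                                         max(1, window_len // 2))]
            if not candidates:
                break
            best = max(candidates,
                       key=lambda w: window_score(w, kept, negatives))
            # Keep only positive sequences that actually contain a near-match,
            # tightening the weak labels for the next pass.
            kept = [seq for seq in kept if window_score(best, [seq], []) > 0]
        return best

Scoring candidates against the contextual negatives is what stops the miner latching onto signs that merely co-occur with the subtitle word, which is the failure mode described above.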

Mined sign language

Mined examples of the signs for army/soldier and obese.

