Dr Daqi Liu
Academic and research departments: Centre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences.
With the increasing popularity of social media and smart devices, the face, as one of the key biometrics, has become vital for person identification. Among face recognition algorithms, video-based methods can exploit both temporal and spatial information, as humans do, to achieve better classification performance. However, they cannot identify individuals when key facial areas, such as the eyes or nose, are disguised by heavy makeup or rubber/digital masks. To this end, we propose a novel deep spiking neural network architecture in this paper. It takes dynamic facial movements, that is, the facial muscle changes induced by speaking or other activities, as its sole input. An event-driven continuous spike-timing-dependent plasticity learning rule with adaptive thresholding is applied to train the synaptic weights. Experiments on our proposed video-based disguise face database (MakeFace DB) demonstrate that the proposed learning method performs very well, achieving correct classification rates of 95% to 100% under various realistic experimental scenarios.
Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these units. Essentially, the former is a visual perception task, while the latter corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved thanks to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual perception and visual context reasoning, is still at an early stage. It is the core task of many computer vision applications, such as object detection, visual semantic segmentation, visual relationship detection and scene graph generation. Because it helps to enhance the accuracy and consistency of the resulting interpretation, visual context reasoning is often combined with visual perception in current deep end-to-end visual semantic information pursuit methods. Surprisingly, a comprehensive review of this exciting area is still lacking. In this survey, we present a unified theoretical paradigm for all these methods, followed by an overview of the major developments and the future trends in each potential direction. The common benchmark datasets, the evaluation metrics and comparisons of the corresponding methods are also introduced.