Song X, Feng Z, Hu G (2017) Half-Face Dictionary Integration for Representation-Based Classification, IEEE Transactions on Cybernetics 47 (1) pp. 142-152 IEEE
This paper presents a half-face dictionary integration (HFDI) algorithm for representation-based classification. The proposed HFDI algorithm measures residuals between an input signal and the reconstructed one, using both the original and the synthesized dual-column (row) half-face training samples. More specifically, we first generate a set of virtual half-face samples for the purpose of training data augmentation. The aim is to obtain high-fidelity collaborative representation of a test sample. In this half-face integrated dictionary, each original training vector is replaced by an integrated dual-column (row) half-face matrix. Second, to reduce the redundancy between the original dictionary and the extended half-face dictionary, we propose an elimination strategy to gain the most robust training atoms. The last contribution of the proposed HFDI method is the use of a competitive fusion method weighting the reconstruction residuals from different dictionaries for robust face classification. Experimental results obtained from the Facial Recognition Technology, Aleix and Robert, Georgia Tech, ORL, and Carnegie Mellon University-pose, illumination and expression data sets demonstrate the effectiveness of the proposed method, especially in the case of the small sample size problem.
Kittler J, Huber P, Feng Z, Hu G, Christmas W (2016) 3D Morphable Face Models and Their Applications, Lecture Notes in Computer Science (LNCS) vol.9756: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, July 13-15, 2016, Proceedings 9756 pp. 185-206 Springer
3D Morphable Face Models (3DMM) have been used in face recognition for some time now. They can be applied in their own right as a basis for 3D face recognition and analysis involving 3D face data. However their prevalent use over the last decade has been as a versatile tool in 2D face recognition to normalise pose, illumination and expression of 2D face images. A 3DMM has the generative capacity to augment the training and test databases for various 2D face processing related tasks. It can be used to expand the gallery set for pose-invariant face matching. For any 2D face image it can furnish complementary information, in terms of its 3D face shape and texture. It can also aid multiple frame fusion by providing the means of registering a set of 2D images. A key enabling technology for this versatility is 3D face model to 2D face image fitting. In this paper recent developments in 3D face modelling and model fitting will be overviewed, and their merits in the context of diverse applications illustrated on several examples, including pose and illumination invariant face recognition, and 3D face reconstruction from video.
Shao C, Song X, Feng Z, Wu X-J, Zheng Y (2017) Dynamic dictionary optimization for sparse-representation-based face classification using local difference images, Information Sciences
In this study, we present a new sparse-representation-based face-classification algorithm that exploits dynamic dictionary optimization on an extended dictionary using synthesized faces. More specifically, given a dictionary consisting of face examples, we first augment the dictionary with a set of virtual faces generated by calculating the image difference of a pair of faces. This results in an extended dictionary with hybrid training samples, which enhances the capacity of the dictionary to represent new samples. Second, to reduce the redundancy of the extended dictionary and improve the classification accuracy, we use a dictionary-optimization method. We truncate the extended dictionary with a more compact structure by discarding the original samples with small contributions to represent a test sample. Finally, we perform sparse-representation-based face classification using the optimized dictionary. Experimental results obtained using the AR and FERRET face datasets demonstrate the superiority of the proposed method in terms of accuracy, especially for small-sample-size problems.
Feng Z, Song X, Yang X, Wu X, Yang J (2016) Towards multi-scale fuzzy sparse discriminant analysis using local third-order tensor model of face images, Neurocomputing 185 pp. 53-63 Elsevier
Traditional discriminant analysis (DA) methods are usually not amenable to being studied only with a few or even single facial image per subject. The fundamental reason lies in the fact that the traditional DA approaches cannot fully reflect the variations of a query sample with illumination, occlusion and pose variations, especially in the case of small sample size. In this paper, we develop a multi-scale fuzzy sparse discriminant analysis using a local third-order tensor model to perform robust face classification. More specifically, we firstly introduced a local third-order tensor model of face images to exploit a set of multi-scale characteristics of the Ridgelet transform. Secondly, a set of Ridgelet transformed coefficients with respect to each block from a face image are respectively generated. We then merge all these coefficients to form a new representative vector for the image. Lastly, we evaluate the sparse similarity grade between each training sample and class by constructing a sparse similarity metric, and redesign the traditional discriminant criterion that contains considerable fuzzy sparse similarity grades to perform robust classification. Experimental results conducted on a set of well-known face databases demonstrate the merits of the proposed method, especially in the case of insufficient training samples.
Beveridge JR, Zhang H, Draper BA, Flynn PJ, Feng Z, Huber P, Kittler J, Huang Z, Li S, Li Y, Kan M, Wang R, Shan S, Chen X, Li H, Hua G, Struc V, Krizaj J, Ding C, Tao D, Phillips PJ (2015) Report on the FG 2015 Video Person Recognition Evaluation, 2015 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), VOL. 2 IEEE
We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate facial landmarks. Then, an online dynamic model selection method is used to choose appropriate domain-specific CSRs for further landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant mechanism, using fuzzy set sample weighting, for attentioncontrolled domain-specific model training. Moreover, we advocate data augmentation with a simple but effective 2D profile face generator, and context-aware feature extraction for better facial feature representation. Experimental results obtained on challenging datasets demonstrate the merits of our DAC-CSR over the state-of-the-art methods.
In practical applications of pattern recognition and computer vision, the performance of many approaches can be improved by using multiple models. In this paper, we develop a common theoretical framework for multiple model fusion at the feature level using multilinear subspace analysis (also known as tensor algebra). One disadvantage of the multilinear approach is that it is hard to obtain enough training observations for tensor decomposition algorithms. To overcome this difficulty, we adopted the M2SA algorithm to reconstruct the missing entries of the incomplete training tensor. Furthermore, we apply the proposed framework to the problem of face image analysis using Active Appearance Model (AAM) to validate its performance. Evaluations of AAM using the proposed framework are conducted on Multi-PIE face database with promising results. © Springer-Verlag 2013.
Feng Z-H, Huber P, Kittler J, Christmas W, Wu X-J (2015) Random Cascaded-Regression Copse for Robust Facial Landmark Detection, IEEE SIGNAL PROCESSING LETTERS 22 (1) pp. 76-80 IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Feng ZH, Kittler J, Christmas W, Wu XJ, Pfeiffer S (2012) Automatic face annotation by multilinear AAM with Missing Values, Proceedings - International Conference on Pattern Recognition pp. 2586-2589
It has been shown that multilinear subspace analysis is a powerful tool to overcome difficulties posed by viewpoint, illumination and expression variations in Active Appearance Model(AAM). However, the Higher Order Singular Value Decomposition (HOSVD) in multilinear analysis requires training samples to build the training tensor, which include face images under all different variations. It is hard to obtain such a complete training tensor in practical applications. In this paper, we propose a multilinear AAM which can be generated from an incomplete training tensor using Multilinear Subspace Analysis with Missing Values (M2SA). Also, the 2D appearance is used for training appearance tensor directly to reduce the memory requirements. Experimental results on the Multi-PIE face database show the efficiency of the proposed method. © 2012 ICPR Org Committee.
3D face reconstruction of shape and skin texture from a single 2D image can be performed using a 3D Morphable Model (3DMM) in an analysis-by-synthesis approach. However, performing this reconstruction (fitting) efficiently and accurately in a general imaging scenario is a challenge. Such a scenario would involve a perspective camera to describe the geometric projection from 3D to 2D, and the Phong model to characterise illumination. Under these imaging assumptions the reconstruction problem is nonlinear and, consequently, computationally very demanding. In this work, we present an efficient stepwise 3DMM-to-2D image-fitting procedure, which sequentially optimises the pose, shape, light direction, light strength and skin texture parameters in separate steps. By linearising each step of the fitting process we derive closed-form solutions for the recovery of the respective parameters, leading to efficient fitting. The proposed optimisation process involves all the pixels of the input image, rather than randomly selected subsets, which enhances the accuracy of the fitting. It is referred to as Efficient Stepwise Optimisation (ESO). The proposed fitting strategy is evaluated using reconstruction error as a performance measure. In addition, we demonstrate its merits in the context of a 3D-assisted 2D face recognition system which detects landmarks automatically and extracts both holistic and local features using a 3DMM. This contrasts with most other methods which only report results that use manual face landmarking to initialise the fitting. Our method is tested on the public CMU-PIE and Multi-PIE face databases, as well as one internal database. The experimental results show that the face reconstruction using ESO is significantly faster, and its accuracy is at least as good as that achieved by the existing 3DMM fitting algorithms. A face recognition system integrating ESO to provide a pose and illumination invariant solution compares favourably with other state-of-the-art methods. In particular, it outperforms deep learning methods when tested on the Multi-PIE database.
In this study, we present a new sparse-representation-based face-classification algorithm that exploits dynamic dictionary optimization on an extended dictionary using synthesized faces. More specifically, given a dictionary consisting of face examples, we first augment the dictionary with a set of virtual faces generated by calculating the image difference of a pair of faces. This results in an extended dictionary with hybrid training samples, which enhances the capacity of the dictionary to represent new samples. Second, to reduce the redundancy of the extended dictionary and improve the classification accuracy, we use a dictionary-optimization method. We truncate the extended dictionary with a more compact structure by discarding the original samples with small contributions to represent a test sample. Finally, we perform sparserepresentation- based face classification using the optimized dictionary. Experimental results obtained using the AR and FERRET face datasets demonstrate the superiority of the proposed method in terms of accuracy, especially for small-sample-size problems.
This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal ?nearest neighbors? for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B), and USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.
3D Morphable Face Models (3DMM) have been used in pattern recognition for some time now. They have been applied as a basis for 3D face recognition, as well as in an assistive role for 2D face recognition to perform geometric and photometric normalisation of the input image, or in 2D face recognition system training. The statistical distribution underlying 3DMM is Gaussian. However, the single-Gaussian model seems at odds with reality when we consider different cohorts of data, e.g. Black and Chinese faces. Their means are clearly different. This paper introduces the Gaussian Mixture 3DMM (GM-3DMM) which models the global population as a mixture of Gaussian subpopulations, each with its own mean. The proposed GM-3DMM extends the traditional 3DMM naturally, by adopting a shared covariance structure to mitigate small sample estimation problems associated with data in high dimensional spaces. We construct a GM-3DMM, the training of which involves a multiple cohort dataset, SURREY-JNU, comprising 942 3D face scans of people with mixed backgrounds. Experiments in fitting the GM-3DMM to 2D face images to facilitate their geometric and photometric normalisation for pose and illumination invariant face recognition demonstrate the merits of the proposed mixture of Gaussians 3D face model.
This paper investigates the evaluation of dense
3D face reconstruction from a single 2D image in the wild.
To this end, we organise a competition that provides a new
benchmark dataset that contains 2000 2D facial images of
135 subjects as well as their 3D ground truth face scans. In
contrast to previous competitions or challenges, the aim of this
new benchmark dataset is to evaluate the accuracy of a 3D
dense face reconstruction algorithm using real, accurate and
high-resolution 3D ground truth face scans. In addition to the
dataset, we provide a standard protocol as well as a Python
script for the evaluation. Last, we report the results obtained
by three state-of-the-art 3D face reconstruction systems on the
new benchmark dataset. The competition is organised along
with the 2018 13th IEEE Conference on Automatic Face &
In recent years, facial landmark detection ? also known as face alignment or facial landmark localisation ? has become a very active area, due to its importance to a variety of image and video-based face analysis systems, such as face recognition, emotion analysis, human-computer interaction and 3D face reconstruction. This article looks at the challenges and latest technology advances in facial landmarks.
We present a new loss function, namely Wing loss, for robust
facial landmark localisation with Convolutional Neural
Networks (CNNs). We first compare and analyse different
loss functions including L2, L1 and smooth L1. The
analysis of these loss functions suggests that, for the training
of a CNN-based localisation model, more attention
should be paid to small and medium range errors. To this
end, we design a piece-wise loss function. The new loss
amplifies the impact of errors from the interval (-w, w) by
switching from L1 loss to a modified logarithm function.
To address the problem of under-representation of samples
with large out-of-plane head rotations in the training
set, we propose a simple but effective boosting strategy, referred
to as pose-based data balancing. In particular, we
deal with the data imbalance problem by duplicating the
minority training samples and perturbing them by injecting
random image rotation, bounding box translation and
other data augmentation approaches. Last, the proposed
approach is extended to create a two-stage framework for
robust facial landmark localisation. The experimental results
obtained on AFLW and 300W demonstrate the merits
of the Wing loss function, and prove the superiority of the
proposed method over the state-of-the-art approaches.
The paper presents a dictionary integration algorithm
using 3D morphable face models (3DMM) for poseinvariant
collaborative-representation-based face classification.
To this end, we first fit a 3DMM to the 2D face images of
a dictionary to reconstruct the 3D shape and texture of each
image. The 3D faces are used to render a number of virtual
2D face images with arbitrary pose variations to augment the
training data, by merging the original and rendered virtual
samples to create an extended dictionary. Second, to reduce
the information redundancy of the extended dictionary and
improve the sparsity of reconstruction coefficient vectors using
collaborative-representation-based classification (CRC), we
exploit an on-line class elimination scheme to optimise the
extended dictionary by identifying the training samples of the
most representative classes for a given query. The final goal is
to perform pose-invariant face classification using the proposed
dictionary integration method and the on-line pruning strategy
under the CRC framework. Experimental results obtained for
a set of well-known face datasets demonstrate the merits of the
proposed method, especially its robustness to pose variations.
The problem of re-identification of people in a crowd com-
monly arises in real application scenarios, yet it has received less atten-
tion than it deserves. To facilitate research focusing on this problem, we
have embarked on constructing a new person re-identification dataset
with many instances of crowded indoor and outdoor scenes. This paper proposes a two-stage robust method for pedestrian detection in a
complex crowded background to provide bounding box annotations. The
first stage is to generate pedestrian proposals using Faster R-CNN and
locate each pedestrian using Non-maximum Suppression (NMS). Candidates in dense proposal regions are merged to identify crowd patches.
We then apply a bottom-up human pose estimation method to detect
individual pedestrians in the crowd patches. The locations of all subjects are achieved based on the bounding boxes from the two stages. The
identity of the detected subjects throughout each video is then automatically annotated using multiple features and spatial-temporal clues. The
experimental results on a crowded pedestrians dataset demonstrate the
effectiveness and efficiency of the proposed method.
Huang Zengxi, Feng Zhenhua, Kittler Josef, Liu Yiguang (2018) Improve the Spoofing Resistance of Multimodal Verification with Representation-Based Measures, In: Lai Jian-Huang, Liu Cheng-Lin, Chen Xilin, Zhou Jie, Tan Tieniu, Zheng Nanning, Zha Hongbin (eds.), Pattern Recognition and Computer Vision. PRCV 2018. 11258 pp. 388-399
Recently, the security of multimodal verification has become a grow-ing concern since many fusion systems have been known to be easily deceived by partial spoof attacks, i.e. only a subset of modalities is spoofed. In this paper, we verify such a vulnerability and propose to use two representation-based met-rics to close this gap. Firstly, we use the collaborative representation fidelity with non-target subjects to measure the affinity of a query sample to the claimed client. We further consider sparse coding as a competing comparison among the client and the non-target subjects, and hence explore two sparsity-based measures for recognition. Last, we select the representation-based measure, and assemble its score and the affinity score of each modality to train a support vector machine classifier. Our experimental results on a chimeric multimodal database with face and ear traits demonstrate that in both regular verification and partial spoof at-tacks, the proposed method significant
Traditional collaborative representation based classification (CRC) method usually faces the challenge of data uncertainty hence results in poor performance, especially in the presence of appearance variations in pose, expression and illumination. To overcome this issue, this paper presents a CRC-based face classification method by jointly using block weighted LBP and analysis dictionary learning. To this end, we first design a block weighted LBP histogram algorithm to form a set of local histogram-based feature vectors instead of using raw images. By this means we are able to effectively decrease data redundancy and uncertainty derived from image noises and appearance variations. Second, we adopt an analysis dictionary learning model as the projection transform to construct an analysis subspace, in which a new sample is characterized with the improved sparsity of its reconstruction coefficient vector. The crucial role of the analysis dictionary learning method in CRC is revealed by its capacity of the collaborative representation in an analytic coefficient space. Extensive experimental results conducted on a set of well-known face databases demonstrate the merits of the proposed method.
Effective data augmentation is crucial for facial landmark localisation with Convolutional Neural Networks (CNNs). In this letter, we investigate different data augmentation techniques that can be used to generate sufficient data for training CNN-based facial landmark localisation systems. To the best of our knowledge, this is the first study that provides a systematic analysis of different data augmentation techniques in the area. In addition, an online Hard Augmented Example Mining (HAEM) strategy is advocated for further performance boosting. We examine the effectiveness of those techniques using a regression-based CNN architecture. The experimental results obtained on the AFLW and COFW datasets demonstrate the importance of data augmentation and the effectiveness of HAEM. The performance achieved using these techniques is superior to the state-of-the-art algorithms.
Extended sparse representation-based classifcation (ESRC) has shown interesting results on the problem of undersampled face recognition by generating an auxiliary intraclass variant dictionary for the representation of possible appearance variations. However, the method has high computational complexity due to the l1-minimization problem. To address this issue, this paper proposes
two strategies to speed up SRC using quadratic optimisation in downsized coefient solution subspace. The frst one, namely Fast SRC using Quadratic
Optimisation (FSRC-QO), applies PCA and LDA hybrid constrained optimisation method to achieve compressed linear representations of test samples. By design, more accurate and discriminative reconstruction of a test sample can be achieved for face classifcation, using the downsized coefficient space. Secondly, to explore the positive impact of our proposed method on deep-learning-based face classifcation, we enhance FSRC-QO using CNN-based features (FSRC-QO-CNN), in which we replace the original input image using robust CNN features in our FSRC-QO framework. Experimental results conducted on a set of well known face datasets, including AR, FERET, LFW and FRGC, demonstrate the merits of the proposed methods, especially in computational efficiency.
With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filers. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present
a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture
information of 2D faces. For each type of face information, namely shape and texture, we construct a unified
tensor model capturing all relevant appearance variations. This contrasts with the variation-specific models
of the classical tensor AAM. To achieve the unification across pose variations, a strategy for dealing with
self-occluded faces is proposed to obtain consistent shape and texture representations of pose-varied faces.
In addition, our UT-AAM is capable of constructing the model from an incomplete training dataset, using
tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting.
With these advancements, the utility of UT-AAM in practice is considerably enhanced. As an example, we
demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise
a large number of virtual samples. Experimental results obtained on a number of well-known face datasets
demonstrate the merits of the proposed approach.
We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions, thus to pinpoint the structural relevance of multi-channel features to the filtering system. In contrast to the widely used spatial regularisation or feature selection methods, to the best of our knowledge, this is the first time that channel selection has been advocated for DCF-based tracking. We demonstrate that our GFS-DCF method is able to significantly improve the performance of a DCF tracker equipped with deep neural network features. In addition, our GFS-DCF enables joint feature selection and filter learning, achieving enhanced discrimination and interpretability of the learned filters.
To further improve the performance, we adaptively integrate historical information by constraining filters to be smooth across temporal frames, using an efficient low-rank approximation. By design, specific temporal-spatial-channel configurations are dynamically learned in the tracking process, highlighting the relevant features, and alleviating the performance degrading impact of less discriminative representations and reducing information redundancy. The experimental results obtained on OTB2013, OTB2015, VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its superiority over the state-of-the-art trackers. The code is publicly available at https://github.com/XU-TIANYANG/GFS-DCF.