This paper presents a half-face dictionary integration (HFDI) algorithm for representation-based classification. The proposed HFDI algorithm measures residuals between an input signal and its reconstruction, using both the original and the synthesized dual-column (row) half-face training samples. More specifically, we first generate a set of virtual half-face samples for training data augmentation, with the aim of obtaining a high-fidelity collaborative representation of a test sample. In this half-face integrated dictionary, each original training vector is replaced by an integrated dual-column (row) half-face matrix. Second, to reduce the redundancy between the original dictionary and the extended half-face dictionary, we propose an elimination strategy that retains the most robust training atoms. The last contribution of the proposed HFDI method is a competitive fusion method that weights the reconstruction residuals from different dictionaries for robust face classification. Experimental results obtained on the Facial Recognition Technology (FERET), Aleix Martinez and Robert Benavente (AR), Georgia Tech, ORL, and Carnegie Mellon University Pose, Illumination and Expression (CMU-PIE) data sets demonstrate the effectiveness of the proposed method, especially in the case of the small sample size problem.
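The half-face synthesis step described above can be sketched as follows. This is an illustrative reading of the dual-column (row) half-face construction, not the paper's exact procedure; the function and variable names are invented:

```python
import numpy as np

def half_face_samples(face):
    """Synthesize two virtual faces from one training image by
    mirroring each vertical half (an illustrative sketch of the
    half-face idea; the paper's exact construction may differ)."""
    h, w = face.shape
    half = w // 2
    left = face[:, :half]
    right = face[:, w - half:]
    # Mirror each half horizontally to build a full virtual face.
    left_face = np.hstack([left, left[:, ::-1]])
    right_face = np.hstack([right[:, ::-1], right])
    return left_face, right_face

face = np.arange(16, dtype=float).reshape(4, 4)
lf, rf = half_face_samples(face)
```

Each original training vector can then be paired with its two virtual half-face variants to form the integrated dictionary atoms.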
Kittler J, Huber P, Feng Z, Hu G, Christmas W (2016) 3D Morphable Face Models and Their Applications, Lecture Notes in Computer Science (LNCS) vol. 9756: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, July 13-15, 2016, Proceedings, pp. 185-206, Springer
3D Morphable Face Models (3DMM) have been used in face recognition for some time now. They can be applied in their own right as a basis for 3D face recognition and analysis involving 3D face data. However their prevalent use over the last decade has been as a versatile tool in 2D face recognition to normalise pose, illumination and expression of 2D face images. A 3DMM has the generative capacity to augment the training and test databases for various 2D face processing related tasks. It can be used to expand the gallery set for pose-invariant face matching. For any 2D face image it can furnish complementary information, in terms of its 3D face shape and texture. It can also aid multiple frame fusion by providing the means of registering a set of 2D images. A key enabling technology for this versatility is 3D face model to 2D face image fitting. In this paper recent developments in 3D face modelling and model fitting will be overviewed, and their merits in the context of diverse applications illustrated on several examples, including pose and illumination invariant face recognition, and 3D face reconstruction from video.
Shao C, Song X, Feng Z, Wu X-J, Zheng Y (2017) Dynamic dictionary optimization for sparse-representation-based face classification using local difference images, Information Sciences
In this study, we present a new sparse-representation-based face-classification algorithm that exploits dynamic dictionary optimization on an extended dictionary using synthesized faces. More specifically, given a dictionary consisting of face examples, we first augment the dictionary with a set of virtual faces generated by calculating the image difference of a pair of faces. This results in an extended dictionary with hybrid training samples, which enhances the capacity of the dictionary to represent new samples. Second, to reduce the redundancy of the extended dictionary and improve the classification accuracy, we use a dictionary-optimization method. We truncate the extended dictionary with a more compact structure by discarding the original samples with small contributions to represent a test sample. Finally, we perform sparse-representation-based face classification using the optimized dictionary. Experimental results obtained using the AR and FERRET face datasets demonstrate the superiority of the proposed method in terms of accuracy, especially for small-sample-size problems.
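The virtual-face generation by image differencing might look like the following sketch; the pairing scheme and intensity normalisation used by the authors may well differ:

```python
import numpy as np

def difference_faces(class_samples):
    """Generate virtual training faces as pairwise image differences
    within one class (illustrative sketch; offsets and normalisation
    in the paper may differ)."""
    virtual = []
    n = len(class_samples)
    for i in range(n):
        for j in range(i + 1, n):
            diff = class_samples[i] - class_samples[j]
            # Shift so the virtual face stays in a valid intensity range.
            virtual.append(diff - diff.min())
    return virtual

a = np.array([[10., 20.], [30., 40.]])
b = np.array([[12., 18.], [28., 44.]])
v = difference_faces([a, b])
```

The resulting difference images encode intra-class variation and are appended to the original dictionary as hybrid training samples.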
Traditional discriminant analysis (DA) methods usually cannot be applied when only a few, or even a single, facial image per subject is available. The fundamental reason is that traditional DA approaches cannot fully capture the variations of a query sample in illumination, occlusion and pose, especially in the case of small sample size. In this paper, we develop a multi-scale fuzzy sparse discriminant analysis method using a local third-order tensor model to perform robust face classification. More specifically, we first introduce a local third-order tensor model of face images to exploit a set of multi-scale characteristics of the Ridgelet transform. Second, a set of Ridgelet-transformed coefficients is generated for each block of a face image. We then merge all these coefficients to form a new representative vector for the image. Lastly, we evaluate the sparse similarity grade between each training sample and class by constructing a sparse similarity metric, and redesign the traditional discriminant criterion to incorporate these fuzzy sparse similarity grades for robust classification. Experimental results on a set of well-known face databases demonstrate the merits of the proposed method, especially in the case of insufficient training samples.
Beveridge JR, Zhang H, Draper BA, Flynn PJ, Feng Z, Huber P, Kittler J, Huang Z, Li S, Li Y, Kan M, Wang R, Shan S, Chen X, Li H, Hua G, Struc V, Krizaj J, Ding C, Tao D, Phillips PJ (2015) Report on the FG 2015 Video Person Recognition Evaluation, 2015 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), VOL. 2 IEEE
We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate facial landmarks. Then, an online dynamic model selection method is used to choose appropriate domain-specific CSRs for further landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant mechanism, using fuzzy set sample weighting, for attention-controlled domain-specific model training. Moreover, we advocate data augmentation with a simple but effective 2D profile face generator, and context-aware feature extraction for better facial feature representation. Experimental results obtained on challenging datasets demonstrate the merits of our DAC-CSR over the state-of-the-art methods.
In practical applications of pattern recognition and computer vision, the performance of many approaches can be improved by using multiple models. In this paper, we develop a common theoretical framework for multiple model fusion at the feature level using multilinear subspace analysis (also known as tensor algebra). One disadvantage of the multilinear approach is that it is hard to obtain enough training observations for tensor decomposition algorithms. To overcome this difficulty, we adopt the M2SA algorithm to reconstruct the missing entries of the incomplete training tensor. Furthermore, we apply the proposed framework to the problem of face image analysis using the Active Appearance Model (AAM) to validate its performance. Evaluations of AAM using the proposed framework are conducted on the Multi-PIE face database with promising results. © Springer-Verlag 2013.
Feng Z-H, Huber P, Kittler J, Christmas W, Wu X-J (2015) Random Cascaded-Regression Copse for Robust Facial Landmark Detection, IEEE SIGNAL PROCESSING LETTERS 22 (1) pp. 76-80 IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Feng ZH, Kittler J, Christmas W, Wu XJ, Pfeiffer S (2012) Automatic face annotation by multilinear AAM with Missing Values, Proceedings - International Conference on Pattern Recognition pp. 2586-2589
It has been shown that multilinear subspace analysis is a powerful tool to overcome difficulties posed by viewpoint, illumination and expression variations in the Active Appearance Model (AAM). However, the Higher Order Singular Value Decomposition (HOSVD) in multilinear analysis requires training samples to build the training tensor, which include face images under all different variations. It is hard to obtain such a complete training tensor in practical applications. In this paper, we propose a multilinear AAM which can be generated from an incomplete training tensor using Multilinear Subspace Analysis with Missing Values (M2SA). Also, the 2D appearance is used to train the appearance tensor directly, to reduce the memory requirements. Experimental results on the Multi-PIE face database show the efficiency of the proposed method. © 2012 ICPR Org Committee.
3D face reconstruction of shape and skin texture from a single 2D image can be performed using a 3D Morphable Model (3DMM) in an analysis-by-synthesis approach. However, performing this reconstruction (fitting) efficiently and accurately in a general imaging scenario is a challenge. Such a scenario would involve a perspective camera to describe the geometric projection from 3D to 2D, and the Phong model to characterise illumination. Under these imaging assumptions the reconstruction problem is nonlinear and, consequently, computationally very demanding. In this work, we present an efficient stepwise 3DMM-to-2D image-fitting procedure, which sequentially optimises the pose, shape, light direction, light strength and skin texture parameters in separate steps. By linearising each step of the fitting process we derive closed-form solutions for the recovery of the respective parameters, leading to efficient fitting. The proposed optimisation process involves all the pixels of the input image, rather than randomly selected subsets, which enhances the accuracy of the fitting. It is referred to as Efficient Stepwise Optimisation (ESO). The proposed fitting strategy is evaluated using reconstruction error as a performance measure. In addition, we demonstrate its merits in the context of a 3D-assisted 2D face recognition system which detects landmarks automatically and extracts both holistic and local features using a 3DMM. This contrasts with most other methods which only report results that use manual face landmarking to initialise the fitting. Our method is tested on the public CMU-PIE and Multi-PIE face databases, as well as one internal database. The experimental results show that the face reconstruction using ESO is significantly faster, and its accuracy is at least as good as that achieved by the existing 3DMM fitting algorithms. A face recognition system integrating ESO to provide a pose and illumination invariant solution compares favourably with other state-of-the-art methods. In particular, it outperforms deep learning methods when tested on the Multi-PIE database.
This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal "nearest neighbors" for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), the AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B.), and the USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.
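The DCT-domain cosine similarity used in the third step can be sketched as below. This is an illustrative single-vector version (the paper applies it to local training samples), with an explicitly constructed orthonormal DCT-II basis:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def dct_cosine_similarity(x, y):
    """Cosine similarity between two signals in the DCT domain
    (an illustrative sketch of the local DCT evaluation idea)."""
    d = dct_matrix(len(x))
    cx, cy = d @ x, d @ y
    return float(cx @ cy / (np.linalg.norm(cx) * np.linalg.norm(cy)))

s = dct_cosine_similarity(np.array([1., 2., 3., 4.]),
                          np.array([2., 4., 6., 8.]))
```

Because the DCT basis is orthonormal, proportional signals have similarity 1 in the transform domain, while the transform concentrates energy so that truncated (local) coefficient comparisons remain meaningful.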
3D Morphable Face Models (3DMM) have been used in pattern recognition for some time now. They have been applied as a basis for 3D face recognition, as well as in an assistive role for 2D face recognition to perform geometric and photometric normalisation of the input image, or in 2D face recognition system training. The statistical distribution underlying 3DMM is Gaussian. However, the single-Gaussian model seems at odds with reality when we consider different cohorts of data, e.g. Black and Chinese faces. Their means are clearly different. This paper introduces the Gaussian Mixture 3DMM (GM-3DMM) which models the global population as a mixture of Gaussian subpopulations, each with its own mean. The proposed GM-3DMM extends the traditional 3DMM naturally, by adopting a shared covariance structure to mitigate small sample estimation problems associated with data in high dimensional spaces. We construct a GM-3DMM, the training of which involves a multiple cohort dataset, SURREY-JNU, comprising 942 3D face scans of people with mixed backgrounds. Experiments in fitting the GM-3DMM to 2D face images to facilitate their geometric and photometric normalisation for pose and illumination invariant face recognition demonstrate the merits of the proposed mixture of Gaussians 3D face model.
This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset containing 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition (FG 2018).
In recent years, facial landmark detection, also known as face alignment or facial landmark localisation, has become a very active area, owing to its importance to a variety of image- and video-based face analysis systems, such as face recognition, emotion analysis, human-computer interaction and 3D face reconstruction. This article looks at the challenges and the latest technology advances in facial landmark detection.
We present a new loss function, namely Wing loss, for robust facial landmark localisation with Convolutional Neural Networks (CNNs). We first compare and analyse different loss functions including L2, L1 and smooth L1. The analysis of these loss functions suggests that, for the training of a CNN-based localisation model, more attention should be paid to small and medium range errors. To this end, we design a piece-wise loss function. The new loss amplifies the impact of errors from the interval (-w, w) by switching from L1 loss to a modified logarithm function. To address the problem of under-representation of samples with large out-of-plane head rotations in the training set, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation approaches. Last, the proposed approach is extended to create a two-stage framework for robust facial landmark localisation. The experimental results obtained on AFLW and 300W demonstrate the merits of the Wing loss function, and prove the superiority of the proposed method over the state-of-the-art approaches.
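The piece-wise loss described above has the standard Wing form: logarithmic inside (-w, w), L1 outside, with an offset that makes the two pieces meet. A minimal sketch (the values of w and epsilon are common defaults, not necessarily those tuned in the paper):

```python
import numpy as np

def wing_loss(errors, w=10.0, eps=2.0):
    """Wing loss: amplified (logarithmic) behaviour for small and
    medium errors in (-w, w), plain L1 beyond. The constant c makes
    the two pieces continuous at |x| = w."""
    x = np.abs(errors)
    c = w - w * np.log(1.0 + w / eps)
    return np.where(x < w, w * np.log(1.0 + x / eps), x - c)

losses = wing_loss(np.array([0.0, 1.0, 20.0]))
```

Relative to L1, the logarithmic branch increases the gradient contribution of small and medium errors during training, which is exactly the behaviour the loss analysis motivates.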
The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification. To this end, we first fit a 3DMM to the 2D face images of a dictionary to reconstruct the 3D shape and texture of each image. The 3D faces are used to render a number of virtual 2D face images with arbitrary pose variations to augment the training data, by merging the original and rendered virtual samples to create an extended dictionary. Second, to reduce the information redundancy of the extended dictionary and improve the sparsity of reconstruction coefficient vectors using collaborative-representation-based classification (CRC), we exploit an on-line class elimination scheme to optimise the extended dictionary by identifying the training samples of the most representative classes for a given query. The final goal is to perform pose-invariant face classification using the proposed dictionary integration method and the on-line pruning strategy under the CRC framework. Experimental results obtained for a set of well-known face datasets demonstrate the merits of the proposed method, especially its robustness to pose variations.
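The CRC framework underlying this and several other abstracts here admits a closed-form sketch: code the query over the whole dictionary with ridge regression, then classify by the smallest class-wise reconstruction residual. The toy dictionary below is invented for illustration:

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Collaborative representation based classification (CRC) sketch:
    solve the ridge-regularised coding alpha = (D^T D + lam I)^-1 D^T y
    over the full dictionary D (columns = training samples), then assign
    the class with the smallest class-wise reconstruction residual."""
    n = D.shape[1]
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = sorted(set(labels))
    residuals = []
    for c in classes:
        idx = [i for i, l in enumerate(labels) if l == c]
        residuals.append(np.linalg.norm(y - D[:, idx] @ alpha[idx]))
    return classes[int(np.argmin(residuals))]

# Two classes, two samples each; the query lies close to class 0.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
pred = crc_classify(D, [0, 0, 1, 1], np.array([1.0, 0.05]))
```

The on-line class elimination described in the abstract would shrink `D` to the most representative classes before this coding step.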
The problem of re-identification of people in a crowd commonly arises in real application scenarios, yet it has received less attention than it deserves. To facilitate research focusing on this problem, we have embarked on constructing a new person re-identification dataset with many instances of crowded indoor and outdoor scenes. This paper proposes a two-stage robust method for pedestrian detection in a complex crowded background to provide bounding box annotations. The first stage is to generate pedestrian proposals using Faster R-CNN and locate each pedestrian using Non-maximum Suppression (NMS). Candidates in dense proposal regions are merged to identify crowd patches. We then apply a bottom-up human pose estimation method to detect individual pedestrians in the crowd patches. The locations of all subjects are obtained from the bounding boxes produced by the two stages. The identity of the detected subjects throughout each video is then automatically annotated using multiple features and spatial-temporal clues. The experimental results on a crowded pedestrians dataset demonstrate the effectiveness and efficiency of the proposed method.
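The NMS step in the first stage follows the standard greedy formulation; a self-contained sketch (the boxes below are invented test data):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the top-scoring box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0., 0., 10., 10.],
                  [1., 1., 11., 11.],
                  [20., 20., 30., 30.]])
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
```

In the crowd-patch case described above, suppressed dense proposals are not discarded but merged to flag regions that need the pose-estimation stage.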
In recent years, discriminative correlation filter (DCF) based algorithms have significantly advanced the state of the art in visual object tracking. The key to the success of DCF is an efficient discriminative regression model trained with powerful multi-cue features, including both hand-crafted and deep neural network features. However, the tracking performance is hindered by their inability to respond adequately to abrupt target appearance variations. This issue is caused by the limited representation capability of fixed image features. In this work, we set out to rectify this shortcoming by proposing a complementary representation of the visual content. Specifically, we propose the use of a collaborative representation between successive frames to extract the dynamic appearance information from a target with rapid appearance changes, which suppresses the undesirable impact of the background. The resulting collaborative representation coefficients are combined with the original feature maps using a spatially regularised DCF framework for performance boosting. The experimental results on several benchmarking datasets demonstrate the effectiveness and robustness of the proposed method, as compared with a number of state-of-the-art tracking algorithms.
Huang Zengxi, Feng Zhenhua, Kittler Josef, Liu Yiguang (2018) Improve the Spoofing Resistance of Multimodal Verification with Representation-Based Measures, In: Lai Jian-Huang, Liu Cheng-Lin, Chen Xilin, Zhou Jie, Tan Tieniu, Zheng Nanning, Zha Hongbin (eds.), Pattern Recognition and Computer Vision. PRCV 2018. 11258 pp. 388-399
Recently, the security of multimodal verification has become a growing concern, since many fusion systems are known to be easily deceived by partial spoof attacks, i.e. attacks in which only a subset of modalities is spoofed. In this paper, we verify such a vulnerability and propose to use two representation-based metrics to close this gap. Firstly, we use the collaborative representation fidelity with non-target subjects to measure the affinity of a query sample to the claimed client. We further consider sparse coding as a competition among the client and the non-target subjects, and hence explore two sparsity-based measures for recognition. Last, we select the representation-based measure, and assemble its score and the affinity score of each modality to train a support vector machine classifier. Our experimental results on a chimeric multimodal database with face and ear traits demonstrate that, in both regular verification and partial spoof attacks, the proposed method significantly improves the spoofing resistance of multimodal verification.
The traditional collaborative representation based classification (CRC) method usually faces the challenge of data uncertainty and hence delivers poor performance, especially in the presence of appearance variations in pose, expression and illumination. To overcome this issue, this paper presents a CRC-based face classification method that jointly uses block-weighted LBP and analysis dictionary learning. To this end, we first design a block-weighted LBP histogram algorithm to form a set of local histogram-based feature vectors instead of using raw images. By this means we effectively reduce the data redundancy and uncertainty caused by image noise and appearance variations. Second, we adopt an analysis dictionary learning model as the projection transform to construct an analysis subspace, in which a new sample is characterised by the improved sparsity of its reconstruction coefficient vector. The crucial role of analysis dictionary learning in CRC is revealed by its capacity for collaborative representation in an analytic coefficient space. Extensive experimental results conducted on a set of well-known face databases demonstrate the merits of the proposed method.
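The per-block LBP histogram step can be sketched as follows. This is a basic unweighted 8-neighbour LBP with block pooling; the paper additionally weights the blocks, and the block grid here is an illustrative choice:

```python
import numpy as np

def lbp_histogram(img, blocks=(2, 2)):
    """Compute 8-neighbour LBP codes for the interior pixels, then pool
    them into per-block 256-bin histograms and concatenate (a minimal,
    unweighted sketch of the block-histogram feature)."""
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    centre = img[1:-1, 1:-1]
    # Clockwise neighbour offsets; each contributes one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neigh >= centre).astype(int) << bit
    hists = []
    by, bx = codes.shape[0] // blocks[0], codes.shape[1] // blocks[1]
    for r in range(blocks[0]):
        for c in range(blocks[1]):
            blk = codes[r * by:(r + 1) * by, c * bx:(c + 1) * bx]
            hists.append(np.bincount(blk.ravel(), minlength=256))
    return np.concatenate(hists)

feat = lbp_histogram(np.arange(36, dtype=float).reshape(6, 6))
```

The concatenated histogram vector replaces the raw image as the input to the CRC coding stage.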
Effective data augmentation is crucial for facial landmark localisation with Convolutional Neural Networks (CNNs). In this letter, we investigate different data augmentation techniques that can be used to generate sufficient data for training CNN-based facial landmark localisation systems. To the best of our knowledge, this is the first study that provides a systematic analysis of different data augmentation techniques in the area. In addition, an online Hard Augmented Example Mining (HAEM) strategy is advocated for further performance boosting. We examine the effectiveness of those techniques using a regression-based CNN architecture. The experimental results obtained on the AFLW and COFW datasets demonstrate the importance of data augmentation and the effectiveness of HAEM. The performance achieved using these techniques is superior to the state-of-the-art algorithms.
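One geometric augmentation common to this line of work is in-plane rotation, which must transform the landmark annotations consistently with the image. A minimal coordinate-only sketch (the image itself would be warped with the same transform):

```python
import numpy as np

def rotate_sample(landmarks, angle_deg, centre):
    """Rotate 2D landmark coordinates about a centre point, one of the
    geometric perturbations used for landmark-localisation data
    augmentation (illustrative; parameter ranges are sampled randomly
    in practice)."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return (landmarks - centre) @ R.T + centre

pts = np.array([[10., 0.], [0., 10.]])
rot = rotate_sample(pts, 90.0, np.array([0., 0.]))
```

Bounding-box translation, scaling and flipping are handled analogously, each with a matching update of the ground-truth landmarks.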
Extended sparse representation-based classification (ESRC) has shown interesting results on the problem of undersampled face recognition by generating an auxiliary intraclass variant dictionary for the representation of possible appearance variations. However, the method has high computational complexity due to the l1-minimisation problem. To address this issue, this paper proposes two strategies to speed up SRC using quadratic optimisation in a downsized coefficient solution subspace. The first, namely Fast SRC using Quadratic Optimisation (FSRC-QO), applies a PCA and LDA hybrid constrained optimisation method to achieve compressed linear representations of test samples. By design, a more accurate and discriminative reconstruction of a test sample can be achieved for face classification using the downsized coefficient space. Secondly, to explore the positive impact of the proposed method on deep-learning-based face classification, we enhance FSRC-QO using CNN-based features (FSRC-QO-CNN), in which we replace the original input image with robust CNN features in our FSRC-QO framework. Experimental results conducted on a set of well-known face datasets, including AR, FERET, LFW and FRGC, demonstrate the merits of the proposed methods, especially in computational efficiency.
With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
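Within an augmented-Lagrangian (ADMM-style) scheme, the lasso-regularised filter update reduces to soft thresholding, the proximal operator of the l1 norm. A sketch of that core step (not the full tracker):

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm (soft thresholding): shrinks
    each entry towards zero by tau, zeroing small values. This is the
    basic update behind lasso-regularised spatial feature selection."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

out = soft_threshold(np.array([-3.0, -0.2, 0.0, 0.5, 2.0]), 0.5)
```

Applied to filter coefficients, the operator drives spatially irrelevant entries exactly to zero, which is what realises the structured sparsity constraint.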
Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. For each type of face information, namely shape and texture, we construct a unified tensor model capturing all relevant appearance variations. This contrasts with the variation-specific models of the classical tensor AAM. To achieve the unification across pose variations, a strategy for dealing with self-occluded faces is proposed to obtain consistent shape and texture representations of pose-varied faces. In addition, our UT-AAM is capable of constructing the model from an incomplete training dataset, using tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting. With these advancements, the utility of UT-AAM in practice is considerably enhanced. As an example, we demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise a large number of virtual samples. Experimental results obtained on a number of well-known face datasets demonstrate the merits of the proposed approach.
We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions, thus to pinpoint the structural relevance of multi-channel features to the filtering system. In contrast to the widely used spatial regularisation or feature selection methods, to the best of our knowledge, this is the first time that channel selection has been advocated for DCF-based tracking. We demonstrate that our GFS-DCF method is able to significantly improve the performance of a DCF tracker equipped with deep neural network features. In addition, our GFS-DCF enables joint feature selection and filter learning, achieving enhanced discrimination and interpretability of the learned filters.
To further improve the performance, we adaptively integrate historical information by constraining filters to be smooth across temporal frames, using an efficient low-rank approximation. By design, specific temporal-spatial-channel configurations are dynamically learned in the tracking process, highlighting the relevant features, and alleviating the performance degrading impact of less discriminative representations and reducing information redundancy. The experimental results obtained on OTB2013, OTB2015, VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its superiority over the state-of-the-art trackers. The code is publicly available at https://github.com/XU-TIANYANG/GFS-DCF.
Discriminative correlation filter (DCF) has achieved advanced performance in visual object tracking with remarkable efficiency guaranteed by its implementation in the frequency domain. However, the effect of the structural relationship of DCF and object features has not been adequately explored in the context of the filter design. To remedy this deficiency, this paper proposes a Low-rank and Sparse DCF (LSDCF) that improves the relevance of features used by discriminative filters. To be more specific, we extend the classical DCF paradigm from ridge regression to lasso regression, and constrain the estimate to be of low-rank across frames, thus identifying and retaining the informative filters distributed on a low-dimensional manifold. To this end, specific temporal-spatial-channel configurations are adaptively learned to achieve enhanced discrimination and interpretability. In addition, we analyse the complementary characteristics between hand-crafted features and deep features, and propose a coarse-to-fine heuristic tracking strategy to further improve the performance of our LSDCF. Last, the augmented Lagrange multiplier optimisation method is used to achieve efficient optimisation. The experimental results obtained on a number of well-known benchmarking datasets, including OTB2013, OTB50, OTB100, TC128, UAV123, VOT2016 and VOT2018, demonstrate the effectiveness and robustness of the proposed method, delivering outstanding performance compared to the state-of-the-art trackers.
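The low-rank constraint across frames described above is typically enforced via singular value thresholding, the proximal operator of the nuclear norm. A minimal sketch of that building block (the paper's full optimisation is more involved):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink the singular values of A by
    tau and reconstruct, yielding the proximal operator of the nuclear
    norm. Small singular values are zeroed, reducing the rank."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return U @ np.diag(s) @ Vt

A = np.diag([5.0, 1.0, 0.2])
low_rank = svt(A, 0.5)
```

In the tracking setting, stacking per-frame filters as matrix columns and applying this operator retains only the filter directions shared across frames, i.e. the low-dimensional manifold the abstract refers to.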
3D assisted 2D face recognition involves the process of reconstructing 3D faces from 2D images and solving
the problem of face recognition in 3D. To facilitate the use of deep neural networks, a 3D face, normally
represented as a 3D mesh of vertices and its corresponding surface texture, is remapped to image-like square
isomaps by a conformal mapping. Based on previous work, we assume that face recognition benefits more
from texture. In this work, we focus on the surface texture and its discriminatory information content for recognition
purposes. Our approach is to prepare a 3D mesh, the corresponding surface texture and the original 2D
image as a triple input to the recognition network, to show that 3D data are useful for face recognition. We
introduce texture enhancement methods to control the texture fusion process and adapt data augmentation
methods accordingly. Our results show that texture-map-based face recognition not only competes with state-of-the-art
systems under the same preconditions but also outperforms standard 2D methods from recent years.
Efficient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We first systemically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplifies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation. The use of our RWing loss boosts the performance significantly for regression-based CNNs in facial landmarking, especially for lightweight network architectures. To address the problem of under-representation of samples with large pose variations, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation strategies. Last, the proposed approach is extended to create a coarse-to-fine framework for robust and efficient landmark localisation. Moreover, the proposed coarse-to-fine framework is able to deal with the small sample size problem effectively. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits of our RWing loss and prove the superiority of the proposed method over the state-of-the-art approaches.
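The abstract does not give the loss formula, but a wing-style piecewise loss with a rectified (zero) region for very small errors can be sketched as follows. The thresholds `r`, `w` and `eps` and the exact parameterisation are assumptions for illustration, not the authors' definition:

```python
import numpy as np

def rwing_loss(err, r=0.5, w=10.0, eps=2.0):
    """Illustrative rectified-wing-style piecewise loss (hypothetical
    parameterisation). Very small errors (|e| < r) contribute nothing,
    mitigating annotation noise; small-medium errors are amplified by a
    logarithmic regime; large errors fall back to an offset L1 regime so
    gradients stay bounded."""
    e = np.abs(err)
    # Offset chosen so the log and linear pieces meet continuously at |e| = w.
    c = w - w * np.log(1.0 + (w - r) / eps)
    return np.where(e < r, 0.0,
           np.where(e < w, w * np.log(1.0 + (e - r) / eps), e - c))
```

The key design point matches the abstract's analysis of L2/L1/smooth-L1: the log regime gives small-medium errors a larger gradient magnitude than plain L1, so training attends to them more.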
This paper presents a new Self-growing and Pruning
Generative Adversarial Network (SP-GAN) for realistic image
generation. In contrast to traditional GAN models, our SP-GAN
is able to dynamically adjust the size and architecture
of a network during training, using the proposed self-growing and pruning mechanisms. To be more specific, we first train two seed networks as the generator and discriminator, each of which contains only a small number of convolution kernels. Such small-scale networks are much easier and faster to train than large-capacity networks. Second, in the self-growing step, we replicate the convolution kernels of each seed network to
augment the scale of the network, followed by fine-tuning the expanded network. More importantly, to prevent the excessive growth of each seed network in the self-growing stage, we propose a pruning strategy that reduces the redundancy of an augmented network, yielding the optimal scale of the network. Last, we design a new adaptive loss function, computed as a variable loss during the training of the proposed SP-GAN model; by design, the hyperparameters of the loss function dynamically adapt to different training stages. Experimental results obtained on a set of datasets
demonstrate the merits of the proposed method, especially in
terms of the stability and efficiency of network training. The source code of the proposed SP-GAN method is publicly available at https://github.com/Lambert-chen/SPGAN.git.
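One simple way to realise the kernel-replication step is Net2Net-style widening: duplicate selected kernels of a layer (with small symmetry-breaking noise) and split the next layer's incoming weights so the network's function is approximately preserved before fine-tuning. The sketch below is an assumption about the mechanism, not the authors' implementation:

```python
import numpy as np

def grow_conv(w_cur, w_next, grow_idx, noise=1e-3, rng=None):
    """Widen a conv layer by replicating kernels, as a sketch of the
    self-growing step. w_cur has shape [out, in, k, k]; w_next has shape
    [out2, in2, k, k] with in2 == out. The incoming weights of the next
    layer on the replicated channels are halved and duplicated, so the
    network output is unchanged up to the symmetry-breaking noise."""
    rng = np.random.default_rng(rng)
    copies = w_cur[grow_idx] + noise * rng.standard_normal(w_cur[grow_idx].shape)
    w_cur_new = np.concatenate([w_cur, copies], axis=0)
    w_next_new = w_next.copy()
    w_next_new[:, grow_idx] *= 0.5
    w_next_new = np.concatenate([w_next_new, w_next_new[:, grow_idx]], axis=1)
    return w_cur_new, w_next_new
```

The pruning step would then remove the expanded kernels that contribute least, e.g. those with the smallest weight norms, before further fine-tuning.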
Modern face recognition systems extract face representations
using deep neural networks (DNNs) and give excellent
identification and verification results, when tested on
high-resolution (HR) images. However, the performance of such algorithms degrades significantly on low-resolution (LR) images. A straightforward solution could be to train a DNN using high- and low-resolution face images simultaneously. This
approach yields a definite improvement at lower resolutions but suffers a performance degradation for high resolution images. To overcome this shortcoming, we propose to train a network using both HR and LR images under the guidance of a fixed network, pretrained on HR face images. The guidance is provided by minimising the KL-divergence between the output Softmax probabilities of the pretrained (i.e., Teacher) and trainable (i.e.,
Student) network as well as by sharing the Softmax weights
between the two networks. The resulting solution is tested on down-sampled images from FaceScrub and MegaFace datasets
and shows a consistent performance improvement across various resolutions. We also tested our proposed solution on standard LR benchmarks such as TinyFace and SCFace. Our algorithm consistently outperforms the state-of-the-art methods on these
datasets, confirming the effectiveness and merits of the proposed method.
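The teacher-student guidance term described above can be sketched as a KL divergence between temperature-softened softmax outputs. The temperature value and the exact weighting against the classification loss are assumptions here, not taken from the paper:

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-softened softmax, computed stably."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_kl(teacher_logits, student_logits, t=2.0):
    """Mean KL(teacher || student) over a batch: the guidance signal that
    pulls the student's output distribution toward the fixed HR-pretrained
    teacher's, regardless of the input image's resolution."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))
```

Sharing the softmax (classifier) weights between the two networks, as the abstract states, additionally forces the student's face embeddings to live in the same space as the teacher's.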
To learn disentangled representations of facial images, we present a Dual Encoder-Decoder
based Generative Adversarial Network (DED-GAN). In the proposed method, both the generator and
discriminator are designed with deep encoder-decoder architectures as their backbones. To be more specific,
the encoder-decoder structured generator is used to learn a pose disentangled face representation, and the
encoder-decoder structured discriminator is tasked to perform real/fake classification, face reconstruction,
identity determination and face pose estimation. We further improve the proposed network architecture
by minimizing the additional pixel-wise loss defined by the Wasserstein distance at the output of the
discriminator so that the adversarial framework can be better trained. Additionally, we consider face pose
variation to be continuous, rather than discrete as in the existing literature, to inject richer pose information into
our model. The pose estimation task is formulated as a regression problem, which helps to disentangle
identity information from pose variations. The proposed network is evaluated on the tasks of pose-invariant
face recognition (PIFR) and face synthesis across poses. An extensive quantitative and qualitative evaluation
carried out on several controlled and in-the-wild benchmarking datasets demonstrates the superiority of the
proposed DED-GAN method over the state-of-the-art approaches.