Biography

Areas of specialism

Computer Vision; Machine Learning; Artificial Intelligence; Image Processing; Pattern Recognition

University roles and responsibilities

  • Applicant Day Coordinator
  • 1st Year UG Personal Tutor
  • Open Day Coordinator

    Research

    Research interests

    Supervision

    Postgraduate research supervision

    My teaching

    My publications

    Publications

    Zhenhua Feng, Josef Kittler, William Christmas, Patrik Huber, X-J Wu (2017) Dynamic Attention-controlled Cascaded Shape Regression Exploiting Training Data Augmentation and Fuzzy-set Sample Weighting, In: Proceedings of CVPR 2017, pp. 2481-2490, IEEE

    We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate facial landmarks. Then, an online dynamic model selection method is used to choose appropriate domain-specific CSRs for further landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant mechanism, using fuzzy set sample weighting, for attention-controlled domain-specific model training. Moreover, we advocate data augmentation with a simple but effective 2D profile face generator, and context-aware feature extraction for better facial feature representation. Experimental results obtained on challenging datasets demonstrate the merits of our DAC-CSR over the state-of-the-art methods.

    C Shao, X Song, Zhenhua Feng, X-J Wu, Y Zheng (2017) Dynamic Dictionary Optimization for Sparse-representation-based Face Classification using Local Difference Images, In: Information Sciences 393, pp. 1-14, Elsevier

    In this study, we present a new sparse-representation-based face-classification algorithm that exploits dynamic dictionary optimization on an extended dictionary using synthesized faces. More specifically, given a dictionary consisting of face examples, we first augment the dictionary with a set of virtual faces generated by calculating the image difference of a pair of faces. This results in an extended dictionary with hybrid training samples, which enhances the capacity of the dictionary to represent new samples. Second, to reduce the redundancy of the extended dictionary and improve the classification accuracy, we use a dictionary-optimization method. We truncate the extended dictionary with a more compact structure by discarding the original samples with small contributions to represent a test sample. Finally, we perform sparse-representation-based face classification using the optimized dictionary. Experimental results obtained using the AR and FERET face datasets demonstrate the superiority of the proposed method in terms of accuracy, especially for small-sample-size problems.

    Xiaoning Song, Youming Chen, Zhenhua Feng, Guosheng Hu, Tao Zhang, Xiao-jun Wu (2019) Collaborative Representation based Face Classification Exploiting Block Weighted LBP and Analysis Dictionary Learning, In: Pattern Recognition 88, pp. 127-138, Elsevier

    The traditional collaborative representation based classification (CRC) method usually faces the challenge of data uncertainty and hence performs poorly, especially in the presence of appearance variations in pose, expression and illumination. To overcome this issue, this paper presents a CRC-based face classification method that jointly uses block weighted LBP and analysis dictionary learning. To this end, we first design a block weighted LBP histogram algorithm to form a set of local histogram-based feature vectors instead of using raw images. By this means we are able to effectively decrease the data redundancy and uncertainty derived from image noise and appearance variations. Second, we adopt an analysis dictionary learning model as the projection transform to construct an analysis subspace, in which a new sample is characterized with the improved sparsity of its reconstruction coefficient vector. The crucial role of the analysis dictionary learning method in CRC is revealed by its capacity of the collaborative representation in an analytic coefficient space. Extensive experimental results conducted on a set of well-known face databases demonstrate the merits of the proposed method.
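
For readers unfamiliar with CRC, the baseline idea can be sketched in a few lines of plain Python. This is only an illustration of generic CRC (ridge-regularised coding over all training samples, followed by a class-wise reconstruction residual), not the block weighted LBP or analysis dictionary learning pipeline described in the abstract. The function names, the regularisation value `lam` and the toy vectors are all assumptions made for the example.

```python
# Generic CRC sketch: hypothetical helper names, toy data, not the authors' code.

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def crc_classify(atoms, labels, query, lam=0.01):
    """Code the query over ALL atoms with a ridge penalty, then return the label
    whose atoms alone reconstruct the query with the smallest residual."""
    n = len(atoms)
    # Normal equations of ridge regression: (D^T D + lam * I) alpha = D^T y
    gram = [[dot(atoms[i], atoms[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve_linear_system(gram, [dot(atom, query) for atom in atoms])

    best_label, best_residual = None, float("inf")
    for label in sorted(set(labels)):
        recon = [0.0] * len(query)
        for coeff, atom, atom_label in zip(alpha, atoms, labels):
            if atom_label == label:
                recon = [r + coeff * x for r, x in zip(recon, atom)]
        residual = sum((q - r) ** 2 for q, r in zip(query, recon)) ** 0.5
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Toy dictionary: two feature vectors per class.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = ["a", "a", "b", "b"]
predicted = crc_classify(atoms, labels, [0.95, 0.05, 0.0])
```

In the paper's setting the atoms would presumably be the LBP-based feature vectors rather than raw values; the sketch keeps the plain residual rule for brevity.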

    Tianyang Xu, Zhenhua Feng, Xiao-Jun Wu, Josef Kittler (2019) Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking, In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2019), Institute of Electrical and Electronics Engineers (IEEE)

    We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions, thus to pinpoint the structural relevance of multi-channel features to the filtering system. In contrast to the widely used spatial regularisation or feature selection methods, to the best of our knowledge, this is the first time that channel selection has been advocated for DCF-based tracking. We demonstrate that our GFS-DCF method is able to significantly improve the performance of a DCF tracker equipped with deep neural network features. In addition, our GFS-DCF enables joint feature selection and filter learning, achieving enhanced discrimination and interpretability of the learned filters. To further improve the performance, we adaptively integrate historical information by constraining filters to be smooth across temporal frames, using an efficient low-rank approximation. By design, specific temporal-spatial-channel configurations are dynamically learned in the tracking process, highlighting the relevant features, and alleviating the performance degrading impact of less discriminative representations and reducing information redundancy. The experimental results obtained on OTB2013, OTB2015, VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its superiority over the state-of-the-art trackers. The code is publicly available at https://github.com/XU-TIANYANG/GFS-DCF.

    Zengxi Huang, Zhenhua Feng, Josef Kittler, Yiguang Liu (2018) Improve the Spoofing Resistance of Multimodal Verification with Representation-Based Measures, In: Pattern Recognition and Computer Vision. PRCV 2018. 11258, pp. 388-399, Springer

    Recently, the security of multimodal verification has become a growing concern since many fusion systems have been known to be easily deceived by partial spoof attacks, i.e. only a subset of modalities is spoofed. In this paper, we verify such a vulnerability and propose to use two representation-based metrics to close this gap. Firstly, we use the collaborative representation fidelity with non-target subjects to measure the affinity of a query sample to the claimed client. We further consider sparse coding as a competing comparison among the client and the non-target subjects, and hence explore two sparsity-based measures for recognition. Last, we select the representation-based measure, and assemble its score and the affinity score of each modality to train a support vector machine classifier. Our experimental results on a chimeric multimodal database with face and ear traits demonstrate that in both regular verification and partial spoof attacks, the proposed method significant

    Zhenhua Feng, Patrik Huber, Josef Kittler, P Hancock, X-J Wu, Q Zhao, Paul Koppen, M Ratsch (2018) Evaluation of Dense 3D Reconstruction from 2D Face Images in the Wild, In: Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition, IEEE

    This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset that contains 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.

    Z Feng, X Song, X Yang, X Wu, J Yang (2016) Towards multi-scale fuzzy sparse discriminant analysis using local third-order tensor model of face images, In: Neurocomputing 185, pp. 53-63, Elsevier

    Traditional discriminant analysis (DA) methods usually do not work well when only a few facial images, or even a single one, are available per subject. The fundamental reason lies in the fact that the traditional DA approaches cannot fully reflect the variations of a query sample with illumination, occlusion and pose variations, especially in the case of small sample size. In this paper, we develop a multi-scale fuzzy sparse discriminant analysis using a local third-order tensor model to perform robust face classification. More specifically, we first introduce a local third-order tensor model of face images to exploit a set of multi-scale characteristics of the Ridgelet transform. Second, a set of Ridgelet transformed coefficients is generated for each block of a face image. We then merge all these coefficients to form a new representative vector for the image. Last, we evaluate the sparse similarity grade between each training sample and class by constructing a sparse similarity metric, and redesign the traditional discriminant criterion that contains considerable fuzzy sparse similarity grades to perform robust classification. Experimental results conducted on a set of well-known face databases demonstrate the merits of the proposed method, especially in the case of insufficient training samples.

    Paul Koppen, Zhenhua Feng, Josef Kittler, Muhammad Awais, William Christmas, Xiao-Jun Wu, He-Feng Yin (2017) Gaussian Mixture 3D Morphable Face Model, In: Pattern Recognition 74, pp. 617-628, Elsevier

    3D Morphable Face Models (3DMM) have been used in pattern recognition for some time now. They have been applied as a basis for 3D face recognition, as well as in an assistive role for 2D face recognition to perform geometric and photometric normalisation of the input image, or in 2D face recognition system training. The statistical distribution underlying 3DMM is Gaussian. However, the single-Gaussian model seems at odds with reality when we consider different cohorts of data, e.g. Black and Chinese faces. Their means are clearly different. This paper introduces the Gaussian Mixture 3DMM (GM-3DMM) which models the global population as a mixture of Gaussian subpopulations, each with its own mean. The proposed GM-3DMM extends the traditional 3DMM naturally, by adopting a shared covariance structure to mitigate small sample estimation problems associated with data in high dimensional spaces. We construct a GM-3DMM, the training of which involves a multiple cohort dataset, SURREY-JNU, comprising 942 3D face scans of people with mixed backgrounds. Experiments in fitting the GM-3DMM to 2D face images to facilitate their geometric and photometric normalisation for pose and illumination invariant face recognition demonstrate the merits of the proposed mixture of Gaussians 3D face model.
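
The modelling assumption described above can be written compactly. The notation is ours and is not taken from the paper: x is a shape (or texture) vector, K is the number of cohorts, pi_k are mixing weights, mu_k is the mean of cohort k, and Sigma is the covariance shared by all cohorts.

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

Setting K = 1 recovers the single-Gaussian 3DMM. Sharing Sigma across components means only the K means are cohort-specific, which is how the abstract's shared covariance structure reduces the number of parameters to be estimated from limited data.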

    X Song, Zhenhua Feng, G Hu, Josef Kittler, X-J Wu (2018) Dictionary Integration using 3D Morphable Face Models for Pose-invariant Collaborative-representation-based Classification, In: IEEE Transactions on Information Forensics & Security 13(11), pp. 2734-2745, IEEE

    The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification. To this end, we first fit a 3DMM to the 2D face images of a dictionary to reconstruct the 3D shape and texture of each image. The 3D faces are used to render a number of virtual 2D face images with arbitrary pose variations to augment the training data, by merging the original and rendered virtual samples to create an extended dictionary. Second, to reduce the information redundancy of the extended dictionary and improve the sparsity of reconstruction coefficient vectors using collaborative-representation-based classification (CRC), we exploit an on-line class elimination scheme to optimise the extended dictionary by identifying the training samples of the most representative classes for a given query. The final goal is to perform pose-invariant face classification using the proposed dictionary integration method and the on-line pruning strategy under the CRC framework. Experimental results obtained for a set of well-known face datasets demonstrate the merits of the proposed method, especially its robustness to pose variations.

    Cong Hu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2020) Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning, In: IEEE Access 8, pp. 130-159, Institute of Electrical and Electronics Engineers

    To learn disentangled representations of facial images, we present a Dual Encoder-Decoder based Generative Adversarial Network (DED-GAN). In the proposed method, both the generator and discriminator are designed with deep encoder-decoder architectures as their backbones. To be more specific, the encoder-decoder structured generator is used to learn a pose disentangled face representation, and the encoder-decoder structured discriminator is tasked to perform real/fake classification, face reconstruction, determining identity and estimating face pose. We further improve the proposed network architecture by minimizing the additional pixel-wise loss defined by the Wasserstein distance at the output of the discriminator so that the adversarial framework can be better trained. Additionally, we consider face pose variation to be continuous, rather than discrete in existing literature, to inject richer pose information into our model. The pose estimation task is formulated as a regression problem, which helps to disentangle identity information from pose variations. The proposed network is evaluated on the tasks of pose-invariant face recognition (PIFR) and face synthesis across poses. An extensive quantitative and qualitative evaluation carried out on several controlled and in-the-wild benchmarking datasets demonstrates the superiority of the proposed DED-GAN method over the state-of-the-art approaches.

    Zhenhua Feng, Josef Kittler, Xiaojun Wu (2019) Mining Hard Augmented Samples for Robust Facial Landmark Localisation with CNNs, In: IEEE Signal Processing Letters 26(3), pp. 450-454, Institute of Electrical and Electronics Engineers (IEEE)

    Effective data augmentation is crucial for facial landmark localisation with Convolutional Neural Networks (CNNs). In this letter, we investigate different data augmentation techniques that can be used to generate sufficient data for training CNN-based facial landmark localisation systems. To the best of our knowledge, this is the first study that provides a systematic analysis of different data augmentation techniques in the area. In addition, an online Hard Augmented Example Mining (HAEM) strategy is advocated for further performance boosting. We examine the effectiveness of those techniques using a regression-based CNN architecture. The experimental results obtained on the AFLW and COFW datasets demonstrate the importance of data augmentation and the effectiveness of HAEM. The performance achieved using these techniques is superior to the state-of-the-art algorithms.
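
The abstract does not spell out the HAEM selection rule, so the following is only a generic sketch of online hard example mining under a simple assumption: within each mini-batch, the augmented samples with the largest current loss are kept for the update. The function name and the `keep_ratio` parameter are hypothetical.

```python
def mine_hard_examples(losses, keep_ratio=0.5):
    """Return the indices of the highest-loss samples in a mini-batch, hardest first.

    `losses` holds one loss value per augmented sample; `keep_ratio` is an
    assumed hyper-parameter giving the fraction of the batch to keep.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_kept = max(1, int(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:num_kept]


# Per-sample losses for a toy mini-batch of four augmented images.
hard_indices = mine_hard_examples([0.1, 0.9, 0.4, 0.7], keep_ratio=0.5)
```

In a training loop, the returned indices would select which samples contribute to the gradient step, so that easy augmentations do not dominate the update.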

    Xiaoning Song, Guosheng Hu, Jian-Hao Luo, Zhenhua Feng, Dong-Jun Yu, Xiao-Jun Wu (2019) Fast SRC using quadratic optimisation in downsized coefficient solution subspace, In: Signal Processing 161, pp. 101-110, Elsevier

    Extended sparse representation-based classification (ESRC) has shown interesting results on the problem of undersampled face recognition by generating an auxiliary intraclass variant dictionary for the representation of possible appearance variations. However, the method has high computational complexity due to the l1-minimization problem. To address this issue, this paper proposes two strategies to speed up SRC using quadratic optimisation in downsized coefficient solution subspace. The first one, namely Fast SRC using Quadratic Optimisation (FSRC-QO), applies PCA and LDA hybrid constrained optimisation method to achieve compressed linear representations of test samples. By design, more accurate and discriminative reconstruction of a test sample can be achieved for face classification, using the downsized coefficient space. Secondly, to explore the positive impact of our proposed method on deep-learning-based face classification, we enhance FSRC-QO using CNN-based features (FSRC-QO-CNN), in which we replace the original input image using robust CNN features in our FSRC-QO framework. Experimental results conducted on a set of well known face datasets, including AR, FERET, LFW and FRGC, demonstrate the merits of the proposed methods, especially in computational efficiency.

    Syed Khalid, Muhammad Awais, Zhenhua Feng, Chi-Ho Chan, Ammarah Farooq, Josef Kittler (2020) Resolution Invariant Face Recognition using a Distillation Approach, In: IEEE Transactions on Biometrics, Behavior, and Identity Science, Institute of Electrical and Electronics Engineers

    Modern face recognition systems extract face representations using deep neural networks (DNNs) and give excellent identification and verification results when tested on high resolution (HR) images. However, the performance of such an algorithm degrades significantly for low resolution (LR) images. A straightforward solution could be to train a DNN using high and low resolution face images simultaneously. This approach yields a definite improvement at lower resolutions but suffers a performance degradation for high resolution images. To overcome this shortcoming, we propose to train a network using both HR and LR images under the guidance of a fixed network, pretrained on HR face images. The guidance is provided by minimising the KL-divergence between the output Softmax probabilities of the pretrained (i.e., Teacher) and trainable (i.e., Student) network as well as by sharing the Softmax weights between the two networks. The resulting solution is tested on down-sampled images from FaceScrub and MegaFace datasets and shows a consistent performance improvement across various resolutions. We also tested our proposed solution on standard LR benchmarks such as TinyFace and SCFace. Our algorithm consistently outperforms the state-of-the-art methods on these datasets, confirming the effectiveness and merits of the proposed method.
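
The guidance term mentioned above can be illustrated with a small, self-contained sketch. It computes the KL-divergence between the Softmax outputs of a teacher and a student for a single sample. The direction KL(teacher || student), the function names and the toy logits are assumptions; the sharing of Softmax weights and the rest of the training procedure are not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """KL-divergence between teacher and student Softmax outputs (assumed direction)."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


# Toy logits: the teacher sees an HR image, the student its LR counterpart.
loss_matching = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_mismatch = distillation_loss([2.0, 0.5, -1.0], [0.5, 2.0, -1.0])
```

The loss is zero when the student reproduces the teacher's class probabilities and grows as the two distributions diverge, which is what pushes the student's LR predictions towards the teacher's HR predictions.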

    Zhen-Hua Feng, Josef Kittler, Bill Christmas, Xiao-Jun Wu (2019) A Unified Tensor-based Active Appearance Model, In: ACM Transactions on Multimedia Computing, Communications and Applications, Association for Computing Machinery (ACM)

    Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. For each type of face information, namely shape and texture, we construct a unified tensor model capturing all relevant appearance variations. This contrasts with the variation-specific models of the classical tensor AAM. To achieve the unification across pose variations, a strategy for dealing with self-occluded faces is proposed to obtain consistent shape and texture representations of pose-varied faces. In addition, our UT-AAM is capable of constructing the model from an incomplete training dataset, using tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting. With these advancements, the utility of UT-AAM in practice is considerably enhanced. As an example, we demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise a large number of virtual samples. Experimental results obtained on a number of well-known face datasets demonstrate the merits of the proposed approach.

    Xue-Feng Zhu, Xiao-Jun Wu, Tianyang Xu, Zhenhua Feng, Josef Kittler (2020) Complementary Discriminative Correlation Filters Based on Collaborative Representation for Visual Object Tracking, In: IEEE Transactions on Circuits and Systems for Video Technology, Institute of Electrical and Electronics Engineers

    In recent years, discriminative correlation filter (DCF) based algorithms have significantly advanced the state of the art in visual object tracking. The key to the success of DCF is an efficient discriminative regression model trained with powerful multi-cue features, including both hand-crafted and deep neural network features. However, the tracking performance is hindered by their inability to respond adequately to abrupt target appearance variations. This issue is posed by the limited representation capability of fixed image features. In this work, we set out to rectify this shortcoming by proposing a complementary representation of a visual content. Specifically, we propose the use of a collaborative representation between successive frames to extract the dynamic appearance information from a target with rapid appearance changes, which results in suppressing the undesirable impact of the background. The resulting collaborative representation coefficients are combined with the original feature maps using a spatially regularised DCF framework for performance boosting. The experimental results on several benchmarking datasets demonstrate the effectiveness and robustness of the proposed method, as compared with a number of state-of-the-art tracking algorithms.

    Recently, word enhancement has become very popular for Chinese Named Entity Recognition (NER), reducing segmentation errors and increasing the semantic and boundary information of Chinese words. However, these methods tend to ignore the information of the Chinese character structure after integrating the lexical information. Chinese characters have evolved from pictographs since ancient times, and their structure often reflects more information about the characters. This paper presents a novel Multi-metadata Embedding based Cross-Transformer (MECT) to improve the performance of Chinese NER by fusing the structural information of Chinese characters. Specifically, we use multi-metadata embedding in a two-stream Transformer to integrate Chinese character features with the radical-level embedding. With the structural characteristics of Chinese characters, MECT can better capture the semantic information of Chinese characters for NER. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits and superiority of the proposed MECT method.

    Ali Akbari, Muhammad Awais, Ammarah Farooq, Josef Vaclav Kittler (2020) A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation

    Existing studies in facial age estimation have mostly focused on intra-dataset protocols that assume training and test images captured under similar conditions. However, this is rarely valid in practical applications, where training and test sets usually have different characteristics. In this paper, we advocate a cross-dataset protocol for age estimation benchmarking. In order to improve the cross-dataset age estimation performance, we mitigate the inherent bias caused by the learning algorithm itself. To this end, we propose a novel loss function that is more effective for neural network training. The relative smoothness of the proposed loss function is its advantage with regards to the optimisation process performed by stochastic gradient descent. Its lower gradient, compared with existing loss functions, facilitates the discovery of and convergence to a better optimum, and consequently a better generalisation. The cross-dataset experimental results demonstrate the superiority of the proposed method over the state-of-the-art algorithms in terms of accuracy and generalisation capability.

    P Huber, Z Feng, WJ Christmas, J Kittler, M Raetsch (2015) Fitting 3D Morphable Models using Local Features, In: 2015 IEEE International Conference on Image Processing ICIP 2015 Proceedings, pp. 1195-1199

    In this paper, we propose a novel fitting method that uses local image features to fit a 3D Morphable Face Model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method allows the shape and pose parameters to be solved for simultaneously. Our method is thoroughly evaluated on Morphable Model generated data and first results on real data are presented. Compared to traditional fitting methods, which use simple raw features like pixel colour or edge maps, local features have been shown to be much more robust against variations in imaging conditions. Our approach is unique in that we are the first to use local features to fit a 3D Morphable Model. Because of the speed of our method, it is suitable for real-time applications. Our cascaded regression framework is available as an open source library at github.com/patrikhuber/superviseddescent.

    Tianyang Xu, Zhenhua Feng, Xiao-Jun Wu, Josef Kittler (2021) Adaptive Channel Selection for Robust Visual Object Tracking with Discriminative Correlation Filters, In: International Journal of Computer Vision 129(5), pp. 1359-1375, Springer Nature

    Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the performance of DCF in accuracy and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than 10% deep feature channels.
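
To make the idea of treating each channel as a group more concrete, here is a sketch of the textbook group-lasso proximal step (block soft-thresholding), which is the standard mechanism by which a group-sparsity penalty switches off whole channels. It is not the adaptive group elastic net or the augmented Lagrangian solver from the paper; the function name, the threshold `lam` and the toy filter values are assumptions.

```python
import math


def group_soft_threshold(channels, lam):
    """Shrink each channel's filter weights as one group.

    A channel whose L2 norm is at most `lam` is set entirely to zero
    (deselected); the others are scaled towards zero by (1 - lam / norm).
    """
    shrunk = []
    for weights in channels:
        norm = math.sqrt(sum(w * w for w in weights))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * w for w in weights])
    return shrunk


# Two toy channels: one with strong filter weights, one with weak weights.
filters = [[3.0, 4.0], [0.3, 0.4]]
selected = group_soft_threshold(filters, lam=1.0)
active_channels = [i for i, ch in enumerate(selected) if any(w != 0.0 for w in ch)]
```

With these toy values the weak second channel is zeroed while the first is only scaled down, which mirrors how channel selection can leave a small fraction of deep feature channels active.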

    Yahang Wang, Xiaoning Song, Tianyang Xu, Zhenhua Feng, Xiao-Jun Wu (2021) From RGB to Depth: Domain Transfer Network for Face Anti-Spoofing, In: IEEE Transactions on Information Forensics and Security, IEEE

    With the rapid development in face recognition, most of the existing systems can perform very well in unconstrained scenarios. However, it is still a very challenging task to detect face spoofing attacks; thus face anti-spoofing has become one of the most important research topics in the community. Though various anti-spoofing models have been proposed, the generalisation capability of these models usually degrades for unseen attacks in the presence of challenging appearance variations, e.g., background, illumination, diverse spoofing materials and low image quality. To address this issue, we propose to use a Generative Adversarial Network (GAN) that transfers an input face image from the RGB domain to the depth domain. The generated depth clue enables biometric preservation against challenging appearance variations and diverse image qualities. To be more specific, the proposed method has two main stages. The first one is a GAN-based domain transfer module that converts an input image to its corresponding depth map. By design, a live face image should be transferred to a depth map whereas a spoofing face image should be transferred to a plain (black) image. The aim is to improve the discriminative capability of the proposed system. The second stage is a classification model that determines whether an input face image is live or spoofing. Benefiting from the use of the GAN-based domain transfer module, the latent variables can effectively represent the depth information, complementarily enhancing the discrimination of the original RGB features. The experimental results obtained on several benchmarking datasets demonstrate the effectiveness of the proposed method, with superior performance over the state-of-the-art methods. The source code of the proposed method is publicly available at https://github.com/coderwangson/DFA. Index Terms: Face anti-spoofing, generative adversarial network, domain transfer.

    Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2020) An accelerated correlation filter tracker, In: Pattern Recognition 102, 107172, Elsevier Ltd

    • A formulation of the DCF design problem which focuses on informative feature channels and spatial structures by means of novel regularisation.
    • A proposed relaxed optimisation algorithm referred to as R_A-ADMM for optimising the regularised DCF. In contrast with the standard ADMM, the algorithm achieves a better convergence rate.
    • A temporal smoothness constraint, implemented by an adaptive initialisation mechanism, to achieve further speed up via transfer learning among video frames.
    • The proposed adoption of AlexNet to construct a light-weight deep representation with a tracking accuracy comparable to more complicated deep networks, such as VGG and ResNet.
    • An extensive evaluation of the proposed methodology on several well-known visual object tracking datasets, with the results confirming the acceleration gains for the regularised DCF paradigm.

    Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking, In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7949-7959, IEEE

    Tianyang Xu, Zhen-hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Learning Adaptive Discriminative Correlation Filters via Temporal Consistency preserving Spatial Feature Selection for Robust Visual Object Tracking, In: IEEE Transactions on Image Processing 28(11) pp. 5596-5609 Institute of Electrical and Electronics Engineers (IEEE)

    With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.

    Zhenhua Feng, Josef Kittler, Muhammad Awais, Xiao-Jun Wu (2019) Rectified Wing Loss for Efficient and Robust Facial Landmark Localisation with Convolutional Neural Networks, In: International Journal of Computer Vision pp. 1-20 Springer

    Efficient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We first systematically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplifies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation. The use of our RWing loss boosts the performance significantly for regression-based CNNs in facial landmarking, especially for lightweight network architectures. To address the problem of under-representation of samples with large pose variations, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation strategies. Last, the proposed approach is extended to create a coarse-to-fine framework for robust and efficient landmark localisation. Moreover, the proposed coarse-to-fine framework is able to deal with the small sample size problem effectively. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits of our RWing loss and prove the superiority of the proposed method over the state-of-the-art approaches.
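
    The piece-wise idea described above can be sketched for a single residual as follows. This is one plausible form consistent with the abstract, not a verified copy of the published definition: the rectification threshold `r`, the switch point `w`, the curvature `epsilon` and their default values are illustrative assumptions.

```python
import math


def rwing_loss(x, r=0.5, w=10.0, epsilon=2.0):
    """Rectified piece-wise loss for one residual x = prediction - target.

    r, w and epsilon are assumed parameters chosen for illustration only.
    """
    abs_x = abs(x)
    if abs_x < r:
        # Very small errors are ignored, which tolerates annotation noise.
        return 0.0
    if abs_x < w:
        # Logarithmic part: steep near r, so small-medium errors are amplified.
        return w * math.log(1.0 + (abs_x - r) / epsilon)
    # Linear (L1-like) part, shifted so that the pieces join continuously at |x| = w.
    c = w - w * math.log(1.0 + (w - r) / epsilon)
    return abs_x - c


if __name__ == "__main__":
    for residual in (0.2, 1.0, 5.0, 20.0):
        print(residual, rwing_loss(residual))
```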

    Michael Danner, Matthias Raetsch, Patrik Huber, Muhammad Awais, Zhenhua Feng, Josef Kittler (2020) Texture-based 3D Face Recognition using Deep Neural Networks for unconstrained Human-Machine Interaction, In: Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications VISAPP 2020 SCITEPRESS

    3D assisted 2D face recognition involves the process of reconstructing 3D faces from 2D images and solving the problem of face recognition in 3D. To facilitate the use of deep neural networks, a 3D face, normally represented as a 3D mesh of vertices and its corresponding surface texture, is remapped to image-like square isomaps by a conformal mapping. Based on previous work, we assume that face recognition benefits more from texture. In this work, we focus on the surface texture and its discriminatory information content for recognition purposes. Our approach is to prepare a 3D mesh, the corresponding surface texture and the original 2D image as triple input for the recognition network, to show that 3D data is useful for face recognition. Texture enhancement methods to control the texture fusion process are introduced and we adapt data augmentation methods. Our results show that texture-map-based face recognition can not only compete with state-of-the-art systems under the same preconditions but also outperforms standard 2D methods from recent years.

    Zhenhua Feng, Josef Kittler (2018) Advances in facial landmark detection, In: Biometric Technology Today 2018(3) pp. 8-11 Elsevier

    In recent years, facial landmark detection – also known as face alignment or facial landmark localisation – has become a very active area, due to its importance to a variety of image and video-based face analysis systems, such as face recognition, emotion analysis, human-computer interaction and 3D face reconstruction. This article looks at the challenges and latest technology advances in facial landmarks.

    Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Learning Low-rank and Sparse Discriminative Correlation Filters for Coarse-to-Fine Visual Object Tracking, In: IEEE Transactions on Circuits and Systems for Video Technology pp. 1-13 Institute of Electrical and Electronics Engineers (IEEE)

    Discriminative correlation filter (DCF) has achieved advanced performance in visual object tracking with remarkable efficiency guaranteed by its implementation in the frequency domain. However, the effect of the structural relationship of DCF and object features has not been adequately explored in the context of the filter design. To remedy this deficiency, this paper proposes a Low-rank and Sparse DCF (LSDCF) that improves the relevance of features used by discriminative filters. To be more specific, we extend the classical DCF paradigm from ridge regression to lasso regression, and constrain the estimate to be of low-rank across frames, thus identifying and retaining the informative filters distributed on a low-dimensional manifold. To this end, specific temporal-spatial-channel configurations are adaptively learned to achieve enhanced discrimination and interpretability. In addition, we analyse the complementary characteristics between hand-crafted features and deep features, and propose a coarse-to-fine heuristic tracking strategy to further improve the performance of our LSDCF. Last, the augmented Lagrange multiplier optimisation method is used to achieve efficient optimisation. The experimental results obtained on a number of well-known benchmarking datasets, including OTB2013, OTB50, OTB100, TC128, UAV123, VOT2016 and VOT2018, demonstrate the effectiveness and robustness of the proposed method, delivering outstanding performance compared to the state-of-the-art trackers.
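
    To make the ridge-versus-lasso distinction concrete, the sketch below compares the closed-form solutions of the two scalar problems min_f 0.5(f - a)^2 + 0.5*lam*f^2 (ridge) and min_f 0.5(f - a)^2 + lam*|f| (lasso). It only illustrates why a lasso penalty yields exactly-zero coefficients; it is not the LSDCF optimiser, and the function names and `lam` are hypothetical.

```python
def ridge_shrink(a, lam):
    """Ridge solution: every coefficient is scaled down but stays non-zero."""
    return a / (1.0 + lam)


def soft_threshold(a, lam):
    """Lasso solution: coefficients within [-lam, lam] are set exactly to zero."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


if __name__ == "__main__":
    coefficients = [2.0, 0.3, -0.1, -1.5]
    lam = 0.5
    print([ridge_shrink(a, lam) for a in coefficients])
    print([soft_threshold(a, lam) for a in coefficients])
```

    With these inputs the ridge output keeps all four entries non-zero, whereas the lasso output zeroes the two small ones, which is the sparsity effect the abstract relies on.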

    Zhenhua Feng, Josef Kittler, M Awais, Patrik Huber, X-J Wu (2018) Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks, In: Proceedings of CVPR 2018 pp. 2235-2245 IEEE

    We present a new loss function, namely Wing loss, for robust facial landmark localisation with Convolutional Neural Networks (CNNs). We first compare and analyse different loss functions including L2, L1 and smooth L1. The analysis of these loss functions suggests that, for the training of a CNN-based localisation model, more attention should be paid to small and medium range errors. To this end, we design a piece-wise loss function. The new loss amplifies the impact of errors from the interval (-w, w) by switching from L1 loss to a modified logarithm function. To address the problem of under-representation of samples with large out-of-plane head rotations in the training set, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation approaches. Last, the proposed approach is extended to create a two-stage framework for robust facial landmark localisation. The experimental results obtained on AFLW and 300W demonstrate the merits of the Wing loss function, and prove the superiority of the proposed method over the state-of-the-art approaches.
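
    The switch between a logarithmic piece inside (-w, w) and an L1 piece outside it can be sketched for a single residual as below. The curvature parameter `epsilon`, the continuity constant `c` and the default values are not stated in the abstract and should be read as assumptions of this sketch.

```python
import math


def wing_loss(x, w=10.0, epsilon=2.0):
    """Wing-style loss for one residual x = prediction - target.

    w bounds the logarithmic region; epsilon (assumed) controls its curvature.
    """
    abs_x = abs(x)
    if abs_x < w:
        # Modified logarithm: large gradient for small and medium errors.
        return w * math.log(1.0 + abs_x / epsilon)
    # L1 behaviour for large errors, offset so both pieces meet at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    return abs_x - c


if __name__ == "__main__":
    for residual in (0.0, 1.0, 5.0, 20.0):
        print(residual, wing_loss(residual), abs(residual))
```

    Printing the plain L1 value alongside shows that, under these assumed parameters, small residuals receive a larger penalty than L1 would give them.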

    G Hu, Fei Yan, Josef Kittler, William Christmas, Chi Ho Chan, Zhenhua Feng, Patrik Huber (2017) Efficient 3D Morphable Face Model Fitting, In: Pattern Recognition 67 pp. 366-379 Elsevier

    3D face reconstruction of shape and skin texture from a single 2D image can be performed using a 3D Morphable Model (3DMM) in an analysis-by-synthesis approach. However, performing this reconstruction (fitting) efficiently and accurately in a general imaging scenario is a challenge. Such a scenario would involve a perspective camera to describe the geometric projection from 3D to 2D, and the Phong model to characterise illumination. Under these imaging assumptions the reconstruction problem is nonlinear and, consequently, computationally very demanding. In this work, we present an efficient stepwise 3DMM-to-2D image-fitting procedure, which sequentially optimises the pose, shape, light direction, light strength and skin texture parameters in separate steps. By linearising each step of the fitting process we derive closed-form solutions for the recovery of the respective parameters, leading to efficient fitting. The proposed optimisation process involves all the pixels of the input image, rather than randomly selected subsets, which enhances the accuracy of the fitting. It is referred to as Efficient Stepwise Optimisation (ESO). The proposed fitting strategy is evaluated using reconstruction error as a performance measure. In addition, we demonstrate its merits in the context of a 3D-assisted 2D face recognition system which detects landmarks automatically and extracts both holistic and local features using a 3DMM. This contrasts with most other methods which only report results that use manual face landmarking to initialise the fitting. Our method is tested on the public CMU-PIE and Multi-PIE face databases, as well as one internal database. The experimental results show that the face reconstruction using ESO is significantly faster, and its accuracy is at least as good as that achieved by the existing 3DMM fitting algorithms. 
    A face recognition system integrating ESO to provide a pose and illumination invariant solution compares favourably with other state-of-the-art methods. In particular, it outperforms deep learning methods when tested on the Multi-PIE database.

    Z Huang, Zhenhua Feng, Fei Yan, Josef Kittler, X-J Wu (2018) Robust Pedestrian Detection for Semi-automatic Construction of A Crowded Person Re-Identification Dataset, In: LNCS 10945 pp. 63-72 Springer Verlag

    The problem of re-identification of people in a crowd commonly arises in real application scenarios, yet it has received less attention than it deserves. To facilitate research focusing on this problem, we have embarked on constructing a new person re-identification dataset with many instances of crowded indoor and outdoor scenes. This paper proposes a two-stage robust method for pedestrian detection in a complex crowded background to provide bounding box annotations. The first stage is to generate pedestrian proposals using Faster R-CNN and locate each pedestrian using Non-maximum Suppression (NMS). Candidates in dense proposal regions are merged to identify crowd patches. We then apply a bottom-up human pose estimation method to detect individual pedestrians in the crowd patches. The locations of all subjects are obtained based on the bounding boxes from the two stages. The identity of the detected subjects throughout each video is then automatically annotated using multiple features and spatial-temporal clues. The experimental results on a crowded pedestrians dataset demonstrate the effectiveness and efficiency of the proposed method.
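
    For readers unfamiliar with the NMS step mentioned in the first stage, a minimal greedy version is sketched below. It is a generic textbook formulation and not the paper's pipeline: the (x1, y1, x2, y2) box format, the function names and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices into boxes, highest score first


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(proposals, confidences))
```

    In a crowd, heavily overlapping people can be suppressed by this rule, which is consistent with the abstract's decision to handle dense regions separately with pose estimation.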

    Xiaoning Song, Yao Chen, Zhenhua Feng, Guosheng Hu, Dong-Jun Yu, Xiao-Jun Wu (2020) SP-GAN: Self-growing and Pruning Generative Adversarial Networks, In: IEEE Transactions on Neural Networks and Learning Systems Institute of Electrical and Electronics Engineers

    This paper presents a new Self-growing and Pruning Generative Adversarial Network (SP-GAN) for realistic image generation. In contrast to traditional GAN models, our SP-GAN is able to dynamically adjust the size and architecture of a network in the training stage, by using the proposed self-growing and pruning mechanisms. To be more specific, we first train two seed networks as the generator and discriminator, each containing only a small number of convolution kernels. Such small-scale networks are much easier and faster to train than large-capacity networks. Second, in the self-growing step, we replicate the convolution kernels of each seed network to augment the scale of the network, followed by fine-tuning the augmented/expanded network. More importantly, to prevent the excessive growth of each seed network in the self-growing stage, we propose a pruning strategy that reduces the redundancy of an augmented network, yielding the optimal scale of the network. Last, we design a new adaptive loss function that is treated as a variable loss computational process for the training of the proposed SP-GAN model. By design, the hyperparameters of the loss function can dynamically adapt to different training stages. Experimental results obtained on a set of datasets demonstrate the merits of the proposed method, especially in terms of the stability and efficiency of network training. The source code of the proposed SP-GAN method is publicly available at https://github.com/Lambert-chen/SPGAN.git.

    X Song, Z Feng, G Hu (2017) Half-Face Dictionary Integration for Representation-Based Classification, In: IEEE Transactions on Cybernetics 47(1) pp. 142-152 IEEE

    This paper presents a half-face dictionary integration (HFDI) algorithm for representation-based classification. The proposed HFDI algorithm measures residuals between an input signal and the reconstructed one, using both the original and the synthesized dual-column (row) half-face training samples. More specifically, we first generate a set of virtual half-face samples for the purpose of training data augmentation. The aim is to obtain high-fidelity collaborative representation of a test sample. In this half-face integrated dictionary, each original training vector is replaced by an integrated dual-column (row) half-face matrix. Second, to reduce the redundancy between the original dictionary and the extended half-face dictionary, we propose an elimination strategy to gain the most robust training atoms. The last contribution of the proposed HFDI method is the use of a competitive fusion method weighting the reconstruction residuals from different dictionaries for robust face classification. Experimental results obtained from the Facial Recognition Technology, Aleix and Robert, Georgia Tech, ORL, and Carnegie Mellon University-pose, illumination and expression data sets demonstrate the effectiveness of the proposed method, especially in the case of the small sample size problem.

    J Kittler, P Huber, Z Feng, G Hu, W Christmas (2016) 3D Morphable Face Models and Their Applications, In: Lecture Notes in Computer Science (LNCS) vol.9756: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, July 13-15, 2016, Proceedings 9756 pp. 185-206

    3D Morphable Face Models (3DMM) have been used in face recognition for some time now. They can be applied in their own right as a basis for 3D face recognition and analysis involving 3D face data. However their prevalent use over the last decade has been as a versatile tool in 2D face recognition to normalise pose, illumination and expression of 2D face images. A 3DMM has the generative capacity to augment the training and test databases for various 2D face processing related tasks. It can be used to expand the gallery set for pose-invariant face matching. For any 2D face image it can furnish complementary information, in terms of its 3D face shape and texture. It can also aid multiple frame fusion by providing the means of registering a set of 2D images. A key enabling technology for this versatility is 3D face model to 2D face image fitting. In this paper recent developments in 3D face modelling and model fitting will be overviewed, and their merits in the context of diverse applications illustrated on several examples, including pose and illumination invariant face recognition, and 3D face reconstruction from video.

    X Song, Zhen-Hua Feng, G Hunt, X Yang, J Yang, Y Qi (2015) Progressive sparse representation-based classification using local discrete cosine transform evaluation for image recognition, In: Journal of Electronic Imaging 24(5) SPIE

    This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal “nearest neighbors” for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B), and USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.
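
    The similarity measurement in the DCT domain can be illustrated with the toy sketch below, which transforms two flattened patches with an unnormalised 1-D DCT-II and compares the coefficient vectors by cosine similarity. How the paper defines local patches, which coefficients it retains and how it normalises them are not given in the abstract, so the function names and the optional `keep` parameter are hypothetical.

```python
import math


def dct2(signal):
    """Unnormalised 1-D DCT-II of a list of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def dct_similarity(patch_a, patch_b, keep=None):
    """Compare two flattened patches in the DCT domain.

    keep: optional number of low-frequency coefficients to retain (assumed option).
    """
    coeff_a, coeff_b = dct2(patch_a), dct2(patch_b)
    if keep is not None:
        coeff_a, coeff_b = coeff_a[:keep], coeff_b[:keep]
    return cosine_similarity(coeff_a, coeff_b)


if __name__ == "__main__":
    test_patch = [1.0, 2.0, 3.0, 4.0]
    train_patch = [2.0, 4.0, 6.0, 8.0]
    print(dct_similarity(test_patch, train_patch))
```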