Publications

W Dai, T Xu, Wenwu Wang (2012) Simultaneous codeword optimization (SimCO) for dictionary update and learning, In: IEEE Transactions on Signal Processing, 60(12), pp. 6340-6353

We consider the data-driven dictionary learning problem. The goal is to seek an over-complete dictionary from which every training signal can be best approximated by a linear combination of only a few codewords. This task is often achieved by iteratively executing two operations: sparse coding and dictionary update. The focus of this paper is on the dictionary update step, where the dictionary is optimized with a given sparsity pattern. We propose a novel framework where an arbitrary set of codewords and the corresponding sparse coefficients are simultaneously updated, hence the term simultaneous codeword optimization (SimCO). The SimCO formulation not only generalizes the benchmark mechanisms MOD and K-SVD, but also allows the discovery that singular points, rather than local minima, are the major bottleneck of dictionary update. To mitigate the problem caused by the singular points, regularized SimCO is proposed. First- and second-order optimization procedures are designed to solve regularized SimCO. Simulations show that regularization substantially improves the performance of dictionary learning. © 1991-2012 IEEE.
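
The dictionary-update idea can be illustrated with a minimal NumPy sketch, assuming a boolean sparsity pattern `support` and a ridge penalty `mu` as a simple stand-in for regularisation. For a fixed pattern the optimal coefficients follow from least squares, and all codewords then move together along the gradient of the resulting cost. The function names are hypothetical, and renormalising the codewords after the step is a simplification rather than the paper's exact update.

```python
import numpy as np


def sparse_coefficients(D, Y, support, mu=0.0):
    """Best coefficients X for a fixed sparsity pattern.

    support[k, i] is True when codeword k may be used for signal i.
    mu > 0 adds the ridge penalty mu * ||X||_F^2 (an assumed form of regularisation).
    """
    K, N = support.shape
    X = np.zeros((K, N))
    for i in range(N):
        idx = np.flatnonzero(support[:, i])
        if idx.size == 0:
            continue
        Ds = D[:, idx]
        # Normal equations restricted to the active codewords.
        G = Ds.T @ Ds + mu * np.eye(idx.size)
        X[idx, i] = np.linalg.solve(G, Ds.T @ Y[:, i])
    return X


def cost(D, Y, support, mu=0.0):
    """Representation error after optimising X on the given pattern."""
    X = sparse_coefficients(D, Y, support, mu)
    return np.sum((Y - D @ X) ** 2) + mu * np.sum(X ** 2)


def update_dictionary(D, Y, support, mu=0.0, step=1.0, max_halvings=40):
    """Move all codewords at once along the negative gradient, with backtracking."""
    f0 = cost(D, Y, support, mu)
    X = sparse_coefficients(D, Y, support, mu)
    # With X at its optimum, the gradient of the cost with respect to D is -2 (Y - D X) X^T.
    grad = -2.0 * (Y - D @ X) @ X.T
    for _ in range(max_halvings):
        D_new = D - step * grad
        # Renormalise codewords to unit length (a simplification of a manifold step).
        D_new = D_new / np.linalg.norm(D_new, axis=0, keepdims=True)
        if cost(D_new, Y, support, mu) < f0:
            return D_new
        step *= 0.5
    return D  # no decrease found: keep the current dictionary
```

The backtracking loop only accepts a step that lowers the cost, so a single call never worsens the fit; in a full learning loop it would alternate with a sparse-coding stage that updates `support`.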

W Dai, T Xu, Wenwu Wang (2012) Dictionary learning and update based on simultaneous codeword optimization (SimCO), In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 2037-2040

Dictionary learning aims to adapt elementary codewords directly from training data so that each training signal can be best approximated by a linear combination of only a few codewords. Following the commonly used two-stage iterative process of sparse coding and dictionary update, as in the MOD and K-SVD algorithms, we propose a novel framework that allows an arbitrary set of codewords and the corresponding sparse coefficients to be updated simultaneously, hence termed simultaneous codeword optimization (SimCO). Under this framework, we have developed two algorithms, namely the primitive and the regularized SimCO. Simulations are provided to show the advantages of our approach over the K-SVD algorithm in terms of both learning performance and running speed. © 2012 IEEE.

T Xu, W Wang (2010) A block-based compressed sensing method for underdetermined blind speech separation incorporating binary mask, In: Proceedings of 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2022-2025

A block-based compressed sensing approach coupled with binary time-frequency masking is presented for the underdetermined speech separation problem. The proposed algorithm consists of multiple steps. First, the mixed signals are segmented into a number of blocks. For each block, the unknown mixing matrix is estimated in the transform domain by a clustering algorithm. Using the estimated mixing matrix, the sources are recovered by a compressed sensing approach. The coarsely separated sources are then used to estimate the time-frequency binary masks, which are further applied to enhance the separation performance. The separated source components from all the blocks are concatenated to reconstruct the whole signal. Numerical experiments are provided to show the improved separation performance of the proposed algorithm, as compared with two recent approaches. The block-based operation has the advantage of considerably improving the computational efficiency of the compressed sensing algorithm without degrading its separation performance.
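
The block-wise recovery and masking stages can be sketched as follows, assuming real-valued coefficients with at most `k` active sources per point. Orthogonal matching pursuit (OMP) is swapped in here as a generic compressed sensing solver, `estimate_mixing` is a hypothetical per-block estimator supplied by the caller, and the binary mask is applied to the coarse estimates for simplicity; the paper's solver and masking rule may differ.

```python
import numpy as np


def omp(A, x, k):
    """Orthogonal matching pursuit: find a k-sparse s with A s close to x."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated column
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        residual = x - A[:, support] @ coef
    s = np.zeros(A.shape[1])
    s[support] = coef
    return s


def separate_blocks(X, estimate_mixing, block_len, k=1):
    """Recover sources block by block, then keep only the dominant source per point."""
    blocks = []
    for start in range(0, X.shape[1], block_len):
        Xb = X[:, start:start + block_len]
        A = estimate_mixing(Xb)  # mixing matrix estimated separately for each block
        blocks.append(np.column_stack([omp(A, Xb[:, t], k) for t in range(Xb.shape[1])]))
    S = np.concatenate(blocks, axis=1)  # concatenate the block estimates
    # Binary mask: each point is assigned to the source with the largest magnitude.
    mask = np.zeros(S.shape, dtype=bool)
    mask[np.argmax(np.abs(S), axis=0), np.arange(S.shape[1])] = True
    return S * mask, mask
```

Processing short blocks keeps each recovery problem small, which is where the computational saving described above comes from.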

T Xu, W Wang (2009) A compressed sensing approach for underdetermined blind audio source separation with sparse representation, In: IEEE Workshop on Statistical Signal Processing Proceedings, pp. 493-496

The problem of underdetermined blind audio source separation is usually addressed under the framework of sparse signal representation. In this paper, we develop a novel algorithm for this problem based on compressed sensing, an emerging technique for efficient data reconstruction. The proposed algorithm consists of two stages. In the first stage, as in many existing methods, the unknown mixing matrix is estimated from the audio mixtures in the transform domain by a K-means clustering algorithm. Unlike conventional approaches, in the second stage the sources are recovered using a compressed sensing approach. This is motivated by the similarity between the mathematical models adopted in compressed sensing and source separation. Numerical experiments, including a comparison with a recent sparse representation approach, are provided to show the good performance of the proposed method.
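
A minimal sketch of the first stage for two mixtures, assuming the sources are sparse enough that most points are dominated by a single source: each point's direction is folded into [0, π) and clustered with a plain 1-D K-means. The function name and the quantile initialisation are assumptions, not the paper's procedure, and in practice `X` would hold transform-domain coefficients rather than raw samples.

```python
import numpy as np


def estimate_mixing_matrix(X, n_sources, n_iter=100):
    """Estimate a 2 x n_sources mixing matrix from two mixtures by clustering directions."""
    energy = np.linalg.norm(X, axis=0)
    Xs = X[:, energy > 1e-3 * energy.max()]  # drop near-silent points
    # Direction of each point, folded to [0, pi) so that x and -x coincide.
    theta = np.mod(np.arctan2(Xs[1], Xs[0]), np.pi)
    # 1-D K-means on the angles, initialised at evenly spaced quantiles.
    centers = np.quantile(theta, (np.arange(n_sources) + 0.5) / n_sources)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
        new = np.array([theta[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(n_sources)])
        if np.allclose(new, centers):
            break
        centers = new
    centers = np.sort(centers)
    return np.vstack([np.cos(centers), np.sin(centers)])  # unit-norm columns
```

Because angles are folded into [0, π), a mixing direction very close to 0 or π may be split across the wrap-around; a circular distance would avoid this.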

T Xu, W Wang (2011) Methods for learning adaptive dictionary in underdetermined speech separation, In: Proceedings of MLSP 2011, pp. 1-6

Underdetermined speech separation is a challenging problem that has been studied extensively in recent years. A promising method for this problem is based on the so-called sparse signal representation. Using this technique, we have recently developed a multi-stage algorithm, where the source signals are recovered using a pre-defined dictionary obtained by, e.g., the discrete cosine transform (DCT). In this paper, instead of using the pre-defined dictionary, we present three methods for learning adaptive dictionaries for the reconstruction of source signals, and compare their performance with several state-of-the-art speech separation methods. © 2011 IEEE.

Qiang Qian, Xiao-Jun Wu, Josef Kittler, Tianyang Xu (2020) Correlation tracking with implicitly extending search region, In: Visual Computer, Springer

Correlation filters have recently been applied successfully to visual tracking, but the boundary effect severely restricts their tracking performance. To overcome this problem, we propose a correlation tracking framework with implicitly extending search region (TESR) that does not introduce background noise. The proposed tracking method is a two-stage detection framework. To implicitly extend the search region of the correlation tracker, we first add four further search centers in addition to the original one, which is given by the target location in the previous frame, so TESR generates five potential object locations in total, one per search center. Then, an SVM classifier is used to determine the correct target position. We also use the salient object detection score to regularize the output of the SVM classifier and improve its performance. The experimental results demonstrate that TESR performs better than the state-of-the-art trackers.

Xue-Feng Zhu, Xiao-Jun Wu, Tianyang Xu, Zhenhua Feng, Josef Kittler (2020) Complementary Discriminative Correlation Filters Based on Collaborative Representation for Visual Object Tracking, In: IEEE Transactions on Circuits and Systems for Video Technology, Institute of Electrical and Electronics Engineers

In recent years, discriminative correlation filter (DCF) based algorithms have significantly advanced the state of the art in visual object tracking. The key to the success of DCF is an efficient discriminative regression model trained with powerful multi-cue features, including both hand-crafted and deep neural network features. However, the tracking performance is hindered by the inability of these models to respond adequately to abrupt target appearance variations. This issue stems from the limited representation capability of fixed image features. In this work, we set out to rectify this shortcoming by proposing a complementary representation of visual content. Specifically, we propose the use of a collaborative representation between successive frames to extract the dynamic appearance information from a target with rapid appearance changes, which suppresses the undesirable impact of the background. The resulting collaborative representation coefficients are combined with the original feature maps using a spatially regularised DCF framework for performance boosting. The experimental results on several benchmarking datasets demonstrate the effectiveness and robustness of the proposed method, as compared with a number of state-of-the-art tracking algorithms.
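
Collaborative representation usually refers to coding a sample over all dictionary atoms jointly with a squared l2 penalty, which has a closed-form solution. The sketch below shows that generic computation under hypothetical inputs: `D` holds feature vectors from the previous frame as columns and `y` is a feature vector from the current frame. How the paper builds these inputs and fuses the coefficients with the feature maps is not reproduced.

```python
import numpy as np


def collaborative_representation(D, y, lam=0.1):
    """Code y over all columns of D jointly with a ridge penalty.

    Solves min_a ||y - D a||^2 + lam * ||a||^2, i.e. a = (D^T D + lam I)^{-1} D^T y.
    """
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ y)
```

Larger `lam` shrinks the coefficients towards zero, trading reconstruction accuracy for stability.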

Tianyang Xu, Xiao-Jun Wu, Josef Kittler (2018) Non-negative Subspace Representation Learning Scheme for Correlation Filter Based Tracking, In: 2018 24th International Conference on Pattern Recognition (ICPR), 2018-, pp. 1888-1893, IEEE

Discriminative correlation filter (DCF) based tracking methods have achieved great success recently. However, the temporal learning scheme in the current paradigm takes a linear recursive form determined by a fixed learning rate, which cannot adapt to appearance variations. In this paper, we propose a unified non-negative subspace representation constrained learning scheme for DCF. The subspace is constructed from several templates with auxiliary memory mechanisms. The current template is then projected onto the subspace to find its non-negative representation and to determine the corresponding template weights. Our learning scheme enables an efficient combination of the correlation filter and the subspace structure. The experimental results on OTB50 demonstrate the effectiveness of our learning formulation.
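
The projection step can be sketched as a non-negative least-squares problem solved by projected gradient descent, assuming the stored templates are vectorised into the columns of `T`. Normalising the weights to sum to one before blending the stored filters is an added assumption, and both helper names are hypothetical.

```python
import numpy as np


def nonnegative_weights(T, t, n_iter=500):
    """Solve min_w 0.5 * ||t - T w||^2 subject to w >= 0 by projected gradient.

    T: (d, m) matrix whose columns are vectorised stored templates.
    t: vectorised current template of length d.
    """
    L = np.linalg.norm(T, 2) ** 2  # Lipschitz constant of the gradient
    w = np.zeros(T.shape[1])
    for _ in range(n_iter):
        grad = T.T @ (T @ w - t)
        w = np.maximum(w - grad / L, 0.0)  # gradient step, then project onto w >= 0
    return w


def combine_filters(filters, w):
    """Blend stored filters (shape (m, ...)) with weights normalised to sum to one."""
    total = w.sum()
    weights = w / total if total > 0 else np.full(len(w), 1.0 / len(w))
    return np.tensordot(weights, filters, axes=1)
```

Unlike a fixed learning rate, the weights here change with how well each stored template explains the current appearance.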

Tianyang Xu, Zhenhua Feng, Xiao-Jun Wu, Josef Kittler (2021) Adaptive Channel Selection for Robust Visual Object Tracking with Discriminative Correlation Filters, In: International Journal of Computer Vision, 129(5), pp. 1359-1375, Springer Nature

Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the accuracy of DCF and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net inducing independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method, as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using fewer than 10% of the deep feature channels.
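
Channel selection of this kind is typically driven by a group-sparsity proximal step inside an augmented Lagrangian (ADMM-style) solver. The sketch below shows the proximal operator of a plain, non-adaptive group elastic net with one group per channel; the adaptive weights and the temporal smoothness term described above are not included, and the function name is hypothetical.

```python
import numpy as np


def channel_group_elastic_net_prox(F, lam1, lam2):
    """Proximal operator of sum_c (lam1 * ||F_c|| + lam2 / 2 * ||F_c||^2).

    F has shape (height, width, channels); each channel F[..., c] is one group.
    Solution per channel: F_c * max(0, 1 - lam1 / ||F_c||) / (1 + lam2).
    """
    norms = np.sqrt(np.sum(F ** 2, axis=(0, 1)))  # one l2 norm per channel
    shrink = np.maximum(0.0, 1.0 - lam1 / np.maximum(norms, 1e-12))
    return F * (shrink / (1.0 + lam2))  # broadcasts over the channel axis
```

Channels whose norm falls below `lam1` are zeroed entirely, which is what turns the penalty into a selection mechanism; the `lam2` term only rescales the surviving channels.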

Yahang Wang, Xiaoning Song, Tianyang Xu, Zhenhua Feng, Xiao-Jun Wu (2021) From RGB to Depth: Domain Transfer Network for Face Anti-Spoofing, In: IEEE Transactions on Information Forensics and Security, IEEE

With the rapid development of face recognition, most existing systems can perform very well in unconstrained scenarios. However, detecting face spoofing attacks is still a very challenging task, so face anti-spoofing has become one of the most important research topics in the community. Although various anti-spoofing models have been proposed, their generalisation capability usually degrades for unseen attacks in the presence of challenging appearance variations, e.g., background, illumination, diverse spoofing materials and low image quality. To address this issue, we propose to use a Generative Adversarial Network (GAN) that transfers an input face image from the RGB domain to the depth domain. The generated depth clue enables biometric preservation against challenging appearance variations and diverse image qualities. To be more specific, the proposed method has two main stages. The first is a GAN-based domain transfer module that converts an input image to its corresponding depth map. By design, a live face image should be transferred to a depth map, whereas a spoofing face image should be transferred to a plain (black) image. The aim is to improve the discriminative capability of the proposed system. The second stage is a classification model that determines whether an input face image is live or spoofing. Benefiting from the GAN-based domain transfer module, the latent variables can effectively represent the depth information, complementarily enhancing the discrimination of the original RGB features. The experimental results obtained on several benchmarking datasets demonstrate the effectiveness of the proposed method, with superior performance over the state-of-the-art methods. The source code of the proposed method is publicly available at https://github.com/coderwangson/DFA. Index Terms: Face anti-spoofing, generative adversarial network, domain transfer.

Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2020) An accelerated correlation filter tracker, In: Pattern Recognition, 102, 107172, Elsevier Ltd

• A formulation of the DCF design problem which focuses on informative feature channels and spatial structures by means of novel regularisation.
• A proposed relaxed optimisation algorithm referred to as R_A-ADMM for optimising the regularised DCF. In contrast with the standard ADMM, the algorithm achieves a better convergence rate.
• A temporal smoothness constraint, implemented by an adaptive initialisation mechanism, to achieve a further speed-up via transfer learning among video frames.
• The proposed adoption of AlexNet to construct a light-weight deep representation with a tracking accuracy comparable to more complicated deep networks, such as VGG and ResNet.
• An extensive evaluation of the proposed methodology on several well-known visual object tracking datasets, with the results confirming the acceleration gains for the regularised DCF paradigm.
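
R_A-ADMM itself is not reproduced here. As a hedged illustration of the relaxation idea only, the sketch applies standard over-relaxed ADMM to a small lasso problem: the x-update output is blended with the previous z before the shrinkage step, and `alpha = 1` recovers standard ADMM. Problem sizes, `rho` and `alpha` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise shrinkage, the proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def relaxed_admm_lasso(A, b, lam, rho=1.0, alpha=1.6, n_iter=500):
    """Over-relaxed ADMM for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    alpha = 1 gives standard ADMM; alpha in (1, 2) is over-relaxation.
    """
    n = A.shape[1]
    Atb = A.T @ b
    M = A.T @ A + rho * np.eye(n)  # fixed across iterations (could be factorised once)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        x = np.linalg.solve(M, Atb + rho * (z - u))
        x_hat = alpha * x + (1.0 - alpha) * z  # relaxation step
        z = soft_threshold(x_hat + u, lam / rho)
        u = u + x_hat - z  # scaled dual update
    return z
```

Both settings of `alpha` converge to the same minimiser; over-relaxation typically only changes how many iterations that takes.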

Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking, In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7949-7959, IEEE

We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions, thereby pinpointing the structural relevance of multi-channel features to the filtering system. In contrast to the widely used spatial regularisation or feature selection methods, to the best of our knowledge, this is the first time that channel selection has been advocated for DCF-based tracking. We demonstrate that our GFS-DCF method is able to significantly improve the performance of a DCF tracker equipped with deep neural network features. In addition, our GFS-DCF enables joint feature selection and filter learning, achieving enhanced discrimination and interpretability of the learned filters. To further improve the performance, we adaptively integrate historical information by constraining filters to be smooth across temporal frames, using an efficient low-rank approximation. By design, specific temporal-spatial-channel configurations are dynamically learned in the tracking process, highlighting the relevant features, alleviating the performance-degrading impact of less discriminative representations and reducing information redundancy. The experimental results obtained on OTB2013, OTB2015, VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its superiority over the state-of-the-art trackers. The code is publicly available at https://github.com/XU-TIANYANG/GFS-DCF.

Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Learning Adaptive Discriminative Correlation Filters via Temporal Consistency preserving Spatial Feature Selection for Robust Visual Object Tracking, In: IEEE Transactions on Image Processing, 28(11), pp. 5596-5609, Institute of Electrical and Electronics Engineers (IEEE)

With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower-dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Finally, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
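
For a single feature channel, a ridge-regression DCF with an added term that keeps the filter close to its previous value has a per-frequency closed form. The sketch below assumes that simplified objective (no spatial lasso selection, no multi-channel coupling, no augmented Lagrangian solver), so it only illustrates how a temporal term enters the solution; `mu = 0` gives the classical DCF, and all function names are hypothetical.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at the patch centre."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    return np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))


def train_filter(x, y, lam=1e-2, mu=0.0, prev=None):
    """Closed-form single-channel DCF in the Fourier domain.

    Per frequency, minimises |Y - X Hc|^2 + lam |Hc|^2 + mu |Hc - Hc_prev|^2,
    giving Hc = (Y conj(X) + mu Hc_prev) / (|X|^2 + lam + mu).
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    prev = np.zeros_like(X) if prev is None else prev
    return (Y * np.conj(X) + mu * prev) / (X * np.conj(X) + lam + mu)


def detect(Hc, z):
    """Correlation response on a new patch z; its peak gives the target position."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * Hc))
    return tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
```

Because the solution is element-wise in the Fourier domain, training and detection cost only a few FFTs; a large `mu` makes the filter stick to its history, which is the trade-off the temporal constraint controls.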

T Xu, W Wang (2010) Learning Dictionary for Underdetermined Blind Speech Separation Based on Compressed Sensing Method, In: Proc. INSPIRE Conference on Information Representation and Estimation

T Xu, W Wang, W Dai (2013) Sparse coding with adaptive dictionary learning for underdetermined blind speech separation, In: Speech Communication, 55(3), pp. 432-450

A block-based approach coupled with adaptive dictionary learning is presented for underdetermined blind speech separation. The proposed algorithm, derived as a multi-stage method, is established by reformulating the underdetermined blind source separation problem as a sparse coding problem. First, the mixing matrix is estimated in the transform domain by a clustering algorithm. Then a dictionary is learned by an adaptive learning algorithm, for which three algorithms have been tested, including the simultaneous codeword optimization (SimCO) technique that we have proposed recently. Using the estimated mixing matrix and the learned dictionary, the sources are recovered from the blocked mixtures by a signal recovery approach. The separated source components from all the blocks are concatenated to reconstruct the whole signal. The block-based operation has the advantage of considerably improving the computational efficiency of the source recovery process without degrading its separation performance. Numerical experiments are provided to show the competitive separation performance of the proposed algorithm, as compared with the state-of-the-art approaches. Using mutual coherence and the sparsity index, we compare and analyze the performance of a variety of dictionaries applied in underdetermined speech separation, including dictionaries learned from speech mixtures and from ground-truth speech sources, as well as those predefined by mathematical transforms such as the discrete cosine transform (DCT) and the short-time Fourier transform (STFT). © 2012 Elsevier B.V. All rights reserved.
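
Mutual coherence, one of the two measures named above, is the largest absolute normalised inner product between two distinct atoms of a dictionary. A minimal sketch with an orthonormal DCT-II dictionary for comparison; the sparsity index used in the paper is not reproduced, and the helper names are hypothetical.

```python
import numpy as np


def mutual_coherence(D):
    """Largest absolute normalised inner product between two distinct atoms (columns)."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)  # ignore each atom's product with itself
    return G.max()


def dct_dictionary(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)  # scale the constant atom to unit norm
    return C.T
```

An orthonormal basis has coherence zero, while concatenating the identity and DCT bases of size 8 gives 0.5·cos(π/16) ≈ 0.49; lower coherence generally makes sparse recovery over the dictionary better conditioned.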

Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler (2019) Learning Low-rank and Sparse Discriminative Correlation Filters for Coarse-to-Fine Visual Object Tracking, In: IEEE Transactions on Circuits and Systems for Video Technology, pp. 1-13, Institute of Electrical and Electronics Engineers (IEEE)

Discriminative correlation filter (DCF) has achieved advanced performance in visual object tracking with remarkable efficiency guaranteed by its implementation in the frequency domain. However, the structural relationship between DCF and object features has not been adequately explored in the context of filter design. To remedy this deficiency, this paper proposes a Low-rank and Sparse DCF (LSDCF) that improves the relevance of features used by discriminative filters. To be more specific, we extend the classical DCF paradigm from ridge regression to lasso regression, and constrain the estimate to be of low rank across frames, thus identifying and retaining the informative filters distributed on a low-dimensional manifold. To this end, specific temporal-spatial-channel configurations are adaptively learned to achieve enhanced discrimination and interpretability. In addition, we analyse the complementary characteristics of hand-crafted features and deep features, and propose a coarse-to-fine heuristic tracking strategy to further improve the performance of our LSDCF. Finally, the augmented Lagrange multiplier optimisation method is used to achieve efficient optimisation. The experimental results obtained on a number of well-known benchmarking datasets, including OTB2013, OTB50, OTB100, TC128, UAV123, VOT2016 and VOT2018, demonstrate the effectiveness and robustness of the proposed method, delivering outstanding performance compared to the state-of-the-art trackers.
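
The low-rank constraint across frames can be illustrated with singular value thresholding, the proximal operator of the nuclear norm, which commonly appears as one sub-step of augmented Lagrangian solvers. In this sketch, filters from recent frames are stacked as columns of a matrix; this stacking and the helper names are assumptions, and the lasso part of LSDCF is not shown.

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Proximal operator of tau * ||M||_* (nuclear norm): shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # scale the columns of U, then recombine


def low_rank_filters(filters, tau):
    """Stack filters from recent frames as columns and pull them towards a low-rank set."""
    shape = filters[0].shape
    M = np.stack([f.ravel() for f in filters], axis=1)  # (n_elements, n_frames)
    L = singular_value_threshold(M, tau)
    return [L[:, j].reshape(shape) for j in range(len(filters))]
```

Small singular values, which here correspond to frame-to-frame fluctuations, are removed entirely, while the dominant shared structure is kept with a slight shrinkage.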

X Zhao, G Zhou, W Dai, T Xu, W Wang (2013) Joint image separation and dictionary learning, In: 2013 18th International Conference on Digital Signal Processing, DSP 2013

Blind source separation (BSS) aims to estimate unknown sources from their mixtures. Methods to address this include the benchmark ICA, SCA, MMCA and, more recently, a dictionary learning based algorithm, BMMCA. In this paper, we solve the separation problem by using the recently proposed SimCO optimization framework. Our approach not only allows us to unify the two sub-problems emerging in the separation problem, but also mitigates the singularity issue reported in the dictionary learning literature. Another unique feature is that only one dictionary is used to sparsely represent the source signals, whereas the literature typically assumes multiple dictionaries (one dictionary per source). Numerical experiments are performed and the results show that our scheme significantly improves the performance, especially in terms of the accuracy of the mixing matrix estimation. © 2013 IEEE.