Publications

Shi Zhiyuan, Yang Yongxin, Hospedales Timothy M, Xiang Tao (2014) Weakly supervised learning of objects, attributes and their associations, Lecture Notes in Computer Science 8690 (Part 2) pp. 472-487 Springer Verlag
When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes and their associations. Conventional methods require strong annotation of object and attribute locations, making them less scalable. In this paper, we model object-attribute associations from weakly labelled images, such as those widely available on media sharing sites (e.g. Flickr), where only image-level labels (either object or attributes) are given, without their locations and associations. This is achieved by introducing a novel weakly supervised non-parametric Bayesian model. Once learned, given a new image, our model can describe the image, including objects, attributes and their associations, as well as their locations and segmentation. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model performs on par with strongly supervised models on tasks such as image description and retrieval based on object-attribute associations. © 2014 Springer International Publishing.
Fu Yanwei, Yang Yongxin, Hospedales Timothy, Xiang Tao, Gong Shaogang (2014) Transductive multi-label zero-shot learning, Proceedings of the British Machine Vision Conference 2014 BMVA Press
Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large scale recognition problems. These methods have achieved great success via learning intermediate semantic representations in the form of attributes and more recently, semantic word vectors. However, they have thus far been constrained to the single-label case, in contrast to the growing popularity and importance of more realistic multi-label data. In this paper, for the first time, we investigate and formalise a general framework for multi-label zero-shot learning, addressing the unique challenge therein: how to exploit multi-label correlation at test time with no training data for those classes? In particular, we propose (1) a multi-output deep regression model to project an image into a semantic word space, which explicitly exploits the correlations in the intermediate semantic layer of word vectors; (2) a novel zero-shot learning algorithm for multi-label data that exploits the unique compositionality property of semantic word vector representations; and (3) a transductive learning strategy to enable the regression model learned from seen classes to generalise well to unseen classes. Our zero-shot learning experiments on a number of standard multi-label datasets demonstrate that our method outperforms a variety of baselines. © 2014. The copyright of this document resides with its authors.
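The compositionality idea in contribution (2) can be illustrated with a minimal sketch: an image is regressed into the word-vector space and a candidate multi-label set is scored against the sum of its labels' word vectors. Everything below (the vocabulary, dimensions and linear regressor) is an illustrative assumption, not the authors' code.

```python
import numpy as np

# Hypothetical 300-d word vectors for a small label vocabulary.
word_vec = {
    "person": np.random.randn(300),
    "dog":    np.random.randn(300),
    "ball":   np.random.randn(300),
}

def embed_image(image_features, W):
    """Project image features into the semantic word space.
    A linear stand-in for the paper's multi-output deep regression model."""
    return W @ image_features

def score_label_set(image_embedding, labels):
    """Score a candidate multi-label set by cosine similarity between the image
    embedding and the SUM of the labels' word vectors, i.e. the additive
    compositionality of word embeddings."""
    composite = sum(word_vec[l] for l in labels)
    return float(image_embedding @ composite /
                 (np.linalg.norm(image_embedding) * np.linalg.norm(composite) + 1e-8))

# Toy usage: pick the best candidate label set for one image.
x = np.random.randn(4096)                 # e.g. CNN image features
W = np.random.randn(300, 4096) * 0.01     # regressor weights (would be learned)
z = embed_image(x, W)
candidates = [("person",), ("dog", "ball"), ("person", "dog")]
best = max(candidates, key=lambda ls: score_label_set(z, ls))
```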
Hu Guosheng, Yang Yongxin, Yi Dong, Kittler Josef, Christmas William, Li Stan, Hospedales Timothy (2015) When Face Recognition Meets with Deep Learning: An Evaluation of Convolutional Neural Networks for Face Recognition, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW) pp. 384-392
Deep learning, in particular the Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, it remains an open question why CNNs work well and how to design a 'good' architecture. Existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigating the reasons. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use the public database LFW (Labeled Faces in the Wild) to train CNNs, unlike most existing CNNs trained on private databases. We propose three CNN architectures which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluates the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, a traditional metric learning method exploiting CNN-learned features is evaluated. Experiments show that two crucial factors for good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available.
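Two of the findings (the learned features tolerate aggressive dimensionality reduction, and a simple metric on top of them suffices for verification) can be sketched as follows. This assumes pre-extracted CNN face descriptors; the 128-d target and the cosine threshold are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assume `features` holds CNN face descriptors, one row per image (e.g. 1024-d).
features = np.random.randn(1000, 1024)

# Reduce dimensionality; the paper reports this can be done without hurting
# accuracy (the 128 components here are an illustrative choice).
pca = PCA(n_components=128, whiten=True).fit(features)
reduced = pca.transform(features)

def verify(f1, f2, threshold=0.5):
    """Face verification by cosine similarity of reduced descriptors;
    a simple stand-in for the metric-learning step evaluated in the paper."""
    sim = f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-8)
    return sim > threshold

same_person = verify(reduced[0], reduced[1])
```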
Yu Qian, Yang Yongxin, Liu Feng, Song Yi-Zhe, Xiang Tao, Hospedales Timothy M. (2016) Sketch-a-Net: A Deep Neural Network that Beats Humans, International Journal of Computer Vision 122 (3) pp. 411-425 Springer New York LLC
We propose a deep learning approach to free-hand sketch recognition that achieves state-of-the-art performance, significantly surpassing that of humans. Our superior performance is a result of modelling and exploiting the unique characteristics of free-hand sketches, i.e., consisting of an ordered set of strokes but lacking visual cues such as colour and texture, being highly iconic and abstract, and exhibiting extremely large appearance variations due to different levels of abstraction and deformation. Specifically, our deep neural network, termed Sketch-a-Net, has the following novel components: (i) we propose a network architecture designed for sketch rather than natural photo statistics; (ii) two novel data augmentation strategies are developed which exploit the unique sketch-domain properties to modify and synthesise sketch training data at multiple abstraction levels. Based on this idea we are able to both significantly increase the volume and diversity of sketches for training, and address the challenge of varying levels of sketching detail commonplace in free-hand sketches; (iii) we explore different network ensemble fusion strategies, including a re-purposed joint Bayesian scheme, to further improve recognition performance. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Furthermore, through visualising the learned filters, we offer useful insights into where the superior performance of our network comes from. © 2016, Springer Science+Business Media New York.
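A toy sketch of the abstraction-level augmentation idea in (ii): drop later, typically detail-bearing strokes from an ordered stroke list to synthesise coarser versions of a sketch. The keep ratio and random-retention heuristic below are illustrative assumptions, not the paper's exact procedure.

```python
import random

def remove_detail_strokes(strokes, keep_ratio=0.7):
    """Toy abstraction-level augmentation: keep the earlier strokes (assumed to
    carry the coarse outline) and drop most of the later ones. `strokes` is an
    ordered list of point sequences; parameters here are illustrative only."""
    n_keep = max(1, int(len(strokes) * keep_ratio))
    kept = list(strokes[:n_keep])
    # Retain a few later strokes at random so variation is not pure truncation.
    for stroke in strokes[n_keep:]:
        if random.random() < 0.2:
            kept.append(stroke)
    return kept

# Usage: augment one sketch (each stroke is a list of (x, y) points).
sketch = [[(0, 0), (10, 0)], [(10, 0), (10, 10)], [(2, 2), (3, 3)]]
augmented = remove_detail_strokes(sketch)
```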
Song Mingying, Karatutlu Ali, Ali Isma, Ersoy Osman, Zhou Yun, Yang Yongxin, Zhang Yuanpeng, Little William R., Wheeler Ann P., Sapelkin Andrei V. (2017) Spectroscopic super-resolution fluorescence cell imaging using ultra-small Ge quantum dots, Optics Express 25 (4) pp. 4240-4253 Optical Society of America
We demonstrate a spectroscopic imaging based super-resolution approach by separating the overlapping diffraction spots into several detectors during a single scanning period and taking advantage of the size-dependent emission wavelength in nanoparticles. This approach has been tested using off-the-shelf quantum dots (Invitrogen Qdot) and novel in-house ultra-small (~3 nm) Ge QDs. Furthermore, we developed a method-specific Gaussian fitting and maximum likelihood estimation based Matlab algorithm for fast QD localisation. This methodology results in a three-fold improvement in the number of localised QDs compared to non-spectroscopic images. With the addition of advanced ultra-small Ge probes, the number can be improved even further, giving at least a 1.5 times improvement compared to Qdots. Using a standard scanning confocal microscope we achieved a data acquisition rate of 200 ms per image frame. This is an improvement on single-molecule localisation super-resolution microscopy, where repeated image capture limits the imaging speed and the size of the fluorescence probes limits the possible theoretical localisation resolution. We show that our spectral deconvolution approach has the potential to deliver data acquisition rates on the ms scale, thus providing super-resolution in live systems. © 2017, OSA - The Optical Society. All rights reserved.
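The localisation step (fitting a diffraction-limited spot to recover a sub-pixel emitter position) can be sketched as follows. This is an illustrative least-squares analogue in Python of the Matlab Gaussian-fitting/maximum-likelihood routine described in the paper, with all parameter values assumed.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(coords, x0, y0, sigma, amplitude, offset):
    """Isotropic 2-D Gaussian spot model used to localise a single emitter."""
    x, y = coords
    return offset + amplitude * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

def localise_spot(patch):
    """Fit a 2-D Gaussian to a small image patch and return the sub-pixel centre."""
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    p0 = (w / 2, h / 2, 1.5, patch.max() - patch.min(), patch.min())
    popt, _ = curve_fit(gaussian_2d, (x.ravel(), y.ravel()), patch.ravel(), p0=p0)
    return popt[0], popt[1]  # sub-pixel (x, y) position

# Usage: localise a synthetic spot in a 15x15 patch.
yy, xx = np.mgrid[0:15, 0:15]
patch = gaussian_2d((xx, yy), 7.3, 6.8, 1.5, 100.0, 5.0)
print(localise_spot(patch))  # approximately (7.3, 6.8)
```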
Pang Kaiyue, Li Ke, Yang Yongxin, Zhang Honggang, Hospedales Timothy M., Xiang Tao, Song Yi-Zhe (2019) Generalising Fine-Grained Sketch-Based Image Retrieval, Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) pp. 677-686 Institute of Electrical and Electronics Engineers (IEEE)
Fine-grained sketch-based image retrieval (FG-SBIR) addresses matching a specific photo instance using a free-hand sketch as the query modality. Existing models aim to learn an embedding space in which sketch and photo can be directly compared. While successful, they require instance-level pairing within each coarse-grained category as annotated training data. Since the learned embedding space is domain-specific, these models do not generalise well across categories. This limits the practical applicability of FG-SBIR. In this paper, we identify cross-category generalisation for FG-SBIR as a domain generalisation problem, and propose the first solution. Our key contribution is a novel unsupervised learning approach to model a universal manifold of prototypical visual sketch traits. This manifold can then be used to parameterise the learning of a sketch/photo representation. Model adaptation to novel categories then becomes automatic via embedding the novel sketch in the manifold and updating the representation and retrieval function accordingly. Experiments on the two largest FG-SBIR datasets, Sketchy and QMUL-Shoe-V2, demonstrate the efficacy of our approach in enabling cross-category generalisation of FG-SBIR.
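A very rough stand-in for the prototypical-trait idea: cluster sketch descriptors pooled across categories and describe a novel sketch by its soft assignment to the cluster centres, which could then parameterise a category-agnostic representation. The clustering, feature dimensions and temperature below are assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume `sketch_feats` are descriptors of training sketches pooled across
# many categories (one row per sketch); random data stands in here.
sketch_feats = np.random.randn(5000, 256)
prototypes = KMeans(n_clusters=32, n_init=10).fit(sketch_feats)

def trait_descriptor(feat, temperature=1.0):
    """Soft-assign a sketch descriptor to the learned prototypes; the resulting
    weights could parameterise a category-agnostic sketch/photo embedding."""
    d = np.linalg.norm(prototypes.cluster_centers_ - feat, axis=1)
    w = np.exp(-d / temperature)
    return w / w.sum()

weights = trait_descriptor(np.random.randn(256))
```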
Song Jifei, Yang Yongxin, Song Yi-Zhe, Xiang Tao, Hospedales Timothy M. (2019) Generalizable Person Re-identification by Domain-Invariant Mapping Network, Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) pp. 719-728 Institute of Electrical and Electronics Engineers (IEEE)
We aim to learn a domain generalizable person re-identification (ReID) model. When such a model is trained on a set of source domains (ReID datasets collected from different camera networks), it can be directly applied to any new unseen dataset for effective ReID without any model updating. Despite its practical value in real-world deployments, generalizable ReID has seldom been studied. In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. DIMN is designed to learn a mapping between a person image and its identity classifier, i.e., it produces a classifier using a single shot. To make the model domain-invariant, we follow a meta-learning pipeline and sample a subset of source domain training tasks during each training episode. However, the model is significantly different from conventional meta-learning methods in that: (1) no model updating is required for the target domain, (2) different training tasks share a memory bank for maintaining both scalability and discrimination ability, and (3) it can be used to match an arbitrary number of identities in a target domain. Extensive experiments on a newly proposed large-scale ReID domain generalization benchmark show that our DIMN significantly outperforms alternative domain generalization or meta-learning methods.
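The core single-shot mapping in DIMN (gallery image in, identity classifier out, no updating on the target domain) can be caricatured as a tiny hypernetwork-style module. The layer sizes, linear encoder and cosine matching below are illustrative assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierGenerator(nn.Module):
    """Toy analogue of the image-to-classifier mapping: given one gallery
    image's feature, produce that identity's classifier weights in one shot."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.encoder = nn.Linear(2048, feat_dim)    # stands in for a CNN backbone
        self.mapper = nn.Linear(feat_dim, feat_dim)  # feature -> classifier weights

    def forward(self, gallery_feat, probe_feat):
        w = self.mapper(self.encoder(gallery_feat))  # generated classifier
        p = self.encoder(probe_feat)
        # Cosine matching score between probe and generated classifier.
        return (F.normalize(w, dim=-1) * F.normalize(p, dim=-1)).sum(-1)

# Usage: score one probe image against three gallery identities.
model = ClassifierGenerator()
gallery = torch.randn(3, 2048)   # one image per identity
probe = torch.randn(1, 2048)
scores = model(gallery, probe.expand(3, -1))
```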