My publications


AYAN DAS, YONGXIN YANG, Timothy M. Hospedales, TAO XIANG, YI-ZHE SONG (2021) Cloud2Curve: Generation and Vectorization of Parametric Sketches

Analysis of human sketches in deep learning has advanced immensely through the use of waypoint-sequences rather than raster-graphic representations. We further aim to model sketches as a sequence of low-dimensional parametric curves. To this end, we propose an inverse graphics framework capable of approximating a raster or waypoint based stroke encoded as a point-cloud with a variable-degree Bézier curve. Building on this module, we present Cloud2Curve, a generative model for scalable high-resolution vector sketches that can be trained end-to-end using point-cloud data alone. As a consequence, our model is also capable of deterministic vectorization which can map novel raster or waypoint based sketches to their corresponding high-resolution scalable Bézier equivalent. We evaluate the generation and vectorization capabilities of our model on Quick, Draw! and K-MNIST datasets. The analysis of free-hand sketches using deep learning [40] has flourished over the past few years, with sketches now being well analysed from classification [43, 42] and retrieval [27, 12, 4] perspectives. Sketches for digital analysis have always been acquired in two primary modalities - raster (pixel grids) and vector (line segments). Raster sketches have mostly been the modality of choice for sketch recognition and retrieval [43, 27]. However, generative sketch models began to advance rapidly [16] after focusing on vector representations and generating sketches as sequences [7, 37] of waypoints/line segments, similarly to how humans sketch. As a happy byproduct, this paradigm leads to clean and blur-free image generation as opposed to direct raster-graphic generations [30]. Recent works have studied creativity in sketch generation [16], learning to sketch raster photo input images [36], learning efficient
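
As a rough illustration of the parametric representation mentioned in the Cloud2Curve abstract, a Bézier curve of degree n is defined by n + 1 control points and can be evaluated with the Bernstein basis. The sketch below only evaluates such a curve; it is not the Cloud2Curve model, and the names `bezier_point` and `sample_curve` are hypothetical.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a 2D Bezier curve at parameter t in [0, 1].

    The degree is inferred from the number of control points,
    so the same function handles variable-degree curves.
    """
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        b = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += b * px
        y += b * py
    return (x, y)


def sample_curve(control_points, num_samples=5):
    """Return evenly spaced (in t) points along the curve."""
    return [
        bezier_point(control_points, k / (num_samples - 1))
        for k in range(num_samples)
    ]
```

Because the curve is a function of t, it can be sampled at any density, which is what makes the representation resolution-independent.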

ANEESHAN SAIN, AYAN KUMAR BHUNIA, YONGXIN YANG, TAO XIANG, YI-ZHE SONG (2021) StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has been largely ignored so far, that is, sketches are drawn by humans and considerable style variations exist amongst different users. An effective SBIR model needs to explicitly account for this style diversity, crucially, to generalise to unseen user styles. To this end, a novel style-agnostic SBIR model is proposed. Different from existing models, a cross-modal variational autoencoder (VAE) is employed to explicitly disentangle each sketch into a semantic content part shared with the corresponding photo, and a style part unique to the sketcher. Importantly, to make our model dynamically adaptable to any unseen user styles, we propose to meta-train our cross-modal VAE by adding two style-adaptive components: a set of feature transformation layers to its encoder and a regulariser to the disentangled semantic content latent code. With this meta-learning framework, our model can not only disentangle the cross-modal shared semantic content for SBIR, but can adapt the disentanglement to any unseen user style as well, making the SBIR model truly style-agnostic. Extensive experiments show that our style-agnostic model yields state-of-the-art performance for both category-level and instance-level SBIR.

SEN HE, Wentong Liao, Michael Ying Yang, YONGXIN YANG, YI-ZHE SONG, Bodo Rosenhahn, TAO XIANG (2021) Context-Aware Layout to Image Generation with Enhanced Object Appearance

A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial networks (GANs), existing L2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the object-to-object as well as object-to-stuff relations are often broken and (2) each object's appearance is typically distorted lacking the key defining characteristics associated with the object class. We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators. To address these limitations, two new modules are proposed in this work. First, a context-aware feature transformation module is introduced in the generator to ensure that the generated feature encoding of either object or stuff is aware of other co-existing objects/stuff in the scene. Second, instead of feeding location-insensitive image features to the discriminator, we use the Gram matrix computed from the feature maps of the generated object images to preserve location-sensitive information, resulting in much enhanced object appearance. Extensive experiments show that the proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff and Visual Genome benchmarks. Code available at:

AYAN KUMAR BHUNIA, PINAKI NATH CHOWDHURY, YONGXIN YANG, Timothy M. Hospedales, TAO XIANG, YI-ZHE SONG (2021) Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pretext tasks are challenging to design and usually modality specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal data (sound or text) modalities, a common pretext task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pretext task for sketches and handwriting data. This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pretext tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pretext tasks surpass existing single and multi-modal self-supervision methods.

Ayan Kumar Bhunia, PINAKI NATH CHOWDHURY, ANEESHAN SAIN, YONGXIN YANG, TAO XIANG, YI-ZHE SONG (2021) More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity – model performances are largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper-bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gain. In particular, we introduce a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity. At the center of our semi-supervision design is a sequential photo-to-sketch generation model that aims to generate paired sketches for unlabelled photos. Importantly, we further introduce a discriminator-guided mechanism to guide against unfaithful generation, together with a distillation loss-based regularizer to provide tolerance against noisy training samples. Last but not least, we treat generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from each other. Extensive experiments show that our semi-supervised model yields a significant performance boost over the state-of-the-art supervised alternatives, as well as existing methods that can exploit unlabelled photos for FG-SBIR.

Zhihe Lu, Yongxin Yang, Xiatian Zhu, Cong Liu, Yi-Zhe Song, Tao Xiang (2020) Stochastic Classifiers for Unsupervised Domain Adaptation, In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 9108-9117 IEEE

A common strategy adopted by existing state-of-the-art unsupervised domain adaptation (UDA) methods is to employ two classifiers to identify the misaligned local regions between source and target domains. Following the 'wisdom of the crowd' principle, one has to ask: why stop at two? Indeed, we find that using more classifiers leads to better performance, but also introduces more model parameters, therefore risking overfitting. In this paper, we introduce a novel method called STochastic clAssifieRs (STAR) for addressing this problem. Instead of representing one classifier as a weight vector, STAR models it as a Gaussian distribution with its variance representing the inter-classifier discrepancy. With STAR, we can now sample an arbitrary number of classifiers from the distribution, whilst keeping the model size the same as having two classifiers. Extensive experiments demonstrate that a variety of existing UDA methods can greatly benefit from STAR and achieve state-of-the-art performance on both image classification and semantic segmentation tasks.
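
The core idea in the STAR abstract, storing a mean and a variance per classifier weight and sampling as many classifiers as needed, can be sketched as follows. This is a simplified illustration with a single linear scoring vector, not the authors' implementation; `sample_classifier` and `score` are hypothetical names, and the training procedure is omitted.

```python
import random


def sample_classifier(mean, std, rng):
    """Draw one weight vector from an element-wise Gaussian.

    Only `mean` and `std` are stored, so the parameter count is
    twice the weight dimension no matter how many classifiers
    are sampled.
    """
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def score(weights, features):
    """Linear score of a feature vector under one sampled classifier."""
    return sum(w * f for w, f in zip(weights, features))


rng = random.Random(0)
mean = [0.5, -1.0, 2.0]   # assumed learned means
std = [0.1, 0.2, 0.05]    # assumed learned standard deviations
classifiers = [sample_classifier(mean, std, rng) for _ in range(10)]
scores = [score(w, [1.0, 0.0, 1.0]) for w in classifiers]
```

The spread of `scores` across sampled classifiers plays the role of the inter-classifier discrepancy described in the abstract.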

Yanwei Fu, Yongxin Yang, Timothy Hospedales, Tao Xiang, Shaogang Gong (2014) Transductive multi-label zero-shot learning, In: Proceedings of the British Machine Vision Conference 2014 BMVA Press

Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large scale recognition problems. These methods have achieved great success via learning intermediate semantic representations in the form of attributes and more recently, semantic word vectors. However, they have thus far been constrained to the single-label case, in contrast to the growing popularity and importance of more realistic multi-label data. In this paper, for the first time, we investigate and formalise a general framework for multi-label zero-shot learning, addressing the unique challenge therein: how to exploit multi-label correlation at test time with no training data for those classes? In particular, we propose (1) a multi-output deep regression model to project an image into a semantic word space, which explicitly exploits the correlations in the intermediate semantic layer of word vectors; (2) a novel zero-shot learning algorithm for multi-label data that exploits the unique compositionality property of semantic word vector representations; and (3) a transductive learning strategy to enable the regression model learned from seen classes to generalise well to unseen classes. Our zero-shot learning experiments on a number of standard multi-label datasets demonstrate that our method outperforms a variety of baselines. © 2014. The copyright of this document resides with its authors.

Umar Riaz Muhammad, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song (2019) Goal-Driven Sequential Data Abstraction, In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp. 71-80 IEEE

Automatic data abstraction is an important capability for both benchmarking machine intelligence and supporting summarization applications. In the former, one asks whether a machine can 'understand' enough about the meaning of input data to produce a meaningful but more compact abstraction. In the latter, this capability is exploited for saving space or human time by summarizing the essence of input data. In this paper, we study a general reinforcement learning based framework for learning to abstract sequential data in a goal-driven way. The ability to define different abstraction goals uniquely allows different aspects of the input data to be preserved according to the ultimate purpose of the abstraction. Our reinforcement learning objective does not require human-defined examples of ideal abstraction. Importantly, our model processes the input sequence holistically without being constrained by the original input order. Our framework is also domain agnostic: we demonstrate applications to sketch, video and text data and achieve promising results in all domains.

Tianyuan Yu, Da Li, Yongxin Yang, Timothy Hospedales, Tao Xiang (2019) Robust Person Re-Identification by Modelling Feature Uncertainty, In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp. 552-561 IEEE

We aim to learn deep person re-identification (ReID) models that are robust against noisy training data. Two types of noise are prevalent in practice: (1) label noise caused by human annotator errors and (2) data outliers caused by person detector errors or occlusion. Both types of noise pose serious problems for training ReID models, yet have been largely ignored so far. In this paper, we propose a novel deep network termed DistributionNet for robust ReID. Instead of representing each person image as a feature vector, DistributionNet models it as a Gaussian distribution with its variance representing the uncertainty of the extracted features. A carefully designed loss is formulated in DistributionNet to unevenly allocate uncertainty across training samples. Consequently, noisy samples are assigned large variance/uncertainty, which effectively alleviates their negative impacts on model fitting. Extensive experiments demonstrate that our model is more effective than alternative noise-robust deep models. The source code is available at:

Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang (2019) Omni-Scale Feature Learning for Person Re-Identification, In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp. 3701-3711 IEEE

As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep ReID CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning. This is achieved by designing a residual block composed of multiple convolutional feature streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses both pointwise and depthwise convolutions. By stacking such blocks layer-by-layer, our OSNet is extremely lightweight and can be trained from scratch on existing ReID benchmarks. Despite its small model size, our OSNet achieves state-of-the-art performance on six person-ReID datasets. Code and models are available at:

Kaiyang Zhou, Yongxin Yang, Timothy Hospedales, Tao Xiang (2020) Deep Domain-Adversarial Image Generation for Domain Generalisation, In: Proceedings of the ... AAAI Conference on Artificial Intelligence 34(7) pp. 13025-13032

Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset of different distribution. To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains. In this paper, we propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically, DDAIG consists of three components, namely a label classifier, a domain classifier and a domain transformation network (DoTNet). The goal for DoTNet is to map the source training data to unseen domains. This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier. By augmenting the source training data with the generated unseen domain data, we can make the label classifier more robust to unknown domain changes. Extensive experiments on four DG datasets demonstrate the effectiveness of our approach.

Kaiyue Pang, Yongxin Yang, Timothy M Hospedales, Tao Xiang, Yi-Zhe Song (2020) Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval, In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 10344-10352 IEEE

ImageNet pre-training has long been considered crucial by the fine-grained sketch-based image retrieval (FG-SBIR) community due to the lack of large sketch-photo paired datasets for FG-SBIR training. In this paper, we propose a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design that are required for effective FG-SBIR pre-training. The first is formulating the puzzle in a mixed-modality fashion. Second, we show that framing the optimisation as permutation matrix inference via Sinkhorn iterations is more effective than the common classifier formulation of Jigsaw self-supervision. Experiments show that this self-supervised pre-training strategy significantly outperforms the standard ImageNet-based pipeline across all four product-level FG-SBIR benchmarks. Interestingly, it also leads to improved cross-category generalisation across both pre-train/fine-tune and fine-tune/testing stages.
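
For readers unfamiliar with the Sinkhorn iterations named in the abstract above, the sketch below shows the basic operation: a square score matrix is exponentiated and its rows and columns are normalised alternately, which drives it towards a doubly stochastic matrix (a soft relaxation of a permutation matrix). It illustrates the generic technique only, not the paper's training setup; the function name and the iteration count are assumptions.

```python
from math import exp


def sinkhorn(scores, num_iters=100):
    """Alternately normalise rows and columns of exp(scores).

    For a square matrix of finite scores the result approaches a
    doubly stochastic matrix: every row and column sums to 1.
    """
    m = [[exp(v) for v in row] for row in scores]
    size = len(m)
    for _ in range(num_iters):
        # Row normalisation
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalisation
        col_sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
        m = [[m[i][j] / col_sums[j] for j in range(size)] for i in range(size)]
    return m


# Hypothetical scores: entry (i, j) rates placing patch i at position j.
soft_perm = sinkhorn([[1.0, 0.2, 0.1],
                      [0.3, 2.0, 0.5],
                      [0.2, 0.1, 1.5]])
```

Each row of `soft_perm` can be read as a soft assignment of one shuffled patch to the possible positions.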

AYAN KUMAR BHUNIA, YONGXIN YANG, T.M. Hospedales, TAO XIANG, Yi-Zhe Song (2020) Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval, In: Proceedings Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user’s query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.

Jifei Song, Yongxin Yang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales (2019) Generalizable Person Re-identification by Domain-Invariant Mapping Network, In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) pp. 719-728 Institute of Electrical and Electronics Engineers (IEEE)

We aim to learn a domain generalizable person re-identification (ReID) model. When such a model is trained on a set of source domains (ReID datasets collected from different camera networks), it can be directly applied to any new unseen dataset for effective ReID without any model updating. Despite its practical value in real-world deployments, generalizable ReID has seldom been studied. In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. DIMN is designed to learn a mapping between a person image and its identity classifier, i.e., it produces a classifier using a single shot. To make the model domain-invariant, we follow a meta-learning pipeline and sample a subset of source domain training tasks during each training episode. However, the model is significantly different from conventional meta-learning methods in that: (1) no model updating is required for the target domain, (2) different training tasks share a memory bank for maintaining both scalability and discrimination ability, and (3) it can be used to match an arbitrary number of identities in a target domain. Extensive experiments on a newly proposed large-scale ReID domain generalization benchmark show that our DIMN significantly outperforms alternative domain generalization or meta-learning methods.

Ayan Kumar Bhunia, Ayan Das, Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song (2020) Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?, In: ACM Transactions on Graphics 39(6) Association for Computing Machinery (ACM)

We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is a winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly or faster than a human competitor. The key to victory for the agent’s goal is to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors’ strategies, we conducted a further human study with participants being given unlimited thinking time and training in early recognizability by feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at

Conghui Hu, Da Li, Yongxin Yang, Timothy M Hospedales, Yi-Zhe Song (2020) Sketch-a-Segmenter: Sketch-based Photo Segmenter Generation, In: IEEE transactions on image processing 29 pp. 1-1 IEEE

Given pixel-level annotated data, traditional photo segmentation techniques have achieved promising results. However, these photo segmentation models can only identify objects in categories for which data annotation and training have been carried out. This limitation has inspired recent work on few-shot and zero-shot learning for image segmentation. In this paper, we show the value of sketch for photo segmentation, in particular as a transferable representation to describe a concept to be segmented. We show, for the first time, that it is possible to generate a photo-segmentation model of a novel category using just a single sketch and furthermore exploit the unique fine-grained characteristics of sketch to produce more detailed segmentation. More specifically, we propose a sketch-based photo segmentation method that takes sketch as input and synthesizes the weights required for a neural network to segment the corresponding region of a given photo. Our framework can be applied at both the category-level and the instance-level, and fine-grained input sketches provide more accurate segmentation in the latter. This framework generalizes across categories via sketch and thus provides an alternative to zero-shot learning when segmenting a photo from a category without annotated training data. To investigate the instance-level relationship across sketch and photo, we create the SketchySeg dataset which contains segmentation annotations for photos corresponding to paired sketches in the Sketchy Dataset.

Jiang Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Lirong Dai (2020) A Tree-Structured Decoder for Image-to-Markup Generation, In: Journal of Machine Learning Research Microtome Publishing

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup. However, for tree-structured representational markup, string representations can hardly cope with the structural complexity. In this work, we first show via a set of toy problems that string decoders struggle to decode tree structures, especially as structural complexity increases. We then propose a tree-structured decoder that specifically aims at generating a tree-structured markup. Our decoder works sequentially, where at each step a child node and its parent node are simultaneously generated to form a sub-tree. This sub-tree is consequently used to construct the final tree structure in a recurrent manner. The success of our tree decoder rests on two properties: (i) it strictly respects the parent-child relationship of trees, and (ii) it explicitly outputs trees as opposed to a linear string. Evaluated on both math formula recognition and chemical formula recognition, the proposed tree decoder is shown to greatly outperform strong string decoder baselines.

Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song (2020) Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval, In: Proceedings of The 31st British Machine Vision Virtual Conference (BMVC 2020) pp. 1-14 British Machine Vision Association

Sketch as an image search query is an ideal alternative to text in capturing the fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail – a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level to form a better embedding space to conduct retrieval. Experiments on common benchmarks show our method to outperform state-of-the-art methods by a significant margin.

Zhiyuan Shi, Yongxin Yang, Timothy M Hospedales, Tao Xiang (2014) Weakly supervised learning of objects, attributes and their associations, In: Lecture Notes in Computer Science 8690 L(PART 2) pp. 472-487 Springer Verlag

When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes and their associations. Conventional methods require strong annotation of object and attribute locations, making them less scalable. In this paper, we model object-attribute associations from weakly labelled images, such as those widely available on media sharing sites (e.g. Flickr), where only image-level labels (either object or attributes) are given, without their locations and associations. This is achieved by introducing a novel weakly supervised non-parametric Bayesian model. Once learned, given a new image, our model can describe the image, including objects, attributes and their associations, as well as their locations and segmentation. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model performs on par with strongly supervised models on tasks such as image description and retrieval based on object-attribute associations. © 2014 Springer International Publishing.

Guosheng Hu, Yongxin Yang, Dong Yi, Josef Kittler, William Christmas, Stan Li, Timothy Hospedales (2015) When Face Recognition Meets with Deep Learning: an Evaluation of Convolutional Neural Networks for Face Recognition, In: Computer Vision Workshop (ICCVW), 2015 IEEE International Conference on pp. 384-392

Deep learning, in particular Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, it remains an open question: why CNNs work well and how to design a ‘good’ architecture. The existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigate the reason. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use the public database LFW (Labeled Faces in the Wild) to train CNNs, unlike most existing CNNs trained on private databases. We propose three CNN architectures which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluates the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, a traditional metric learning method exploiting CNN-learned features is evaluated. Experiments show that two crucial factors for good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available.

X Liu, P Barnaghi, B Cheng, L Wan, Y Yang (2015) OMI-DL: An Ontology Matching Framework, In: Services Computing, IEEE Transactions on PP99 pp. 1-1

G Chi, S Hu, Y Yang, T Chen (2012) Response surface methodology with prediction uncertainty: A multi-objective optimisation approach, In: Chemical Engineering Research and Design 90(9) pp. 1235-1244

In the field of response surface methodology (RSM), the prediction uncertainty of the empirical model needs to be considered for effective process optimisation. Current methods combine the prediction mean and uncertainty through certain weighting strategies, either explicitly or implicitly, to form a single objective function for optimisation. This paper proposes to address this problem under the multi-objective optimisation framework. Overall, the method iterates through initial experimental design, empirical modelling and model-based optimisation to allocate promising experiments for the next iteration. Specifically, the Gaussian process regression is adopted as the empirical model due to its demonstrated prediction accuracy and reliable quantification of prediction uncertainty in the literature. The non-dominated sorting genetic algorithm II (NSGA-II) is used to search for Pareto points that are further clustered to give experimental points to be conducted in the next iteration. The application study, on the optimisation of a catalytic epoxidation process, demonstrates that the proposed method is a powerful tool to aid the development of chemical and potentially other processes. © 2011 The Institution of Chemical Engineers.
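
The multi-objective view in the abstract above rests on Pareto dominance: a candidate is kept only if no other candidate is at least as good on every objective and strictly better on one. The sketch below shows that filter in isolation, assuming every objective is to be minimised. It is not NSGA-II itself, which adds ranking, crowding and evolutionary search on top of this check, and the function names and sample values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere.

    Assumes every objective is minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored as (negated predicted mean, uncertainty);
# how the two objectives are signed in practice is a modelling choice.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Here `(3, 3)` is dropped because `(2, 2)` is better on both objectives, while the remaining three points represent different trade-offs.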

D. Li, Y. Yang, Yi-Zhe Song, T.M. Hospedales (2017) Deeper, Broader and Artier Domain Generalization, In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV 2017) pp. 5543-5551 Institute of Electrical and Electronics Engineers Inc.

The problem of domain generalization is to learn from multiple training domains, and extract a domain-agnostic model that can then be applied to an unseen domain. Domain generalization (DG) has a clear motivation in contexts where there are target domains with distinct characteristics, yet sparse data for training. One example is recognition in sketch images, which are distinctly more abstract and rarer than photos. Nevertheless, DG methods have primarily been evaluated on photo-only benchmarks focusing on alleviating the dataset bias, where both problems of domain distinctiveness and data sparsity can be minimal. We argue that these benchmarks are overly straightforward, and show that simple deep learning baselines perform surprisingly well on them. In this paper, we make two main contributions. Firstly, we build upon the favorable domain shift-robust properties of deep learning methods, and develop a low-rank parameterized CNN model for end-to-end DG learning. Secondly, we develop a DG benchmark dataset covering photo, sketch, cartoon and painting domains. This is both more practically relevant, and harder (bigger domain shift) than existing benchmarks. The results show that our method outperforms existing DG alternatives, and our dataset provides a more significant DG challenge to drive future research.

Mingying Song, Ali Karatutlu, Isma Ali, Osman Ersoy, Yun Zhou, Yongxin Yang, Yuanpeng Zhang, William R. Little, Ann P. Wheeler, Andrei V. Sapelkin (2016) Spectroscopic super-resolution fluorescence cell imaging using ultra-small Ge quantum dots, In: Optics Express 25(4) pp. 4240-4253 Optical Society of America

We demonstrate a spectroscopic imaging based super-resolution approach by separating the overlapping diffraction spots into several detectors during a single scanning period and taking advantage of the size-dependent emission wavelength in nanoparticles. This approach has been tested using off-the-shelf quantum dots (Invitrogen Qdot) and in-house novel ultra-small (∼3 nm) Ge QDs. Furthermore, we developed a method-specific Gaussian fitting and maximum likelihood estimation based on a Matlab algorithm for fast QD localisation. This methodology results in a three-fold improvement in the number of localised QDs compared to non-spectroscopic images. With the addition of advanced ultra-small Ge probes, the number can be improved even further, giving at least 1.5 times improvement when compared to Qdots. Using a standard scanning confocal microscope we achieved a data acquisition rate of 200 ms per image frame. This is an improvement on single molecule localisation super-resolution microscopy, where repeated image capture limits the imaging speed, and the size of fluorescence probes limits the possible theoretical localisation resolution. We show that our spectral deconvolution approach has the potential to deliver data acquisition rates on the ms scale, thus providing super-resolution in live systems. © 2017, OSA - The Optical Society. All rights reserved.

Qian Yu, Yongxin Yang, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales (2016) Sketch-a-Net: A Deep Neural Network that Beats Humans, In: International Journal of Computer Vision 122(3) pp. 411-425 Springer New York LLC

We propose a deep learning approach to free-hand sketch recognition that achieves state-of-the-art performance, significantly surpassing that of humans. Our superior performance is a result of modelling and exploiting the unique characteristics of free-hand sketches, i.e., consisting of an ordered set of strokes but lacking visual cues such as colour and texture, being highly iconic and abstract, and exhibiting extremely large appearance variations due to different levels of abstraction and deformation. Specifically, our deep neural network, termed Sketch-a-Net, has the following novel components: (i) We propose a network architecture designed for sketch rather than natural photo statistics. (ii) Two novel data augmentation strategies are developed which exploit the unique sketch-domain properties to modify and synthesise sketch training data at multiple abstraction levels. Based on this idea we are able to both significantly increase the volume and diversity of sketches for training, and address the challenge of varying levels of sketching detail commonplace in free-hand sketches. (iii) We explore different network ensemble fusion strategies, including a re-purposed joint Bayesian scheme, to further improve recognition performance. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Furthermore, through visualising the learned filters, we offer useful insights into where the superior performance of our network comes from. © 2016, Springer Science+Business Media New York.

Y Zhou, Y Yang, L Liang, D He, Z Sun (2010) An agent-based scheme for supporting service and resource management in wireless cloud, In: Proceedings - 9th International Conference on Grid and Cloud Computing, GCC 2010 pp. 34-39

The growing demand for mobile wireless internet access has prompted rapid growth of wireless data services. The key issue for the wireless cloud is to provide complex services using the available resources at reasonable cost. With the emergence of integrated wireless and mobile networks (e.g. WLAN, 3G/4G networks), supporting the complete lifecycle of service building and delivery via wireless cloud providers becomes a challenge for networks as a service. This paper proposes an agent-based scheme to discover comprehensive services and to select and allocate resources for supporting cloud applications on wireless platforms with respect to efficiency and fairness of resource utilization. It presents an optimized resource selection strategy, built on the agent-based scheme, for selecting and allocating network resources to enable Quality of Service (QoS) in a wireless cloud environment. © 2010 IEEE.

Q. Yu, Y. Yang, F. Liu, Yi-Zhe Song, T. Xiang, T.M. Hospedales (2017) Sketch-a-Net: A Deep Neural Network that Beats Humans, In: International Journal of Computer Vision 122(3) pp. 411-425 Springer

Y Yang, K Chen (2006) An Ensemble of Competitive Learning Networks with Different Representations for Temporal Data Clustering, In: 2006 International Joint Conference on Neural Networks pp. 3120-3127

Temporal data clustering provides useful techniques for condensing and summarizing information conveyed in temporal data, which is demanded in various fields ranging from time series analysis to sequential data understanding. In this paper, we propose a novel approach to temporal data clustering by an ensemble of competitive learning networks incorporating different representations of temporal data. In our approach, competitive learning networks with the rival-penalized learning mechanism are employed for clustering analyses based on different temporal data representations, while an optimal selection function is applied to find a final consensus partition from multiple partition candidates yielded by applying alternative consensus functions to the results of competitive learning on different representations. Thanks to the capability of the rival-penalized learning rules in automatic model selection and the synergy of fusing diverse partitions on different representations, our ensemble approach yields favorable results, as demonstrated in time series and motion trajectory clustering tasks.

Y Yang, Y Zhou, Zhili Sun, Haitham Cruickshank (2013) Heuristic scheduling algorithms for allocation of virtualized network and computing resources, In: Journal of Software Engineering and Applications 6(1) pp. 1-13 Scientific Research Publishing

Cloud computing technology facilitates computing-intensive applications by providing virtualized resources which can be dynamically provisioned. However, users' requests vary according to the computation ability needs of different applications. These applications can be presented as meta-jobs of user demand. The total processing time of these jobs must take into account the data transmission time over the Internet as well as the completion time of jobs executing on the virtual machine. In this paper, we present the V-heuristic scheduling algorithm for allocation of virtualized network and computing resources under the user's constraint, which is applied in a service-oriented resource broker for job scheduling. This scheduling algorithm takes into account both the data transmission time and the computation time related to the virtualized network and virtual machine. The simulation results are compared with three heuristic algorithms, MCT, Min-Min and Max-Min, under conventional network or virtual network conditions. We evaluate these algorithms within a simulated cloud environment via an Abilene network topology, which is a real physical core network topology. These experimental results show that the V-heuristic scheduling algorithm achieves significant performance gain for a variety of applications in terms of load balance, makespan, average resource utilization and total processing time.
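For readers unfamiliar with the baseline heuristics named in this abstract, the following is a minimal sketch of the classic Min-Min rule only. It is not the V-heuristic algorithm from the paper and it ignores data transmission time. The input `etc` (an expected-time-to-compute matrix indexed as `etc[job][machine]`) and the function name `min_min` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Schedule jobs with the Min-Min rule.

    At each step, find the (job, machine) pair with the smallest
    completion time among all unscheduled jobs, assign it, and update
    that machine's ready time. Returns the job-to-machine assignment
    and the makespan (largest machine ready time).
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine is free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        best = None                     # (completion time, job, machine)
        for job in sorted(unscheduled):
            for machine in range(n_machines):
                completion = ready[machine] + etc[job][machine]
                if best is None or completion < best[0]:
                    best = (completion, job, machine)
        completion, job, machine = best
        assignment[job] = machine
        ready[machine] = completion
        unscheduled.remove(job)

    return assignment, max(ready)


# Hypothetical execution times: 3 jobs on 2 machines.
example_etc = [[4.0, 6.0], [3.0, 5.0], [8.0, 2.0]]
example_assignment, example_makespan = min_min(example_etc)
```

Max-Min differs only in the selection step: among the per-job minimum completion times, it schedules the job whose minimum is largest, so long jobs are placed early.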

D. Li, Y. Yang, Yi-Zhe Song, T.M. Hospedales (2018) Learning to generalize: Meta-learning for domain generalization, In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018) pp. 3490-3497 AAAI press

Domain shift refers to the well-known problem that a model trained in one source domain performs poorly when applied to a target domain with different statistics. Domain Generalization (DG) techniques attempt to alleviate this issue by producing models which by design generalize well to novel testing domains. We propose a novel meta-learning method for domain generalization. Rather than designing a specific model that is robust to domain shift as in most previous DG work, we propose a model-agnostic training procedure for DG. Our algorithm simulates train/test domain shift during training by synthesizing virtual testing domains within each mini-batch. The meta-optimization objective requires that steps to improve training domain performance should also improve testing domain performance. This meta-learning procedure trains models with good generalization ability to novel domains. We evaluate our method and achieve state-of-the-art results on a recent cross-domain image classification benchmark, as well as demonstrating its potential on two classic reinforcement learning tasks.

Zhili Sun, Yichao Yang, Yanbao Zhou, Haitham Cruickshank (2016) Agent-Based Resource Management for Mobile Cloud, In: Web-Based Services: Concepts, Methodologies, Tools, and Applications pp. 290-306 IGI Global

Mobile cloud computing is a new computing paradigm that integrates cloud computing technology into the mobile environment. It takes full advantage of cloud computing, with great potential to transform a large part of the IT industry. The objectives of mobile cloud computing are to meet user demand, efficiently utilize a pool of resources, including mobile network, storage, and computation resources, and optimize energy on mobile devices. Here, the authors review the current mobile cloud computing technologies, highlight the main issues and challenges for future development, and focus on resource management. Then, combining current agent architectures and resource optimization strategies, they present an agent-based resource management scheme to deal with multiple data- and computation-intensive applications of user demand. The chapter offers a promising solution for selecting the best service provider and efficiently utilizing mobile network resources given the user's request constraint.

Y Yang, K Chen (2011) Temporal data clustering via weighted clustering ensemble with different representations, In: IEEE Transactions on Knowledge and Data Engineering 23(2) pp. 307-320 IEEE

Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing information over temporal data. In this paper, we present a temporal data clustering framework via a weighted clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data representations. In our approach, we propose a novel weighted consensus function guided by clustering validation criteria to reconcile initial partitions to candidate consensus partitions from different perspectives, and then introduce an agreement function to further reconcile those candidate consensus partitions to a final partition. As a result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint use of different representations, which cuts the information loss in a single representation and exploits various information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitation by formal analysis and empirical studies.

Y Yang, K Chen (2011) Time series clustering via RPCL network ensemble with different representations, In: IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 41(2) pp. 190-199 IEEE

Time series clustering provides underpinning techniques for discovering the intrinsic structure and condensing/summarizing information conveyed in time series, which is demanded in various fields ranging from bioinformatics to video content understanding. In this paper, we present an unsupervised ensemble learning approach to time series clustering by combining rival-penalized competitive learning (RPCL) networks with different representations of time series. In our approach, the RPCL network ensemble is employed for clustering analyses based on different representations of time series whenever available, and an optimal selection function is applied to find a final consensus partition from multiple partition candidates yielded by applying various consensus functions for the combination of competitive learning results. As a result, our approach first exploits the capability of the RPCL rule for automatic model selection in clustering analysis on individual representations. It subsequently applies ensemble learning to reconcile the diverse partitions resulting from the use of different representations, augmenting the RPCL networks in automatic model selection and overcoming their inherent limitation. Our approach has been evaluated on 16 benchmark time series data mining tasks with comparison to state-of-the-art time series clustering techniques. Simulation results demonstrate that our approach yields favorable results in clustering analysis with automatic model selection.

Y Yang, Y Zhou, L Liang, D He, Z Sun (2010) A service-oriented broker for bulk data transfer in cloud computing, In: Proceedings of 9th International Conference on Grid and Cloud Computing pp. 264-269

Cloud computing has emerged as a new computing paradigm in which virtualized resources provide reliable and guaranteed service for user demand. The cloud is in fact a service-oriented platform, because all kinds of virtual resources are treated as services to users. Nowadays, most data-intensive applications are developed on cloud systems. These applications reach geographically separated storage or data resources, even across cross-continental networks. Performance degradation of the networks will therefore affect cloud application performance and user requests. To ensure guaranteed service for bulk data transfer in cloud computing, the reservation and utilization of combined resources, which include data and network resources, become critical issues. This involves reserving and assigning combined resources to meet the user's QoS requirement. To address this problem, this paper proposes a cloud infrastructure service framework (CISF) to achieve guaranteed service for data-intensive applications. A service-oriented resource broker (SRB) based on this framework is proposed to discover, select, reserve and assign the best combined resources. Finally, a dynamic resource selection algorithm under the user's QoS constraint has been implemented to optimize the allocation of combined resources.

MI Hossain, T Chen, Y Yang, R Lau (2009) Determination of actual object size distribution from direct imaging, In: Industrial and Engineering Chemistry Research 48(22) pp. 10136-10146 American Chemical Society

Kaiyue Pang, Ke Li, Yongxin Yang, Honggang Zhang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song (2019) Generalising Fine-Grained Sketch-Based Image Retrieval, In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) pp. 677-686 Institute of Electrical and Electronics Engineers (IEEE)

Fine-grained sketch-based image retrieval (FG-SBIR) addresses matching a specific photo instance using a free-hand sketch as the query modality. Existing models aim to learn an embedding space in which sketch and photo can be directly compared. While successful, they require instance-level pairing within each coarse-grained category as annotated training data. Since the learned embedding space is domain-specific, these models do not generalise well across categories. This limits the practical applicability of FG-SBIR. In this paper, we identify cross-category generalisation for FG-SBIR as a domain generalisation problem, and propose the first solution. Our key contribution is a novel unsupervised learning approach to model a universal manifold of prototypical visual sketch traits. This manifold can then be used to parameterise the learning of a sketch/photo representation. Model adaptation to novel categories then becomes automatic via embedding the novel sketch in the manifold and updating the representation and retrieval function accordingly. Experiments on the two largest FG-SBIR datasets, Sketchy and QMUL-Shoe-V2, demonstrate the efficacy of our approach in enabling cross-category generalisation of FG-SBIR.

U.R. Muhammad, Y. Yang, Yi-Zhe Song, T. Xiang, T.M. Hospedales (2019) Learning Deep Sketch Abstraction, In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 8014-8023 IEEE Computer Society

Human free-hand sketches have been studied in various contexts including sketch recognition, synthesis and fine-grained sketch-based image retrieval (FG-SBIR). A fundamental challenge for sketch analysis is to deal with drastically different human drawing styles, particularly in terms of abstraction level. In this work, we propose the first stroke-level sketch abstraction model based on the insight of sketch abstraction as a process of trading off between the recognizability of a sketch and the number of strokes used to draw it. Concretely, we train a model for abstract sketch generation through reinforcement learning of a stroke removal policy that learns to predict which strokes can be safely removed without affecting recognizability. We show that our abstraction model can be used for various sketch analysis tasks including: (1) modeling stroke saliency and understanding the decision of sketch recognition models, (2) synthesizing sketches of variable abstraction for a given category, or reference object instance in a photo, and (3) training an FG-SBIR model with photos only, bypassing the expensive photo-sketch pair collection step.

X Yang, Zhili Sun, Y Miao, N Wang, S Kang, Y Wang, Y Yang (2016) Performance Optimisation for DSDV in VANETs, In: Proceedings of the 17th UKSim-AMSS International Conference on Modelling and Simulation (UKSim), 2015 pp. 514-519

In recent years, Mobile Ad hoc Networks (MANETs) have attracted great interest all over the world for their advantages of high mobility and flexibility. They are also among the greatest challenges in wireless communications. As a special type of MANET, Vehicular Ad hoc Networks (VANETs) are considerably important in Next-Generation Networking (NGN). Unlike typical MANETs, VANETs are much more challenging due to high velocity, which prevents classic MANET routing protocols from fitting such scenarios efficiently. This paper evaluates the performance of two different routing protocols, namely DSDV and AODV, in various realistic scenarios. A DSDV optimization approach is then proposed to improve DSDV's performance in VANETs.