Publications

J. Collomosse, T. Bui, A. Brown, J. Sheridan, A. Green, M. Bell, J. Fawcett, J. Higgins, O. Thereaux (2020) ARCHANGEL: Trusted Archives of Digital Public Documents, In: Proceedings of the ACM Symposium on Document Engineering 2018 - DocEng '18 pp. 1-4

We present ARCHANGEL, a decentralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation: trust at face value in a centralised authority, like a national government archive or university. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture built over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.
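
The integrity idea in this abstract can be illustrated with a minimal sketch. It is not the ARCHANGEL implementation: the `ToyLedger` class, the use of SHA-256 and the in-memory dictionary are all assumptions made for illustration, standing in for the Ethereum-based ledger the abstract mentions.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the document bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class ToyLedger:
    """Hypothetical append-only record store standing in for a distributed ledger."""

    def __init__(self):
        self._records = {}

    def register(self, doc_id: str, data: bytes) -> None:
        # Entries are write-once: an existing record can never be overwritten.
        if doc_id in self._records:
            raise ValueError("record already exists; ledger entries are immutable")
        self._records[doc_id] = content_hash(data)

    def verify(self, doc_id: str, data: bytes) -> bool:
        # A document is intact only if its current hash matches the registered one.
        return self._records.get(doc_id) == content_hash(data)
```

A document registered once can later be checked with `verify`; any change to its bytes changes the hash and the check fails.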

Leo Sampaio Ferraz Ribeiro, Tu Bui, John Collomosse, Moacir Ponti (2020) Sketchformer: Transformer-Based Representation for Sketched Structure, In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 14141-14150

Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state of the art performance in classification and image retrieval tasks, when compared against baseline representations driven by LSTM sequence to sequence architectures: SketchRNN and derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.

Tu Bui, L Ribeiro, M Ponti, John Collomosse (2017) Compact Descriptors for Sketch-based Image Retrieval using a Triplet loss Convolutional Neural Network, In: Computer Vision and Image Understanding 164 pp. 27-37 Elsevier

We present an efficient representation for sketch based image retrieval (SBIR) derived from a triplet loss convolutional neural network (CNN). We treat SBIR as a cross-domain modelling problem, in which a depiction invariant embedding of sketch and photo data is learned by regression over a siamese CNN architecture with half-shared weights and modified triplet loss function. Uniquely, we demonstrate the ability of our learned image descriptor to generalise beyond the categories of object present in our training data, forming a basis for general cross-category SBIR. We explore appropriate strategies for training, and for deriving a compact image descriptor from the learned representation suitable for indexing data on resource-constrained (e.g. mobile) devices. We show the learned descriptors to outperform state of the art SBIR on the de facto standard Flickr15k dataset using a significantly more compact (56 bits per image, i.e. ≈ 105KB total) search index than previous methods.
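
Two quantities in this abstract can be made concrete with a small sketch. The loss below is the standard margin-based triplet loss, not the modified variant the paper describes, and the index-size check assumes a round 15,000 images (suggested by the dataset's name; the exact count is not stated in the abstract). The function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: push the negative at least `margin` further
    from the anchor than the positive (in squared distance)."""
    return max(0.0, margin
               + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))


def index_size_bytes(num_images, bits_per_image):
    """Total size of a search index holding one fixed-length code per image."""
    return num_images * bits_per_image // 8


# 56 bits is 7 bytes per image; with the assumed 15,000 images this gives
# 105,000 bytes, consistent with the "≈ 105KB" figure in the abstract.
total_bytes = index_size_bytes(15_000, 56)
```

In a sketch-photo setting the anchor would be a sketch embedding, the positive a matching photo embedding and the negative a non-matching one.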

Tu Bui, Daniel Cooper, John Collomosse, Mark Bell, Alex Green, John Sheridan, Jez Higgins, Arindra Das, Jared Robert Keller, Olivier Thereaux (2020) Tamper-proofing Video with Hierarchical Attention Autoencoder Hashing on Blockchain, In: IEEE Transactions on Multimedia Institute of Electrical and Electronics Engineers

We present ARCHANGEL, a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we introduce a novel deep network architecture using a hierarchical attention autoencoder (HAAE) to compute temporal content hashes (TCHs) from minutes-long or hour-long audio-visual streams. Our TCHs are sensitive to accidental or malicious content modification (tampering). The focus of our self-supervised HAAE is to guard against content modification such as frame truncation or corruption but ensure invariance against format shift (i.e. codec change). This is necessary due to the curatorial requirement for archives to format shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, United States of America, Estonia, Australia and Norway participated.

Tu Bui, Leonardo Ribeiro, Moacir Ponti, John Collomosse (2018) Sketching out the details: Sketch-based image retrieval using convolutional neural networks with multi-stage regression, In: Computers & Graphics 71 pp. 77-87 Elsevier

We propose and evaluate several deep network architectures for measuring the similarity between sketches and photographs, within the context of the sketch based image retrieval (SBIR) task. We study the ability of our networks to generalize across diverse object categories from limited training data, and explore in detail strategies for weight sharing, pre-processing, data augmentation and dimensionality reduction. In addition to a detailed comparative study of network configurations, we contribute by describing a hybrid multi-stage training network that exploits both contrastive and triplet networks to exceed state of the art performance on several SBIR benchmarks by a significant margin.

Moacir Antonelli Ponti, Leonardo Sampaio Ferraz Ribeiro, Tiago Santana Nazare, Tu Bui, John Collomosse (2018) Everything You Wanted to Know about Deep Learning for Computer Vision but Were Afraid to Ask, In: Proceedings of Sibgrapi 2017 pp. 17-41 IEEE

Deep Learning methods are currently the state-of-the-art in many Computer Vision and Image Processing problems, in particular image classification. After years of intensive investigation, a few models matured and became important tools, including Convolutional Neural Networks (CNNs), Siamese and Triplet Networks, Auto-Encoders (AEs) and Generative Adversarial Networks (GANs). The field is fast-paced and there is a lot of terminology to catch up on for those who want to venture into Deep Learning waters. This paper aims to introduce the most fundamental concepts of Deep Learning for Computer Vision, in particular CNNs, AEs and GANs, including architectures, inner workings and optimization. We offer an updated description of the theoretical and practical knowledge of working with those models. After that, we describe Siamese and Triplet Networks, not often covered in tutorial papers, as well as review the literature on recent and exciting topics such as visual stylization, pixel-wise prediction and video processing. Finally, we discuss the limitations of Deep Learning for Computer Vision.

John Collomosse, Tu Bui, Hailin Jin (2019) LiveSketch: Query Perturbations for Guided Sketch-based Visual Search, In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) pp. 2879-2887 Institute of Electrical and Electronics Engineers (IEEE)

LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users' search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus.

Tu Bui, J Collomosse (2015) Font finder: Visual recognition of typeface in printed documents, In: 2015 IEEE International Conference on Image Processing (ICIP) IEEE

We describe a novel algorithm for visually identifying the font used in a scanned printed document. Our algorithm requires no pre-recognition of characters in the string (i.e. optical character recognition). Gradient orientation features are collected local to the character boundaries, and quantized into a hierarchical Bag of Visual Words representation. Following stop-word analysis, classification via logistic regression (LR) of the codebooked features yields per-character probabilities which are combined across the string to decide the posterior for each font. We achieve 93.4% accuracy over a 1000-font database of scanned printed text comprising Latin characters.
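
The final step, combining per-character probabilities into a per-font posterior, might look like the sketch below. The abstract does not say how the probabilities are combined; this example assumes characters are treated as independent with a uniform prior over fonts, so the probabilities are multiplied (summed in log space) and renormalised. The function name and input layout are hypothetical.

```python
import math


def font_posterior(per_char_probs):
    """Combine per-character font probabilities into one posterior per font.

    per_char_probs: list of dicts, one per character, each mapping
    font name -> probability for that character.
    Assumes independent characters and a uniform prior over fonts.
    """
    fonts = list(per_char_probs[0].keys())
    eps = 1e-12  # guards against log(0)
    log_scores = {
        f: sum(math.log(max(p[f], eps)) for p in per_char_probs)
        for f in fonts
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalised = {f: math.exp(s - top) for f, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {f: v / total for f, v in unnormalised.items()}
```

The font with the largest posterior would be reported as the decision for the whole string.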

Tu Bui, Leonardo Ribeiro, Moacir Ponti, John Collomosse (2019) Deep Manifold Alignment for Mid-Grain Sketch Based Image Retrieval, In: Computer Vision – ACCV 2018 11363 pp. 314-329 Springer Verlag

We present an algorithm for visually searching image collections using free-hand sketched queries. Prior sketch based image retrieval (SBIR) algorithms adopt either a category-level or fine-grain (instance-level) definition of cross-domain similarity—returning images that match the sketched object class (category-level SBIR), or a specific instance of that object (fine-grain SBIR). In this paper we take the middle-ground; proposing an SBIR algorithm that returns images sharing both the object category and key visual characteristics of the sketched query without assuming photo-approximate sketches from the user. We describe a deeply learned cross-domain embedding in which ‘mid-grain’ sketch-image similarity may be measured, reporting on the efficacy of unsupervised and semi-supervised manifold alignment techniques to encourage better intra-category (mid-grain) discrimination within that embedding. We propose a new mid-grain sketch-image dataset (MidGrain65c) and demonstrate not only mid-grain discrimination, but also improved category-level discrimination using our approach.

John Collomosse, Tu Bui, M Wilber, C Fang, H Jin (2017) Sketching with Style: Visual Search with Sketches and Aesthetic Context, In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2017 IEEE

We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as a sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering significant gains over previous networks for style discrimination. We incorporate this model within a hierarchical triplet network to unify and learn a joint space from two discriminatively trained streams for style and structure. We demonstrate that this space enables, for the first time, style-constrained sketch search over a diverse domain of digital artwork comprising graphics, paintings and drawings. We also briefly explore alternative query modalities.

Tu Bui, John Collomosse (2016) Scalable sketch-based image retrieval using color gradient features, In: Proceedings of the IEEE International Conference on Computer Vision Workshops IEEE

We present a scalable system for sketch-based image retrieval (SBIR), extending the state of the art Gradient Field HoG (GF-HoG) retrieval framework through two technical contributions. First, we extend GF-HoG to enable color-shape retrieval and comprehensively evaluate several early- and late-fusion approaches for integrating the modality of color, considering both the accuracy and speed of sketch retrieval. Second, we propose an efficient inverse-index representation for GF-HoG that delivers scalable search with interactive query times over millions of images. A mobile app demo accompanies this paper (Android).
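
The general idea behind an inverse (inverted) index can be sketched as follows. This is a generic illustration, not the representation proposed in the paper: it assumes each image has already been reduced to a set of integer visual-word ids, and the `InvertedIndex` class and its vote-counting score are hypothetical.

```python
from collections import Counter, defaultdict


class InvertedIndex:
    """Maps each visual-word id to the set of images that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, image_id, visual_words):
        # Record the image under every distinct word it contains.
        for word in set(visual_words):
            self._postings[word].add(image_id)

    def search(self, query_words, top_k=10):
        # Only images sharing at least one word with the query are touched,
        # which is what keeps lookups fast on large collections.
        votes = Counter()
        for word in set(query_words):
            for image_id in self._postings.get(word, ()):
                votes[image_id] += 1
        return votes.most_common(top_k)
```

Query cost then depends on the length of the posting lists for the query's words rather than on the total number of indexed images.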

Tu Bui, John Collomosse, Mark Bell, Alex Green, John Sheridan, Jez Higgins, Arindra Das, Jared Keller, Olivier Thereaux, Alan Brown (2019) ARCHANGEL: Tamper-Proofing Video Archives Using Temporal Content Hashes on the Blockchain, In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) Workshops Institute of Electrical and Electronics Engineers (IEEE)

We present ARCHANGEL, a novel distributed ledger based system for assuring the long-term integrity of digital video archives. First, we describe a novel deep network architecture for computing compact temporal content hashes (TCHs) from audio-visual streams with durations of minutes or hours. Our TCHs are sensitive to accidental or malicious content modification (tampering) but invariant to the codec used to encode the video. This is necessary due to the curatorial requirement for archives to format shift video over time to ensure future accessibility. Second, we describe how the TCHs (and the models used to derive them) are secured via a proof-of-authority blockchain distributed across multiple independent archives. We report on the efficacy of ARCHANGEL within the context of a trial deployment in which the national government archives of the United Kingdom, Estonia and Norway participated.