My research focuses on novel techniques in Signal Processing, Computer Vision and Machine Learning and their applications in industry, healthcare, big data and security. I have a particular interest in image and video analysis and retrieval (visual search, object recognition, and analysis of motion, shape and texture). The broad research objective is to develop unique methods and technology solutions for visual content understanding that can dramatically improve on the existing state of the art, leading to new applications. My algorithms for shape analysis, image/video fingerprinting and visual search are considered world-leading; they were selected for ISO International Standards within MPEG and are used by, e.g., the Metropolitan Police.
I have extensive collaboration links with universities and research institutions in Europe (UK, Switzerland, Germany, Poland, France, Spain), US, Japan and China. I have also worked with the following companies: the BBC (UK), Bang and Olufsen (DE), CEDEO (IT), Casio (JP), Ericsson (SE), Huawei (DE), Mitsubishi Electric (JP), RAI television (IT), Renesas Electronics (JP), Telecom Italia (IT), and Visual Atoms (UK).
I am the project coordinator and PI for the BRIDGET FP7 project [5.28 M€], where my team is responsible for the development of ultra-large-scale visual search and media analysis algorithms for the broadcast industry. The project aims to open new dimensions for multimedia content creation and consumption by bridging the gap between broadcast and the Internet. Project partners include RAI television, Huawei, Telecom Italia and more.
CODAM is my latest project (PI) and is funded by the TSB creative media call [£1.05 M]. My team is working with the BBC and Visual Atoms to develop an advanced video asset management system with unique visual fingerprinting and visual search capabilities. It will aid content creation and deployment by enabling visual content tracking, identification and searching across multiple devices and platforms, and across diverse digital media ecosystems and markets. Where is the original version of the low-quality clip? Which video clip has been used most often in BBC programmes? Is it a stock shot of a red double decker bus, or an excerpt from a royal wedding? Is there other footage in the archive that shows the same event but can provide a fresh viewpoint? The CODAM system will answer these questions, track the origins of video clips across multi-platform productions and search for related material. It will take the form of a modular software system that can identify individual video clips in edited programmes, and perform object or scene recognition to find similar footage in an archive without relying on manually entered and often incomplete metadata.
Find me on campus Room: 29 BA 00
© 2016 IEEE. Local feature descriptors underpin many diverse applications, supporting object recognition, image registration, database search, 3D reconstruction and more. The recent phenomenal growth in mobile devices and mobile computing in general has created demand for descriptors that are not only discriminative, but also compact in size and fast to extract and match. In response, a large number of binary descriptors have been proposed, each claiming to overcome some limitations of the predecessors. This paper provides a comprehensive evaluation of several promising binary designs. We show that existing evaluation methodologies are not sufficient to fully characterize descriptors' performance and propose a new evaluation protocol and a challenging dataset. In contrast to the previous reviews, we investigate the effects of the matching criteria, operating points and compaction methods, showing that they all have a major impact on the systems' design and performance. Finally, we provide descriptor extraction times for both general-purpose systems and mobile devices, in order to better understand the real complexity of the extraction task. The objective is to provide a comprehensive reference and a guide that will help in selection and design of the future descriptors.
Visual search and image retrieval underpin numerous applications; however, the task is still challenging, predominantly due to the variability of object appearance and the ever-increasing size of the databases, often exceeding billions of images. Prior art methods rely on aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including Bag of Visual Words (BoW), Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance is still short of what is required. This paper presents a novel method for deriving a compact and distinctive representation of image content called Robust Visual Descriptor with Whitening (RVD-W). It significantly advances the state of the art and delivers world-class performance. In our approach local descriptors are rank-assigned to multiple clusters. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization function and aggregated based on the neighborhood rank. Importantly, the residual vectors are de-correlated and whitened in each cluster before aggregation, leading to a balanced energy distribution in each dimension and significantly improved performance. We also propose a new post-PCA normalization approach which improves separability between the matching and non-matching global descriptors. This new normalization benefits not only our RVD-W descriptor but also improves existing approaches based on FV and VLAD aggregation. Furthermore, we show that the aggregation framework developed using hand-crafted SIFT features also performs exceptionally well with Convolutional Neural Network (CNN) based features. The RVD-W pipeline outperforms state-of-the-art global descriptors on both the Holidays and Oxford datasets.
On the large-scale datasets, Holidays1M and Oxford1M, the SIFT-based RVD-W representation obtains a mAP of 45.1% and 35.1%, while the CNN-based RVD-W achieves a mAP of 63.5% and 44.8%, all yielding superior performance to the state of the art.
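The rank-based assignment and residual aggregation described in the abstract above can be illustrated with a minimal sketch. It is not the published RVD-W implementation: the per-cluster de-correlation and whitening and the post-PCA normalization are omitted, L2 normalisation stands in as one possible direction-preserving function, and the `rank_weights` values and function names are hypothetical.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit length, keeping its direction (assumed normalisation)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate(descriptors, centres, rank_weights):
    """Rank-assign each descriptor to its nearest centres and sum weighted residuals.

    rank_weights[j] is a hypothetical weight for the j-th nearest centre;
    its length sets how many centres each descriptor is assigned to.
    """
    dim = len(centres[0])
    blocks = [[0.0] * dim for _ in centres]
    for d in descriptors:
        # Order the centres by squared Euclidean distance to this descriptor.
        order = sorted(
            range(len(centres)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, centres[k])),
        )
        for rank, k in enumerate(order[: len(rank_weights)]):
            # Residual towards the centre, reduced to its direction only.
            residual = l2_normalise([a - b for a, b in zip(d, centres[k])])
            for i, x in enumerate(residual):
                blocks[k][i] += rank_weights[rank] * x
    # Concatenate the per-cluster blocks into one global vector.
    return l2_normalise([x for block in blocks for x in block])


global_vector = aggregate(
    descriptors=[[1.0, 0.0], [0.0, 1.0]],
    centres=[[0.0, 0.0], [2.0, 2.0]],
    rank_weights=[1.0, 0.5],
)
```

The resulting vector has one block per cluster, so its length is the number of centres multiplied by the descriptor dimension.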
This paper presents an approach for generating class-specific image segmentation. We introduce two novel features that use the quantized data of the Discrete Cosine Transform (DCT) in a Semantic Texton Forest (STF) based framework, combining colour and texture information for semantic segmentation. The combination of multiple features in a segmentation system is not a straightforward process. The proposed system is designed to exploit complementary features in a computationally efficient manner. Our DCT-based features describe complex textures represented in the frequency domain, and not just simple textures obtained using differences between pixel intensities as in the classic STF approach. Unlike existing methods (e.g., filter banks), only a limited amount of resources is required. The proposed method has been tested on two popular databases: CamVid and MSRC-v2. Comparison with recent state-of-the-art methods shows improvement in terms of semantic segmentation accuracy.
This paper presents the core technologies of the Video Signature Tools recently standardized by ISO/IEC MPEG as an amendment to the MPEG-7 Standard (ISO/IEC 15938). The Video Signature is a high-performance content fingerprint which is suitable for desktop-scale to web-scale deployment and provides high levels of robustness to common video editing operations and high temporal localization accuracy at extremely low false alarm rates, achieving a detection rate in the order of 96% at a false alarm rate in the order of five false matches per million comparisons. The applications of the Video Signature are numerous and include rights management and monetization, distribution management, usage monitoring, metadata association, and corporate or personal database management. In this paper we review the prior work in the field, we explain the standardization process and status, and we provide the details and evaluation results of the Video Signature Tools.
This paper addresses the problem of ultra-large-scale search in Hamming spaces. There has been considerable research on generating compact binary codes in vision, for example for visual search tasks. However, the issue of efficient searching through huge sets of binary codes remains largely unsolved. To this end, we propose a novel, unsupervised approach to thresholded search in Hamming space, supporting long codes (e.g. 512 bits) with a wide range of Hamming distance radii. Our method is capable of working efficiently with billions of codes, delivering between one and three orders of magnitude acceleration compared to prior art. This is achieved by relaxing the equal-size constraint in the Multi-Index Hashing approach, leading to multiple hash-tables with variable-length hash-keys. Based on the theoretical analysis of the retrieval probabilities of multiple hash-tables, we propose a novel search algorithm for obtaining a suitable set of hash-key lengths. The resulting retrieval mechanism is shown empirically to improve the efficiency over the state of the art, across a range of datasets, bit-depths and retrieval thresholds.
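The idea of thresholded Hamming search over several hash tables with unequal key lengths can be sketched as follows. This is a minimal illustration rather than the paper's method: the key lengths are picked by hand instead of by the proposed search algorithm, and all function names are hypothetical. It relies on the pigeonhole argument that two codes differing in at most r bits, when split into m disjoint substrings of any lengths, must have at least one substring differing in at most floor(r / m) bits.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, lengths):
    """Split an integer code into substrings of the given bit lengths (lowest bits first)."""
    parts = []
    for length in lengths:
        parts.append(code & ((1 << length) - 1))
        code >>= length
    return parts


def build_tables(codes, lengths):
    """Build one hash table per substring, mapping each key to the indices of matching codes."""
    tables = [defaultdict(list) for _ in lengths]
    for idx, code in enumerate(codes):
        for table, key in zip(tables, split_code(code, lengths)):
            table[key].append(idx)
    return tables


def neighbours(key, length, radius):
    """Yield every `length`-bit key within Hamming distance `radius` of `key`."""
    for d in range(radius + 1):
        for positions in combinations(range(length), d):
            flipped = key
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query, codes, tables, lengths, r):
    """Return sorted indices of all codes within Hamming distance r of the query."""
    sub_radius = r // len(lengths)  # pigeonhole bound for each substring
    candidates = set()
    for table, key, length in zip(tables, split_code(query, lengths), lengths):
        for probe in neighbours(key, length, sub_radius):
            candidates.update(table.get(probe, ()))
    # Verify candidates against the full codes to discard false positives.
    return sorted(i for i in candidates if bin(codes[i] ^ query).count("1") <= r)


# Toy usage: 16-bit codes split into keys of 4, 6 and 6 bits.
codes = [0b1010101010101010, 0b1010101010101011, 0b0101010101010101]
lengths = [4, 6, 6]
tables = build_tables(codes, lengths)
matches = search(0b1010101010101010, codes, tables, lengths, r=2)
```

Shorter keys produce more candidates per probe but need fewer probes, which is the trade-off that a choice of variable key lengths is meant to balance.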
This paper addresses the problem of aggregating local binary descriptors for large-scale image retrieval in mobile scenarios. Binary descriptors are becoming increasingly popular, especially in mobile applications, as they deliver high matching speed, have a small memory footprint and are fast to extract. However, little research has been done on how to efficiently aggregate binary descriptors. Direct application of methods developed for conventional descriptors, such as SIFT, results in unsatisfactory performance. In this paper we introduce and evaluate several algorithms to compress high-dimensional binary local descriptors for efficient retrieval in large databases. In addition, we propose a robust global image representation, the Binary Robust Visual Descriptor (B-RVD), with rank-based multi-assignment of local descriptors and direction-based aggregation, achieved by the use of the L1-norm on residual vectors. The performance of the B-RVD is further improved by balancing the variances of residual vector directions in order to maximize the discriminatory power of the aggregated vectors. Standard datasets and measures have been used for evaluation, showing a significant improvement of around 4% mean Average Precision compared to the state of the art.
Recent studies have shown that it is possible to attack a finger vein (FV) based biometric system using printed materials. In this study, we propose a novel method to detect spoofing of static finger vein images using Windowed Dynamic Mode Decomposition (W-DMD). This is an atemporal variant of the recently proposed Dynamic Mode Decomposition for image sequences. The proposed method achieves better results when compared to established methods such as local binary patterns (LBP), discrete wavelet transforms (DWT), histogram of gradients (HoG), and filter methods such as range-filters, standard deviation filters (STD) and entropy filters, when using SVM with a minimum intersection kernel. The overall pipeline, which consists of W-DMD and SVM, proves to be efficient and convenient to use, given the absence of additional parameter tuning requirements. The effectiveness of our methodology is demonstrated using the FV-Spoofing-Attack database, which is publicly available. Our test results show that W-DMD can successfully detect printed finger vein images because they contain micro-level artefacts that differ not only in quality but also in light reflection properties compared to valid/live finger vein images.
We compare experimentally the performance of three approaches to ensemble-based classification on general multi-class datasets. These are the methods of random forest, error-correcting output codes (ECOC) and ECOC enhanced by the use of bootstrapping and class-separability weighting (ECOC-BW). These experiments suggest that ECOC-BW yields better generalisation performance than either random forest or unmodified ECOC. A bias-variance analysis indicates that ECOC benefits from reduced bias, when compared to random forest, and that ECOC-BW benefits additionally from reduced variance. One disadvantage of ECOC-based algorithms, however, when compared with random forest, is that they impose a greater computational demand leading to longer training times.
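For readers unfamiliar with error-correcting output codes, the basic decoding step can be sketched as below. This illustrates plain ECOC only; the bootstrapping and class-separability weighting of ECOC-BW are not reproduced, and the 4-class, 7-bit code matrix and the function names are hypothetical examples. Each column defines one binary problem, and a test sample is assigned to the class whose codeword is nearest in Hamming distance to the vector of binary classifier outputs.

```python
# Hypothetical code matrix: one 7-bit codeword per class, pairwise Hamming distance 4.
CODE_MATRIX = [
    [0, 0, 0, 0, 0, 0, 0],  # class 0
    [1, 1, 1, 1, 0, 0, 0],  # class 1
    [1, 1, 0, 0, 1, 1, 0],  # class 2
    [1, 0, 1, 0, 1, 0, 1],  # class 3
]


def binary_targets(labels, column, code_matrix=CODE_MATRIX):
    """Relabel multi-class targets as the 0/1 targets for one binary classifier."""
    return [code_matrix[label][column] for label in labels]


def hamming(a, b):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the predicted bit vector."""
    return min(range(len(code_matrix)), key=lambda k: hamming(bits, code_matrix[k]))


# One binary classifier (the fourth) is wrong, yet class 1 is still recovered.
predicted_class = decode([1, 1, 1, 0, 0, 0, 0])
```

Because every pair of codewords in this matrix differs in four positions, any single erroneous binary output is corrected, at the cost of training seven binary classifiers for four classes.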