Dr Krystian Mikolajczyk

Visiting Lecturer

Qualifications: PhD, MSc

Further information

Publications

Highlights

  • Kalal Z, Matas J, Mikolajczyk K. (2012) 'Tracking-Learning-Detection'. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (7), pp. 1409-1422.

    Abstract

    This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of "experts": (i) the P-expert estimates missed detections, and (ii) the N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

  • Yan F, Kittler J, Mikolajczyk K, Tahir A. (2011) 'Non-Sparse Multiple Kernel Fisher Discriminant Analysis'. Journal of Machine Learning Research, 13, pp. 607-642.

    Abstract

    Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓp norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of the latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓp MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines.

  • Cai H, Mikolajczyk K, Matas J. (2011) 'Learning linear discriminant projections for dimensionality reduction of image descriptors'. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (2), pp. 338-352.

    Abstract

    In this paper, we present Linear Discriminant Projections (LDP) for reducing dimensionality and improving discriminability of local image descriptors. We place LDP into the context of state-of-the-art discriminant projections and analyze its properties. LDP requires a large set of training data with point-to-point correspondence ground truth. We demonstrate that training data produced by a simulation of image transformations leads to nearly the same results as the real data with correspondence ground truth. This makes it possible to apply LDP as well as other discriminant projection approaches to the problems where the correspondence ground truth is not available, such as image categorization. We perform an extensive experimental evaluation on standard data sets in the context of image matching and categorization. We demonstrate that LDP enables significant dimensionality reduction of local descriptors and performance increases in different applications. The results improve upon the state-of-the-art recognition performance with simultaneous dimensionality reduction from 128 to 30.

  • Mikolajczyk K, Uemura H. (2011) 'Action recognition with appearance-motion features and fast search trees'. Computer Vision and Image Understanding, 115 (3), pp. 426-438.
  • Cai H, Yan F, Mikolajczyk K. (2010) 'Learning Weights for Codebook in Image Classification and Retrieval'. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2320-2327.

    Abstract

    This paper presents a codebook learning approach for image classification and retrieval. It corresponds to learning a weighted similarity metric such that the weighted similarity between identically labeled images is larger than that between differently labeled images by the largest margin. We formulate the learning problem as a convex quadratic program and adopt alternating optimization to solve it efficiently. Experiments on both synthetic and real datasets validate the approach. The codebook learning improves the performance, in particular in the case where the number of training examples is not sufficient for a large codebook.

  • Kalal Z, Matas J, Mikolajczyk K. (2010) 'P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints'. IEEE Computer Society, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA: 23rd IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 49-56.
  • Yan F, Mikolajczyk K, Barnard M, Cai H, Kittler J. (2010) 'Lp Norm Multiple Kernel Fisher Discriminant Analysis for Object and Image Categorisation'. IEEE Conference on Computer Vision and Pattern Recognition
  • Yan F, Kittler J, Mikolajczyk K, Tahir A. (2009) 'Non-Sparse Multiple Kernel Learning for Fisher Discriminant Analysis'. Proceedings of The Ninth IEEE International Conference on Data Mining, Miami, USA: ICDM '09, pp. 1064-1069.

    Abstract

    We consider the problem of learning a linear combination of pre-specified kernel matrices in the Fisher discriminant analysis setting. Existing methods for such a task impose an ℓ1 norm regularisation on the kernel weights, which produces a sparse solution but may lead to loss of information. In this paper, we propose to use ℓ2 norm regularisation instead. The resulting learning problem is formulated as a semi-infinite program and can be solved efficiently. Through experiments on both synthetic data and a very challenging object recognition benchmark, the relative advantages of the proposed method and its ℓ1 counterpart are demonstrated, and insights are gained as to how the choice of regularisation norm should be made.

  • Tahir MA, Kittler J, Mikolajczyk K, Yan F, Van De Sande KEA, Gevers T. (2009) 'Visual category recognition using spectral regression and kernel discriminant analysis'. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 178-185.

    Abstract

    Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been successful in many classification problems. In this paper, we adopt this solution to VCR and demonstrate its advantages over existing methods both in terms of speed and accuracy. The distinctiveness of this method is assessed experimentally using an image and a video benchmark: the PASCAL VOC Challenge 08 and the Mediamill Challenge. From the experimental results, it can be derived that SR-KDA consistently yields significant performance gains when compared with the state-of-the-art methods. The other strong point of using SR-KDA is that the time complexity scales linearly with respect to the number of concepts and the main computational complexity is independent of the number of categories. ©2009 IEEE.

  • Tuytelaars T, Mikolajczyk K. (2008) 'Local invariant feature detectors: A survey'. Foundations and Trends in Computer Graphics and Vision, 3 (3), pp. 177-280.

Journal articles

  • Balntas V, Tang HL, Mikolajczyk K. (2017) 'Binary Online Learned Descriptors'. IEEE Transactions on Pattern Analysis and Machine Intelligence,
    [ Status: Accepted ]

    Abstract

    We propose a novel approach to generate a binary descriptor optimized for each image patch independently. The approach is inspired by the linear discriminant embedding that simultaneously increases inter-class and decreases intra-class distances. A set of discriminative and uncorrelated binary tests is established from all possible tests in an offline training process. The patch-adapted descriptors are then efficiently built online from a subset of features which lead to lower intra-class distances and thus to a more robust descriptor. We perform experiments on three widely used benchmarks and demonstrate improvements in matching performance, and illustrate that per-patch optimization outperforms global optimization.

  • Chan CH, Yan F, Kittler J, Mikolajczyk K. (2015) 'Full ranking as local descriptor for visual recognition: A comparison of distance metrics on Sn'. Pattern Recognition, 48 (4), pp. 1324-1332.

    Abstract

    In this paper we propose to use the full ranking of a set of pixels as a local descriptor. In contrast to existing methods which use only partial ranking information, the full ranking encodes the complete comparative information among the pixels, while retaining invariance to monotonic photometric transformations. The descriptor is used within the bag-of-visual-words paradigm for visual recognition. We demonstrate that the choice of distance metric for assigning the descriptors to visual words is crucial to the performance, and provide an extensive evaluation of eight distance metrics for the permutation group Sn on four widely used face verification and texture classification benchmarks. The results demonstrate that (1) full ranking of pixels encodes more information than partial ranking, consistently leading to better performance; (2) the full ranking descriptor can be trivially made rotation invariant; (3) the proposed descriptor applies to both image intensities and filter responses, and is capable of producing state-of-the-art performance.
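
    Illustrative sketch (not code from the paper): a full-ranking descriptor represents a patch by the permutation of its pixel intensities, which is unchanged by any monotonic photometric transformation; descriptors are then compared with a distance on the permutation group Sn, here Spearman's footrule as one of the possible metrics. Function names and patch size are assumptions for illustration.

        import numpy as np

        def full_ranking_descriptor(patch):
            """Rank of every pixel in the flattened patch: a permutation of 0..n-1."""
            flat = patch.ravel()
            ranks = np.empty(flat.size, dtype=np.int64)
            ranks[np.argsort(flat, kind="stable")] = np.arange(flat.size)
            return ranks

        def spearman_footrule(r1, r2):
            """One possible distance metric on the permutation group Sn."""
            return int(np.abs(r1 - r2).sum())

        # Invariance check: a monotonic intensity change leaves the descriptor unchanged.
        patch = np.random.rand(5, 5)
        assert np.array_equal(full_ranking_descriptor(patch),
                              full_ranking_descriptor(2.0 * patch + 0.3))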

  • Balntas V, Tang L, Mikolajczyk K. (2014) 'Improving object tracking with voting from false positive detections'. Proceedings - International Conference on Pattern Recognition, pp. 1928-1933.

    Abstract

    Context provides additional information in detection and tracking, and several works have proposed online trained trackers that make use of the context. However, the context is usually considered during tracking as items with motion patterns significantly correlated with the target. We propose a new approach that exploits context in tracking-by-detection and makes use of persistent false positive detections. True detections as well as repeated false positives act as pointers to the location of the target. This is implemented with generalised Hough voting and incorporated into a state-of-the-art online learning framework. The proposed method presents good performance in both speed and accuracy and improves the current state-of-the-art results on a challenging benchmark.

  • Yan F, Kittler J, Windridge D, Christmas W, Mikolajczyk K, Cox S, Huang Q. (2014) 'Automatic annotation of tennis games: An integration of audio, vision, and learning'. Image and Vision Computing, 32 (11), pp. 896-903.

    Abstract

    Fully automatic annotation of tennis games using broadcast video is a task with great potential but with enormous challenges. In this paper we describe our approach to this task, which integrates computer vision, machine listening, and machine learning. At the low-level processing, we improve upon our previously proposed state-of-the-art tennis ball tracking algorithm and employ audio signal processing techniques to detect key events and construct features for classifying the events. At the high-level analysis, we model event classification as a sequence labelling problem, and investigate four machine learning techniques using simulated event sequences. Finally, we evaluate our proposed approach on three real-world tennis games, and discuss the interplay between audio, vision and learning. To the best of our knowledge, our system is the only one that can annotate tennis games at such a detailed level. © 2014 Elsevier B.V.

  • Bowden R, Collomosse J, Mikolajczyk K. (2014) 'Guest Editorial: Tracking, Detection and Segmentation'. International Journal of Computer Vision,
    [ Status: Accepted ]
  • Chan CH, Yan F, Kittler J, Mikolajczyk K. (2014) 'Full ranking as local descriptor for visual recognition: A comparison of distance metrics on Sn'. Pattern Recognition, article number PR-D-14-00330R
  • Schubert F, Mikolajczyk K. (2013) 'Benchmarking GPU-based phase correlation for homography-based registration of aerial imagery'. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8048 LNCS (PART 2), pp. 83-90.

    Abstract

    Many multi-image fusion applications require fast registration methods in order to allow real-time processing. Although the most popular approaches, local-feature-based methods, have proven efficient enough for registering image pairs in real time, some applications like multi-frame background subtraction, super-resolution or high-dynamic-range imaging benefit from even faster algorithms. A common trend to speed up registration is to implement the algorithms on graphics cards (GPUs). However, not all algorithms are specially suited for massive parallelization via GPUs. In this paper we evaluate the speed of a well-known global registration method, i.e. phase correlation, for computing 8-DOF homographies. We propose a benchmark to compare a CPU- and GPU-based implementation using different systems and a dataset of aerial imagery. We demonstrate that phase correlation benefits from GPU-based implementations much more than local methods, significantly increasing the processing speed. © 2013 Springer-Verlag.
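
    A minimal sketch of the global method benchmarked above, restricted to pure translation between two same-sized grayscale images (the paper goes further, to 8-DOF homographies and GPU execution); plain NumPy FFTs, written as an illustration under these assumptions rather than the authors' implementation.

        import numpy as np

        def phase_correlation(a, b):
            """Estimate the integer (dy, dx) translation that maps image b onto image a."""
            cross_power = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
            cross_power /= np.abs(cross_power) + 1e-12             # keep phase only
            corr = np.fft.ifft2(cross_power).real
            dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
            if dy > a.shape[0] // 2: dy -= a.shape[0]               # wrap to signed shifts
            if dx > a.shape[1] // 2: dx -= a.shape[1]
            return dy, dx

        img = np.random.rand(128, 128)
        shifted = np.roll(np.roll(img, 7, axis=0), -3, axis=1)      # known circular shift
        print(phase_correlation(shifted, img))                      # expected: (7, -3)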

  • Koniusz P, Yan F, Mikolajczyk K. (2013) 'Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection'. Computer Vision and Image Understanding, 117 (5), pp. 479-492.

    Abstract

    Bag-of-Words lies at the heart of modern object category recognition systems. After descriptors are extracted from images, they are expressed as vectors representing visual word content, referred to as mid-level features. In this paper, we review a number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Coding. We also isolate the underlying properties that affect their performance. Moreover, we investigate various pooling methods that aggregate mid-level features into vectors representing images. Average pooling, Max-pooling, and a family of likelihood inspired pooling strategies are scrutinised. We demonstrate how both coding schemes and pooling methods interact with each other. We generalise the investigated pooling methods to account for the descriptor interdependence and introduce an intuitive concept of improved pooling. We also propose a coding-related improvement to increase its speed. Lastly, state-of-the-art performance in classification is demonstrated on Caltech101, Flower17, and ImageCLEF11 datasets. © 2012 Elsevier Inc. All rights reserved.
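
    A compact sketch of the mid-level pipeline discussed above: local descriptors are soft-assigned to a visual vocabulary and the per-descriptor codes are aggregated by average or max pooling. Vocabulary size, the Gaussian smoothing factor and the random data are illustrative assumptions, not the paper's settings.

        import numpy as np

        def soft_assignment(descriptors, vocab, sigma=0.5):
            """Soft-assign each descriptor to all visual words with Gaussian weights."""
            d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)   # (N, K)
            w = np.exp(-d2 / (2.0 * sigma ** 2))
            return w / (w.sum(axis=1, keepdims=True) + 1e-12)                   # rows sum to 1

        def pool(codes, method="avg"):
            """Aggregate per-descriptor codes into one image-level vector."""
            return codes.max(axis=0) if method == "max" else codes.mean(axis=0)

        rng = np.random.default_rng(0)
        descriptors = rng.normal(size=(200, 128))    # e.g. 200 SIFT-like descriptors
        vocab = rng.normal(size=(64, 128))           # toy vocabulary (normally k-means centres)
        codes = soft_assignment(descriptors, vocab)
        image_vector = pool(codes, "max")            # or "avg"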

  • Boudissa A, Tan J, Kim H, Ishikawa S, Shinomiya T, Mikolajczyk K. (2013) 'A global-local approach to saliency detection'. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8048 LNCS (PART 2), pp. 332-337.

    Abstract

    In this paper, we present a novel approach to saliency detection. We define a visually salient region with the following two properties: global saliency, i.e. the spatial redundancy, and local saliency, i.e. the region complexity. The former is its probability of occurrence within the image, whereas the latter defines how much information is contained within the region, and it is quantified by the entropy. By combining the global spatial redundancy measure and local entropy, we can achieve a simple, yet robust saliency detector. We evaluate it quantitatively and compare to Itti et al. [6] as well as to the spectral residual approach [5] on publicly available data where it shows a significant improvement. © 2013 Springer-Verlag.

  • Tahir M, Yan F, Koniusz P, Awais M, Barnard M, Mikolajczyk K, Kittler J. (2012) 'A Robust and Scalable Visual Category and Action Recognition System using Kernel Discriminant Analysis with Spectral Regression'. IEEE Transactions on Multimedia,
    [ Status: Accepted ]
  • Kalal Z, Matas J, Mikolajczyk K. (2012) 'Tracking-Learning-Detection'. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (7), pp. 1409-1422.

    Abstract

    This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of "experts": (i) the P-expert estimates missed detections, and (ii) the N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.
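
    A deliberately simplified sketch of the P-N learning idea (not the authors' implementation): a nearest-neighbour classifier is bootstrapped by a P-expert, which adds detections that were missed near the trusted trajectory, and an N-expert, which adds false alarms far from it. The confidence measure, overlap thresholds and feature representation are assumptions.

        import numpy as np

        class PNLearner:
            """Toy P-N learning: example sets grown from the classifier's own errors."""

            def __init__(self):
                self.pos, self.neg = [], []

            def confidence(self, x):
                """Relative nearest-neighbour similarity to the positive set (0.5 if untrained)."""
                if not self.pos or not self.neg:
                    return 0.5
                dp = min(np.linalg.norm(x - p) for p in self.pos)
                dn = min(np.linalg.norm(x - n) for n in self.neg)
                return dn / (dp + dn + 1e-9)

            def update(self, candidates, overlaps, accept=0.5):
                """candidates: feature vectors of detector responses in one frame;
                overlaps: their spatial overlap with the tracker's trusted location."""
                for x, ov in zip(candidates, overlaps):
                    c = self.confidence(x)
                    if ov > 0.7 and c < accept:
                        self.pos.append(x)   # P-expert: missed detection on the trajectory
                    elif ov < 0.2 and c >= accept:
                        self.neg.append(x)   # N-expert: false alarm away from the trajectory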

  • Yan F, Kittler J, Mikolajczyk K, Tahir A. (2011) 'Non-Sparse Multiple Kernel Fisher Discriminant Analysis'. Journal of Machine Learning Research, 13, pp. 607-642.

    Abstract

    Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓp norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of the latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓp MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines.

  • Awais M, Yan F, Mikolajczyk K, Kittler J. (2011) 'Augmented Kernel Matrix vs classifier fusion for object recognition'. BMVC 2011 - Proceedings of the British Machine Vision Conference 2011,

    Abstract

    Augmented Kernel Matrix (AKM) has recently been proposed to accommodate the fact that a single training example may have different importance in different feature spaces, in contrast to Multiple Kernel Learning (MKL) that assigns the same weight to all examples in one feature space. However, the AKM approach is limited to small datasets due to its memory requirements. An alternative way to fuse information from different feature channels is classifier fusion (ensemble methods). There is a significant amount of work on linear programming formulations of classifier fusion (CF) in the case of binary classification. In this paper we derive the primal and dual of AKM to draw its correspondence with CF. We propose a multiclass extension of binary ν-LPBoost, which learns the contribution of each class in each feature channel. Existing approaches of CF promote sparse feature combinations, due to regularization based on the ℓ1-norm, and lead to a selection of a subset of feature channels, which is not good in the case of informative channels. We also generalize existing CF formulations to an arbitrary ℓp-norm for binary and multiclass problems, which results in more effective use of complementary information. We carry out an extensive comparison and show that the proposed nonlinear CF schemes outperform their sparse counterparts as well as state-of-the-art MKL approaches. © 2011. The copyright of this document resides with its authors.

  • Mikolajczyk K, Uemura H. (2011) 'Action recognition with appearance-motion features and fast search trees'. Computer Vision and Image Understanding, 115 (3), pp. 426-438.
  • Cai H, Mikolajczyk K, Matas J. (2011) 'Learning linear discriminant projections for dimensionality reduction of image descriptors'. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (2), pp. 338-352.

    Abstract

    In this paper, we present Linear Discriminant Projections (LDP) for reducing dimensionality and improving discriminability of local image descriptors. We place LDP into the context of state-of-the-art discriminant projections and analyze its properties. LDP requires a large set of training data with point-to-point correspondence ground truth. We demonstrate that training data produced by a simulation of image transformations leads to nearly the same results as the real data with correspondence ground truth. This makes it possible to apply LDP as well as other discriminant projection approaches to the problems where the correspondence ground truth is not available, such as image categorization. We perform an extensive experimental evaluation on standard data sets in the context of image matching and categorization. We demonstrate that LDP enables significant dimensionality reduction of local descriptors and performance increases in different applications. The results improve upon the state-of-the-art recognition performance with simultaneous dimensionality reduction from 128 to 30.
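
    An illustrative sketch (assumptions throughout, not the authors' code) of learning a linear discriminant projection from descriptor differences: the covariance of differences between matching descriptors and that of non-matching pairs define a generalised eigenproblem whose leading eigenvectors give the projection, e.g. from 128 down to 30 dimensions.

        import numpy as np
        from scipy.linalg import eigh

        def learn_ldp(matched_diffs, unmatched_diffs, out_dim=30):
            """matched_diffs / unmatched_diffs: (N, 128) arrays of descriptor differences."""
            Sw = np.cov(matched_diffs, rowvar=False) + 1e-6 * np.eye(matched_diffs.shape[1])
            Sb = np.cov(unmatched_diffs, rowvar=False)
            # Generalised eigenproblem Sb v = lambda Sw v: keep the directions that separate
            # non-matching pairs the most relative to matching-pair variation.
            vals, vecs = eigh(Sb, Sw)
            return vecs[:, np.argsort(vals)[::-1][:out_dim]]        # (128, out_dim)

        rng = np.random.default_rng(0)
        matched = rng.normal(scale=0.1, size=(5000, 128))           # small differences: true matches
        unmatched = rng.normal(scale=1.0, size=(5000, 128))
        P = learn_ldp(matched, unmatched)
        reduced = rng.normal(size=(10, 128)) @ P                    # project descriptors to 30-D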

  • Kalal Z, Mikolajczyk K, Matas J. (2010) 'Face-TLD: Tracking-learning-detection applied to faces'. Proceedings - International Conference on Image Processing, ICIP, pp. 3789-3792.

    Abstract

    A novel system for long-term tracking of a human face in unconstrained videos is built on the Tracking-Learning-Detection (TLD) approach. The system extends TLD with the concept of a generic detector and a validator which is designed for real-time face tracking resistant to occlusions and appearance changes. The off-line trained detector localizes frontal faces and the online trained validator decides which faces correspond to the tracked subject. Several strategies for building the validator during tracking are quantitatively evaluated. The system is validated on a sitcom episode (23 min.) and a surveillance (8 min.) video. In both cases the system detects/tracks the face and automatically learns a multi-view model from a single frontal example and an unlabeled video.

  • Tahir MA, Yan F, Barnard M, Awais M, Mikolajczyk K, Kittler J. (2010) 'The University of Surrey visual concept detection system at ImageCLEF@ICPR: Working notes'. Lecture Notes in Computer Science: Recognising Patterns in Signals, Speech, Images and Videos, 6388, pp. 162-170.

    Abstract

    Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task which ranked first for large-scale visual concept detection tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

  • Snoek CGM, van de Sande KEA, Uijlings JRR, Gevers T, Koelma DC, Smeulders AWM, Bugalho M, Trancoso I, Yan F, Tahir MA, Mikolajczyk K, Kittler J. (2009) 'Multi-frame, multi-modal, and multi-kernel concept detection in video'. 2009 TREC Video Retrieval Evaluation Notebook Papers,
  • Snoek CGM, van de Sande KEA, Uijlings JRR, Bugalho M, Trancoso I, Yan F, Tahir MA, Mikolajczyk K, Kittler J, Gevers T, Koelma DC, Smeulders AWM. (2009) 'Learning from video browse behavior'. 2009 TREC Video Retrieval Evaluation Notebook Papers,
  • Tuytelaars T, Mikolajczyk K. (2008) 'Local invariant feature detectors: A survey'. Foundations and Trends in Computer Graphics and Vision, 3 (3), pp. 177-280.
  • Mikolajczyk K, Leibe B, Schiele B. (2006) 'Multiple object class detection with a generative model'. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1, pp. 26-33.

    Abstract

    In this paper we propose an approach capable of simultaneous recognition and localization of multiple object classes using a generative model. A novel hierarchical representation makes it possible to represent individual images as well as various object classes in a single, scale and rotation invariant model. The recognition method is based on a codebook representation where appearance clusters built from edge based features are shared among several object classes. A probabilistic model allows for reliable detection of various objects in the same image. The approach is highly efficient due to fast clustering and matching methods capable of dealing with millions of high dimensional features. The system shows excellent performance on several object categories over a wide range of scales, in-plane rotations, background clutter, and partial occlusions. The performance of the proposed multi-object class detection approach is competitive with state-of-the-art approaches dedicated to a single object class recognition problem. © 2006 IEEE.

  • Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J, Schaffalitzky F, Kadir T, van Gool L. (2005) 'A comparison of affine region detectors'. International Journal of Computer Vision, 65 (1-2), pp. 43-72.

Conference papers

  • Gaur A, Mikolajczyk K. (2014) 'Ranking images based on aesthetic qualities'. Proceedings - International Conference on Pattern Recognition, pp. 3410-3415.

    Abstract

    We propose a novel approach for learning an image representation based on qualitative assessments of visual aesthetics. It relies on a multi-node multi-state model that represents image attributes and their relations. The model is learnt from pairwise image preferences provided by annotators. To demonstrate the effectiveness we apply our approach to fashion image rating, i.e., comparative assessment of aesthetic qualities. Bag-of-features object recognition is used for the classification of visual attributes such as clothing and body shape in an image. The attributes and their relations are then assigned learnt potentials which are used to rate the images. Evaluation of the representation model has demonstrated a high performance rate in ranking fashion images.

  • Schubert F, Mikolajczyk K. (2014) 'Robust registration and filtering for moving object detection in aerial videos'. Proceedings - International Conference on Pattern Recognition, pp. 2808-2813.

    Abstract

    In this paper we present a multi-frame motion detection approach for aerial platforms with a two-fold contribution. First, we propose a novel image registration method, which can robustly cope with a large variety of aerial imagery. We show that it can benefit from a hardware-accelerated implementation using graphics cards, allowing processing at a high frame rate. Second, to handle the inaccuracy of the registration and sensor noise that result in false alarms, we present an efficient filtering step to reduce incorrect motion hypotheses that arise from background subtraction. We show that the proposed filtering significantly improves the precision of the motion detection while maintaining high recall. We introduce a new dataset for evaluating aerial surveillance systems, which will be made available for comparison. We evaluate the registration performance in terms of accuracy and speed as well as the filtering in terms of motion detection performance.

  • Akin O, Mikolajczyk K. (2014) 'Online learning and detection with part-based, circulant structure'. Proceedings - International Conference on Pattern Recognition, pp. 4229-4233.

    Abstract

    Circulant Structure Kernel (CSK) has recently been introduced as a simple and extremely efficient tracking method. In this paper, we propose an extension of CSK that explicitly addresses the partial occlusion problems which the original CSK suffers from. Our extension is based on a part-based scheme, which improves the robustness and localisation accuracy. Furthermore, we improve the robustness of CSK for long-term tracking by incorporating it into an online learning and detection framework. We provide an extensive comparison to eight recently introduced tracking methods. Our experimental results show that the proposed approach significantly improves the original CSK and provides state-of-the-art results when combined with the online learning approach.

  • Yan F, Mikolajczyk K. (2014) 'Leveraging High Level Visual Information for Matching Images and Captions'. Asian Conference on Computer Vision
  • Schubert F, Mikolajczyk K. (2013) 'Performance evaluation of image filtering for classification and retrieval'. ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, pp. 485-491.

    Abstract

    Much research effort in the literature is focused on improving feature extraction methods to boost the performance in various computer vision applications. This is mostly achieved by tailoring feature extraction methods to specific tasks. For instance, for the task of object detection often new features are designed that are even more robust to natural variations of a certain object class and yet discriminative enough to achieve high precision. This focus led to a vast amount of different feature extraction methods with more or less consistent performance across different applications. Instead of fine-tuning or re-designing new features to further increase performance we want to motivate the use of image filters for pre-processing. We therefore present a performance evaluation of numerous existing image enhancement techniques which help to increase performance of already well-known feature extraction methods. We investigate the impact of such image enhancement or filtering techniques on two state-of-the-art image classification and retrieval approaches. For classification we evaluate using a standard Pascal VOC dataset. For retrieval we provide a new challenging dataset. We find that gradient-based interest-point detectors and descriptors such as SIFT or HOG can benefit from enhancement methods and lead to improved performance.

  • Miksik O, Mikolajczyk K. (2012) 'Evaluation of local detectors and descriptors for fast feature matching'. IEEE Proceedings - International Conference on Pattern Recognition, Tsukuba, Japan: 21st International Conference on Pattern Recognition, pp. 2681-2684.

    Abstract

    Local feature detectors and descriptors are widely used in many computer vision applications and various methods have been proposed during the past decade. There have been a number of evaluations focused on various aspects of local features, matching accuracy in particular, however there have been no comparisons considering the accuracy and speed trade-offs of recent extractors such as BRIEF, BRISK, ORB, MRRID, MROGH and LIOP. This paper provides a performance evaluation of recent feature detectors and compares their matching precision and speed in a randomized kd-trees setup as well as an evaluation of binary descriptors with efficient computation of the Hamming distance. © 2012 ICPR Org Committee.
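
    A small sketch of the Hamming-distance matching referred to above for binary descriptors such as BRIEF, BRISK or ORB: descriptors packed into byte arrays are compared with XOR followed by a bit count. The brute-force matcher and the toy data are illustrative assumptions.

        import numpy as np

        def hamming_distance(a, b):
            """Hamming distance between two binary descriptors stored as uint8 arrays."""
            return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

        def match_brute_force(queries, database):
            """Index of the closest database descriptor for each query descriptor."""
            return [min(range(len(database)), key=lambda j: hamming_distance(q, database[j]))
                    for q in queries]

        rng = np.random.default_rng(0)
        db = rng.integers(0, 256, size=(1000, 32), dtype=np.uint8)            # 1000 x 256-bit
        queries = db[:5] ^ rng.integers(0, 2, size=(5, 32), dtype=np.uint8)   # noisy copies
        print(match_brute_force(queries, db))                                 # expected: [0, 1, 2, 3, 4]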

  • (2012) 'British Machine Vision Conference, BMVC 2012, Surrey, UK, September 3-7, 2012'. BMVA Press, BMVC.
  • Koniusz P, Mikolajczyk K. (2011) 'Spatial coordinate coding to reduce histogram representations, dominant angle and colour pyramid match'. IEEE 18th IEEE International Conference on Image Processing, Brussels, Belgium: ICIP 2011, pp. 661-664.

    Abstract

    Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extreme histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates alternative ways of introducing spatial information during the formation of histograms. Specifically, we propose to apply spatial location information at a descriptor level and refer to it as Spatial Coordinate Coding. Alternatively, x, y, radius, or angle is used to perform semi-coding. This is achieved by adding one of the spatial components at the descriptor level whilst applying Pyramid Match to another. Lastly, we demonstrate that Pyramid Match can be applied robustly to other measurements: Dominant Angle and Colour. We demonstrate state-of-the-art results on two datasets by means of Soft Assignment and Sparse Coding.

  • Awais M, Yan F, Mikolajczyk K, Kittler J. (2011) 'Novel fusion methods for pattern recognition'. Springer Lecture Notes in Computer Science: Proceedings of Machine Learning and Knowledge Discovery in Databases (Part 1), Athens, Greece: ECML PKDD 2011: Machine Learning and Knowledge Discovery in Databases 6911 (PART 1), pp. 140-155.

    Abstract

    Over the last few years, several approaches have been proposed for information fusion including different variants of classifier level fusion (ensemble methods), stacking and multiple kernel learning (MKL). MKL has become a preferred choice for information fusion in object recognition. However, in the case of highly discriminative and complementary feature channels, it does not significantly improve upon its trivial baseline which averages the kernels. Alternative ways are stacking and classifier level fusion (CLF) which rely on a two phase approach. There is a significant amount of work on linear programming formulations of ensemble methods particularly in the case of binary classification. In this paper we propose a multiclass extension of binary ν-LPBoost, which learns the contribution of each class in each feature channel. The existing approaches of classifier fusion promote sparse features combinations, due to regularization based on ℓ1-norm, and lead to a selection of a subset of feature channels, which is not good in the case of informative channels. Therefore, we generalize existing classifier fusion formulations to arbitrary ℓp-norm for binary and multiclass problems which results in more effective use of complementary information. We also extended stacking for both binary and multiclass datasets. We present an extensive evaluation of the fusion methods on four datasets involving kernels that are all informative and achieve state-of-the-art results on all of them.

  • Awais M, Yan F, Mikolajczyk K, Kittler J. (2011) 'Augmented Kernel Matrix vs Classifier Fusion for Object Recognition'. BMVA Press Proceedings of the British Machine Vision Conference, Dundee: 22nd British Machine Vision Conference, pp. 60.1-60.11.
  • Awais M, Yan F, Mikolajczyk K, Kittler J. (2011) 'Two-stage augmented kernel matrix for object recognition'. Springer Lecture Notes in Computer Science: Multiple Classifier Systems, Naples, Italy: MCS 2011: 10th International Workshop on Multiple Classifier Systems 6713, pp. 137-146.

    Abstract

    Multiple Kernel Learning (MKL) has become a preferred choice for information fusion in image recognition problems. The aim of MKL is to learn an optimal combination of kernels formed from different features, and thus to learn the importance of different feature spaces for classification. Augmented Kernel Matrix (AKM) has recently been proposed to accommodate the fact that a single training example may have different importance in different feature spaces, in contrast to MKL, which assigns the same weight to all examples in one feature space. However, the AKM approach is limited to small datasets due to its memory requirements. We propose a novel two-stage technique to make AKM applicable to large data problems. In the first stage, various kernels are combined into different groups automatically using kernel alignment. Next, the most influential training examples are identified within each group and used to construct an AKM of significantly reduced size. This reduced-size AKM leads to the same results as the original AKM. We demonstrate that the proposed two-stage approach is memory efficient, leads to better performance than the original AKM, and is robust to noise. Results are compared with other state-of-the-art MKL techniques, and show improvement on challenging object recognition benchmarks.

  • De Campos T, Barnard M, Mikolajczyk K, Kittler J, Yan F, Christmas W, Windridge D. (2011) 'An evaluation of bags-of-words and spatio-temporal shapes for action recognition'. 2011 IEEE Workshop on Applications of Computer Vision, WACV 2011, pp. 344-351.

    Abstract

    Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two very popular approaches for action recognition from video. The former (BoW) is an un-structured global representation of videos which is built using a large set of local features. The latter (STS) uses a single feature located on a region of interest (where the actor is) in the video. Despite the popularity of these methods, no comparison between them has been done. Also, given that BoW and STS differ intrinsically in terms of context inclusion and globality/locality of operation, an appropriate evaluation framework has to be designed carefully. This paper compares these two approaches using four different datasets with varied degree of space-time specificity of the actions and varied relevance of the contextual background. We use the same local feature extraction method and the same classifier for both approaches. Further to BoW and STS, we also evaluated novel variations of BoW constrained in time or space. We observe that the STS approach leads to better results in all datasets whose background is of little relevance to action classification. © 2010 IEEE.

  • Koniusz P, Mikolajczyk K. (2011) 'Soft assignment of visual words as linear coordinate coding and optimisation of its reconstruction error'. IEEE 18th IEEE International Conference on Image Processing, Brussels, Belgium: ICIP 2011, pp. 2413-2416.

    Abstract

    Visual Word Uncertainty, also referred to as Soft Assignment, is a well established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the community dealing with object category recognition has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results amidst competitive methods. We show that one can take two views on Soft Assignment: an approach derived from the Gaussian Mixture Model or a special case of Linear Coordinate Coding. The latter view helps us propose how to optimise the smoothing factor of Soft Assignment in a way that minimises the descriptor reconstruction error and maximises classification performance. In turn, this renders the tedious cross-validation for establishing this parameter unnecessary and makes it a handy technique. We demonstrate state-of-the-art performance of such optimised assignment on two image datasets and several types of descriptors.

  • Yan F, Mikolajczyk K, Kittler J. (2011) 'Multiple Kernel Learning via Distance Metric Learning for Interactive Image Retrieval'. Springer-Verlag Berlin, Multiple Classifier Systems, Naples, Italy: 10th International Workshop on Multiple Classifier Systems 6713, pp. 147-156.
  • Yan F, Kittler J, Mikolajczyk K. (2010) 'Multiple Kernel Learning and Feature Space Denoising'. IEEE International Conference on Machine Learning and Cybernetics, Qingdao: 2010 International Conference on Machine Learning and Cybernetics (ICMLC) 4, pp. 1771-1776.

    Abstract

    We review a multiple kernel learning (MKL) technique called ℓp regularised multiple kernel Fisher discriminant analysis (MK-FDA), and investigate the effect of feature space denoising on MKL. Experiments show that with both the original and the denoised kernels, ℓp MK-FDA outperforms its fixed-norm counterparts. Experiments also show that feature space denoising boosts the performance of both single kernel FDA and ℓp MK-FDA, and that there is a positive correlation between the learnt kernel weights and the amount of variance kept by feature space denoising. Based on these observations, we argue that in the case where the base feature spaces are noisy, a linear combination of kernels cannot be optimal. An MKL objective function which can take care of feature space denoising automatically, and which can learn a truly optimal (non-linear) combination of the base kernels, is yet to be found.

  • Cai H, Yan F, Mikolajczyk K. (2010) 'Learning Weights for Codebook in Image Classification and Retrieval'. IEEE Conference on Computer Vision and Pattern Recognition, pp. 2320-2327.

    Abstract

    This paper presents a codebook learning approach for image classification and retrieval. It corresponds to learning a weighted similarity metric such that the weighted similarity between identically labeled images is larger than that between differently labeled images by the largest margin. We formulate the learning problem as a convex quadratic program and adopt alternating optimization to solve it efficiently. Experiments on both synthetic and real datasets validate the approach. The codebook learning improves the performance, in particular in the case where the number of training examples is not sufficient for a large codebook.

  • Kalal Z, Matas J, Mikolajczyk K. (2010) 'P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints'. IEEE Computer Society, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA: 23rd IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 49-56.
  • Koniusz P, Mikolajczyk K. (2010) 'On a quest for image descriptors based on unsupervised segmentation maps'. Proceedings of 20th International Conference on Pattern Recognition, Istanbul, Turkey: 2010 20th ICPR, pp. 762-765.

    Abstract

    This paper investigates segmentation-based image descriptors for object category recognition. In contrast to commonly used interest points, the proposed descriptors are extracted from pairs of adjacent regions given by a segmentation method. In this way we exploit semi-local structural information from the image. We propose to use the segments as spatial bins for descriptors of various image statistics based on gradient, colour and region shape. The proposed descriptors are validated on standard recognition benchmarks. The results show they outperform state-of-the-art reference descriptors with 5.6x less data and achieve comparable results to them with 8.6x less data. The proposed descriptors are complementary to SIFT and achieve state-of-the-art results when combined together within a kernel based classifier.

  • Kalal Z, Mikolajczyk K, Matas J. (2010) 'Forward-backward error: Automatic detection of tracking failures'. Proceedings of 20th International Conference on Pattern Recognition, Istanbul, Turkey: 2010 20th ICPR, pp. 2756-2759.

    Abstract

    This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error, i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects.
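
    An illustrative sketch of the Forward-Backward error using OpenCV's pyramidal Lucas-Kanade tracker (cv2.calcOpticalFlowPyrLK); the threshold and point layout are assumptions, not the paper's exact implementation. Points are tracked from frame t to t+1 and back to t, and the distance between each original point and its back-tracked position flags unreliable tracks; Median Flow then keeps only the reliable trajectories.

        import numpy as np
        import cv2

        def forward_backward_error(prev_gray, next_gray, points, fb_thresh=1.0):
            """points: (N, 1, 2) float32 locations in prev_gray.
            Returns the forward-tracked points and a boolean mask of reliable tracks."""
            fwd, st1, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, points, None)
            bwd, st2, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)
            fb_err = np.linalg.norm(points - bwd, axis=2).ravel()    # per-point FB error
            reliable = (st1.ravel() == 1) & (st2.ravel() == 1) & (fb_err < fb_thresh)
            return fwd, reliable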

  • Tahir MA, Kittler J, Mikolajczyk K, Yan F. (2010) 'Improving Multilabel Classification Performance by Using Ensemble of Multi-label Classifiers'. Springer-Verlag Berlin, Multiple Classifier Systems, Proceedings, Cairo, Egypt: 9th International Workshop on Multiple Classifier Systems 5997, pp. 11-21.
  • Tahir A, Yan F, Barnard M, Awais M, Mikolajczyk K, Kittler J. (2010) 'The University of Surrey Visual Concept Detection System at ImageCLEF 2010: Working Notes'. Springer Lecture Notes in Computer Science: Recognizing Patterns in Signals, Speech, Images and Videos: Contest Reports, Istanbul, Turkey: ICPR 2010

    Abstract

    Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task which ranked first for large-scale visual concept detection tasks in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, similarity measure for kernel construction and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

  • Awais M, Mikolajczyk K. (2010) 'Feature pairs connected by lines for object recognition'. Proceedings of 20th International Conference on Pattern Recognition, Istanbul, Turkey: 2010 20th ICPR, pp. 3093-3096.

    Abstract

    In this paper we exploit image edges and segmentation maps to build features for object category recognition. We build a parametric line based image approximation to identify the dominant edge structures. Line ends are used as features described by histograms of gradient orientations. We then form descriptors based on connected line ends to incorporate weak topological constraints which improve their discriminative power. Using point pairs connected by an edge assures higher repeatability than a random pair of points or edges. The results are compared with the state-of-the-art, and show significant improvement on the challenging Pascal VOC 2007 recognition benchmark. Kernel based fusion is performed to emphasize the complementary nature of our descriptors with respect to the state-of-the-art features.

  • Yan F, Mikolajczyk K, Barnard M, Cai H, Kittler J. (2010) 'Lp Norm Multiple Kernel Fisher Discriminant Analysis for Object and Image Categorisation'. IEEE Conference on Computer Vision and Pattern Recognition
  • Yan F, Mikolajczyk K, Kittler J, Tahir MA. (2010) 'Combining Multiple Kernels by Augmenting the Kernel Matrix'. Springer-Verlag Berlin, Multiple Classifier Systems, Proceedings, Cairo, Egypt: 9th International Workshop on Multiple Classifier Systems 5997, pp. 175-184.
  • Yan F, Kittler J, Mikolajczyk K, Tahir A. (2009) 'Non-Sparse Multiple Kernel Learning for Fisher Discriminant Analysis'. Proceedings of The Ninth IEEE International Conference on Data Mining, Miami, USA: ICDM '09, pp. 1064-1069.

    Abstract

    We consider the problem of learning a linear combination of pre-specified kernel matrices in the Fisher discriminant analysis setting. Existing methods for such a task impose an ℓ1 norm regularisation on the kernel weights, which produces a sparse solution but may lead to loss of information. In this paper, we propose to use ℓ2 norm regularisation instead. The resulting learning problem is formulated as a semi-infinite program and can be solved efficiently. Through experiments on both synthetic data and a very challenging object recognition benchmark, the relative advantages of the proposed method and its ℓ1 counterpart are demonstrated, and insights are gained as to how the choice of regularisation norm should be made.

  • Schubert F, Schertler K, Mikolajczyk K. (2009) 'A hands-on approach to high-dynamic-range and superresolution fusion'. Proceedings of the Ninth IEEE Computer Society Workshop on Applications of Computer Vision, Snowbird, USA: 2009 9th IEEE WACV

    Abstract

    This paper discusses a new framework to enhance image and video quality. Recent advances in high-dynamic-range image fusion and superresolution make it possible to extend the intensity range or to increase the resolution of the image beyond the limitations of the sensor. In this paper, we propose a new way to combine both of these fusion methods in a two-stage scheme. To achieve robust image enhancement in practical application scenarios, we adapt state-of-the-art methods for automatic photometric camera calibration, controlled image acquisition, image fusion and tone mapping. With respect to high-dynamic-range reconstruction, we show that only two input images can sufficiently capture the dynamic range of the scene. The usefulness and performance of this system are demonstrated on images taken with various types of cameras.

  • Kalal Z, Matas J, Mikolajczyk K. (2009) 'Online learning of robust object detectors during unstable tracking'. 2009 IEEE 12th International Conference on Computer Vision Workshops, Kyoto, Japan: 12th ICCV Workshops, pp. 1417-1424.

    Abstract

    This work investigates the problem of robust, long-term visual tracking of unknown objects in unconstrained environments. It therefore must cope with frame-cuts, fast camera movements and partial/total object occlusions/disappearances. We propose a new approach, called Tracking-Modeling-Detection (TMD), that closely integrates adaptive tracking with online learning of the object-specific detector. Starting from a single click in the first frame, TMD tracks the selected object by an adaptive tracker. The trajectory is observed by two processes (growing and pruning events) that robustly model the appearance and build an object detector on the fly. Both events make errors; the stability of the system is achieved by their cancellation. The learnt detector enables re-initialization of the tracker whenever a previously observed appearance reoccurs. We show that real-time learning and classification are achievable with random forests. The performance and the long-term stability of TMD are demonstrated and evaluated on a set of challenging video sequences with various objects such as cars, people and animals.

  • Yan F, Mikolajczyk K, Kittler J, Tahir M. (2009) 'A Comparison of ℓ1 Norm and ℓ2 Norm Multiple Kernel SVMs in Image and Video Classification'. IEEE CBMI 2009: International Workshop on Content-Based Multimedia Indexing, Chania, Greece, pp. 7-12.
  • Tahir MA, Kittler J, Yan F, Mikolajczyk K. (2009) 'Kernel Discriminant Analysis using Triangular Kernel for Semantic Scene Classification'. IEEE CBMI 2009: International Workshop on Content-Based Multimedia Indexing, Chania, Greece, pp. 1-6.
  • Tahir MA, Kittler J, Mikolajczyk K, Yan F, Van De Sande KEA, Gevers T. (2009) 'Visual category recognition using spectral regression and kernel discriminant analysis'. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 178-185.

    Abstract

    Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been successful in many classification problems. In this paper, we adopt this solution to VCR and demonstrate its advantages over existing methods both in terms of speed and accuracy. The distinctiveness of this method is assessed experimentally using an image and a video benchmark: the PASCAL VOC Challenge 08 and the Mediamill Challenge. From the experimental results, it can be derived that SR-KDA consistently yields significant performance gains when compared with the state-of-the-art methods. The other strong point of using SR-KDA is that the time complexity scales linearly with respect to the number of concepts and the main computational complexity is independent of the number of categories. ©2009 IEEE.

  • Snoek C, Sande K, Rooij O, Huurnink B, Uijlings J, Liempt M, Bugalhoy M, Trancosoy I, Yan F, Tahir M, Mikolajczyk K, Kittler J, Rijke M, Geusebroek J, Gevers T, Worring M, Koelma D, Smeulders A. (2009) 'The MediaMill TRECVID 2009 Semantic Video Search Engine'. TRECVID Workshop,

    Abstract

    In this paper we describe our TRECVID 2009 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. The starting point for the MediaMill concept detection approach is our top-performing bag-of-words system of last year, which uses multiple color descriptors, codebooks with soft-assignment, and kernel-based supervised learning. We improve upon this baseline system by exploring two novel research directions. Firstly, we study a multi-modal extension by including 20 audio concepts and fusion using two novel multi-kernel supervised learning methods. Secondly, with the help of recently proposed algorithmic refinements of bag-of-words representations, a GPU implementation, and compute clusters, we scale up the amount of visual information analyzed by an order of magnitude, to a total of 1,000,000 i-frames. Our experiments evaluate the merit of these new components, ultimately leading to 64 robust concept detectors for video retrieval. For retrieval, a robust but limited set of concept detectors justifies the need to rely on as many auxiliary information channels as possible. For automatic search we therefore explore how we can learn to rank various information channels simultaneously to maximize video search results for a given topic. To further improve the video retrieval results, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and relevance feedback mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2009 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the top ranking for both concept detection and interactive search. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.

  • Tahir A, Kittler J, Yan F, Mikolajczyk K. (2009) 'Concept Learning for Image and Video Retrieval: the Inverse Random Under Sampling Approach'. European Signal Processing Conference, Glasgow: 17th European Signal Processing Conference (EUSIPCO 2009), pp. 574-578.
  • Tahir MA, Kittler J, Mikolajczyk K, Yan F. (2009) 'A Multiple Expert Approach to the Class Imbalance Problem Using Inverse Random under Sampling'. SPRINGER-VERLAG BERLIN MULTIPLE CLASSIFIER SYSTEMS, PROCEEDINGS, Univ Iceland, Reykjavik, ICELAND: 8th International Workshop on Multiple Classifier Systems 5519, pp. 82-91.
  • Koniusz P, Mikolajczyk K. (2009) 'Segmentation Based Interest Points and Evaluation of Unsupervised Image Segmentation Methods'. British Machine Vision Association BMVC, pp. 1-11.
  • Snoek C, Sande K, Rooij O, Huurnink B, Gemert J, Uijlings J, He J, Li X, Everts I, Nedovic V, Liempt M, Balen R, Yan F, Tahir M, Mikolajczyk K, Kittler J, Rijke M, Geusebroek J, Gevers T, Worring M, Smeulders A, Koelma D. (2008) 'The MediaMill TRECVID 2008 Semantic Video Search Engine'. TRECVID Workshop, TRECVID Workshop

    Abstract

    In this paper we describe our TRECVID 2008 video retrieval experiments. The MediaMill team participated in three tasks: concept detection, automatic search, and interactive search. Rather than continuing to increase the number of concept detectors available for retrieval, our TRECVID 2008 experiments focus on increasing the robustness of a small set of detectors using a bag-of-words approach. To that end, our concept detection experiments emphasize in particular the role of visual sampling, the value of color invariant features, the influence of codebook construction, and the effectiveness of kernel-based learning parameters. For retrieval, a robust but limited set of concept detectors necessitates the need to rely on as many auxiliary information channels as possible. Therefore, our automatic search experiments focus on predicting which information channel to trust given a certain topic, leading to a novel framework for predictive video retrieval. To improve the video retrieval results further, our interactive search experiments investigate the roles of visualizing preview results for a certain browse-dimension and active learning mechanisms that learn to solve complex search topics by analysis from user browsing behavior. The 2008 edition of the TRECVID benchmark has been the most successful MediaMill participation to date, resulting in the top ranking for both concept detection and interactive search, and a runner-up ranking for automatic retrieval. Again a lot has been learned during this year's TRECVID campaign; we highlight the most important lessons at the end of this paper.

  • Mikolajczyk K, Uemura H. (2008) 'Action recognition with motion-appearance vocabulary forest'. IEEE, 2008 IEEE Conference on Computer Vision and Pattern Recognition, Vols 1-12, Anchorage, AK: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2229-2236.
  • Schubert F, Mikolajczyk K. (2008) 'Combining High-Resolution Images With Low-Quality Videos'. British Machine Vision Association BMVC, pp. 1-10.
  • Uemura H, Ishikawa S, Mikolajczyk K. (2008) 'Feature Tracking and Motion Compensation for Action Recognition'. British Machine Vision Association BMVC, pp. 1-10.
  • Cai H, Mikolajczyk K, Matas J. (2008) 'Learning linear discriminant projections for dimensionality reduction of image descriptors'. BMVC 2008 - Proceedings of the British Machine Vision Conference 2008, Leeds, UK: Proceedings of the British Machine Vision Conference, pp. 51.5-51.10.

    Abstract

    This paper proposes a general method for improving image descriptors using discriminant projections. Two methods based on Linear Discriminant Analysis have been recently introduced in [3, 11] to improve the matching performance of local descriptors and to reduce their dimensionality. These methods require a large training set with ground truth of accurate point-to-point correspondences, which limits their applicability. We demonstrate the theoretical equivalence of these methods and provide a means to derive projection vectors on data without available ground truth. This makes it possible to apply this technique and improve the performance of any combination of interest point detectors and descriptors. We conduct an extensive evaluation of the discriminative projection methods in various application scenarios. The results validate the proposed method in viewpoint invariant matching and category recognition.

  • Kalal Z, Matas J, Mikolajczyk K. (2008) 'Weighted Sampling for Large-Scale Boosting'. British Machine Vision Association BMVC, pp. 1-10.
  • Mikolajczyk K, Matas J. (2007) 'Improving descriptors for fast tree matching by optimal linear projection'. IEEE, 2007 IEEE 11th International Conference on Computer Vision, Vols 1-6, Rio de Janeiro, Brazil: 11th IEEE International Conference on Computer Vision, pp. 337-344.
  • Leibe B, Mikolajczyk K, Schiele B. (2006) 'Efficient Clustering and Matching for Object Class Recognition'. British Machine Vision Association BMVC, pp. 789-798.
  • Leibe B, Mikolajczyk K, Schiele B. (2006) 'Segmentation Based Multi-Cue Integration for Object Detection'. British Machine Vision Association BMVC, pp. 1169-1178.

Book chapters

  • Mikolajczyk K, Tuytelaars T. (2009) 'Local Image Features'. in Li SZ, Jain AK (eds.) Encyclopedia of Biometrics. Springer US, pp. 939-943.
