My publications

Publications

Z Kalal, K Mikolajczyk, J Matas (2010) Forward-backward error: Automatic detection of tracking failures, In: Proceedings of 20th International Conference on Pattern Recognition, pp. 2756-2759

This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error, i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects.
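The forward-backward idea in the abstract above can be illustrated with a minimal sketch. It assumes a point tracker exposed as a function `step(point, frame_a, frame_b)`, and it uses the Euclidean distance between the starting point and the back-tracked point as one simple discrepancy measure. The names `forward_backward_error`, `consistent_step` and `drifting_step` are hypothetical and chosen for illustration only; the toy trackers stand in for a real optical-flow tracker and are not taken from the paper.

```python
from math import hypot


def forward_backward_error(point, frames, step):
    """Track `point` forward through `frames`, then backward again.

    Returns the Euclidean distance between the original point and the
    point obtained after the backward pass (an assumed, simple measure
    of the discrepancy between the two trajectories).
    """
    p = point
    # Forward pass: frames[0] -> frames[-1].
    for a, b in zip(frames, frames[1:]):
        p = step(p, a, b)
    # Backward pass: frames[-1] -> frames[0].
    reversed_frames = frames[::-1]
    for a, b in zip(reversed_frames, reversed_frames[1:]):
        p = step(p, a, b)
    return hypot(p[0] - point[0], p[1] - point[1])


def consistent_step(p, a, b):
    """Toy tracker: the point moves 2 px per frame along x, no error."""
    return (p[0] + 2.0 * (b - a), p[1])


def drifting_step(p, a, b):
    """Toy tracker: same motion plus a 0.5 px bias on every step."""
    return (p[0] + 2.0 * (b - a) + 0.5, p[1])


frames = [0, 1, 2, 3]  # frame indices stand in for images here
print(forward_backward_error((10.0, 5.0), frames, consistent_step))  # 0.0
print(forward_backward_error((10.0, 5.0), frames, drifting_step))    # 3.0
```

A trajectory whose error exceeds a chosen threshold would be flagged as unreliable. The threshold itself is application-dependent and is not specified here.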

K Mikolajczyk, J Matas (2007) Improving descriptors for fast tree matching by optimal linear projection, In: 2007 IEEE 11TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1-6, pp. 337-344
Z Kalal, J Matas, K Mikolajczyk (2008) Weighted Sampling for Large-Scale Boosting, In: BMVC, pp. 1-10
H Cai, K Mikolajczyk, J Matas (2008) Learning linear discriminant projections for dimensionality reduction of image descriptors, In: BMVC 2008 - Proceedings of the British Machine Vision Conference 2008, pp. 51.5-51.10

This paper proposes a general method for improving image descriptors using discriminant projections. Two methods based on Linear Discriminant Analysis have been recently introduced in [3, 11] to improve matching performance of local descriptors and to reduce their dimensionality. These methods require a large training set with ground truth of accurate point-to-point correspondences, which limits their applicability. We demonstrate the theoretical equivalence of these methods and provide a means to derive projection vectors on data without available ground truth. This makes it possible to apply the technique and improve the performance of any combination of interest point detectors and descriptors. We conduct an extensive evaluation of the discriminative projection methods in various application scenarios. The results validate the proposed method in viewpoint invariant matching and category recognition.

M Kristan, R Pflugfelder, A Leonardis, J Matas, F Porikli, L Cehovin, G Nebehay, G Fernandez, T Vojir, A Gatt, A Khajenezhad, A Salahledin, A Soltani-Farani, A Zarezade, A Petrosino, A Milton, B Bozorgtabar, B Li, CS Chan, C Heng, D Ward, D Kearney, D Monekosso, HC Karaimer, HR Rabiee, J Zhu, J Gao, J Xiao, J Zhang, J Xing, K Huang, K Lebeda, L Cao, ME Maresca, MK Lim, M ELHelw, M Felsberg, P Remagnino, R Bowden, R Goecke, R Stolkin, SY Lim, S Maher, S Poullot, S Wong, S Satoh, W Chen, W Hu, X Zhang, Y Li, Z Niu (2013) The Visual Object Tracking VOT2013 challenge results, In: 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), pp. 98-111 IEEE

Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated data-sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).

K Lebeda, J Matas, R Bowden (2013) Tracking the untrackable: How to track when your object is featureless, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7729 L(PART 2), pp. 347-359 Springer Berlin Heidelberg

We propose a novel approach to tracking objects by low-level line correspondences. In our implementation we show that this approach is usable even when tracking objects that lack texture, exploiting situations in which feature-based trackers fail due to the aperture problem. Furthermore, we suggest an approach to failure detection and recovery to maintain long-term stability. This is achieved by remembering configurations which lead to good pose estimations and using them later for tracking corrections. We carried out experiments on several sequences of different types. The proposed tracker proves to be competitive with or superior to state-of-the-art trackers in both standard and low-textured scenes. © 2013 Springer-Verlag.

L Ellis, J Matas, R Bowden (2008) Online Learning and Partitioning of Linear Displacement Predictors for Tracking, In: Proceedings of the British Machine Vision Conference, pp. 33-42

A novel approach to learning and tracking arbitrary image features is presented. Tracking is tackled by learning the mapping from image intensity differences to displacements. Linear regression is used, resulting in low computational cost. An appearance model of the target is built on-the-fly by clustering sub-sampled image templates. The medoidshift algorithm is used to cluster the templates, thus identifying various modes or aspects of the target appearance. Each mode is associated with the most suitable set of linear predictors, allowing piecewise linear regression from image intensity differences to warp updates. Despite no hard-coding or offline learning, excellent results are shown on three publicly available video sequences, and comparisons with related approaches are made.

M Kristan, J Matas, A Leonardis, M Felsberg, L Cehovin, GF Fernandez, T Vojir, G Hager, G Nebehay, R Pflugfelder, A Gupta, A Bibi, A Lukezic, A Garcia-Martin, A Petrosino, A Saffari, AS Montero, A Varfolomieiev, A Baskurt, B Zhao, B Ghanem, B Martinez, B Lee, B Han, C Wang, C Garcia, C Zhang, C Schmid, D Tao, D Kim, D Huang, D Prokhorov, D Du, D-Y Yeung, E Ribeiro, FS Khan, F Porikli, F Bunyak, G Zhu, G Seetharaman, H Kieritz, HT Yau, H Li, H Qi, H Bischof, H Possegger, H Lee, H Nam, I Bogun, J-C Jeong, J-I Cho, J-Y Lee, J Zhu, J Shi, J Li, J Jia, J Feng, J Gao, JY Choi, J Kim, J Lang, JM Martinez, J Choi, J Xing, K Xue, K Palaniappan, K Lebeda, K Alahari, K Gao, K Yun, KH Wong, L Luo, L Ma, L Ke, L Wen, L Bertinetto, M Pootschi, M Maresca, M Danelljan, M Wen, M Zhang, M Arens, M Valstar, M Tang, M-C Chang, MH Khan, N Fan, N Wang, O Miksik, P Torr, Q Wang, R Martin-Nieto, R Pelapur, Richard Bowden, R Laganière, S Moujtahid, S Hare, Simon Hadfield, S Lyu, S Li, S-C Zhu, S Becker, S Duffner, SL Hicks, S Golodetz, S Choi, T Wu, T Mauthner, T Pridmore, W Hu, W Hübner, X Wang, X Li, X Shi, X Zhao, X Mei, Y Shizeng, Y Hua, Y Li, Y Lu, Y Li, Z Chen, Z Huang, Z Chen, Z Zhang, Z He, Z Hong (2015) The Visual Object Tracking VOT2015 challenge results, In: ICCV workshop on Visual Object Tracking Challenge, pp. 564-586

The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT 2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014 with full annotation of targets by rotated bounding boxes and per-frame attribute, (ii) extensions of the VOT2014 evaluation methodology by introduction of a new performance measure. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.

L Ellis, N Dowson, J Matas, R Bowden (2007) Linear Predictors for Fast Simultaneous Modeling and Tracking, In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1-8 IEEE

An approach for fast tracking of arbitrary image features with no prior model and no offline learning stage is presented. Fast tracking is achieved using banks of linear displacement predictors learnt online. A multi-modal appearance model is also learnt on-the-fly that facilitates the selection of subsets of predictors suitable for prediction in the next frame. The approach is demonstrated in real-time on a number of challenging video sequences and experimentally compared to other simultaneous modeling and tracking approaches with favourable results.

Matej Kristan, Roman P Pflugfelder, Ales Leonardis, Jiri Matas, Luka Cehovin, Georg Nebehay, Tomas Vojir, Gustavo Fernandez, Alan Lukezi, Aleksandar Dimitriev, Alfredo Petrosino, Amir Saffari, Bo Li, Bohyung Han, CherKeng Heng, Christophe Garcia, Dominik Pangersic, Gustav Häger, Fahad Shahbaz Khan, Franci Oven, Horst Possegger, Horst Bischof, Hyeonseob Nam, Jianke Zhu, JiJia Li, Jin Young Choi, Jin-Woo Choi, Joao F Henriques, Joost van de Weijer, Jorge Batista, Karel Lebeda, Kristoffer Ofjall, Kwang Moo Yi, Lei Qin, Longyin Wen, Mario Edoardo Maresca, Martin Danelljan, Michael Felsberg, Ming-Ming Cheng, Philip Torr, Qingming Huang, Richard Bowden, Sam Hare, Samantha YueYing Lim, Seunghoon Hong, Shengcai Liao, Simon Hadfield, Stan Z Li, Stefan Duffner, Stuart Golodetz, Thomas Mauthner, Vibhav Vineet, Weiyao Lin, Yang Li, Yuankai Qi, Zhen Lei, ZhiHeng Niu (2015) The Visual Object Tracking VOT2014 Challenge Results, In: COMPUTER VISION - ECCV 2014 WORKSHOPS, PT II 8926, pp. 191-217

The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT 2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2014 challenge that go beyond its VOT2013 predecessor are introduced: (i) a new VOT2014 dataset with full annotation of targets by rotated bounding boxes and per-frame attribute, (ii) extensions of the VOT2013 evaluation methodology, (iii) a new unit for tracking speed assessment less dependent on the hardware and (iv) the VOT2014 evaluation toolkit that significantly speeds up execution of experiments. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).

Z Kalal, J Matas, K Mikolajczyk (2012) Tracking-Learning-Detection, In: IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7), pp. 1409-1422 IEEE

This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of "experts": (i) P-expert estimates missed detections, and (ii) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

Z Kalal, J Matas, K Mikolajczyk (2009) Online learning of robust object detectors during unstable tracking, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, pp. 1417-1424

This work investigates the problem of robust, long-term visual tracking of unknown objects in unconstrained environments. A solution must therefore cope with frame-cuts, fast camera movements and partial or total object occlusions and disappearances. We propose a new approach, called Tracking-Modeling-Detection (TMD), that closely integrates adaptive tracking with online learning of the object-specific detector. Starting from a single click in the first frame, TMD tracks the selected object by an adaptive tracker. The trajectory is observed by two processes (growing and pruning events) that robustly model the appearance and build an object detector on the fly. Both events make errors; the stability of the system is achieved by their cancellation. The learnt detector enables re-initialization of the tracker whenever a previously observed appearance reoccurs. We show that real-time learning and classification are achievable with random forests. The performance and the long-term stability of TMD are demonstrated and evaluated on a set of challenging video sequences with various objects such as cars, people and animals.

K Mikolajczyk, T Tuytelaars, C Schmid, A Zisserman, J Matas, F Schaffalitzky, T Kadir, L van Gool (2005) A comparison of affine region detectors, In: INTERNATIONAL JOURNAL OF COMPUTER VISION 65(1-2), pp. 43-72 SPRINGER
H Cai, K Mikolajczyk, J Matas (2011) Learning linear discriminant projections for dimensionality reduction of image descriptors, In: IEEE Transactions on Pattern Analysis and Machine Intelligence 33(2), pp. 338-352 IEEE

In this paper, we present Linear Discriminant Projections (LDP) for reducing dimensionality and improving discriminability of local image descriptors. We place LDP into the context of state-of-the-art discriminant projections and analyze its properties. LDP requires a large set of training data with point-to-point correspondence ground truth. We demonstrate that training data produced by a simulation of image transformations leads to nearly the same results as the real data with correspondence ground truth. This makes it possible to apply LDP as well as other discriminant projection approaches to the problems where the correspondence ground truth is not available, such as image categorization. We perform an extensive experimental evaluation on standard data sets in the context of image matching and categorization. We demonstrate that LDP enables significant dimensionality reduction of local descriptors and performance increases in different applications. The results improve upon the state-of-the-art recognition performance with simultaneous dimensionality reduction from 128 to 30.

K Lebeda, S Hadfield, J Matas, R Bowden (2013) Long-Term Tracking Through Failure Cases, In: Proceedings, IEEE workshop on visual object tracking challenge at ICCV, pp. 153-160

Long-term tracking of an object, given only a single instance in an initial frame, remains an open problem. We propose a visual tracking algorithm that is robust to many of the difficulties which often occur in real-world scenes. Correspondences of edge-based features are used to overcome the reliance on the texture of the tracked object and to improve invariance to lighting. Furthermore, we address long-term stability, enabling the tracker to recover from drift and to provide redetection following object disappearance or occlusion. The two-module principle is similar to the successful state-of-the-art long-term TLD tracker; however, our approach extends to cases of low-textured objects. Besides reporting our results on the VOT Challenge dataset, we perform two additional experiments. Firstly, results on short-term sequences show the performance of tracking challenging objects which represent failure cases for competing state-of-the-art approaches. Secondly, long sequences are tracked, including one of almost 30000 frames, which to our knowledge is the longest tracking sequence reported to date. This tests the re-detection and drift resistance properties of the tracker. All the results are comparable to the state-of-the-art on sequences with textured objects and superior on non-textured objects. The new annotated sequences are made publicly available.

L Ellis, N Dowson, J Matas, R Bowden (2011) Linear regression and adaptive appearance models for fast simultaneous modelling and tracking, In: International Journal of Computer Vision 95, pp. 154-179 Springer Netherlands
P REMAGNINO, J ILLINGWORTH, J MATAS (1995) INTENTIONAL CONTROL OF CAMERA LOOK DIRECTION AND VIEWPOINT IN AN ACTIVE VISION SYSTEM, In: IMAGE AND VISION COMPUTING 13(2), pp. 79-88 BUTTERWORTH-HEINEMANN LTD
Z Kalal, K Mikolajczyk, J Matas (2010) Face-TLD: Tracking-learning-detection applied to faces, In: Proceedings - International Conference on Image Processing, ICIP, pp. 3789-3792

A novel system for long-term tracking of a human face in unconstrained videos is built on the Tracking-Learning-Detection (TLD) approach. The system extends TLD with the concept of a generic detector and a validator, and is designed for real-time face tracking resistant to occlusions and appearance changes. The off-line trained detector localizes frontal faces and the online trained validator decides which faces correspond to the tracked subject. Several strategies for building the validator during tracking are quantitatively evaluated. The system is validated on a sitcom episode (23 min.) and a surveillance (8 min.) video. In both cases the system detects and tracks the face and automatically learns a multi-view model from a single frontal example and an unlabeled video.