Professor Miroslaw Bober

Professor of Video Processing
BSc, MSc, PhD, MIEEE
+44 (0)1483 684724
29 BA 00

Biography

Miroslaw Bober joined Surrey in 2011 as Professor of Video Processing. He leads the Visual Media Analysis team within the Centre for Vision, Speech and Signal Processing (CVSSP).

Prior to his appointment at Surrey, Prof Bober was General Manager of the Mitsubishi Electric R&D Centre Europe (MERCE-UK) and Head of Research for its Visual & Sensing Division, leading this European corporate R&D centre for 15 years. His technical achievements have been recognized with numerous awards, including the Presidential Award for strengthening the TV business in Japan through an innovative “Visual Navigation” content-access technology (2010) and the prestigious Mitsubishi Best Invention Award for his Image Signature technology (2008; one winner selected globally).

Prof Bober received BSc and MSc degrees with distinction in electrical engineering from the AGH University of Science and Technology, Krakow, Poland (1990), an MSc degree in Machine Intelligence (with distinction) from the University of Surrey (1991), and a PhD degree in computer vision from the University of Surrey (1995).

Miroslaw has published over 60 peer-reviewed publications and is the named inventor on over 80 unique patent applications. He has held over 30 research and industrial grants with a total value exceeding £16M. He is a member of the British Standards Institution (BSI) committee IST/37, which is responsible for UK contributions to MPEG and JPEG, and he represents the UK in the area of image and video analysis and associated metadata. Prof Bober chairs the MPEG technical work on Compact Descriptors for Visual Search (CDVS, standard ISO/IEC FDIS 15938-13) and Compact Descriptors for Video Analysis (CDVA, work in progress).

Research interests

My research focuses on novel techniques in Signal Processing, Computer Vision and Machine Learning, and on their applications in industry, healthcare, big data and security. I have a particular interest in image and video analysis and retrieval (visual search, object recognition, and the analysis of motion, shape and texture). The broad research objective is to develop unique methods and technology solutions for visual content understanding that dramatically improve on the existing state of the art, leading to new applications. My algorithms for shape analysis, image/video fingerprinting and visual search are considered world-leading; they were selected for ISO International Standards within MPEG and are used by organisations such as the Metropolitan Police.

Teaching

  • EEE3034 - Media Casting (Module Coordinator)
  • EEE3029 - Multimedia Systems and Component Technology
  • EEEM001 - Image and Video Compression
  • EEE3035 - Engineering Professional Studies

Departmental duties

  • Programme Director for MSc in Multimedia Signal Processing and Communications
  • Industrial Tutor for undergraduate industrial placement year
  • Personal tutor for undergraduate students (L1, L2, L3, L4)
  • MSc-level tutor
  • Member of Faculty Research Degrees Committee (FRDC)
  • Member of the Departmental Industrial Advisory Board (IAB)

Affiliations

I have extensive collaboration links with universities and research institutions in Europe (UK, Switzerland, Germany, Poland, France, Spain), US, Japan and China. I have also worked with the following companies: the BBC (UK), Bang and Olufsen (DE), CEDEO (IT), Casio (JP), Ericsson (SE), Huawei (DE), Mitsubishi Electric (JP), RAI television (IT), Renesas Electronics (JP), Telecom Italia (IT), and Visual Atoms (UK).

Current Projects & Research Funding

I am the project coordinator and PI for the BRIDGET FP7 project [€5.28M], in which my team is responsible for developing ultra-large-scale visual search and media analysis algorithms for the broadcast industry. The project aims to open new dimensions for multimedia content creation and consumption by bridging the gap between broadcast and the Internet. Project partners include RAI television, Huawei and Telecom Italia, among others.

CODAM is my latest project (PI), funded through the TSB Creative Media call [£1.05M]. My team is working with the BBC and Visual Atoms to develop an advanced video asset management system with unique visual fingerprinting and visual search capabilities. It will aid content creation and deployment by enabling visual content tracking, identification and searching across multiple devices and platforms, and across diverse digital media ecosystems and markets. Where is the original version of a low-quality clip? Which video clip has been used most often in BBC programmes? Is it a stock shot of a red double-decker bus, or an excerpt from a royal wedding? Is there other footage in the archive that shows the same event but can provide a fresh viewpoint? The CODAM system will answer these questions, track the origins of video clips across multi-platform productions and search for related material. It will take the form of a modular software system that can identify individual video clips in edited programmes, and perform object or scene recognition to find similar footage in an archive without relying on manually entered and often incomplete metadata.

Publications

Husain S, Bober M (2016) Improving large-scale image retrieval through robust aggregation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence
Visual search and image retrieval underpin numerous applications; however, the task is still challenging, predominantly due to the variability of object appearance and the ever-increasing size of the databases, often exceeding billions of images. Prior art methods rely on aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including Bag of Visual Words (BoW), Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance is still short of what is required. This paper presents a novel method for deriving a compact and distinctive representation of image content called Robust Visual Descriptor with Whitening (RVD-W). It significantly advances the state of the art and delivers world-class performance. In our approach, local descriptors are rank-assigned to multiple clusters. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization function and aggregated based on the neighborhood rank. Importantly, the residual vectors are de-correlated and whitened in each cluster before aggregation, leading to a balanced energy distribution in each dimension and significantly improved performance. We also propose a new post-PCA normalization approach which improves separability between the matching and non-matching global descriptors. This new normalization benefits not only our RVD-W descriptor but also improves existing approaches based on FV and VLAD aggregation. Furthermore, we show that the aggregation framework developed using hand-crafted SIFT features also performs exceptionally well with Convolutional Neural Network (CNN) based features. The RVD-W pipeline outperforms state-of-the-art global descriptors on both the Holidays and Oxford datasets. On the large-scale datasets, Holidays1M and Oxford1M, the SIFT-based RVD-W representation obtains a mAP of 45.1% and 35.1%, while CNN-based RVD-W achieves a mAP of 63.5% and 44.8%, all yielding superior performance to the state-of-the-art.
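
For illustration, here is a minimal Python/NumPy sketch of the kind of rank-based multi-assignment and residual aggregation the abstract describes; the parameters are invented, per-cluster whitening and the post-PCA normalisation of RVD-W are omitted, and this is not the published implementation.

```python
# Hypothetical sketch of rank-based multi-cluster residual aggregation
# (in the spirit of RVD-W, but not the published pipeline; the cluster
# centres would normally be learned offline, e.g. with k-means).
import numpy as np

def aggregate(descriptors, centres, n_assign=3, rank_weights=(1.0, 0.5, 0.25)):
    """descriptors: (n, d) local features; centres: (k, d) cluster centres."""
    k, d = centres.shape
    agg = np.zeros((k, d))
    # Distance from every descriptor to every centre.
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    # Rank-assign each descriptor to its n_assign nearest clusters.
    nearest = np.argsort(dists, axis=1)[:, :n_assign]
    for desc, clusters in zip(descriptors, nearest):
        for rank, c in enumerate(clusters):
            residual = desc - centres[c]
            norm = np.linalg.norm(residual)
            if norm > 0:
                # Direction-preserving normalisation, weighted by assignment rank.
                agg[c] += rank_weights[rank] * residual / norm
    g = agg.ravel()
    g = np.sign(g) * np.sqrt(np.abs(g))          # power-law normalisation
    return g / (np.linalg.norm(g) + 1e-12)       # global L2 normalisation

# Example: 200 random 64-D "local descriptors" aggregated over 16 centres.
rng = np.random.default_rng(0)
print(aggregate(rng.normal(size=(200, 64)), rng.normal(size=(16, 64))).shape)  # (1024,)
```
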
Mokhtarian F, Bober MZ (2003) Curvature Scale Space Representation: Theory, Applications, and MPEG-7 Standardization, Springer Netherlands
MPEG-7 is the first international standard which contains a number of key techniques from Computer Vision and Image Processing. The Curvature Scale Space technique was selected as a contour shape descriptor for MPEG-7 after substantial and comprehensive testing, which demonstrated the superior performance of the CSS-based descriptor. Curvature Scale Space Representation: Theory, Applications, and MPEG-7 Standardization is based on key publications on the CSS technique, as well as its multiple applications and generalizations. The goal was to ensure that the reader will have access to the most fundamental results concerning the CSS method in one volume. These results have been categorized into a number of chapters to reflect their focus as well as content. The book also includes a chapter on the development of the CSS technique within MPEG standardization, including details of the MPEG-7 testing and evaluation processes which led to the selection of the CSS shape descriptor for the standard. The book can be used as a supplementary textbook by any university or institution offering courses in computer and information science.
Sibiryakov A, Bober M (2006) Real-time multi-frame analysis of dominant translation, 18th International Conference on Pattern Recognition, Vol 1, Proceedings pp. 55-58 IEEE COMPUTER SOC
Sibiryakov A, Bober M (2007) Low-complexity motion analysis for mobile video devices, Digest of Technical Papers - IEEE International Conference on Consumer Electronics
This paper describes a low-complexity method for estimation of translational motion in video sequences, based on the principle that 2D image translation results in translation of image projections. The method uses a single pixel-by-pixel frame scan to extract a compact frame-descriptor. The translation between frames is determined by matching frame-descriptors using a fixed-point implementation of the Phase Correlation Method (PCM). The method gives reliable results even when video images are blurred by significant and rapid motion and in the presence of independent local motions. Real-time performance has been achieved on a low-cost DSP board. © 2007 IEEE.
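
A toy NumPy sketch of the underlying idea, i.e. estimating a 2D translation from phase correlation of 1D image projections (integer shifts only; not the paper's fixed-point DSP implementation):

```python
# Illustrative only: translation estimation from row/column projections.
import numpy as np

def shift_1d(a, b):
    """Integer circular shift t such that a(n) ~= b(n - t), i.e. a is b displaced by t."""
    cross = np.fft.fft(a) * np.conj(np.fft.fft(b))
    cross /= np.abs(cross) + 1e-12               # keep phase information only
    peak = int(np.argmax(np.real(np.fft.ifft(cross))))
    return peak if peak <= len(a) // 2 else peak - len(a)

def estimate_translation(prev_frame, next_frame):
    """Horizontal/vertical displacement of next_frame relative to prev_frame."""
    dx = shift_1d(next_frame.sum(axis=0), prev_frame.sum(axis=0))
    dy = shift_1d(next_frame.sum(axis=1), prev_frame.sum(axis=1))
    return dx, dy

# Synthetic check: a frame circularly shifted by (+5, -3) pixels.
rng = np.random.default_rng(1)
f0 = rng.random((120, 160))
f1 = np.roll(np.roll(f0, 5, axis=1), -3, axis=0)
print(estimate_translation(f0, f1))              # -> (5, -3)
```
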
Bober MZ (2007) Method and device for processing and for searching for an object by signals corresponding to images,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
Bober M, Kittler J (1994) Robust motion analysis, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition pp. 947-952
We develop a new robust algorithm for the estimation of optic flow and extraction of other motion-relevant information. A novel combination of the Hough Transform and robust statistical methods results in unbiased estimates for multiple motions, parallel segmentation and estimation, and increased robustness to noise and changes of illumination. The algorithm is fast, due to the application of multiresolution in both image and parameter space. A simple, translational motion model and a complex one coping with rotation and change of scale are applied. Also, an accuracy measure for the derived estimate is introduced. The paper includes experimental tests of this new approach and its comparison with several other widely-cited methods. The experiments were aimed at assessing the effect of noise, change of illumination and multiple motions on the algorithms' performance. The results show that our approach is significantly more robust than other methods.
Bober MZ, Cooper J (2011) Method and apparatus for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image for use in searching, wherein the object appears in the image with a first two-dimensional outline, by processing signals corresponding to the image, comprises deriving a view descriptor of the first outline of the object and deriving at least one additional view descriptor of the outline of the object in a different view, and associating the two or more view descriptors to form an object descriptor.
Bober MZ (2002) Hough transform based method of estimating parameters,
A Hough transform based method of estimating N parameters a = (a_1, ..., a_N) of motion of a region Y in a first image to a following image, the first and following images represented, in a first spatial resolution, by intensities at pixels having coordinates in a coordinate system, the method including: determining the total support H(Y, a) as a sum of the values of an error function for the intensities at pixels in the region Y; determining the motion parameters a that give the total support a minimum value; the determining being made in steps of an iterative process moving along a series of parameter estimates a_1, a_2, ... by calculating partial derivatives dH_i = ∂H(Y, a_n)/∂a_{n,i} of the total support for a parameter estimate a_n with respect to each of the parameters a_i and evaluating the calculated partial derivatives for taking a new a_{n+1}; and wherein, in the evaluating of the partial derivatives, the partial derivatives dH_i are first scaled by multiplying by scaling factors dependent on the spatial extension of the region to produce scaled partial derivatives dHN_i.
Bober MZ, Paschalakis S, Brasnett P (2010) Video Identification, 12/693,220
A method and apparatus for processing a first sequence of images and a second sequence of images to compare the first and second sequences is disclosed. Each of a plurality of the images in the first sequence and each of a plurality of the images in the second sequence is processed by (i) processing the image data for each of a plurality of pixel neighbourhoods in the image to generate at least one respective descriptor element for each of the pixel neighbourhoods, each descriptor element comprising one or more bits; and (ii) forming a plurality of words from the descriptor elements of the image such that each word comprises a unique combination of descriptor element bits. The words for the second sequence are generated from the same respective combinations of descriptor element bits as the words for the first sequence. Processing is performed to compare the first and second sequences by comparing the words generated for the plurality of images in the first sequence with the words...
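
The sketch below (Python/NumPy, with an arbitrary neighbourhood layout, word size and matching threshold that are not those of the patent) illustrates the general idea of turning pixel-neighbourhood comparisons into descriptor bits, packing fixed bit combinations into words, and matching two sequences by their words:

```python
import numpy as np

rng = np.random.default_rng(2)
PAIRS = rng.integers(0, 32, size=(64, 2, 2))    # 64 pairs of 4x4-block positions

def frame_bits(frame):
    """64 descriptor bits: compare mean intensities of two 4x4 neighbourhoods."""
    bits = np.empty(64, dtype=np.uint8)
    for i, ((y0, x0), (y1, x1)) in enumerate(PAIRS):
        a = frame[y0 * 4:y0 * 4 + 4, x0 * 4:x0 * 4 + 4].mean()
        b = frame[y1 * 4:y1 * 4 + 4, x1 * 4:x1 * 4 + 4].mean()
        bits[i] = a > b
    return bits

def words(bits, word_len=8):
    """Pack fixed combinations of descriptor-element bits into integer words."""
    return [int("".join(map(str, bits[i:i + word_len])), 2)
            for i in range(0, len(bits), word_len)]

def sequence_match(frames_a, frames_b, min_common_words=4):
    """Count frames of sequence A that share enough words with some frame of B."""
    words_b = [set(enumerate(words(frame_bits(f)))) for f in frames_b]
    hits = 0
    for fa in frames_a:
        wa = set(enumerate(words(frame_bits(fa))))
        hits += any(len(wa & wb) >= min_common_words for wb in words_b)
    return hits

# A brightness-shifted copy of a sequence still matches frame for frame.
seq_a = [rng.random((128, 128)) for _ in range(5)]
seq_b = [f + 0.1 for f in seq_a]
print(sequence_match(seq_a, seq_b))             # -> 5
```
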
Bober M, Kucharski K, Skarbek W (2003) Face recognition by Fisher and scatter linear discriminant analysis, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2756 pp. 638-645
Fisher linear discriminant analysis (FLDA) based on variance ratio is compared with scatter linear discriminant (SLDA) analysis based on determinant ratio. It is shown that each optimal FLDA data model is an optimal SLDA data model, but not the opposite. The novel algorithm 2SS4LDA (two singular subspaces for LDA) is presented using two singular value decompositions applied directly to the normalized multiclass input data matrix and the normalized class means data matrix. It is controlled by two singular subspace dimension parameters q and r, respectively. It appears in face recognition experiments on the union of the MPEG-7, Altkom, and Feret facial databases that 2SS4LDA reaches about a 94% person identification rate and about a 0.21 average normalized mean retrieval rank. The best face recognition performance measures are achieved for those combinations of q, r values for which the variance ratio is close to its maximum, too. No such correlation is observed for the SLDA separation measure. © Springer-Verlag Berlin Heidelberg 2003.
Bober MZ, Preteux F, W-Y Kim (2002) Shape Descriptors, John Wiley & Sons Inc
Introduction to MPEG-7 takes a systematic approach to the standard and provides a unique overview of the principles and concepts behind audio-visual indexing, ...
Ong EJ, Bober M (2016) Improved Hamming distance search using variable length Hashing, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2016-January pp. 2000-2008
This paper addresses the problem of ultra-large-scale search in Hamming spaces. There has been considerable research on generating compact binary codes in vision, for example for visual search tasks. However, the issue of efficient searching through huge sets of binary codes remains largely unsolved. To this end, we propose a novel, unsupervised approach to thresholded search in Hamming space, supporting long codes (e.g. 512-bits) with a wide range of Hamming distance radii. Our method is capable of working efficiently with billions of codes, delivering between one and three orders of magnitude acceleration compared to the prior art. This is achieved by relaxing the equal-size constraint in the Multi-Index Hashing approach, leading to multiple hash-tables with variable length hash-keys. Based on a theoretical analysis of the retrieval probabilities of multiple hash-tables, we propose a novel search algorithm for obtaining a suitable set of hash-key lengths. The resulting retrieval mechanism is shown empirically to improve the efficiency over the state-of-the-art, across a range of datasets, bit-depths and retrieval thresholds.
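
As a rough illustration of the multi-index idea with variable-length keys (hand-picked substring lengths here, rather than the paper's probabilistic selection): exact substring lookups produce a candidate set that is then verified with the full Hamming distance, and by the pigeonhole argument any code within radius r of the query must agree exactly on at least one substring whenever r is smaller than the number of tables.

```python
import numpy as np
from collections import defaultdict

def split(code, lengths):
    """Split a binary code (1-D uint8 bit array) into substrings of given lengths."""
    out, pos = [], 0
    for n in lengths:
        out.append(code[pos:pos + n].tobytes())
        pos += n
    return out

def build_tables(codes, lengths):
    tables = [defaultdict(list) for _ in lengths]
    for idx, code in enumerate(codes):
        for table, key in zip(tables, split(code, lengths)):
            table[key].append(idx)
    return tables

def search(query, codes, tables, lengths, radius):
    candidates = set()
    for table, key in zip(tables, split(query, lengths)):
        candidates.update(table.get(key, []))
    return sorted(i for i in candidates
                  if np.count_nonzero(codes[i] != query) <= radius)

rng = np.random.default_rng(3)
codes = rng.integers(0, 2, size=(10000, 64), dtype=np.uint8)
lengths = [12, 12, 20, 20]                      # variable-length hash keys, 64 bits total
tables = build_tables(codes, lengths)
query = codes[42].copy()
query[:3] ^= 1                                  # corrupt three bits of entry 42
print(search(query, codes, tables, lengths, radius=3))   # -> [42]
```
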
Bober MZ (2007) Method for efficient coding of shape descriptor parameters,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving a plurality of sets of co-ordinate values representing the shape of the object and quantising the co-ordinate values to derive a coded representation of the shape, and further comprises quantising a first co-ordinate value over a first quantisation range and quantising a smaller co-ordinate value over a smaller range.
Bober MZ, Sibiryakov A (2010) Dominant motion analysis,
A method of representing a 2-dimensional image comprises deriving at least one 1-dimensional representation of the image by projecting the image onto at least one axis, and applying a Fourier transform to said 1-dimensional representation. The representation can be used for estimation of dominant motion between images.
Husain S, Bober M (2014) Robust and scalable aggregation of local features for ultra large-scale retrieval, 2014 IEEE International Conference on Image Processing, ICIP 2014 pp. 2799-2803
© 2014 IEEE. This paper is concerned with the design of a compact, binary and scalable image representation that is easy to compute, fast to match and delivers beyond state-of-the-art performance in visual recognition of objects, buildings and scenes. A novel descriptor is proposed which combines rank-based multi-assignment with a robust aggregation framework and cluster/bit selection mechanisms for size scalability. Extensive performance evaluation is presented, including experiments within the state-of-the-art pipeline developed by the MPEG group standardising Compact Descriptors for Visual Search (CDVS).
Bober M, Kittler J (1996) Video coding for mobile communications - MPEG4 perspective, IEE Colloquium (Digest) (248)
Bober MZ, O'Callaghan R (2006) Mutual-Rank Similarity-Space for Navigating, Visualising and Clustering in Image Databases, 11/990,452
A method of representing a group of data items comprises, for each of a plurality of data items in the group, determining the similarity between said data item and each of a plurality of other data items in the group, assigning a rank to each pair on the basis of similarity, wherein the ranked similarity values for each of said plurality of data items are associated to reflect the overall relative similarities of data items in the group.
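
A compact sketch of the idea (illustrative only; the similarity measure and combination rule are arbitrary choices, not those of the patent): pairwise similarities are converted into ranks and combined symmetrically, so that each pair is scored by how highly the two items rank each other within the group.

```python
import numpy as np

def mutual_rank_matrix(features):
    """features: (n, d) array -> (n, n) matrix of summed mutual ranks
    (small values indicate strongly mutually similar items)."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    # rank[i, j] = position of item j in item i's nearest-neighbour ordering.
    rank = np.argsort(np.argsort(d, axis=1), axis=1)
    return rank + rank.T

# Two tight pairs, (0, 1) and (2, 3), receive the lowest mutual-rank scores.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(mutual_rank_matrix(feats))
```
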
Bober M (1999) Motion analysis for video coding and retrieval, IEE Colloquium (Digest) (103) pp. 51-56
The use of a robust, low-level motion estimator based on a Robust Hough Transform (RHT) in a range of tasks, such as optical flow estimation, and motion estimation for video coding and retrieval from video sequences was discussed. RHT derived not only pixel displacements, but also provided direct motion segmentation and other motion-related clues. The RHT algorithm employed an affine region-to-region transformation model and was invariant to illumination changes, in addition to being statistically robust. It was found that RHT did not base the correspondence analyses on any specific type of feature, but used textured regions in the image as non-localized features.
Paschalakis S, Iwamoto K, Sprljan N, Oami R, Nomura T, Yamada A, Bober M (2012) The MPEG-7 Video Signature Tools for Content Identification, IEEE Transactions on Circuits and Systems for Video Technology 22 (7) pp. 1050-1063 IEEE
This paper presents the core technologies of the Video Signature Tools recently standardized by ISO/IEC MPEG as an amendment to the MPEG-7 Standard (ISO/IEC 15938). The Video Signature is a high-performance content fingerprint which is suitable for desktop-scale to web-scale deployment and provides high levels of robustness to common video editing operations and high temporal localization accuracy at extremely low false alarm rates, achieving a detection rate in the order of 96% at a false alarm rate in the order of five false matches per million comparisons. The applications of the Video Signature are numerous and include rights management and monetization, distribution management, usage monitoring, metadata association, and corporate or personal database management. In this paper we review the prior work in the field, we explain the standardization process and status, and we provide the details and evaluation results of the Video Signature Tools.
Bober M, Brasnett P (2009) MPEG-7 visual signature tools, Proceedings - 2009 IEEE International Conference on Multimedia and Expo, ICME 2009 pp. 1540-1543
The MPEG-7 standard offers a comprehensive set of audiovisual content Description Tools to support applications enabling effective and efficient access to multimedia content. MPEG recently identified a need for a set of new, unique descriptors necessary for detection of duplicate or derived visual media content. The MPEG-7 Visual Signature Tools are characterized by efficient detection at low false positive rates, high robustness and enable very fast searching. This paper outlines application scenarios for visual signatures (also known as fingerprints or robust hashes) and discusses the requirements, evaluation process and selection methodology employed by MPEG. The Image Signature technology selected by MPEG and its performance is also described and the latest work on Video Signatures and associated standardisation time-line is summarized. ©2009 IEEE.
Kucharski K, Skarbek W, Bober M (2005) Dual LDA - An effective feature space reduction method for face recognition, IEEE International Conference on Advanced Video and Signal Based Surveillance - Proceedings of AVSS 2005 2005 pp. 336-341
Linear Discriminant Analysis (LDA) is a popular feature extraction technique that aims at creating a feature set of enhanced discriminatory power. The authors introduced a novel approach, Dual LDA (DLDA), and proposed an efficient SVD-based implementation. This paper focuses on the feature space reduction aspect of DLDA, achieved through proper choice of the parameters controlling the DLDA algorithm. The comparative experiments conducted on a collection of five facial databases, consisting in total of more than 10000 photos, show that DLDA outperforms by a great margin the methods reducing the feature space by means of feature subset selection. © 2005 IEEE.
Cordara G, Bober M, Reznik Y (2012) Special issue on visual search and augmented reality, Signal Processing: Image Communication
Zhang K, Bober M, Kittler J (1995) Variable block size video coding with motion prediction and motion segmentation, Digital Video Compression: Algorithms and Technologies 1995 2419 pp. 62-70 SPIE - Int Soc Optical Engineering
Brasnett P, Paschalakis S, Bober M (2010) Recent developments on standardisation of MPEG-7 visual signature tools, 2010 IEEE International Conference on Multimedia and Expo, ICME 2010 pp. 1347-1352
This paper presents the latest developments and possible new directions for future work in standardisation of Visual Signature Tools within the Moving Picture Experts Group (MPEG). The tools, which include the Image Signature descriptor and the recently completed Video Signature descriptor, form a part of the MPEG-7 specification. They enable fast and robust detection of duplicate or derived visual media content, images and videos. Descriptors of this type are sometimes also referred to as fingerprints or robust hashes. Here we mainly focus on introducing the technology behind the recently completed Video Signature Tools and describe some recent developments and demonstration applications for the Image Signature Tools. Finally, we briefly present MPEG exploratory investigations on requirements of searching for different images containing the same visual objects within the mobile visual search framework. © 2010 IEEE.
Bober MZ, Zaharia R, Cieplinski L (2009) Adaptive quantization of depth signal in 3D visual coding,
A method of representing an image or sequence of images using a depth map comprises transforming an n-bit depth map representation into an m-bit depth map representation, where m is less than n.
Bober MZ, Paschalakis S (2010) Methods of representing and analysing images,
A method of representing at least one image comprises deriving at least one descriptor based on color information and color interrelation information for at least one region of the image, the descriptor having at least one descriptor element, derived using values of pixels in said region, wherein at least one descriptor element for a region is derived using a non-wavelet transform. The representations may be used for image comparisons.
Bober MZ (2010) Method and apparatus for representing and searching for an object using shape,
A method of representing an object appearing in a still or video image for use in searching, wherein the object appears in the image with a first two-dimensional outline, by processing signals corresponding to the image, comprises deriving a view descriptor of the first outline of the object and deriving at least one additional view descriptor of the outline of the object in a different view, and associating the two or more view descriptors to form an object descriptor.
Georgis N, Kittler J, Bober M (2000) Accurate Recovery of Dense Depth Map for 3D Motion Based Coding, European Transactions on Telecommunications 11 (2) pp. 219-232
The problem of scene structure recovery from image motion is considered in the context of motion compensated video coding. In order to compensate for full 3D motion of the camera, the scene depth map together with camera egomotion need to be estimated from the image sequence. An important prerequisite of the shape and motion recovery is the establishment of correspondences between the points in two successive frames of a video sequence. In the paper we abandon the traditional feature based approach and develop a correspondence analysis technique based on robust matching of local image intensities under affine transformation. We show that the technique not only establishes true correspondences, but the resulting field of corresponding points is dense, and their displacement is obtained with a subpixel accuracy and free of bias. We demonstrate that these dense high quality matches yield accurate estimates of the imaging geometry which are several orders of magnitude better than estimates obtained with feature based methods. This accuracy is reflected in a superior quality of estimates of the scene structure. The experiments are performed with images of a scene with known geometry, obtained using a calibrated camera. The obtained depth estimates are dense, with error of the order of 1% which contrasts with the sparse depth map and greater than 10% errors offered by a typical feature based approach. The technique is also applied to coding. Preliminary results indicate that significant improvement of the predicted image can be achieved by using accurate dense depth estimates.
Bober M, Georgis N, Kittler J (1998) On Accurate and Robust Estimation of Fundamental Matrix, Computer Vision and Image Understanding 72 (1) pp. 39-53
This paper is concerned with accurate and robust estimation of the fundamental matrix. We show that, given certain conditions, a basic linear algorithm can yield excellent accuracy, in some cases two orders of magnitude better than sophisticated algorithms. The key element of the success is the relative accuracy of displacement estimates used as input and the use of statistical distribution of the errors. We propose a low-level, gradient-based (as opposed to feature-based) algorithm based on the Hough Transform to extract the low-level measurements. We show that it is much more efficient, both in terms of computational expense and accuracy of the final estimate, to remove the errors in the intermediate representation (optic-flow) than to attempt to improve the final estimate by complicated nonlinear algorithms. Experimental results are also included. © 1998 Academic Press.
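
For context, here is a minimal NumPy sketch of the kind of basic linear estimator referred to above (a standard normalised 8-point computation; the paper's contribution concerns the accuracy of the correspondences fed into such a solver, not this code):

```python
import numpy as np

def normalise(pts):
    """Translate points to their centroid and scale to mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1.0]])
    return np.hstack([pts, np.ones((len(pts), 1))]) @ T.T, T

def fundamental_matrix(x1, x2):
    """x1, x2: (n, 2) matched image points, n >= 8; returns F with x2' F x1 = 0."""
    p1, T1 = normalise(np.asarray(x1, float))
    p2, T2 = normalise(np.asarray(x2, float))
    # Each correspondence contributes one row of the homogeneous system A f = 0.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)                 # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                           # undo the normalisation
    return F / np.linalg.norm(F)
```
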
Bober M (2001) MPEG-7: Evolution or revolution?, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2124
© Springer-Verlag Berlin Heidelberg 2001. The ISO MPEG-7 Standard, also known as a Multimedia Content Description Interface, will soon be finalized. After several years of intensive work on technology development, implementation and testing by almost all major players in the digital multimedia arena, the results of this international project will be assessed by the most cruel and demanding judge: the market. Will it meet all the high expectations of the developers and, above all, future users? Will it result in a revolution, evolution or will it just simply pass unnoticed? In this invited lecture, I will review the components of the MPEG-7 Standard in the context of some novel applications. I will go beyond the classical image/video retrieval scenarios, and look into a more generic image/object recognition framework relying on the MPEG-7 technology. Such a framework is applicable to a wide range of new applications. The benefits of using standardized technology, over other state-of-the-art techniques from computer vision, image processing, and database retrieval, will be investigated. Demonstrations of the generic object recognition system will be presented, followed by some other examples of emerging applications made possible by the Standard. In conclusion, I will assess the potential impact of this new standard on emerging services, products and future technology developments.
Windridge D, Bober M (2014) A Kernel-Based Framework for Medical Big-Data Analytics, 8401 pp. 197-208 Springer Berlin Heidelberg
Zhang K, Bober M, Kittler J (1994) Robust motion estimation and multistage vector quantisation for sequence compression, ICIP-94 - Proceedings, Vol II pp. 452-456 IEEE Computer Soc Press
Bober MZ, Paschalakis S (2012) MPEG image and video signature, 9781441961846 pp. 81-95
© 2012 Springer Science+Business Media, LLC. All rights reserved. MPEG-7, formally called ISO/IEC 15938 Multimedia Content Description Interface, is an international standard aimed at providing an interoperable solution to the description of various types of multimedia content, irrespective of their representation format. It is quite different from standards such as MPEG-1, MPEG-2 and MPEG-4, which aim to represent the content itself, since MPEG-7 aims to represent information about the content, known as metadata, so as to allow users to search, identify, navigate and browse audio-visual content more effectively. The MPEG-7 standard provides not only elementary visual and audio descriptors, but also multi-media description schemes, which combine the elementary audio and visual descriptors, a description definition language (DDL), and a binary compression scheme for the efficient compression and transportation of MPEG-7 metadata. In addition, reference software and conformance testing information are also part of the MPEG-7 standard and provide valuable tools to the development of standard-compliant systems.
Sibiryakov A, Bober M (2007) Graph-based multiple panorama extraction from unordered image sets, Proceedings of SPIE - The International Society for Optical Engineering 6498
This paper presents a multi-image registration method, which aims at recognizing and extracting multiple panoramas from an unordered set of images without user input. A method for panorama recognition introduced by Lowe and Brown [1] is based on extraction of a full set of scale invariant image features and fast matching in feature space, followed by post-processing procedures. We propose a different approach, where the full set of descriptors is not required, and a small number of them are used to register a pair of images. We propose feature point indexing based on corner strength value. By matching descriptor pairs with similar corner strengths we update clusters in rotation-scale accumulators, and a probabilistic approach determines when these clusters are further processed with RANSAC to find inliers of image homography. If the number of inliers and global similarity between images are sufficient, a fast geometry-guided point matching is performed to improve the accuracy of registration. A global registration graph, whose node weights are proportional to the image similarity in the area of overlap, is updated with each new registration. This allows the prediction of undiscovered image registrations by finding the shortest paths and corresponding transformation chains. We demonstrate our approach using typical image collections containing multiple panoramic sequences. © 2007 SPIE-IS&T.
Cieplinski L, Bober M (1997) Scalable image coding using Gaussian pyramid vector quantization with resolution-independent block size, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 4 pp. 2949-2952
We present a new approach to multiresolution vector quantization. Its main advantage is exploitation of long-range correlations in the image by keeping vector size constant, independent of the image scale. We also developed a variable block-rate version of the algorithm, which allows better utilization of the available bit budget by refining only those areas of the image which are not efficiently approximated by lower resolutions of the pyramid.
Wang Z, Duan LY, Lin J, Huang T, Gao W, Bober M (2015) Component hashing of variable-length binary aggregated descriptors for fast image search, 2014 IEEE International Conference on Image Processing, ICIP 2014 pp. 2217-2221
© 2014 IEEE. Compact locally aggregated binary features have shown great advantages in image search. As the exhaustive linear search in Hamming space still entails too much computational complexity for large datasets, recent works proposed to directly use binary codes as hash indices, yielding a dramatic increase in speedup. However, these methods cannot be directly applied to variable-length binary features. In this paper, we propose a Component Hashing (CoHash) algorithm to handle the variable-length binary aggregated descriptors indexing for fast image search. The main idea is to decompose the distance measure between variable-length descriptors into aligned component-to-component matching problems independently, and build multiple hash tables for the visual word components. Given a query, its candidate neighbors are found by using the query binary sub-vectors as indices into their corresponding hash tables. In particular, a bit selection based on conditional mutual information maximization is proposed to reduce the dimensionality of visual word components, which provides a light storage of indices and balances the retrieval accuracy and search cost. Extensive experiments on benchmark datasets show that our approach is 20
Bober MZ, Zaharia R (2004) Occupant monitoring apparatus, 10/838,370
A seat occupant monitoring apparatus which includes a sensor reactive to a force distribution applied to the seat by the occupant, means for making a plurality of measurements of the force distribution and means for monitoring the occupant based on the measurements. The measurements are used to classify the occupant and, on the basis of the classification, parameters of the occupant's environment such as seat orientation, the rate of deployment of an air bag or control of a seat belt pre-tensioner are altered. A neural network or other learning based technique is used for the classification.
Bober MZ (2010) Method and apparatus for motion vector field encoding,
A method and apparatus for representing motion in a sequence of digitized images derives a dense motion vector field and vector quantizes the motion vector field.
Santamaria C, Bober M, Szajnowski W, Aso N (2004) Analysis of remotely-sensed imagery using the level-crossing statistics texture descriptor, Proceedings of SPIE - The International Society for Optical Engineering 5573 pp. 115-125
In this paper, we present a novel approach for the extraction of the Level-crossing Statistics (LCS) texture descriptor and the application of this descriptor to the processing of remote sensing data. The LCS is a recently presented statistical texture descriptor that first maps the images into 1D signals using space-filling curves, then applies a signal-dependent sampling and finally extracts texture parameters (such as crossing rate, crossing slope and sojourn time) from the 1D signal. In the new extraction approach introduced in this paper, a pyramidal decomposition is employed to extract texture features of different spatial resolution. Despite the simplicity of the texture features used, our approach offers state-of-the-art performance in the texture classification and texture segmentation tasks, outperforming other tested algorithms. In the remote sensing field, the LCS descriptor has been tested in segmentation and classification scenarios. A land-use/land-cover analysis system has been designed and the new texture descriptor has shown very good results in the supervised segmentation of satellite images, even when very few training samples are provided to the system.
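
A toy version of the idea (Python/NumPy; a simple boustrophedon scan stands in for the space-filling curve, and only two of the statistics are computed) shows how crossing rate and sojourn time separate fine from coarse textures:

```python
import numpy as np

def snake_scan(img):
    """Unroll the image row by row, reversing every other row to keep adjacency."""
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(img)]
    return np.concatenate(rows)

def level_crossing_features(img, level=None):
    s = snake_scan(np.asarray(img, float))
    level = s.mean() if level is None else level
    above = s > level
    crossings = int(np.count_nonzero(above[1:] != above[:-1]))
    sojourn = len(s) / (crossings + 1)           # mean run length on one side of the level
    return crossings / len(s), sojourn

rng = np.random.default_rng(4)
fine = rng.random((64, 64))                                      # fast-varying texture
coarse = np.repeat(np.repeat(rng.random((8, 8)), 8, 0), 8, 1)    # blocky texture
print(level_crossing_features(fine))     # high crossing rate, short sojourns
print(level_crossing_features(coarse))   # far fewer crossings, long sojourns
```
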
Sibiryakov A, Bober M (2005) A method of statistical template matching and its application to face and facial feature detection, WSEAS Transactions on Information Science and Applications 2 (9) pp. 1285-1293
This paper addresses a problem of robust, accurate and fast object detection in complex environments, such as cluttered backgrounds and low-quality images. To overcome the problems with existing methods, we propose a new object detection approach, called Statistical Template Matching. It is based on generalized description of the object by a set of template regions and statistical testing of object/non-object hypotheses. A similarity measure between the image and a template is derived from the Fisher criterion. We show how to apply our method to face and facial feature detection tasks, and demonstrate its performance in some difficult cases, such as moderate variation of scale factor of the object, local image warping and distortions caused by image compression. The method is very fast; its speed is independent of the template size and depends only on the template complexity.
Sibiryakov A, Bober M (2006) Image registration using RST-clustering and its application in remote sensing, Proceedings of SPIE - The International Society for Optical Engineering 6365
In this paper we address the problem of registering images acquired under unknown conditions including acquisition at different times, from different points of view and possibly with different types of sensors, where conventional approaches based on feature correspondence or area correlation are likely to fail or provide unreliable estimates. The result of image registration can be used as an initial step for many remote sensing applications such as change detection, terrain reconstruction and image-based sensor navigation. The key idea of the proposed method is to estimate a global parametric transformation between images (e.g. perspective or affine transformation) from a set of local, region-based estimates of rotation-scale-translation (RST) transformation. These RST-transformations form a cluster in rotation-scale space. Each RST-transformation is registered by matching in log-polar space the regions centered at locations of the corresponding interest points. Estimation of the correspondence between interest points is performed simultaneously with registration of the local RST-transformations. Then a sub-set of corresponding points or, equivalently, a sub-set of local RST-transformations is selected by a robust estimation method and a global transformation, which is not biased by outliers, is computed from it. The method is capable of registering images without any a priori knowledge about the transformation between them. The method was tested on many images taken under different conditions by different sensors and on thousands of calibrated image pairs. In all cases the method shows very accurate registration results. We demonstrate the performance of our approach using several datasets and compare it with another state-of-the-art method based on the SIFT descriptor.
Madeo S, Bober M (2016) Fast, Compact and Discriminative: Evaluation of Binary Descriptors for Mobile Applications, IEEE Transactions on Multimedia PP (99)
© 2016 IEEE. Local feature descriptors underpin many diverse applications, supporting object recognition, image registration, database search, 3D reconstruction and more. The recent phenomenal growth in mobile devices and mobile computing in general has created demand for descriptors that are not only discriminative, but also compact in size and fast to extract and match. In response, a large number of binary descriptors have been proposed, each claiming to overcome some limitations of the predecessors. This paper provides a comprehensive evaluation of several promising binary designs. We show that existing evaluation methodologies are not sufficient to fully characterize descriptors' performance and propose a new evaluation protocol and a challenging dataset. In contrast to the previous reviews, we investigate the effects of the matching criteria, operating points and compaction methods, showing that they all have a major impact on the systems' design and performance. Finally, we provide descriptor extraction times for both general-purpose systems and mobile devices, in order to better understand the real complexity of the extraction task. The objective is to provide a comprehensive reference and a guide that will help in selection and design of the future descriptors.
Brasnett P, Bober M (2008) Fast and robust image identification, Proceedings - International Conference on Pattern Recognition
This paper presents an image identifier robust to common modifications. A multi-resolution Trace transform is introduced that constructs a set of 1D representations of an image. A binary identifier is extracted from each representation using a Fourier transform. Experimental evaluation of the algorithm and three state-of-the-art methods was carried out on a set of over 60,000 unique images with results demonstrating that the method outperforms the prior art methods in terms of detection, robustness and speed and achieves detection rate over 99% at a false-positive rate below 1 per million, with search speed exceeding 10 million image pairs per second on a desktop PC. © 2008 IEEE.
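
A highly simplified stand-in (Python with NumPy and SciPy) for this family of identifiers: a 1-D "circus" function is built by applying one functional per rotation angle, and bits are taken from comparisons of its Fourier magnitudes. The functional, angle sampling and bit count are invented here and do not reproduce the paper's multi-resolution Trace transform.

```python
import numpy as np
from scipy.ndimage import rotate

def circus(img, n_angles=64):
    """One functional value (here: the maximum column sum) per rotation angle."""
    angles = np.linspace(0, 180, n_angles, endpoint=False)
    return np.array([rotate(img, a, reshape=False, order=1).sum(axis=0).max()
                     for a in angles])

def identifier(img):
    mags = np.abs(np.fft.rfft(circus(img)))
    return (np.diff(mags[1:]) > 0).astype(np.uint8)   # bits from magnitude comparisons

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(5)
img = rng.random((96, 96))
# Comparison-based bits are unchanged by a global contrast scaling.
print(hamming(identifier(img), identifier(0.5 * img)))   # -> 0
```
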
Bober M, Asai K, Divakaran A (2001) An MPEG-4/7 based internet video and still image browsing system, Multimedia Systems and Applications III 4209 pp. 33-38 SPIE - Int Soc Optical Engineering
Kittler J, Matas J, Bober M, Nguyen L (1995) Image interpretation: exploiting multiple cues, IEE Conference Publication (410) pp. 1-5
Multiple cues play a crucial role in image interpretation. A vision system that combines shape, colour, motion, prior scene knowledge and object motion behaviour is described. We show that the use of interpretation strategies which depend on the image data, temporal context and visual goals significantly simplifies the complexity of the image interpretation problem and makes it computationally feasible.
Bober MZ (2005) Method and device for displaying or searching for object in image and computer-readable storage medium,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
Pla F, Bober M (1997) Estimating translation/deformation motion through phase correlation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 1310 pp. 653-660
© Springer-Verlag Berlin Heidelberg 1997. Phase correlation techniques have been used in image registration to estimate image displacements. These techniques have also been used to estimate optical flow by applying them locally. In this work a different phase correlation-based method is proposed to deal with a deformation/translation motion model, instead of the pure translations that the basic phase correlation technique can estimate. Some experimental results are also presented to show the accuracy of the estimated motion parameters and the use of phase correlation to estimate optical flow.
Bober MZ (2009) Method, apparatus, computer program, computer system, and computer-readable storage medium for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving a plurality of numerical values associated with features appearing on the outline of an object starting from an arbitrary point on the outline and applying a predetermined ordering to said values to arrive at a representation of the outline.
Bober M, Kittler J (1994) Estimation of complex multimodal motion: an approach based on robust statistics and Hough transform, Image and Vision Computing 12 (10) pp. 661-668
An application of robust statistics in a Hough transform based motion estimation approach is presented. The algorithm is developed and experiments are performed, proving its superior performance in terms of estimate accuracy, convergence, robustness and better segmentation. Comparative results with standard methods are also included. © 1994.
Smith RS, Bober M, Windeatt T (2011) A comparison of random forest with ECOC-based classifiers, Lecture Notes in Computer Science: Multiple Classifier Systems 6713 pp. 207-216 Springer
We compare experimentally the performance of three approaches to ensemble-based classification on general multi-class datasets. These are the methods of random forest, error-correcting output codes (ECOC) and ECOC enhanced by the use of bootstrapping and class-separability weighting (ECOC-BW). These experiments suggest that ECOC-BW yields better generalisation performance than either random forest or unmodified ECOC. A bias-variance analysis indicates that ECOC benefits from reduced bias, when compared to random forest, and that ECOC-BW benefits additionally from reduced variance. One disadvantage of ECOC-based algorithms, however, when compared with random forest, is that they impose a greater computational demand leading to longer training times.
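
For readers who want to reproduce the flavour of this comparison, scikit-learn ships stock implementations of both ensemble families (plain ECOC only; the paper's bootstrapping and class-separability weighting, ECOC-BW, is not available off the shelf):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier

X, y = load_digits(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=4, random_state=0)

print("random forest accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("ECOC accuracy         :", cross_val_score(ecoc, X, y, cv=5).mean())
```
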
Berriss WP, Price WG, Bober MZ (2003) The use of MPEG-7 for intelligent analysis and retrieval in video surveillance, IEE Colloquium (Digest) 3-10062 pp. 41-45
The incorporation of intelligent video processing algorithms into digital surveillance systems has been examined in this work. In particular, the use of the latest standard in multi-media feature extraction and matching is discussed. The use of such technology makes a system very different to current surveillance systems which store text-based meta-data. In our system, descriptions based upon shape and colour are extracted in real-time from two sequences of video recorded from a real-life scenario. The stored database of descriptions can then be searched using a query description constructed by the operator; this query is then compared with every description stored for the video sequence. We show examples of the fast and accurate search made possible with this latest technology for multimedia content description applied to a video surveillance database.
Bober MZ (2007) Method and apparatus for representing and searching for an object using shape,
A method is disclosed for representing a sequence of images constituting a moving image by processing signals corresponding to the image. An object appearing in one image is identified in the sequence in a first perspective view, and the same object appearing in another image is identified in the sequence in a second perspective view. A view descriptor of the outline of the object in the first perspective view is derived and at least one additional view descriptor of the outline of the object in another perspective view is also derived. The two or more view descriptors are associated to form a descriptor which is a single indexable entity for the sequence of images.
Bober MZ, Sibiryakov A (2007) Sparse Integral Image Descriptors with Application to Motion Analysis, 12/375,998
A method of representing an image comprises deriving at least one 1-dimensional representation of the image by projecting the image onto an axis, wherein the projection involves summing values of selected pixels in a respective line of the image perpendicular to said axis, characterised in that the number of selected pixels is less than the number of pixels in the line.
Bober MZ (2010) Method and apparatus for representing and searching for an object using shape,
A method is disclosed for representing a sequence of images constituting a moving image by processing signals corresponding to the image. An object appearing in one image is identified in the sequence in a first perspective view, and the same object appearing in another image is identified in the sequence in a second perspective view. A view descriptor of the outline of the object in the first perspective view is derived and at least one additional view descriptor of the outline of the object in another perspective view is also derived. The two or more view descriptors are associated to form a descriptor which is a single indexable entity for the sequence of images.
Ghanbari S, Cieplinski L, Bober MZ (2003) Recovery of lost motion vectors for error concealment in video coding, Picture Coding Symposium pp. 239-242
Transmission of compressed video over error prone channels such as mobile networks is a challenging issue. Maintaining an acceptable quality of service in such an environment demands additional post-processing tools to limit the impact of uncorrected transmission errors. Significant visual degradation of a video stream occurs when the motion vector component is corrupted. In this paper, an effective and computationally efficient method for the recovery of lost motion vectors (MVs) is proposed. The novel idea is to select the neighbouring block MV that has the minimum distance from an estimated MV. Simulation results are presented, including comparison with existing methods. Our method trails the best existing method by approximately 0.1-0.5 dB. However, it has a significant advantage in that it is 50% computationally simpler. This makes our method ideal for use in mobile handsets and other applications with limited processing power.
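
A tiny sketch of the recovery idea (using the median of the neighbouring vectors as the estimate, which is an assumption of this illustration rather than the paper's estimator):

```python
import numpy as np

def recover_mv(neighbour_mvs):
    """neighbour_mvs: (k, 2) motion vectors of adjacent blocks."""
    estimate = np.median(neighbour_mvs, axis=0)             # rough estimate of the lost MV
    dists = np.linalg.norm(neighbour_mvs - estimate, axis=1)
    return neighbour_mvs[np.argmin(dists)]                  # closest neighbouring MV wins

neighbours = np.array([[2, 1], [3, 1], [2, 2], [15, -9]])   # one outlying vector
print(recover_mv(neighbours))                               # -> [2 1]
```
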
Paschalakis S, Bober M (2004) Real-time face detection and tracking for mobile videoconferencing, Real-Time Imaging 10 (2) pp. 81-94
This paper addresses the issue of face detection and tracking in the context of a mobile videoconferencing application. While the integration of such technology into a mobile videophone is advantageous, allowing face stabilization, reduced bandwidth requirements and smaller display sizes, its deployment in such an environment may not be straightforward, since most face detection methods reported in the literature assume at least modest processing capabilities and memory and, usually, floating-point capabilities. The face detection and tracking method which is presented here achieves high performance, robustness to illumination variations and geometric changes, such as viewpoint and scale changes, and at the same time entails a significantly reduced computational complexity. Our method requires only integer operations and very small amounts of memory, of the order of a few hundred bytes, facilitating a real-time implementation on small microprocessors or custom hardware. In this context, this paper will also examine an FPGA implementation of the proposed algorithmic framework which, as will be seen, achieves extremely high frame processing rates at low clock speeds. © 2004 Elsevier Ltd. All rights reserved.
Bober MZ (2001) Method, apparatus, computer program, computer system and computer-readable storage for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
Bober M, Petrou M, Kittler J (1998) Nonlinear motion estimation using the supercoupling approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (5) pp. 550-555
This paper presents the application of a very efficient multiresolution transformation, which is related to the renormalization group approach of physics, to the problem of motion segmentation. The proposed approach is much faster and yields much better results than the full resolution approach. The problem is formulated as one of global optimization where a cost function is constructed to combine the information obtained by various processors as well as the constraints we impose to the problem. The cost function is optimized using the supercoupling multiresolution approach. © 1998 IEEE.
Paschalakis S, Bober M (2003) A low cost FPGA system for high speed face detection and tracking, Proceedings - 2003 IEEE International Conference on Field-Programmable Technology, FPT 2003 pp. 214-221
We present an FPGA face detection and tracking system for audiovisual communications, with a particular focus on mobile videoconferencing. The advantages of deploying such a technology in a mobile handset are many, including face stabilisation, reduced bitrate, and higher quality video on practical display sizes. Most face detection methods, however, assume at least modest general purpose processing capabilities, making them inappropriate for real-time applications, especially for power-limited devices, as well as modest custom hardware implementations. We present a method which achieves a very high detection and tracking performance and, at the same time, entails a significantly reduced computational complexity, allowing real-time implementations on custom hardware or simple microprocessors. We then propose an FPGA implementation which entails very low logic and memory costs and achieves extremely high processing rates at very low clock speeds.
Bober MZ, Paschalakis S (2006) Method and apparatus for video navigation, 11/991,092
A method of deriving a representation of a video sequence comprises deriving metadata expressing at least one temporal characteristic of a frame or group of frames, and one or both of metadata expressing at least one content-based characteristic of a frame or group of frames and relational metadata expressing relationships between at least one content-based characteristic of a frame or group of frames and at least one other frame or group of frames, and associating said metadata and/or relational metadata with the respective frame or group of frames.
Bober MZ, Brasnett P (2009) SCALE ROBUST FEATURE-BASED IDENTIFIERS FOR IMAGE IDENTIFICATION, 12/989,362
A method for deriving an image identifier comprises deriving a scale-space representation of an image, and processing the scale-space representation to detect a plurality of feature points having values that are maxima or minima. A representation is derived for a scale-dependent image region associated with one or more of the detected plurality of feature points. In an embodiment, the size of the image region is dependent on the scale associated with the corresponding feature point. An image identifier is derived using the representations derived for the scale-dependent image regions. The image identifiers may be used in a method for comparing images.
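A minimal sketch of scale-space extrema detection in the spirit of this abstract is given below; it uses a plain Gaussian stack rather than the patented identifier pipeline, and the rule that the region size grows as six times the detection scale is an illustrative assumption.

```python
# Sketch only: Gaussian scale-space stack with joint spatial/scale extrema detection.
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def scale_space_extrema(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return (row, col, sigma, region_size) for local extrema across space and scale."""
    stack = np.stack([gaussian_filter(image.astype(float), s) for s in sigmas])
    is_max = stack == maximum_filter(stack, size=(3, 3, 3))
    is_min = stack == minimum_filter(stack, size=(3, 3, 3))
    scale_idx, rows, cols = np.where(is_max | is_min)
    # assumed rule: the image region grows in proportion to the detection scale
    return [(r, c, sigmas[s], 6.0 * sigmas[s]) for s, r, c in zip(scale_idx, rows, cols)]

image = np.random.rand(64, 64)
points = scale_space_extrema(image)
print(len(points), points[0])
```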
Bober MZ, Szajnowski W (2006) Image Analysis and representation, 11/886,232
A one-dimensional representation of an image is obtained using a mapping function defining a closed scanning curve. The function is decomposed into component signals which represent different parts of the bandwidth of the representation using bi-directional filters to achieve zero group delay.
O'Callaghan R, Bober M (2005) MPEG-7 visual-temporal clustering for digital image collections, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3689 LNCS pp. 339-350
We present a novel, yet simple algorithm for clustering large collections of digital images. The method is applicable to consumer digital photo libraries, where it can be used to organise a photo-album, enhancing the search/browse capability and simplifying the interface in the process. The method is based on standard MPEG-7 visual content descriptors, which, when combined with date and time metadata, provide powerful cues to the semantic structure of the photo collection. Experiments are presented showing how the proposed method closely matches consensus human judgements of cluster structure. © Springer-Verlag Berlin Heidelberg 2005.
Jin Y, Mokhtarian F, Bober M, Illingworth J (2008) Fuzzy chamfer distance and its probabilistic formulation for visual tracking, 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR
The paper presents a fuzzy chamfer distance and its probabilistic formulation for edge-based visual tracking. First, connections of the chamfer distance and the Hausdorff distance with fuzzy objective functions for clustering are shown using a reformulation theorem. A fuzzy chamfer distance (FCD) based on fuzzy objective functions and a probabilistic formulation of the fuzzy chamfer distance (PFCD) based on data association methods are then presented for tracking, which can all be regarded as reformulated fuzzy objective functions and minimized with iterative algorithms. Results on challenging sequences demonstrate the performance of the proposed tracking method. ©2008 IEEE.
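The fuzzy and probabilistic formulations are not reproduced here, but the classic chamfer distance they build on is easy to sketch with a distance transform, assuming binary edge maps of equal size.

```python
# Classic chamfer distance between edge maps (the building block, not the fuzzy variant).
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(template_edges, image_edges):
    """Mean distance from each template edge pixel to the nearest image edge pixel."""
    dist_to_image_edges = distance_transform_edt(~image_edges)
    return dist_to_image_edges[template_edges].mean()

image_edges = np.zeros((50, 50), bool); image_edges[25, 10:40] = True        # horizontal edge
template_edges = np.zeros((50, 50), bool); template_edges[27, 10:40] = True  # shifted copy
print(chamfer_distance(template_edges, image_edges))   # approximately 2.0
```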
Bober MZ, Santamaria C (2010) Algorithm for line tracking using a circular sector search window,
A method of identifying or tracking a line in an image comprises determining a start point that belongs to a line, and identifying a plurality of possible end points belonging to the line using a search window, and calculating values for a plurality of paths connecting the start point and the end points to determine an optimum end point and path, characterised in that the search window is non-rectangular.
Kucharski K, Skarbek W, Bober M (2005) Feature space reduction for face recognition with dual Linear Discriminant Analysis, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3691 LNCS pp. 587-595
Linear Discriminant Analysis (LDA) is a widely known feature extraction technique that aims at creating a feature set of enhanced discriminatory power. It has been addressed by many researchers and has proved to be an especially successful approach in face recognition. The authors introduced a novel approach, Dual LDA (DLDA), and proposed an efficient SVD-based implementation controlled by two parameters. In this paper DLDA is analyzed from the feature space reduction point of view and the role of the parameters is explained. The comparative experiments conducted on a facial database consisting of nearly 2000 individuals show the superiority of this approach over a class of feature selection methods that choose the features one by one relying on classic statistical measures. © Springer-Verlag Berlin Heidelberg 2005.
Bober M, Kittler J (1996) Combining the hough transform and multiresolution MRF's for the robust motion estimation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 1035 pp. 91-100
© Springer-Verlag Berlin Heidelberg 1996. The paper presents a novel approach to the Robust Analysis of Complex Motion. It employs a low-level robust motion estimator, conceptually based on the Hough Transform, and uses Multiresolution Markov Random Fields for the global interpretation of the local, low-level estimates. Motion segmentation is performed in the front-end estimator, in parallel with the motion parameter estimation process. This significantly improves the accuracy of estimates, particularly in the vicinity of motion boundaries, facilitates the detection of such boundaries, and allows the use of larger regions, thus improving robustness. The measurements extracted from the sequence in the front-end estimator include displacement, the spatial derivatives of the displacement, confidence measures, and the location of motion boundaries. The measurements are then combined within the MRF framework, employing the supercoupling approach for fast convergence. The excellent performance, in terms of estimate accuracy, boundary detection and robustness, is demonstrated on synthetic and real-world sequences.
Bober MZ, Szajnowski WJ (2008) Method and apparatus for image processing,
A method of analysing an image comprises performing a Hough transform on points in an image space to an n-dimensional Hough space, selecting points in the Hough space representing features in the image space, and analysing m of the n variables for the selected points, where m is less than n, for information about the features in the image space.
Bober MZ, Sibiryakov A (2007) SPARSE INTEGRAL IMAGE DESCRIPTORS WITH APPLICATION TO MOTION ANALYSIS, 12/375,998
A method of representing an image comprises deriving at least one 1-dimensional representation of the image by projecting the image onto an axis, wherein the projection involves summing values of selected pixels in a respective line of the image perpendicular to said axis, characterised in that the number of selected pixels is less than the number of pixels in the line.
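A minimal sketch of such a sparse projection, assuming a simple fixed subsampling step for the selected pixels (the patent does not prescribe this particular selection rule), is shown below.

```python
# Sparse 1-D projection sketch: sum only a subset of pixels in each line.
import numpy as np

def sparse_projection(image, axis=0, step=4):
    """Project onto an axis by summing only every `step`-th pixel of each
    perpendicular line, i.e. fewer pixels than the line contains."""
    selected = image[::step, :] if axis == 0 else image[:, ::step]
    return selected.sum(axis=axis)

image = np.arange(64 * 48, dtype=float).reshape(64, 48)
profile = sparse_projection(image, axis=0, step=4)   # one value per column
print(profile.shape)                                 # (48,)
```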
Bober MZ, Brasnett P (2008) HIGH PERFORMANCE IMAGE IDENTIFICATION, 12/663,479
A method and apparatus for deriving a representation of an image is described. The method involves processing signals corresponding to the image. A two-dimensional function of the image, such as a Trace transform (T (d, ¸)), of the image using at least one functional T, is derived and processed using a mask function (²) to derive an intermediate representation of the image, corresponding to a one-dimensional function. In one embodiment, the mask function defines pairs of image bands of the Trace transform in the Trace domain. The representation of the image may be derived by applying existing techniques to the derived one-dimensional function.
Bober MZ, Sibiryakov A (2006) Fast Method of Object Detection by Statistical Template Matching, 11/884,699
A method of detecting an object in an image comprises comparing a template with a region of an image and determining a similarity measure, wherein the similarity measure is determined using a statistical measure. The template comprises a number of regions corresponding to parts of the object and their spatial relations. The variance of the pixels within the total template is set in relation to the variances of the pixels in all individual regions, to provide a similarity measure.
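One natural reading of this similarity measure is a ratio of the total variance under the template to the pooled variance of its individual regions; the sketch below implements that reading as an illustration and may differ from the exact measure in the patent.

```python
# Illustrative variance-ratio similarity for region-based statistical template matching.
import numpy as np

def statistical_similarity(patch, region_masks):
    """Ratio of the variance over the whole template to the pooled variance of
    its regions; large when each region is internally uniform."""
    total_variance = patch.var()
    pooled = sum(patch[m].var() * m.sum() for m in region_masks) / patch.size
    return total_variance / (pooled + 1e-9)

patch = np.zeros((20, 20)); patch[:, 10:] = 1.0             # two-part synthetic object
left = np.zeros((20, 20), bool); left[:, :10] = True
right = ~left
print(statistical_similarity(patch, [left, right]))         # very high score
```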
PETROU M, BOBER M, KITTLER J (1994) MULTIRESOLUTION MOTION SEGMENTATION, PROCEEDINGS OF THE 12TH IAPR INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION - CONFERENCE A: COMPUTER VISION & IMAGE PROCESSING pp. 379-383 I E E E, COMPUTER SOC PRESS
Zhang K, Bober M, Kittler J (1996) Hybrid codec for very low bit rate video coding, IEEE International Conference on Image Processing 1 pp. 641-644
In segmentation based video compression algorithms, the quality of the segmentation and the efficiency of the boundary coding have strong influence on the coding efficiency and image quality. In this paper, we propose a hybrid codec using motion segmentation and employing variable size block structure. The image is segmented based on motion and grey level information and represented by variable size blocks combined with inner block partition. A method of partitioning using fixed patterns is developed to encode the motion boundaries inside a block. This approach improves the boundary coding efficiency by a factor of three as compared with the straight line approximation used in our earlier approach. The experimental results show that the proposed algorithm decreases the bit rate and preserves good image quality.
Berriss WP, Price WG, Bober MZ (2002) VISS: A video intelligent surveillance system, Proceedings of SPIE - The International Society for Optical Engineering 4861 pp. 13-21
Video surveillance is gaining increasing popularity as a possible response to various threats such as terrorism, vandalism and crime. The need for automated analysis of the events monitored by video cameras and support for fast search and browsing of such recorded video data is evident. In this paper we present VISS, a prototype system that uses advanced video segmentation and MPEG-7 technology to analyse and index visual events in real time. Visual features such as shape, colour and texture are extracted and used to describe the images stored on the system. A search of large volumes of data can be performed very quickly. We show examples of the fast search made possible with VISS.
Cordara G, Bober M, Reznik Y (2013) Special issue on visual search and augmented reality, Signal Processing: Image Communication 28 (4) pp. 309-310
Ravi D, Bober M, Farinella GM, Guarnera M, Battiato S (2016) Semantic segmentation of images exploiting DCT based features and random forest, Pattern Recognition 52 pp. 260-273
This paper presents an approach for generating class-specific image segmentation. We introduce two novel features that use the quantized data of the Discrete Cosine Transform (DCT) in a Semantic Texton Forest based framework (STF), combining colour and texture information for semantic segmentation purposes. The combination of multiple features in a segmentation system is not a straightforward process. The proposed system is designed to exploit complementary features in a computationally efficient manner. Our DCT based features describe complex textures represented in the frequency domain and not just simple textures obtained using differences between intensity of pixels as in the classic STF approach. Unlike existing methods (e.g., filter banks), only a limited amount of resources is required. The proposed method has been tested on two popular databases: CamVid and MSRC-v2. Comparison with recent state-of-the-art methods shows improvement in terms of semantic segmentation accuracy.
Zhang K, Bober M, Kittler J (1997) Image sequence coding using multiple-level segmentation and affine motion estimation, IEEE Journal on Selected Areas in Communications 15 (9) pp. 1704-1713
A very low bit-rate video codec using multiple-level segmentation and affine motion compensation is presented. The translational motion model is adequate to motion compensate small regions even when complex motion is involved; however, it is no longer capable of delivering satisfactory results when applied to large regions or the whole frame. The proposed codec is based on a variable block size algorithm enhanced with global motion compensation, inner block segmentation, and a set of motion models used adaptively in motion compensation. The experimental results show that the proposed method gives better results in terms of the bit rate under the same PSNR constraint for most of the tested sequences as compared with the fixed block size approach and traditional variable block size codec in which only translational motion compensation is utilized.
Zhang K, Bober M, Kittler J (1996) Video coding using affine motion compensated prediction, 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6 pp. 1978-1981 I E E E
Skarbek W, Kucharski K, Bober M (2004) Dual LDA for face recognition, Fundamenta Informaticae 61 (3-4) pp. 303-334
The complete theory for Fisher and dual discriminant analysis is presented as the background of the novel algorithms. LDA is found as a composition of the projection onto the singular subspace for within-class normalised data with the projection onto the singular subspace for between-class normalised data. The dual LDA consists of those projections applied in reverse order. The experiments show that using a suitable composition of dual LDA transformations gives at least as good results as recent state-of-the-art solutions.
Zhang K, Bober M, Kittler J (1995) Motion based image segmentation for video coding, IEEE International Conference on Image Processing 3 pp. 476-479
A robust and stable scene segmentation is a prerequisite for the object based coding. Various approaches to this complex task have been proposed, including segmentation of optic flow, grey-level based segmentation or simple division of the scene into moving and stationary regions. In this paper, we propose an algorithm which combines all three approaches in order to get a more robust and accurate segmentation of the moving objects. The experimental results show that the proposed algorithm can significantly reduce over-segmentation and maintain accurate motion boundaries. The use of the proposed approach in video coding can increase the PSNR and reduce the bit rate.
Bober MZ, Cooper J, Paschalakis S (2007) Method and apparatus for detecting and/or tracking one or more colour regions in an image or sequence of images,
Patent number: 7236629; application number: 10/408,316; filed 8 Apr 2003; issued 26 Jun 2007.
A method of detecting a region having predetermined colour characteristics in an image comprises transforming colour values of pixels in the image from a first colour space to a second colour space, using the colour values in the second colour space to determine probability values expressing a match between pixels and the predetermined colour characteristics, where the probability values range over a multiplicity of values, using said probability values to identify pixels at least approximating to said predetermined colour characteristics, grouping pixels which at least approximate to said predetermined colour characteristics, and extracting information about each group, wherein pixels are weighted according to the respective multiplicity of probability values, and the weightings are used when grouping the pixels and/or when extracting information about a group.
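A minimal sketch of this pipeline is given below, assuming a hypothetical Gaussian model of the target colour in normalised-rg chromaticity space; the probability map is thresholded, pixels are grouped with connected components, and the probabilities weight each group's centroid.

```python
# Sketch only: weighted colour-region detection with a hypothetical chromaticity model.
import numpy as np
from scipy.ndimage import label

def detect_colour_regions(rgb, mean_rg=(0.45, 0.30), var_rg=0.003, threshold=0.2):
    """Return (centroid_y, centroid_x, total_weight) per detected region."""
    rgbf = rgb.astype(float) + 1e-6
    chroma = rgbf[..., :2] / rgbf.sum(axis=-1, keepdims=True)   # normalised r, g (second colour space)
    d2 = ((chroma - np.asarray(mean_rg)) ** 2).sum(axis=-1)
    prob = np.exp(-0.5 * d2 / var_rg)                           # match probability per pixel
    labels, n = label(prob > threshold)                         # group matching pixels
    regions = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        w = prob[ys, xs]                                        # probabilities act as weights
        regions.append(((w * ys).sum() / w.sum(), (w * xs).sum() / w.sum(), w.sum()))
    return regions

image = np.zeros((40, 40, 3), np.uint8)
image[10:20, 10:20] = (180, 120, 80)                            # a skin-like patch
print(detect_colour_regions(image))
```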

Bober M (2001) MPEG-7 visual shape descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (6) pp. 716-719
This paper describes techniques and tools for shape representation and matching, developed in the context of MPEG-7 standardization. The application domains for each descriptor are considered, and the contour-based shape descriptor is presented in some detail. Example applications are also shown.
Bober MZ, Santamaria C (2009) Direction-sensitive line detection operator for image data,
A method of detecting lines in an image comprises using one or more masks for detecting lines in one or more directions out of horizontal, vertical, left diagonal and right diagonal, and further comprises one or more additional masks for detecting lines in one or more additional directions.
Bober MZ, Paschalakis S (2010) Methods of representing and analysing images,
A method of representing and analyzing images comprises producing a plurality of descriptors of an image at one or more scales and for one or more color channels, said descriptors capturing color content and interrelation information within the regions, and associating the descriptors in a plurality of ways based on their characteristics such as scale, color channel, feature semantics, and region, and comparing such representations of images to assess the similarity of images.
Bober MZ (2009) Method, apparatus, computer program, computer system, and computer-readable storage medium for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving a plurality of numerical values associated with features appearing on the outline of an object starting from an arbitrary point on the outline and applying a predetermined ordering to said values to arrive at a representation of the outline.
Bober MZ (2007) Method, apparatus, computer program, computer system and computer-readable storage for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving a curvature scale space (CSS) representation of the object outline by smoothing the object outline, deriving at least one additional parameter reflecting the shape or mass distribution of a smoothed version of the original curve, and associating the CSS representation and the additional parameter as a shape descriptor of the object.
Messina A, Burgos FM, Preda M, Lepsoy S, Bober M, Bertola D, Paschalakis S (2015) Making second screen sustainable in media production: The BRIDGET approach, TVX 2015 - Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video pp. 155-160
This paper presents work in progress of the European Commission FP7 project BRIDGET "BRIDging the Gap for Enhanced broadcasT". The project is developing innovative technology and the underlying architecture for efficient production of second screen applications for broadcasters and media companies. The project advancements include novel front-end authoring tools as well as back-end enabling technologies such as visual search, media structure analysis and 3D A/V reconstruction to support new editorial workflows.
Bober MZ, Skarbek W (2008) Method and apparatus for processing image data,
A method of representing a data distribution derived from an object or image by processing signals corresponding to the object or image comprising deriving an approximate representation of the data distribution and analysing the errors of the data elements when expressed in terms of the approximate representation.
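One concrete instance of "approximate representation plus error analysis" is a low-rank PCA model with per-element residuals; the sketch below illustrates the idea only and is not the patented method.

```python
# Illustration: approximate a data distribution with a low-rank model, then
# analyse each element's residual error under that approximation.
import numpy as np

def approximate_and_errors(data, rank=2):
    """Low-rank approximation of the data distribution plus per-element residual errors."""
    mean = data.mean(axis=0)
    centred = data - mean
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:rank]                                  # the approximate representation
    reconstruction = centred @ basis.T @ basis + mean
    errors = np.linalg.norm(data - reconstruction, axis=1)
    return basis, errors

data = np.random.randn(200, 8) @ np.random.randn(8, 8)
basis, errors = approximate_and_errors(data, rank=3)
print(errors.mean(), errors.max())
```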
Bober MZ (2009) Method, apparatus, computer program, computer system, and computer-readable storage medium for representing and searching for an object in an image,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving a curvature scale space (CSS) representation of the object outline by smoothing the object outline, deriving at least one additional parameter reflecting the shape or mass distribution of a smoothed version of the original curve, and associating the CSS representation and the additional parameter as a shape descriptor of the object.
Bober MZ (2007) Method and device for processing and for searching for an object by signals corresponding to images,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
Paschalakis S, Lee P, Bober M (2003) An FPGA system for the high speed extraction, normalization and classification of moment descriptors, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2778 pp. 543-552
We propose a new FPGA system for the high speed extraction, normalization and classification of moment descriptors. Moments are extensively used in computer vision, most recently in the MPEG-7 standard for the region shape descriptor. The computational complexity of such methods has been partially addressed by the proposal of custom hardware architectures for the fast computation of moments. However, a complete system for the extraction, normalization and classification of moment descriptors has not yet been suggested. Our system is a hybrid, relying partly on a very fast parallel processing structure and partly on a custom built, low cost, reprogrammable processing unit. Within the latter, we also propose FPGA circuits for low cost double precision floating-point arithmetic. Our system achieves the extraction and classification of invariant descriptors for hundreds or even thousands of intensity or color images per second and is ideal for high speed and/or volume applications. © Springer-Verlag Berlin Heidelberg 2003.
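For reference, the kind of moment computation such a system accelerates can be sketched in a few lines of Python; the FPGA data path itself is not reproduced, and the normalisation shown is the standard scale-invariant central-moment form.

```python
# Reference sketch of translation- and scale-normalised central moments.
import numpy as np

def normalised_central_moments(image, max_order=2):
    ys, xs = np.mgrid[:image.shape[0], :image.shape[1]]
    m00 = image.sum()
    cx, cy = (image * xs).sum() / m00, (image * ys).sum() / m00
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            mu_pq = (image * (xs - cx) ** p * (ys - cy) ** q).sum()
            eta[(p, q)] = mu_pq / m00 ** (1 + (p + q) / 2.0)   # standard normalisation
    return eta

image = np.zeros((64, 64)); image[20:40, 10:50] = 1.0
print(normalised_central_moments(image))
```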
Bober MZ, Sibiryakov A (2011) Robust image registration,
A method of estimating a transformation between a pair of images, comprises estimating local transformations for a plurality of regions of the images to derive a set of estimated transformations, and selecting a subset of said estimated local transformations as estimated global transformations for the image.
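A minimal sketch of the two-stage idea, assuming block-wise phase correlation for the local estimates and a simple median consensus for the global transformation (the patent's selection step may be more elaborate, e.g. RANSAC-like), is shown below.

```python
# Sketch: local translation estimates per block, then a robust global consensus.
import numpy as np

def phase_correlation_shift(a, b):
    """Integer translation by which block b appears shifted relative to block a."""
    cross = np.fft.fft2(b) * np.conj(np.fft.fft2(a))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-9)).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = a.shape
    return (dy + h // 2) % h - h // 2, (dx + w // 2) % w - w // 2   # wrap into [-N/2, N/2)

def robust_global_translation(img_a, img_b, block=32):
    shifts = []
    for y in range(0, img_a.shape[0] - block + 1, block):
        for x in range(0, img_a.shape[1] - block + 1, block):
            shifts.append(phase_correlation_shift(img_a[y:y + block, x:x + block],
                                                  img_b[y:y + block, x:x + block]))
    return tuple(np.median(np.array(shifts), axis=0))   # robust consensus of local estimates

a = np.random.rand(128, 128)
b = np.roll(a, (3, -5), axis=(0, 1))                    # known global shift
print(robust_global_translation(a, b))                  # expected to be close to (3.0, -5.0)
```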
Santamaria C, Bober M, Szajnowski W (2004) Texture analysis using level-crossing statistics, Proceedings - International Conference on Pattern Recognition 2 pp. 712-715
We present a novel statistical texture descriptor employing level-crossing statistics. Images are first mapped into 1D signals using space-filling curves, such as Peano or Hilbert curves, and texture features are extracted via signal-dependent sampling. Texture parameters are based on the level-crossing statistics of the 1D signal, i.e. crossing rate, crossing slope and sojourn time. Despite the simplicity of the texture features used, our approach offers state-of-the-art performance in texture classification and texture segmentation tasks, outperforming other tested algorithms.
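A minimal sketch of these level-crossing features is given below; a serpentine (boustrophedon) raster scan stands in for the Peano/Hilbert space-filling curve, and the signal-dependent sampling step is omitted.

```python
# Sketch only: serpentine scan instead of a Hilbert curve, plus simple crossing statistics.
import numpy as np

def serpentine_scan(image):
    """Map a 2-D image to a 1-D signal by scanning rows in alternating directions."""
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(image)]
    return np.concatenate(rows).astype(float)

def level_crossing_features(image, level=None):
    signal = serpentine_scan(image)
    level = signal.mean() if level is None else level
    above = (signal > level).astype(int)
    crossings = np.nonzero(np.diff(above))[0]
    crossing_rate = len(crossings) / len(signal)
    sojourn_time = np.diff(crossings).mean() if len(crossings) > 1 else float(len(signal))
    crossing_slope = np.abs(signal[crossings + 1] - signal[crossings]).mean() if len(crossings) else 0.0
    return crossing_rate, crossing_slope, sojourn_time

image = np.random.rand(32, 32)
print(level_crossing_features(image))
```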
Bober MZ (2007) Method and device for processing and for searching for an object by signals corresponding to images,
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
Husain S, Bober M (2014) ROBUST AND SCALABLE AGGREGATION OF LOCAL FEATURES FOR ULTRA LARGE-SCALE RETRIEVAL, 2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) pp. 2799-2803 IEEE
© 2014 IEEE. This paper is concerned with the design of a compact, binary and scalable image representation that is easy to compute, fast to match and delivers beyond state-of-the-art performance in visual recognition of objects, buildings and scenes. A novel descriptor is proposed which combines rank-based multi-assignment with a robust aggregation framework and cluster/bit selection mechanisms for size scalability. Extensive performance evaluation is presented, including experiments within the state-of-the-art pipeline developed by the MPEG group standardising Compact Descriptors for Visual Search (CDVS).
Bober M, Price W, Atkinson J (2000) Contour shape descriptor for MPEG-7 and its applications, Digest of Technical Papers - IEEE International Conference on Consumer Electronics pp. 286-287
This paper presents a technical overview and performance evaluation of the contour shape descriptor adopted for the MPEG-7 Experimental Model [2]. It also describes several examples of its application to new MPEG-7 compliant products.
Badenas J, Bober M, Pla F (2001) Segmenting traffic scenes from grey level and motion information, Pattern Analysis and Applications 4 (1) pp. 28-38
This paper is concerned with an efficient estimation and segmentation of 2D motion from image sequences, with the focus on traffic monitoring applications. In order to reduce the computational load to achieve real-time implementation, the proposed approach makes use of the simplifying assumptions that the camera is stationary, and that the projection of vehicles' motion on the image plane can be approximated by translation. We show that satisfactory results can be achieved even under such apparently restrictive assumptions. The use of 2D motion analysis and the pre-segmentation stage significantly reduces the computational load, and the region-based motion estimator gives robustness to noise and changes in the illumination conditions.
Badenas J, Bober M, Pla F (1997) Motion and intensity-based segmentation and its application to traffic monitoring, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 1310 pp. 502-509
© Springer-Verlag Berlin Heidelberg 1997. This paper is concerned with an efficient estimation and segmentation of 2-D motion from image sequences, with the focus on traffic monitoring applications. In order to reduce the computational load and facilitate real-time implementation, the proposed approach makes use of the simplifying assumptions that the camera is stationary and that the projection of vehicles' motion on the image plane can be approximated by translation. We show that a good performance can be achieved even under such apparently restrictive assumptions. To further reduce processing time, we perform grey-level based segmentation that extracts regions of uniform intensity. Subsequently, we estimate motion for the regions. Regions moving with coherent motion are allowed to merge. The use of 2D motion analysis and the pre-segmentation stage significantly reduces the computational load, and the region-based estimator gives robustness to noise and changes of illumination.
Husain S, Bober MZ (2016) On Aggregation of local binary descriptors, ICME MMC 2016 Proceedings
This paper addresses the problem of aggregating local binary descriptors for large scale image retrieval in mobile scenarios. Binary descriptors are becoming increasingly popular, especially in mobile applications, as they deliver high matching speed, have a small memory footprint and are fast to extract. However, little research has been done on how to efficiently aggregate binary descriptors. Direct application of methods developed for conventional descriptors, such as SIFT, results in unsatisfactory performance. In this paper we introduce and evaluate several algorithms to compress high-dimensional binary local descriptors for efficient retrieval in large databases. In addition, we propose a robust global image representation, the Binary Robust Visual Descriptor (B-RVD), with rank-based multi-assignment of local descriptors and direction-based aggregation, achieved by the use of the L1-norm on residual vectors. The performance of the B-RVD is further improved by balancing the variances of residual vector directions in order to maximize the discriminatory power of the aggregated vectors. Standard datasets and measures have been used for evaluation, showing a significant improvement of around 4% mean Average Precision as compared to the state-of-the-art.
Bober MZ, Szajnowski WJ (2008) Determining statistical descriptors of a signal from a set of its samples,
An entity is subjected to an interrogating signal, and the reflection from the entity is repeatedly sampled to obtain a first set of values each dependent on the intensity of the reflected signal. A logarithmic transformation is applied to the sample values to obtain a second set of values. A set of descriptor values is derived, the set comprising at least a first descriptor value (L) representing the difference between the mean and the median of the second set of values, and a second descriptor value (D) representing the mean of the absolute value of the deviation between each second set value and an average of the second set of values.
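The two descriptors are straightforward to compute once the samples are available; the sketch below applies a natural logarithm and takes L as the mean-minus-median and D as the mean absolute deviation of the log-domain values, with a synthetic exponential intensity model used purely as example input.

```python
# Worked sketch of the L and D descriptors described above.
import numpy as np

def log_domain_descriptors(intensity_samples):
    logged = np.log(np.asarray(intensity_samples, dtype=float))
    L = logged.mean() - np.median(logged)            # mean minus median of the log-domain values
    D = np.abs(logged - logged.mean()).mean()        # mean absolute deviation from the mean
    return L, D

samples = np.random.exponential(scale=2.0, size=10_000)   # synthetic reflected-intensity samples
print(log_domain_descriptors(samples))
```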
Paschalakis S, Wnukowicz K, Bober M (2011) Low-cost hierarchical video segmentation for consumer electronics applications, Digest of Technical Papers - IEEE International Conference on Consumer Electronics pp. 79-80
This paper presents a new technique for the hierarchical segmentation of video content. Such a segmentation allows users to quickly and efficiently navigate a video, for example to find a specific point, skip unwanted parts, skim parts previously viewed, and so on. The proposed method achieves high segmentation accuracy at a low computational cost, making it suitable for implementation on mainstream consumer electronics. ©2011 IEEE.
Bober MZ (2005) Method and apparatus for motion vector field encoding,
A method and apparatus for representing motion in a sequence of digitized images derives a dense motion vector field and vector quantizes the motion vector field.
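A minimal sketch of vector-quantizing a dense motion vector field is shown below; the field is synthetic and the codebook is learned with a tiny k-means, which is one common choice of vector quantizer rather than the specific design in the patent.

```python
# Sketch: replace each 2-D motion vector by the index of its nearest codebook entry.
import numpy as np

def kmeans(vectors, k=4, iters=20, seed=0):
    """A tiny k-means used here as the vector quantizer."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(iters):
        distances = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        indices = distances.argmin(axis=1)
        for j in range(k):
            if np.any(indices == j):
                codebook[j] = vectors[indices == j].mean(axis=0)
    return codebook, indices

# Synthetic dense field: a global motion of (1, 0) with a region moving by (0, 2)
field = np.zeros((36, 44, 2)); field[..., 0] = 1.0
field[10:20, 10:20] = (0.0, 2.0)
codebook, indices = kmeans(field.reshape(-1, 2))
print(codebook.round(2), indices.reshape(36, 44).shape)   # per-vector codebook indices
```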

Tirunagari S, Poh N, Bober MZ, Windridge D (2015) Windowed DMD as a microtexture descriptor for finger vein counter-spoofing in biometrics., WIFS pp. 1-6
Recent studies have shown that it is possible to attack a finger vein (FV) based biometric system using printed materials. In this study, we propose a novel method to detect spoofing of static finger vein images using Windowed Dynamic Mode Decomposition (W-DMD). This is an atemporal variant of the recently proposed Dynamic Mode Decomposition for image sequences. The proposed method achieves better results when compared to established methods such as local binary patterns (LBP), discrete wavelet transforms (DWT), histogram of gradients (HoG), and filter methods such as range filters, standard deviation filters (STD) and entropy filters, when using an SVM with a minimum intersection kernel. The overall pipeline, which consists of W-DMD and SVM, proves to be efficient and convenient to use, given the absence of additional parameter tuning requirements. The effectiveness of our methodology is demonstrated using the FV-Spoofing-Attack database, which is publicly available. Our test results show that W-DMD can successfully detect printed finger vein images because they contain micro-level artefacts that differ not only in quality but also in light reflection properties compared to valid/live finger vein images.
Tirunagari S, Poh N, Wells K, Bober M, Gorden I, Windridge D (2017) Movement correction in DCE-MRI through windowed and reconstruction dynamic mode decomposition, Machine Vision and Applications 28 (3-4) pp. 393-407 Springer Verlag
Images of the kidneys using dynamic contrast-enhanced magnetic resonance renography (DCE-MRR) contain unwanted complex organ motion due to respiration. This gives rise to motion artefacts that hinder the clinical assessment of kidney function. However, due to the rapid change in contrast agent within the DCE-MR image sequence, commonly used intensity-based image registration techniques are likely to fail. While semi-automated approaches involving human experts are a possible alternative, they pose significant drawbacks, including inter-observer variability and the bottleneck introduced through manual inspection of the multiplicity of images produced during a DCE-MRR study. To address this issue, we present a novel automated, registration-free movement correction approach based on windowed and reconstruction variants of dynamic mode decomposition (WR-DMD). Our proposed method is validated on ten different healthy volunteers' kidney DCE-MRI data sets. The results, using block-matching-block evaluation on the image sequence produced by WR-DMD, show the elimination of 99% of mean motion magnitude when compared to the original data sets, thereby demonstrating the viability of automatic movement correction using WR-DMD.
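The windowed and reconstruction variants are not reproduced here, but the standard SVD-based dynamic mode decomposition they build on can be sketched compactly for a sequence of flattened frames.

```python
# Standard exact DMD sketch on a synthetic image sequence (the core building block only).
import numpy as np

def dmd(frames, rank=5):
    """Exact DMD on a (T, H, W) sequence: returns modes and eigenvalues."""
    X = frames.reshape(len(frames), -1).T              # columns are flattened snapshots
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vt = np.linalg.svd(X1, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    A_tilde = U.T @ X2 @ Vt.T @ np.diag(1.0 / s)       # low-rank linear propagator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vt.T @ np.diag(1.0 / s) @ W           # exact DMD modes
    return modes, eigvals

frames = np.random.rand(30, 32, 32)                    # synthetic image sequence
modes, eigvals = dmd(frames, rank=4)
print(modes.shape, np.abs(eigvals))
```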