Dr John Collomosse
Lecturer
Email: j.collomosse@surrey.ac.uk
Phone (work): 01483 68 6035
Room no: 04 AB 05
Further information
Biography
Further details can be found on my personal web page.
Publications
Highlights
- (2012) 'State of the Art: A Taxonomy of Artistic Stylization Techniques for Images and Video'. IEEE, IEEE Transactions on Visualization and Computer Graphics, volume forthcoming. Full text is available at: http://epubs.surrey.ac.uk/721461/
Abstract
This paper surveys the field of non-photorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semi-automatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.
- (2012) 'Probabilistic Motion Diffusion of Labeling Priors for Coherent Video Segmentation'. IEEE, IEEE Transactions on Multimedia, 14 (2), pp. 389-400. Full text is available at: http://epubs.surrey.ac.uk/721460/
Abstract
We present a robust algorithm for temporally coherent video segmentation. Our approach is driven by multi-label graph cut applied to successive frames, fusing information from the current frame with an appearance model and labeling priors propagated forward from past frames. We propagate using a novel motion diffusion model, producing a per-pixel motion distribution that mitigates the cumulative estimation errors inherent in systems adopting “hard” decisions on pixel motion at each frame. Further, we encourage spatial coherence by imposing label consistency constraints within image regions (super-pixels) obtained via a bank of unsupervised frame segmentations, such as mean-shift. We demonstrate quantitative improvements in accuracy over state-of-the-art methods on a variety of sequences exhibiting clutter and agile motion, adopting the Berkeley methodology for our comparative evaluation.
- (2012) 'Visual Sentences for Pose Retrieval over Low-resolution Cross-media Dance Collections'. IEEE, IEEE Transactions on Multimedia. Full text is available at: http://epubs.surrey.ac.uk/605306/
Abstract
We describe a system for matching human posture (pose) across a large cross-media archive of dance footage spanning nearly 100 years, comprising digitized photographs and videos of rehearsals and performances. This footage presents unique challenges due to its age, quality and diversity. We propose a forest-like pose representation combining visual structure (self-similarity) descriptors over multiple scales, without explicitly detecting limb positions, which would be infeasible for our data. We explore two complementary multi-scale representations, applying passage retrieval and latent Dirichlet allocation (LDA) techniques inspired by the text retrieval domain to the problem of pose matching. The result is a robust system capable of quickly searching large cross-media collections for similarity to a visually specified query pose. We evaluate over a cross-section of the digital dance archives of the UK National Research Centre for Dance (UK-NRCD) and Siobhan Davies Replay (SDR), using visual queries supplied by dance professionals. We demonstrate significant performance improvements over two baselines: classical single- and multi-scale Bag of Visual Words (BoVW) and spatial pyramid kernel (SPK) matching.
- (2012) 'Virtual Volumetric Graphics on Commodity Displays using 3D Viewer Tracking'. Springer International Journal of Computer Vision (IJCV). Full text is available at: http://epubs.surrey.ac.uk/605307/
Abstract
Three dimensional (3D) displays typically rely on stereo disparity, requiring specialized hardware to be worn or embedded in the display. We present a novel 3D graphics display system for volumetric scene visualization using only standard 2D display hardware and a pair of calibrated web cameras. Our computer vision-based system requires no worn or other special hardware. Rather than producing the depth illusion through disparity, we deliver a full volumetric 3D visualization - enabling users to interactively explore 3D scenes by varying their viewing position and angle according to the tracked 3D position of their face and eyes. We incorporate a novel wand-based calibration that allows the cameras to be placed at arbitrary positions and orientations relative to the display. The resulting system operates at real-time speeds (~25 fps) with low latency (120-225 ms) delivering a compelling natural user interface and immersive experience for 3D viewing. In addition to objective evaluation of display stability and responsiveness, we report on user trials comparing users' timings on a spatial orientation task.
- (2007) 'RTcams: A new perspective on nonphotorealistic rendering from photographs'. IEEE Computer Society, IEEE Transactions on Visualization and Computer Graphics, 13 (5), pp. 966-979. Full text is available at: http://epubs.surrey.ac.uk/721466/
Journal articles
- (2012) 'State of the Art: A Taxonomy of Artistic Stylization Techniques for Images and Video'. IEEE, IEEE Transactions on Visualization and Computer Graphics, volume forthcoming. Full text is available at: http://epubs.surrey.ac.uk/721461/
- (2012) 'Probabilistic Motion Diffusion of Labeling Priors for Coherent Video Segmentation'. IEEE, IEEE Transactions on Multimedia, 14 (2), pp. 389-400. Full text is available at: http://epubs.surrey.ac.uk/721460/
- (2012) 'Virtual Volumetric Graphics on Commodity Displays using 3D Viewer Tracking'. Springer International Journal of Computer Vision (IJCV). Full text is available at: http://epubs.surrey.ac.uk/605307/
- (2012) 'Visual Sentences for Pose Retrieval over Low-resolution Cross-media Dance Collections'. IEEE, IEEE Transactions on Multimedia. Full text is available at: http://epubs.surrey.ac.uk/605306/
- (2011) 'Stylized Ambient Displays of Digital Media Collections'. Computers and Graphics, 1 (35)
- (2010) 'Older User Experience: An Evaluation with a Location-based Mobile Multimedia Service'. IEEE, IEEE Vehicular Technology Magazine, 5, Article number 1, pp. 31-38.
- (2007) 'RTcams: A new perspective on nonphotorealistic rendering from photographs'. IEEE Computer Society, IEEE Transactions on Visualization and Computer Graphics, 13 (5), pp. 966-979. Full text is available at: http://epubs.surrey.ac.uk/721466/
- (2007) 'Reverse storyboarding for video retrieval'. IET Conference Publications, (534 CP). doi: 10.1049/cp:20070049
- (2007) 'Screen codes: Efficient data transfer from video displays to mobile devices'. IET Conference Publications, (534 CP). doi: 10.1049/cp:20070065. Full text is available at: http://epubs.surrey.ac.uk/721465/
Abstract
We present "screen codes" - a space- and time-efficient, aesthetically compelling method for transferring data from a display (e.g. a VDU or projected public display) to a camera-equipped mobile device. Screen codes encode data as a grid of luminosity fluctuations within an arbitrary image, displayed on the video screen. These fluctuations, manifested as a "twinkling" within the image, are observed by the mobile device over time and decoded to reconstruct the data. Observation is passive; there is no back-channel from the camera to the display. Novel spatial and temporal coding strategies are employed, tailored to channel noise conditions. The display may be observed from any angle or orientation.
- (2006) 'Video motion analysis for the synthesis of dynamic cues and Futurist art'. Academic Press Inc Elsevier Science, Graphical Models, 68 (5-6), pp. 402-414.
- (2006) 'Salience-adaptive painterly rendering using genetic search'. World Scientific Publ Co Pte Ltd, International Journal on Artificial Intelligence Tools, 15 (4), pp. 551-575. Full text is available at: http://epubs.surrey.ac.uk/721467/
- (2006) 'Real-time environment mapping for stylised augmented reality'. IET Conference Publications, (516 CP), p. 184. doi: 10.1049/cp:20061950
- (2006) 'Viewpoint invariant image retrieval for context in urban environments'. IET Conference Publications, (516 CP), p. 177. doi: 10.1049/cp:20061943
- (2005) 'Video Paintbox: The fine art of video painting'. Pergamon-Elsevier Science Ltd, Computers & Graphics-UK, 29 (6), pp. 862-870.
- (2005) 'Stroke surfaces: Temporally coherent artistic animations from video'. IEEE Computer Society, IEEE Transactions on Visualization and Computer Graphics, 11 (5), pp. 540-549. doi: 10.1109/TVCG.2005.85
- (2003) 'Cubist style rendering from photographs'. IEEE Computer Society, IEEE Transactions on Visualization and Computer Graphics, 9 (4), pp. 443-453. Full text is available at: http://epubs.surrey.ac.uk/721463/
Conference papers
- (2012) 'Annotated Free-hand Sketches for Video Retrieval using Object Semantics and Motion'. Springer Lecture Notes in Computer Science Proc. Intl. Conf. on Multimedia Modelling, Austria: International Conference on Multimedia Modelling (MMM) 7131, pp. 473-484. Full text is available at: http://epubs.surrey.ac.uk/605304/
Abstract
We present a novel video retrieval system that accepts annotated free-hand sketches as queries. Existing sketch based video retrieval (SBVR) systems enable the appearance and movements of objects to be searched naturally through pictorial representations. Whilst visually expressive, such systems present an imprecise vehicle for conveying the semantics (e.g. object types) within a scene. Our contribution is to fuse the semantic richness of text with the expressivity of sketch, to create a hybrid 'semantic sketch' based video retrieval system. Trajectory extraction and clustering are applied to pre-process each clip into a video object representation that we augment with object classification and colour information. The result is a system capable of searching videos based on the desired colour, motion path, and semantic labels of the objects present. We evaluate the performance of our system over the TSF dataset of broadcast sports footage.
- (2011) 'A Bag of Features Approach to Ambient Fall Detection for Domestic Elder-care'. 1st Intl. Symposium on Ambient Technologies (AMBIENT), Barcelona, Spain: 1st Intl. Symposium on Ambient Technologies (AMBIENT). Full text is available at: http://epubs.surrey.ac.uk/605305/
Abstract
Falls in the home are a major source of injury for the elderly. The affordability of commodity video cameras is prompting the development of ambient intelligent environments to monitor the occurrence of falls in the home. This paper describes an automated fall detection system, capable of tracking movement and detecting falls in real-time. In particular we explore the application of the Bag of Features paradigm, frequently applied to general activity recognition in Computer Vision, to the domestic fall detection problem. We show that fall detection is feasible using such a framework, evaluating our approach in both controlled test scenarios and domestic scenarios exhibiting uncontrolled fall direction and visually cluttered environments.
- (2011) 'A Bag of Visual Words based Query Generative Model'. ACM MultiMedia Modelling (MMM 2011)
- (2011) 'A Bag-of-Regions approach to Sketch-based Image Retrieval'. IEEE International Conference on Image Processing (ICIP), International Conference on Image Processing (ICIP), pp. 3661-3664. Full text is available at: http://epubs.surrey.ac.uk/605303/
Abstract
This paper presents a sketch-based image retrieval system using a bag-of-regions representation of images. Regions taken from the nodes of a hierarchical region tree span a range of scales of detail. They have appealing properties for object-level inference, such as naturally encoding the shape and scale of objects and specifying domains on which to compute features without being affected by clutter from outside the region. The proposed approach builds a shape descriptor on the salient shape within the clutter and thus yields significant performance improvements over previous results on three leading descriptors in the Bag-of-Words framework for sketch-based image retrieval. Matched regions also facilitate the localization of the sketched object within the retrieved image.
- (2010) 'Gradient Field Descriptor for Sketch based Retrieval and Localization'. IEEE Proceedings of Intl. Conf. on Image Proc. (ICIP), Hong Kong: IEEE International Conference on Image Processing, pp. 1025-1028. Full text is available at: http://epubs.surrey.ac.uk/605299/
Abstract
We present an image retrieval system driven by free-hand sketched queries depicting shape. We introduce Gradient Field HoG (GF-HOG) as a depiction invariant image descriptor, encapsulating local spatial structure in the sketch and facilitating efficient codebook based retrieval. We show improved retrieval accuracy over 3 leading descriptors (Self Similarity, SIFT, HoG) across two datasets (Flickr160, ETHZ extended objects), and explain how GF-HOG can be combined with RANSAC to localize sketched objects within relevant images. We also demonstrate a prototype sketch driven photo montage application based on our system.
- (2010) 'Multi-label Propagation for Coherent Video Segmentation and Artistic Stylization'. IEEE Proceedings of Intl. Conf. on Image Proc. (ICIP), Hong Kong: ICIP, pp. 3005-3008. Full text is available at: http://epubs.surrey.ac.uk/605300/
Abstract
We present a new algorithm for segmenting video frames into temporally stable colored regions, applying our technique to create artistic stylizations (e.g. cartoons and paintings) from real video sequences. Our approach is based on a multilabel graph cut applied to successive frames, in which the color data term and label priors are incrementally updated and propagated over time. We demonstrate coherent segmentation and stylization over a variety of home videos.
- (2010) 'Motion-sketch based Video Retrieval using a Trellis Levenshtein Distance'. Intl. Conf. on Pattern Recognition (ICPR), Istanbul, Turkey: Intl. Conference on Pattern Recognition (ICPR) 2010. Full text is available at: http://epubs.surrey.ac.uk/605296/
Abstract
We present a fast technique for retrieving video clips using free-hand sketched queries. Visual keypoints within each video are detected and tracked to form short trajectories, which are clustered to form a set of space-time tokens summarising video content. A Viterbi process matches a space-time graph of tokens to a description of colour and motion extracted from the query sketch. Inaccuracies in the sketched query are ameliorated by computing path cost using a Levenshtein (edit) distance. We evaluate over datasets of sports footage.
- (2010) 'Video Stylization for Digital Ambient Displays of Home Movies'. Annecy, France: ACM Press Proceedings ACM 4th Intl. Symposium on Non-photorealistic Animation and Rendering (NPAR), ACM Symposium on Non-photorealistic Animation and Rendering (NPAR) 2010, pp. 137-146. Full text is available at: http://epubs.surrey.ac.uk/605297/
Abstract
Falling hardware costs have prompted an explosion in casual video capture by domestic users. Yet, this video is infrequently accessed post-capture and often lies dormant on users’ PCs. We present a system to breathe life into home video repositories, drawing upon artistic stylization to create a “Digital Ambient Display” that automatically selects, stylizes and transitions between videos in a semantically meaningful sequence. We present a novel algorithm based on multi-label graph cut for segmenting video into temporally coherent region maps. These maps are used to both stylize video into cartoons and paintings, and measure visual similarity between frames for smooth sequence transitions. We demonstrate coherent segmentation and stylization over a variety of home videos.
- (2009) 'Storyboard sketches for content based video retrieval'. IEEE Proceedings of Intl. Conf. Computer Vision (ICCV), Kyoto: International Conference on Computer Vision (ICCV), pp. 245-252. Full text is available at: http://epubs.surrey.ac.uk/600590/
Abstract
We present a novel Content Based Video Retrieval (CBVR) system, driven by free-hand sketch queries depicting both objects and their movement (via dynamic cues; streak-lines and arrows). Our main contribution is a probabilistic model of video clips (based on Linear Dynamical Systems), leading to an algorithm for matching descriptions of sketched objects to video. We demonstrate our model fitting to clips under static and moving camera conditions, exhibiting linear and oscillatory motion. We evaluate retrieval on two real video data sets, and on a video data set exhibiting controlled variation in shape, color, motion and clutter.
- (2009) 'Mobile Augmented Reality based 3D Snapshots'. Proc. GI-Workshop on VR/AR.
- (2009) 'An Evolutionary Approach to Automatic Video Editing'. IEEE Computer Society, 2009 Conference for Visual Media Production: CVMP 2009, London, England: 6th European Conference for Visual Media Production, pp. 127-134. doi: 10.1109/CVMP.2009.8
- (2009) 'Storyboard sketches for content based video retrieval'. IEEE, 2009 IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan: 12th IEEE International Conference on Computer Vision, pp. 245-252.
- (2008) 'Arty Shapes'. Proceedings of Computational Aesthetics, pp. 65-73.
- (2008) 'Screen Codes: Visual Hyperlinks for Displays'. ICM Proceedings of the 9th Workshop on Mobile Computing Systems and Applications, Napa Valley, USA: HotMobile '08
- (2008) 'Free-hand Sketch Grouping for Video Retrieval'. IEEE, 19th International Conference on Pattern Recognition, Vols 1-6, Tampa, FL: 19th International Conference on Pattern Recognition (ICPR 2008), pp. 884-887.
- (2006) 'Empathic Painting: Interactive stylization using observed emotional state'. ACM Press Proceedings ACM 4th Intl. Symposium on Non-photorealistic Animation and Rendering (NPAR), pp. 87-96.
- (2006) 'Supervised genetic search for parameter selection in painterly rendering'. Springer-Verlag Berlin, Applications of Evolutionary Computing, Proceedings, Budapest, Hungary: EvoWorkshops 2006, 3907, pp. 599-610.
- (2005) 'Rendering cartoon-style motion cues in post-production video'. Academic Press Inc Elsevier Science, Graphical Models, Bath Univ, Bath, England: 1st International Conference on Vision, Video and Graphics, 67 (6), pp. 549-564.
- (2005) 'Motion analysis in video: dolls, dynamic cues and Modern Art'. Eurographics Proceedings of 2nd Intl. Conf on Vision Video and Graphics (VVG), pp. 109-116.
- (2005) 'Genetic paint: A search for salient paintings'. Springer-Verlag Berlin, Applications of Evolutionary Computing, Proceedings, Lausanne, Switzerland: EvoWorkshops 2005, 3449, pp. 437-447.
- (2004) 'A Mid-level Description of Video, with Application to Non-photorealistic Animation'. Proceedings 15th British Machine Vision Conference (BMVC), 1, pp. 7-16.
- (2004) 'A Trainable Low-level Feature Detector'. IEEE Proceedings Intl. Conference on Pattern Recognition (ICPR), 1, pp. 708-711.
- (2003) 'Video analysis for Cartoon-style Special Effects'. Proceedings 14th British Machine Vision Conference (BMVC), 2, pp. 749-758.
- (2003) 'Cartoon-style Rendering of Motion from Video'. Eurographics Proceedings Video, Vision and Graphics (VVG), pp. 117-124.
- (2002) 'Painterly rendering using image salience'. IEEE Computer Society, 20th Eurographics UK Conference, Proceedings, De Montfort Univ, Leicester, England: 20th Eurographics UK Conference, pp. 122-128.
Book chapters
- (2007) 'Evolutionary search for the artistic rendering of photographs'. in The Art of Artificial Evolution: A Handbook. Springer-Verlag, Article number 2, pp. 39-62.
Patents
- (2009) Content Encoder and Decoder and Methods of Encoding and Decoding Content.
- (2009) Method of Generating a Sequence of Display Frames For Display on a Display Device. US: Article number 12/179857
- (2009) Encoder and Decoder and Methods of Encoding and Decoding Sequence Information.
Theses and dissertations
- (2004) Higher Level Techniques for the Artistic Rendering of Images and Video. University of Bath. Full text is available at: http://epubs.surrey.ac.uk/600592/
