Dr Janko Calic
Lecturer
Qualifications: PhD
Email: j.calic@surrey.ac.uk
Phone: Work: 01483 68 4739
Room no: 06 BB 01
Further information
Biography
I am a Lecturer in Multimedia systems and HCI at the I-Lab, part of the Centre for Communication Systems Research, Department of Electronic Engineering, Faculty of Engineering and Physical Sciences at the University of Surrey.
Previously, I was with the Department of Computer Science at the University of Bristol from 2003 to 2007. I was awarded my PhD at the Multimedia & Vision Research Group, Queen Mary, University of London in 2004.
Research Interests
My main research interest is finding new ways of interacting with visual multimedia. This stems from my PhD in content-based video retrieval at Queen Mary, University of London. At present I focus mainly on video processing for HCI and novel metaphors for visual interaction with digital media.
Further details can be found on my personal web page.
Publications
Highlights
- .
(2012) 'Scalable comic-like summaries and layout disturbance'. IEEE Multimedia, IEEE Transactions on, 14 (4), pp. 1290-1297. Full text is available at: http://epubs.surrey.ac.uk/533728/
Abstract
This paper describes an efficient system for scalable video summarisation that exploits comic-like summaries and multi-scale representations to facilitate interactivity and balance between content coverage and compactness. Due to the layout disturbance induced by the transitions between scales, a new heuristic algorithm is proposed to restrict changes to bounded summary segments. Conducted user evaluations show that the proposed methodology improves usability while keeping the summaries compact and informative.
- . (2010) 'Interactive search and browsing interface for large-scale visual repositories'. SPRINGER MULTIMEDIA TOOLS AND APPLICATIONS, 49 (3), pp. 513-528.
- . (2009) 'Object tracking in surveillance videos using compressed domain features from scalable bit-streams'. ELSEVIER SCIENCE BV SIGNAL PROCESS-IMAGE, 24 (10), pp. 814-824.
- . (2009) New Video Abstractions for Summarisation and Browsing. in Grgić M, Delac K, Ghanbari M (eds.) Recent Advances in Multimedia Signal Processing and Communications Springer Verlag Article number 3
- .
(2009) 'Fast analysis of scalable video for adaptive browsing interfaces'. Elsevier Computer Vision and Image Understanding, 113 (3), pp. 425-434. Full text is available at: http://epubs.surrey.ac.uk/532147/
Abstract
Driven by a high demand for user-centred video interfaces and recent advances in scalable video coding technology, this work introduces a novel framework for video browsing by utilising inherently hierarchical compressed-domain features of scalable video and efficient dynamic video summarisation. This approach enables instant adaptability of generated video summaries to user requirements, available channel bandwidth as well as display size. By utilising compressed domain features an efficient hierarchical analysis of motion activity at different layers of complexity is achieved. Exploiting a contour evolution algorithm, a scale space of temporal video descriptors is generated, enabling dynamic video summarisation in real-time. Given the spatial resources of terminal display and generated video summary, the final browsing layout is generated utilising an unsupervised robust spectral clustering technique and a fast discrete optimisation algorithm. Results show excellent scalability of the video browsing interface and good algorithm efficiency.
- .
(2007) 'Efficient Layout of Comic-Like Video Summaries'. IEEE Circuits and Systems Society Circuits and Systems for Video Technology, IEEE Transactions on, 17 (7), pp. 931-936. Full text is available at: http://epubs.surrey.ac.uk/532145/
Abstract
In order to represent large amounts of information in the form of a video key-frame summary, this paper studies narrative grammar of comics, and using its universal and intuitive rules, lays out visual summaries in an efficient and user centered way. The system ranks importance of key-frame sizes in the final layout by balancing the dominant visual representability and discovery of unanticipated content utilizing a specific cost function and an unsupervised robust spectral clustering technique. A final layout is created using an optimization algorithm based on dynamic programming. Algorithm efficiency and robustness are demonstrated by comparing the results with the optimal panelling solutions.
Journal articles
- . (2012) 'Lagrange-based video encoder optimisation to enhance motion representation in the compressed-domain'. Proceedings - IEEE International Conference on Multimedia and Expo, pp. 479-484.
- .
(2012) 'Scalable comic-like summaries and layout disturbance'. IEEE Multimedia, IEEE Transactions on, 14 (4), pp. 1290-1297. Full text is available at: http://epubs.surrey.ac.uk/533728/
Abstract
This paper describes an efficient system for scalable video summarisation that exploits comic-like summaries and multi-scale representations to facilitate interactivity and balance between content coverage and compactness. Due to the layout disturbance induced by the transitions between scales, a new heuristic algorithm is proposed to restrict changes to bounded summary segments. Conducted user evaluations show that the proposed methodology improves usability while keeping the summaries compact and informative.
- . (2012) 'How do users select stereoscopic 3D content?'. 3DTV-Conference.
- .
(2012) 'Analysis of user requirements in interactive 3D video systems'. Advances in Human-Computer Interaction, 2012. doi: 10.1155/2012/343197. Full text is available at: http://epubs.surrey.ac.uk/738451/
Abstract
The recent development of three dimensional (3D) display technologies has resulted in a proliferation of 3D video production and broadcasting, attracting a lot of research into capture, compression and delivery of stereoscopic content. However, the predominant design practice of interactions with 3D video content has failed to address its differences and possibilities in comparison to the existing 2D video interactions. This paper presents a study of user requirements related to interaction with the stereoscopic 3D video. The study suggests that the change of view, zoom in/out, dynamic video browsing, and textual information are the most relevant interactions with stereoscopic 3D video. In addition, we identified a strong demand for object selection that resulted in a follow-up study of user preferences in 3D selection using virtual-hand and ray-casting metaphors. These results indicate that interaction modality affects users' decision of object selection in terms of chosen location in 3D, while user attitudes do not have significant impact. Furthermore, the ray-casting-based interaction modality using Wiimote can outperform the volume-based interaction modality using mouse and keyboard for object positioning accuracy. © 2012 Haiyue Yuan et al.
- . (2010) 'Interactive search and browsing interface for large-scale visual repositories'. SPRINGER MULTIMEDIA TOOLS AND APPLICATIONS, 49 (3), pp. 513-528.
- . (2009) 'Object tracking in surveillance videos using compressed domain features from scalable bit-streams'. ELSEVIER SCIENCE BV SIGNAL PROCESS-IMAGE, 24 (10), pp. 814-824.
- .
(2009) 'Fast analysis of scalable video for adaptive browsing interfaces'. Elsevier Computer Vision and Image Understanding, 113 (3), pp. 425-434. Full text is available at: http://epubs.surrey.ac.uk/532147/
Abstract
Driven by a high demand for user-centred video interfaces and recent advances in scalable video coding technology, this work introduces a novel framework for video browsing by utilising inherently hierarchical compressed-domain features of scalable video and efficient dynamic video summarisation. This approach enables instant adaptability of generated video summaries to user requirements, available channel bandwidth as well as display size. By utilising compressed domain features an efficient hierarchical analysis of motion activity at different layers of complexity is achieved. Exploiting a contour evolution algorithm, a scale space of temporal video descriptors is generated, enabling dynamic video summarisation in real-time. Given the spatial resources of terminal display and generated video summary, the final browsing layout is generated utilising an unsupervised robust spectral clustering technique and a fast discrete optimisation algorithm. Results show excellent scalability of the video browsing interface and good algorithm efficiency.
- .
(2009) 'Object tracking in surveillance videos using compressed-domain features from scalable bit-streams'. Elsevier Signal Processing: Image Communication, 24 (10), pp. 814-824. Full text is available at: http://epubs.surrey.ac.uk/532156/
Abstract
Recent developments in the video coding technology brought new possibilities of utilising inherently embedded features of the encoded bit-stream in applications such as video adaptation and analysis. Due to the proliferation of surveillance videos there is a strong demand for highly efficient and reliable algorithms for object tracking. This paper presents a new approach for the fast compressed domain analysis utilising motion data from the encoded bit-streams in order to achieve low-processing complexity of object tracking in the surveillance videos. The algorithm estimates the trajectory of video objects by using compressed domain motion vectors extracted directly from standard H.264/MPEG-4 Advanced Video Coding (AVC) and Scalable Video Coding (SVC) bit-streams. The experimental results show comparable tracking precision when evaluated against the standard algorithms in uncompressed domain, while maintaining low computational complexity and fast processing time, thus making the algorithm suitable for real time and streaming applications where good estimates of object trajectories have to be computed fast.
- .
(2007) 'Efficient Layout of Comic-Like Video Summaries'. IEEE Circuits and Systems Society Circuits and Systems for Video Technology, IEEE Transactions on, 17 (7), pp. 931-936. Full text is available at: http://epubs.surrey.ac.uk/532145/
Abstract
In order to represent large amounts of information in the form of a video key-frame summary, this paper studies narrative grammar of comics, and using its universal and intuitive rules, lays out visual summaries in an efficient and user centered way. The system ranks importance of key-frame sizes in the final layout by balancing the dominant visual representability and discovery of unanticipated content utilizing a specific cost function and an unsupervised robust spectral clustering technique. A final layout is created using an optimization algorithm based on dynamic programming. Algorithm efficiency and robustness are demonstrated by comparing the results with the optimal panelling solutions.
- .
(2007) 'Compact visualisation of video summaries'. Hindawi Publishing Corporation (now SpringerOpen) EURASIP Journal on Advances in Signal Processing, 2007 Article number 19496 doi: 10.1155/2007/19496. Full text is available at: http://epubs.surrey.ac.uk/533729/
Abstract
This paper presents a system for compact and intuitive video summarisation aimed at both high-end professional production environments and small-screen portable devices. To represent large amounts of information in the form of a video key-frame summary, this paper studies the narrative grammar of comics, and using its universal and intuitive rules, lays out visual summaries in an efficient and user-centered way. In addition, the system exploits visual attention modelling and rapid serial visual presentation to generate highly compact summaries on mobile devices. A robust real-time algorithm for key-frame extraction is presented. The system ranks importance of key-frame sizes in the final layout by balancing the dominant visual representability and discovery of unanticipated content utilising a specific cost function and an unsupervised robust spectral clustering technique. A final layout is created using an optimisation algorithm based on dynamic programming. Algorithm efficiency and robustness are demonstrated by comparing the results with a manually labelled ground truth and with optimal panelling solutions.
- .
(2006) 'Analysing animal behaviour in wildlife videos using face detection and tracking'. IEE Vision, Image and Signal Processing, IEE Proceedings -, 153 (3), pp. 305-312. Full text is available at: http://epubs.surrey.ac.uk/531566/
Abstract
An algorithm that categorises animal locomotive behaviour by combining detection and tracking of animal faces in wildlife videos is presented. As an example, the algorithm is applied to lion faces. The detection algorithm is based on a human face detection method, utilising Haar-like features and AdaBoost classifiers. The face tracking is implemented by applying a specific interest model that combines low-level feature tracking with the detection algorithm. By combining the two methods in a specific tracking model, reliable and temporally coherent detection/tracking of animal faces is achieved. The information generated by the tracker is used to automatically annotate the animal's locomotive behaviour. The annotation classes of locomotive processes for a given animal species are predefined by a large semantic taxonomy on wildlife domain. The experimental results are presented.
- .
(2004) 'A rule-based video annotation system'. IEEE Circuits and Systems Society Circuits and Systems for Video Technology, IEEE Transactions on, 14 (5), pp. 622-633. Full text is available at: http://epubs.surrey.ac.uk/531495/
Abstract
A generic system for automatic annotation of videos is introduced. The proposed approach is based on the premise that the rules needed to infer a set of high-level concepts from low-level descriptors cannot be defined a priori. Rather, knowledge embedded in the database and interaction with an expert user is exploited to enable system learning. Underpinning the system at the implementation level is preannotated data that dynamically creates signification links between a set of low-level features extracted directly from the video dataset and high-level semantic concepts defined in the lexicon. The lexicon may consist of words, icons, or any set of symbols that convey the meaning to the user. Thus, the lexicon is contingent on the user, application, time, and the entire context of the annotation process. The main system modules use fuzzy logic and rule mining techniques to approximate human-like reasoning. A rule-knowledge base is created on a small sample selected by the expert user during the learning phase. Using this rule-knowledge base, the system automatically assigns keywords from the lexicon to nonannotated video clips in the database. Using common low-level video representations, the system performance was assessed on a database containing hundreds of broadcasting videos. The experimental evaluation showed robust and high annotation accuracy. The system architecture offers straightforward expansion to relevance feedback and autonomous learning capabilities.
- . (2004) 'Highly Efficient Low-Level Feature Extraction For Video Representation And Retrieval'. PhD thesis.
- . (2004) 'Tracking animals in wildlife videos using face detection'. European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology.
- .
(2002) 'Temporal Segmentation of MPEG video streams'. Hindawi Publishing Corporation (now SpringerOpen) EURASIP Journal on Applied Signal Processing (now Advances in Signal Processing), 2002 (6), pp. 561-565. Full text is available at: http://epubs.surrey.ac.uk/532154/
Abstract
Many algorithms for temporal video partitioning rely on the analysis of uncompressed video features. Since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream, higher efficiency can be achieved utilizing information from the MPEG compressed domain. This paper introduces a real-time algorithm for scene change detection that analyses the statistics of the macroblock features extracted directly from the MPEG stream. A method for extraction of the continuous frame difference that transforms the 3D video stream into a 1D curve is presented. This transform is then further employed to extract temporal units within the analysed video sequence. Results of computer simulations are reported.
- . (2002) 'New perspectives of video indexing and retrieval'. MPhil Transfer Report.
Conference papers
- . (2012) 'User requirements elicitation of stereoscopic 3D video interaction'. Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2012, pp. 31-36.
- . (2011) 'Video encoder optimization for efficient video analysis in resource-limited systems'. IEEE IEEE 6th International Conference on Industrial and Information Systems, Kandy, Sri Lanka: ICIIS 2011, pp. 175-180.
- .
(2011) 'Application-aware video coding architecture using camera and object motion-models'. IEEE IEEE 6th International Conference on Industrial and Information Systems, Kandy, Sri Lanka: ICIIS 2011, pp. 76-81. Full text is available at: http://epubs.surrey.ac.uk/532142/
Abstract
The proliferation of video consumption, especially over mobile devices, has created a demand for efficient interactive video applications and high-level video analysis. This is particularly significant in real-time applications and resource-limited scenarios. Pixel-domain video processing is often inefficient for many of these applications due to its complexity, whereas compressed domain processing offers fast but unreliable results. In order to achieve fast and effective video processing, this paper proposes a novel video encoding architecture that facilitates efficient compressed domain processing, while maintaining compliance with the mainstream coding standards. This is achieved by optimizing the accuracy of motion information embedded in the compressed video, in addition to compression efficiency. In a motion detection application, we demonstrate that the motion estimated by the proposed encoder can be directly used to extract object information, as opposed to conventionally coded video. The incurred rate distortion overheads can be weighed against the reduced processing required for video analysis targeting a wide spectrum of computer vision applications.
- . (2010) 'User study of the free-eye photo browsing interface'. 11th International Workshop on Image Analysis for Multimedia Interactive Services, Delft, The Netherlands: 11th WIAMIS
- .
(2010) 'Facilitating Motion-Based Vision Applications By Combined Video Analysis And Coding'. IEEE Proceedings of IEEE Conference on Acoustics Speech and Signal Processing, Dallas, USA: ICASSP 2010, pp. 1102-1105. Full text is available at: http://epubs.surrey.ac.uk/532146/
Abstract
In order to jointly optimise the quality of video coding on one hand and video analysis on the other, this paper proposes a novel approach to enhance the reusable information content in compressed video domain. By introducing a hierarchical content driven motion estimation mechanism at the encoder, complemented by a statistical prediction of region-of-interest, this approach reduces the complexity and yet increases robustness of the compressed domain vision analysis applications. Taking the object tracking application as an example, we demonstrate that the motion vectors generated by the proposed method can be directly used to extract object information, achieving tracking performance comparable with a pixel domain approach. In addition, we show that the incurred rate distortion (RD) overheads and the effect on encoder complexity are minimal, especially when compared to the reduction of processing required for video analysis targeting a wide spectrum of computer vision applications.
- .
(2009) 'FreeEye - Interactive Intuitive Interface for Large-scale Image Browsing'. ACM Proceedings of the 17th ACM International Conference on Multimedia, Beijing, China: ACM Multimedia 2009, pp. 757-760. Full text is available at: http://epubs.surrey.ac.uk/532148/
Abstract
Intuitive interfaces have become increasingly important in multimedia applications, from personal photo collection to professional management systems. This paper presents a novel intuitive interactive interface for browsing of large image collections that visualizes underlying structure of the dataset by its size and spatial relations. In order to achieve this, images are initially clustered using an unsupervised graph-based clustering algorithm. By selecting images in a hierarchical layout of the screen, the user can intuitively navigate through the collection. The experimental results demonstrate a significant speed-up in a content search scenario compared to a standard browsing interface, as well as inherent intuitiveness of the system.
- .
(2009) 'FreeEye - Intuitive Summarisation of Photo Collections'. Association of Computing Machinery Proceedings of the 17th International Conference on Multimedia 2009, Beijing, China: ACM Multimedia 2009, pp. 1127-1128. Full text is available at: http://epubs.surrey.ac.uk/532149/
Abstract
This paper presents user evaluation of the FreeEye tool for intuitive browsing and summarization of large-scale photo collections. The tool was tested in three different personal photo selection scenarios: a short-time event, a vacation and a yearbook. The experiments were conducted with five participants, evaluating their satisfaction with the summarization result and the overall process. The results demonstrate good usability of the FreeEye tool and improvement when compared to the standard methods of the participants for selection from large personal photo collections.
- .
(2009) 'Combining Activity Theory and Grounded Theory for the Design of Collaborative Interfaces'. Springer Lecture Notes in Computer Science: Human Centred Design, San Diego, USA: HCI International 2009 5619, pp. 312-321. Full text is available at: http://epubs.surrey.ac.uk/532143/
Abstract
In remote tabletop collaboration multiple users interact with the system and with each other. Thus, two levels of interaction, human-computer interaction and human-human interaction, exist in parallel. In order to improve remote tabletop systems for multiple users both levels have to be taken into account. This requires an in-depth analysis achieved by qualitative methods. This paper illustrates how a combination of Activity Theory and Grounded Theory can help researchers and designers to improve and develop better collaborative interfaces. Findings reported here are based on three video recordings that have been collected during a quasi-experiment.
- .
(2008) 'Dynamic layout of visual summaries for scalable video'. IEEE Proceedings of International Workshop on Content-Based Multimedia Indexing, London, UK: CBMI 2008, pp. 46-50. Full text is available at: http://epubs.surrey.ac.uk/532144/
Abstract
The paper brings a novel method for generating visual summaries of scalable videos. The generated summaries can dynamically adapt to requirements defined by display size, user's needs or channel limitations. It utilises compressed domain features coupled with efficient contour evolution algorithm in order to generate a scale space of temporal video descriptors. The layout of the visual summary is created using an efficient graph clustering technique and a fast discrete optimisation algorithm, enabling dynamic video summarisation in real-time. The experimental results show good scalability of the dynamic layout and highly efficient generation of visual summaries.
- .
(2008) 'HIERARCHICAL MOTION ANALYSIS FOR FAST SUMMARISATION OF SCALABLE CODED VIDEO'. IEEE 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, Hannover, GERMANY: IEEE International Conference on Multimedia and Expo (ICME 2008), pp. 1309-1312. Full text is available at: http://epubs.surrey.ac.uk/2394/
- .
(2008) 'FLEXIBLE GENERATION OF VIDEO SUMMARIES FROM LAYERED VIDEO BIT-STREAMS'. IEEE 2008 15TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, San Diego, CA: 15th IEEE International Conference on Image Processing (ICIP 2008), pp. 2516-2519. Full text is available at: http://epubs.surrey.ac.uk/2354/
- . (2008) 'User Centric Media of the Future Internet'. IEEE COMPUTER SOC NGMAST 2008: SECOND INTERNATIONAL CONFERENCE ON NEXT GENERATION MOBILE APPLICATIONS, SERVICES, AND TECHNOLOGIES, PROCEEDINGS, Cardiff, WALES: 2nd International Conference on Next Generation Mobile Applications, Services and Technologies, pp. 433-438.
- . (2008) 'Optimising Video Summaries Using Unsupervised Clustering'. CROATIAN SOCIETY ELECTRONICS MARINE PROCEEDINGS ELMAR-2008, VOLS 1 AND 2, Zadar, CROATIA: 50th International Symposium ELMAR, pp. 451-454.
- .
(2006) 'Optimising video summaries for mobile devices using visual attention modelling'. ACM Proceedings of the 2nd International Conference on Mobile Multimedia Communications, Alghero, Italy: MobiMedia '06. Full text is available at: http://epubs.surrey.ac.uk/531568/
Abstract
In order to represent large video collections in the form of key-frame summary on small screen devices, this paper exploits methodology of the visual attention modelling and rapid serial visual presentation. This approach results in an intuitive layout of efficiently generated video summaries. A robust real-time algorithm for key-frame extraction is presented. The system ranks importance of key-frame regions in the final layout by exploiting visual attention modelling. A final layout is created using an optimisation algorithm based on dynamic programming. Algorithm efficiency and robustness are demonstrated by comparing the results with the manually labelled ground truth.
- . (2006) 'Real-time face detection and tracking of animals'. IEEE NEUREL 2006: Eight Seminar on Neural Network Applications in Electrical Engineering, Proceedings, Univ Belgrade, Fac Elect Engn, Belgrade, SERBIA: 8th Seminar on Neural Network Applications in Electrical Engineering, pp. 27-32.
- .
(2005) 'An overview of multimodal video representation for semantic analysis'. IET The 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, London, UK: EWIMT 2005 2005 (11099), pp. 39-45. doi: 10.1049/ic.2005.0708. Full text is available at: http://epubs.surrey.ac.uk/532141/
Abstract
This paper gives an overview of approaches to video representation targeting semantic analysis for content-based indexing and retrieval. It highlights the major achievements of the existing methodologies and sheds new light on the challenges that are still unsolved. The problem of adaptive representation of digital multimedia is critically assessed and some novel ideas are presented. In addition, the concept of video multimodality is reevaluated and redefined in order to introduce modalities such as editing technique. An extensive literature survey on the topics involved is given.
- .
(2005) 'A Survey on Multimodal Video Representation for Semantic Retrieval'. IEEE Proceedings of The International Conference on 'Computer as a Tool', Belgrade, Serbia: EUROCON 2005 2, pp. 135-138. Full text is available at: http://epubs.surrey.ac.uk/532140/
Abstract
This paper surveys the approaches to video representation, focusing on semantic analysis for content-based indexing and retrieval. A problem of adaptive representation of digital multimedia is critically assessed and some novel ideas are presented. Furthermore, the concept of video multimodality is reevaluated and redefined in order to introduce modalities such as editing technique or affect to the audience.
- . (2005) 'Towards intelligent content based retrieval of wildlife videos'. Proceedings of 6th International Workshop on Image Analysis for Multimedia Interactive Services, Montreux, Switzerland: WIAMIS 2005 5
- .
(2004) 'Spatial analysis in key-frame extraction using video segmentation'. Workshop on Image Analysis for Multimedia Interactive Services, Lisboa, Portugal: WIAMIS 2004. Full text is available at: http://epubs.surrey.ac.uk/532153/
- . (2004) 'ICBR - Multimedia management system for intelligent content based retrieval'. SPRINGER-VERLAG BERLIN IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, Dublin, IRELAND: 3rd International Conference on Image and Video Retrieval (CIVR 2004) 3115, pp. 601-609.
- .
(2004) 'Automated visual recognition of individual African penguins'. Ushuaia, Argentina: Fifth International Penguin Conference
[ Status: Unpublished ] Full text is available at: http://epubs.surrey.ac.uk/531569/
Abstract
African penguins (Spheniscus demersus) carry a pattern of black spots on their chests that does not change from season to season during their adult life. Further, as far as we can tell, no two penguins have exactly the same pattern. We have developed a real-time system that can confidently locate African penguins whose chests are visible within video sequences or still images. An extraction of the chest spot pattern allows the generation of a unique biometrical identifier for each penguin. Using these identifiers an authentication of filmed or photographed African penguins against a population database can be performed. This paper provides a detailed technical description of the developed system and outlines the scope and the conditions of application.
- .
(2002) 'A multiresolution technique for video indexing and retrieval'. IEEE Proceedings of 2002 International Conference on Image Processing, Rochester, USA: ICIP 2002 1, pp. I-952-I-955. Full text is available at: http://epubs.surrey.ac.uk/532139/
Abstract
This paper presents a novel approach to multiresolution analysis and scalability in video indexing and retrieval. A scalable algorithm for video parsing and key-frame extraction is introduced. The technique is based on real-time analysis of MPEG motion variables and scalable metrics simplification by discrete contour evolution. Furthermore, a hierarchical key-frame retrieval method using scalable colour histogram analysis is presented. It offers customisable levels of detail in the descriptor space, where the relevance order is determined by degradation of the image, and not by degradation of the image histogram. To assess the performance of the approach, several experiments have been conducted. Selected results are reported.
- .
(2002) 'Temporal video segmentation for real-time key frame extraction'. IEEE Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, USA: ICASSP 2002 4, pp. IV-3632-IV-3635. Full text is available at: http://epubs.surrey.ac.uk/532155/
Abstract
The extensive amount of media coverage today generates difficulties in identifying and selecting desired information. Browsing and retrieval systems become more and more necessary in order to support users with powerful and easy-to-use tools for searching, browsing and summarization of information content. The starting point for these tasks in video browsing and retrieval systems is the low level analysis of video content, especially the segmentation of video content into shots. This paper presents a fast and efficient way to detect shot changes using only the temporal distribution of macroblock types in MPEG compressed video. The notion of a dominant reference frame is introduced. A dominant frame denotes the reference frame (I or P) used as prediction reference for most of the macroblocks from a subsequent B frame.
- .
(2002) 'Efficient key-frame extraction and video analysis'. IEEE Proceedings of the International Conference on Information Technology: Coding and Computing, Las Vegas, USA: ITCC '02, pp. 28-33. Full text is available at: http://epubs.surrey.ac.uk/532150/
Abstract
Content-based video indexing and retrieval has its foundations in the analyses of the prime video temporal structures. Consequently, technologies for video segmentation and key-frame extraction have become crucial for the development of advanced digital video systems. Conventional algorithms for video partitioning and key-frame extraction are mainly implemented autonomously. By focusing the analysis on compressed video features, this paper introduces a real-time algorithm for scene change detection and key-frame extraction that generates frame difference metrics by analysing the statistics of the macro-block features extracted from an MPEG compressed stream. The key-frame extraction method is implemented using difference metrics in curve simplification by means of a discrete contour evolution algorithm. This approach resulted in a fast and robust algorithm. Results of computer simulations are reported.
- .
(2001) 'Towards real-time shot detection in the mpeg compressed domain'. Tampere University of Technology Proceedings of the 3rd Workshop on Image Analysis for Multimedia Interactive Services, Tampere, Finland: WIAMIS 2001. Full text is available at: http://epubs.surrey.ac.uk/532151/
Abstract
As content based video indexing and retrieval has its foundations in the prime video structures, such as a shot or a scene, the algorithms for video partitioning have become crucial in contemporary development of digital video technology. Conventional algorithms for video partitioning mainly focus on the analysis of compressed video features, since the information relevant to the partitioning process can be extracted directly from the MPEG compressed stream and used for the detection of shot boundaries. However, most of the proposed algorithms do not show real time capabilities that are essential for video applications. This paper introduces a real time algorithm for cut detection. It analyses the statistics of the features extracted from the MPEG compressed stream, such as the macroblock type, and extends the same metrics to algorithms for gradual change detection. Our analysis led to a fast and robust algorithm for cut detection. Future research will be directed towards the use of the same concept for improving the real-time gradual change detection algorithms. Results of computer simulations are reported.
Book chapters
- . (2009) 'New Video Abstractions for Summarisation and Browsing'. in Grgić M, Delac K, Ghanbari M (eds.) Recent Advances in Multimedia Signal Processing and Communications Springer Verlag Article number 3
Teaching
I teach the following modules:
