10am - 11am

Wednesday 30 November 2022

Active Sampling for Computer Vision

PhD Viva Open Presentation by Yusuf Duman

All Welcome!





Active Sampling for Computer Vision


Mammalian vision systems do not view an entire scene in one go. Instead, rapid eye movements known as saccades point the high-density areas of photoreceptors in the retina toward areas of detail. Consequently, a detailed view of the scene can be built by the brain using a relatively small amount of information. Integrating the imaging in this manner improves the quality of the visual processing found deeper within the brain, as it only has to process the salient details.

A scanning pixel camera presents a way of realising this in hardware. It is a low-cost, low-power sensor system that builds up an image of a scene by rapidly sampling a sensor that sits behind a movable set of optics. Advances in micro-actuation allow the low-cost optics to be scanned across the scene in a programmable manner. This can lead to lens-less zooming effects by simply varying the scan speed or the sample rate. Furthermore, the amount of information that this type of sensor provides can be varied by simply changing the scan pattern.
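The programmable-scan idea above can be illustrated with a small sketch. This is a hypothetical simulation, not the actual hardware or code from the thesis: a single "sensor" reads one scene location per optics position, and shrinking the step between samples while keeping the sample budget fixed behaves like a lens-less zoom.

```python
# Hypothetical sketch of a scanning pixel camera: one sensor reading per
# optics position, swept in a programmable raster. The scene, scan pattern
# and zoom behaviour here are illustrative assumptions only.

def raster_scan(scene, origin, step, samples_per_axis):
    """Sample a grid of points from `scene` (a 2-D list), starting at
    `origin` (row, col) and moving `step` pixels between samples.
    A smaller `step` with the same number of samples acts like zooming in."""
    r0, c0 = origin
    image = []
    for i in range(samples_per_axis):
        row = []
        for j in range(samples_per_axis):
            r = min(r0 + i * step, len(scene) - 1)
            c = min(c0 + j * step, len(scene[0]) - 1)
            row.append(scene[r][c])  # one sensor reading per optics position
        image.append(row)
    return image

# An 8x8 synthetic scene: values encode pixel position for easy inspection.
scene = [[10 * r + c for c in range(8)] for r in range(8)]

wide = raster_scan(scene, origin=(0, 0), step=2, samples_per_axis=4)  # coarse overview
zoom = raster_scan(scene, origin=(2, 2), step=1, samples_per_axis=4)  # fine detail, same budget
```

Both scans cost the same sixteen sensor readings; only the scan parameters differ, which is what makes the sampling pattern itself programmable.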

However, a major drawback of this type of sensor system is that it takes a long time to image a full scene when compared to a traditional CCD camera. This motivates the work of this thesis to find a scan pattern that allows the best use of the saccade-like behaviour of a scanning pixel camera. By focusing on scene details relevant to a predefined computer vision task, this thesis demonstrates that it is possible to produce a scan pattern that allows us to overcome this major issue. In this thesis we provide methods of generating useful sample maps that enhance the abilities of a scanning pixel camera and make it an efficient part of a computer vision pipeline. 

By actively providing sample patterns to the scanning pixel camera, the sensor becomes an active part of the computer vision system, rather than simply a source of data. This is similar to the purpose of saccades in a mammalian vision system. In doing this we create another challenge that is addressed in this thesis: now that the downstream computer vision task has only a partial view of the scene, one that may be affected by the different types of artefacting found in a scanning pixel camera, how do these tasks need to be adapted to deal with data in this form?

This thesis takes the approach of making several assumptions about a scanning pixel camera in order to adapt existing computer vision techniques to find useful sample patterns. These initial assumptions include that the scene is static and is imaged with full knowledge of its contents. We first use this simple model of a scanning pixel camera to establish the best possible way of generating sample maps. These assumptions are then progressively removed in order to finally reach a method that can be deployed on a real system. We begin by making the viewed scene dynamic, forcing the system to predict future steps even if it has a complete scan of the present. We then remove any initial knowledge of the scene entirely, forcing the scanning pixel camera to explore the scene before it knows what to look at.
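The fully-informed starting point above can be sketched in miniature. This is an illustrative stand-in, not the thesis method: with full knowledge of a static scene, a fixed sample budget is spent on the locations a task scores as most salient, and the camera then reads only those locations.

```python
# Illustrative sketch (not the thesis method): build a sample map by
# spending a fixed sample budget on the highest-priority locations.
# The saliency scores are hypothetical stand-ins for a task-driven priority.

def sample_map(saliency, budget):
    """Return the (row, col) locations of the `budget` highest-saliency
    pixels; these positions define the scan pattern for the camera."""
    coords = [(s, r, c)
              for r, row in enumerate(saliency)
              for c, s in enumerate(row)]
    coords.sort(reverse=True)                 # most salient first
    return [(r, c) for _, r, c in coords[:budget]]

def partial_scan(scene, locations, fill=0):
    """Simulate the camera: read only the chosen locations, leaving the
    rest of the image unsampled (a placeholder value)."""
    out = [[fill] * len(scene[0]) for _ in scene]
    for r, c in locations:
        out[r][c] = scene[r][c]
    return out

saliency = [[0, 1, 0],
            [2, 9, 3],
            [0, 4, 0]]                        # the task cares about the centre
scene = [[v + 10 for v in row] for row in saliency]

locs = sample_map(saliency, budget=3)         # scan pattern under the budget
image = partial_scan(scene, locs)             # partial view seen downstream
```

Removing the assumptions then amounts to replacing the oracle `saliency` input: first with a prediction of where detail will be in the next frame, and finally with scores learned while the camera explores an unknown scene.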

The sample maps generated are designed to produce images for use by a downstream computer vision task, rather than for viewing by a human. To evaluate this we apply the technique to a variety of computer vision tasks, including object classification, tracking and instance segmentation, and demonstrate that such a piece of hardware can form a useful part of a computer vision system.

Attend the Event

This is a free online event open to everyone. You can attend via Zoom.