
Dr Frank Guerin
About
Biography
Frank Guerin joined the Computer Science Department at Surrey in 2020. Before this he was Lecturer/Senior Lecturer at the University of Aberdeen, and before that a PhD student at Imperial College.
His research interests are in Artificial Intelligence, where he has published in robotics, vision, language processing, and machine learning. He has established a track record of taking ideas from psychology research and importing them into Artificial Intelligence areas such as developmental robotics and, more recently, mainstream robotics; for example, learning spatial relationships autonomously, inspired by infants' early means-ends behaviour, and applying the psychologist Barsalou's ideas to allow a robot system to use context effectively.
Guerin has devoted significant time to learning about psychology and engaging with psychologists. He initiated and was the main organiser of a 2013 Dagstuhl seminar bringing together leading international psychologists and roboticists. His IEEE TAMD journal paper, which brings developmental psychology ideas to the robotics community, is his most cited paper. He has been invited to speak at international meetings to provide a perspective that bridges the gap between psychology and robotics, e.g. the OpenEASE Fall School 2019 and the Xperience Summer School 2013, as well as workshops at ICRA 2013, ICDL-EpiRob 2014, RSS 2015, IROS 2015, and the FEEL project in Paris in 2014.
Research
Research interests
I am an AI researcher interested in the kinds of tasks that are easy for humans but hard for AI: in robotics, computer vision, and language. I have an interest in psychology and in how humans do things, and I like to borrow ideas from human processes and implement them in AI systems that tackle real-world tasks.
I am interested in new approaches to knowledge representation and reasoning for AI systems which get over the rigidity and brittleness of classical approaches. Human knowledge of a concept such as "container" is flexible enough to be applied to a wide range of objects (pots, cups, bags, boxes, rooms, buildings, ...) and to more abstract domains (political parties, controls on disease spread, damage from a scandal). The actions associated with container (insert, remove, escape, seal, breach, etc.) can also be adapted appropriately. These are not special or unusual or effortful applications of a concept for humans. Every human concept is effortlessly applied to a wide range of situations, and examples are everywhere in everyday cognition. This suggests that the human representation and reasoning machinery has a design which facilitates this.
I am looking for (non-classical) knowledge representation and reasoning which could allow AI systems to transfer knowledge of basic concepts in a human-like way. Vision example: give a system some knowledge of the types of tool (e.g. spatulas) that can lift pancakes or eggs from a pan, and enable it to transfer the concept to other objects which afford the same action. Manipulation example: give a system some knowledge of containers and container actions, and enable it to apply this across a variety of scenarios. Language processing example: given knowledge of concepts such as container and its associated actions, enable a system to recognise the concept in varied instantiations, e.g. where it is not used literally.
Paper about the projection idea in Artificial Intelligence: https://arxiv.org/abs/2103.13512
Paper about task-driven representation in robotics: Robot Manipulation in Open Environments
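To make the container example above concrete, the following is a minimal, hypothetical sketch of a frame-like concept representation in Python; the class, role and action names and the bindings are illustrative assumptions, not part of any existing system.

```python
# Hypothetical sketch: a frame-like "container" concept with roles and actions
# that can be bound to physical objects or to abstract domains.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    roles: dict = field(default_factory=dict)    # role name -> constraint description
    actions: list = field(default_factory=list)  # action schemas defined over the roles

container = Concept(
    name="container",
    roles={
        "boundary": "surface that separates inside from outside",
        "interior": "region enclosed by the boundary",
        "contents": "entities located in the interior",
    },
    actions=["insert", "remove", "escape", "seal", "breach"],
)

def instantiate(concept, bindings):
    """Bind the concept's roles to entities in a concrete (or abstract) domain."""
    return {role: bindings.get(role, "<unbound>") for role in concept.roles}

# The same schema applied to a physical object and to an abstract domain.
print(instantiate(container, {"boundary": "cup wall", "interior": "cup cavity",
                              "contents": "coffee"}))
print(instantiate(container, {"boundary": "quarantine rules", "interior": "region",
                              "contents": "infected individuals"}))
```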
Teaching
BSc:
COM3013 COMPUTATIONAL INTELLIGENCE
COM3025 DEEP LEARNING AND ADVANCED AI
MSc Data Science:
COMM002 MSC DISSERTATION
COMPUTATIONAL INTELLIGENCE
COMM056 ALIGNING BUSINESS VALUE WITH RESEARCH AND DEVELOPMENT
Publications
Robots acting in everyday environments need a good knowledge of how a manipulation action can affect pairs of objects in a relationship, such as "inside" or "behind" or "on top." These relationships afford certain means-end actions such as pulling a container to retrieve the contents, or pulling a tool to retrieve a desired object. We investigate how these relational affordances could be learned by a robot from its own action experience. A major challenge in this approach is to reduce the number of training samples needed to achieve accuracy, and hence we investigate an approach which can leverage past knowledge to accelerate current learning (which we call bootstrapping). We learn random forest-based affordance predictors from visual inputs and demonstrate two approaches to knowledge transfer for bootstrapping. In the first approach [direct bootstrapping (DB)], the state-space for a new affordance predictor is augmented with the output of previously learned affordances. In the second approach [category-based bootstrapping (CB)], we form categories that capture underlying commonalities of a pair of existing affordances and augment the state-space with this category classifier's output. In addition, we introduce a novel heuristic, which suggests how a large set of potential affordance categories can be pruned to leave only those categories which are most promising for bootstrapping future affordances. Our results show that both bootstrapping approaches outperform learning without bootstrapping. We also show that there is no significant difference in performance between DB and CB.
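As an illustration of the direct bootstrapping idea described above, here is a minimal sketch using scikit-learn random forests; the toy features, labels and affordance definitions are invented for the example and do not come from the paper.

```python
# Direct bootstrapping sketch: augment the new affordance's state space with
# the output of a previously learned affordance predictor.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy visual features for object pairs, and labels for two affordances.
X = rng.normal(size=(200, 10))
y_prev = (X[:, 0] + X[:, 1] > 0).astype(int)        # previously learned affordance
y_new = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)   # affordance to be learned now

# 1. Train the earlier affordance predictor on its own data.
prev_predictor = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_prev)

# 2. Augment the features for the new affordance with the earlier predictor's output.
X_aug = np.hstack([X, prev_predictor.predict_proba(X)[:, [1]]])

new_predictor = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_aug, y_new)
print("training accuracy with bootstrapped features:", new_predictor.score(X_aug, y_new))
```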
Metaphoric expressions are widespread in natural language, posing a significant challenge for various natural language processing tasks such as Machine Translation. Current word embedding based metaphor identification models cannot identify the exact metaphorical words within a sentence. In this paper, we propose an unsupervised learning method that identifies and interprets metaphors at word-level without any preprocessing, outperforming strong baselines in the metaphor identification task. Our model extends to interpret the identified metaphors, paraphrasing them into their literal counterparts, so that they can be better translated by machines. We evaluated this with two popular translation systems for English to Chinese, showing that our model improved the systems significantly.
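The following toy sketch illustrates one unsupervised strategy consistent with the description above: flag the word whose embedding fits its sentence context worst as metaphorical, then paraphrase it with the candidate that fits the context best. The hand-made vectors and candidate list are placeholders for real word embeddings; this is not the paper's exact model.

```python
# Toy unsupervised metaphor identification and paraphrasing via context fit.
import numpy as np

emb = {  # tiny hand-made "embeddings" for illustration only
    "she":      np.array([0.9, 0.1, 0.1]),
    "devoured": np.array([0.2, 0.9, 0.1]),
    "the":      np.array([0.8, 0.2, 0.1]),
    "book":     np.array([0.9, 0.1, 0.2]),
    "ate":      np.array([0.1, 0.95, 0.05]),
    "read":     np.array([0.9, 0.1, 0.1]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sentence = ["she", "devoured", "the", "book"]

def context_vector(i):
    """Average embedding of all words in the sentence except position i."""
    return np.mean([emb[w] for j, w in enumerate(sentence) if j != i], axis=0)

# Identification: the word least similar to its context is the metaphor candidate.
fit = [cos(emb[w], context_vector(i)) for i, w in enumerate(sentence)]
metaphor_idx = int(np.argmin(fit))
print("metaphorical word:", sentence[metaphor_idx])

# Interpretation: pick the literal paraphrase that best fits the same context.
candidates = ["ate", "read"]   # hypothetical paraphrase candidates for "devoured"
best = max(candidates, key=lambda c: cos(emb[c], context_vector(metaphor_idx)))
print("literal paraphrase:", best)
```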
A robot can feasibly be given knowledge of a set of tools for manipulation activities (e.g. hammer, knife, spatula). If the robot then operates outside a closed environment it is likely to face situations where the tool it knows is not available, but alternative unknown tools are present. We tackle the problem of finding the best substitute tool based solely on 3D vision data. Our approach has simple hand-coded models of known tools in terms of superquadrics and relationships among them. Our system attempts to fit these models to point clouds of unknown tools, producing a numeric value for how good a fit is. This value can be used to rate candidate substitutes. We explicitly control how closely each part of a tool must match our model, under direction from parameters of a target task. We allow bottom-up information from segmentation to dictate the sizes that should be considered for various parts of the tool. These ideas allow for a flexible matching so that tools may be superficially quite different, but similar in the way that matters. We evaluate our system's ratings relative to other approaches and relative to human performance in the same task. This is an approach to knowledge transfer, via a suitable representation and reasoning engine, and we discuss how this could be extended to transfer in planning.
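Below is a small sketch, in the spirit of the approach above, of scoring how well a single superquadric explains a point cloud. The parameters, toy point cloud and scoring heuristic are illustrative; the paper's models describe whole tools as several superquadrics plus relations among them.

```python
# Superquadric fit scoring sketch: lower deviation of the inside-outside
# function from 1 over the point cloud means a better fit.
import numpy as np

def superquadric_F(points, a1, a2, a3, e1, e2):
    """Inside-outside function: roughly 1 on the surface, <1 inside, >1 outside."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    xy = np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)
    return xy ** (e2 / e1) + np.abs(z / a3) ** (2 / e1)

def fit_score(points, params):
    """Higher when the model surface passes close to the points."""
    F = superquadric_F(points, *params)
    return 1.0 / (1.0 + np.mean(np.abs(F - 1.0)))

def box_surface_points(a1, a2, a3, n_per_face=80, seed=1):
    """Toy point cloud: points on the faces of a flat box (e.g. a spatula blade)."""
    rng = np.random.default_rng(seed)
    faces = []
    for axis, half in ((0, a1), (1, a2), (2, a3)):
        for sign in (-1.0, 1.0):
            p = rng.uniform(-1.0, 1.0, size=(n_per_face, 3)) * np.array([a1, a2, a3])
            p[:, axis] = sign * half
            faces.append(p)
    return np.vstack(faces)

cloud = box_surface_points(0.05, 0.04, 0.005)
box_like = (0.05, 0.04, 0.005, 0.1, 0.1)      # low exponents: box-like superquadric
sphere_like = (0.05, 0.05, 0.05, 1.0, 1.0)    # unit exponents: ellipsoid
print("box-like model fit:  ", round(fit_score(cloud, box_like), 3))
print("ellipsoid model fit: ", round(fit_score(cloud, sphere_like), 3))
```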
We tackle the problem of disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discriminators or a careful weighting among their loss functions. Furthermore our new model can be applied to autoencoders as a plugin, and works across diverse domains such as images or text. We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods. The quality of generation of our new model (e.g. reconstruction, conditional generation) is highly competitive to a number of strong baselines.
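A schematic PyTorch sketch of the general idea follows: a learnable matrix projects the latent code onto the attribute labels, with an orthonormality regulariser keeping the projection well behaved. The loss terms, network sizes and variable names are simplified assumptions for illustration and differ from the paper's exact formulation.

```python
# Simplified latent-space factorisation plugin for an autoencoder.
import torch
import torch.nn as nn

class FactorisedAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=32, n_attrs=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
        self.M = nn.Parameter(torch.randn(n_attrs, z_dim) * 0.01)  # attribute subspace

    def forward(self, x, y):
        z = self.enc(x)
        x_hat = self.dec(z)
        attr_logits = z @ self.M.t()   # projection of the latent code onto attributes
        losses = {
            "recon": nn.functional.mse_loss(x_hat, x),
            "attr": nn.functional.binary_cross_entropy_with_logits(attr_logits, y),
            # Keep the rows of M near-orthonormal so it acts like a subspace projection
            # (a simplification of the paper's full objective).
            "orth": (self.M @ self.M.t() - torch.eye(self.M.shape[0])).pow(2).mean(),
        }
        return x_hat, losses

model = FactorisedAE()
x = torch.rand(8, 784)
y = torch.randint(0, 2, (8, 4)).float()
x_hat, losses = model(x, y)
print({k: float(v) for k, v in losses.items()})
```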
End-to-end training with Deep Neural Networks (DNN) is a currently popular method for metaphor identification. However, standard sequence tagging models do not explicitly take advantage of linguistic theories of metaphor identification. We experiment with two DNN models which are inspired by two human metaphor identification procedures. By testing on three public datasets, we find that our models achieve state-of-the-art performance in end-to-end metaphor identification.
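Below is a compact sketch of a sequence-tagging metaphor identifier; to reflect the idea of comparing a word's in-context representation with its context-independent meaning, the classifier sees both the BiLSTM hidden state and the static word embedding. This is an illustrative simplification, not the architectures evaluated in the paper.

```python
# Toy sequence tagger for per-token metaphor identification.
import torch
import torch.nn as nn

class MetaphorTagger(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.clf = nn.Linear(2 * hidden + emb_dim, 2)  # literal vs metaphorical per token

    def forward(self, token_ids):
        e = self.emb(token_ids)                      # context-independent embeddings
        h, _ = self.lstm(e)                          # contextual representations
        return self.clf(torch.cat([h, e], dim=-1))   # per-token logits

tagger = MetaphorTagger()
tokens = torch.randint(0, 1000, (2, 7))              # a toy batch of two 7-token sentences
print(tagger(tokens).shape)                          # torch.Size([2, 7, 2])
```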
We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; this means it must adjust motion trajectories appropriately to complete the task. We tackle three everyday manipulations: scraping material from a tool into a container, cutting, and scooping from a container. Our solution encodes these manipulation skills in a generic way, with parameters that can be filled in at run-time via queries to a robot perception module; the perception module abstracts the functional parts of the tool and extracts key parameters that are needed for the task. The approach is evaluated in simulation and with selected examples on a PR2 robot.
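An illustrative sketch of encoding one such skill generically, with parameters filled in at run time by querying a perception module, is given below; the function and parameter names are hypothetical, and the real skill encodings and perception queries are considerably richer.

```python
# Hypothetical generic "scraping" skill whose trajectory depends entirely on
# parameters supplied at run time by a perception module.
from dataclasses import dataclass

@dataclass
class ScrapingParams:
    tool_tip: tuple          # functional part of the tool (from perception)
    scrape_start: tuple      # where scraping begins on the tool surface
    container_rim: tuple     # where material should be deposited
    approach_height: float   # clearance above the container

def query_perception(tool_cloud, target_cloud) -> ScrapingParams:
    """Placeholder for the perception module: abstract functional parts of the
    tool and target object from their point clouds."""
    return ScrapingParams(tool_tip=(0.0, 0.0, 0.15),
                          scrape_start=(0.0, 0.0, 0.05),
                          container_rim=(0.3, 0.0, 0.10),
                          approach_height=0.08)

def scraping_trajectory(p: ScrapingParams):
    """Generic skill: a coarse sequence of end-effector waypoints whose values
    are filled in from the run-time parameters."""
    above_rim = (p.container_rim[0], p.container_rim[1],
                 p.container_rim[2] + p.approach_height)
    return [p.scrape_start, p.tool_tip, above_rim, p.container_rim]

params = query_perception(tool_cloud=None, target_cloud=None)
for waypoint in scraping_trajectory(params):
    print(waypoint)
```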
Robots performing everyday tasks such as cooking in a kitchen need to be able to deal with variations in the household tools that may be available. Given a particular task and a set of tools available, the robot needs to be able to assess which would be the best tool for the task, and also where to grasp that tool and how to orient it. This requires an understanding of what is important in a tool for a given task, and how the grasping and orientation relate to performance in the task. A robot can learn this by trying out many examples. This learning can be faster if these trials are done in simulation using tool models acquired from the Web. We provide a semi-automatic pipeline to process 3D models from the Web, allowing us to train from many different tools and their uses in simulation. We represent a tool object and its grasp and orientation using 21 parameters which capture the shapes and sizes of principal parts and the relationships among them. We then learn a "task function" that maps this 21 parameter vector to a value describing how effective it is for a particular task. Our trained system can then process the unsegmented point cloud of a new tool and output a score and a way of using the tool for a particular task. We compare our approach with the closest one in the literature and show that we achieve significantly better results.
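A minimal sketch of the task-function idea follows: learn a mapping from a fixed-length tool/grasp/orientation descriptor to an effectiveness score, then rank candidates for a new tool. The 21-dimensional descriptors here are random toy data; the real features come from segmented tool models and simulated trials.

```python
# Learn a task function from descriptors to effectiveness, then rank candidates.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Simulated training data: descriptors of (tool, grasp, orientation) and the
# effectiveness observed when executing the task in simulation (toy values).
X_train = rng.normal(size=(300, 21))
y_train = X_train[:, 0] * 0.8 - np.abs(X_train[:, 5]) * 0.5 + rng.normal(scale=0.1, size=300)

task_fn = GradientBoostingRegressor().fit(X_train, y_train)

# At run time: enumerate candidate grasps/orientations for a new tool, score
# each with the learned task function, and pick the best.
candidates = rng.normal(size=(20, 21))
scores = task_fn.predict(candidates)
best = int(np.argmax(scores))
print("best candidate:", best, "predicted effectiveness:", round(float(scores[best]), 3))
```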