Dr Frank Guerin

Senior Lecturer
+44 (0)1483 689195
17 BB 02

Academic and research departments

Department of Computer Science.



Research interests



Lauriane Rat-Fischer, Frank Guerin (2020) Benchmarking in Developmental Robotics, In: Metrics of Sensory Motor Coordination and Integration in Robots and Animals, pp. 73-86. Springer International Publishing

There is at present no standard benchmarking for assessing and comparing the various existing works in developmental robotics. Developmental robotics is more of a “basic science” research endeavour than mainstream robotics, which is more application focussed. For this reason benchmarking for developmental robotics will need a more scientific basis, rather than a specific application focus. The solution we propose is to benchmark developmental robotics efforts against human infant capabilities at various ages. The proposal here may allow the community to showcase their efforts by demonstration on common tasks, and so to enable the comparison of approaches. It may also provide an agenda of incremental targets for research in the field.

Mona Ahmadian, Frank Guerin, Andrew Gilbert (2023) MOFO: MOtion FOcused Self-Supervision for Video Understanding

Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. However, despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing representation learning on the motion area of a video for action recognition. MOFO automatically detects motion areas in videos and uses these to guide the self-supervision task. We use a masked autoencoder that randomly masks out a high proportion of the input sequence and forces a specified percentage of the inside of the motion area to be masked, with the remainder of the masked tokens taken from outside. We further incorporate motion information into the finetuning step to emphasise motion in the downstream task. We demonstrate that our motion-focused innovations significantly boost the performance of the currently leading SSL method (VideoMAE) for action recognition, indicating the importance of explicitly encoding motion in SSL.
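The masking strategy described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the parameters `mask_ratio` and `inside_ratio`, and the reading that `inside_ratio` is the fraction of motion-area tokens to mask are all assumptions, and the motion area is taken as a given list of per-token flags rather than detected automatically.

```python
import random


def motion_focused_mask(in_motion, mask_ratio=0.9, inside_ratio=0.75, seed=0):
    """Return a list of booleans, True where a token is masked.

    in_motion    -- per-token flags, True if the token lies in the motion area
    mask_ratio   -- overall fraction of tokens to mask (assumed parameter)
    inside_ratio -- fraction of motion-area tokens to mask (assumed reading)
    """
    rng = random.Random(seed)
    n_tokens = len(in_motion)
    n_mask = round(mask_ratio * n_tokens)

    inside = [i for i, flag in enumerate(in_motion) if flag]
    outside = [i for i, flag in enumerate(in_motion) if not flag]

    # Mask the requested share of the motion area, never exceeding the budget.
    n_inside = min(round(inside_ratio * len(inside)), n_mask)
    # Spend the rest of the budget outside the motion area, if enough tokens exist.
    n_outside = min(n_mask - n_inside, len(outside))

    chosen = set(rng.sample(inside, n_inside)) | set(rng.sample(outside, n_outside))
    return [i in chosen for i in range(n_tokens)]


# Toy example: 16 tokens, the first 4 of which lie in the motion area.
flags = [True] * 4 + [False] * 12
mask = motion_focused_mask(flags, mask_ratio=0.75, inside_ratio=0.75, seed=1)
```

Sampling inside and outside the motion area separately keeps the overall masking budget fixed while guaranteeing that the motion area receives its share of masked tokens.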

Frank Guerin, Paulo Ferreira (2020) Robot Manipulation in Open Environments: New Perspectives, In: IEEE Transactions on Cognitive and Developmental Systems 12(3), 8731700, pp. 669-675. IEEE

The problem of performing everyday manipulation tasks robustly in open environments is currently beyond the capabilities of artificially intelligent robots; humans are required. The difficulty arises from the high variability in open environments; it is not feasible to program for, or train for, every variation. This correspondence paper presents the case for a new approach to the problem, based on three mutually dependent ideas: 1) highly transferable manipulation skills; 2) choice of representation: a scene can be modeled in several different ways; and 3) top-down processes by which the robot's task can influence the bottom-up processes interpreting a scene. The approach we advocate is supported by evidence from what we know about humans, and it is also implicitly taken by human designers when designing representations for robots. We present brief results of an implementation of these ideas in robot vision, and give some guidelines for how the key ideas can be implemented more generally in practical robot systems.

Shun Wang, Yucheng Li, Chenghua Lin, Loïc Barrault, Frank Guerin Metaphor Detection with Effective Context Denoising, In: arXiv.org

We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves state-of-the-art results on several major metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our approach in context denoising. Our code and dataset can be found at https://github.com/MajiBear000/RoPPT

Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loïc Barrault FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning, In: arXiv (Cornell University)

In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but is also more explainable and interpretable than existing models, owing to its ability to account for external knowledge from FrameNet.

Yucheng Li, Frank Guerin, Chenghua Lin The Secret of Metaphor on Expressing Stronger Emotion, In: arXiv.org

Metaphors have been shown to have a stronger emotional impact than literal expressions. Although this finding shows promise for benefiting various NLP applications, the reasons behind the phenomenon are not well studied. This paper conducts the first study exploring how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific than literal expressions, and this greater specificity may be one of the reasons for metaphors' superiority in expressing emotion. When we compare metaphors with literal expressions of the same specificity level, the gap in emotion-expressing ability between the two narrows significantly. In addition, we observe that specificity is crucial in literal language as well: literal language can express stronger emotion when it is made more specific.

Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou (2021) BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling, In: Computer Speech & Language 67, 101169. Elsevier

Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM. Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of our method with human evaluation.

Severin Fichtl, Dirk Kraft, Norbert Kruger, Frank Guerin (2018) Bootstrapping Relational Affordances of Object Pairs Using Transfer, In: IEEE Transactions on Cognitive and Developmental Systems 10(1), pp. 56-71. IEEE

Robots acting in everyday environments need a good knowledge of how a manipulation action can affect pairs of objects in a relationship, such as "inside" or "behind" or "on top." These relationships afford certain means-end actions such as pulling a container to retrieve the contents, or pulling a tool to retrieve a desired object. We investigate how these relational affordances could be learned by a robot from its own action experience. A major challenge in this approach is to reduce the number of training samples needed to achieve accuracy, and hence we investigate an approach which can leverage past knowledge to accelerate current learning (which we call bootstrapping). We learn random forest-based affordance predictors from visual inputs and demonstrate two approaches to knowledge transfer for bootstrapping. In the first approach [direct bootstrapping (DB)], the state-space for a new affordance predictor is augmented with the output of previously learned affordances. In the second approach [category-based bootstrapping (CB)], we form categories that capture underlying commonalities of a pair of existing affordances and augment the state-space with this category classifier's output. In addition, we introduce a novel heuristic, which suggests how a large set of potential affordance categories can be pruned to leave only those categories which are most promising for bootstrapping future affordances. Our results show that both bootstrapping approaches outperform learning without bootstrapping. We also show that there is no significant difference in performance between DB and CB.
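The direct bootstrapping idea in this abstract, where the state-space of a new predictor is augmented with the outputs of previously learned affordances, can be illustrated with a small sketch. Everything here is hypothetical: the function name, the two stand-in predictors, and the two-element feature vector are invented for illustration, and simple hand-written rules take the place of the random forest predictors the abstract mentions.

```python
def augment_with_affordances(features, learned_predictors):
    """Append the output of each earlier predictor to the raw feature vector."""
    return list(features) + [float(p(features)) for p in learned_predictors]


# Hypothetical stand-ins for previously learned affordance predictors.
# In the paper these are learned from visual input; here they are fixed rules.
def affords_pushing(features):
    return features[0] < 0.5


def affords_lifting(features):
    return features[1] > 0.2


raw_features = [0.3, 0.4]
augmented = augment_with_affordances(raw_features, [affords_pushing, affords_lifting])
# 'augmented' would be the input for training a predictor of a new affordance.
```

The new predictor sees both the raw features and what earlier predictors concluded about the same scene, which is one way past knowledge could reduce the number of training samples needed.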

Rui Mao, Chenghua Lin, Frank Guerin (2018) Word Embedding and WordNet Based Metaphor Identification and Interpretation, In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1222-1231. Association for Computational Linguistics

Metaphoric expressions are widespread in natural language, posing a significant challenge for various natural language processing tasks such as machine translation. Current word embedding based metaphor identification models cannot identify the exact metaphorical words within a sentence. In this paper, we propose an unsupervised learning method that identifies and interprets metaphors at the word level without any preprocessing, outperforming strong baselines in the metaphor identification task. Our model extends to interpret the identified metaphors, paraphrasing them into their literal counterparts, so that they can be better translated by machines. We evaluated this with two popular translation systems for English to Chinese, showing that our model improved the systems significantly.

Paulo Abelha, Frank Guerin, Markus Schoeler (2016) A model-based approach to finding substitute tools in 3D vision data, In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA 2016) 2016, pp. 2471-2478. IEEE

A robot can feasibly be given knowledge of a set of tools for manipulation activities (e.g. hammer, knife, spatula). If the robot then operates outside a closed environment it is likely to face situations where the tool it knows is not available, but alternative unknown tools are present. We tackle the problem of finding the best substitute tool based solely on 3D vision data. Our approach has simple hand-coded models of known tools in terms of superquadrics and relationships among them. Our system attempts to fit these models to point clouds of unknown tools, producing a numeric value for how good a fit is. This value can be used to rate candidate substitutes. We explicitly control how closely each part of a tool must match our model, under direction from parameters of a target task. We allow bottom-up information from segmentation to dictate the sizes that should be considered for various parts of the tool. These ideas allow for a flexible matching so that tools may be superficially quite different, but similar in the way that matters. We evaluate our system's ratings relative to other approaches and relative to human performance in the same task. This is an approach to knowledge transfer, via a suitable representation and reasoning engine, and we discuss how this could be extended to transfer in planning.

Xiao Li, Chenghua Lin, Ruizhe Li, Chaozheng Wang, Frank Guerin (2020) Latent Space Factorisation and Manipulation via Matrix Subspace Projection, In: Proceedings of the 37th International Conference on Machine Learning 119, pp. 5916-5926

We tackle the problem of disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discriminators or a careful weighting among their loss functions. Furthermore, our new model can be applied to autoencoders as a plugin, and works across diverse domains such as images or text. We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods. The quality of generation of our new model (e.g. reconstruction, conditional generation) is highly competitive with a number of strong baselines.
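The general idea of changing selected attributes while preserving other information can be illustrated with plain linear algebra. The sketch below is a generic subspace projection, not the method from the paper: the abstract does not give the formulation, so the matrix `M` (assumed here to have orthonormal rows), the function names, and the toy vectors are all assumptions made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def swap_attributes(M, z, y_new):
    """Replace the attribute component of latent vector z with y_new.

    Assumes the rows of M are orthonormal, so that M^T M projects z onto
    the attribute subspace and z - M^T M z is the remaining information.
    """
    y_old = matvec(M, z)                      # current attribute values
    attribute_part = matvec(transpose(M), y_old)
    remainder = [zi - ai for zi, ai in zip(z, attribute_part)]
    new_attribute_part = matvec(transpose(M), y_new)
    return [r + a for r, a in zip(remainder, new_attribute_part)]


# Toy example: the first two latent dimensions carry the attributes.
M = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
z = [0.2, 0.9, 0.5]
z_edited = swap_attributes(M, z, [1.0, 0.0])
```

Under the orthonormality assumption, projecting `z_edited` with `M` returns exactly `y_new`, while the component of `z` outside the attribute subspace is left unchanged.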

Rui Mao, Chenghua Lin, Frank Guerin (2019) End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories, In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3888-3898

End-to-end training with Deep Neural Networks (DNN) is a currently popular method for metaphor identification. However, standard sequence tagging models do not explicitly take advantage of linguistic theories of metaphor identification. We experiment with two DNN models which are inspired by two human metaphor identification procedures. By testing on three public datasets, we find that our models achieve state-of-the-art performance in end-to-end metaphor identification.

Pawel Gajewski, Paulo Ferreira, Georg Bartels, Chaozheng Wang, Frank Guerin, Bipin Indurkhya, Michael Beetz, Bartlomiej Sniezynski (2019) Adapting Everyday Manipulation Skills to Varied Scenarios, In: 2019 International Conference on Robotics and Automation (ICRA) 2019, pp. 1345-1351. IEEE

We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; this means it must adjust motion trajectories appropriately to complete the task. We tackle three everyday manipulations: scraping material from a tool into a container, cutting, and scooping from a container. Our solution encodes these manipulation skills in a generic way, with parameters that can be filled in at run-time via queries to a robot perception module; the perception module abstracts the functional parts of the tool and extracts key parameters that are needed for the task. The approach is evaluated in simulation and with selected examples on a PR2 robot.

Paulo Abelha, Frank Guerin (2018) Learning how a tool affords by simulating 3D models from the web, In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, pp. 4923-4929. IEEE

Robots performing everyday tasks such as cooking in a kitchen need to be able to deal with variations in the household tools that may be available. Given a particular task and a set of tools available, the robot needs to be able to assess which would be the best tool for the task, and also where to grasp that tool and how to orient it. This requires an understanding of what is important in a tool for a given task, and how the grasping and orientation relate to performance in the task. A robot can learn this by trying out many examples. This learning can be faster if these trials are done in simulation using tool models acquired from the Web. We provide a semi-automatic pipeline to process 3D models from the Web, allowing us to train from many different tools and their uses in simulation. We represent a tool object and its grasp and orientation using 21 parameters which capture the shapes and sizes of principal parts and the relationships among them. We then learn a 'task function' that maps this 21-parameter vector to a value describing how effective it is for a particular task. Our trained system can then process the unsegmented point cloud of a new tool and output a score and a way of using the tool for a particular task. We compare our approach with the closest one in the literature and show that we achieve significantly better results.

Additional publications