Deception exists in all aspects of life and is particularly evident on the Web. Deception includes child sexual predators grooming victims online, medical news headlines with little medical evidence or scientific rigour, individuals claiming others' work as their own, and systematic deception of company shareholders and institutional investors leading to corporate collapses.
This thesis explores the potential for automatic detection of deception. We investigate the nature of deception and the related cues, focusing in particular on Verbal Cues, and concluding that they cannot be readily generalised. We demonstrate how deception-specific features, based on sound hypotheses, can overcome related limitations by presenting approaches for three different examples of deception, namely Child Sexual Predator Detection (SPD), Authorship Identification (AI) and Intrinsic Plagiarism Detection (IPD). We further show how our approaches result in competitive levels of reliability.
For SPD we develop our approach largely based on the commonality of requests for key personal information. To address AI, we introduce approaches based on a frequency-mean-variance and a frequency-only framework in order to detect strong associations between co-occurring patterns of a limited number of stopwords. Our IPD approaches are based on simple commonality of words at document level and usage of proper nouns; document sections lacking commonality can be identified as plagiarised.
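As an illustration only, the document-level word commonality idea behind the IPD approaches might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in this thesis: the function names, the `top_k` and `threshold` parameters, and the scoring rule (the share of a section's tokens that are among the document's most frequent words) are all hypothetical, and the use of proper nouns is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def commonality_scores(sections, top_k=50):
    """Score each section by the share of its tokens that are among the
    top_k most frequent words of the whole document (hypothetical rule)."""
    doc_counts = Counter(tok for s in sections for tok in tokenize(s))
    common = {word for word, _ in doc_counts.most_common(top_k)}
    scores = []
    for s in sections:
        toks = tokenize(s)
        scores.append(sum(t in common for t in toks) / len(toks) if toks else 0.0)
    return scores


def flag_sections(sections, top_k=50, threshold=0.5):
    """Return indices of sections whose commonality falls below threshold.
    Both top_k and threshold are assumed values chosen for illustration."""
    scores = commonality_scores(sections, top_k)
    return [i for i, score in enumerate(scores) if score < threshold]


sections = [
    "the cat sat on the mat",
    "the cat ate the fish on the mat",
    "quantum chromodynamics describes gluon interactions",
]
suspicious = flag_sections(sections, top_k=4, threshold=0.5)
```

In this toy input the third section shares no frequent words with the rest of the document, so it is the only one flagged. Real documents would need far larger sections and tuned parameters.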
The frameworks of the International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN) competitions provided an independent evaluation of the approaches. The SPD approach obtained an F1 score of 0.48. F1 scores of 0.47, 0.53 and 0.57 were achieved in AI tasks for PAN2012, 2013 and 2014 respectively. IPD yielded an overall accuracy of 91%. Through post-competition adaptations, we also show how the approaches and their scores can be improved, demonstrate the importance of suitable datasets, and show that most approaches are not easily transferable between different types of deception.