Professor Payam Barnaghi
Academic and research departments: Centre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering.
I am Professor of Machine Intelligence at the Department of Electronic and Electrical Engineering and a member of the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey.
I am Deputy Head of the Care Research and Technology Centre at the UK Dementia Research Institute and technical lead of the Department of Health/NHS TIHM for Dementia. I am an associate editor of the IEEE Transactions on Big Data and vice-chair of the IEEE SIG on Big Data Intelligent Networking. I am a senior member of IEEE and a Fellow of the Higher Education Academy. I received the FEPS Teacher of the Year Award (2017) at the Faculty of Engineering and Physical Sciences and an IEEE Outstanding Leadership Award in 2017.
My main research goal is to develop information processing and machine learning methods for cyber-physical and social systems. I work on machine learning, Internet of Things (IoT), semantic web, adaptive algorithms, stream processing and information search and retrieval to solve problems and develop new technologies for the future Internet/Web and healthcare systems. My colleagues and I are currently working on:
- AI and IoT enabled solutions to provide personalised care for people affected by Dementia
- Machine learning and causal networks to create symptom experience models for cancer care
- The unobtrusive monitoring of patients in hospital wards and at home
- Time-series data processing and pattern analysis
- Deep learning for semi-supervised learning models for dynamic data streams
Areas of specialism
Internet of Things;
Time-series Data Analysis;
Affiliations and memberships
- New Health Tech Innovation of the Year Award, 2019 (TIHM for Dementia Project)
- The Most Outstanding Innovation, Guildford’s Innovation Awards (TIHM for Dementia Project)
- HSJ 2018 Award for Improving Care with Technology (TIHM for Dementia Project)
- Regional NHS Parliamentary Award, NHS 70th Anniversary, 2018 (TIHM for Dementia Project)
- Best Mental Health Initiative Award, EHI 2017 Awards (TIHM for Dementia Project)
- Teacher of The Year Award, Faculty of Engineering and Physical Sciences (2017)
- IEEE Outstanding Leadership Award (2017)
- Co-Investigator and Deputy Head, Care Research and Technology Centre, the UK Dementia Research Institute (2019-2025)
- Principal Investigator, Technology Integrated Health Management (TIHM) 1.5, IoT Testbed for Dementia Care, The Department of Health/NHS (2018-2019)
- Principal Investigator, Automated Body Monitoring System, MinebeaMitsumi (2018-2020)
- Principal Investigator, EU H2020 IoTCrawler (2018-2021)
- Principal Investigator, EU H2020 ACTIVAGE: Activating innovative IoT smart living environments for ageing well (2017-2021)
- Principal Investigator, Technology Integrated Health Management (TIHM), IoT Testbed for Healthcare, The Department of Health/NHS (2016-2018)
- Principal Investigator, EU H2020 FIESTA: Federated Interoperable Semantic IoT/cloud Testbeds and Applications (2015-2018)
- Co-Investigator, Optimisation techniques for water resource management, KTP project, Innovate UK (2015-2017)
- Principal Investigator, EU FP7-2012-ICT-FI collaborative Project, FI-Core/FIWARE: Future Internet Core Platform Extension, Availability and Sustainability (2014-2016)
- Principal Investigator and Project Co-ordinator, EU FP7 CityPulse: Real-Time IoT Stream Processing and Large-scale Data Analytics for Smart City Applications (2013-2016)
- Co-Investigator, "Managing Communication Channels for Reliable Remote Working", Technology Strategy Board (TSB) funded project (2014-2016)
- Co-Investigator and Project Co-ordinator, Internet of Things Architecture, Industry project funded by InterDigital Communications Inc. (2012-2013)
- Co-Investigator, EyeHub: Internet of Things Information Hub, Technology Strategy Board (TSB) funded project (2013-2014)
- Co-Investigator and Deputy Project Co-ordinator, EU FP7 Internet of Things Environment for Service Creation and Testing (IoTest), Small or medium-scale focused research project (STREP) (2011-2014)
- Co-Investigator, EU FP7-2011-ICT-FI Future Internet Ware (FI-Ware) (2011-2014)
- EPSRC Developing Leaders Recess Award, Engineering and Physical Sciences Research Council, January 2012
£20m Research and Technology Centre to enable people with dementia to live in own homes for longer
Trailblazing study that uses artificial intelligence to support people with dementia scoops prestigious healthcare award
£1 million in extra funding to extend Internet of Things study that supports people with dementia and their carers
Dementia patients could remain at home longer thanks to ground breaking technology
TIHM for dementia wins regional NHS70 Parliamentary Award following support from local MPs
University works with Surrey and Borders NHS Trust to deliver new innovative health services for dementia patients
In the media
Postgraduate research supervision
- Narges Pourshahrokhi, Principal Supervisor, January 2019-current.
- Roonak Rezvani, Principal Supervisor, April 2018-current.
- Honglin Li, Principal Supervisor, April 2018-current.
- Nikolaos Papachristou, Principal Supervisor, July 2015-current.
Completed postgraduate research projects I have supervised
- Masoud Hasanpour, Second Supervisor, October 2014-October 2018 (defended).
- Dr Yasmin Fathy, Principal Supervisor, Graduated in 2018.
- Dr Daniel Puschmann, Principal Supervisor, Graduated in 2018.
- Dr Eike Reetz, Second Supervisor, Graduated in 2016.
- Dr Frieder Ganz, Principal Supervisor, Graduated in 2014.
- Dr Gilbert Cassar, Second Supervisor, Graduated in 2013.
- Dr Wei Wang, Graduated in 2009.
KAT includes a collection of algorithms for each step of the Internet of Things (IoT) data processing workflow, ranging from data and signal pre-processing algorithms such as frequency filters, through dimensionality reduction techniques such as Wavelet, FFT and SAX, to feature extraction, abstraction and inference methods such as clustering. Figure 1 shows the steps of the process chain for processing cyber-physical data on the Web. KAT can be used to design and evaluate algorithms that aim to extract new insights from sensor data.
hundreds or even thousands of sensor nodes have to be maintained
and configured. With upcoming initiatives such
as Smart Home and the Internet of Things, we need new mechanisms
to discover and manage this number of sensors. In this
paper, we describe a middleware architecture that uses context
information of sensors to supply a plug-and-play gateway
and resource management framework for heterogeneous
sensor networks. Our main goals are to minimise the effort
for network engineers to configure and maintain the network
and to supply a unified interface to access the underlying heterogeneous
network. Based on context information such
as battery status, routing information, location and radio
signal strength, the gateway will configure and maintain the
sensor network. The sensors are associated to nearby base
stations using an approach that is adapted from the 802.11
WLAN association and negotiation mechanism to provide
registration and connectivity services for the underlying sensor
devices. This abstracted connection layer can be used to
integrate the underlying sensor networks into high-level services
and applications such as IP-based networks and Web
domains and then define correspondence between concepts in two
different ontologies using the SKOS model.
search and retrieval paradigms from document-oriented
to entity and knowledge-centric search and retrieval.
By attempting to provide direct and intuitive answers,
such systems alleviate the information overload problem
and reduce information seekers' cognitive overhead.
Ontologies and knowledge bases are fundamental cornerstones
in semantic search systems based on which sophisticated
search mechanisms and efficient search services
are designed. Nevertheless, acquisition of quality knowledge
from heterogeneous sources on the Web is never a
trivial task. Transformation of data in existing databases
seems a promising bootstrapping approach, while information
providers may refuse to do so because of intellectual
property issues. In this article we discuss issues related to
knowledge acquisition for semantic search systems. In particular,
we discuss ontology learning from unstructured text
corpus, which is an automatic knowledge acquisition process
using different techniques.
creation from sensor data. We propose using data abstraction
techniques, in particular Symbolic Aggregate Approximation
(SAX), to analyse and create patterns from sensor data. The
created patterns are then linked to semantic descriptions that
define thematic, spatial and temporal features, providing highly
granular abstract representation of the raw sensor data. This
helps to reduce the size of the data that needs to be
communicated from the sensor nodes to the gateways or high-level
processing components. We then discuss a method that uses
abstract patterns created by the SAX method and occurrences of
different observations in a knowledge-based model to create
perceptions from sensor data.
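As an illustration of the abstraction step, the following is a minimal sketch of Symbolic Aggregate Approximation in its commonly described form (z-normalisation, piecewise aggregate approximation, then mapping to symbols using equiprobable Gaussian breakpoints). The function name `sax_word`, the four-letter alphabet and the sample readings are assumptions made for this example and are not taken from the paper.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def sax_word(series, segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length `segments`."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")

    # 1. Z-normalise so that Gaussian breakpoints are meaningful.
    mu, sigma = mean(series), pstdev(series)
    normalised = [(x - mu) / sigma if sigma else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-sized segment.
    size = len(series) // segments
    paa = [mean(normalised[i * size:(i + 1) * size]) for i in range(segments)]

    # 3. Breakpoints that cut the standard normal into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the symbol of the region it falls into.
    return "".join(alphabet[bisect_left(breakpoints, value)] for value in paa)


print(sax_word([1, 1, 2, 2, 8, 8, 9, 9], segments=4))  # aadd
```

Eight raw readings are reduced to a four-character word, which is the kind of size reduction the abstract refers to; linking such words to semantic descriptions is outside the scope of this sketch.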
led to the production of huge volumes of real-world streaming
data. We need effective techniques to process IoT data streams
and to gain insights and actionable information from real-world
observations and measurements. Most existing approaches
are application or domain dependent. We propose a method
which determines how many different clusters can be found
in a stream based on the data distribution. After selecting the
number of clusters, we use an online clustering mechanism
to cluster the incoming data from the streams. Our approach
remains adaptive to drifts by adjusting itself as the data changes.
We benchmark our approach against state-of-the-art stream
clustering algorithms on data streams with data drift. We show
how our method can be applied in a use case scenario involving
near real-time traffic data. Our results allow IoT data streams to be
clustered, labelled and interpreted dynamically according to the data
distribution. This enables large volumes of dynamic data to be
processed adaptively online based on the current situation. We show
how our method adapts itself to the changes. We demonstrate
how the number of clusters in a real-world data stream can be
determined by analysing the data distributions.
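The abstract does not spell out how the number of clusters is derived from the data distribution, so the sketch below makes its own assumptions: it counts local maxima in a fixed-width histogram of a warm-up window to choose the number of clusters, then assigns stream values to the nearest centroid and nudges that centroid with a constant learning rate so it can follow drift. The names `histogram_modes` and `OnlineClusterer`, the bin count and the learning rate are all hypothetical, and this is not presented as the authors' algorithm.

```python
def histogram_modes(values, bins=10):
    """Return the centres of local maxima in a fixed-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1

    modes = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < bins - 1 else 0
        # Simple peak rule: strict on the left so a flat top counts once.
        # Noisy data would need smoothing before this test.
        if c > left and c >= right:
            modes.append(lo + (i + 0.5) * width)
    return modes


class OnlineClusterer:
    """Nearest-centroid clustering with a constant learning rate."""

    def __init__(self, centroids, rate=0.1):
        self.centroids = list(centroids)
        self.rate = rate

    def update(self, x):
        # Assign x to the closest centroid, then move that centroid towards x.
        j = min(range(len(self.centroids)),
                key=lambda i: abs(x - self.centroids[i]))
        self.centroids[j] += self.rate * (x - self.centroids[j])
        return j


warmup = [1.0, 1.1, 0.9, 1.2, 0.8, 9.0, 9.1, 8.9, 9.2, 8.8]
clusterer = OnlineClusterer(histogram_modes(warmup))
labels = [clusterer.update(x) for x in warmup]
print(len(clusterer.centroids), labels)
```

A constant learning rate is used instead of a running mean because it keeps weighting recent values, which is one simple way to stay responsive when the stream drifts.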
by smart objects that are directly related to the physical world. A structured, machine-processible approach to provision such
real-world services is needed to make heterogeneous physical objects accessible on a large scale and to integrate them with the
digital world. The incorporation of observation and measurement data obtained from the physical objects with the Web data, using
information processing and knowledge engineering methods, enables the construction of "intelligent and interconnected things".
The current research mostly focuses on the communication and networking aspects between the devices that are used for sensing
and measurement of the real world objects. There is, however, relatively less effort concentrated on creating dynamic infrastructures
to support integration of the data into the Web and provide unified access to such data on service and application levels. This
paper presents a semantic modelling and linked data approach to create an information framework for IoT. The paper describes
a platform to publish instances of the IoT related resources and entities and to link them to existing resources on the Web. The
developed platform supports publication of extensible and interoperable descriptions in the form of linked data.
information search and retrieval methods, and facilitate
information acquisition, processing, storage and retrieval
on the semantic web. The past ten years have seen a number
of implemented semantic search systems and various proposed
frameworks. A comprehensive survey is needed to gain
an overall view of current research trends in this field. We
have investigated a number of pilot projects and corresponding
practical systems focusing on their objectives, methodologies
and most distinctive characteristics. In this paper, we report
our study and findings based on which a generalised semantic
search framework is formalised. Further, we describe issues
with regard to future research in this area.
to construct probabilistic models for service clustering. We discuss
how service descriptions can be enriched with machine-interpretable
semantics and then we investigate how these service descriptions can be
grouped in clusters in order to make discovery, ranking, and recommendation
faster and more effective. We propose using Probabilistic Latent
Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) (i.e.
two machine learning techniques used in Information Retrieval) to learn
latent factors from the corpus of service descriptions and group services
according to their latent factors. By creating an intermediate layer of
latent factors between the services and their descriptions, the dimensionality
of the model is reduced and services can be searched and linked
together based on probabilistic methods in latent space. The model can
cluster any newly added service with a direct calculation without needing
to recalculate the latent variables or retrain the model.
sensor data and link them to existing resources on the semantic Web. The
linked sensor data platform, called Sense2Web, supports flexible and interoperable
descriptions and provides association of different sensor data
ontologies to resources described on the semantic Web and the Web of
data. The current advancements in (wireless) sensor networks and being
able to manufacture low cost and energy efficient hardware for sensors
have led to much interest in integrating physical world data into the Web.
Wireless sensor networks employ various types of hardware and software
components to observe and measure physical phenomena and make the
obtained data available through different networking services. Applications
and users are typically interested in querying various events and
requesting measurement and observation data from the physical world.
Using a linked data approach enables data consumers to access sensor
data and query the data and relations to obtain information and/or integrate
data from various sources. Global access to sensor data can provide
a wide range of applications in different domains such as geographical
information systems, healthcare, smart homes, and business applications
and scenarios. In this paper we focus on publishing linked-data to annotate
sensors and link them to other existing resources on the Web.
concept extraction and relation learning. The authors of this chapter describe a novel approach
to learn relations automatically from unstructured text corpus based on probabilistic topic models. The
authors provide a definition (Information Theory Principle for Concept Relationship) and a quantitative
measure for establishing "broader" (or "narrower") and "related" relations between concepts. They
present a relation learning algorithm to automatically interconnect concepts into concept hierarchies
and terminological ontologies with the probabilistic topic models learned. In this experiment, around
7,000 ontology statements expressed in terms of "broader" and "related" relations are generated using
different combinations of model parameters. The ontology statements are evaluated by domain experts
and the results show that the highest precision of the learned ontologies is around 86.6% and structures
of learned ontologies remain stable when values of the parameters are changed in the ontology learning
online prediction algorithm for processing Internet of
Things (IoT) time-series data. This is achieved by
first proposing a new data aggregation and data-driven
discretisation method that does not require data
segment normalisation. We apply a dictionary based
algorithm in order to identify patterns of interest along
with prediction of the next pattern. The performance
of the proposed method is evaluated using synthetic
and real-world datasets. The evaluation results show
that our system is able to identify the patterns with up to
85% accuracy, which is 16.5% higher than a baseline
using the Symbolic Aggregate Approximation (SAX)
of a wireless monitoring system for use within a pediatric
environment. The current wired methods used to provide noninvasive
sensing are not best suited to their end user, and there is
a development need for platform independent data transmission.
The main goal has been to develop a practical and flexible
proof-of-concept prototype suitable for the transmission of sensor
data. This prototype consists of an Arduino based multi-input
sensor system with wireless transmission, and an Android
monitoring station with the facility to rebroadcast the collected
data via email/web as a data file. This was achieved using
commercially available hardware platforms. The software
produced for the Android device allows for full control of the
functionality provided by the sensor platform developed on the
Arduino system, as well as storing the data within a relational
database. The data can also be graphically represented in real time
on the Android device.
Recent advancements in sensing, networking technologies
and collecting real-world data on a large scale and from various environments
have created an opportunity for new forms of real-world services
and applications. This is known under the umbrella term of the Internet
of Things (IoT). Physical sensor devices constantly produce very large
amounts of data. Methods are needed which give the raw sensor measurements
a meaningful interpretation for building automated decision
support systems. To extract actionable information from real-world data,
we propose a method that uncovers hidden structures and relations
between multiple IoT data streams. Our novel solution uses Latent
Dirichlet Allocation (LDA), a topic extraction method that is generally
used in text analysis. We apply LDA on meaningful abstractions that
describe the numerical data in human understandable terms. We use
Symbolic Aggregate approXimation (SAX) to convert the raw data into
string-based patterns and create higher level abstractions based on
We finally investigate how heterogeneous sensory data from multiple
sources can be processed and analysed to create near real-time intelligence
and how our proposed method provides an efficient way to
interpret patterns in the data streams. The proposed method uncovers
the correlations and associations between different patterns in IoT data
streams. The evaluation results show that the proposed solution is able
to identify the correlation with high efficiency with an F-measure up to
has developed ontologies to describe concepts and relationships
between different entities in various application domains,
including Internet of Things (IoT) applications. A key problem
is that most of the IoT related semantic descriptions are not
as widely adopted as expected. One of the main concerns
of users and developers is that semantic techniques increase
the complexity and processing time and therefore they are
unsuitable for dynamic and responsive environments such as
the IoT. To address this concern, we propose IoT-Lite, an
instantiation of the semantic sensor network (SSN) ontology
to describe key IoT concepts allowing interoperability and
discovery of sensory data in heterogeneous IoT platforms using
lightweight semantics. We propose 10 rules for good and
scalable semantic model design and follow them to create
IoT-Lite. We also demonstrate the scalability of IoT-Lite by
providing some experimental analysis, and assess IoT-Lite
against another solution in terms of round trip time (RTT)
performance for query-response times.
ongoing trial that is being conducted in the UK, called Technology
Integrated Health Management (TIHM). TIHM uses Internet of
Things (IoT) enabled solutions provided by various companies
in a collaborative project. The IoT devices and solutions are
integrated in a common platform that supports interoperable
and open standards. A set of machine learning and data analytics
algorithms generate notifications regarding the well-being of the
patients. The information is monitored around the clock by a
group of healthcare practitioners who take appropriate decisions
according to the collected data and generated notifications. In
this paper we discuss the design principles and the lessons that
we have learned by co-designing this system with patients, their
carers, clinicians, and also our industry partners. We discuss
the technical design of TIHM and explain why user-centred and
human-experience should be an integral part of the technological
Patients with Distinct Symptom Experiences, Journal of Pain and Symptom Management 55 (2) pp. 318-333 Elsevier
Risk profiling of oncology patients based on their symptom experience assists
clinicians to provide more personalized symptom management interventions. Recent findings
suggest that oncology patients with distinct symptom profiles can be identified using a variety of
analytic methods.
To evaluate the concordance between the number and types of subgroups of
patients with distinct symptom profiles using latent class analysis (LCA) and K-modes analysis.
Using data on the occurrence of 25 symptoms from the Memorial Symptom
Assessment Scale (MSAS), that 1329 patients completed prior to their next dose of
chemotherapy (CTX), Cohen's kappa coefficient was used to evaluate for concordance between
the two analytic methods. For both LCA and K-modes, differences among the subgroups in
demographic, clinical, and symptom characteristics, as well as quality of life outcomes were
determined using parametric and nonparametric statistics.
Using both analytic methods, four subgroups of patients with distinct symptom profiles
were identified (i.e., All Low, Moderate Physical and Lower Psychological, Moderate Physical
and Higher Psychological, All High). The percent agreement between the two methods was
75.32%, which suggests a moderate level of agreement. In both analyses, patients in the All
High group were significantly younger and had a higher comorbidity profile, worse MSAS
subscale scores, and poorer QOL outcomes.
Both analytic methods can be used to identify subgroups of oncology patients with
distinct symptom profiles. Additional research is needed to determine which analytic methods
and which dimension of the symptom experience provides the most sensitive and specific risk
profiles.
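For readers unfamiliar with the two agreement statistics named in this abstract, the following is a minimal sketch of how percent agreement and Cohen's kappa are computed from two sets of subgroup assignments. The four-patient label lists are made up for illustration and are not the study's data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items that both methods place in the same subgroup."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both methods pick the same label at random.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(a) | set(b)) / n ** 2
    if expected == 1.0:
        return 1.0  # both labelings use a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical assignments from two clustering methods for four patients.
method_one = ["low", "low", "high", "high"]
method_two = ["low", "high", "high", "high"]
print(percent_agreement(method_one, method_two))  # 0.75
print(cohens_kappa(method_one, method_two))       # 0.5
```

Kappa is lower than the raw percent agreement because some of the matches would be expected by chance alone.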
analysis is the segmentation of time-series data to identify
activities of interest. In this work, we analyse the performance
of univariate and multi-sensor Bayesian change detection
algorithms in segmenting accelerometer data. In particular, we
provide theoretical analysis and also performance evaluation on
synthetic data and real-world data. The results illustrate the
advantages of using multi-sensory variance change detection in
the segmentation of dynamic data (e.g. accelerometer data).
Machine learning has been used to accurately recognise physical activity patterns; however, classifiers for recognising targeted bone loading exercises have not been developed.
The purpose of this study was to determine the accuracy of machine learning models for classifying the intensity of exercises necessary for bone adaptation in older adults.
Triaxial accelerometer data was collected from forty-four older participants (60-70 yrs) wearing a GCDC X16-1C accelerometer on their hip during three aerobics classes consisting of impact aerobic exercises performed at high and low intensities. Multi-class support vector machine (M-SVM) classifiers were trained in parallel for activity type detection, where one classifier was trained with low intensity activity samples and the other with high intensity samples. In a multi-view scoring manner, the classification confidence of these two learners was used to predict the activity intensity. Leave-one-out cross-validation was used for assessment.
Overall recognition accuracy of the M-SVM classifier for detecting exercise intensity was 73%. For the three aerobics classes, the M-SVM classifier recognised exercise intensity with accuracies of 82%, 73% and 65%.
Machine learning techniques such as M-SVM accurately recognised the intensity of bone-promoting exercises from triaxial accelerometer data in community-dwelling older adults. First results of the developed classifier demonstrate significant potential of machine learning models for the evaluation of exercise adherence and performance in older adults.
have allowed the emergence of Internet-connected sensory devices that provide
observation and data measurement from the physical world. By 2020, it is
estimated that the total number of Internet-connected devices being used will
be between 25 and 50 billion. As the numbers grow and technologies become more
mature, the volume of data published will increase. Internet-connected device
technology, referred to as the Internet of Things (IoT), continues to extend the
current Internet by providing connectivity and interaction between the physical
and cyber worlds. In addition to increased volume, the IoT generates Big Data
characterized by velocity in terms of time and location dependency, with a
variety of multiple modalities and varying data quality. Intelligent processing
and analysis of this Big Data is the key to developing smart IoT applications.
This article assesses the different machine learning methods that deal with the
challenges in IoT data by considering smart cities as the main use case. The
key contribution of this study is presentation of a taxonomy of machine learning
algorithms explaining how different techniques are applied to the data in order
to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying
Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented
for a more detailed exploration.
world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies
that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous
amounts of dynamic IoT data are collected from Internet-connected devices. IoT data is usually multi-variant
streams that are heterogeneous, sporadic, multi-modal and spatio-temporal. IoT data can be disseminated
with different granularities and have diverse structures, types and qualities. Dealing with the data deluge
from heterogeneous IoT resources and services imposes new challenges on indexing, discovery and ranking
mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data.
However, the existing IoT data indexing and discovery approaches are complex or centralised which hinders
their scalability. The primary objective of this paper is to provide a holistic overview of the state-of-the-art on
indexing, discovery and ranking of IoT data. The paper aims to pave the way for researchers to design, develop,
implement and evaluate techniques and approaches for on-line large-scale distributed IoT applications and
data with respect to a statistical parameter of interest in
Bayesian models. It is common to assume that the parameters
are distinct within each segment. As such, many Bayesian
change point detection models do not exploit the segment parameter
patterns, which can improve performance. This work
proposes a Bayesian mean-shift change point detection algorithm
that makes use of repetition in segment parameters, by
introducing segment class labels that utilise a Dirichlet process
prior. The performance of the proposed approach was
assessed on both synthetic and real world data, highlighting
the enhanced performance when using parameter labelling.
measurement data are collected from sensors in Wireless
Sensor Networks (WSNs) for the Internet of Things (IoT)
applications such as environmental monitoring. However, continuous
transmission of the sensed data requires high energy
consumption. Data transmission between sensor nodes and
cluster heads (sink nodes) consumes much higher energy than
data sensing in WSNs. One way of reducing such energy
consumption is to minimise the number of data transmissions.
In this paper, we propose an Adaptive Method for Data Reduction
(AM-DR). Our method is based on a convex combination
of two decoupled Least-Mean-Square (LMS) windowed filters
with differing sizes for estimating the next measured values
both at the source and the sink node such that sensor nodes
have to transmit only their immediate sensed values that
deviate significantly (with a pre-defined threshold) from the
predicted values. Experiments conducted on real-world
data show that our approach has been able to achieve up to
95% communication reduction while retaining a high accuracy
(i.e. predicted values have a deviation of ±0.5 from real data
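The dual-prediction idea described above can be sketched as follows. This is a deliberate simplification: it uses a single normalised LMS filter rather than the convex combination of two LMS filters that AM-DR is based on, and the function name `dual_prediction`, the filter order, the step size and the threshold value are assumptions chosen for illustration.

```python
def dual_prediction(readings, order=3, step=0.5, threshold=0.5):
    """Simulate a sensor and a sink that share one NLMS predictor.

    Returns the series reconstructed at the sink and the number of
    readings that had to be transmitted.
    """
    weights = [0.0] * order
    window = list(readings[:order])        # start-up readings are always sent
    reconstructed = list(window)
    transmitted = len(window)

    for actual in readings[order:]:
        predicted = sum(w * x for w, x in zip(weights, window))
        error = actual - predicted
        if abs(error) > threshold:
            # Prediction too far off: send the reading, and let both sides
            # apply the same normalised LMS weight update.
            energy = sum(x * x for x in window) + 1e-9
            weights = [w + step * error * x / energy
                       for w, x in zip(weights, window)]
            value = actual
            transmitted += 1
        else:
            # Prediction is good enough: nothing is sent and the sink
            # keeps its own prediction.
            value = predicted
        reconstructed.append(value)
        window = window[1:] + [value]
    return reconstructed, transmitted


readings = [20.0] * 50
reconstructed, transmitted = dual_prediction(readings)
print(transmitted, "of", len(readings), "readings transmitted")
```

Because the source and the sink update the filter only from values that both of them know, their predictions stay identical, and every reconstructed value is either the real reading or within the threshold of it.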
reduce the observations in the time series analysis domain. The IoT time series require aggregation methods that can preserve and
represent the key characteristics of the data. In this paper, we propose a segmentation algorithm that adapts to unannounced
mutations of the data (i.e. data drifts). The algorithm splits the data streams into blocks and groups them in square matrices, computes
the Discrete Cosine Transform (DCT) and quantizes them. The key information is contained in the upper-left part of the resulting
matrix. We extract this sub-matrix, compute the modulus of its eigenvalues and remove duplicates. The algorithm, called BEATS, is
designed to tackle dynamic IoT streams, whose distribution changes over time. We implement experiments with six datasets combining
real, synthetic, real-world data, and data with drifts. Compared to other segmentation methods like Symbolic Aggregate approXimation
(SAX), BEATS shows significant improvements. When used with classification and clustering algorithms, it provides efficient results. BEATS
is an effective mechanism to work with dynamic and multi-variate data, making it suitable for IoT data sources. The datasets, code of
the algorithm and the analysis results can be accessed publicly at: https://github.com/auroragonzalez/BEATS.
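The block-level steps described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation (which is available at the repository linked above): the 4x4 block size, the uniform quantisation step and the 2x2 upper-left sub-matrix are assumptions chosen so that the eigenvalues have a closed form, and the function names are hypothetical.

```python
import cmath
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def block_features(block, n=4, step=1.0):
    """Reduce one block of n*n samples to the eigenvalue moduli of the
    upper-left 2x2 part of its quantised 2-D DCT (sizes are assumptions)."""
    if len(block) != n * n:
        raise ValueError("block must contain n*n samples")
    square = [block[r * n:(r + 1) * n] for r in range(n)]
    c = dct_matrix(n)
    c_t = [list(col) for col in zip(*c)]
    coeffs = matmul(matmul(c, square), c_t)           # 2-D DCT
    quantised = [[round(v / step) for v in row] for row in coeffs]
    # Upper-left 2x2 sub-matrix holds the low-frequency information.
    a, b = quantised[0][0], quantised[0][1]
    d, e = quantised[1][0], quantised[1][1]
    trace, det = a + e, a * e - b * d
    root = cmath.sqrt(trace * trace - 4 * det)         # may be complex
    moduli = [round(abs((trace + root) / 2), 6),
              round(abs((trace - root) / 2), 6)]
    return list(dict.fromkeys(moduli))                 # drop duplicates
```

For a constant block only the DC coefficient survives quantisation, so `block_features([2.0] * 16)` returns `[8.0, 0.0]`; an all-zero block collapses to the single value `[0.0]` once duplicates are removed.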
planners know where and when they should be aware of the
repercussions regarding events happening in different parts of the
city. Most of the smart city data analysis solutions are focused on
the events and occurrences of the city as a whole, making it difficult
to discern the exact place and time of the consequences of a particular
event. We propose a novel method to model the events in a city
in space and time. We apply our methodology to vehicular traffic
data, basing our models on (convolutional) neural networks.
Source Lookup, IEEE Internet of Things Journal 5 (3) pp. 2037-2054 IEEE
to provide an adequate scalability. This is due to the high computational complexity and communication overhead that is required to
create and maintain the indices of the IoT sources particularly when their attributes are dynamic. This paper presents a novel approach
for indexing distributed IoT sources and paves the way to design a data discovery service to search and gain access to their data. The
proposed method creates concise references to IoT sources by using Gaussian Mixture Models (GMM). Furthermore, a summary update
mechanism is introduced to tackle the change of sources availability and mitigate the overhead of updating the indices frequently. The
proposed approach is benchmarked against a standard centralized indexing and discovery solution. The results show that the proposed
solution reduces the communication overhead required for indexing by three orders of magnitude while, depending on the IoT network
architecture, it may slightly increase the discovery time
innovation potential for developing smarter applications and
services. However, today we see solutions in the development of
vertical applications and services reflecting what used to be the
early days of the Web, leading to fragmentation and intra-nets of
Things. To achieve an open IoT ecosystem of systems and
platforms, several key enablers are needed for effective, adaptive
and scalable mechanisms for exploring and discovering IoT
resources and their data/capabilities. This paper discusses our
work in the EU H2020 IoTCrawler project. Its focus is on the
integration and interoperability across different platforms,
through dynamic and reconfigurable solutions for discovery and
integration of data and services from legacy and new systems. This
is complemented with adaptive, privacy-aware and secure
solutions for crawling, indexing and searching in distributed IoT
systems. IoTCrawler targets IoT development and demonstrations
with a focus on Industry 4.0, Social IoT, Smart City and Smart
Energy use cases.
have created an opportunity for new forms of services and applications. This is known under the umbrella term of the Internet of
Things (IoT). Physical sensor devices constantly produce very large amounts of data. Methods are needed which give the raw sensor measurements a meaningful interpretation for building automated decision support systems. One of the main research challenges in this domain is to extract actionable information from real-world data, that is information that can readily be used to make informed automatic
decisions in intelligent systems. Most existing approaches are application or domain dependent or are only able to deal with specific data
sources of one kind. This PhD research concerns multiple approaches for analysing IoT data streams. We propose a method which determines how many different clusters can be found in a stream based on the data distribution. After selecting the number of clusters, we use an online clustering mechanism to cluster the incoming data from the streams. Our approach remains adaptive to drifts by adjusting itself as the data changes. The work is benchmarked against state-of-the-art stream clustering algorithms on data streams with data drift. We show how our method can be applied in a use case scenario involving near real-time traffic data. Our results make it possible to cluster, label and interpret IoT data streams dynamically according to the data distribution. This enables large volumes of dynamic data to be processed adaptively online based on the current situation. We show how our method adapts itself to the changes and we demonstrate how the number of clusters in a real-world data stream can be determined by analysing the data distributions.
Using the ideas and concepts of this approach as a starting point we designed another novel dynamic and adaptable clustering approach
that is more suitable for multi-variate time-series data clustering. Our solution uses probability distributions and analytical methods to adjust the centroids as the data and feature distributions change over time. We have evaluated our work against some well-known time-series clustering methods and have shown how the proposed method can reduce the complexity and perform efficiently on multi-variate data streams.
Finally we propose a method that uncovers hidden structures and relations between multiple IoT data streams. Our novel solution uses Latent Dirichlet Allocation (LDA), a topic extraction method that is generally used in text analysis. We apply LDA on meaningful labels that describe the numerical data in human understandable terms. To create the labels we use Symbolic Aggregate approXimation (SAX), a method that converts raw data into string-based patterns. The extracted patterns are then transformed with a rule engine into the labels.
The work investigates how heterogeneous sensory data from multiple sources can be processed and analysed to create near real-time intelligence and how our proposed method provides an efficient way to interpret patterns in the data streams. The proposed method provides a novel way to uncover the correlations and associations between different patterns in IoT data streams. The evaluation results show that the proposed solution is able to identify the correlation with high efficiency, with an F-measure of up to 90%.
Overall, this PhD research has designed, implemented and evaluated unsupervised adaptive algorithms to analyse, structure and extract information from dynamic and multi-variate sensory data streams. The results of this research have a significant impact on designing flexible and scalable solutions for analysing real-world sensory data streams, especially in cases where labelled and annotated data is not available or is too costly to collect. Research and advancements in healthcare and smarter cities are two key areas that can directly fr
a rapidly growing digital economy imposes on current applications and information systems. Smart city applications enable city
authorities to monitor, manage and provide plans for public resources and infrastructures in city environments, while enabling citizens
and businesses to develop and use intelligent services in cities. However, providing such smart city applications gives rise to several
issues such as semantic heterogeneity and trustworthiness of data sources, and extracting up-to-date information in real time from
large-scale dynamic data streams. In order to address these issues, we propose a novel framework with an efficient semantic data
processing pipeline, allowing for real-time observation of the pulse of a city. The proposed framework enables efficient semantic
integration of data streams and complex event processing on top of real-time data aggregation and quality analysis in a Semantic Web
environment. To evaluate our system, we use real-time sensor observations that have been published via an open platform called Open
Data Aarhus by the City of Aarhus. We examine the framework utilising Symbolic Aggregate Approximation to reduce the size of data
streams, and perform quality analysis taking into account both single and multiple data streams. We also investigate the optimisation of
the semantic data discovery and integration based on the proposed stream quality analysis and data aggregation techniques.
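As an illustration of how Symbolic Aggregate Approximation shrinks a stream, the sketch below follows the textbook SAX steps (z-normalisation, piecewise aggregate approximation, then mapping segment means to symbols using equiprobable breakpoints of the standard normal distribution). It is not taken from the framework described above; the function name, the default alphabet and the requirement that the series length be a multiple of the segment count are assumptions made for brevity.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abcd"):
    """Convert a numeric series to a SAX word with one symbol per segment."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a flat series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    width = len(z) // segments
    # piecewise aggregate approximation: one mean per segment
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(segments)]
    # breakpoints split the standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


word = sax([1, 1, 1, 1, 10, 10, 10, 10], segments=2, alphabet="ab")  # "ab"
```

Here eight readings are reduced to a two-symbol word; the number of segments and the alphabet size control the trade-off between compression and fidelity.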
Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery and ranking mechanisms. Novel indexing and discovery methods will enable developing applications that use on-line access and retrieval of ad-hoc IoT data.
Investigation of the related work leads to the conclusion that there has been significant work on processing and analysing sensor data streams. However, there is still a need for integrated solutions that cover the workflow from connecting IoT resources to making their published data indexable, searchable and discoverable.
This research proposes a set of novel solutions for indexing, processing and discovery in IoT networks. The work proposes novel distributed in-network and spatial indexing solutions. The proposed solutions scale well and provide up to 92% better response time and higher success rates in response to data search queries compared to a baseline approach.
A co-operative, adaptive, change detection algorithm has also been developed. It is based on a convex combination of two decoupled Least Mean Square (LMS) windowed filters. The approach provides better performance and less complexity compared to the state-of-the-art solutions. The change detection algorithm can also be applied to distributed networks in an on-line fashion. This co-operative approach allows publish/subscribe based and change based discovery solutions in IoT.
Continuous transmission of large volumes of data collected by sensor nodes induces a high communication cost
for each individual node in IoT networks. An Adaptive Method for Data Reduction (AM-DR) has been proposed for reducing the number of data transmissions in IoT networks. In AM-DR, identical predictive models are constructed at both the sensor and the sink nodes to describe data evolution, such that sensor nodes need to transmit only those readings that deviate significantly from the predicted values. This has a significant impact on reducing the data load in IoT data discovery scenarios.
Finally, a solution for quality and energy-aware resource discovery and accessing IoT resources has been proposed. The solution effectively achieves a communication reduction while retaining a high prediction accuracy (i.e. only a deviation of ±1.0 degree between actual and predicted sensor readings). Furthermore, an energy cost model has been discussed to demonstrate how the proposed approach reduces energy consumption significantly and effectively prolongs the network lifetime.
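The dual-prediction idea behind AM-DR (identical models at the sensor and the sink, with a transmission only when a reading deviates from the prediction by more than a threshold) can be sketched as follows. This is a simplified illustration under stated assumptions: a moving-average predictor stands in for the combined LMS filters, and the class name, function name and threshold value are hypothetical.

```python
from collections import deque


class MeanPredictor:
    """Stand-in predictor (assumption): mean of the last `size` values."""

    def __init__(self, size=3):
        self.history = deque(maxlen=size)

    def predict(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

    def update(self, value):
        self.history.append(value)


def simulate(readings, threshold=0.5):
    """Return the sink-side reconstruction and the number of transmissions."""
    sensor_model, sink_model = MeanPredictor(), MeanPredictor()
    reconstructed, sent = [], 0
    for reading in readings:
        predicted = sensor_model.predict()
        if abs(reading - predicted) > threshold:
            sent += 1                          # deviation too large: transmit
            sink_value = reading
        else:
            sink_value = sink_model.predict()  # sink relies on its own model
        # Both models are updated with the same value so they stay identical.
        sensor_model.update(sink_value)
        sink_model.update(sink_value)
        reconstructed.append(sink_value)
    return reconstructed, sent


readings = [20.0] * 10 + [25.0] * 10
reconstructed, sent = simulate(readings)   # sent == 4 out of 20 readings
```

Because the sensor updates its model with the value the sink will hold, rather than with the raw reading, the two models never drift apart, and every reconstructed value stays within the threshold of the true reading.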
sequential changes in streaming data obtained from sensors
in Wireless Sensor Networks (WSNs) for Internet of Things
(IoT) applications such as fire/fault detection, activity recognition
and environmental monitoring. Such applications require (near)
online detection of instantaneous changes. This paper proposes
an Online, adaptive Filtering-based Change Detection (OFCD)
algorithm. Our method is based on a convex combination of
two decoupled Least Mean Square (LMS) windowed filters with
differing sizes. Both filters are applied independently on data
streams obtained from sensor nodes such that their convex combination
parameter is employed as an indicator of abrupt changes
in mean values. An extension of our method (OFCD) based
on a Cooperative scheme between multiple sensors (COFCD) is
also presented. It provides an enhancement of both convergence
and steady-state accuracy of the convex weight parameter. Our
experiments show that our approach can be applied in
distributed networks in an online fashion. It also provides better
performance and less complexity compared with the state-of-the-art
on both single and multiple sensors.
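A convex combination of a short-window and a long-window LMS filter can be sketched as below. The sketch uses the standard sigmoid-parameterised mixing weight from the adaptive-filtering literature and a normalised LMS update for stability; the window sizes, step sizes, clamping range and all names are assumptions, and the code is not the OFCD algorithm itself. The mixing weight `lam` moves towards the fast filter when the signal changes abruptly, which is the behaviour the abstract uses as a change indicator.

```python
import math


class WindowedLMS:
    """Normalised LMS filter predicting the next sample from a window."""

    def __init__(self, size, step):
        self.size = size
        self.step = step
        self.weights = [0.0] * size

    def predict(self, window):
        return sum(w * x for w, x in zip(self.weights, window))

    def update(self, window, target):
        error = target - self.predict(window)
        norm = sum(x * x for x in window) + 1e-8   # avoid division by zero
        self.weights = [w + self.step * error * x / norm
                        for w, x in zip(self.weights, window)]


def combined_predictions(series, short=4, long=16, mix_step=0.5):
    """Return combined predictions and the mixing weight at each step."""
    fast = WindowedLMS(short, step=0.5)    # reacts quickly, noisier
    slow = WindowedLMS(long, step=0.05)    # reacts slowly, smoother
    a = 0.0                                # lam = sigmoid(a) stays in (0, 1)
    preds, lambdas = [], []
    for t in range(long, len(series)):
        w_fast, w_slow = series[t - short:t], series[t - long:t]
        y_fast, y_slow = fast.predict(w_fast), slow.predict(w_slow)
        lam = 1.0 / (1.0 + math.exp(-a))
        y = lam * y_fast + (1.0 - lam) * y_slow
        error = series[t] - y
        # Gradient step on the squared error of the combined output.
        a += mix_step * error * (y_fast - y_slow) * lam * (1.0 - lam)
        a = max(-4.0, min(4.0, a))         # keep the sigmoid from saturating
        fast.update(w_fast, series[t])
        slow.update(w_slow, series[t])
        preds.append(y)
        lambdas.append(lam)
    return preds, lambdas
```

Calling `combined_predictions` on a stream with a step change and inspecting `lambdas` shows how the weight shifts between the two filters; a detection threshold on that weight would be a further design choice not covered here.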
in an Internet-like structure to be managed and inventoried by computers. Radio-frequency
identification (RFID) - a prerequisite for the IoT - is an automatic way for data transaction in
object identification and is used to improve automation, inventory control and checkout
operations. An RFID system consists of a reader device and one or several tags. Smart reader
systems are building blocks for cutting-edge applications of RFID and, as a subdivision of these
systems, RFID smart shelf solutions are starting to be implemented for large-scale item-level
management, where the characteristics of reader antennas are a critical issue.
This work focuses on designing optimised reader antennas for high frequency (HF) RFID
smart shelf systems which operate based on inductive coupling between the tag and the reader
antennas and have good performance in crowded environments. Firstly, an approach is presented
to increase band-width of HF RFID reader antennas to improve the reception of sub-carrier
frequencies. A fabricated enhanced band-width antenna at 13.56 MHz is evaluated for its
capability in being used for smart shelf applications. The obtained band-width allows the sub-carrier
frequencies of all the HF RFID standards to be detected more easily and thus leads to an increased
identification range. It is shown that HF RFID technology is capable of identifying the distance of
tagged books based on the received magnetic field intensity.
Secondly, multi-turn small self-resonant coil (MT SSRC) antennas are introduced and analysed
as a new model of inductively coupled reader antennas. Based on the analysis, two-turn planar
SSRC (TTP SSRC) antennas with dimensions similar to those of current HF RFID reader antennas
are investigated. A fabricated TTP SSRC antenna operating at 13.56 MHz yields an optimised Q
factor and a more uniform near-field pattern in comparison with similar antennas. Also, a
number of TTP SSRC antennas operating at a distinct frequency, 13.56 MHz, are fabricated on
different substrates, and it is shown that the desired Q factor and antenna dimensions can be obtained
based on the dielectric characteristics of the substrate.
The environment is observed by a program hosted on a sensor node. Machine learning and data mining techniques are embedded in the program to learn from the environment and detect events. Collaborative sensing is a technique for processing an event using distributed nodes, which can improve the accuracy of deciding whether an observation is a fault or an event.
This research studied processing sensor data to detect events using multiple sensor nodes. A model and/or rules are defined, and an outlier is detected by matching sensor data against the model and/or rules. The outlier is then analysed and processed to detect an event.
The main contributions of this work have been on collaborative sensing across different sensors, including clustering analysis for data labelling and classification analysis for processing outliers for event detection.
in both their severity and distress. Recent advances in Network Analysis (NA) provide a novel approach to gain insights into
the complex nature of co-occurring symptoms and symptom clusters and identify core symptoms. We present findings from the
first study that used NA to examine the relationships among 38 common symptoms in a large sample of oncology patients
undergoing chemotherapy. Using two different models of Pairwise Markov Random Fields (PMRF), we examined the nature
and structure of interactions for three different dimensions of patients' symptom experience (i.e., occurrence, severity, distress).
Findings from this study provide the first direct evidence that the connections between and among symptoms differ depending
on the symptom dimension used to create the network. Based on an evaluation of the centrality indices, nausea appears
to be a structurally important node in all three networks. Our findings can be used to guide the development of symptom
management interventions based on the identification of core symptoms and symptom clusters within a network.
Computational tools that predict the course and severity of these symptoms have the
potential to assist oncology clinicians to personalize the patient's treatment regimen
more efficiently and provide more aggressive and timely interventions. Three common
and inter-related symptoms in cancer patients are depression, anxiety, and sleep
disturbance. In this paper, we elaborate on the efficiency of Support Vector Regression
(SVR) and Non-linear Canonical Correlation Analysis by Neural Networks (n-CCA) to
predict the severity of the aforementioned symptoms between two different time points
during a cycle of chemotherapy (CTX). Our results demonstrate that these two
methods produced equivalent results for all three symptoms. These types of predictive
models can be used to identify high risk patients, educate patients about their symptom
experience, and improve the timing of pre-emptive and personalized symptom
infection and analysing daily living activities in people with dementia, PLOS One 14 (1) e0209909 pp. 1-22 PLOS
data are collected from sensory devices in the Internet
of Things (IoT) networks. IoT data is often generated
in highly distributed and dynamic environments. Continuous
transmission of large volumes of data collected between sensor
and head/sink nodes induces a high communication cost for
individual nodes. This results in a significant increase in the
overall energy cost for IoT applications such as environmental
monitoring. Decreasing data transmission between nodes can
effectively reduce energy consumption and prolong the network
lifetime, especially in battery-powered nodes/networks. In this
paper, we describe an Adaptive Method for Data Reduction (AM-DR),
a data reduction approach for reducing the overall data
transmission and communication between sensor nodes in IoT
networks such that fine-grained sensor readings can be used
to reconstruct the original data within a user-defined accuracy
boundary. Evaluation with real-world data shows that AM-DR
achieves a communication reduction of up to 95% in some scenarios
while retaining a high prediction accuracy. To fully achieve the
energy savings enabled by AM-DR, we provide a communication
cost model. The proposed model is also integrated into the
LEACH protocol to demonstrate how our proposed approach
reduces energy consumption and effectively prolongs the network
efficiency of IoT systems by performing some of the analysis
and operations on the nodes or on intermediary edge devices.
This will reduce the energy consumption, data transmission
load and latency by shifting some of the processes to the edge
devices. In this paper, we introduce a pattern extraction method
which uses both the Lagrangian Multiplier and the Principal
Component Analysis (PCA) to create patterns from raw sensory
data. We have evaluated our method by applying a clustering
method on constructed patterns. The results show that by using
our proposed Lagrangian-based pattern extraction method, the
existing clustering algorithms perform more accurately, by
up to 20% compared with the state-of-the-art methods,
especially in dealing with dynamic real-world data. We have
conducted our evaluations based on synthetic and real-world data
sets and have compared the results to the existing state-of-the-art
approaches. We also discuss how the proposed methods can be
embedded into the edge computing devices in IoT systems and
of Internet of Things (IoT) devices has led to the generation of
large volumes of real world data. Analytical models can be used to
extract meaningful insights from this data. However, most IoT
data is not fully utilised, mainly due to interoperability
issues and the difficulty of analysing data collected by heterogeneous
resources. To overcome this heterogeneity, semantic
technologies are used to create common models to share various
data originating from heterogeneous sources. However, semantics
add further overhead to data delivery, and the processing time
to annotate the data with the model can increase the latency and
complexity in publishing and querying the annotated data. In
this paper, we present a lightweight semantic model to annotate
IoT streams. The metadata descriptions that are provided in the
models are used for search and discovery of the data using various
attributes such as value and type. The proposed model extends
commonly used ontologies such as W3C/OGC SSN ontology
and its recent lightweight core, SOSA, and includes concepts
to describe streaming IoT data. We also show use cases, tools
and applications where the proposed model has been used.
Owing to the growing popularity of mobile phones, network operators increasingly demand solutions for more efficient mobile network resource management. Predicting the future state of the network and allocating network resources based on the predicted state has been proposed by the research community as an effective method for managing network resources efficiently. One of the major factors that changes the future state of the network is change in the behavior of users. As a result, a major task in forecasting the future state of the network is to predict the future behavior of users. This task is accomplished by User Behavior Prediction Models (UBPrMs). In order to maintain the quality of service, such methods are expected to provide sufficiently accurate predictions. However, existing methods are often unable to meet this performance requirement.
The accuracy of a predictive model is affected by two distinct sources of error, namely Modeling Error (ME) and Sampling Error (SaE). As a result, one ought to consider both sources of error when improving the performance of a model. This thesis therefore aims to study and alleviate the impact of these sources of error on the performance of a UBPrM.
To treat the ME, we propose a novel group-level user behaviors prediction framework as a more accurate alternative for population-level user behaviors prediction models and a more computationally efficient alternative for individual-level user behaviors prediction. The novel framework is called Event Profiling Method (EPM). To diminish the impact of ME, the proposed event-based method takes advantage of similarities amongst users' behavior and the existing underlying patterns that repetitively occur in the network.
To evaluate the proposed framework, the EPM method needs to be implemented in real-world scenarios. Video popularity prediction is considered a suitable use case for EPM. For this purpose, this thesis uses the ideas of the EPM framework to propose a novel approach for enhancing video popularity prediction models. Using the proposed approach, we enhance three popularity prediction techniques that outperform the accuracy of the prior state-of-the-art solutions. The major components of the proposed approach are three novel mechanisms for "user grouping", "content classification" and "dominant-follower users identification". The user grouping method is an unsupervised clustering approach that divides the users into an adequate number of user groups with similar interests. The content classification approach identifies the classes of videos with similar early popularity trends. The dominant-follower identification technique divides the users in each group into two distinct subgroups based on their reaction time to the released videos. To predict the popularity of newly released videos, our proposed popularity prediction model trains its parameters in each user group and its associated video popularity classes and user subgroups. Evaluations are performed through a 5-fold cross validation on a dataset containing one month of video request records of 26,706 BBC iPlayer users. Our analysis shows that the accuracy of the proposed solution outperforms the state-of-the-art, including the S-H, ML and MRBF models, on average by 59%, 27% and 21%, respectively.
Afterwards, this thesis proposes a novel combination technique for multi-dimensional user profiles that is able to treat the SaE. In doing so, the proposed technique considers the samples of other users' behavior (or in general, other items) as a biased approximation of each user (or an item). The method utilizes two conditions on the magnitude and sign of the estimated bias between two users to decide on combining their profiles or not. The proposed technique is evaluated against synthesized and real-world datasets. Our results show that the proposed method provides better estimations of the st
proposed approach, we enhance three popularity prediction techniques that outperform the accuracy of the
prior state-of-the-art solutions. The major components of the proposed approach are two novel mechanisms for
"user grouping" and "content classification". The user grouping method is an unsupervised clustering approach
that divides the users into an adequate number of user groups with similar interests. The content classification
approach identifies the classes of videos with similar popularity growth trends. To predict the popularity of
the newly-released videos, our proposed popularity prediction model trains its parameters in each user group
and its associated video popularity classes. Evaluations are performed through a 5-fold cross validation and
on a dataset containing one month of video request records of 26,706 BBC iPlayer users. Using the
proposed grouping technique, user groups of similar interest and up to 2 video popularity classes for each
user group were detected. Our analysis shows that the accuracy of the proposed solution outperforms the
state-of-the-art including SH, ML, MRBF models on average by 45%, 33% and 24%, respectively. Finally, we
discuss how various systems in the network and service management domain such as cache deployment,
advertising and video broadcasting technologies benefit from our findings to illustrate the implications.