Paul Krause

Professor in Complex Systems, Director of Research

Biography

I graduated with First Class Honours in Pure Mathematics and Experimental Physics from Exeter University in 1977, before studying for a PhD in Geophysics under Geraint Rosser at the same university. I then spent spells at a number of major research laboratories: the National Physical Laboratory (1980-87); the University of Surrey (1987-89); the Imperial Cancer Research Fund (1989-1996); and Philips Research Laboratories, UK (1996-2003). For my last two years at PRL I worked part-time at the University of Surrey, before moving there full time at the beginning of 2003.

Since 1987, my research projects have been highly interdisciplinary, but they have always involved the development of AI, Machine Learning and Data Analytics techniques, applied especially to the analysis of complex socio-technical systems.

I am a Fellow of the Institute of Mathematics and Its Applications (also conferring Chartered Mathematician status) and am also Editor (Computing and Software) of the IET's Open Access Journal of Engineering, and an Associate Editor of Soft Computing.

Research interests

  • Digital and Industrial Ecosystems as Complex Adaptive Systems
  • Use of ICT to support sustainable living and social change
  • Formal models of interactive computing
  • Practical applications of Machine Learning

Teaching

COM2039: Parallel Programming

COMM049: HTML5 and CSS3 for Mobile Web Applications

Departmental duties

Director of Research

Chair of Academic Misconduct Panel

Chair of Research Management Committee

Publication highlights

Click here to see my Google Scholar Citation Profile.


Publications

Bryant D, Krause PJ, Vreeswijk GAW (2006) Argue tuProlog: A Lightweight Argumentation Engine for Agent Applications, COMPUTATIONAL MODELS OF ARGUMENT 144 pp. 27-32 I O S PRESS
Krause PJ, Razavi AR, Moschoyiannis S, Marinos A (2009) Stability and Complexity in Digital Ecosystems, 2009 3RD IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 200-205 IEEE
Leppenwell E, de Lusignan S, Vicente MT, Michalakidis G, Krause P, Thompson S, McGilchrist M, Sullivan F, Desombre T, Taweel A, Delaney B (2012) Developing a survey instrument to assess the readiness of primary care data, genetic and disease registries to conduct linked research: TRANSFoRm International Research Readiness (TIRRE) survey instrument., Inform Prim Care 20 (3) pp. 207-216
Clinical data are collected for routine care in family practice; there are also a growing number of genetic and cancer registry data repositories. The Translational Research and Patient Safety in Europe (TRANSFoRm) project seeks to facilitate research using linked data from more than one source. We performed a requirements analysis which identified a wide range of data and business process requirements that need to be met before linking primary care and either genetic or disease registry data.
Xhafa F, Wang J, Chen X, Liu JK, Li J, Krause P (2013) An efficient PHR service system supporting fuzzy keyword search and fine-grained access control, Soft Computing pp. 1-8
Outsourcing of personal health record (PHR) has attracted considerable interest recently. It can not only bring much convenience to patients, it also allows efficient sharing of medical information among researchers. As the medical data in PHR is sensitive, it has to be encrypted before outsourcing. To achieve fine-grained access control over the encrypted PHR data becomes a challenging problem. In this paper, we provide an affirmative solution to this problem. We propose a novel PHR service system which supports efficient searching and fine-grained access control for PHR data in a hybrid cloud environment, where a private cloud is used to assist the user to interact with the public cloud for processing PHR data. In our proposed solution, we make use of attribute-based encryption (ABE) technique to obtain fine-grained access control for PHR data. In order to protect the privacy of PHR owners, our ABE is anonymous. That is, it can hide the access policy information in ciphertexts. Meanwhile, our solution can also allow efficient fuzzy search over PHR data, which can greatly improve the system usability. We also provide security analysis to show that the proposed solution is secure and privacy-preserving. The experimental results demonstrate the efficiency of the proposed scheme. © 2013 Springer-Verlag Berlin Heidelberg.
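The fuzzy keyword search that the abstract mentions is commonly built over wildcard-based fuzzy keyword sets. As a minimal sketch of that general idea (not the paper's actual construction, and omitting the encryption layer entirely), the Python below builds edit-distance-1 wildcard sets and tests two keywords for a fuzzy match; the example words are invented.

```python
# Illustrative sketch only: wildcard-based fuzzy keyword sets for edit
# distance 1, a common way to support fuzzy keyword search over an index.
# The paper's scheme applies this idea over encrypted data; encryption is
# deliberately left out here.

def fuzzy_keyword_set(word: str) -> set:
    """Wildcard set covering all single-character edits of `word`."""
    variants = {word}
    for i in range(len(word)):
        variants.add(word[:i] + "*" + word[i + 1:])   # substitution at i
        variants.add(word[:i] + "*" + word[i:])       # insertion before i
    variants.add(word + "*")                          # insertion at the end
    return variants

def fuzzy_match(query: str, indexed: str) -> bool:
    """Two words match if their wildcard sets intersect (edit distance <= 1)."""
    return bool(fuzzy_keyword_set(query) & fuzzy_keyword_set(indexed))

if __name__ == "__main__":
    print(fuzzy_match("diabetes", "diabetis"))   # True: one substitution apart
    print(fuzzy_match("diabetes", "asthma"))     # False
```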
Hierons RM, Bogdanov K, Bowen JP, Cleaveland R, Derrick J, Dick J, Gheorghe M, Harman M, Kapoor K, Krause P, Luettgen G, Simons AJH, Vilkomir S, Woodward MR, Zedan H (2009) Using Formal Specifications to Support Testing, ACM COMPUTING SURVEYS 41 (2) ARTN 9 ASSOC COMPUTING MACHINERY
Razavi A, Marinos A, Moschoyiannis S, Krause P (2009) Recovery management in RESTful Interactions, Proceedings of 3rd IEEE International Conference on Digital Ecosystems and Technologies pp. 436-441 IEEE
With REST becoming a dominant architectural paradigm for web services in distributed systems, more and more use cases are applied to it, including use cases that require transactional guarantees. We believe that the loose coupling that is supported by RESTful transactions makes this currently our preferred interaction style for digital ecosystems (DEs). To further expand its value to DEs, we propose a RESTful transaction model that satisfies both the constraints of recoverable transactions and those of the REST architectural style. We then show the correctness and applicability of the model.
Bryant D, Krause P, Moschoyiannis S (2006) A tool to facilitate agent deliberation, LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS 4160 pp. 465-468 SPRINGER-VERLAG BERLIN
Marinos A, Krause P (2009) Using SBVR, REST and Relational Databases to develop Information Systems native to the Digital Ecosystem, 2009 3RD IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 424-429 IEEE
Moschoyiannis S, Razavi AR, Zheng YY, Krause P (2008) Long-running Transactions: semantics, schemas, implementation, Proceedings of 2nd IEEE International Conference on Digital Ecosystems and Technologies pp. 208-215 IEEE
In this paper we describe a formal model for the distributed coordination of long-running transactions in a Digital Ecosystem for business, involving Small and Medium Enterprises (SMEs). The proposed non-interleaving model of interaction-based service composition allows for communication between internal activities of transactions. The formal semantics of the various modes of service composition are represented by standard xml schemas. The current implementation framework uses suitable asynchronous message passing techniques and reflects the design decisions of the proposed model for distributed transactions in digital ecosystems.
Ryman-Tubb NF, Krause P (2011) Neural Network Rule Extraction to Detect Credit Card Fraud, ENGINEERING APPLICATIONS OF NEURAL NETWORKS, PT I 363 pp. 101-110 SPRINGER-VERLAG BERLIN
Liyanage H, de Lusignan S, Liaw ST, Kuziemsky CE, Mold F, Krause P, Fleming D, Jones S (2014) Big Data Usage Patterns in the Health Care Domain: A Use Case Driven Approach Applied to the Assessment of Vaccination Benefits and Risks. Contribution of the IMIA Primary Healthcare Working Group., Yearb Med Inform 9 pp. 27-35
BACKGROUND: Generally benefits and risks of vaccines can be determined from studies carried out as part of regulatory compliance, followed by surveillance of routine data; however there are some rarer and more long term events that require new methods. Big data generated by increasingly affordable personalised computing, and from pervasive computing devices is rapidly growing and low cost, high volume, cloud computing makes the processing of these data inexpensive. OBJECTIVE: To describe how big data and related analytical methods might be applied to assess the benefits and risks of vaccines. METHOD: We reviewed the literature on the use of big data to improve health, applied to generic vaccine use cases, that illustrate benefits and risks of vaccination. We defined a use case as the interaction between a user and an information system to achieve a goal. We used flu vaccination and pre-school childhood immunisation as exemplars. RESULTS: We reviewed three big data use cases relevant to assessing vaccine benefits and risks: (i) Big data processing using crowdsourcing, distributed big data processing, and predictive analytics, (ii) Data integration from heterogeneous big data sources, e.g. the increasing range of devices in the "internet of things", and (iii) Real-time monitoring for the direct monitoring of epidemics as well as vaccine effects via social media and other data sources. CONCLUSIONS: Big data raises new ethical dilemmas, though its analysis methods can bring complementary real-time capabilities for monitoring epidemics and assessing vaccine benefit-risk balance.
Krause PJ, Perez-Minana E, Thornton J (2012) Bayesian Networks for the management of Greenhouse Gas emissions in the British agricultural sector, Environmental Modelling and Software 35 pp. 132-148 Elsevier
Recent years have witnessed a rapid rise in the development of deterministic and non-deterministic models to estimate human impacts on the environment. An important failing of these models is the difficulty that most people have in understanding the results generated by them, the implications to their way of life and also that of future generations. Within the field, the measurement of greenhouse gas emissions (GHG) is one such result. The research described in this paper evaluates the potential of Bayesian Network (BN) models for the task of managing GHG emissions in the British agricultural sector. Case study farms typifying the British agricultural sector were input into both the BN model and CALM, a Carbon accounting tool used by the Country Land and Business Association (CLA) in the UK for the same purpose. Preliminary results show that the BN model provides a better understanding of how the tasks carried out on a farm impact the environment through the generation of GHG emissions. This understanding is achieved by translating the emissions information into their cost in monetary terms using the Shadow Price of Carbon (SPC), something that is not possible using the CALM tool. In this manner, the farming sector should be more inclined to deploy measures for reducing its impact. At the same time, the output of the analysis can be used to generate a business plan that will not have a negative effect on a farm's capital income.
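The SPC translation described above is, at its core, a per-activity multiplication of emissions by a carbon price. A minimal illustration follows; the shadow price and the farm-activity figures are invented for the example and are not taken from the paper.

```python
# Minimal illustration (assumed figures, not from the paper): translating
# farm-activity emissions into a monetary cost via a shadow price of carbon.

SHADOW_PRICE_GBP_PER_TCO2E = 27.0   # hypothetical value, illustration only

farm_emissions_tco2e = {            # hypothetical activity breakdown
    "fertiliser use": 120.0,
    "livestock (enteric)": 310.0,
    "fuel and electricity": 45.0,
}

for activity, emissions in farm_emissions_tco2e.items():
    cost = emissions * SHADOW_PRICE_GBP_PER_TCO2E
    print(f"{activity}: {emissions:.0f} tCO2e = £{cost:,.0f}")

total = sum(farm_emissions_tco2e.values()) * SHADOW_PRICE_GBP_PER_TCO2E
print(f"Total shadow cost = £{total:,.0f}")
```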
Krause P, de Lusignan S (2010) Procuring interoperability at the expense of usability: a case study of UK National Programme for IT assurance process., Studies in Health Technology and Informatics: Seamless care, safe care: the challenges of interoperability and patient safety in health care: Proceedings of the EFMI Special Topic Conference 155 pp. 143-149
The allure of interoperable systems is that they should improve patient safety and make health services more efficient. The UK's National Programme for IT has made great strides in achieving interoperability; through linkage to a national electronic spine. However, there has been criticism of the usability of the applications in the clinical environment.
Sansom M, Salazar N, Krause P (2012) MindBeat Quintet: Kinetifying thought through movement and sound.
MindBeat is software developed at the University of Surrey that facilitates collaborative thinking within a multi-disciplinary set-up. The project involved the development of an electronic space that enabled an academic ensemble to carry out an 'ideas improvisation'. Five academics from very different disciplines were invited to post short 'beats' (texts made up of no more than 3-4 sentences), around a predefined question: what makes multidisciplinary collaborations work? The beats developed in time into a progressive thread of ideas. The aim of the software was to track the emergence, development and decline of new ideas within a multidisciplinary environment, and also to be able to understand the patterns that emerge in this process by representing the ideas visually as coloured squares.

The MindBeat software was launched in June 2012 as part of an electronic theatre production of Peter Handke's 'Offending the Audience'. The five Surrey academics played the parts remotely by feeding Handke's text onto the Mindbeat website as part of a three-day durational performance. An open audience was then invited to interact with the play's five voices by sitting at one of five iMac stations set up in the studio space. These five computer monitors showed the text broken down into coloured square patterns. The audience could open the text by clicking onto the coloured squares, which would reveal the short text or beat. They could then add a comment or thought to the original text. The audience's participation produced almost 500 additional beats, creating an alternative version to the Handke script. The Mindbeat software visualised this ideation as a complex pattern of coloured squares.

The installation featured generative video and generative electronic music played live throughout the entire three days. Using the colour and shape patterns of the ideas-exchange as their score, the musicians shared the software visualisation as a basis for their durational sonic improvisation.

Marinos A, Razavi A, Moschoyiannis S, Krause P (2009) RETRO: A Consistent and Recoverable RESTful Transaction Model, 2009 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, VOLS 1 AND 2 pp. 181-188 IEEE
Krause PJ, Ambler S, Elvang-Goransson M, Fox J (1995) A Logic of Argumentation for Reasoning Under Uncertainty, Computational Intelligence 11 pp. 113-131 Wiley Blackwell
We present the syntax and proof theory of a logic of argumentation, LA. We also outline the development of a category theoretic semantics for LA. LA is the core of a proof theoretic model for reasoning under uncertainty. In this logic, propositions are labelled with a representation of the arguments which support their validity. Arguments may then be aggregated to collect more information about the potential validity of the propositions of interest. We make the notion of aggregation primitive to the logic, and then define strength mappings from sets of arguments to one of a number of possible dictionaries. This provides a uniform framework which incorporates a number of numerical and symbolic techniques for assigning subjective confidences to propositions on the basis of their supporting arguments. These aggregation techniques are also described, with examples
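As a very loose illustration of the idea that propositions carry the arguments supporting (or attacking) them, and that a set of arguments is aggregated into a value from a chosen dictionary, here is a toy Python sketch. The count-based aggregation and the qualitative dictionary used are stand-ins for the example only, not the aggregation mappings defined in LA.

```python
# Toy flattening of the idea in LA: propositions are labelled with arguments,
# and sets of arguments are aggregated into a confidence value drawn from a
# chosen "dictionary". The aggregation below is a simple stand-in, not the
# paper's proof-theoretic machinery.

from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    claim: str          # proposition the argument is about
    grounds: tuple      # premises the argument rests on
    supports: bool      # True = argument for the claim, False = against

def aggregate(arguments, claim):
    """Map the set of arguments about `claim` to a qualitative dictionary."""
    pro = sum(1 for a in arguments if a.claim == claim and a.supports)
    con = sum(1 for a in arguments if a.claim == claim and not a.supports)
    if pro and not con:
        return "supported"
    if con and not pro:
        return "opposed"
    if pro or con:
        return "conflicting"
    return "open"

args = [
    Argument("treatment_effective", ("trial_A",), True),
    Argument("treatment_effective", ("side_effect_report",), False),
]
print(aggregate(args, "treatment_effective"))   # "conflicting"
```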
Bryant D, Krause P (2008) A review of current defeasible reasoning implementations, KNOWLEDGE ENGINEERING REVIEW 23 (3) pp. 227-260 CAMBRIDGE UNIV PRESS
Razavi A, Krause P, Moschoyiannis S (2010) Digital Ecosystems: challenges and proposed solutions, pp. 1003-1031 Information Science Reference - Imprint of: IGI Publishing
Razavi AR, Moschoyiannis SK, Krause PJ (2008) A Scale-free Business Network for Digital Ecosystems, 2008 2ND IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 196-201 IEEE
Razavi A, Moschoyiannis S, Krause P (2007) Concurrency Control and Recovery Management for Open e-Business Transactions, WOTUG-30: COMMUNICATING PROCESS ARCHITECTURES 2007 65 pp. 267-285 IOS PRESS
Krause PJ, Sabry N (2013) Optimal Green Virtual Machine Migration Model, International Journal of Business Data Communications and Networking 9 (3) pp. 35-52
Cloud computing provides the opportunity to migrate virtual machines to 'follow-the-green' data centres. That is, to migrate virtual machines between green data centres on the basis of clean energy availability, to mitigate the environmental impact of carbon footprint emissions and energy consumption. The virtual machine migration problem can be modelled to maximize the utility of computing resources or to minimize the cost of using computing resources. However, this would ignore the network energy consumption and its impact on the overall CO2 emissions. Unless this is taken into account the extra data traffic due to migration of data could then cause an increase in brown energy consumption and eventually lead to an unintended increase in carbon footprint emissions. Energy consumption is a key aspect in deploying distributed service in cloud networks within decentralized service delivery architectures. In this paper, the authors address an optimization view of the problem of locating a set of cloud services on a set of sites (green data centres) managed by a service provider or hybrid cloud computing brokerage. The authors' goal is to minimize the overall network energy consumption and carbon footprint emissions for accessing the cloud services for any pair of data centres i and j. The authors propose an optimization migration model based on the development of integer linear programming (ILP) models, to identify the leverage of green energy sources with data centres and the energy consumption of migrating VMs.
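To give a flavour of the kind of ILP formulation the paper builds on, here is a small placement model written with the PuLP library. The cost coefficients, capacities, and the simple "exactly one site per VM" structure are assumptions made purely for illustration; the paper's model is richer and also accounts for the network energy of the migration traffic itself.

```python
# Sketch under stated assumptions: a toy integer linear programme in the
# spirit of a green VM placement model, solved with PuLP. All numbers are
# invented for illustration.

import pulp

datacentres = ["dc1", "dc2", "dc3"]
vms = ["vm1", "vm2"]

# Hypothetical cost of hosting VM v at data centre d (network energy plus
# carbon intensity of the local energy mix), in arbitrary units.
cost = {
    ("vm1", "dc1"): 4.0, ("vm1", "dc2"): 2.5, ("vm1", "dc3"): 3.0,
    ("vm2", "dc1"): 1.5, ("vm2", "dc2"): 3.5, ("vm2", "dc3"): 2.0,
}
capacity = {"dc1": 1, "dc2": 1, "dc3": 2}   # VMs each site can host

prob = pulp.LpProblem("green_vm_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("place", (vms, datacentres), cat="Binary")

# Objective: minimise total energy/carbon cost of the chosen placement.
prob += pulp.lpSum(cost[v, d] * x[v][d] for v in vms for d in datacentres)

# Each VM is placed at exactly one data centre.
for v in vms:
    prob += pulp.lpSum(x[v][d] for d in datacentres) == 1

# Respect site capacity.
for d in datacentres:
    prob += pulp.lpSum(x[v][d] for v in vms) <= capacity[d]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for v in vms:
    for d in datacentres:
        if x[v][d].value() == 1:
            print(f"{v} -> {d}")
```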
Zheng Y, Krause P (2006) Asynchronous semantics and anti-patterns for interacting web services, QSIC 2006: Sixth International Conference on Quality Software, Proceedings pp. 74-81 IEEE COMPUTER SOC
Razavi AR, Moschoyiannis SK, Krause PJ (2007) A coordination model for distributed transactions in Digital Business EcoSystems, 2007 INAUGURAL IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 319-324 IEEE
de Lusignan S, Liaw ST, Krause P, Curcin V, Vicente MT, Michalakidis G, Agreus L, Leysen P, Shaw N, Mendis K (2011) Key Concepts to Assess the Readiness of Data for International Research: Data Quality, Lineage and Provenance, Extraction and Processing Errors, Traceability, and Curation. Contribution of the IMIA Primary Health Care Informatics Working Group., Yearb Med Inform 6 (1) pp. 112-120
To define the key concepts which inform whether a system for collecting, aggregating and processing routine clinical data for research is fit for purpose.
Moschoyiannis S, Marinos A, Krause P (2010) Generating SQL queries from SBVR rules, Lecture Notes in Computer Science: Semantic Web Rules 6403 pp. 128-143 SPRINGER-VERLAG BERLIN
Declarative technologies have made great strides in expressivity between SQL and SBVR. SBVR models are more expressive than SQL schemas, but not as imminently executable yet. In this paper, we complete the architecture of a system that can execute SBVR models. We do this by describing how SBVR rules can be transformed into SQL DML so that they can be automatically checked against the database using a standard SQL query. In particular, we describe a formalization of the basic structure of an SQL query which includes aggregate functions, arithmetic operations, grouping, and grouping on condition. We do this while staying within a predicate calculus semantics which can be related to the standard SBVR-LF specification and equip it with a concrete semantics for expressing business rules formally. Our approach to transforming SBVR rules into standard SQL queries is thus generic, and the resulting queries can be readily executed on a relational schema generated from the SBVR model.
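As a toy illustration of checking a business rule against a database with a standard SQL query (not the paper's SBVR-to-SQL transformation), the sketch below emits a violation-finding query for a rule of the form "each X must include at least one Y"; the table and column names are hypothetical.

```python
# Toy illustration only: turning an "at least one" style business rule into a
# standard SQL query that returns the rows violating it. The rule template,
# table names and column names are all hypothetical.

def at_least_one_rule_to_sql(parent, child, fk, pk="id"):
    """SQL that lists `parent` rows with no matching `child` rows."""
    return (
        f"SELECT p.{pk} FROM {parent} p "
        f"LEFT JOIN {child} c ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NULL"
    )

# "It is obligatory that each order includes at least one order_line."
print(at_least_one_rule_to_sql(parent="orders", child="order_line", fk="order_id"))
```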
Marinos A, Krause P (2010) Towards the web of models: A rule-driven RESTful architecture for distributed systems, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6403 LNCS pp. 251-258
Krause PJ, Marinos A (2009) An SBVR framework for RESTful Web Applications, Lecture Notes in Computer Science: Rule Interchange and Applications 5858 pp. 144-158 Springer
We propose a framework that can be used to produce functioning web applications from SBVR models. To achieve this, we begin by discussing the concept of declarative application generation and examining the commonalities between SBVR and the RESTful architectural style of the web. We then show how a relational database schema and RESTful interface can be generated from an SBVR model. In this context, we discuss how SBVR can be used to semantically describe hypermedia on the Web and enhance its evolvability and loose coupling properties. Finally, we show that this system is capable of exhibiting process-like behaviour without requiring explicitly defined processes.
Allinjawi AA, Al-Nuaim HA, Krause P (2014) An Achievement Degree Analysis Approach to Identifying Learning Problems in Object-Oriented Programming, ACM TRANSACTIONS ON COMPUTING EDUCATION 14 (3) ARTN 20 ASSOC COMPUTING MACHINERY
Marinos A, Krause P (2009) What, not How: A generative approach to service composition, 2009 3RD IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 430-435 IEEE
Moschoyiannis S, Krause P, Bryant D, McBurney P (2009) Verifiable Protocol Design for Agent Argumentation Dialogues, 2009 3RD IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 459-464 IEEE
Razavi A, Moschoyiannis S, Krause P (2009) An open digital environment to support business ecosystems, Peer-to-Peer Networking and Applications 2 (4) pp. 367-397 SPRINGER
We present a Peer-to-Peer network design which aims to support business activities conducted through a network of collaborations that generate value in different, mutually beneficial, ways for the participating organisations. The temporary virtual networks formed by long-term business transactions that involve the execution of multiple services from different providers are used as the building block of the underlying scale-free business network. We show how these local interactions, which are not governed by a single organisation, give rise to a fully distributed P2P architecture that reflects the dynamics of business activities. The design is based on dynamically formed permanent clusters of nodes, the so-called Virtual Super Peers (VSPs), and this results in a topology that is highly resilient to certain types of failure (and attacks). Furthermore, the proposed P2P architecture is capable of reconfiguring itself to adapt to the usage that is being made of it and respond to global failures of conceptual hubs. This fosters an environment where business communities can evolve to meet emerging business opportunities and achieve sustainable growth within a digital ecosystem.
Michalakidis G, Kumarapeli P, Ring A, van Vlymen J, Krause P, de Lusignan S (2010) A system for solution-orientated reporting of errors associated with the extraction of routinely collected clinical data for research and quality improvement., Studies in Health Technology and Informatics: Proceedings of the 13th World Congress on Medical Informatics 160 (Pt 1) pp. 724-728
We have used routinely collected clinical data in epidemiological and quality improvement research for over 10 years. We extract, pseudonymise and link data from heterogeneous distributed databases; inevitably encountering errors and problems.
Zheng Y, Zhou J, Krause P (2007) Analysis of BPEL data dependencies, SEAA 2007: 33RD EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS, PROCEEDINGS pp. 351-358 IEEE COMPUTER SOC
de Lusignan S, Cashman J, Poh N, Michalakidis G, Mason A, Desombre T, Krause P (2012) Conducting Requirements Analyses for Research using Routinely Collected Health Data: a Model Driven Approach., Stud Health Technol Inform 180 pp. 1105-1107
Background: Medical research increasingly requires the linkage of data from different sources. Conducting a requirements analysis for a new application is an established part of software engineering, but rarely reported in the biomedical literature; and no generic approaches have been published as to how to link heterogeneous health data. Methods: Literature review, followed by a consensus process to define how requirements for research using multiple data sources might be modeled. Results: We have developed a requirements analysis: i-ScheDULEs - The first components of the modeling process are indexing and creating a rich picture of the research study. Secondly, we developed a series of reference models of progressive complexity: Data flow diagrams (DFD) to define data requirements; unified modeling language (UML) use case diagrams to capture study specific and governance requirements; and finally, business process models, using business process modeling notation (BPMN). Discussion: These requirements and their associated models should become part of research study protocols.
Fenton N, Neil M, Marsh W, Hearty P, Marquez D, Krause P, Mishra R (2007) Predicting software defects in varying development lifecycles using Bayesian nets, INFORMATION AND SOFTWARE TECHNOLOGY 49 (1) pp. 32-43 ELSEVIER SCIENCE BV
Liang P-C, Krause P (2016) Smartphone-Based Real-Time Indoor Location Tracking With 1-m Precision, IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS 20 (3) pp. 756-762 IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Zhang F, Povey D, Krause P (2007) Protein Attributes Microtuning System (PAMS): an effective tool to increase protein structure prediction by data purification, 2007 INAUGURAL IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 53-58 IEEE
Manaf NA, Moschoyiannis S, Krause P (2015) Service Choreography, SBVR, and Time, EPTCS 201, 2015, pp. 63-77
We propose the use of structured natural language (English) in specifying service choreographies, focusing on the what rather than the how of the required coordination of participant services in realising a business application scenario. The declarative approach we propose uses the OMG standard Semantics of Business Vocabulary and Rules (SBVR) as a modelling language. The service choreography approach has been proposed for describing the global orderings of the invocations on interfaces of participant services. We therefore extend SBVR with a notion of time which can capture the coordination of the participant services, in terms of the observable message exchanges between them. The extension is done using existing modelling constructs in SBVR, and hence respects the standard specification. The idea is that users - domain specialists rather than implementation specialists - can verify the requested service composition by directly reading the structured English used by SBVR. At the same time, the SBVR model can be represented in formal logic so it can be parsed and executed by a machine.
Razavi A, Marinos A, Moschoyiannis S, Krause P (2009) RESTful Transactions Supported by the Isolation Theorems, WEB ENGINEERING, PROCEEDINGS 5648 pp. 394-409 SPRINGER-VERLAG BERLIN
Liyanage H, Krause P, De Lusignan S (2015) Using ontologies to improve semantic interoperability in health data., Journal of innovation in health informatics 22 (2) pp. 309-315
The present-day health data ecosystem comprises a wide array of complex heterogeneous data sources. A wide range of clinical, health care, social and other clinically relevant information are stored in these data sources. These data exist either as structured data or as free-text. These data are generally individual person-based records, but social care data are generally case based and less formal data sources may be shared by groups. The structured data may be organised in a proprietary way or be coded using one-of-many coding, classification or terminologies that have often evolved in isolation and designed to meet the needs of the context in which they have been developed. This has resulted in a wide range of semantic interoperability issues that make the integration of data held on these different systems challenging. We present semantic interoperability challenges and describe a classification of these. We propose a four-step process and a toolkit for those wishing to work more ontologically, progressing from the identification and specification of concepts to validating a final ontology. The four steps are: (1) the identification and specification of data sources; (2) the conceptualisation of semantic meaning; (3) defining to what extent routine data can be used as a measure of the process or outcome of care required in a particular study or audit and (4) the formalisation and validation of the final ontology. The toolkit is an extension of a previous schema created to formalise the development of ontologies related to chronic disease management. The extensions are focused on facilitating rapid building of ontologies for time-critical research studies.
Webb SJ, Hanser T, Howlin B, Krause P, Vessey JD (2014) Feature combination networks for the interpretation of statistical machine learning models: Application to Ames mutagenicity, Journal of Cheminformatics 6 (1)
Background: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Results: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretation was revealed to link closely with understood mechanisms for Ames mutagenicity. Conclusion: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. © 2014 Webb et al.; licensee Chemistry Central Ltd.
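A rough stand-in for the general idea of probing a trained model's behaviour on combinations of structural features (not the paper's algorithm, which works on chemically meaningful fragments) is to switch off small groups of fingerprint bits and see which combinations flip an active prediction. The sketch below does this with a scikit-learn random forest on invented binary data.

```python
# Illustrative stand-in only: probe a fitted model by masking small
# combinations of "on" fingerprint bits and recording which combinations
# deactivate an otherwise active prediction. Data and activity rule are
# synthetic; this is not the paper's feature combination network algorithm.

from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 16))          # toy binary "fingerprints"
y = ((X[:, 0] & X[:, 3]) | X[:, 7]).astype(int) # toy activity rule

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

query = X[y == 1][0]                 # pick an "active" example to interpret
on_bits = np.flatnonzero(query)

for size in (1, 2):
    for combo in combinations(on_bits, size):
        masked = query.copy()
        masked[list(combo)] = 0
        if model.predict(masked.reshape(1, -1))[0] == 0:
            print(f"switching off bits {combo} deactivates the prediction")
```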
Razavi AR, Malone PJ, Moschoyiannis S, Jennings B, Krause PJ (2007) A distributed transaction and accounting model for digital ecosystem composed services, 2007 INAUGURAL IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 215-218 IEEE
Zheng Y, Zhou J, Krause P (2007) A model checking based test case generation framework for web services, International Conference on Information Technology, Proceedings pp. 715-720 IEEE COMPUTER SOC
Moschoyiannis S, Krause PJ (2015) True Concurrency in Long-running Transactions for Digital Ecosystems, FUNDAMENTA INFORMATICAE 138 (4) pp. 483-514 IOS PRESS
Bryant D, Krause P (2006) An implementation of a lightweight argumentation engine for agent applications, LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS 4160 pp. 469-472 SPRINGER-VERLAG BERLIN
de Lusignan S, Krause P, Michalakidis G, Vicente MT, Thompson S, McGilchrist M, Sullivan F, van Royen P, Agreus L, Desombre T, Taweel A, Delaney B (2012) Business Process Modelling is an Essential Part of a Requirements Analysis. Contribution of EFMI Primary Care Working Group., Yearb Med Inform 7 (1) pp. 34-43 Schattauer Publishers
To perform a requirements analysis of the barriers to conducting research linking primary care, genetic and cancer data.
Moschoyiannis S, Krause PJ, Shields MW (2009) A True-Concurrent Interpretation of Behavioural Scenarios, Electronic Notes in Theoretical Computer Science 203 (7) pp. 3-22 Elsevier
We describe a translation of scenarios given in UML 2.0 sequence diagrams into a tuples-based behavioural model that considers multiple access points for a participating instance and exhibits true-concurrency. This is important in a component setting since different access points are connected to different instances, which have no knowledge of each other. Interactions specified in a scenario are modelled using tuples of sequences, one sequence for each access point. The proposed unfolding of the sequence diagram involves mapping each location (graphical position) onto the so-called component vectors. The various modes of interaction (sequential, alternative, concurrent) manifest themselves in the order structure of the resulting set of component vectors, which captures the dependencies between participating instances. In previous work, we have described how (sets of) vectors generate concurrent automata. The extension to our model with sequence diagrams in this paper provides a way to verify the diagram against the state-based model.
Krause PJ, Fenton N, Neil M, Marsh W, Hearty P, Radlinski L (2008) On the effectiveness of early life cycle defect prediction with Bayesian Nets, Empirical Software Engineering: an international journal 13 (5) pp. 499-537 Springer
Standard practice in building models in software engineering normally involves three steps: (1) collecting domain knowledge (previous results, expert knowledge); (2) building a skeleton of the model based on step 1 including as yet unknown parameters; (3) estimating the model parameters using historical data. Our experience shows that it is extremely difficult to obtain reliable data of the required granularity, or of the required volume with which we could later generalize our conclusions. Therefore, in searching for a method for building a model we cannot consider methods requiring large volumes of data. This paper discusses an experiment to develop a causal model (Bayesian net) for predicting the number of residual defects that are likely to be found during independent testing or operational usage. The approach supports (1) and (2), does not require (3), yet still makes accurate defect predictions (an R² of 0.93 between predicted and actual defects). Since our method does not require detailed domain knowledge it can be applied very early in the process life cycle. The model incorporates a set of quantitative and qualitative factors describing a project and its development process, which are inputs to the model. The model variables, as well as the relationships between them, were identified as part of a major collaborative project. A dataset, elicited from 31 completed software projects in the consumer electronics industry, was gathered using a questionnaire distributed to managers of recent projects. We used this dataset to validate the model by analyzing several popular evaluation measures (R², measures based on the relative error and Pred). The validation results also confirm the need for using the qualitative factors in the model. The dataset may be of interest to other researchers evaluating models with similar aims. Based on some typical scenarios we demonstrate how the model can be used for better decision support in operational environments. We also performed sensitivity analysis in which we identified the most influential variables on the number of residual defects. This showed that the project size, scale of distributed communication and the project complexity cause most of the variation in the number of defects in our model. We make both the dataset and causal model available for research use.
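For reference, the evaluation measures named above (R², relative-error-based measures and Pred) can be computed as follows; the defect counts in the example are invented purely to exercise the functions and are not the study's data.

```python
# Minimal sketch of the validation measures named in the abstract: R²,
# mean magnitude of relative error (MMRE) and Pred(k). Example numbers are
# hypothetical.

import numpy as np

def r_squared(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mmre(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(actual - predicted) / actual)

def pred(actual, predicted, k=0.25):
    """Proportion of predictions whose relative error is within k."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(actual - predicted) / actual <= k)

actual    = [12, 40, 7, 55, 23]   # hypothetical residual defect counts
predicted = [10, 44, 9, 50, 21]
print(r_squared(actual, predicted), mmre(actual, predicted), pred(actual, predicted))
```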
Zheng Y, Krause P (2007) Automata semantics and analysis of BPEL, 2007 INAUGURAL IEEE INTERNATIONAL CONFERENCE ON DIGITAL ECOSYSTEMS AND TECHNOLOGIES pp. 307-312 IEEE
Larkham A (2018) Investigating agent based models for testing the effects of carbon taxes on the information and communications technology market with respect to small and medium sized enterprises in the United Kingdom.
At a time when climate change has become an accepted risk to civilisation, governments are developing and implementing policies designed to reduce the impacts of anthropogenic greenhouse gas (GHG) emissions. These can include 'command and control' directives. However, in free market economies, such as the United Kingdom, there is a preference for a laissez-faire approach with taxes and incentives designed to shape the market rather than direct government intervention.

This thesis examines the feasibility of using agent-based modeling techniques for predictive analysis with respect to the application of carbon taxes on electricity consumption in the context of wider societal objectives and scenarios, particularly in the procurement of information and communication technology (ICT) equipment and services by small and medium businesses (SMEs) in the United Kingdom. In doing so it provides an area of novel research. With more than 2% of greenhouse gas (GHG) emissions associated with the usage of ICT and with that proportion expected to grow, it is an important sector to target with emission reduction strategies. Testing these strategies in simulation prior to application to the marketplace is important to policy makers. Normally, policy makers wish to reduce the risk of implementing policies that would not achieve the desired goals and/or harm the economy. Agent-based modeling potentially offers policy makers valuable insight into probable emergent market behaviour from a 'bottom-up' methodology that better suits free markets that contain millions of SMEs.

This thesis applies a multidisciplinary problem solving approach, including microeconomics, agent based modeling and policy research, to examining potential market responses to carbon taxes such as the Climate Change Levy (CCL); the UK's primary carbon tax designed to reduce carbon emissions produced by SMEs. Areas of novelty in the thesis include the use of agent based models to examine the effects of carbon taxes on the behaviour of SMEs and the use of ICT as a factor of production within the SME agents themselves.

Mak L, Krause P (2006) Detection & Management of Concept Drift, Proceedings of 2006 International Conference on Machine Learning and Cybernetics, Vols 1-7 pp. 3486-3491 IEEE

The ability to correctly detect the location and derive the contextual information where a concept begins to drift is essential in the study of domains with changing context. This paper proposes a Top-down learning method with the incorporation of a learning accuracy mechanism to efficiently detect and manage context changes within a large dataset. With the utilisation of simple search operators to perform convergent search and JBNC with a graphical viewer to derive context information, the identified hidden context are shown with the location of the disjoint points, the contextual attributes that contribute to the concept drift, the graphical output of the true relationships between these attributes and the Boolean characterisation which is the context.
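As a minimal sketch of what detecting a "disjoint point" can look like in practice (a simple stand-in, not the paper's top-down method with its learning accuracy mechanism and JBNC), the following flags the window in a stream of per-example correctness flags where accuracy drops sharply; the stream is synthetic.

```python
# Illustrative stand-in only: flag windows where a model's accuracy over a
# stream drops sharply, the kind of "disjoint point" a concept drift detector
# looks for. Window size, drop threshold and the stream are all invented.

def detect_drift(correct_flags, window=30, drop=0.2):
    """Return start indices of windows whose accuracy falls by more than `drop`."""
    points = []
    prev = None
    for i in range(window, len(correct_flags) + 1, window):
        acc = sum(correct_flags[i - window:i]) / window
        if prev is not None and prev - acc > drop:
            points.append(i - window)   # window where the drop starts
        prev = acc
    return points

# Hypothetical stream of 1 = correct, 0 = incorrect; the concept changes
# around example 120, after which the old model starts failing.
stream = [1] * 110 + [0, 1] * 5 + [0] * 40 + [1, 0] * 20
print(detect_drift(stream))   # e.g. [120]
```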