Professor Stephen Jarvis

President and Vice-Chancellor
Eleanor Ellis, Executive Assistant

About

My qualifications

PhD in Computer Science
University of Durham
MSc in Computation
University of Oxford
BSc (Hons) in Computer Science
University of London

Publications

T.R. Law, J. Hancox, S.A. Wright, S.A. Jarvis (2019) An algorithm for computing short-range forces in molecular dynamics simulations with non-uniform particle densities, In: Journal of Parallel and Distributed Computing 130 pp. 1-11 Elsevier Inc

We present projection sorting, an algorithmic approach to determining pairwise short-range forces between particles in molecular dynamics simulations. We show it can be more effective than the standard approaches when particle density is non-uniform. We implement tuned versions of the algorithm in the context of a biophysical simulation of chromosome condensation, for the modern Intel Broadwell and Knights Landing architectures, across multiple nodes. We demonstrate up to 5× overall speedup and good scaling to large problem sizes and processor counts.

• We present the projection sorting algorithm for molecular dynamics simulations.
• We provide optimised implementations for Intel Broadwell and Knights Landing.
• We extend our implementations to multi-node environments using MPI.
• We investigate the performance of our algorithm in a biophysical MD simulation.
• We observe serial speedups up to 5×, and good MPI scaling behaviour.
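The core idea behind projection sorting can be sketched briefly: project every particle onto an axis, sort by the projected coordinate, and only test pairs whose projections lie within the cutoff distance. The sketch below is illustrative only; the names and structure are ours, not the paper's tuned Broadwell/Knights Landing implementation:

```python
# Minimal sketch of the projection-sorting idea: sort particles by their
# x-coordinate, then for each particle scan forward only while the
# projected separation is within the cutoff, checking the full 3D
# distance for those candidates. Illustrative, not the paper's code.
import math

def neighbour_pairs(particles, cutoff):
    """Return index pairs (i, j), i < j, with |p_i - p_j| <= cutoff."""
    # Sort particle indices by their projection onto the x-axis.
    order = sorted(range(len(particles)), key=lambda i: particles[i][0])
    pairs = []
    for a, i in enumerate(order):
        xi = particles[i][0]
        for j in order[a + 1:]:
            # Once projections differ by more than the cutoff, no later
            # particle in sorted order can be a neighbour of i.
            if particles[j][0] - xi > cutoff:
                break
            if math.dist(particles[i], particles[j]) <= cutoff:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```

Because the inner loop stops as soon as projections separate by more than the cutoff, work concentrates where particles cluster, which is why such an approach can win over uniform cell lists when densities are non-uniform.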

Sarunkorn Chotvijit, Malkiat Thiarai, Stephen Jarvis (2018) Big data analytics in social care provision, In: Anneke Zuiderwijk, Charles C. Hinnant (eds.), Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age a5 pp. 1-8 ACM

There is significant national interest in tackling issues surrounding the needs of vulnerable children and adults. At the same time, UK local authorities face severe financial challenges as a result of decreasing financial settlements and increasing demands from growing urban populations. This research employs state-of-the-art data analytics and visualisation techniques to analyse six years of local government social care data for the city of Birmingham, the UK's second most populated city. This analysis identifies: (i) service cost profiles over time; (ii) geographical dimensions to service demand and delivery; (iii) patterns in the provision of services, and (iv) the extent to which data value and data protection interact. The research accesses data held by the local authority to discover patterns and insights that may assist in the understanding of service demand, support decision making and resource management, whilst protecting and safeguarding its most vulnerable citizens. The use of data in this manner could also inform the approach a local authority has to its data, its capture and use, and the potential for supporting data-led management and service improvements.

Henry Crosby, Paul Davis, Theo Damoulas, Stephen A. Jarvis (2016) A spatio-temporal, Gaussian process regression, real-estate price predictor, In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 68 pp. 1-4 ACM

This paper introduces a novel four-stage methodology for real-estate valuation. This research shows that space, property, economic, neighbourhood and time features all contribute to producing a house price predictor; validation shows 96.6% accuracy for Gaussian Process Regression, beating regression-kriging, random forests and an M5P decision tree. The output is integrated into a commercial real-estate decision engine.
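The modelling technique named above, Gaussian Process Regression, can be illustrated in a few lines. The following toy, standard-library-only sketch computes the posterior mean of a one-dimensional GP with an RBF kernel; the hyperparameters and helper names are our own assumptions and bear no relation to the paper's four-stage pipeline or its feature set:

```python
# Toy 1-D Gaussian Process Regression (posterior mean only) with an RBF
# kernel, using only the standard library. Illustrative sketch.
import math

def rbf(a, b, length_scale=1.0):
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_star given data (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, list(ys))   # alpha = K^{-1} y
    return sum(rbf(x_star, xs[i]) * alpha[i] for i in range(n))
```

With small observation noise, `gp_predict` closely interpolates the training points; a real house-price model would use many features and tuned kernel hyperparameters.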

Victoria Houlden, João Porto de Albuquerque, Scott Weich, Stephen Jarvis (2019) A spatial analysis of proximate greenspace and mental wellbeing in London, In: Applied Geography (Sevenoaks) 109 102036 Elsevier Ltd

While local-area greenspace is associated with reduced symptoms of mental distress and greater life satisfaction, most previous research has measured the amount of local-area greenspace within administrative boundaries, and found mixed results for associations between greenspace and multidimensional mental wellbeing. The study was designed to examine whether the amount of greenspace within a radius of individuals' homes was associated with mental wellbeing, testing the government guideline that greenspace should be available within 300 m of homes. Individual- and household-level data were drawn from the Annual Population Survey at postcode level (APS, Pooled Dataset 2012-2015), which includes three mental wellbeing measures, covering aspects of life satisfaction, sense of worth, and happiness, as well as a range of socio-demographic variables. Greenspace data were obtained from the Greenspace Information for Greater London Group (GiGL) and used to calculate the amount of greenspace within a 300 m radius of individuals' homes. Linear regression models revealed positive and statistically significant associations between the amount of greenspace and indicators of life satisfaction and worth. Moran's I, an indicator of spatial autocorrelation, revealed statistically significant clustering of the residuals of these models, so Geographically Weighted Regression (GWR) models were calculated, in order to adjust for underlying spatial processes within the data and investigate the geographic variation in the association between local greenspace and mental wellbeing. The global GWR model revealed that an increase of 1 ha of greenspace within 300 m of residents was associated with a statistically significant 0.803 increase in life satisfaction, and increases of 0.740 and 0.521 for worth and happiness, respectively. This therefore provides some support for the inclusion of greenspace within 300 m of homes. Local GWR coefficients revealed slight variation in the strength of these associations across the study space. Therefore, further analyses are required to investigate whether the walking (network) distance, absolute size, or type of each greenspace can explain this spatial variation.

• Associations between local-area greenspace and mental wellbeing are mixed.
• UK Government recommends greenspace within 300 m of individuals.
• Greenspace within buffers of individuals' homes was calculated.
• Greenspace within 300 m is associated with improved mental wellbeing.
• This association varies spatially.
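Geographically Weighted Regression, used in the study above, fits a separate regression at each location, weighting observations by a spatial kernel so that nearby data dominate the local coefficient estimates. A toy single-location sketch, in which the variable names and the Gaussian bandwidth are illustrative assumptions rather than the study's model:

```python
# Toy geographically weighted regression at one target location: a
# weighted least-squares fit of y ~ b0 + b1*x, where each observation
# is weighted by a Gaussian kernel of its distance from the target.
import math

def gwr_coeffs(coords, x, y, target, bandwidth):
    """Locally weighted fit of y ~ b0 + b1*x at `target`; returns (b0, b1)."""
    w = [math.exp(-(math.dist(c, target) ** 2) / (2 * bandwidth ** 2))
         for c in coords]
    # Weighted normal equations for the two-parameter model.
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1
```

Evaluating this at every observation point yields the map of local coefficients whose spatial variation the abstract describes.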

Steven A Wright, Simon D Hammond, Simon J Pennycook, Stephen A Jarvis (2011) Light-weight parallel I/O analysis at scale

Input/output (I/O) operations can represent a significant proportion of the run-time when large scientific applications are run in parallel. Although there have been advances in the form of file-format libraries, file system design and I/O hardware, a growing divergence exists between the performance of parallel file systems and compute processing rates. In this paper we utilise RIOT, an input/output tracing toolkit being developed at the University of Warwick, to assess the performance of three standard industry I/O benchmarks and mini-applications. We present a case study demonstrating the tracing and analysis capabilities of RIOT at scale, using MPI-IO, Parallel HDF-5 and MPI-IO augmented with the Parallel Log-structured File System (PLFS) middle-ware being developed by the Los Alamos National Laboratory.

Stephen Jarvis (2016) Foreword, In: 2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) pp. v-vi IEEE

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Liam Steadman, Nathan Griffiths, Stephen Jarvis, Mark Bell, Shaun Helman, Caroline Wallbank (2021) kD-STR: A Method for Spatio-Temporal Data Reduction and Modelling, In: ACM/IMS Transactions on Data Science 2(3) pp. 1-31

Analysing and learning from spatio-temporal datasets is an important process in many domains, including transportation, healthcare and meteorology. In particular, data collected by sensors in the environment allows us to understand and model the processes acting within the environment. Recently, the volume of spatio-temporal data collected has increased significantly, presenting several challenges for data scientists. Methods are therefore needed to reduce the quantity of data that needs to be processed in order to analyse and learn from spatio-temporal datasets. In this article, we present the k-Dimensional Spatio-Temporal Reduction method (kD-STR) for reducing the quantity of data used to store a dataset whilst enabling multiple types of analysis on the reduced dataset. kD-STR uses hierarchical partitioning to find spatio-temporal regions of similar instances, and models the instances within each region to summarise the dataset. We demonstrate the generality of kD-STR with three datasets exhibiting different spatio-temporal characteristics and present results for a range of data modelling techniques. Finally, we compare kD-STR with other techniques for reducing the volume of spatio-temporal data. Our results demonstrate that kD-STR is effective in reducing spatio-temporal data and generalises to datasets that exhibit different properties.
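The partition-and-model idea behind kD-STR can be illustrated with a one-dimensional toy: recursively split a series until each region is summarised acceptably by a simple model (here, just its mean), then store only the region boundaries and models. This is our own simplification, not the published k-dimensional algorithm, which supports richer per-region models and error measures:

```python
# Toy hierarchical reduction of a 1-D series: split regions until the
# worst absolute deviation from the region mean is within `tol`, then
# keep only (lo, hi, mean) summaries. Illustrative sketch only.
def reduce_series(points, tol):
    """points: list of (pos, value) sorted by pos. Returns [(lo, hi, mean)]."""
    values = [v for _, v in points]
    mean = sum(values) / len(values)
    err = max(abs(v - mean) for v in values)
    if err <= tol or len(points) == 1:
        return [(points[0][0], points[-1][0], mean)]
    mid = len(points) // 2
    return reduce_series(points[:mid], tol) + reduce_series(points[mid:], tol)
```

A loose tolerance collapses the whole series to one region; a tight tolerance splits only where the data actually vary, which is the storage/accuracy trade-off the article studies.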

Xiaolong Jin, Ahmed Y. Al-Dubai, Stephen Jarvis (2012) Message from IUCC Program Chairs, In: 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications pp. lxv-lxv IEEE

Presents the welcome message from the conference proceedings.

S. D. Hammond, G. R. Mudalige, J. A. Smith, J. A. Davis, S. A. Jarvis, J. Holt, I. Miller, J. A. Herdman, A. Vadgama (2010) To upgrade or not to upgrade? Catamount vs. Cray Linux Environment, In: 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum 5470885 pp. 1-8

Stephen Jarvis (2019) Message from the Workshop Chairs, In: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings pp. 5-6 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

Conference: 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 18 November 2019, Denver, CO, USA. Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Stephen Jarvis (2020) Message from the ISPA-2020 Program Chairs ISPA-BDCloud-SocialCom-SustainCom 2020, In: 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) pp. xxxvi-xxxvi IEEE

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Lori A S Snyder, Stephen A Jarvis, Nigel J Saunders (2005) Complete and variant forms of the 'gonococcal genetic island' in Neisseria meningitidis, In: Microbiology (Society for General Microbiology) 151(Pt 12) pp. 4005-4013

Comparative genome hybridization using the pan-Neisseria microarray identified genes from the gonococcal genetic island (GGI) within Neisseria meningitidis strains of serogroups W-135, H, and Z. While some of these strains contain nearly all of the genes of the GGI, there are differences in the presence of some of these genes between the strains, including between those of the same serogroup. Attempts were then made to determine the location of the GGI in these meningococci. Sequencing of Neisseria gonorrhoeae strain MS11 revealed that the GGI is a conjugative plasmid that can be chromosomally integrated at the dif sites near ung and can also be present in its circularized form. In N. meningitidis, a dif site is present in this location and also serves as the point of chromosomal integration of the GGI in this species.

Nataliya Tkachenko, Sarunkorn Chotvijit, Neha Gupta, Emma Bradley, Charlotte Gilks, Weisi Guo, Henry Crosby, Eliot Shore, Malkiat Thiarai, Rob Procter, Stephen Jarvis (2017) Google Trends can improve surveillance of Type 2 diabetes, In: Scientific Reports 7(1) 4993

Recent studies demonstrate that people are increasingly looking online to assess their health, with reasons varying from personal preferences and beliefs to inability to book a timely appointment with their local medical practice. Records of these activities represent a new source of data about the health of populations, one currently unaccounted for by disease surveillance models. This could potentially be useful as evidence of individuals' perception of bodily changes and self-diagnosis of early symptoms of an emerging disease. We make use of the Experian geodemographic Mosaic dataset in order to extract Type 2 diabetes candidate risk variables and compare their temporal relationships with the search keywords used to describe early symptoms of the disease on Google. Our results demonstrate that Google Trends can detect early signs of diabetes by monitoring combinations of keywords associated with searches for hypertension treatment and poor living conditions. Combined search semantics related to obesity, quitting smoking and improving living conditions (deprivation) can also be employed, but may lead to less accurate results.

H. N. Lim Choi Keung, J. R. D. Dyson, S. A. Jarvis, G. R. Nudd (2003) Performance evaluation of a grid resource monitoring and discovery service, In: IEE Proceedings - Software 150(4) pp. 243-251 Institution of Electrical Engineers

Nataliya Tkachenko, Rob Procter, Stephen Jarvis (2016) Predicting the impact of urban flooding using open data, In: Royal Society Open Science 3(5) 160013 The Royal Society

This paper aims to explore whether there is a relationship between search patterns for flood risk information on the Web and how badly localities have been affected by flood events. We hypothesize that localities where people stay more actively informed about potential flooding experience less negative impact than localities where people make less effort to be informed. Being informed, of course, does not hold the waters back; however, it may stimulate (or serve as an indicator of) such resilient behaviours as timely use of sandbags, relocation of possessions from basements to upper floors and/or temporary evacuation from flooded homes to alternative accommodation. We make use of open data to test this relationship empirically. Our results demonstrate that although aggregated Web search reflects average rainfall patterns, its eigenvectors predominantly consist of locations with similar flood impacts during 2014–2015. These results are also consistent with statistically significant correlations of Web search eigenvectors with flood warning and incident reporting datasets.

Nigel J Saunders, Prawit Boonmee, John F Peden, Stephen A Jarvis (2005) Inter-species horizontal transfer resulting in core-genome and niche-adaptive variation within Helicobacter pylori, In: BMC Genomics 6(1) 9 BioMed Central

R. G. Morgan, S. A. Jarvis (1998) Profiling large-scale lazy functional programs, In: Journal of Functional Programming 8(3) pp. 201-237 Cambridge University Press

The LOLITA natural language processor is an example of one of the ever-increasing number of large-scale systems written entirely in a functional programming language. The system consists of over 47,000 lines of Haskell code (excluding comments) and is able to perform a wide range of tasks such as semantic and pragmatic analysis of text, information extraction and query analysis. The efficiency of such a system is critical; interactive tasks (such as query analysis) must ensure that the user is not inconvenienced by long pauses, and batch mode tasks (such as information extraction) must ensure that an adequate throughput can be achieved. For the past three years the profiling tools supplied with GHC and HBC have been used to analyse and reason about the complexity of the LOLITA system. There have been good results; however, experience has shown that in a large system the profiling life-cycle is often too long to make detailed analysis possible, and the results are often misleading. In response to these problems a profiler has been developed which allows the complete set of program costs to be recorded in so-called cost-centre stacks. These program costs are then analysed using a post-processing tool to allow the developer to explore the costs of the program in ways that are either not possible with existing tools or would require repeated compilations and executions of the program. The modifications to the Glasgow Haskell compiler based on detailed cost semantics and an efficient implementation scheme are discussed. The results of using this new profiling tool in the analysis of a number of Haskell programs are also presented. The overheads of the scheme are discussed and the benefits of this new system are considered. An outline is also given of how this approach can be modified to assist with the tracing and debugging of programs.

Stephen Jarvis (2011)Message from the General Co-chairs and Program Co-chairs, In: 2011 Second International Conference on Networking and Distributed Computingpp. xi-xii IEEE

Presents the welcome message from the conference proceedings.

Nataliya Tkachenko, Stephen Jarvis, Rob Procter (2017) Predicting floods with Flickr tags, In: PLoS ONE 12(2) e0172870

Increasingly, user generated content (UGC) in social media postings and its associated metadata, such as time and location stamps, are being used to provide useful operational information during natural hazard events such as hurricanes, storms and floods. The main advantages of these new sources of data are twofold. First, in a purely additive sense, they can provide much denser geographical coverage of the hazard as compared to traditional sensor networks. Second, they provide what physical sensors are not able to: by documenting personal observations and experiences, they directly record the impact of a hazard on the human environment. For this reason, interpretation of the content (e.g., hashtags, images, text, emojis) and metadata (e.g., keywords, tags, geolocation) has been a focus of much research into social media analytics. However, as choices of semantic tags in the current methods are usually reduced to the exact name or type of the event (e.g., hashtags '#Sandy' or '#flooding'), the main limitation of such approaches remains their mere nowcasting capacity. In this study we make use of polysemous tags of images posted during several recent flood events and demonstrate how such volunteered geographic data can be used to provide early warning of an event before its outbreak.

Stephen A. Jarvis, Richard G. Morgan (1997) The results of: Profiling large-scale lazy functional programs, In: Implementation of Functional Languages - 8th International Workshop, IFL 1996, Selected Papers 1268 pp. 200-221

Stephen Jarvis (2010) Message from the ScalCom-2010 General Chairs, In: 2010 IEEE 10th International Conference on Computer and Information Technology pp. lxxix-lxxix IEEE

Presents the welcome message from the conference proceedings.

M. Al-Ghamdi, A. P. Chester, L. He, S. A. Jarvis, J. W. J. Xue (2011) Dynamic Active Window Management: A Method for Improving Revenue Generation in Dynamic Enterprise Systems, In: 2011 IEEE 11th International Conference on Computer and Information Technology 6036586 pp. 23-28 IEEE

In dynamic resource allocation systems, servers are moved between pools when overloading is detected. In this work, we investigate the impact on such systems of combining three adaptive monitoring techniques. First, we employ two well-known switching policies, the Proportional Switching Policy (PSP) and the Bottleneck Aware Switching Policy (BSP), to move servers between server pools as appropriate. Second, we use a meta-forecasting technique to predict the movement in future system workload. Third, we use a Dynamic Active Window Model (DAWM), which defines the period over which workload data is analysed. We have previously shown that request servicing capability can be improved by as much as 40% when the right combination of dynamic server switching and workload forecasting is used. This extended model shows that a further 51.5% improvement can be achieved when the server switching policy, meta-forecasting and dynamic active window management are employed together over a real-world workload based on Internet traces.

Stephen Jarvis (2017) Message from FCST 2017 Program Chairs, In: Proceedings / International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN) pp. xxv-xxv IEEE

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Matthew T. Bettencourt, Dominic A.S. Brown, Keith L. Cartwright, Eric C. Cyr, Christian A. Glusa, Paul T. Lin, Stan G. Moore, Duncan A.O. McGregor, Roger P. Pawlowski, Edward G. Phillips, Nathan V. Roberts, Steven A. Wright, Satheesh Maheswaran, John P. Jones, Stephen A. Jarvis (2021) EMPIRE-PIC: A performance portable unstructured particle-in-cell code, In: Communications in Computational Physics 30(4) pp. 1232-1268

Liam Steadman, Nathan Griffiths, Stephen Jarvis, Mark Bell, Shaun Helman, Caroline Wallbank (2020) Reducing and linking spatio-temporal datasets with kD-STR, In: Bandana Kar, Xinyue Ye, Shima Mohebbi, Guangtao Fu (eds.), Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities pp. 10-19 ACM

When linking spatio-temporal datasets, the kD-STR algorithm can be used to reduce the datasets and speed up the linking process. However, kD-STR can sacrifice accuracy in the linked dataset whilst retaining unnecessary information. To overcome this, we propose a preprocessing step that removes unnecessary information and an alternative heuristic for kD-STR that prioritises accuracy in the linked output. These are evaluated in a case study linking a road condition dataset with air temperature, rainfall and road traffic data. In this case study, we found the alternative heuristic achieved a 19% improvement in mean error for the linked air temperature features and an 18% reduction in storage used for the rainfall dataset compared to the original kD-STR heuristic. The results in this paper support our hypothesis that, at worst, our alternative heuristic will yield a similar error and storage overhead for linking scenarios as the original kD-STR heuristic. However, in some cases it can produce a linked dataset that is more accurate whilst using less storage than the original kD-STR algorithm.

Stephen Jarvis (2020) Message from the Workshop Chairs, In: 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) pp. 6-7 IEEE

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

Victoria Houlden, Scott Weich, João Porto de Albuquerque, Stephen Jarvis, Karen Rees (2018) The relationship between greenspace and the mental wellbeing of adults: A systematic review, In: PLoS ONE 13(9) e0203000

The view that interacting with nature enhances mental wellbeing is commonplace, despite a dearth of evidence or even agreed definitions of 'nature'. The aim of this review was to systematically appraise the evidence for associations between greenspace and mental wellbeing, stratified by the different ways in which greenspace has been conceptualised in quantitative research. We undertook a comprehensive database search and thorough screening of articles which included a measure of greenspace and a validated mental wellbeing tool, to capture aspects of hedonic and/or eudaimonic wellbeing. Quality and risk of bias in research were assessed to create grades of evidence. We undertook detailed narrative synthesis of the 50 studies which met the review inclusion criteria, as methodological heterogeneity precluded meta-analysis. Results of a quality assessment and narrative synthesis suggest associations between different greenspace characteristics and mental wellbeing. We identified six ways in which greenspace was conceptualised and measured: (i) amount of local-area greenspace; (ii) greenspace type; (iii) visits to greenspace; (iv) views of greenspace; (v) greenspace accessibility; and (vi) self-reported connection to nature. There was adequate evidence for associations between the amount of local-area greenspace and life satisfaction (hedonic wellbeing), but not personal flourishing (eudaimonic wellbeing). Evidence for associations between mental wellbeing and visits to greenspace, accessibility, and types of greenspace was limited. There was inadequate evidence for associations with views of greenspace and connectedness to nature. Several studies reported variation in associations between greenspace and wellbeing by life course stage, gender, levels of physical activity or attitudes to nature.
Greenspace has positive associations with mental wellbeing (particularly hedonic wellbeing), but the evidence is not currently sufficient or specific enough to guide planning decisions. Further studies are needed, based on dynamic measures of greenspace, reflecting access and uses of greenspace, and measures of both eudaimonic and hedonic mental wellbeing.

Stephen Jarvis (2009) Preface, In: 2009 First International Conference on Advances in P2P Systems pp. ix-x IEEE

Presents the welcome message from the conference proceedings.

Alessio Pagani, Guillem Mosquera, Aseel Alturki, Samuel Johnson, Stephen Jarvis, Alan Wilson, Weisi Guo, Liz Varga (2019) Resilience or robustness: identifying topological vulnerabilities in rail networks, In: Royal Society Open Science 6(2) 181301 The Royal Society

Many critical infrastructure systems have network structures and are under stress. Despite their national importance, the complexity of large-scale transport networks means that we do not fully understand their vulnerabilities to cascade failures. The research conducted through this paper examines the interdependent rail networks in Greater London and the surrounding commuter area. We focus on the morning commuter hours, when the system is under the most demand stress. There is increasing evidence that the topological shape of the network plays an important role in dynamic cascades. Here, we examine whether the different topological measures of resilience (stability) or robustness (failure) are more appropriate for understanding poor railway performance. The results show that resilience, not robustness, has a strong correlation with the consumer experience statistics. Our results are a way of describing the complexity of cascade dynamics on networks without the involvement of detailed agent-based models, showing that cascade effects are more responsible for poor performance than failures. The network science analysis hints at pathways towards making the network structure more resilient by reducing feedback loops.

Konstantin Klemmer, Tobias Brandt, Stephen Jarvis (2018) Isolating the effect of cycling on local business environments in London, In: PLoS ONE 13(12) e0209090

We investigate whether increasing cycling activity affects the emergence of new local businesses. Historical amenity data from OpenStreetMap is used to quantify change in shop and sustenance amenity counts. We apply an instrumental variable framework to investigate a causal relationship and to account for endogeneity in the model. Measures of cycling infrastructure serve as instruments. The impact is evaluated on the level of 4835 Lower Super Output Areas in Greater London. Our results indicate that an increase in cycling trips significantly contributes to the emergence of new local shops and businesses. Limitations regarding data quality, zero-inflation and residual spatial autocorrelation are discussed. While our findings correspond to previous investigations stating positive economic effects of cycling, we advance research in the field by providing a new dataset of unprecedented high granularity and size. Furthermore, this is the first study in cycling research looking at business amenities as a measure of economic activity. The insights from our analysis can enhance understandings of how cycling affects the development of local urban economies and may thus be used to assess and evaluate transport policies and investments. Beyond this, our study highlights the value of open data in city research.

S.A. Jarvis (2003) Editorial: Performance engineering, In: IEE Proceedings - Software 150(4)

Stephen Jarvis (2015) Message from the PICom 2015 Program Chairs, In: 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing pp. xlvii-xlvii IEEE

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.

A. P. Chester, M. Leeke, M. Al-Ghamdi, A. Jhumka, S. A. Jarvis (2011) A Framework for Data Center Scale Dynamic Resource Allocation Algorithms, In: 2011 IEEE 11th International Conference on Computer and Information Technology 6036593 pp. 67-74 IEEE

The scale and complexity of online applications and e-business infrastructures has led service providers to rely on the capabilities of large-scale hosting platforms, i.e., data centers. Dynamic resource allocation (DRA) algorithms have been shown to allow server resource allocation to be matched with application workload, which can improve server resource utilisation and drive down costs for service providers. However, research on DRA algorithms has almost exclusively focused on their performance characteristics at small-scale, precluding their useful application in commercial hosting environments, such as those dedicated to supporting cloud computing. In this paper, we show, and subsequently propose a framework to address, the scalability problems of current DRA algorithms. We then build on the proposed framework to develop a highly-scalable algorithm for dynamic resource allocation.

Andrew Lamzed-Short, Timothy R Law, Andrew Mallinson, Gihan R Mudalige, Stephen A Jarvis (2021) Towards automated kernel fusion for the optimisation of scientific applications IEEE

In this paper we introduce a novel transformation pass written using LLVM that performs kernel fusion. We demonstrate the correctness and performance of the pass on several example programs inspired by scientific applications of interest. The method achieves up to 4× speedup relative to unfused versions of the programs, and exact performance parity with manually fused versions. In contrast to previous work, it also requires minimal user intervention. Our approach is facilitated by a new loop fusion algorithm capable of interprocedurally fusing both skewed and unskewed loops in different kernels.

Guang Tan, Stephen A Jarvis, Anne-Marie Kermarrec (2009) Connectivity-guaranteed and obstacle-adaptive deployment schemes for mobile sensor networks 8(6) IEEE

Mobile sensors can relocate and self-deploy into a network. While focusing on the problems of coverage, existing deployment schemes largely over-simplify the conditions for network connectivity: they either assume that the communication range is large enough for sensors in geometric neighborhoods to obtain location information through local communication, or they assume a dense network that remains connected. In addition, an obstacle-free field or full knowledge of the field layout is often assumed. We present new schemes that are not governed by these assumptions, and thus adapt to a wider range of application scenarios. The schemes are designed to maximize sensing coverage and also guarantee connectivity for a network with arbitrary sensor communication/sensing ranges or node densities, at the cost of a small moving distance. The schemes do not need any knowledge of the field layout, which can be irregular and have obstacles/holes of arbitrary shape. Our first scheme is an enhanced form of the traditional virtual-force-based method, which we term the Connectivity-Preserved Virtual Force (CPVF) scheme. We show that the localized communication, which is the very reason for its simplicity, results in poor coverage in certain cases. We then describe a Floor-based scheme which overcomes the difficulties of CPVF and, as a result, significantly outperforms it and other state-of-the-art approaches. Throughout the paper our conclusions are corroborated by the results from extensive simulations.
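The virtual-force idea underlying the CPVF scheme above can be sketched simply: each sensor feels a spring-like force from every neighbour, repulsive when the neighbour is closer than the desired spacing and attractive when it is further away, nudging the network towards even coverage. The constants and update rule below are illustrative assumptions; the paper's CPVF scheme adds connectivity-preservation constraints on top of such a step:

```python
# One iteration of a toy virtual-force movement step for mobile sensors
# in the plane. Positive force magnitude attracts (d > spacing),
# negative repels (d < spacing). Illustrative sketch only.
import math

def virtual_force_step(positions, spacing, gain=0.1):
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            d = math.dist((xi, yi), (xj, yj))
            if d == 0:
                continue
            mag = gain * (d - spacing)
            fx += mag * (xj - xi) / d
            fy += mag * (yj - yi) / d
        new_positions.append((xi + fx, yi + fy))
    return new_positions
```

Iterating this step spreads a dense cluster out and pulls distant sensors together; the difficulty the paper addresses is doing so without ever disconnecting the network.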

Wilson M Tan, Stephen A Jarvis (2016)Heuristic solutions to the target identifiability problem in directional sensor networks, In: Journal of Network and Computer Applications 65 pp. 84-102 Academic Press

Existing algorithms for orienting sensors in directional sensor networks have primarily concerned themselves with the problem of maximizing the number of covered targets, assuming that target identification is a non-issue. Such an assumption, however, does not hold true in all situations. In this paper, heuristic algorithms for choosing active sensors and orienting them with the goal of balancing coverage and identifiability are presented. The performance of the algorithms is verified via extensive simulations, and they are shown to confer increased target identifiability compared to algorithms originally designed simply to maximize the number of targets covered.

Dean G Chester, Steven A Wright, Stephen A Jarvis (2018)Understanding communication patterns in HPCG, In: Electronic Notes in Theoretical Computer Science 340 pp. 55-65 Elsevier Science BV

Conjugate Gradient (CG) algorithms form a large part of many HPC applications; examples include bioinformatics and weather applications. These algorithms allow numerical solutions to complex linear systems. Understanding how distributed implementations of these algorithms use a network interconnect will allow system designers to gain a deeper insight into their exacting requirements for existing and future applications. This short paper documents our initial investigation into the communication patterns present in the High Performance Conjugate Gradient (HPCG) benchmark. Through our analysis, we identify patterns and features which may warrant further investigation to improve the performance of CG algorithms and applications which make extensive use of them. In this paper, we capture communication traces from runs of the HPCG benchmark at a variety of different processor counts and then examine this data to identify potential performance bottlenecks. Initial results show that there is a fall in the throughput of the network when more processes are communicating with each other, due to network contention.
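
For reference, the kernel that HPCG distributes is the textbook conjugate gradient iteration; a minimal unpreconditioned serial sketch (our own, omitting HPCG's multigrid preconditioner and the halo exchanges whose traffic the paper traces) is:

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A (dense lists)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r = b - A x, with x = 0
    p = list(r)          # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        # Matrix-vector product: in the distributed case this needs
        # neighbour halo exchanges for off-process matrix columns
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        # The two dot products below become global reductions (MPI_Allreduce)
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

The comments mark the two communication patterns (point-to-point halo exchange and global reduction) whose contention behaviour the traces expose.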

Lei Zhao, Stephen A. Jarvis (2007)Predictive performance modelling of parallel component compositions, In: Cluster Computing 10(2) pp. 155-166 Springer

Large-scale scientific computing applications frequently make use of closely-coupled distributed parallel components. The performance of such applications is therefore dependent on the component parts and their interaction at run-time. This paper describes a methodology for predictive performance modelling and evaluation of parallel applications composed of multiple interacting components. The fundamental steps and required operations involved in the modelling and evaluation process are identified, including component decomposition, component model combination, M x N communication modelling, dataflow analysis and overall performance evaluation. A case study is presented to illustrate the modelling process and the methodology is verified through experimental analysis.

Bo Gao, Ligang He, Stephen A Jarvis (2016)Offload decision models and the price of anarchy in mobile cloud application ecosystems, In: IEEE Access 3 pp. 3125-3137 IEEE

With the maturity of technologies such as HTML5 and JavaScript, and with the increasing popularity of cross-platform frameworks such as Apache Cordova, mobile cloud computing as a new design paradigm of mobile application development is becoming increasingly accessible to developers. Following this trend, future on-device mobile application ecosystems will not only comprise a mixture of native and remote applications, but also include multiple hybrid mobile cloud applications. The resource competition in such ecosystems and its impact on the performance of mobile cloud applications has not yet been studied. In this paper, we study this competition from a game-theoretical perspective and examine how it affects the behavior of mobile cloud applications. Three offload decision models of cooperative and non-cooperative nature are constructed and their efficiency compared. We present an extension to the classic load balancing game to model the offload behaviors within a non-cooperative environment. Mixed-strategy Nash equilibria are derived for the non-cooperative offload game with complete information, which further quantifies the price of anarchy in such ecosystems. We present simulation results that demonstrate the differences between each decision model's efficiency. Our modeling approach facilitates further research in the design of the offload decision engines of mobile cloud applications. Our extension to the classic load balancing game broadens its applicability to real-life applications.

Robert F Bird, Patrick Gillies, Michael R Bareford, Andy Herdman, Stephen A Jarvis (2018)Performance Optimisation of Inertial Confinement Fusion Codes using Mini-applications, In: The International Journal of High Performance Computing Applications 32(4) pp. 570-581 SAGE Publications

Despite the recent successes of nuclear energy researchers, the scientific community still remains some distance from being able to create controlled, self-sustaining fusion reactions. Inertial Confinement Fusion (ICF) techniques represent one possible option to surpass this barrier, with scientific simulation playing a leading role in guiding and supporting their development. The simulation of such techniques allows for safe and efficient investigation of laser design and pulse shaping, as well as providing insight into the reaction as a whole. The research presented here focuses on the simulation code EPOCH, a fully relativistic particle-in-cell plasma physics code concerned with faithfully recreating laser-plasma interactions at scale. A significant challenge in developing large codes like EPOCH is maintaining effective scientific delivery on successive generations of high-performance computing architecture. To support this process, we adopt the use of mini-applications - small code proxies that encapsulate important computational properties of their larger parent counterparts. Through the development of a mini-application for EPOCH (called miniEPOCH), we investigate a variety of the performance features exhibited in EPOCH, expose opportunities for optimisation and increased scientific capability, and offer our conclusions to guide future changes to similar ICF codes.

Wilson M. Tan, Paul Sullivan, Hamish Watson, Joanna Slota-Newson, Stephen A. Jarvis (2017)An indoor test methodology for solar-powered wireless sensor networks, In: ACM transactions on embedded computing systems 16(3) 82 pp. 1-25 ACM

Repeatable and accurate tests are important when designing hardware and algorithms for solar-powered wireless sensor networks (WSNs). Since no two days are exactly alike with regard to energy harvesting, tests must be carried out indoors. Solar simulators are traditionally used in replicating the effects of sunlight indoors; however, solar simulators are expensive, have lighting elements with short lifetimes, and are usually not designed to carry out the types of tests that hardware and algorithm designers require. As a result, hardware and algorithm designers use tests that are inaccurate and not repeatable (both for others and for the designers themselves). In this article, we propose an indoor test methodology that does not rely on solar simulators. The test methodology has its basis in astronomy and photovoltaic cell design. We present a generic design for a test apparatus that can be used in carrying out the test methodology. We also present a specific design that we use in implementing an actual test apparatus. We test the efficacy of our test apparatus and, to demonstrate the usefulness of the test methodology, perform experiments akin to those required in projects involving solar-powered WSNs. Results of these tests and experiments demonstrate that the test methodology is an invaluable tool for hardware and algorithm designers working with solar-powered WSNs.

Steven A. Wright, Stephen A. Jarvis (2015)Quantifying the Effects of Contention on Parallel File Systems, In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop 7284412 pp. 932-940 IEEE

As we move towards the exascale era of supercomputing, node-level failures are becoming more commonplace, and frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year-on-year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a bottleneck to performance. Using current file systems in the most efficient way possible will alleviate some of these issues and will help prepare developers and system designers for exascale; unfortunately, many domain scientists simply submit their jobs with the default file system configuration. In this paper, we analyse previous work on finding optimality on Lustre file systems, demonstrating that by exposing parallelism in the parallel file system, performance can be improved by up to 49×. However, we demonstrate that on systems where many applications are competing for a finite number of object storage targets (OSTs), competing tasks may reduce optimal performance considerably. We show that reducing each job's request for OSTs by 40% decreases performance by only 13%, while increasing the availability and quality of service of the file system. Further, we present a series of metrics designed to analyse and explain the effects of contention on parallel file systems. Finally, we re-evaluate our previous work with the Parallel Log-structured File System (PLFS), comparing it to Lustre at various scales. We show that PLFS may perform better than Lustre in particular configurations, but that at large scale PLFS becomes a bottleneck to performance. We extend the metrics proposed in this paper to explain these performance deficiencies, demonstrating that the software creates high levels of self-contention at scale.
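
The trade-off measured here (40% fewer OSTs per job costing only 13% of performance) can be illustrated with a deliberately crude bandwidth-sharing model; the numbers below come from this toy model of ours, not from the paper:

```python
def effective_bandwidth(osts_requested, osts_total, concurrent_jobs, bw_per_ost):
    """Toy contention model: each job stripes over `osts_requested` of
    `osts_total` OSTs; each OST's bandwidth is split between the expected
    number of jobs landing on it."""
    expected_sharers = max(1.0, concurrent_jobs * osts_requested / osts_total)
    return osts_requested * bw_per_ost / expected_sharers

# A lone job benefits from a wide stripe...
solo_wide   = effective_bandwidth(10, 100, 1, 500.0)   # 5000.0 MB/s
solo_narrow = effective_bandwidth(6, 100, 1, 500.0)    # 3000.0 MB/s

# ...but under contention the narrower stripe costs nothing in this model,
# while leaving more OSTs free for other jobs
busy_wide   = effective_bandwidth(10, 100, 20, 500.0)  # 2500.0 MB/s
busy_narrow = effective_bandwidth(6, 100, 20, 500.0)   # 2500.0 MB/s
```

Even this linear model captures the qualitative point: once many jobs share the OST pool, a job's private stripe width matters far less than the aggregate load.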

J. A. Herdman, W. P. Gaudin, S. McIntosh-Smith, M. Boulton, D. A. Beckingsale, A. C. Mallinson, S. A. Jarvis (2012)Accelerating hydrocodes with OpenACC, OpenCL and CUDA, In: Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 6495848 pp. 465-471 IEEE

Hardware accelerators such as GPGPUs are becoming increasingly common in HPC platforms and their use is widely recognised as being one of the most promising approaches for reaching exascale levels of performance. Large HPC centres, such as AWE, have made huge investments in maintaining their existing scientific software codebases, the vast majority of which were not designed to effectively utilise accelerator devices. Consequently, HPC centres will have to decide how to develop their existing applications to take best advantage of future HPC system architectures. Given limited development and financial resources, it is unlikely that all potential approaches will be evaluated for each application. We are interested in how this decision making can be improved, and this work seeks to directly evaluate three candidate technologies (OpenACC, OpenCL and CUDA) in terms of performance, programmer productivity, and portability using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. We find that OpenACC is an extremely viable programming model for accelerator devices, improving programmer productivity and achieving better performance than OpenCL and CUDA.

Neha Gupta, Henry James Crosby, David Purser, Stephen A Jarvis, Weisi Guo (2018)Twitter usage across industry: a spatiotemporal analysis IEEE

High resolution social media data presents an opportunity to better understand people's behavioural patterns and sentiment. Whilst significant work has been conducted in various targeted social contexts, very little is understood about differentiated behaviour in different industrial sectors. In this paper, we present results on how social media usage and general sentiment vary across the geographic and industry sector landscape. Unlike existing studies, we use a novel geocomputational approach to link location-specific Twitter data with business sectors by leveraging the UK Standard Industrial Classification Code (SIC Code). Our baseline results for the Greater London area identify that the Construction, Real Estate, Transport and Financial Services industries consistently have stronger Twitter footprints. We go on to apply natural language processing (NLP) techniques to understand the prevailing sentiment within each business sector and discuss how the evidence can contribute towards de-biasing Twitter data. We believe this research will prove a valuable surveillance tool for policy makers and service providers to monitor ongoing sentiment in different industry sectors and perceive the impact of new policies, and that it can be used as a low-cost alternative to survey methods in organisational studies.

Ligang He, James Wenjun Xue, Stephen A. Jarvis (2007)Partition-based profit optimisation for multi-class requests in clusters of servers, In: IEEE International Conference on e-Business Engineering ICEBE 2007: proceedings: IEEE International Workshop on Service-Oriented Applications, Integration and Collaboration: SOAIC 2007: IEEE International Workshop on Service-Oriented System Engineering 4402084 pp. 131-138 IEEE Computer Society

This paper investigates profit optimisation by the partitioning of server pools. Different types of web requests are considered: best-effort requests and multi-class QoS-demanding requests. Each request is associated with a certain amount of revenue. The system earns the associated revenue when completing a best-effort request, while the revenue earned for serving a QoS-demanding request depends on the degree of satisfaction. The complete server pool is divided into two clusters of servers, each dedicated to serving one type of request. The total profits accrued when serving these requests are mathematically modelled in this paper. The optimisation equations for the pool partitioning have also been constructed so that the maximum profit can be achieved. An optimal server switching policy is also developed; it is optimal in the sense that the maximum profit can be maintained by switching servers from one partition to the other. Costs involved in server switching are factored into the switching model. Supportive experimentation based on simulation has been conducted and the results verify the effectiveness of the pool partitioning policy and the server switching policy developed in this paper.

S. A. Wright, S. J. Pennycook, S. A. Jarvis (2012)Towards the Automated Generation of Hard Disk Models through Physical Geometry Discovery, In: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC) 6495845 pp. 437-445 IEEE

As the High Performance Computing industry moves towards the exascale era of computing, parallel scientific and engineering applications are becoming increasingly complex. The use of simulation allows us to predict how an application's performance will change with the adoption of new hardware or software, helping to inform procurement decisions. In this paper, we present a disk simulator designed to predict the performance of read and write operations to a single hard disk drive (HDD). Our simulator uses a geometry discovery benchmark (Diskovery) in order to estimate the data layout of the HDD, as well as the time spent moving the read/write head. We validate our simulator against two different HDDs, using a benchmark designed to simulate common disk read and write patterns, demonstrating accuracy to within 5% of the observed I/O time for sequential operations, and to within 10% of the observed time for seek-heavy workloads.

David A Beckingsale, O. F. J Perks, W. P Gaudin, J. A Herdman, Stephen A Jarvis (2012)Optimisation of patch distribution strategies for AMR applications

As core counts increase in the world's most powerful supercomputers, applications are becoming limited not only by computational power, but also by data availability. In the race to exascale, efficient and effective communication policies are key to achieving optimal application performance. Applications using adaptive mesh refinement (AMR) trade off communication for computational load balancing, to enable the focused computation of specific areas of interest. This class of application is particularly susceptible to the communication performance of the underlying architectures, and is inherently difficult to scale efficiently. In this paper we present a study of the effect of patch distribution strategies on the scalability of an AMR code. We demonstrate the significance of patch placement on communication overheads, and by balancing the computation and communication costs of patches, we develop a scheme to optimise the performance of a specific, industry-strength benchmark application.
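
A simple compute-only baseline for the patch-placement problem (our sketch, not the paper's scheme, which also weighs communication cost between patches) is the longest-processing-time greedy heuristic:

```python
import heapq

def distribute_patches(patch_costs, n_ranks):
    """Assign each patch, most expensive first, to the currently
    least-loaded rank; returns {rank: [patch indices]}."""
    loads = [(0.0, rank) for rank in range(n_ranks)]
    heapq.heapify(loads)
    assignment = {rank: [] for rank in range(n_ranks)}
    for cost, patch in sorted(((c, i) for i, c in enumerate(patch_costs)),
                              reverse=True):
        load, rank = heapq.heappop(loads)
        assignment[rank].append(patch)
        heapq.heappush(loads, (load + cost, rank))
    return assignment
```

The paper's point is that minimising compute imbalance alone is insufficient: placement must also keep communicating patches close, which a cost model combining both terms can capture.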

Stephen A. Jarvis, Sanjay Poria, Rick G. Morgan (2005)Experiences in teaching large scale functional programming, In: Pieter H. Hartel, Rinus Plasmeijer (eds.), Functional Programming Languages in Education pp. 103-119 Springer Berlin Heidelberg

LOLITA is a large-scale natural language processing system written in the functional language Haskell. It consists of over 47,000 lines of code spread across 180 different modules. There are currently 20 people working on the system, most of whom are Ph.D. students. The majority of research projects involve the development of an application written around a semantic network, the knowledge representation structure at the core of the system. Because of the variety of these applications, developers often join the team with little or no functional programming experience. For this reason, the task of teaching these developers to the level required to implement their respective applications requires teaching at various levels of abstraction. The strategy chosen means that each researcher only needs to be taught at the particular level of abstraction at which they work. These abstractions give rise to the notion of a domain-specific sublanguage; that is, a programming style in which a different language is created for each desired level of abstraction. In this paper we show how functional languages provide the necessary framework to enable these sublanguages to be created.

J Cao, Stephen A Jarvis, Daniel P Spooner, James D Turner, G. R Nudd (2001)Performance prediction technology for agent-based resource management in grid environments IEEE

Resource management constitutes an important infrastructural component of a computational grid environment. The aim of grid resource management is to efficiently schedule applications over the available resources provided by the supporting grid architecture. Such goals within the high performance community rely, in part, on accurate performance prediction capabilities. This paper introduces a resource management infrastructure for grid computing environments. The technique couples application performance prediction with a hierarchical multi-agent system. An initial system implementation utilises the performance prediction capabilities of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on local grid resources. The validation results show that a high level of accuracy can be obtained, that cross-platform comparisons can be easily undertaken, and that the estimates can be evaluated rapidly. A hierarchy of homogeneous agents is used to provide a scalable and adaptable abstraction of the grid system architecture. An agent is a representative of a local grid resource and is considered to be both a service provider and a service requestor. Agents are organised into a hierarchy and cooperate to provide service advertisement and discovery. A performance monitor and advisor has been developed to optimise the performance of the agent system. A case study with corresponding experimental results is included to demonstrate the efficiency of the resource management and scheduling system. The main features of the system include: hard quality of service support using PACE performance prediction capabilities; agent-based dynamic resource advertisement and discovery capabilities; simulation-based quantitative grid performance analysis; and user-oriented scheduling of local grid resources.

Andrew C Mallinson, David A Beckingsale, W. P Gaudin, J. A Herdman, Stephen A Jarvis (2013)Towards portable performance for explicit hydrodynamics codes

Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for reaching exascale levels of computational performance. In future exascale systems it is likely that this performance improvement will be realised by increasing the parallelism available in traditional CPU devices and using massively-parallel hardware accelerators. The MPI programming model is starting to reach its scalability limit and is unable to take advantage of hardware accelerators; consequently, HPC centres (such as AWE) will have to decide how to develop their existing applications to best take advantage of future HPC system architectures. This work seeks to evaluate OpenCL as a candidate technology for implementing an alternative hybrid programming model, and whether it is able to deliver improved code portability whilst also maintaining or improving performance. On certain platforms the performance of our OpenCL implementation is within 4% of an optimised native version.

Simon J Pennycook, Gihan R Mudalige, Simon D Hammond, Stephen A Jarvis (2010)Parallelising wavefront applications on general-purpose GPU devices Performance Computing and Visualisation, Department of Computer Science, University of Warwick

Pipelined wavefront applications form a large portion of the high performance scientific computing workloads at supercomputing centres. This paper investigates the viability of graphics processing units (GPUs) for the acceleration of these codes, using NVIDIA's Compute Unified Device Architecture (CUDA). We identify the optimisations suitable for this new architecture and quantify the characteristics of those wavefront codes that are likely to experience speedups.
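
A pipelined wavefront computation visits a grid along anti-diagonals: every cell on diagonal d depends only on cells on earlier diagonals, so all cells on one diagonal can execute concurrently (on a GPU, roughly one kernel launch per diagonal). A minimal 2D sketch of the traversal order, ours for illustration:

```python
def wavefront_order(nx, ny):
    """Yield (i, j) cells diagonal by diagonal; cell (i, j) depends on
    (i-1, j) and (i, j-1), both of which lie on earlier diagonals."""
    for d in range(nx + ny - 1):
        for i in range(max(0, d - ny + 1), min(nx, d + 1)):
            yield i, d - i
```

The available parallelism grows then shrinks with the diagonal length, which is why wavefront codes only speed up on GPUs when each diagonal exposes enough cells to fill the device.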

J. W. J Xue, Adam P Chester, Ligang He, Stephen A Jarvis (2009)Model-driven server allocation in distributed enterprise systems

Internet service providers (ISPs) usually use several server pools to host different web applications, to ensure smooth system management and minimum interference between applications. The workload demand in each of the pools can vary dramatically due to a number of factors, including timing and the types of the hosted applications. Therefore, it is desirable that servers should be able to switch between pools to optimise resource usage and maximise company revenue. Internet applications can be modelled as multi-tier queueing networks, with each network station corresponding to each application tier. The advantage of using an analytical model is that performance metrics can be easily computed, and potential system bottlenecks can be identified without running the actual system. In this paper, an analytical model is used to assist dynamic resource allocation in server pools. In addition, an admission control scheme is also used to deal with system overloading. Performance evaluation is conducted via simulation and the experimental results show the benefits of our approach for various workload scenarios.
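
The analytical backbone of such a model can be as simple as treating each tier as an M/M/1 queue; the following is a textbook sketch of the end-to-end response time, not the paper's exact queueing network:

```python
def multi_tier_response_time(arrival_rate, service_rates):
    """Mean response time of an open tandem of M/M/1 tiers:
    R = sum_i 1 / (mu_i - lambda). Raises if any tier saturates."""
    total = 0.0
    for mu in service_rates:
        if arrival_rate >= mu:
            raise ValueError("tier with mu=%g is the bottleneck" % mu)
        total += 1.0 / (mu - arrival_rate)
    return total
```

A closed form like this makes it cheap to evaluate candidate allocations (moving a server raises one tier's service rate) before committing a switch, and it shows why admission control is needed: response time diverges as the arrival rate approaches the bottleneck tier's capacity.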

Stephen Roberts, Steven Wright, David Lecomber, Christopher January, Jonathan Byrd, Xavier Oro, Stephen Jarvis (2015)POSE: A mathematical and visual modelling tool to guide energy aware code optimisation, In: 2015 Sixth International Green and Sustainable Computing Conference (IGSC) 7393705 pp. 1-8 IEEE

Performance engineers are beginning to explore software-level optimisation as a means to reduce the energy consumed when running their codes. This paper presents POSE, a mathematical and visual modelling tool which highlights the relationship between runtime and power consumption. POSE allows developers to assess whether power optimisation is worth pursuing for their codes. We demonstrate POSE by studying the power optimisation characteristics of applications from the Mantevo and Rodinia benchmark suites. We show that LavaMD has the most scope for CPU power optimisation, with improvements in Energy Delay Squared Product (ED2P) of up to 30.59%. Conversely, MiniMD offers the least scope, with improvements to the same metric limited to 7.60%. We also show that no power optimised version of MiniMD operating below 2.3 GHz can match the ED2P performance of the original code running at 3.2 GHz. For LavaMD this limit is marginally less restrictive at 2.2 GHz.
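
The ED2P metric reported above is straightforward to compute; a sketch with made-up numbers (ours, not values from the paper):

```python
def ed2p(energy_joules, runtime_seconds):
    """Energy Delay Squared Product: E * t^2. Squaring the delay means a
    power saving only pays off if it barely slows the code down."""
    return energy_joules * runtime_seconds ** 2

baseline  = ed2p(100.0, 10.0)   # 10000.0
optimised = ed2p(70.0, 11.0)    # 8470.0  -> 30% less energy, 10% slower: a win
slow      = ed2p(70.0, 13.0)    # 11830.0 -> same saving, too slow: a loss
```

This asymmetry is exactly the trade-off POSE visualises when deciding whether power optimisation is worth pursuing for a given code.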

Richard O. Kirk, Gihan R. Mudalige, Istvan Z. Reguly, Steven A. Wright, Matt J. Martineau, Stephen A. Jarvis (2017)Achieving Performance Portability for a Heat Conduction Solver Mini-Application on Modern Multi-core Systems, In: Proceedings / IEEE International Conference on Cluster Computing 2017 8049026 pp. 834-841 IEEE

Modernizing production-grade, often legacy applications to take advantage of modern multi-core and many-core architectures can be a difficult and costly undertaking. This is especially true currently, as it is unclear which architectures will dominate future systems. The complexity of these codes can mean that parallelisation for a given architecture requires significant re-engineering. One way to assess the benefit of such an exercise would be to use mini-applications that are representative of the legacy programs. In this paper, we investigate different implementations of TeaLeaf, a mini-application from the Mantevo suite that solves the linear heat conduction equation. TeaLeaf has been ported to use many parallel programming models, including OpenMP, CUDA and MPI among others. It has also been re-engineered to use the OPS embedded DSL and the template libraries Kokkos and RAJA. We use these different implementations to assess the performance portability of each technique on modern multi-core systems. While manually parallelising the application, targeting and optimizing for each platform, gives the best performance, this has the obvious disadvantage that it requires the creation of a different version for each and every platform of interest. Frameworks such as OPS, Kokkos and RAJA can automatically produce executables of the program that achieve comparable portability. Based on a recently developed performance portability metric, our results show that OPS and RAJA achieve an application performance portability score of 71% and 77% respectively for this application.
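
The performance portability metric referred to is, to our reading, Pennycook et al.'s PP: the harmonic mean of the application's efficiency on each platform in a set, defined as zero if any platform is unsupported. Under that reading, a sketch:

```python
def performance_portability(efficiencies):
    """PP over a platform set: harmonic mean of per-platform efficiency
    (each in [0, 1]); 0 if any platform is unsupported (efficiency 0)."""
    if not efficiencies or any(e == 0.0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)
```

The harmonic mean punishes a single poor platform far more than an arithmetic mean would, which is why a framework scoring 71-77% across every platform can be preferable to hand-tuned code that fails to run on one of them.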

James Dickson, Steven Wright, Satheesh Maheswaran, Andy Herdman, Mark C. Miller, Stephen Jarvis (2016)Replicating HPC I/O Workloads With Proxy Applications, In: Proceedings of PDSW-DISCS 2016: 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems pp. 13-18 IEEE

Large scale simulation performance is dependent on a number of components; however, the task of investigation and optimization has long favored computational and communication elements above I/O. Manually extracting the pattern of I/O behavior from a parent application is a useful way of working to address performance issues on a per-application basis, but developing workflows with some degree of automation and flexibility provides a more powerful approach to tackling current and future I/O challenges. In this paper we describe a workload replication workflow that extracts the I/O pattern of an application and recreates its behavior with a flexible proxy application. We demonstrate how simple lightweight characterization can be translated to provide an effective representation of a physics application, and show how a proxy replication can be used as a tool for investigating I/O library paradigms.

Lei Zhao, Stephen A Jarvis, Daniel P Spooner, G. R Nudd (2005)On predictive performance modelling of parallel component composition

David A. Bacigalupo, Jano van Hemert, Xiaoyu Chen, Asif Usmani, Adam P. Chester, Ligang He, Donna N. Dillenberger, Gary B. Wills, Lester Gilbert, Stephen A. Jarvis (2011)Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models, In: Simulation modelling practice and theory 19(6) pp. 1479-1495 Elsevier B.V

The automatic allocation of enterprise workload to resources can be enhanced by being able to make what-if response time predictions whilst different allocations are being considered. We experimentally investigate an historical and a layered queuing performance model and show how they can provide a good level of support for a dynamic-urgent cloud environment. Using this we define, implement and experimentally investigate the effectiveness of a prediction-based cloud workload and resource management algorithm. Based on these experimental analyses we: (i) comparatively evaluate the layered queuing and historical techniques; (ii) evaluate the effectiveness of the management algorithm in different operating scenarios; and (iii) provide guidance on using prediction-based workload and resource management.

O. F. J Perks, Simon D Hammond, Simon J Pennycook, Stephen A Jarvis (2011)WMTrace: a lightweight memory allocation tracker and analysis framework

The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures: increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an on-going challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes.
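
WMTrace interposes on the allocator via dynamic linking; purely to illustrate the high-water-mark idea, Python's standard library offers a comparable measurement through tracemalloc:

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MiB, then release half of it
blocks = [bytearray(1024) for _ in range(1024)]
del blocks[512:]

# `current` is live traced bytes; `peak` is the high-water mark,
# which remembers the freed half
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

The same distinction, current footprint versus high-water mark, is what determines whether a parallel job fits in its per-process memory budget.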

Hélène Niuklan Lim Choi Keung, J Cao, Daniel P Spooner, Stephen A Jarvis, G. R Nudd (2002)Grid information services using software agents

Computational Grids allow large-scale, pervasive and consistent sharing of geographically dispersed resources. Their inherent nature incorporates issues including the discovery of resources located in different administrative domains, predicting the performance of those resources and monitoring their behaviour. The Monitoring and Discovery Service (MDS), one of the pillars provided by the Globus toolkit, can be used to offer Grid information services to an existing agent-based resource advertisement and discovery system. This paper presents an agent system which implements the GRid Information Protocol (GRIP) and the GRid Registration Protocol (GRRP) of the MDS to discover virtual organisations and monitor their respective resources. The resulting system provides resource brokering, monitoring and performance prediction.

Henry Crosby, Theodoros Damoulas, Stephen A. Jarvis (2020)Road and travel time cross-validation for urban modelling, In: International journal of geographical information science : IJGIS34(1)pp. 98-118 Taylor & Francis

The physical and social processes in urban systems are inherently spatial and hence data describing them contain spatial autocorrelation (a proximity-based interdependency on a variable) that needs to be accounted for. Standard k-fold cross-validation (KCV) techniques that attempt to measure the generalisation performance of machine learning and statistical algorithms are inappropriate in this setting due to their inherent i.i.d. assumption, which is violated by spatial dependency. As such, more appropriate validation methods have been considered, notably blocking and spatial k-fold cross-validation (SKCV). However, the physical barriers and complex network structures which make up a city's landscape mean that these methods are also inappropriate, largely because the travel patterns (and hence spatial autocorrelation) in most urban spaces are rarely Euclidean in nature. To overcome this problem, we propose a new road distance and travel time k-fold cross-validation method, RT-KCV. We show how this outperforms the prior art in providing better estimates of the true generalisation performance to unseen data.
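The core idea - building folds from a non-Euclidean distance such as road distance or travel time, rather than at random - can be sketched as follows. This is a simplified illustration under assumed names (distance_folds is a hypothetical helper), not the RT-KCV procedure from the paper:

```python
def distance_folds(points, dist, k):
    """Group points into k spatially coherent folds by greedily
    attaching each point to the fold whose members are closest
    under the supplied distance function (e.g. road or travel-time
    distance rather than straight-line distance)."""
    folds = [[p] for p in points[:k]]  # seed one fold per initial point
    for p in points[k:]:
        nearest = min(folds, key=lambda f: min(dist(p, q) for q in f))
        nearest.append(p)
    return folds
```

Holding out one such fold at a time keeps test points far from training points in travel terms, which is the property that makes the generalisation estimate honest under spatial autocorrelation.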

James D Turner, Daniel P Spooner, J Cao, Stephen A Jarvis, D. N Dillenberger, G. R Nudd (2001)A transaction definition language for Java application response measurement, In: Journal of Computer Resource Measurementpp. 55-65
Simon J Pennycook, Stephen A Jarvis (2012)Developing performance-portable molecular dynamics kernels in OpenCL IEEE

This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.

Richard O. Kirk, Martin Nolten, Robert Kevis, Timothy R. Law, Satheesh Maheswaran, Steven A. Wright, Seimon Powell, Gihan R. Mudalige, Stephen A. Jarvis (2020)Warwick Data Store: A Data Structure Abstraction Library, In: 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)9307883pp. 71-85 IEEE

With the increasing complexity of memory architectures and scientific applications, developing data structures that are performant, portable, scalable, and support developer productivity, is a challenging task. In this paper, we present Warwick Data Store (WDS), a lightweight and extensible C++ template library designed to manage these complexities and allow rapid prototyping. WDS is designed to abstract details of the underlying data structures away from the user, thus easing application development and optimisation. We show that using WDS does not significantly impact achieved performance across a variety of different scientific benchmarks and proxy-applications, compilers, and different architectures. The overheads are largely below 30% for smaller problems, with the overhead decreasing to below 10% when using larger problems. This shows that the library does not significantly impact performance, while providing additional functionality and the ability to optimise data structures without changing the application code.

R. F. Bird, S. A. Wright, D. A. Beckingsale, S. A. Jarvis (2013)Performance Modelling of Magnetohydrodynamics Codes, In: Mirco Tribastone, Stephen Gilmore (eds.), Computer Performance Engineeringpp. 197-209 Springer Berlin Heidelberg

Performance modelling is an important tool utilised by the High Performance Computing industry to accurately predict the run-time of science applications on a variety of different architectures. Performance models aid in procurement decisions and help to highlight areas for possible code optimisations. This paper presents a performance model for a magnetohydrodynamics physics application, Lare. We demonstrate that this model is capable of accurately predicting the run-time of Lare across multiple platforms with an accuracy of 90% (for both strong and weak scaled problems). We then utilise this model to evaluate the performance of future optimisations. The model is generated using SST/macro, the machine level component of the Structural Simulation Toolkit (SST) from Sandia National Laboratories, and is validated on both a commodity cluster located at the University of Warwick and a large scale capability resource located at Lawrence Livermore National Laboratory.

J. A Smith, Simon D Hammond, Stephen A Jarvis (2011)The effect of an application performance modelling tool

One of the most important metrics of machine efficiency in HPC is job turnaround time, which is the time taken for a user to submit a job and receive their results. This time consists of two primary components: run-time, which depends on the resources allocated to the job, and queue wait time, which is dependent on the resources requested and the present level of machine usage. This paper investigates the effect of applying application performance modelling techniques to producing run-time estimates for jobs to be scheduled on parallel High Performance Computing systems. Our aim through the development of such tools is to improve turnaround time for jobs across the system. This investigation is performed from the perspective of a community of HPC users who would make use of a tool to assist in job submission. We implement and validate a higher performance implementation of the scheduling algorithm used by the Maui scheduler and demonstrate that it matches the behaviour of an existing Maui configuration. We formulate a method for generating potential performance model results and use it to modify three workloads from production supercomputers to include generated performance model wall-time estimates. We then apply the scheduling simulator to the workloads in order to simulate the effect of using such a tool for the three real workloads. Examining the results from the simulation, we show a randomly selected sample set of tool users obtaining an improvement of up to 23% in average queuing time, and conclude that a tool that uses performance models to generate improved wall-time estimates would be beneficial to users of HPC systems.

Zhenyu Li, James Davis, Stephen Jarvis (2017)An Efficient Task-based All-Reduce for Machine Learning Applications, In: Proceedings of the Machine Learning on HPC Environmentsa2pp. 1-8 ACM

All-Reduce is a collective-combine operation frequently utilised in synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, configuration and on the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output. In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object-serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate the effectiveness of the new butterfly algorithm with a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method - a 9x speed-up is observed for vector lengths of the order of 10^8. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%). The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (Cifar and Mnist) and larger (ImageNet) datasets.
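The butterfly (recursive-doubling) exchange pattern behind the speed-up can be sketched as a simulation over an in-memory list of rank values. This is an illustrative sketch of the generic algorithm only, not the paper's Spark implementation:

```python
def butterfly_allreduce(values, op):
    """Simulate a butterfly all-reduce over len(values) ranks.
    At stage s, rank r exchanges its partial result with rank
    r XOR 2**s, so every rank holds the full reduction after
    log2(p) stages; a power-of-two rank count is assumed."""
    p = len(values)
    assert p & (p - 1) == 0, "power-of-two rank count assumed"
    vals = list(values)
    stage = 1
    while stage < p:
        # every rank combines its value with its stage partner's
        vals = [op(vals[r], vals[r ^ stage]) for r in range(p)]
        stage <<= 1
    return vals
```

Unlike reduce-broadcast, no single root serialises the whole vector, which is where the logarithmic advantage for long vectors comes from.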

In this paper we define a Distributed Arbitrary Segment Tree (DAST), a distributed tree-like structure that layers the range query processing mechanism over public Distributed Hash Table (DHT) services. Compared with traditional segment trees, the arbitrary segment tree used by a DAST reduces the number of key-space segments that need to be maintained, which in turn results in fewer query operations and lower overheads. Moreover, considering that range queries often contain redundant entries that the clients do not need, we introduce the concept of Accuracy of Results (AoR) for range queries. We demonstrate that by adjusting AoR, the DHT operational overhead can be improved. DAST is implemented on a well-known public DHT service (OpenDHT) and validation through experimentation and supporting simulation is performed. The results demonstrate the effectiveness of DAST over existing methods.
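The underlying segment-tree mechanism - decomposing a range query into a small set of canonical segments, each of which maps to one DHT key - can be sketched over a toy 16-slot key space. This shows the classic decomposition, not the DAST's arbitrary-segment refinement of it:

```python
def canonical_segments(lo, hi, left=0, right=15):
    """Decompose the query range [lo, hi] into the minimal set of
    canonical binary segments over the key space [left, right].
    Each segment would be stored under one DHT key, so a range
    query touches only O(log n) nodes."""
    if hi < left or right < lo:          # no overlap with this segment
        return []
    if lo <= left and right <= hi:       # segment fully covered: one key
        return [(left, right)]
    mid = (left + right) // 2            # otherwise recurse into halves
    return (canonical_segments(lo, hi, left, mid) +
            canonical_segments(lo, hi, mid + 1, right))
```

A query for [3, 12] over a 16-slot space resolves to four segments rather than ten individual keys, which is the saving the DAST builds on.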

Stephen A Jarvis (2003)Special Issue on Performance Engineering, In: IEE proceedings. Software150(4)pp. 213-264 Institution of Electrical Engineers
Stephen A Jarvis, Jason S Mirsky, John F Peden, Nigel J Saunders (2000)Identification of horizontally acquired DNA using genome signature analysis University of Warwick. Department of Computer Science

Identification of regions with atypical percentage G+C composition and analysis of dinucleotide signatures are two genome analysis techniques used in the identification of horizontally acquired DNA. We describe a generic program framework for performing both types of analysis in linear time. The approach is extended for length >2 oligonucleotide signatures. Using the derived program we test some of the conclusions of Karlin and Burge, the primary exponents of these techniques. We demonstrate that no single method of signature analysis is sufficient for the complete identification of horizontally acquired DNA. Consequently we produce a fast program - and a robust methodology - for the production and employment of genome signatures in the identification of horizontally acquired DNA.
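Both quantities can be computed in a single linear pass over a sequence. The sketch below uses the standard odds-ratio form of the dinucleotide signature, rho(XY) = f(XY) / (f(X) f(Y)); it illustrates the measures the paper analyses, not the paper's own program:

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases - one linear pass."""
    return sum(seq.count(b) for b in "GC") / len(seq)

def dinucleotide_signature(seq):
    """Odds-ratio signature rho(XY) = f(XY) / (f(X) * f(Y)),
    computed from mononucleotide and dinucleotide frequencies."""
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n, m = len(seq), len(seq) - 1
    return {xy: (di[xy] / m) / ((mono[xy[0]] / n) * (mono[xy[1]] / n))
            for xy in di}
```

Windows of a genome whose G+C fraction or signature vector deviates sharply from the genome-wide values are the candidates for horizontal acquisition.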

D. A. Beckingsale, O. F. J. Perks, W. P. Gaudin, J. A. Herdman, S. A. Jarvis (2013)Optimisation of Patch Distribution Strategies for AMR Applications, In: Mirco Tribastone, Stephen Gilmore (eds.), Computer Performance Engineeringpp. 210-223 Springer Berlin Heidelberg

As core counts increase in the world’s most powerful supercomputers, applications are becoming limited not only by computational power, but also by data availability. In the race to exascale, efficient and effective communication policies are key to achieving optimal application performance. Applications using adaptive mesh refinement (AMR) trade off communication for computational load balancing, to enable the focused computation of specific areas of interest. This class of application is particularly susceptible to the communication performance of the underlying architectures, and are inherently difficult to scale efficiently. In this paper we present a study of the effect of patch distribution strategies on the scalability of an AMR code. We demonstrate the significance of patch placement on communication overheads, and by balancing the computation and communication costs of patches, we develop a scheme to optimise performance of a specific, industry-strength, benchmark application.

G. R. Nudd, S. A. Jarvis (2005)Performance-based middleware for Grid computing, In: Concurrency and computation17(2-4)pp. 215-234 John Wiley & Sons, Ltd
David A Bacigalupo, Jano van Hemert, Asif Usmani, Donna N Dillenberger, Gary B Wills, Stephen A Jarvis (2010)Resource management of enterprise cloud systems using layered queuing and historical performance models, In: 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum5470782pp. 1-8 IEEE

The automatic allocation of enterprise workload to resources can be enhanced by being able to make `what-if' response time predictions, whilst different allocations are being considered. It is important to quantitatively compare the effectiveness of different prediction techniques for use in cloud infrastructures. To help make the comparison of relevance to a wide range of possible cloud environments it is useful to consider the following: 1.) urgent cloud customers, such as the emergency services, that can demand cloud resources at short notice (e.g. for our FireGrid emergency response software); 2.) dynamic enterprise systems that must rapidly adapt to frequent changes in workload, system configuration and/or available cloud servers; 3.) the use of the predictions in a coordinated manner by both the cloud infrastructure and cloud customer management systems; 4.) a broad range of criteria for evaluating each technique. However, there have been no previous comparisons meeting these requirements. This paper, meeting the above requirements, quantitatively compares the layered queuing and ('HYDRA') historical techniques - including our initial thoughts on how they could be combined. Supporting results and experiments include the following: i.) defining, investigating and hence providing guidelines on the use of a historical and layered queuing model; ii.) using these guidelines to show that both techniques can make low overhead and typically over 70% accurate predictions, for new server architectures for which only a small number of benchmarks have been run; and iii.) defining and investigating the tuning of a prediction-based cloud workload and resource management algorithm.

J. A Smith, Simon D Hammond, Gihan R Mudalige, J. A Davis, A. B Mills, Stephen A Jarvis (2009)hpsgprof : a new profiling tool for large-scale parallel scientific codes University of Leeds. School of Computing

Contemporary High Performance Computing (HPC) applications can exhibit unacceptably high overheads when existing instrumentation-based performance analysis tools are applied. Our experience shows that for some sections of these codes, existing instrumentation-based tools can cause, on average, a fivefold increase in runtime. Our experience has been that, in a performance modelling context, these less representative runs can misdirect the modelling process. We present an approach to recording call paths for optimised HPC application binaries, without the need for instrumentation. As a result, a new tool has been developed which complements our work on analytical- and simulation-based performance modelling. The utility of this approach, in terms of low and consistent runtime overhead, is demonstrated by a comparative evaluation against existing tools for a range of recognised HPC benchmark codes.

Robert F Bird, Steven A Wright, David A Beckingsale, Stephen A Jarvis (2012)Performance modelling of magnetohydrodynamics codes

Performance modelling is an important tool utilised by the High Performance Computing industry to accurately predict the run-time of science applications on a variety of different architectures. Performance models aid in procurement decisions and help to highlight areas for possible code optimisations. This paper presents a performance model for a magnetohydrodynamics physics application, Lare. We demonstrate that this model is capable of accurately predicting the run-time of Lare across multiple platforms with an accuracy of 90% (for both strong and weak scaled problems). We then utilise this model to evaluate the performance of future optimisations. The model is generated using SST/macro, the machine level component of the Structural Simulation Toolkit (SST) from Sandia National Laboratories, and is validated on both a commodity cluster located at the University of Warwick and a large scale capability resource located at Lawrence Livermore National Laboratory.

James Dickson, Steven A Wright, Satheesh Maheswaran, J. A Herdman, Duncan Harris, Mark C Miller, Stephen A Jarvis (2017)Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication Cray User Group

Benchmarking and analyzing I/O performance across high performance computing (HPC) platforms is necessary to identify performance bottlenecks and guide effective use of new and existing storage systems. Doing this with large production applications, which can often be commercially sensitive and lack portability, is not a straightforward task and the availability of a representative proxy for I/O workloads can help to provide a solution. We use Darshan I/O characterization and the MACSio proxy application to replicate five production workloads, showing how these can be used effectively to investigate I/O performance when migrating between HPC systems ranging from small local clusters to leadership scale machines. Preliminary results indicate that it is possible to generate datasets that match the target application with a good degree of accuracy. This enables a predictive performance analysis study of a representative workload to be conducted on five different systems. The results of this analysis are used to identify how workloads exhibit different I/O footprints on a file system and what effect file system configuration can have on performance.

Limin Liu, Ligang He, Stephen A. Jarvis (2012)Performance Analysis for Workflow Management Systems under Role-Based Authorization Control, In: Ruixuan Li, Jiannong Cao, Julien Bourgeois (eds.), Advances in Grid and Pervasive Computingpp. 323-337 Springer Berlin Heidelberg

Role-Based Access Control (RBAC) remains one of the most popular authorization control mechanisms. Workflow is a business flow composed of several related tasks. These tasks are interrelated and context-dependent during their execution. Under many circumstances execution context introduces uncertainty in authorization decisions for tasks. This paper investigates the role-based authorization model with the runtime context constraints and dynamic cardinality constraints. The Generalized Stochastic Petri-net is used to model the authorization process. Moreover, due to the state explosion problem in the Petri-net formalism, the proposed modeling method combines queuing theory to analyze both system-oriented and user-oriented performance. Given the workflow information, its running context and the authorization policies, this work can be used to predict the performance of these workflows running in the system. The prediction information can give insight into how to adjust authorization policies to strike a better balance between security and performance.

S. J. Pennycook, S. A. Jarvis (2012)Developing Performance-Portable Molecular Dynamics Kernels in OpenCL, In: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)6495840pp. 386-395 IEEE

This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia's miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD's short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.

Ligang He, Stephen A Jarvis, David A Bacigalupo, Daniel P Spooner, G. R Nudd (2005)Performance-aware load balancing for multiclusters, In: J Cao, L. T Yang, M Guo, F Lau (eds.), Parallel and Distributed Processing and Applicationspp. 635-647 Springer Berlin Heidelberg

In a multicluster architecture, where jobs can be submitted through each constituent cluster, the job arrival rates in individual clusters may be uneven and the load therefore needs to be balanced among clusters. In this paper we investigate load balancing for two types of jobs, namely non-QoS and QoS-demanding jobs, and as a result, two performance-specific load balancing strategies (called ORT and OMR) are developed. The ORT strategy is used to obtain the optimised mean response time for non-QoS jobs and the OMR strategy is used to achieve the optimised mean miss rate for QoS-demanding jobs. The ORT and OMR strategies are mathematically modelled using queuing network theory to establish sets of optimisation equations. Numerical solutions are developed to solve these optimisation equations, and a so-called fair workload level is determined for each cluster. When the current workload in a cluster reaches this pre-calculated fair workload level, the jobs subsequently submitted to the cluster are transferred to other clusters for execution. The effectiveness of both strategies is demonstrated through theoretical analysis and experimental verification. The results show that the proposed load balancing mechanisms bring about considerable performance gains for both job types, while the job transfer frequency among clusters is considerably reduced. This has a number of advantages, in particular in the case where scheduling jobs to remote resources involves the transfer of large executable and data files.

Daniel P Spooner, Stephen A Jarvis, J Cao, G. R Nudd, S Saini, D. J Kerbyson (2002)Performance-based workload management for grid computing

Imbalanced workload distribution can significantly degrade the performance of grid computing environments. In the past, the theory of divisible load has been widely investigated in static heterogeneous systems. However, it has not been widely applied to grid environments, which are characterized by heterogeneous resources and dynamic conditions. In this paper, we propose a performance-based approach to workload distribution for master-slave types of applications on grids. Furthermore, applications with irregular workloads are addressed. We implemented three kinds of applications and conducted experiments on our grid test-beds. Experimental results show that this approach performs more efficiently than conventional schemes. Consequently, we claim that dynamic workload distribution can benefit applications on grid environments.

D. P Spooner, S. A Jarvis, J Cao, S Saini, G. R Nudd (2003)Local grid scheduling techniques using performance prediction, In: IEE proceedings. Computers and digital techniques150(2)pp. 87-96 Institution of Electrical Engineers
Richard A. Bunt, Simon J. Pennycook, Stephen A. Jarvis, B. Leigh Lapworth, Y. K. Ho (2013)Model-Led Optimisation of a Geometric Multigrid Application, In: 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing6831991pp. 742-753 IEEE

This paper details the construction of an analytical performance model of HYDRA, a production nonlinear multigrid solver used by Rolls-Royce for computational fluid dynamics simulations. The model captures both the computational behaviour of HYDRA's key subroutines and the behaviour of its proprietary communication library, OPlus, with an absolute error consistently under 16% on up to 384 cores of an Intel X5650-based commodity cluster. We demonstrate how a performance model can be used to highlight performance bottlenecks and unexpected communication behaviours, thereby guiding code optimisation efforts. Informed by model predictions, we implement an optimisation in OPlus that decreases the communication and synchronisation time by up to 3.01 times and consequently improves total application performance by 1.41 times.

Konstantin Klemmer, Godwin Yeboah, João Porto de Albuquerque, Stephen A Jarvis Population Mapping in Informal Settlements with High-Resolution Satellite Imagery and Equitable Ground-Truth

We propose a generalizable framework for the population estimation of dense, informal settlements in low-income urban areas (so-called 'slums') using high-resolution satellite imagery. Precise population estimates are a crucial factor for efficient resource allocation by government authorities and NGOs, for instance in medical emergencies. We utilize equitable ground-truth data, which is gathered in collaboration with local communities: through training and community mapping, the local population contributes their unique domain knowledge, while also maintaining agency over their data. This practice allows us to avoid carrying forward potential biases into the modeling pipeline, which might arise from a less rigorous ground-truthing approach. We contextualize our approach with respect to the ongoing discussion within the machine learning community, aiming to make real-world machine learning applications more inclusive, fair and accountable. Because of the resource-intensive ground-truth generation process, our training data is limited. We propose a gridded population estimation model, enabling flexible and customizable spatial resolutions. We test our pipeline on three experimental sites in Nigeria, utilizing pre-trained and fine-tuned vision networks to overcome data sparsity. Our findings highlight the difficulties of transferring common benchmark models to real-world tasks. We discuss this and propose steps forward.

Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2003)Performance-Based Dynamic Scheduling of Hybrid Real-Time Applications on a Cluster of Heterogeneous Workstations, In: Harald Kosch, László Böszörményi, Hermann Hellwagner (eds.), Euro-Par 2003 Parallel Processingpp. 195-200 Springer Berlin Heidelberg

It is assumed in this paper that periodic real-time applications are being run on a cluster of heterogeneous workstations, and new non-periodic real-time applications arrive at the system dynamically. In the dynamic scheduling scheme presented in this paper, the new applications are scheduled in such a way that they utilize spare capabilities left by existing periodic applications in the cluster. An admission control is introduced so that new applications are rejected by the system if their deadlines cannot be met. The effectiveness of the proposed scheduling scheme has been evaluated using simulation; experimental results show that the system utilization is significantly improved.

D. A. Beckingsale, W. P. Gaudin, R. D. Hornung, B. T. Gunney, T. Gamblin, J. A. Herdman, S. A. Jarvis (2014)Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
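The coarsen and refine operators at the heart of such a library can be sketched on a plain 2D patch. This is a minimal piecewise-constant illustration only - the paper's GPU library uses data-parallel operators and, in production AMR, conservative interpolation:

```python
def refine(patch):
    """Refine a 2D patch by a factor of 2: each coarse zone is copied
    into the 2x2 block of fine zones it covers (piecewise-constant
    prolongation)."""
    return [[v for v in row for _ in (0, 1)]   # duplicate each column
            for row in patch for _ in (0, 1)]  # duplicate each row

def coarsen(fine):
    """Coarsen by a factor of 2: each coarse zone is the average of
    the 2x2 fine zones beneath it (a conservative restriction)."""
    n = len(fine)
    return [[(fine[2 * i][2 * j] + fine[2 * i][2 * j + 1] +
              fine[2 * i + 1][2 * j] + fine[2 * i + 1][2 * j + 1]) / 4
             for j in range(n // 2)] for i in range(n // 2)]
```

Note that coarsening a freshly refined patch recovers the original data exactly, a sanity check any such operator pair should satisfy.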


Stephen Jarvis, Steven Wright, Simon Hammond (2018)High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation Springer International Publishing

This book constitutes the refereed proceedings of the 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, PMBS 2017, held in Denver, Colorado, USA, in November 2017. The 10 full papers and 3 short papers included in this volume were carefully reviewed and selected from 36 submissions. They were organized in topical sections named: performance evaluation and analysis; performance modeling and simulation; and short papers.

P. Coetzee, S.A. Jarvis (2017)Goal-based composition of scalable hybrid analytics for heterogeneous architectures, In: Journal of parallel and distributed computing108pp. 59-73 Elsevier Inc

Crafting scalable analytics in order to extract actionable business intelligence is a challenging endeavour, requiring multiple layers of expertise and experience. Often, this expertise is irreconcilably split between an organisation’s engineers and subject matter domain experts. Previous approaches to this problem have relied on technically adept users with tool-specific training. Such an approach has a number of challenges: Expertise — There are few data-analytic subject domain experts with in-depth technical knowledge of compute architectures; Performance — Analysts do not generally make full use of the performance and scalability capabilities of the underlying architectures; Heterogeneity — calculating the most performant and scalable mix of real-time (on-line) and batch (off-line) analytics in a problem domain is difficult; Tools — Supporting frameworks will often direct several tasks, including composition, planning, code generation, validation, performance tuning and analysis, but do not typically provide end-to-end solutions embedding all of these activities. In this paper, we present a novel semi-automated approach to the composition, planning, code generation and performance tuning of scalable hybrid analytics, using a semantically rich type system which requires little programming expertise from the user. This approach is the first of its kind to permit domain experts with little or no technical expertise to assemble complex and scalable analytics, for hybrid on- and off-line analytic environments, with no additional requirement for low-level engineering support. This paper describes (i) an abstract model of analytic assembly and execution, (ii) goal-based planning and (iii) code generation for hybrid on- and off-line analytics.
An implementation, through a system which we call Mendeleev, is used to (iv) demonstrate the applicability of this technique through a series of case studies, where a single interface is used to create analytics that can be run simultaneously over on- and off-line environments. Finally, we (v) analyse the performance of the planner, and (vi) show that the performance of Mendeleev’s generated code is comparable with that of hand-written analytics. •A new abstract model of assembly and execution for arbitrary analytics, centred around a semantically rich type system.•Goal-based planning of hybrid analytic applications using this abstract model, requiring little programming ability from the user.•Automatic code generation across scalable compute architectures, integrating heterogeneous on- and off-line runtime environments.•Validation of the planning approach through its application to four case studies in telecommunications and image analysis, including an exploration of the performance and scalability of the planning engine for each of these case studies.•A demonstration of comparable performance with equivalent hand-written alternatives in both on- and off-line runtime environments.

Dean G. Chester, Taylor Groves, Simon D. Hammond, Tim Law, Steven A. Wright, Richard Smedley-Stevenson, Suhaib A. Fahmy, Gihan R. Mudalige, Stephen A. Jarvis (2021)StressBench: A Configurable Full System Network and I/O Benchmark Framework, In: IEEE Conference on High Performance Extreme Computing (Online) pp. 1-8 IEEE

We present StressBench, a network benchmarking framework written for testing MPI operations and file I/O concurrently. It is designed specifically to execute MPI communication and file access patterns that are representative of real-world scientific applications. Existing tools consider either worst-case congestion with small abstract patterns, or peak performance with simplistic patterns. StressBench allows for a richer study of congestion by allowing the orchestration of network load scenarios representative of those typically seen at HPC centres, something that is difficult to achieve with existing tools. We demonstrate the versatility of the framework from microbenchmarks through to finely controlled congested runs across a cluster. Validation of four proxy application communication schemes within StressBench against their parent applications shows a maximum difference of 15%. Using the I/O modelling capabilities of StressBench, we quantify the impact of file I/O on application traffic, showing how the framework can be used in procurement and performance studies.

Junwei Cao, Daniel P. Spooner, Stephen A. Jarvis, Graham R. Nudd (2005)Grid load balancing using intelligent agents, In: Future generation computer systems21(1)pp. 135-149 Elsevier B.V

Scalable management and scheduling of dynamic grid resources requires new technologies for building next-generation intelligent grid environments. This work demonstrates that AI techniques can be utilised to achieve effective workload and resource management. A combination of intelligent agents and multi-agent approaches is applied to both local grid resource scheduling and global grid load balancing. Each agent is a representative of a local grid resource and utilises predictive application performance data with iterative heuristic algorithms to engineer local load balancing across multiple hosts. At a higher level, agents cooperate with each other to balance workload using a peer-to-peer service advertisement and discovery mechanism.

This volume contains the 9 full papers presented at the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 2015), held as part of the 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2015) at the Austin Convention Center in Austin, Texas on 15-20 November 2015. The SC conference series is the premier international forum for high-performance computing, networking, storage and analysis. The conference is unique in that it hosts a wide range of international participants from academia, national laboratories and industry, and featured over 350 exhibitors in the industry's largest HPC technology fair. This year's conference was themed 'HPC transforms', recognising the immense impact that high-performance computing has on our lives. Specifically, SC 2015 focussed not only on the very visible ways in which HPC is changing the world around us, but also on how HPC is improving every aspect of our lives in the most unexpected ways.

James D Turner, David A Bacigalupo, Stephen A Jarvis, G. R Nudd (2002)Using a transaction definition language for the automated ARMing of web services

Web service technology will provide a platform for next generation dynamic e-business applications. This paper describes a framework for identifying, monitoring and reporting performance data of critical transactions within a web service using the Java ARM standard, a Transaction Definition Language (TDL) and an Automated Instrumentation technique. Gourmet2Go, a demonstrator provided with the IBM web services toolkit, is used as a case study.

This paper describes a post-mortem call-graph profiling tool that analyses trace information generated during the execution of BSPlib programs. The purpose of the tool is to expose imbalance in either computation or communication, and to highlight portions of code that are amenable to improvement. One of the major benefits of this tool is that the amount of information displayed when visualising a profile for a parallel program is no more complex than that of a sequential program. The simplicity and practical relevance of the tool is demonstrated by analysing an SQL database query evaluation program. The tool is used to guide a series of optimisations that minimise the communication imbalance in the SQL program; this results in significant improvements in the performance of the parallel algorithms used in the program. Unlike other profiling tools, the architecture-independent metric of the size of data communicated is used to guide the optimisation process.

O. F. J Perks, Simon D Hammond, Simon J Pennycook, Stephen A Jarvis (2010)Should we worry about memory loss?

In recent years the High Performance Computing (HPC) industry has benefited from the development of higher density multi-core processors. With recent chips capable of executing up to 32 tasks in parallel, this rate of growth also shows no sign of slowing. Alongside the development of denser micro-processors has been the considerably more modest rate of improvement in random access memory (RAM) capacities. The effect has been that the available memory-per-core has reduced and current projections suggest that this is set to reduce further still. In this paper we present three studies into the use and measurement of memory in parallel applications; our aim is to capture, understand and, if possible, reduce the memory-per-core needed by complete, multi-component applications. First, we present benchmarked memory usage and runtimes of six scientific benchmarks, which represent algorithms that are common to a host of production-grade codes. Memory usage of each benchmark is measured and reported for a variety of compiler toolkits, and we show greater than 30% variation in memory high-water mark requirements between compilers. Second, we utilise this benchmark data, combined with runtime data, to simulate, via the Maui scheduler simulator, the effect on a multi-science workflow if memory-per-core is reduced from 1.5GB-per-core to only 256MB. Finally, we present initial results from a new memory profiling tool currently in development at the University of Warwick. This tool is applied to a finite-element benchmark and is able to map high-water-mark memory allocations to individual program functions. This demonstrates a lightweight and accurate method of identifying potential memory problems, a technique we expect to become commonplace as memory capacities decrease.
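On Linux, the memory high-water mark that this study measures is exposed per process as the `VmHWM` field of `/proc/<pid>/status`. A minimal sketch of sampling that counter by parsing the status text (illustrative only; this is not the Warwick profiling tool, which maps allocations to program functions):

```python
def parse_vm_hwm(status_text):
    """Extract the memory high-water mark (VmHWM) from the text of a
    Linux /proc/<pid>/status file.

    Returns the value in kB (the unit the kernel reports), or None if
    the field is absent. A profiler can read /proc/self/status at
    intervals and record the peak per process.
    """
    for line in status_text.splitlines():
        if line.startswith("VmHWM:"):
            # Field format: "VmHWM:    1536 kB"
            return int(line.split()[1])
    return None
```

In practice one would read `open("/proc/self/status").read()` and pass it to this function; summing the per-rank peaks gives a rough memory-per-node figure for a parallel job.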

R. G Morgan, Stephen A Jarvis (1995)Profiling large-scale lazy functional programs

This paper presents a unique study into how to identify a meso-level normative (i.e., institutional) hierarchy of procedures that aim to deliver the ecological status of waterbodies in the UK. Using traditional survey and workshop methods, the majority of recent studies concentrate on engagement practices between macro- (government bodies) and micro- (local residents) level structures, which can be potentially replicated elsewhere. Meso-level elements (middle-level structures of control) are often regarded as ‘subjective institutional change’, e.g., failures to implement programs locally or misinterpretations of reflexive dialogs with communities. Nevertheless, it is often only meso-level structures that are capable of promoting and replicating policies elsewhere. At the same time, there is increasing appeal from governmental departments for communities to ‘self-organize’ and take responsibility for prioritizing environmental tasks, which themselves might be instigated by local trusts and voluntary organizations, the existence of which remains largely unaccounted for by central offices. The recent proliferation of Twitter accounts, with the prominent themes of water, ecology and ecosystems, which include people, organizations, businesses and ‘bots’ of various types, presents new opportunities for digital methods to gain insights into structures and functions of these virtual communities. We hypothesize that our methods can produce invaluable insights into the ‘crafting’ of environmental institutions through approaches commonly ignored by traditional ‘analog’ meso-level mechanisms. We use the example of Integrated Catchment Management in the UK, and specifically the Tamar Catchment in southwest England, in order to demonstrate how well Twitter can capture this transitory meso-level environmental political system.

James Dickson, Satheesh Maheswaran, Steven A Wright, J. A Herdman, Stephen A Jarvis (2015)MINIO: an I/O benchmark for investigating high level parallel libraries

Input/output (I/O) operations are one of the biggest challenges facing scientific computing as it transitions to exascale. The traditional software stack – comprising parallel file systems, middleware and high level libraries – has evolved to enable applications to better cope with the demands of enormous datasets. This software stack makes high performance parallel I/O easily accessible to application engineers; however, it is important to ensure best performance is not compromised through attempts to enrich these libraries. We present MINIO, a benchmark for the investigation of I/O behaviour focusing on understanding overheads and inefficiencies in high level library usage. MINIO uses HDF5 and TyphonIO to explore I/O at scale using different application behavioural patterns. A case study is performed using MINIO to identify performance limiting characteristics present in the TyphonIO library as an example of performance discrepancies in the I/O stack.

P Boonmee, N. J Saunders, Stephen A Jarvis (2001)Computational analysis of lateral gene transfer

Identification of regions with untypical percentage G+C composition and dinucleotide signatures are two genome analysis techniques used in the identification of horizontally acquired DNA. We describe a generic program framework for performing both types of analysis in linear time. The approach is extended for length >2 oligonucleotide signatures. Using the derived program we test some of the conclusions of Karlin and Burge, the primary exponents of these techniques. We demonstrate that no single method of signature analysis is sufficient for the complete identification of horizontally acquired DNA. Consequently we produce a fast program - and a robust methodology - for the production and employment of genome signatures in the identification of horizontally acquired DNA.
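The windowed G+C analysis described above can be kept linear in the sequence length by maintaining a running count as the window slides, so each base is examined a constant number of times. A short sketch of that idea (illustrative only, not the paper's program; window and step sizes are arbitrary choices):

```python
def gc_windows(seq, window=1000, step=100):
    """Sliding-window G+C percentage over a DNA sequence.

    Returns a list of (window_start, gc_percent) pairs. A running count
    is updated as the window slides, so the whole scan is linear in
    len(seq). Assumes step <= window.
    """
    seq = seq.upper()
    is_gc = [1 if base in "GC" else 0 for base in seq]
    results = []
    count = sum(is_gc[:window])  # G+C count of the first window
    for start in range(0, len(seq) - window + 1, step):
        if start > 0:
            # Slide: subtract bases that left, add bases that entered.
            count -= sum(is_gc[start - step:start])
            count += sum(is_gc[start + window - step:start + window])
        results.append((start, 100.0 * count / window))
    return results
```

Windows whose G+C percentage deviates strongly from the genome-wide mean are then flagged as candidate horizontally acquired regions.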

W Gaudin, Andrew C Mallinson, O. F. J Perks, J. A Herdman, David A Beckingsale, J Levesque, M Boulton, S McIntosh-Smith, Stephen A Jarvis (2014)Optimising hydrodynamics applications for the Cray XC30 with the application tool suite Cray User Group, Inc

Power constraints are forcing HPC systems to continue to increase hardware concurrency. Efficiently scaling applications on future machines will be essential for improved science and it is recognised that the “flat” MPI model will start to reach its scalability limits. The optimal approach is unknown, necessitating the use of mini-applications to rapidly evaluate new approaches. Reducing MPI task count through the use of shared memory programming models will likely be essential. We examine different strategies for improving the strong-scaling performance of explicit Hydrodynamics applications. Using the CloverLeaf mini-application across multiple generations of Cray platforms (XC30, XK6 and XK7), we show the utility of the hybrid approach and document our experiences with OpenMP, CUDA, OpenCL and OpenACC under both the PGI and CCE compilers. We also evaluate Cray Reveal as a tool for automatically hybridising HPC applications and Cray’s MPI rank to network topology-mapping tools for improving application performance.

Henry Crosby, Paul Davis, Stephen A. Jarvis (2015)Exploring New Data Sources to Improve UK Land Parcel Valuation, In: Proceedings of the 1st International ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics pp. 32-35 ACM

The paper describes a novel approach for building a UK-wide Automated Land Valuation Model and its implementation into commercial online software. We examine existing approaches to land valuation used in the UK, notably Trade Area Analysis, Spatial Interaction and Comparable Sales. We make the case that land use analysis, demographics and societal preferences affect the potential income and optimal use of parcels of land, and hence the value of those parcels. This hypothesis leads to the introduction of a number of additional factors required to estimate land value, including traffic flow, population and site suitability. A number of artificial intelligence (AI) and machine learning spatial-temporal techniques are introduced to predict the value of all land parcels sold since 1995. We introduce a new technique, which includes (i) the application of Support Vector Machines to land use analysis; (ii) the use of predictive techniques for macro-environmental factors; (iii) the use of large, open-source data sets to improve valuation; and (iv) industry alignment through a predefined industrial tool. A number of different mathematical techniques are used to validate the proposed model and we show that our model demonstrates 92% accuracy for residential pricing predictions.

S.A Jarvis, J.M.D Hill, C.J Siniolakis, V.P Vasilev (2002)Portable and architecture independent parallel performance tuning using BSP, In: Parallel computing28(11)pp. 1587-1609 Elsevier B.V

A call-graph profiling tool has been designed and implemented to analyse the efficiency of programs written in BSPlib. This tool highlights computation and communication imbalance in parallel programs, exposing portions of program code which are amenable to improvement. A unique feature of this profiler is that it uses the bulk synchronous parallel cost model, thus providing a mechanism for portable and architecture-independent parallel performance tuning. In order to test the capabilities of the model on a real-world example, the performance characteristics of an SQL query processing application are investigated on a number of different parallel architectures.

Stephen A Jarvis, G. R Nudd (2004)Review of the e-science demonstrators and their compatibility with the Warwick middleware project University of Warwick. Department of Computer Science

This document reports on the six month survey of the UK e-Science demonstrator projects. The report details the eight project consultations in which we have engaged, and provides a summary of the likely compatibility of these projects with the Warwick middleware research. Two of these eight projects have been identified for further collaboration, details of which are documented. In addition, this consultation has been extended to include UK e-Science pilot projects and the details of planned collaboration with one of these pilot projects also forms part of this report.

Stephen A Jarvis (2003)Performance engineering IEE, Publ. Dep

Predictive performance models of e-Commerce applications will allow Grid workload managers to provide e-Commerce clients with qualities of service (QoS) whilst making efficient use of resources. This paper demonstrates the use of two ‘coarse-grained’ modelling approaches (based on layered queuing modelling and historical performance data analysis) for predicting the performance of dynamic e-Commerce systems on heterogeneous servers. Results for a popular e-Commerce benchmark show how request response times and server throughputs can be predicted on servers with heterogeneous CPUs at different background loads. The two approaches are compared and their usefulness to Grid workload management is considered.

Brian Patrick Foley, Daniel P Spooner, P. J Isitt, Stephen A Jarvis, G. R Nudd (2005)Performance prediction for a code with data-dependent runtimes

In this paper we present a preliminary predictive model for a key biomedical imaging application in the UK e-Science IXI (Information eXtraction from Images) project. This code represents a significant challenge for our existing performance prediction tools as it has internal structures that exhibit highly variable runtimes depending on qualities in the input data provided. Since the runtime can vary by more than an order of magnitude, it has been difficult to apply meaningful quality of service criteria to workflows that use this code. The model developed here is used in the context of an interactive scheduling system which provides rapid feedback to the users, allowing them to tailor their workloads to available resources, or to allocate extra resources to scheduled workloads.

This paper details the development and application of a model for predictive performance analysis of a pipelined synchronous wavefront application running on commodity processor cluster systems. The performance model builds on existing work [1] by including extensions for modern commodity processor architectures. These extensions, including coarser hardware benchmarking, prove to be essential in countering the effects of modern superscalar processors (e.g. multiple operation pipelines and on-the-fly optimisations), complex memory hierarchies, and the impact of applying modern optimising compilers. The process of application modelling is also extended, combining static source code analysis with run-time profiling results for increased accuracy. The model is validated on several high performance SMP systems and the results show a high predictive accuracy.

M Al-Ghamdi, Adam P Chester, J. W. J Xue, Stephen A Jarvis (2009)The effect of server reallocation time in dynamic resource allocation

Since web workloads are known to vary dynamically with time, in this paper, we argue that dynamic resource allocation techniques are necessary to provide guarantees to web applications running on shared data centers. To address this issue, we use a system architecture that combines online measurements with prediction and resource allocation techniques. To capture the transient behavior of the application workloads, we model a server resource using a time-domain description of a generalized processor sharing (GPS) server. This model relates application resource requirements to their dynamically changing workload characteristics. The parameters of this model are continuously updated using an online monitoring and prediction framework. This framework uses time series analysis techniques to predict expected workload parameters from measured system metrics. We then employ a constrained non-linear optimization technique to dynamically allocate the server resources based on the estimated application requirements. The main advantage of our techniques is that they capture the transient behavior of applications while incorporating nonlinearity in the system model. We evaluate our techniques using simulations with synthetic as well as real-world web workloads. Our results show that these techniques can judiciously allocate system resources, especially under transient overload conditions.
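The generalized processor sharing (GPS) abstraction referenced above divides server capacity among application classes in proportion to assigned weights, with any share a class cannot consume redistributed among the remaining classes. A minimal static sketch of that sharing rule (class names and figures invented; the paper uses a time-domain GPS description with continuously updated parameters):

```python
def gps_allocation(capacity, classes):
    """One-shot GPS-style capacity allocation.

    `classes` maps a name to a (weight, demand) pair. Capacity is shared
    in proportion to weights; once a class's demand is met, it drops out
    and its unused share is re-divided among the rest. Assumes an even,
    work-conserving server; purely illustrative.
    """
    alloc = {name: 0.0 for name in classes}
    active = dict(classes)
    remaining = capacity
    while active and remaining > 1e-12:
        total_weight = sum(w for w, _ in active.values())
        satisfied, spent = [], 0.0
        for name, (w, demand) in active.items():
            share = remaining * w / total_weight
            give = min(share, demand - alloc[name])  # never exceed demand
            alloc[name] += give
            spent += give
            if alloc[name] >= demand - 1e-12:
                satisfied.append(name)
        remaining -= spent
        if not satisfied:
            break  # every active class absorbed its full share
        for name in satisfied:
            del active[name]
    return alloc
```

With capacity 100 and two equally weighted classes demanding 20 and 100 units, the first class receives 20 and the second absorbs the redistributed remainder, 80.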

Stephen A Jarvis, A Randewich (2010)High performance computing for national security, In: The Science and Technology Journal of AWEpp. 24-33 Atomic Weapons Establishment (AWE)
J. A. Herdman, W. P. Gaudin, O. Perks, D. A. Beckingsale, A. C. Mallinson, S. A. Jarvis (2014)Achieving portability and performance through OpenACC, In: Proceedings of the First Workshop on Accelerator Programming using Directives pp. 19-26 IEEE Press

OpenACC is a directive-based programming model designed to allow easy access to emerging advanced architecture systems for existing production codes based on Fortran, C and C++. It also provides an approach to coding contemporary technologies without the need to learn complex vendor-specific languages, or understand the hardware at the deepest level. Portability and performance are the key features of this programming model, which are essential to productivity in real scientific applications. OpenACC support is provided by a number of vendors and is defined by an open standard. However the standard is relatively new, and the implementations are relatively immature. This paper experimentally evaluates the currently available compilers by assessing two approaches to the OpenACC programming model: the "parallel" and "kernels" constructs. The implementation of both of these constructs is compared, for each vendor, showing performance differences of up to 84%. Additionally, we observe performance differences of up to 13% between the best vendor implementations. OpenACC features which appear to cause performance issues in certain compilers are identified and linked to differing default vector length clauses between vendors. These studies are carried out over a range of hardware including GPU, APU, Xeon and Xeon Phi based architectures. Finally, OpenACC performance, and productivity, are compared against the alternative native programming approaches on each targeted platform, including CUDA, OpenCL, OpenMP 4.0 and Intel Offload, in addition to MPI and OpenMP.

Xinuo Chen, Stephen A Jarvis, G Tan, Ligang He, Daniel P Spooner, G. R Nudd (2005)An implementation of BLAST over peer-to-peer and its performance validation through simulation
O. F. J Perks, Robert F Bird, David A Beckingsale, Stephen A Jarvis (2012)Exploiting spatiotemporal locality for fast call stack traversal

In the approach to exascale, scalable tools are becoming increasingly necessary to support parallel applications. Evaluating an application’s call stack is a vital technique for a wide variety of profilers and debuggers, and can create a significant performance overhead. In this paper we present a heuristic technique to reduce the overhead of frequent call stack evaluations. We use this technique to estimate the similarity between successive call stacks, removing the need for full call stack traversal and eliminating a significant portion of the performance overhead. We demonstrate this technique applied to a parallel memory tracing toolkit, WMTools, and analyse the performance gains and accuracy.
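The heuristic above exploits the fact that successive call stacks of the same thread usually share a long common prefix from the root, so only the changed suffix needs re-traversing and re-resolving. A toy sketch of that idea, using a hypothetical symbol table over fake return addresses (not WMTools' actual implementation):

```python
# Toy symbol table: return address -> function name (invented values).
SYMBOLS = {0x1: "main", 0x2: "solve", 0x3: "kernel", 0x4: "io_write"}

def shared_depth(prev, curr):
    """Length of the common root-side prefix of two address stacks."""
    depth = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        depth += 1
    return depth

def resolve_stack(addrs, prev_addrs, prev_symbols):
    """Resolve a stack of addresses to symbols, reusing the previously
    resolved prefix rather than walking and resolving every frame."""
    depth = shared_depth(prev_addrs, addrs)
    return prev_symbols[:depth] + [SYMBOLS[a] for a in addrs[depth:]]
```

When consecutive samples differ only in their innermost frames, nearly the whole traversal cost is avoided, at the price of the (small) risk that an estimated prefix match is wrong, which is where the accuracy analysis in the paper comes in.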

Huanzhou Zhu, Ligang He, Stephen A. Jarvis (2014)Optimizing Job Scheduling on Multicore Computers, In: Proceedings - International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems pp. 61-70 IEEE

It is common nowadays for multiple cores to reside on the same chip and share the on-chip cache. Resource sharing may cause performance degradation of the co-running jobs. Job co-scheduling is a technique that can effectively alleviate this contention, and many co-schedulers have been developed in the literature, but most of them do not aim to find the optimal co-scheduling solution. Being able to determine the optimal solution is critical for evaluating co-scheduling systems. Moreover, most co-schedulers only consider serial jobs; however, in many situations both parallel and serial jobs exist. This paper aims to tackle these issues. In this paper, a graph-based method is developed to find the optimal co-scheduling solution for serial jobs, and the method is then extended to incorporate parallel jobs. Extensive experiments have been conducted to evaluate the effectiveness and efficiency of the proposed co-scheduling algorithms. The results show that the proposed algorithms can find the optimal co-scheduling solution for both serial and parallel jobs.
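For serial jobs on dual-core nodes, the optimal co-scheduling problem amounts to pairing jobs so that the total co-run degradation is minimised. A brute-force sketch of that objective for tiny inputs (job names and degradation figures invented; the paper's graph-based method finds the optimum without exhaustive enumeration):

```python
def optimal_pairs(jobs, degradation):
    """Exhaustively find the minimum-degradation pairing of jobs.

    `degradation` maps frozenset({job1, job2}) to the combined slowdown
    of co-running that pair. Assumes an even number of uniquely named
    jobs. Returns (total_degradation, list_of_pairs).
    """
    if not jobs:
        return 0.0, []
    first = jobs[0]
    best = (float("inf"), [])
    for partner in jobs[1:]:
        rest = [j for j in jobs[1:] if j != partner]
        cost, pairs = optimal_pairs(rest, degradation)
        total = cost + degradation[frozenset((first, partner))]
        if total < best[0]:
            best = (total, [(first, partner)] + pairs)
    return best
```

The number of pairings grows as (n-1)(n-3)(n-5)..., which is exactly why a polynomial graph-based formulation matters for realistic job counts.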

Stephen A Jarvis, Jason S Mirsky, John F Peden, Nigel J Saunders (2000)Finding secret messages in DNA microdots University of Warwick. Department of Computer Science

A DNA microdot offers a novel way in which to communicate secret information. It is an extension of the more traditional microdot, a greatly reduced photograph of a secret document which replaced a full stop somewhere in an innocent-looking letter. The DNA microdot appears to be the secure modern alternative, exploiting the complexity of DNA in which a coded secret message is hidden. An enemy can only unlock the secret information by first knowing that an intercepted letter is impregnated with microdot DNA, and secondly by finding the message amongst a huge amount of background DNA used to mask the secret information. Using software developed to identify horizontally acquired DNA, we show that this apparently insurmountable task is in fact possible. With the increased speed of DNA sequencing, the information contained in the DNA microdot is far from secure.

Xinuo Chen, Stephen A Jarvis (2008)Design and implementation of efficient range query over DHT services DSP for Communication Systems

This paper describes the design and implementation of DAST, a Distributed Arbitrary Segment Tree structure that provides range-query support for public Distributed Hash Table (DHT) services. DAST does not modify the underlying DHT infrastructure; instead it utilises the scalability and robustness of the DHT while providing simplicity of implementation and deployment for applications. Compared with traditional segment trees, the arbitrary segment tree used by DAST reduces the number of key-space segments that need to be maintained, which in turn results in fewer query operations and lower overheads. Moreover, considering that range queries often contain redundant entries that the clients do not need, we introduce the concept of Accuracy of Results (AoR) for range queries. We demonstrate that by adjusting AoR, the DHT operational overhead can be improved. DAST is implemented on a well-known public DHT service (OpenDHT) and validation through experimentation and supporting simulation is performed. The results demonstrate the effectiveness of DAST over existing methods.
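A segment-tree structure answers a range query by decomposing the range into a small number of canonical segments, each of which can be stored under a single DHT key, so a query touches only a few nodes. A sketch of the classic decomposition over a toy 16-slot key space (DAST's *arbitrary* segment tree goes further by reducing the number of segments that must be maintained):

```python
def canonical_segments(lo, hi, node_lo=0, node_hi=16):
    """Decompose the half-open range [lo, hi) into canonical
    segment-tree segments over the key space [node_lo, node_hi).

    Each returned (start, end) segment corresponds to one tree node,
    i.e. one DHT key to fetch. Classic textbook decomposition, shown
    here over a 16-slot key space for illustration.
    """
    if hi <= node_lo or node_hi <= lo:
        return []  # no overlap with this node
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # node fully covered: one segment
    mid = (node_lo + node_hi) // 2
    return (canonical_segments(lo, hi, node_lo, mid)
            + canonical_segments(lo, hi, mid, node_hi))
```

A query for keys 3-10 thus costs four DHT lookups rather than eight single-key gets; trading segment granularity against result precision is essentially what the AoR knob controls.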

Wentai Wu, Ligang He, Weiwei Lin, Yi Su, Yuhua Cui, Carsten Maple, Stephen Jarvis (2022)Developing an Unsupervised Real-Time Anomaly Detection Scheme for Time Series With Multi-Seasonality, In: IEEE transactions on knowledge and data engineering34(9)pp. 4147-4160 IEEE

On-line detection of anomalies in time series is a key technique used in various event-sensitive scenarios such as robotic system monitoring, smart sensor networks and data center security. However, the increasing diversity of data sources and the variety of demands make this task more challenging than ever. First, the rapid increase in unlabeled data means supervised learning is becoming less suitable in many cases. Second, a large portion of time series data have complex seasonality features. Third, on-line anomaly detection needs to be fast and reliable. In light of this, we have developed a prediction-driven, unsupervised anomaly detection scheme, which adopts a backbone model combining the decomposition and the inference of time series data. Further, we propose a novel metric, Local Trend Inconsistency (LTI), and an efficient detection algorithm that computes LTI in a real-time manner and scores each data point robustly in terms of its probability of being anomalous. We have conducted extensive experimentation to evaluate our algorithm with several datasets from both public repositories and production environments. The experimental results show that our scheme outperforms existing representative anomaly detection algorithms in terms of the commonly used metric, Area Under Curve (AUC), while achieving the desired efficiency.
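The prediction-driven idea — forecast the next value, then score the observation by its deviation from the forecast relative to recent residuals — can be sketched in a few lines. This is a generic illustration only: it uses simple exponential smoothing as the forecaster and is not the paper's Local Trend Inconsistency metric or its decomposition-based backbone model.

```python
from collections import deque
import math

class PredictionAnomalyScorer:
    """Score each point by |observation - forecast|, normalised by the
    root-mean-square of recent residuals. Higher scores suggest a more
    anomalous point. Unsupervised and on-line: state updates per point."""

    def __init__(self, alpha=0.3, window=50):
        self.alpha = alpha            # smoothing factor of the forecaster
        self.level = None             # current exponential-smoothing forecast
        self.residuals = deque(maxlen=window)

    def score(self, x):
        if self.level is None:        # first observation: nothing to compare
            self.level = x
            return 0.0
        resid = x - self.level
        spread = (math.sqrt(sum(r * r for r in self.residuals)
                            / len(self.residuals))
                  if self.residuals else 1.0)
        s = abs(resid) / max(spread, 1e-9)
        self.residuals.append(resid)  # update state after scoring
        self.level = self.alpha * x + (1 - self.alpha) * self.level
        return s
```

A seasonal series would first need the seasonal component removed (the decomposition step in the paper's scheme) before a residual-based score like this is meaningful.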

Malik Shahzad K Awan, Stephen A Jarvis (2012)MalikStone: a context-aware light-weight benchmark for P2P computing IEEE

Benchmarking has been used by performance engineers for over three decades to gain better insight into system performance. Numerous benchmarks are used in industry to characterize the performance of standalone systems on the one hand, and to measure the computational efficiency of massively parallel systems such as clusters, grids and supercomputers on the other. One of the common features of performance evaluation of small or large standalone systems is that performance engineers have direct and full control over these systems. The peer-to-peer (P2P) paradigm, which involves volunteer participation in the execution of lengthy and complex scientific computations, operates in an uncontrolled environment. The performance evaluation of participating nodes operating in an uncontrolled P2P environment is a challenging task. Currently, two traditional synthetic benchmarks (Dhrystone and Whetstone) are used as part of the Berkeley Open Infrastructure for Network Computing (BOINC) Accounting System for granting credits to participating nodes. The analysis of performance data obtained from running Dhrystone and Whetstone on general purpose computers has highlighted the limitations of these benchmarks. This study proposes a new synthetic light-weight benchmark - MalikStone - which is representative of large P2P projects and has been specifically designed in view of the dynamic nature and challenges of the P2P paradigm. The benchmark integrates the strengths of existing synthetic benchmarks and provides more detailed insight into system performance by capturing the context in which the benchmarking was performed. The results of this newly designed benchmark are encouraging. Compared with the existing synthetic benchmarks Dhrystone and Whetstone, and the SPEC CPU2006 benchmarking suite, we highlight MalikStone's superiority in characterizing system performance in P2P settings.

Jonathan M. R Byrd, Stephen A Jarvis, Abhir Bhalerao (2010)On the parallelisation of MCMC by speculative chain execution IEEE

The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means of estimating very high-dimensional integrals. As such, MCMC has had a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. One method for improving the performance of Markov Chain Monte Carlo simulations is to use SMP machines to perform ‘speculative moves’, reducing the runtime whilst producing statistically identical results to conventional sequential implementations. In this paper we examine the circumstances under which the original speculative moves method performs poorly, and consider how some of these situations can be addressed by refining the implementation. We extend the technique to perform Markov Chains speculatively, expanding the range of algorithms that may be accelerated by speculative execution to those with non-uniform move processing times. By simulating program runs we can predict the theoretical reduction in runtime that may be achieved by this technique. We compare how efficiently different architectures perform using this method, and present experiments that demonstrate a runtime reduction of up to 35-42% in cases where using conventional speculative moves would result in execution as slow as, if not slower than, sequential processing.

The Resilient Distributed Dataset (RDD) is the core memory abstraction behind the popular data-analytic framework Apache Spark. We present an extension to the Resilient Distributed Dataset for map transformations, that we call MapRDD, which takes advantage of the underlying relations between records in the parent and child datasets, in order to achieve random-access of individual records in a partition. The design is complemented by a new MemoryStore, which manages data sampling and data transfers asynchronously. We use the ImageNet dataset to demonstrate that: (I) The initial data loading phase is redundant and can be completely avoided; (II) Sampling on the CPU can be entirely overlapped with training on the GPU to achieve near full occupancy; (III) CPU processing cycles and memory usage can be reduced by more than 90%, allowing other applications to be run simultaneously; (IV) Constant training step time can be achieved, regardless of the size of the partition, for up to 1.3 million records in our experiments. We expect to obtain the same improvements in other RDD transformations via further research on finer-grained implicit & explicit dataset relations.

Simon David Hammond, Steven A. Wright, Simon J. Pennycook, Robert F. Bird, John A. Herdman, Iain Miller, Ash Vadgama, Ligang He, Stephen A. Jarvis (2012)Parallel File System Analysis Through Application I/O Tracing, In: The Computer Journal
Daniel P Spooner, James D Turner, J Cao, Stephen A Jarvis, G. R Nudd (2001)Application characterisation using a lightweight transaction model
Robert F Bird, Simon J Pennycook, Steven A Wright, Stephen A Jarvis (2013)Towards a portable and future-proof particle-in-cell plasma physics code

We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on understanding the suitability of existing algorithms for future accelerator-based architectures, and identifying the changes necessary to achieve performance portability for particle-in-cell plasma physics codes. We achieve good levels of performance with limited changes to the algorithmic behaviour of the code. However, our results suggest that a fundamental change to EPOCH3D’s current accumulation step (and its dependency on atomic operations) is necessary in order to fully utilise the massive levels of parallelism supported by emerging parallel architectures.

Stephen A Jarvis, Daniel P Spooner, Hélène Niuklan Lim Choi Keung, J Cao, S Saini, G. R Nudd (2003)Performance prediction and its use in parallel and distributed computing systems IEEE

A performance prediction framework is described in which predictive data generated by the PACE toolkit is stored and published through a Globus MDS-based performance information service. Distributing this data allows additional performance-based middleware tools to be built; this paper describes two such tools, a local-level scheduler and a system for wide-area task management. Experimental evidence shows that by integrating these performance tools for local- and wide-area management, considerable improvements can be made to task scheduling, resource utilisation and load balancing on heterogeneous distributed computing systems.

Stephen A Jarvis, Daniel P Spooner, J. R. D Dyson, Lei Zhao, G. R Nudd (2003)Performance-responsive middleware for grid computing

The development of supportive middleware to manage resources and distributed workload across multiple administrative boundaries is of central importance to Grid computing. Active middleware services that perform look-up, match-making, scheduling and staging are being developed that allow users to identify and utilise appropriate resources that provide sustainable system- and user-level qualities of service. This paper documents two performance-responsive middleware services that address the implications of executing a particular workload on a given set of resources. These services are based on an established performance prediction system that is employed at both the local (intra-domain) and global (multi-domain) levels to provide dynamic workload steering. These additional facilities bring about significant performance improvements, the details of which are presented with regard to the user-perceived quality of service and to resource utilisation.

Adam P Chester, Matthew Leeke, M Al-Ghamdi, Stephen A Jarvis, Arshad Jhumka (2011)A modular failure-aware resource allocation architecture for cloud computing

The prevalence of cloud computing environments and the ever increasing reliance of large organisations on computational resources has meant that service providers must operate at unprecedented scales and levels of efficiency. Dynamic resource allocation (DRA) policies have been shown to allow service providers to improve resource utilisation and operational efficiency in the presence of unpredictable demands, hence maximising profitability. However, practical considerations, such as power and space, have led service providers to adopt rack based approaches to application servicing. This co-location of computation resources, and the associated common provision of utilities it encourages, has immediate implications for system dependability. Specifically, in the presence of rack crash failures, which can lead to all the servers within a rack becoming unavailable, resource allocation policies need to be cognisant of failures. In this paper, we address this issue and make the following specific contributions: (i) we present a modular architecture for failure-aware resource allocation, where a performance-oriented DRA policy is composed with a failure-aware resource allocator, (ii) we propose a metric, called Capacity Loss, to capture the exposure of an application to a rack failure, (iii) we develop an algorithm for reducing the proposed metric across all applications in a system operating under a DRA policy, and (iv) we evaluate the effectiveness of the proposed architecture on a large-scale DRA policy in the context of rack failures, ultimately concluding that our approach reduces the number of failed requests as compared to a single random allocation. The main benefit of our approach is that we have developed a failure-aware resource allocation framework that can work in tandem with any DRA policy.
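
As an illustration of contribution (ii), a Capacity-Loss-style metric can be computed directly from a placement of application replicas onto racks. The definition used below, the worst-case fraction of an application's replicas lost to a single rack failure, is a hypothetical simplification for illustration; the paper's precise metric may differ.

```python
def capacity_loss(placement):
    """Toy Capacity-Loss-style metric (hypothetical definition): for
    each application, the largest fraction of its replicas that one
    rack failure would take out.  `placement` maps each application
    name to a dict of {rack: replica_count}."""
    loss = {}
    for app, racks in placement.items():
        total = sum(racks.values())
        # Worst case: the rack holding the most replicas of this app fails.
        loss[app] = max(racks.values()) / total
    return loss
```

Spreading an application evenly over more racks drives this value down, which is the intuition behind composing a failure-aware allocator with an existing DRA policy.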

A Bokma, Stephen A Jarvis, M Huiban, A Slade (1995)Handling communication in concurrent KBS Gordon and Breach Science Publishers, Inc
Richard Frederick Barrett, Simon David Hammond, Oliver Perks, David A. Beckingsale, Iain Miller, John A. Herdman, Ash Vadgama, Ligang He, Stephen A. Jarvis (2012)Towards Automated Memory Model Generation Via Event Tracing, In: Proposed for publication in The Computer Journal
Jonathan M R Byrd, Stephen A Jarvis, Abhir H Bhalerao (2010)On the parallelisation of MCMC-based image processing, In: 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, pp. 1-8 IEEE

The increasing availability of multi-core and multiprocessor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very high-dimensional integrals. As such MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. Whilst divide and conquer is an obvious means to simplify image processing tasks, "naively" dividing an image into smaller images to be processed separately results in anomalies and breaks the statistical validity of the MCMC algorithm. We present a method of grouping the spatially local moves and temporarily partitioning the image to allow those moves to be processed in parallel, reducing the runtime whilst conserving the properties of the MCMC method. We calculate the theoretical reduction in runtime achievable by this method, and test its effectiveness on a number of different architectures. Experiments are presented that show reductions in runtime of 38% using a dual-core dual-processor machine. In circumstances where absolute statistical validity is not required, an algorithm based upon, but not strictly adhering to, MCMC may suffice. For such situations, two additional algorithms are presented for the partitioning of images to which MCMC will be applied. Assuming preconditions are met, these methods may be applied with minimal risk of anomalous results. Although the extent of the runtime reduction will be data dependent, the experiments performed showed the runtime reduced to 27% of its original value.

Malik Shahzad K. Awan, Stephen A. Jarvis (2013)Dynamic work-unit slicing for time-constrained job execution in P2P environments, In: 2013 International Conference on High Performance Computing & Simulation (HPCS), pp. 576-583 IEEE

Efficient utilization of resources is one of the key challenges that project administrators face in the dynamic peer-to-peer (P2P) computing environment. The volatile nature of participation by the general public in running lengthy and complex scientific applications is not bounded by any legal obligations, and thus requires a greedy approach to resource utilization. Generally, the availability statistics of a computational resource and a selected set of scheduling algorithms or heuristics are combined for allocating tasks to the available resources. Berkeley Open Infrastructure for Network Computing (BOINC) is a widely used middleware platform in P2P systems, which uses four interrelated scheduling policies for utilization of the available pool of computational resources. The policies used by BOINC mainly depend on job execution estimates based on the composition of the job and performance results of the considered computing platform obtained using benchmarks. BOINC deploys two traditional synthetic benchmarks, Dhrystone and Whetstone, for measuring the integer and floating-point performance of a considered platform. However, the performance results obtained using these benchmarks show significant variations for similar microprocessor, operating system and hardware configurations. These inconsistent results, when used for scheduling, significantly affect resource-scheduling estimates, particularly for time-constrained jobs. This study proposes a novel scheduling policy based on a more consistent and P2P-representative benchmark, MalikStone. The policy considers the total availability time of a computational resource, the estimated execution time of a work-unit and the available unused time of a computational resource for dynamically slicing a large work-unit into smaller work-units. The results reveal that the policy improved the utilization of available computational resources by around 10% under the considered experimental settings.

Mike Joy, Michael Luck, Stephen A. Jarvis (2002)Introducing Unix and Linux Palgrave Macmillan
Nigel J Saunders, Jason S Mirsky, John F Peden, Stephen A Jarvis (2001)An improved method for identifying horizontally acquired virulence genes
Henry Crosby, Theo Damoulas, Alex Caton, Paul Davis, João Porto de Albuquerque, Stephen A. Jarvis (2018)Road distance and travel time for an improved house price Kriging predictor, In: Geo-spatial information science 21(3), pp. 185-194 Taylor & Francis

The paper designs an automated valuation model to predict the price of residential property in Coventry, United Kingdom, and achieves this by means of geostatistical Kriging, a popularly employed distance-based learning method. Unlike traditional applications of distance-based learning, this paper implements non-Euclidean distance metrics by approximating road distance, travel time and a linear combination of both, which this paper hypothesizes to be more related to house prices than straight-line (Euclidean) distance. Given that, to undertake Kriging, a valid variogram must be produced, this paper exploits the conforming properties of the Minkowski distance function to approximate a road distance and travel time metric. A least squares approach is put forth for variogram parameter selection and an ordinary Kriging predictor is implemented for interpolation. The predictor is then validated with 10-fold cross-validation and a spatially aware checkerboard hold-out method against the almost exclusively employed Euclidean metric. Given a comparison of results for each distance metric, this paper witnesses a goodness-of-fit (R²) result of 0.6901 ± 0.18 SD for real estate price prediction compared to the traditional (Euclidean) approach obtaining a suboptimal value of 0.66 ± 0.21 SD.
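
The Minkowski distance at the heart of this approach generalises the Euclidean metric with a tunable exponent p, which is what lets a closed-form distance approximate road distance on a street network. A minimal sketch follows, with a hypothetical least-squares helper for choosing p against observed road distances (the paper fits variogram parameters via least squares; this helper only illustrates the fitting idea).

```python
def minkowski(a, b, p):
    """Minkowski distance between points a and b: p=2 is Euclidean,
    p=1 is Manhattan; intermediate p can approximate road distance
    on a street grid."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def fit_p(pairs, road_dists, candidates):
    """Hypothetical helper: pick the candidate exponent p whose
    Minkowski distances best match observed road distances, in the
    least-squares sense."""
    def sse(p):
        return sum((minkowski(a, b, p) - d) ** 2
                   for (a, b), d in zip(pairs, road_dists))
    return min(candidates, key=sse)
```

On a rectilinear street grid the fitted exponent tends toward p = 1 (Manhattan distance), since road travel between two points follows the axes rather than the straight line.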

Stephen A Jarvis, N Thomas, A van Moorsel (2014)Open issues in grid performability, In: International Journal of Simulation and Process Modelling, pp. 3-12 Inderscience

This paper seeks to identify some of the technology challenges and research opportunities in assessing, optimising and assuring performance and dependability on the grid. Grid computing offers many scientific and commercial benefits, but also many technical and organisational challenges. Amongst the challenges for grid computing is the need to provide reliable quality of service to end users, while enabling increasingly dynamic and federated usage scenarios. Understanding and quantifying the performance capabilities of a grid system and its applications is a crucial issue, without which performance improvements will be at best ad hoc. Nevertheless, this is particularly difficult for complex and dynamic systems like the grid, where both the computing environment (in terms of the contributing platforms and networks) and the application base are in continual change. There is an internationally recognised need for research in this area and success will lay the foundation for next generation e-Science.

James D Turner, David A Bacigalupo, Stephen A Jarvis, D. N Dillenberger (2002)Application response measurement of distributed web services, In: Journal of Computer Resource Measurement, pp. 45-55
Michael G Hinchey, Stephen A Jarvis (1995)Concurrent systems McGraw-Hill

The advent of concurrent systems undoubtedly promises greatly increased processing power. These systems, however, are inherently complex and present many new problems at both the design and implementation phases. Appropriate specification, design and implementation methods are required, which will facilitate the development of such systems. Techniques based on formal methods, such as the language of CSP (Communicating Sequential Processes) have proven to be the most successful means of conquering complexity in the specification of concurrent, embedded, real-time and distributed systems. Concurrent Systems: Formal Development in CSP offers an up-to-date and practical treatment of this important subject. The basis of the text is the combination of a tutorial approach with convenient reference material. Beginning with a clear introduction to CSP, the book goes on to provide a comprehensive listing of the laws of CSP and describes the more important variants of the language. An illustrative case study, based on the verification of a network protocol, forms a central part of the book and practical implementation in both Occam and Ada 9X is discussed in depth. This book represents a significant, student-friendly contribution to the teaching of CSP. Its example-based, but rigorous approach makes it ideal for both undergraduate and postgraduate courses in concurrency, formal methods, distributed and real-time systems. Special features include: A tutorial introduction combined with comprehensive reference material, coverage of the most important variants of CSP, a fully annotated select bibliography, and implementation details in Occam and Ada 9X.

Peter Coetzee, Stephen Jarvis (2015)Goal-Based Analytic Composition for On- and Off-line Execution at Scale, In: 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 56-65 IEEE

Crafting scalable analytics in order to extract actionable business intelligence is a challenging endeavour, requiring multiple layers of expertise and experience. Often, this expertise is irreconcilably split between an organisation's engineers and subject matter or domain experts. Previous approaches to this problem have relied on technically adept users with tool-specific training. These approaches have generally not targeted the levels of performance and scalability required to harness the sheer volume and velocity of large-scale data analytics. In this paper, we present a novel approach to the automated planning of scalable analytics using a semantically rich type system, the use of which requires little programming expertise from the user. This approach is the first of its kind to permit domain experts with little or no technical expertise to assemble complex and scalable analytics, for execution both on- and off-line, with no lower-level engineering support. We describe in detail (i) an abstract model of analytic assembly and execution, (ii) goal-based planning and (iii) code generation using this model for both on- and off-line analytics. Our implementation of this model, Mendeleev, is used to (iv) demonstrate the applicability of our approach through a series of case studies, in which a single interface is used to create analytics that can be run in real-time (on-line) and batch (off-line) environments. We (v) analyse the performance of the planner, and (vi) show that the performance of Mendeleev's generated code is comparable with that of hand-written analytics.

M. S. K Awan, Stephen A Jarvis (2010)Understanding resource provisioning for ClimatePrediction.net IEEE

Peer-to-peer computing, involving the participation of thousands of general purpose, public computers, has established itself as a viable paradigm for executing loosely-coupled, complex scientific applications requiring significant computational resources. ClimatePrediction.net is an excellent demonstrator of this technology, having attracted hundreds of thousands of users from more than 200 countries for the efficient and effective execution of complex climate prediction models. This paper is concerned with understanding the underlying compute resources on which ClimatePrediction.net commonly executes. Such a study is advantageous to three different stakeholders, namely the application developers, the project participants and the project administrators. This is the first study of this kind, and while offering general conclusions with specific reference to ClimatePrediction.net, it also illustrates the benefits of such work to scalability analysis of other peer-to-peer projects.

Dominic Brown, Matthew T Bettencourt, Steven A Wright, John P Jones, Stephen A Jarvis (2017)Performance of second order particle-in-cell methods on modern many-core architectures

The emergence of modern many-core architectures that offer an extreme level of parallelism makes methods that were previously infeasible due to computational expense now achievable. Particle-in-Cell (PIC) codes often fail to fully leverage this increased performance potential due to their high use of memory bandwidth. The use of higher order PIC methods may offer a solution to this by improving simulation accuracy significantly for an increase in computational intensity when compared to their first order counterparts. This greater expense is accompanied with only a minor increase in the amount of memory throughput required during the simulation. In this presentation we will show the performance of a second order PIC algorithm. Our implementation uses second order finite elements and particles that are represented with a collection of surrounding ghost particles. These ghost particles each have associated weights and offsets around the true particle position and therefore represent a charge distribution. We test our PIC implementation against a first order algorithm on various modern compute architectures including Intel’s Knights Landing (KNL) and NVIDIA’s Tesla P100. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Pascal and Kepler GPU architectures, despite only a 4× improvement in theoretical peak performance between the architectures. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.

Simon J Pennycook, Simon D Hammond, Gihan R Mudalige, Stephen A Jarvis (2010)Experiences with porting and modelling wavefront algorithms on many-core architectures

We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA). This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC “Tesla” and “Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high performance computing (HPC) clusters based on a range of multi-core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC). In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale.
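
The "wavefront" dependency pattern these applications share is worth spelling out: in the 2D case, cell (i, j) depends on its north and west neighbours, so all cells on an anti-diagonal i + j = d are mutually independent and can in principle be computed in parallel. The sketch below walks the diagonals serially just to make the sweep order concrete (illustrative only; LU itself sweeps hyperplanes of a 3D grid, and `f` is a stand-in for the per-cell stencil update).

```python
def wavefront_sweep(n, m, f):
    """Illustrative 2D wavefront sweep: cell (i, j) is computed from
    its north (i-1, j) and west (i, j-1) neighbours via f, visiting
    anti-diagonals d = i + j in order.  Cells within one diagonal
    have no mutual dependencies, which is the available parallelism."""
    grid = [[0.0] * m for _ in range(n)]
    for d in range(n + m - 1):
        # All (i, j) with i + j == d are independent of one another.
        for i in range(max(0, d - m + 1), min(n, d + 1)):
            j = d - i
            north = grid[i - 1][j] if i > 0 else 0.0
            west = grid[i][j - 1] if j > 0 else 0.0
            grid[i][j] = f(i, j, north, west)
    return grid
```

The catch for GPUs is that parallelism grows and shrinks with the diagonal length, so early and late diagonals under-utilise the device; this is one reason wavefront codes are a demanding target for many-core architectures.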

Simon J Pennycook, Simon D Hammond, Gihan R Mudalige, Stephen A Jarvis (2010)Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark ACM

We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA's Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. Execution times are reported for several different GPUs, ranging from low-end consumer-grade products to high-end HPC-grade devices, including the Tesla C2050 built on NVIDIA's Fermi processor. We also utilise recently developed performance models of LU to facilitate a comparison between future large-scale distributed clusters of GPU devices and existing clusters built on traditional CPU architectures, including a quad-socket, quad-core AMD Opteron cluster and an IBM BlueGene/P.

Hélène Niuklan Lim Choi Keung, J. R. D Dyson, Stephen A Jarvis, G. R Nudd (2003)The globus monitoring and discovery service (MDS-2): a performance analysis
Ligang He, Chenlin Huang, Kewei Duan, Kenli Li, Hao Chen, Jianhua Sun, Stephen A. Jarvis (2012)Modeling and analyzing the impact of authorization on workflow executions, In: Future generation computer systems 28(8), pp. 1177-1193 Elsevier B.V

It has been a subject of a significant amount of research to automate the execution of workflows (or business processes) on computer resources. However, many workflow scenarios still require human involvement, which introduces additional security and authorization concerns. This paper presents a novel mechanism for modeling the execution of workflows with human involvement under Role-based Authorization Control. Our modeling approach applies Colored Timed Petri-Nets to allow various authorization constraints to be modeled, including role, temporal, cardinality, BoD (Binding of Duty), SoD (Separation of Duty), role hierarchy constraints etc. We also model the execution of tasks with different levels of human involvement and as such allow the interactions between workflow authorization and workflow execution to be captured. The modeling mechanism is developed in such a way that the construction of the authorization model for a workflow can be automated. This feature is very helpful for modeling large collections of authorization policies and/or complex workflows. A Petri-net toolkit, the CPN Tools, is utilized in the development of the modeling mechanism and to simulate the constructed models. This paper also presents the methods to analyze and calculate the authorization overhead as well as the performance data in terms of various metrics through the model simulations. Based on the simulation results, this paper further proposes the approaches to improving performance given the deployed authorization policies. This work can be used for investigating the impact of authorization, for capacity planning, for the design of workload management strategies, and also to estimate execution performance, when human resources and authorization policies are employed in tandem. ► We model the executions of workflows with human involvement under RBAC authorization. ► We analyze the impact of authorization on workflow executions. ► We calculate the performance of workflow executions under RBAC authorization. ► We propose the approaches to improving performance given the authorization constraints.

Daniel P Spooner, J Cao, James D Turner, H. N Lin Choi Keung, Stephen A Jarvis, G. R Nudd (2002)Localised workload management using performance prediction and QoS contracts
Jonathan M. R Byrd, Stephen A Jarvis, Abhir Bhalerao (2008)Speculative moves: multithreading Markov Chain Monte Carlo programs

The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very high-dimensional integrals. As such MCMC has found a wide variety of applications in biology and biomedical analysis. This paper presents a new method for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines to speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. We calculate the theoretical reduction in runtime that may be achieved using our technique under perfect conditions, and test and compare the method on a selection of multi-core and multi-processor architectures. Experiments are presented that show reductions in runtime of 35% using two cores and 55% using four cores.

Stephen A Jarvis, J. M. D Hill, C Siniolakis, V. P Vasilev (2001)Portable and architecture independent parallel performance tuning University of Warwick. Department of Computer Science

A call-graph profiling tool has been designed and implemented to analyse the efficiency of programs written in bsplib. This tool highlights computation and communication imbalance in parallel programs, exposing portions of program code which are amenable to improvement. A unique feature of this profiler is that it uses the BSP cost model, thus providing a mechanism for portable and architecture-independent parallel performance tuning. In order to test the capabilities of the model on a real-world example, the performance characteristics of an SQL query processing application are investigated on a number of different parallel architectures.

Arun Prabhakar, Christopher R Goddard, Dario Amirantec, Istvan Z Reguly, Axel Gerstenberger, Jan-Friedrich Suhrmann, Stephen A Jarvis, Leigh Lapworth, Gihan R Mudalige (2022)Virtual certification of gas turbine engines - visualizing the DLR Rig250 compressor

High Performance Computing (HPC) critically underpins the design of aero-engines. With global emissions targets, engine designs require fundamental change, including designs utilizing sustainable aviation fuels and electric/hybrid flight. Virtual certification of designs with HPC is recognized as a key technology to meet these challenges, but requires analysis of models with higher fidelity, using ultra-large-scale executions. In this explanatory SC-SciVis showcase, we present results from time-accurate simulations of a 4.6B-element full 360-degree model of a production-representative gas turbine engine compressor, the Rig250 at DLR. This represents a grand challenge problem, at the fidelity required for virtual certification standards. The results are achieved through Rolls-Royce’s Hydra CFD suite on ARCHER2. The compressor is visualized under off-design conditions, demonstrating flow contours of velocity, Mach number and iso-surfaces of vorticity. The level of detail and the HPC simulations leading to the visualizations demonstrate a step-change towards achieving virtual certification objectives under production settings.

Stephen A Jarvis, Ligang He, Daniel P Spooner, G. R Nudd (2004)The impact of predictive inaccuracies on execution scheduling University of Warwick. Department of Computer Science

This paper investigates the underlying impact of predictive inaccuracies on execution scheduling, with particular reference to execution time predictions. This study is conducted from two perspectives: from that of job selection and from that of resource allocation, both of which are fundamental components in execution scheduling. A new performance metric, termed the degree of misperception, is introduced to express the probability that the predicted execution times of jobs display different ordering characteristics from their real execution times due to inaccurate prediction. Specific formulae are developed to calculate the degree of misperception in both job selection and resource allocation scenarios. The parameters which influence the degree of misperception are also extensively investigated. The results presented in this paper are of significant benefit to scheduling approaches that take into account predictive data; the results are also of importance to the application of these scheduling techniques to real-world high-performance systems.
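
The "degree of misperception" can be illustrated with a toy Monte Carlo model: given two jobs whose real execution times are ordered t1 < t2, how often do noisy predictions reverse that order? The error model below (uniform multiplicative noise) and the function name are invented for illustration; the paper develops closed-form formulae rather than simulation.

```python
import random

def misperception_rate(t1, t2, noise, n=10000, seed=0):
    """Monte Carlo illustration of a degree-of-misperception-style
    quantity: the probability that two jobs' PREDICTED execution
    times order differently from their real times (here t1 < t2),
    when each prediction carries multiplicative error drawn uniformly
    from [1-noise, 1+noise].  A toy model, not the paper's formulae."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(n):
        p1 = t1 * rng.uniform(1 - noise, 1 + noise)
        p2 = t2 * rng.uniform(1 - noise, 1 + noise)
        if p1 >= p2:  # predicted order contradicts the real order
            flips += 1
    return flips / n
```

The intuition this captures matches the paper's findings: the closer the jobs' real execution times and the larger the predictive error, the more often a prediction-driven scheduler ranks them incorrectly.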

David Truby, Steven A Wright, Robert Kevis, Satheesh Maheswaran, Andy Herdman, Stephen A Jarvis (2018)BookLeaf: An Unstructured Hydrodynamics Mini-application

As the age of Exascale computing causes a diversification away from traditional CPU-based homogeneous clusters, it is becoming increasingly difficult to ensure that computationally complex codes are able to run on these emerging architectures. This is especially important for large physics simulations that are themselves becoming increasingly complex and computationally expensive. One proposed solution to the problem of ensuring these applications can run on the desired architectures is to develop representative mini-applications that are simpler, and so can be ported to new frameworks more easily, but which are also representative of the algorithmic and performance characteristics of the original applications. In this paper we present BookLeaf, an unstructured Arbitrary Lagrangian-Eulerian mini-application to add to the suite of representative applications developed and maintained by the UK Mini-App Consortium (UK-MAC). First, we outline the reference implementation of our application in Fortran. We then discuss a number of alternative implementations using a variety of parallel programming models and discuss the issues that arise when porting such an application to new architectures. To demonstrate our implementation, we present a study of the performance of BookLeaf on a number of platforms using alternative designs, and we document a scaling study showing the behaviour of the application at scale.

A. C. Mallinson, S. A. Jarvis, W. P. Gaudin, J. A. Herdman (2014)Experiences at scale with PGAS versions of a Hydrodynamics application, In: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, pp. 1-11 ACM

In this work we directly evaluate two PGAS programming models, CAF and OpenSHMEM, as candidate technologies for improving the performance and scalability of scientific applications on future exascale HPC platforms. PGAS approaches are considered by many to represent a promising research direction with the potential to solve some of the existing problems preventing codebases from scaling to exascale levels of performance. The aim of this work is to better inform the exascale planning at large HPC centres such as AWE. Such organisations invest significant resources maintaining and updating existing scientific codebases, many of which were not designed to run at the scales required to reach exascale levels of computational performance on future system architectures. We document our approach for implementing a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application in each of these PGAS languages. Furthermore, we also present our results and experiences from scaling these different approaches to high node counts on two state-of-the-art, large scale system architectures from Cray (XC30) and SGI (ICE-X), and compare their utility against an equivalent existing MPI implementation.

Wentai Wu, Ligang He, Weiwei Lin, Rui Mao, Carsten Maple, Stephen Jarvis (2021)SAFA: A Semi-Asynchronous Protocol for Fast Federated Learning With Low Overhead, In: IEEE transactions on computers70(5)9093123pp. 655-668 IEEE

Federated learning (FL) has attracted increasing attention as a promising approach to driving a vast number of end devices with artificial intelligence. However, it is very challenging to guarantee the efficiency of FL considering the unreliable nature of end devices while the cost of device-server communication cannot be neglected. In this article, we propose SAFA, a semi-asynchronous FL protocol, to address the problems in federated learning such as low round efficiency and poor convergence rate in extreme conditions (e.g., clients dropping offline frequently). We introduce novel designs in the steps of model distribution, client selection and global aggregation to mitigate the impacts of stragglers, crashes and model staleness in order to boost efficiency and improve the quality of the global model. We have conducted extensive experiments with typical machine learning tasks. The results demonstrate that the proposed protocol is effective in terms of shortening federated round duration, reducing local resource wastage, and improving the accuracy of the global model at an acceptable communication cost.
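The core idea behind semi-asynchronous aggregation can be illustrated with a short sketch: discard client updates that are too stale, and discount the remainder before merging them into the global model. The tolerance threshold, the exponential discount rule and all names below are illustrative assumptions for exposition, not the exact SAFA algorithm.

```python
# Illustrative sketch of semi-asynchronous federated aggregation with
# staleness discounting. Updates based on a global model more than
# `tolerance` rounds old are dropped; the rest are weighted by sample
# count and discounted by alpha**staleness.

def aggregate(global_model, updates, tolerance=2, alpha=0.5):
    """Merge client updates into the global model.

    updates: list of (model_vector, num_samples, staleness) tuples, where
    staleness counts how many rounds old the client's base model is.
    """
    fresh = [(m, n * (alpha ** s)) for (m, n, s) in updates if s <= tolerance]
    if not fresh:
        return global_model          # nothing usable this round
    total = sum(w for _, w in fresh)
    merged = [0.0] * len(global_model)
    for model, w in fresh:
        for i, v in enumerate(model):
            merged[i] += w * v
    return [v / total for v in merged]
```

For example, two equally weighted fresh updates `[1.0]` and `[3.0]` merge to `[2.0]`, while a round containing only over-stale updates leaves the global model unchanged.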

Andrew C Mallinson, David A Beckingsale, W. P Gaudin, J. A Herdman, J. M Levesque, Stephen A Jarvis (2013)CloverLeaf: preparing hydrodynamics codes for exascale, Cray User Group, Inc

In this work we directly evaluate five candidate programming models for future exascale applications (MPI, MPI+OpenMP, MPI+OpenACC, MPI+CUDA and CAF) using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. The aim of this work is to better inform the exascale planning at large HPC centres such as AWE. Such organisations invest significant resources maintaining and updating existing scientific codebases, many of which were not designed to run at the scale required to reach exascale levels of computation on future system architectures. We present our results and experiences of scaling these different approaches to high node counts on existing large-scale Cray systems (Titan and HECToR). We also examine the effect that improving the mapping between process layout and the underlying machine interconnect topology can have on performance and scalability, as well as highlighting several communication-focused optimisations.

O. Perks, D. A. Beckingsale, A. S. Dawes, J. A. Herdman, C. Mazauric, S. A. Jarvis (2013)Analysing the influence of InfiniBand choice on OpenMPI memory consumption, In: 2013 International Conference on High Performance Computing & Simulation (HPCS)6641412pp. 186-193 IEEE

The ever increasing scale of modern high performance computing platforms poses challenges for system architects and code developers alike. The increase in core count densities and associated cost of components is having a dramatic effect on the viability of high memory-per-core ratios. Whilst the available memory per core is decreasing, the increased scale of parallel jobs is testing the efficiency of MPI implementations with respect to memory overhead. Scalability issues have always plagued both hardware manufacturers and software developers, and the combined effects can be disabling. In this paper we address the issue of MPI memory consumption with regard to InfiniBand network communications. We reaffirm some widely held beliefs regarding the existence of scalability problems under certain conditions. Additionally, we present results testing memory-optimised runtime configurations and vendor provided optimisation libraries. Using Orthrus, a linear solver benchmark developed by AWE, we demonstrate these memory-centric optimisations and their performance implications. We show the growth of OpenMPI memory consumption (demonstrating poor scalability) on both Mellanox and QLogic InfiniBand platforms. We demonstrate a 616× increase in MPI memory consumption for a 64× increase in core count, with a default OpenMPI configuration on Mellanox. Through the use of the Mellanox MXM and QLogic PSM optimisation libraries we are able to observe a 117× and 115× reduction in MPI memory at application memory high water mark. This significantly improves the potential scalability of the code.

Dean G Chester, Steven A Wright, Simon D Hammond, Timothy R Law, Richard P Smedley-Stevenson, Satheesh Maheswaran, Stephen A Jarvis (2019)Full-system modeling and simulation: contributions towards coupling, contention, and I/O

Performance on production machines is highly variable: on the UK National Supercomputing Service, the time a job takes to complete can vary by as much as 53%. Load imbalance and shared resource contention are largely responsible, yet previous efforts to model application/architecture performance typically do not take these into account. In this research we model and simulate network contention, which allows us to explore the impact of multiple interacting jobs and approaches to alleviating these effects, including network re-design and communication staging within applications. We show the utility of this work on a variety of systems and interacting applications.

S. A. Wright, S. D. Hammond, S. J. Pennycook, I. Miller, J. A. Herdman, S. A. Jarvis (2012)LDPLFS: Improving I/O Performance without Application Modification, In: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops6270802pp. 1352-1359 IEEE

Input/Output (I/O) operations can represent a significant proportion of run-time when large scientific applications are run in parallel and at scale. In order to address the growing divergence between processing speeds and I/O performance, the Parallel Log-structured File System (PLFS) has been developed by EMC Corporation and the Los Alamos National Laboratory (LANL) to improve the performance of parallel file activities. Currently, PLFS requires the use of either (i) the FUSE Linux Kernel module, (ii) a modified MPI library with a customised ROMIO MPI-IO library, or (iii) an application rewrite to utilise the PLFS API directly. In this paper we present an alternative method of utilising PLFS in applications. This method employs a dynamic library to intercept the low-level POSIX operations and retarget them to use the equivalents offered by PLFS. We demonstrate our implementation of this approach, named LDPLFS, on a set of standard UNIX tools, as well as on a set of standard parallel I/O intensive mini-applications. The results demonstrate almost equivalent performance to a modified build of ROMIO and improvements over the FUSE-based approach. Furthermore, through our experiments we demonstrate decreased performance in PLFS when run at scale on the Lustre file system.

Steven A Wright, Simon J Pennycook, Simon D Hammond, Stephen A Jarvis (2011)RIOT: a parallel input/output tracer

Input/output (I/O) operations can represent a significant proportion of the run-time when large scientific applications are run in parallel. Although there have been advances in the form of file-format libraries, file-system design and I/O hardware, a growing divergence exists between the performance of parallel file-systems and compute processing rates. The effect is often a bottleneck when any form of file-system interaction is required. In this paper we present RIOT – an input/output tracing toolkit being developed at the University of Warwick for dynamic attachment to parallel applications. The two-stage tracing process includes a lightweight library to record I/O events and an in-depth post-execution analysis tool to extract performance metrics such as MPI-IO bandwidth, effective POSIX/file-system bandwidth, duration of individual or aggregated time spent in obtaining or releasing file locks and temporal information relating to parallel file activity. We present a case study on the use of RIOT for three standard industry I/O benchmarks: the BT-IO micro-application from NASA’s Parallel Benchmark suite, FLASH-IO, a benchmark which replicates the check-pointing operations of the FLASH thermonuclear star modelling code and IOR, an industry standard I/O benchmark using HDF-5 and MPI-IO. Furthermore, we utilise RIOT to assess these codes when running with the Parallel Log-structured File System (PLFS) middleware developed by the Los Alamos National Laboratory.

R. F. Bird, P. Gillies, M. R. Bareford, J. A. Herdman, S. A. Jarvis (2015)Mini-App Driven Optimisation of Inertial Confinement Fusion Codes, In: Proceedings / IEEE International Conference on Cluster Computing2015-7307681pp. 768-776 IEEE

In September 2013, the large laser-based inertial confinement fusion device housed in the National Ignition Facility at Lawrence Livermore National Laboratory, was widely acclaimed to have achieved a milestone in controlled fusion -- successfully initiating a reaction that resulted in the release of more energy than the fuel absorbed. Despite this success, we remain some distance from being able to create controlled, self-sustaining fusion reactions. Inertial Confinement Fusion (ICF) represents one leading design for the generation of energy by nuclear fusion. Since the 1950s, ICF has been supported by computing simulations, providing the mathematical foundations for pulse shaping, lasers, and material shells needed to ensure effective and efficient implosion. The research presented here focuses on one such simulation code, EPOCH, a fully relativistic particle-in-cell plasma physics code, developed by a leading network of over 30 UK researchers. A significant challenge in developing large codes like EPOCH is maintaining effective scientific delivery on successive generations of high-performance computing architecture. To support this process, we adopt the use of mini-applications -- small code proxies that encapsulate important computational properties of their larger parent counterparts. Through the development of a mini-app for EPOCH (called miniEPOCH), we investigate known time-step scaling issues within EPOCH and explore possible optimisations: (i) Employing loop fission to increase levels of vectorisation, (ii) Enforcing particle ordering to allow the exploitation of domain specific knowledge and, (iii) Changing underlying data storage to improve memory locality. When applied to EPOCH, these improvements represent a 2.02× speed-up in the core algorithm and a 1.55× speed-up to the overall application runtime, when executed on EPCC's Cray XC30 ARCHER platform.

David R Truby, Carlo Bertolli, Steven A Wright, Gheorghe-Teodor Bercea, Kevin O’Brien, Stephen A Jarvis (2018)Implicit mapping of pointers inside C++ Lambda closure objects in OpenMP target offload regions

With the diversification of HPC architectures beyond traditional CPU-based clusters, a number of new frameworks for performance portability across architectures have arisen. One way of implementing such frameworks is to use C++ templates and lambda expressions to design loop-like functions. However, lower level programming APIs that these implementations must use are often designed with C in mind and do not specify how they interact with C++ features such as lambda expressions. This paper proposes a change to the behavior of the OpenMP specification with respect to lambda expressions such that when functions generated by lambda expressions are called inside GPU regions, any pointers used in the lambda expression correctly refer to device pointers. This change has been implemented in a branch of the Clang C++ compiler and demonstrated with two representative codes. Our results show that the implicit mapping of lambda expressions always exhibits identical performance to an explicit mapping but without breaking the abstraction provided by the high level frameworks, and therefore also reduces the burden on the application developer.

H. N. Lim Choi Keung, D. P. Spooner, S. A. Jarvis, G. R. Nudd, Lizhe Wang, Wei Jie (2002)Grid resource management information services for scientific computing, In: H P Lee, K Kumar (eds.), Recent Advances In Computational Science And Engineering pp. 746-749 Published by Imperial College Press and distributed by World Scientific Publishing Co

Scientific applications typically have considerable memory and processor requirements. Nevertheless, even with today's fastest supercomputers, the power of available resources falls short of the demands of large, complex simulations and analysis. The solution fortunately lies in the emergence of innovative resource environments such as Grid computing. This paper addresses issues concerning the discovery and monitoring of resources across multiple administrative domains, which can be harnessed by scientific applications. Grid Information Services are an important infrastructure in computational Grid environments as they provide important information about the state and availability of resources. We will present how the Globus Monitoring and Discovery Service (MDS) architecture can be extended to offer a transparent, unified view of Grid resources. Local resource managers ultimately possess the potential to influence where scientific jobs are processed, depending on the availability and load of their scheduler. Consequently, information about the structure and the state of schedulers should be provided to Grid brokers, which will then decide where to schedule the applications. This paper will present the way in which low-level scheduling information is collated from multiple sources and is incorporated into the unified Grid information view. The existence and availability of Grid resource information also allows reliable application performance prediction to be carried out using the Warwick PACE (Performance Analysis and Characterisation Environment) system. Thus, Grid Information Services are a key component in the efficient execution of scientific applications on Grid computing architectures.

David A Bacigalupo, James D Turner, Stephen A Jarvis, G. R Nudd (2003)Modelling dynamic e-business applications using historical performance data

Distributed e-business application platforms such as the web services framework will require sophisticated workload management infrastructures for the matching of client requests to the most appropriate resources. This paper introduces a novel dynamic predictive framework that uses both historical data and analytical performance modelling to extrapolate a request's predicted performance under varying workloads and conditions on different systems. Such a framework can facilitate a workload manager in maximising the extent to which service level agreements can be met. Initial results are presented analysing how the response times of an e-business benchmark can be predicted at different system loads and on different application servers.

David A Bacigalupo, Stephen A Jarvis, Ligang He, Daniel P Spooner, G. R Nudd (2006)A comparative evaluation of two techniques for predicting the performance of dynamic enterprise systems John von Neumann Institute for Computing (NIC)

This paper is concerned with predicting the response times an enterprise information system would provide on new server architectures. These predictions can allow a workload to be transferred onto new servers whilst maintaining quality of service levels. Two common techniques are solving queuing models and extrapolating from previously gathered performance data. The dynamic recalibration of a layered queuing model and a historical model are investigated experimentally using an established distributed enterprise benchmark. The conclusions provide guidelines as to how to select an appropriate technique, including how to dynamically calibrate each model at a low overhead. Using these guidelines it is shown that both techniques can make low overhead predictions for new server architectures at a good level of predictive accuracy (typically over 80%).

Xinuo Chen, Stephen A Jarvis (2009)Analysing BitTorrent's seeding strategies IEEE

BitTorrent is a typical peer-to-peer (P2P) file distribution application that has gained tremendous popularity in recent years. A considerable amount of research exists regarding BitTorrent's choking algorithm, which has proved to be effective in preventing freeriders. However, the effect of the seeding strategy on the resistance to freeriders in BitTorrent has been largely overlooked. In addition to this, a category of selfish leechers (termed exploiters), who leave the overlay immediately after completion, has never been taken into account in the previous research. In this paper two popular seeding strategies, the Original Seeding Strategy (OSS) and the Time-based Seeding Strategy (TSS), are chosen and we study via mathematical models and simulation their effects on freeriders and exploiters in BitTorrent networks. The mathematical model is verified and we discover that both freeriders and exploiters impact system performance, regardless of the seeding strategy employed. However, a selfish-leechers threshold is identified; once the threshold is exceeded, we find that TSS outperforms OSS - that is, TSS reduces the negative impact of selfish leechers more effectively than OSS. Based on these results we discuss the choice of seeding strategy and speculate as to how more effective BitTorrent-based file distribution applications can be built.
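The first-price-auction flavour of payment-based parent selection (described in the earlier paper on peer-to-peer streaming broadcast, and echoed in this incentive work) can be sketched in a few lines: peers bid accumulated points for a parent's upload slots, the highest bidders win, and each winner pays its own bid. The slot model and tie-breaking below are illustrative assumptions, not the protocol from either paper.

```python
# Toy first-price auction for parent (data-supplier) slots. Each peer
# submits one bid in points; the `slots` highest bidders win, paying
# their own bid. Ties are broken lexicographically by peer id.

def allocate_slots(bids, slots):
    """bids: dict peer_id -> points bid for the next period.
    Returns (winners, payments)."""
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winners = [peer for peer, _ in ranked[:slots]]
    payments = {peer: bids[peer] for peer in winners}
    return winners, payments
```

With bids `{'a': 5, 'b': 9, 'c': 7}` and two slots, peers `b` and `c` win and pay 9 and 7 points respectively, while the low bidder must settle for a worse parent in the next period.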

M Al-Ghamdi, Adam P Chester, Stephen A Jarvis (2010)Predictive and dynamic resource allocation for enterprise applications IEEE

Dynamic resource allocation has the potential to provide significant increases in total revenue in enterprise systems through the reallocation of available resources as the demands on hosted applications change over time. This paper investigates the combination of workload prediction algorithms and switching policies: the former aim to forecast the workload associated with Internet services, the latter switch resources between applications according to certain system criteria. An evaluation of two well known switching policies – the proportional switching policy (PSP) and the bottleneck aware switching policy (BSP) – is conducted in the context of seven workload prediction algorithms. This study uses real-world workload traces consisting of approximately 3.5M requests, and models a multi-tiered, cluster-based, multi-server solution. The results show that a combination of the bottleneck aware switching policy and workload predictions based on an autoregressive, integrated, moving-average model can improve system revenue by as much as 43%.
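The coupling of a workload predictor with a switching policy can be illustrated with a minimal sketch: forecast each application's demand, then re-partition the server pool in proportion to the forecasts. The trailing moving average and the proportional rule below are simplified stand-ins for the ARIMA-family predictors and the PSP/BSP policies evaluated in the paper.

```python
# Minimal sketch of prediction-driven server switching between two
# hosted applications. Both the predictor and the policy are
# deliberately simple illustrations.

def predict_next(history, window=3):
    """Forecast the next-period workload as a trailing moving average."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def proportional_split(load_a, load_b, total_servers):
    """Assign servers to two applications in proportion to predicted
    load, guaranteeing each application at least one server."""
    share_a = load_a / (load_a + load_b)
    servers_a = max(1, min(total_servers - 1, round(share_a * total_servers)))
    return servers_a, total_servers - servers_a
```

For instance, if application A's recent request rates are `[100, 120, 140]` and B's are flat at `[60, 60, 60]`, the forecasts are 120 and 60, so an 8-server cluster is split 5/3 for the next period. A bottleneck-aware policy would additionally weigh the cost of moving a server against the predicted revenue gain before committing the switch.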

Adam P Chester, J. W. J Xue, Ligang He, Stephen A Jarvis (2008)A system for dynamic resource allocation in application server clusters UKPEW

Application server clusters are often used to service high-throughput web applications. In order to host more than a single application, an organization will usually procure a separate cluster for each application. Over time the utilization of the two clusters will vary, leading to variation in the response times experienced by users of the applications. Techniques that statically assign servers to each application prevent the system from adapting to changes in the workload, and are thus susceptible to providing unacceptable levels of service. This paper investigates a system for allocating server resources to applications dynamically, thus allowing applications to automatically adapt to variable workloads. Such a scheme requires meticulous system monitoring, a method for switching application servers between server pools and a means of calculating when a server switch should be made (balancing switching cost against perceived benefits). Experimentation is performed using such a switching system on a Web application testbed hosting two applications across eight application servers. The testbed is used to compare several theoretically derived switching policies. The Average Flow switching policy is shown to be the best policy, when considering the mean response times for this application.

Michael Luck, Mike Joy, Stephen Jarvis (2017)Processes and devices, In: Introducing UNIX and Linux Bloomsbury Publishing Plc

O. F. J. Perks, D. A. Beckingsale, S. D. Hammond, I. Miller, J. A. Herdman, A. Vadgama, A. H. Bhalerao, L. He, S. A. Jarvis (2013)Towards Automated Memory Model Generation Via Event Tracing, In: Computer journal56(2)pp. 156-174 Oxford Univ Press

The importance of memory performance and capacity is a growing concern for high performance computing laboratories around the world. It has long been recognized that improvements in processor speed exceed the rate of improvement in dynamic random access memory speed and, as a result, memory access times can be the limiting factor in high performance scientific codes. The use of multi-core processors exacerbates this problem, with the rapid growth in the number of cores not being matched by similar improvements in memory capacity, increasing the likelihood of memory contention. In this paper, we present a lightweight memory tracing tool and analysis framework for parallel codes, which is able to identify peak memory usage and also analyse per-function memory use over time. An evaluation of the tool, in terms of both its effectiveness and its overheads, is performed using nine established scientific applications/benchmark codes representing a variety of programming languages and scientific domains. We also show how the tool can be used to automatically generate a parameterized memory model for one of these applications, a two-dimensional non-linear magnetohydrodynamics application. Through the memory model we are able to identify an unexpected growth term which becomes dominant at scale. With a refined model we are able to predict memory consumption with under 7% error.

S. A. Wright, S. D. Hammond, S. J. Pennycook, R. F. Bird, J. A. Herdman, I. Miller, A. Vadgama, A. Bhalerao, S. A. Jarvis (2013)Parallel File System Analysis Through Application I/O Tracing, In: Computer journal56(2)pp. 141-155 Oxford Univ Press

Input/Output (I/O) operations can represent a significant proportion of the run-time of parallel scientific computing applications. Although there have been several advances in file format libraries, file system design and I/O hardware, a growing divergence exists between the performance of parallel file systems and the compute clusters that they support. In this paper, we document the design and application of the RIOT I/O toolkit (RIOT) being developed at the University of Warwick with our industrial partners at the Atomic Weapons Establishment and Sandia National Laboratories. We use the toolkit to assess the performance of three industry-standard I/O benchmarks on three contrasting supercomputers, ranging from a mid-sized commodity cluster to a large-scale proprietary IBM BlueGene/P system. RIOT provides a powerful framework in which to analyse I/O and parallel file system behaviour; we demonstrate, for example, the large file locking overhead of IBM's General Parallel File System, which can consume nearly 30% of the total write time in the FLASH-IO benchmark. Through I/O trace analysis, we also assess the performance of HDF-5 in its default configuration, identifying a bottleneck created by the use of suboptimal Message Passing Interface hints. Furthermore, we investigate the performance gains attributed to the Parallel Log-structured File System (PLFS) being developed by EMC Corporation and the Los Alamos National Laboratory. Our evaluation of PLFS involves two high-performance computing systems with contrasting I/O backplanes and illustrates the varied improvements to I/O that result from the deployment of PLFS (ranging from up to 25x speed-up in I/O performance on a large I/O installation to 2x speed-up on the much smaller installation at the University of Warwick).

Istvan Z. Reguly, Andrew M. B. Owenson, Archie Powell, Stephen A. Jarvis, Gihan R. Mudalige (2021)Under the Hood of SYCL - An Initial Performance Analysis with An Unstructured-Mesh CFD Application, In: B L Chamberlain, A L Varbanescu, H Ltaief, P Luszczek (eds.), HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 202112728pp. 391-410 Springer Nature

As the computing hardware landscape gets more diverse, and the complexity of hardware grows, the need for a general-purpose parallel programming model capable of delivering (performance) portable codes has become pressing. Intel's OneAPI suite, which is based on the SYCL standard, aims to fill this gap using a modern C++ API. In this paper, we use SYCL to parallelize MGCFD, an unstructured-mesh computational fluid dynamics (CFD) code, to explore current performance of SYCL. The code is benchmarked on several modern processor systems from Intel (including CPUs and the latest Xe LP GPU), AMD, ARM and Nvidia, making use of a variety of current SYCL compilers, with a particular focus on OneAPI and how it maps to Intel's CPU and GPU architectures. We compare performance with other parallelizations available in OP2, including SIMD, OpenMP, MPI and CUDA. The results are mixed; the performance of this class of applications, when parallelized with SYCL, highly depends on the target architecture and the compiler, but in many cases comes close to the performance of currently prevalent parallel programming models. However, SYCL still requires different parallelization strategies or code paths to be written for different hardware to obtain the best performance.

Victoria Houlden, Joao Porto de Albuquerque, Scott Weich, Stephen Jarvis (2021)Does nature make us happier? A spatial error model of greenspace types and mental wellbeing, In: Environment and planning. B, Urban analytics and city science48(4)2399808319887395pp. 655-670 Sage

Exposure to nature is thought to benefit mental health and wellbeing. However, many studies consider greenspace as a single entity, which overlooks the potential significance of the various forms of greenspace, and natural greenspaces in particular. This study was designed to investigate the association between different types of greenspace and mental wellbeing. Drawing wellbeing and socioeconomic data from the Annual Population Survey (2012-2015), and shapefiles from the Greenspace Information for Greater London group, the amount of greenspace accessible within a 300 m walk of individuals' postcodes was calculated, and categorised according to type. Spatial Error Models were used to account for spatial patterns in the data. Natural greenspace was significantly associated with improved life satisfaction (B = 0.028, p < 0.001) and happiness (B = 0.023, p = 0.019) scores. The spatial autoregressive parameter (lambda) was small but significant (p < 0.001), implying slight second-order spatial variation in the model. These results imply that natural areas may be more important for hedonic mental wellbeing than other greenspaces. Future research should explore causal relationships between exposure to greenspace and mental wellbeing outcomes.

David A. Bacigalupo, James W.J. Xue, Simon D. Hammond, Stephen A. Jarvis, Donna N. Dillenberger, Graham R. Nudd (2007)Predicting the effect on performance of container-managed persistence in a distributed enterprise application, In: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM4228311pp. 1-8
Guang Tan, Stephen A. Jarvis (2008)A payment-based incentive and service differentiation scheme for peer-to-peer streaming broadcast, In: IEEE transactions on parallel and distributed systems19(7)pp. 940-953 IEEE

We propose a novel payment-based incentive scheme for peer-to-peer (P2P) live media streaming. Using this approach, peers earn points by forwarding data to others. The data streaming is divided into fixed-length periods; during each of these periods, peers compete with each other for good parents (data suppliers) for the next period in a first-price-auction-like procedure using their points. We design a distributed algorithm to regulate peer competitions and consider various individual strategies for parent selection from a game-theoretic perspective. We then discuss possible strategies that can be used to maximize a peer's expected media quality by planning different bids for its substreams. Finally, in order to encourage off-session users to remain online and continue contributing to the network, we develop an optimal data forwarding strategy that allows peers to accumulate points that can be used in future services. Simulation results show that the proposed methods effectively differentiate the media qualities received by peers making different contributions (which originate from, for example, different forwarding bandwidths or servicing times) and at the same time maintain high overall system performance.

Andrew Owenson, Steven Wright, Richard Bunt, Stephen Jarvis, Yoon Ho, Matthew Street (2018)Developing and Using a Geometric Multigrid, Unstructured Grid Mini-Application to Assess Many-Core Architectures, In: Proceedings - 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2018pp. 68-76
S. A. Jarvis (2011)UK performance engineering workshop 2010, In: Computer journal54(6)pp. 960-961
Nataliya Tkachenko, Rob Procter, Stephen Jarvis (2021)Quantifying people's experience during flood events with implications for hazard risk communication, In: PloS one16(1)0244801 Public Library Science

Semantic drift is a well-known concept in distributional semantics, which is used to demonstrate gradual, long-term changes in meanings and sentiments of words and is largely detectable by studying the composition of large corpora. In our previous work, which used ontological relationships between words and phrases, we established that certain kinds of semantic micro-changes can be found in social media emerging around natural hazard events, such as floods. Our previous results confirmed that semantic drift in social media can be used for early detection of floods and to increase the volume of 'useful' geo-referenced data for event monitoring. In this work we use deep learning in order to determine whether images associated with 'semantically drifted' social media tags reflect changes in crowd navigation strategies during floods. Our results show that alternative tags can be used to differentiate naive and experienced crowds witnessing flooding of various degrees of severity.

Guang Tan, Stephen A. Jarvis (2006)Prediction of short-lived TCP transfer latency on bandwidth asymmetric links, In: Journal of computer and system sciences72(7)pp. 1201-1210

TCP over bandwidth asymmetric networks such as Cable TV and Asymmetric Digital Subscriber Loop (ADSL) exhibits different characteristics from TCP on symmetric links. A number of techniques have been proposed to address this problem. However, previous research has been largely focused on bulk transfers. This paper investigates the effects of bandwidth asymmetry on Web-like short-lived transfers. Two prediction models, one given in a closed form and the other in a recursive form, are presented for TCP transfers without and with ACK Filtering (AF), a representative optimizing technique for TCP transfers over bandwidth asymmetric links. Ns-2-based experiments show that these models can predict TCP transfer latency with a high degree of accuracy.
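The paper's two models are specific to asymmetric links, but the general flavour of short-transfer latency prediction can be illustrated with the standard textbook slow-start approximation: with initial window w segments and per-RTT doubling, a sender has delivered w(2^k - 1) segments after k round trips, so a short transfer's latency is dominated by the round-trip count rather than the bottleneck bandwidth. This is a generic sketch, not either model from the paper.

```python
# Back-of-envelope short-transfer latency: count the slow-start round
# trips needed to deliver all segments (window doubles each RTT), then
# add one RTT for connection setup. A generic approximation only.

def slow_start_rounds(segments, initial_window=2):
    """Round trips needed to deliver `segments` during slow start."""
    k, sent = 0, 0
    while sent < segments:
        sent += initial_window * (2 ** k)   # window doubles every RTT
        k += 1
    return k

def transfer_latency(segments, rtt, initial_window=2):
    """Estimated latency: one setup RTT plus the slow-start rounds."""
    return rtt * (1 + slow_start_rounds(segments, initial_window))
```

For example, a 7-segment transfer with an initial window of 1 needs three slow-start rounds (1, then 2, then 4 segments), so on a 100 ms path it completes in roughly 400 ms including setup. ACK compression on an asymmetric return path slows the window growth, which is what the paper's AF-aware model accounts for.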

Dominic A. S. Brown, Matthew T. Bettencourt, Steven A. Wright, Satheesh Maheswaran, John P. Jones, Stephen A. Jarvis, (2021)Higher-order particle representation for particle-in-cell simulations, In: Journal of computational physics435110255 Elsevier

In this paper we present an alternative approach to the representation of simulation particles for unstructured electrostatic and electromagnetic PIC simulations. In our modified PIC algorithm we represent particles as having a smooth shape function limited by some specified finite radius, r0. A unique feature of our approach is the representation of this shape by surrounding simulation particles with a set of virtual particles with delta shape, with fixed offsets and weights derived from Gaussian quadrature rules and the value of r0. As the virtual particles are purely computational, they provide the additional benefit of increasing the arithmetic intensity of traditionally memory bound particle kernels. The modified algorithm is implemented within Sandia National Laboratories' unstructured EMPIRE-PIC code, for electrostatic and electromagnetic simulations, using periodic boundary conditions. We show results for a representative set of benchmark problems, including electron orbit, a transverse electromagnetic wave propagating through a plasma, numerical heating, and a plasma slab expansion. Good error reduction across all of the chosen problems is achieved as the particles are made progressively smoother, with the optimal particle radius appearing to be problem-dependent.
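The virtual-particle construction can be sketched in one dimension: place delta-shaped virtual particles at Gauss-Legendre quadrature points scaled to the radius r0, with weights normalised so that total charge is conserved. The uniform shape function, the normalisation and the 1-D setting are illustrative simplifications; EMPIRE-PIC's actual shape functions and dimensionality differ.

```python
import numpy as np

# 1-D illustration: replace one finite-radius particle with `order`
# virtual delta particles at Gauss-Legendre nodes scaled by r0, with
# weights normalised to conserve the particle's total weight (charge).

def virtual_particles(x, weight, r0, order=3):
    """Return (positions, weights) of the virtual particles standing in
    for a smooth particle of radius r0 centred at x."""
    nodes, w = np.polynomial.legendre.leggauss(order)   # nodes on [-1, 1]
    positions = x + r0 * nodes
    weights = weight * w / w.sum()                      # sum to `weight`
    return positions, weights
```

Because the offsets and weights are fixed per particle, the field-gather and charge-deposit kernels simply loop over a few extra (purely computational) points, which is the source of the increased arithmetic intensity noted above.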

S. D. Hammond, J. A. Smith, G. R. Mudalige, S. A. Jarvis (2009)Predictive Simulation of HPC Applications, In: 2009 INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS5076178pp. 33-40 IEEE

The architectures which support modern supercomputing machinery are as diverse today as at any point during the last twenty years. The variety of processor core arrangements, threading strategies and the arrival of heterogeneous computation nodes are driving modern-day solutions to petaflop speeds. The increasing complexity of such systems, as well as of the codes written to take advantage of the new computational abilities, poses significant challenges for existing techniques which aim to model and analyse the performance of such hardware and software. In this paper we demonstrate the use of post-execution analysis on trace-based profiles to support the construction of simulation-based models. This involves combining the runtime capture of call-graph information with computational timings, which in turn allows representative models of code behaviour to be extracted. The main advantage of this technique is that it largely automates performance model development, a burden associated with existing techniques. We demonstrate the capabilities of our approach using both the NAS Parallel Benchmark suite and a real-world supercomputing benchmark developed by the United Kingdom Atomic Weapons Establishment. The resulting models, developed in less than two hours per code, have a good degree of predictive accuracy. We also show how one of these models can be used to explore the performance of the code on over 16,000 cores, demonstrating the scalability of our solution.

David A. Bacigalupo, Stephen A. Jarvis, Ligang He, Daniel P. Spooner, Donna N. Dillenberger, Graham R. Nudd (2005)An investigation into the application of different performance prediction methods to distributed enterprise applications, In: The Journal of supercomputing34(2)pp. 93-111
Elth Ogston, Stephen A. Jarvis (2008)Improving the accuracy of peer-to-peer sampling services, In: The Third International Multi-Conference on Computing in the Global Information Technology : July 27, 2008-August 1, 2008, Athens, Greece4591376pp. 250-255
Jonathan M. R. Byrd, Stephen A. Jarvis, Abhir H. Bhalerao (2008)Reducing the run-time of MCMC programs by multithreading on SMP architectures, In: 2008 IEEE International Symposium on Parallel and Distributed Processingpp. 2242-2249 IEEE

The increasing availability of multi-core and multiprocessor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means of estimating very high-dimensional integrals. As such, MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. This paper presents a new method for reducing the run-time of Markov Chain Monte Carlo simulations by using SMP machines to speculatively perform iterations in parallel, whilst producing statistically identical results to conventional sequential implementations. We calculate the theoretical reduction in runtime that may be achieved using our technique under perfect conditions, and test and compare the method on a selection of multi-core and multi-processor architectures. Experiments are presented that show reductions in runtime of 35% using two cores and 55% using four cores.
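The speculative idea can be sketched as follows (a minimal illustration under our own assumptions, not the paper's implementation; the target density and all names are ours). Because proposal innovations and acceptance uniforms can be drawn independently of accept/reject outcomes, the density evaluations that the next iteration might need can be computed in parallel and the chain resolved afterwards, leaving the results identical to a sequential chain:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_target(x):
    # Hypothetical target: standard normal log-density (up to a constant).
    return -0.5 * x * x

def speculative_mh(n_pairs, x0=0.0, step=1.0, seed=42):
    """Metropolis-Hastings resolving two iterations per round: the three
    density evaluations either iteration might need are computed
    speculatively in parallel, then acceptance is decided sequentially."""
    rng = random.Random(seed)
    x, lx = x0, log_target(x0)
    chain = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(n_pairs):
            e1, e2 = rng.gauss(0.0, step), rng.gauss(0.0, step)
            u1, u2 = rng.random(), rng.random()
            p1 = x + e1
            # Iteration 2's proposal is p1 + e2 if iteration 1 accepts,
            # or x + e2 if it rejects -- evaluate all three in parallel.
            l_p1, l_acc, l_rej = pool.map(log_target, (p1, p1 + e2, x + e2))
            if math.log(u1) < l_p1 - lx:            # iteration 1
                x, lx, p2, l_p2 = p1, l_p1, p1 + e2, l_acc
            else:
                p2, l_p2 = x + e2, l_rej
            chain.append(x)
            if math.log(u2) < l_p2 - lx:            # iteration 2
                x, lx = p2, l_p2
            chain.append(x)
    return chain
```

The speculative branch that turns out to be unneeded is simply discarded, which is where the trade between wasted work and reduced wall-clock time arises.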

A. P. Chester, J. W. J. Xue, L. He, S. A. Jarvis (2008)A System for Dynamic Server Allocation in Application Server Clusters, In: 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications : 10-12 December 2008pp. 130-139 IEEE

Application server clusters are often used to service high-throughput web applications. In order to host more than a single application, an organisation will usually procure a separate cluster for each application. Over time the utilisation of the clusters will vary, leading to variation in the response times experienced by users of the applications. Techniques that statically assign servers to each application prevent the system from adapting to changes in the workload, and are thus susceptible to providing unacceptable levels of service. This paper investigates a system for allocating server resources to applications dynamically, thus allowing applications to automatically adapt to variable workloads. Such a scheme requires meticulous system monitoring, a method for switching application servers between server pools and a means of calculating when a server switch should be made (balancing switching cost against perceived benefits). Experimentation is performed using such a switching system on a Web application testbed hosting two applications across eight application servers. The testbed is used to compare several theoretically derived switching policies under a variety of workloads. Recommendations are made as to the suitability of different policies under different workload conditions.

Wilson M. Tan, Stephen A. Jarvis (2012)Blurring the computation-communication divide: Extraneous memory accesses and their effects on MPI intranode communications, In: 2012 Imperial College Computing Student Workshop, ICCSW 201228pp. 135-141
David Beckingsale, Wayne Gaudin, Andy Herdman, Stephen Jarvis (2015)Resident Block-Structured Adaptive Mesh Refinement on Thousands of Graphics Processing Units, In: 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP)2015-7349561pp. 61-70 IEEE

Block-structured adaptive mesh refinement (AMR) is a technique that can be used when solving partial differential equations to reduce the number of cells necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a resident GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an 8 node cluster, and 4,196 nodes of Oak Ridge National Laboratory's Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87x faster than the CPU-based implementation, and is scalable on 4,196 K20x GPUs using a combination of MPI and CUDA.

Malik Shahzad K. Awan, Stephen A. Jarvis (2010)Understanding resource provisioning for ClimatePrediction.net, In: Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-20105578267pp. 2532-2539
Guang Tan, Stephen A. Jarvis (2007)Stochastic Analysis and Improvement of the Reliability of DHT-based Multicast, In: Annual Joint Conference of the IEEE Computer and Communications Societies4215836pp. 2198-2206 IEEE

This paper investigates the reliability of application-level multicast based on a distributed hash table (DHT) in a highly dynamic network. Using a node residual lifetime model, we derive the stationary end-to-end delivery ratio of data streaming between a pair of nodes in the worst case, and show through numerical examples that in a practical DHT network, this ratio can be very low (e.g., less than 50%). Leveraging the property of heavy-tailed lifetime distribution, we then consider three optimizing techniques, namely Senior Member Overlay (SMO), Longer-Lived Neighbor Selection (LNS), and Reliable Route Selection (RRS), and present quantitative analysis of data delivery reliability under these schemes. In particular, we discuss the tradeoff between delivery ratio and the load imbalance among nodes. Simulation experiments are also used to evaluate the multicast performance under practical settings. Our model and analytic results provide useful tools for reliability analysis for other overlay-based applications (e.g., those involving persistent data transfers).

Junwei Cao, Daniel P. Spooner, Stephen A. Jarvis, Subhash Saini, Graham R. Nudd (2003)Agent-based grid load balancing using performance-driven task scheduling, In: 17th International Parallel and Distributed Processing Symposium1213139
Timothy R. Law, Jonny Hancox, Tammy M. K. Cheng, Raphael A. G. Chaleil, Steven A. Wright, Paul A. Bates, Stephen A. Jarvis (2016)Optimisation of a Molecular Dynamics Simulation of Chromosome Condensation, In: Proceedings (Symposium on Computer Architecture and High Performance Computing)pp. 126-133 IEEE

We present optimisations applied to a bespoke biophysical molecular dynamics simulation designed to investigate chromosome condensation. Our primary focus is on domain-specific algorithmic improvements to determining short-range interaction forces between particles, as certain qualities of the simulation render traditional methods less effective. We implement tuned versions of the code for both traditional CPU architectures and the modern many-core architecture found in the Intel Xeon Phi coprocessor and compare their effectiveness. We achieve speed-ups starting at a factor of 10 over the original code, facilitating more detailed and larger-scale experiments.

Henry Crosby, Theodore Damoulas, Stephen A. Jarvis (2019)Embedding road networks and travel time into distance metrics for urban modelling, In: International journal of geographical information science : IJGIS33(3)pp. 512-536 Taylor & Francis

Urban environments are restricted by various physical, regulatory and customary barriers such as buildings, one-way systems and pedestrian crossings. These features create challenges for predictive modelling in urban space, as most proximity-based models rely on Euclidean (straight line) distance metrics which, given restrictions within the urban landscape, do not fully capture spatial urban processes. Here, we argue that road distance and travel time provide effective alternatives, and we develop a new low-dimensional Euclidean distance metric based on these distances using an isomap approach. The purpose of this is to produce a valid covariance matrix for Kriging. Our primary methodological contribution is the derivation of two symmetric dissimilarity matrices, with which it is possible to compute low-dimensional Euclidean metrics for the production of a positive definite covariance matrix with commonly utilised kernels. This new method is implemented in a Kriging predictor to estimate house prices on 3,669 properties in Coventry, UK. We find that a metric estimating a combination of road distance and travel time produces a superior house price predictor compared with alternative state-of-the-art methods, that is, a standard Euclidean metric and a non-restricted road distance metric.
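The core step — turning a road-distance or travel-time dissimilarity matrix into low-dimensional Euclidean coordinates so that standard kernels yield a valid (positive semi-definite) covariance for Kriging — can be sketched with classical multidimensional scaling, the embedding step inside isomap. The function name and toy data below are our own illustrative assumptions, not the paper's code:

```python
import numpy as np

def mds_embed(D, k=2):
    """Classical MDS: embed a symmetric dissimilarity matrix D into k
    Euclidean dimensions. Kernels applied to distances between the
    embedded points then give a positive semi-definite covariance."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]              # keep the k largest eigenvalues
    w = np.clip(w[idx], 0.0, None)             # discard any negative part
    return V[:, idx] * np.sqrt(w)
```

If the input dissimilarities are exactly Euclidean the embedding reproduces them; for road or travel-time dissimilarities it gives the closest low-dimensional Euclidean approximation, which is what makes the subsequent kernel covariance valid.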

S. D. Hammond, G. R. Mudalige, J. A. Smith, S. A. Jarvis, J. A. Herdman, A. Vadgama (2009)WARPP - A toolkit for simulating high-performance parallel scientific codes, In: SIMUTools 2009 - 2nd International ICST Conference on Simulation Tools and Techniques
M. Al-Ghamdi, A. P. Chester, L. He, S. A. Jarvis (2011)Dynamic resource allocation and active predictive models for enterprise applications, In: CLOSER 2011 - Proceedings of the 1st International Conference on Cloud Computing and Services Sciencepp. 551-562
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2004)Dynamic, capability-driven scheduling of DAG-based real-time jobs in heterogeneous clusters, In: International journal of high performance computing and networking2(2-4)pp. 165-177
Guang Tan, Stephen A. Jarvis (2006)A payment-based incentive and service differentiation mechanism for peer-to-peer streaming broadcast, In: 2006 14TH IEEE INTERNATIONAL WORKSHOP ON QUALITY OF SERVICE, PROCEEDINGS4015732pp. 41-50 IEEE

We propose a novel payment-based incentive mechanism for peer-to-peer (P2P) live media streaming. Using this approach, peers earn points by forwarding data to others; the data streaming is divided into fixed-length periods, during each of which peers compete with each other for good parents (data suppliers) for the next period in a first-price-auction-like procedure using their points. We design a distributed algorithm to regulate peer competitions, and consider various individual strategies for parent selection from a game-theoretic perspective. We then discuss possible strategies that can be used to maximize a peer's expected media quality by planning different bids for its substreams. Finally, in order to encourage off-session users to keep staying online and continue contributing to the network, we develop an optimal data forwarding strategy that allows peers to accumulate points that can be used in future services. Simulation results show that the proposed methods effectively differentiate the media qualities received by peers making different contributions (which originate from, for example, different forwarding bandwidths or servicing times), while at the same time maintaining high system-wide performance.

David A. Bacigalupo, Stephen A. Jarvis, Ligang He, Graham R. Nudd (2004)An Investigation into the application of different performance prediction techniques to e-Commerce applications, In: Parallel and Distributed Processing: Proceedings, 18th International Symposium, Santa Fe, NM, 200418pp. 3395-3402
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, David Bacigalupo, Guang Tan, Graham R. Nudd (2005)Mapping DAG-based applications to multiclusters with background workload, In: 2005 IEEE International Symposium on Cluster Computing and the Grid : May 9-12, 2005, Cardiff, Wales21558651pp. 855-862
Wilson M. Tan, Stephen A. Jarvis (2013)Energy Harvesting Noise Pollution Sensing WSN Mote: Survey of Capabilities and Limitations, In: 2013 IEEE Conference on Wireless Sensor (ICWISE)pp. 53-60 IEEE

Wireless sensor networks (WSNs) are potentially a boon to the implementation and enforcement of noise codes which now exist for many major cities all over the world. Unfortunately, the high energy required by the noise measurement process and the reliance of sensor motes on batteries make the management of noise sensing WSNs cumbersome. Giving motes energy harvesting (EH) capabilities could alleviate such a problem, and several EH WSNs have already been demonstrated. Nevertheless, the high frequency nature of the data required to measure noise places significant additional challenges on the design of EH WSNs. In this paper, we characterize the capabilities and limitations of a WSN mote designed to measure noise. We identify, through specification analysis and experimentation, the performance-limiting factors in each step of the noise measurement process. These steps include sound gathering, data processing, and result transmission. The prospect of the entire process being powered by energy-harvesting means is also evaluated. For each step, we also discuss and recommend measures that would help improve the overall system performance.

Michael G. Hinchey, Stephen A. Jarvis (1993)Protocol specification & verification methods: An overview, In: Communications and Networks for the Year Two Thousand: Proceedings of SICON - ICIE '932515654pp. 581-585
Guang Tan, Stephen A. Jarvis (2007)Improving the fault resilience of overlay multicast for media streaming, In: IEEE transactions on parallel and distributed systems18(6)pp. 721-734
James W. J. Xue, Adam P. Chester, Ligang He, Stephen A. Jarvis (2008)Dynamic Resource Allocation in Enterprise Systems, In: M Hobbs, Y Xiang, W Zhou (eds.), 2008 14th IEEE International Conference on Parallel and Distributed Systems : 8-10 December 20084724321pp. 203-212 IEEE

It is common that Internet service hosting centres use several logical pools to assign server resources to different applications, and that they try to achieve the highest total revenue by making efficient use of these resources. In this paper, multi-tiered enterprise systems are modelled as multi-class closed queueing networks, with each network station corresponding to each application tier. In such queueing networks, bottlenecks can limit overall system performance, and thus should be avoided. We propose a bottleneck-aware server switching policy, which responds to system bottlenecks and switches servers to alleviate these problems as necessary. The switching engine compares the benefits and penalties of a potential switch, and makes a decision as to whether it is likely to be worthwhile switching. We also propose a simple admission control scheme, in addition to the switching policy, to deal with system overloading and optimise the total revenue of multiple applications in the hosting centre. Performance evaluation has been done via simulation and results are compared with those from a proportional switching policy and also a system that implements no switching policy. The experimental results show that the combination of the bottleneck-aware switching policy and the admission control scheme consistently outperforms the other two policies in terms of revenue contribution.

Guang Tan, Stephen A. Jarvis, Daniel R. Spooner (2006)Improving the fault resilience of overlay multicast for media streaming, In: DSN 2006 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS20061633544pp. 558-567 IEEE

This paper addresses the problem of fault resilience of overlay-based live media streaming from two aspects: (1) how to construct a stable multicast tree that minimizes the negative impact of frequent member departures on existing overlay, and (2) how to efficiently recover from packet errors caused by end-system or network failures. In particular, this paper makes two contributions: (1) A distributed Reliability-Oriented Switching Tree (ROST) algorithm that minimizes the failure correlation among tree nodes. By exploiting both bandwidth and time properties, the algorithm constructs a more reliable multicast tree than existing algorithms that solely minimize tree depth, while not compromising the quality of the tree in terms of service delay and incurring only a small protocol overhead; (2) A simple Cooperative Error Recovery (CER) protocol that helps recover from packet errors efficiently. Recognizing that a single recovery source is usually incapable of providing timely delivery of the lost data, the protocol recovers from data outages using the residual bandwidths from multiple sources, which are identified using a minimum-loss-correlation algorithm. Extensive simulations are conducted to demonstrate the effectiveness of the proposed schemes.

David Truby, Steven A. Wright, Robert Kevis, Satheesh Maheswaran, J. A. Herdman, Stephen Jarvis (2018)BookLeaf: An Unstructured Hydrodynamics Mini-application Workshop paper: WRAp 2018, In: 2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)pp. 615-622 IEEE

With the age of Exascale computing causing a diversification away from traditional CPU-based homogeneous clusters, it is becoming increasingly difficult to ensure that computationally complex codes are able to run on these emerging architectures. This is especially important for large physics simulations that are themselves becoming increasingly complex and computationally expensive. One proposed solution to the problem of ensuring these applications can run on the desired architectures is to develop representative mini-applications that are simpler and so can be ported to new frameworks more easily, but which are also representative of the algorithmic and performance characteristics of the original applications. In this paper we present BookLeaf, an unstructured Arbitrary Lagrangian-Eulerian mini-application to add to the suite of representative applications developed and maintained by the UK Mini-App Consortium (UK-MAC). First, we outline the reference implementation of our application in Fortran. We then discuss a number of alternative implementations using a variety of parallel programming models and discuss the issues that arise when porting such an application to new architectures. To demonstrate our implementation, we present a study of the performance of BookLeaf on a number of platforms using alternative designs, and we document a scaling study showing the behaviour of the application at scale.

James W.J. Xue, Ligang He, Stephen A. Jarvis (2007)A scheduling algorithm for revenue maximisation for cluster-based internet services, In: The 13th International Conference on Parallel and Distributed Systems, ICPADS24447841pp. 853-860
Wilson M. Tan, Stephen A. Jarvis (2014)On the design of an energy-harvesting noise-sensing WSN mote, In: EURASIP journal on wireless communications and networking2014(1)167pp. 1-18 Springer Nature

Wireless sensor networks (WSNs) could potentially help in the measurement and monitoring of noise levels, an important step in mitigating and fighting noise pollution. Unfortunately, the high energy required by the noise measurement process and the reliance of sensor motes on batteries make the management of noise-sensing WSNs cumbersome. Giving motes energy harvesting (EH) capabilities could alleviate such a problem, and several EH-WSNs have already been demonstrated. Nevertheless, the high-frequency nature of the data required to measure noise places significant additional challenges to the design of EH-WSNs. In this paper, we present a design and prototype for a mote extension which enables the mote to detect noise levels while being powered by energy harvesting. The noise level detection carried out by the system relies primarily on the concept of peak detection. Results of performance testing are presented. Aside from the hardware design and prototype, we also discuss methods of assigning charge times for application scenarios where there are multiple pulse loads. We also propose a new opportunistic method for charge time determination. Experiments demonstrate that the new method could improve analytically-derived duty cycles by at least 350%.
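As a rough illustration of the peak-detection idea (our own minimal software sketch, not the paper's circuit or firmware), a peak-envelope follower tracks upward excursions of the signal instantly and decays slowly otherwise, giving a cheap proxy for the noise level on an energy-constrained mote:

```python
def peak_envelope(samples, decay=0.999):
    """Follow the absolute peak of the signal: jump up immediately on a
    louder sample, decay geometrically between peaks."""
    peak, out = 0.0, []
    for s in samples:
        a = abs(s)
        peak = a if a > peak else peak * decay
        out.append(peak)
    return out
```

The appeal for EH motes is that each sample costs only a comparison and a multiply, so no high-rate buffering or FFT is needed to obtain a usable loudness estimate.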

Nataliya Tkachenko, Stephanie Bricker, Stephen A. Jarvis (2018)To dig or not to dig? Place and perception in subsurface housing, In: Proceedings of the Institution of Civil Engineers. Engineering sustainability171(4)pp. 211-220 Ice Publishing

Cities of the future are envisioned to be fully optimised, owing to technological advancements, distributed sensor networks and automation. With the proliferation of new data sources, opportunities also exist for better understanding of how people act and make decisions, as well as discerning the conditions in which they wish to live and what they expect from their surrounding environment. Following the recently proposed normative strand in urban planning, this study uses distributed personal underground development as a case study for extracting the values behind this emerging self-build movement, alongside observers' opinions obtained from associated web-based data.

David A. Bacigalupo, Stephen A. Jarvis, Ligang He, Daniel P. Spooner, Graham R. Nudd (2005)Comparing layered queuing and historical performance models of a distributed enterprise application, In: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, as part of the 23rd IASTED International Multi-Conference on Applied Informatics456-136pp. 608-613
Sarunkorn Chotvijit, Malkiat Thiarai, Stephen A. Jarvis (2019)A Study of Data Continuity in Adult Social Care Services, In: The British journal of social work49(3)pp. 762-786
Tongcheng Guo, Stephen A. Jarvis (2008)DIAO: A scheme of cooperative data distribution for grid-enabled data analysis applications, In: The Third International Multi-Conference on Computing in the Global Information Technology : July 27, 2008-August 1, 2008, Athens, Greece4591369pp. 200-209
Junwei Cao, Daniel P. Spooner, James D. Turner, Stephen A. Jarvis, Darren J. Kerbyson, Subhash Saini, Graham R. Nudd (2002)Agent-based resource management for grid computing, In: Cluster Computing and the Grid (CCGrid 2002): 2nd IEEE International Symposium1540487pp. 350-351
Stephen A. Jarvis, Ligang He, Daniel P. Spooner, Graham R. Nudd (2005)The impact of predictive inaccuracies on execution scheduling, In: Performance evaluation60(1-4)pp. 127-139

This paper investigates the underlying impact of predictive inaccuracies on execution scheduling, with particular reference to execution time predictions. This study is conducted from two perspectives: from that of job selection and from that of resource allocation, both of which are fundamental components in execution scheduling. A new performance metric, termed the degree of misperception, is introduced to express the probability that the predicted execution times of jobs display different ordering characteristics from their real execution times due to inaccurate prediction. Specific formulae are developed to calculate the degree of misperception in both job selection and resource allocation scenarios. The parameters which influence the degree of misperception are also extensively investigated. The results presented in this paper are of significant benefit to scheduling approaches that take into account predictive data; the results are also of importance to the application of these scheduling techniques to real-world high-performance systems.
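The degree of misperception has a simple operational reading: the probability that prediction noise makes two jobs swap order. A Monte Carlo sketch of that reading (Gaussian prediction error is our illustrative assumption here; the paper itself derives closed-form expressions):

```python
import random

def misperception_rate(t1, t2, sigma, trials=100_000, seed=1):
    """Estimate the probability that noisy execution-time predictions
    order two jobs differently from their true times, assuming t1 < t2
    and independent Gaussian prediction errors of scale sigma."""
    rng = random.Random(seed)
    flips = sum(
        (t1 + rng.gauss(0.0, sigma)) > (t2 + rng.gauss(0.0, sigma))
        for _ in range(trials)
    )
    return flips / trials
```

As expected, the rate grows with the error scale relative to the gap between the true times: near-zero when sigma is small compared with t2 - t1, approaching one half as the noise swamps the gap.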

Guang Tan, Stephen A. Jarvis, Xinuo Chen, Daniel P. Spooner (2006)Performance Analysis and Improvement of Overlay Construction for Peer-to-Peer Live Streaming, In: Simulation (San Diego, Calif.)82(2)pp. 93-106
Elth Ogston, Stephen A. Jarvis (2010)Peer-to-peer aggregation techniques dissected, In: International journal of parallel, emergent and distributed systems25(1)pp. 51-71 Taylor & Francis

Aggregation is the process of gathering and combining information from a number of sources. In peer-to-peer systems, aggregation is a basic component of a range of applications, including monitoring and complex-query resolution. Peer-to-peer aggregation services themselves are dependent on a number of other fundamental peer-to-peer services - directories, multicasting and system-size approximation. The overall performance characteristics of an aggregation service are affected by the chosen implementation method for these underlying services. To illustrate this relationship, aggregation techniques for internet-based peer-to-peer systems are surveyed and dissected into their component parts. We further consider the problem of running one-off aggregation queries in a peer-to-peer network. A new aggregation service, Bliksum, which uses a novel combination of underlying services, is introduced. Bliksum employs unstructured peer-to-peer techniques for node sampling, multicasting and system-size approximation, in combination with a method of building a temporary tree structure for aggregation itself. Unstructured peer-to-peer techniques have been shown to be highly resilient to node churn, avoiding the problem inherent in structured systems of maintaining the desired structure when the set of nodes changes rapidly. We present experiments showing that Bliksum retains these advantages while reducing communications cost and reducing information loss compared to pure gossip-based aggregation.
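The gossip-based aggregation that Bliksum is compared against can be sketched in a few lines (a minimal push-pull averaging round under our own naming, not the paper's protocol): each node repeatedly averages its value with a random peer; since the sum is invariant, every node converges to the global mean.

```python
import random

def gossip_average(values, rounds=200, seed=0):
    """Push-pull averaging gossip: per round, every node pairs with a
    random peer and both take the pairwise mean. All nodes converge
    to the global average of the initial values."""
    rng = random.Random(seed)
    v = list(values)
    n = len(v)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)
            v[i] = v[j] = (v[i] + v[j]) / 2.0
    return v
```

This averaging style is churn-resilient because any pair of live nodes can interact, but it pays in communication rounds, which is the cost/accuracy trade-off the article dissects.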

Jonathan M.D. Hill, Stephen A. Jarvis, Constantinos Siniolakis, Vasil P. Vasilev (1998)Analysing an SQL application with a BSPlib call-graph profiling tool, In: Euro-Par 1998 Parallel Processing - 4th International Conference, Proceedings1470pp. 157-164
Junwei Cao, Stephen A. Jarvis, Subhash Saini, Darren J. Kerbyson, Graham R. Nudd (2002)ARMS: An agent-based resource management system for grid computing, In: Scientific programming10(2)pp. 135-148
Dominic A. S. Brown, Steven A. Wright, Stephen A. Jarvis (2018)Performance of a Second Order Electrostatic Particle-in-Cell Algorithm on Modern Many-Core Architectures, In: Electronic notes in theoretical computer science340pp. 67-84 Elsevier

In this paper we present the outline of a novel electrostatic, second order Particle-in-Cell (PIC) algorithm, that makes use of 'ghost particles' located around true particle positions in order to represent a charge distribution. We implement our algorithm within EMPIRE-PIC, a PIC code developed at Sandia National Laboratories. We test the performance of our algorithm on a variety of many-core architectures including NVIDIA GPUs, conventional CPUs, and Intel's Knights Landing. Our preliminary results show the viability of second order methods for PIC applications on these architectures when compared to previous generations of many-core hardware. Specifically, we see an order of magnitude improvement in performance for second order methods between the Tesla K20 and Tesla P100 GPU devices, despite only a 4x improvement in the theoretical peak performance between the devices. Although these initial results show a large increase in runtime over first order methods, we hope to be able to show improved scaling behaviour and increased simulation accuracy in the future.

Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2004)Optimising static workload allocation in multiclusters, In: Parallel and Distributed Processing: Proceedings, 18th International Symposium, Santa Fe, NM, 200418pp. 531-540
Victoria Houlden, Scott Weich, Stephen Jarvis (2017)A cross-sectional analysis of green space prevalence and mental wellbeing in England, In: BMC public health17(1)460pp. 460-9 Springer Nature

Background: With urbanisation increasing, it is important to understand how to design changing environments to promote mental wellbeing. Evidence suggests that local-area proportions of green space may be associated with happiness and life satisfaction; however, the available evidence on such associations with more broadly defined mental wellbeing is still very scarce. This study aimed to establish whether the amount of neighbourhood green space was associated with mental wellbeing. Methods: Data were drawn from Understanding Society, a national survey of 30,900 individuals across 11,096 Census Lower-Layer Super Output Areas (LSOAs) in England, over the period 2009-2010. Measures included the multidimensional Warwick-Edinburgh Mental Well-Being Scale (SWEMWBS) and LSOA proportion of green space, which was derived from the General Land Use Database (GLUD), and were analysed using linear regression, while controlling for individual, household and area-level factors. Results: Those living in areas with greater proportions of green space had significantly higher mental wellbeing scores in unadjusted analyses (an expected increase of 0.17 points (95% CI 0.11, 0.23) in the SWEMWBS score for a standard deviation increase of green space). However, after adjustment for confounding by respondent sociodemographic characteristics and urban/rural location, the association was attenuated to the null (regression coefficient B = -0.01, 95% CI -0.08, 0.05, p = 0.712). Conclusions: While the green space in an individual's local area has been shown through other research to be related to aspects of mental health such as happiness and life satisfaction, the association with multidimensional mental wellbeing is much less clear from our results.
While we did not find a statistically significant association between the amount of green space in residents' local areas and mental wellbeing, further research is needed to understand whether other features of green space, such as accessibility, aesthetics or use, are important for mental wellbeing.

Hélène N. Lim Choi Keung, Justin R.D. Dyson, Stephen A. Jarvis, Graham R. Nudd (2004)Self-adaptive and self-optimising resource monitoring for dynamic grid environments, In: Database and Expert Systems Applications: 15th International Conference, DEXA 2004, Zaragoza, Spain, August 30-September 3, 2004, Proceedings15pp. 689-693
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Xinuo Chen, Graham R. Nudd (2004)Hybrid Performance-Oriented Scheduling of Moldable Jobs with QoS Demands in Multiclusters and Grids, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)3251pp. 217-224
James Xue, Stephen Jarvis (2018)Mining association rules for admission control and service differentiation in e-commerce applications, In: Wiley interdisciplinary reviews. Data mining and knowledge discovery8(3) e1241 Wiley

Workload demands in e-commerce applications are very dynamic in nature, so it is essential for Internet service providers to manage server resources effectively to maximize total revenue in server-overload situations. In this paper, a data mining technique is applied to a typical e-commerce application model to identify composite association rules that capture user navigation patterns. Two algorithms are then developed based on the derived rules for admission control, service differentiation and priority scheduling. Our approach takes the following aspects into consideration: (a) only final purchase requests result in company revenue; (b) any other request can potentially lead to a final purchase, depending upon the likelihood that the navigation sequence starting from the current request leads to a final purchase; (c) service differentiation and priority assignment are based on the aggregated confidence and average support of the composite association rules. As the identification of composite association rules and the computation of their confidence and support can be performed offline, the proposed approach incurs minimal performance overhead. The evaluation results suggest that the proposed approach is effective in terms of request management for revenue maximization.
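The support/confidence computation for a composite (sequence) rule can be sketched as follows; the session data and page names are hypothetical, and the exact rule semantics in the paper may differ:

```python
# Hypothetical click-stream sessions; "buy" marks a final purchase request.
sessions = [
    ["home", "search", "product", "cart", "buy"],
    ["home", "product", "cart", "buy"],
    ["home", "search", "product"],
    ["home", "search", "search", "product", "cart"],
    ["home", "product"],
]

def ends_after(session, seq):
    """Index just past where `seq` completes as an in-order subsequence, or None."""
    i = 0
    for pos, page in enumerate(session):
        if i < len(seq) and page == seq[i]:
            i += 1
            if i == len(seq):
                return pos + 1
    return None

def rule_stats(sessions, prefix, target="buy"):
    """Support and confidence of the composite rule `prefix -> target`."""
    matched = converted = 0
    for s in sessions:
        end = ends_after(s, prefix)
        if end is not None:
            matched += 1
            if target in s[end:]:
                converted += 1
    support = converted / len(sessions)
    confidence = converted / matched if matched else 0.0
    return support, confidence

support, confidence = rule_stats(sessions, ["product", "cart"])
```

A scheduler could then assign higher priority to requests whose navigation prefix has high confidence of leading to a purchase; as the abstract notes, both quantities are computable offline.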

Stephen A. Jarvis, Steven A. Wright, Simon D. Hammond (2015)High performance computing systems, In: High Performance Computing Systems: Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, Revised Selected Papers8966 Springer

This book constitutes the thoroughly refereed proceedings of the 5th International Workshop, PMBS 2014, held in New Orleans, LA, USA, in November 2014. The 12 full and 2 short papers presented in this volume were carefully reviewed and selected from 53 submissions. The papers cover topics in performance benchmarking and optimization; performance analysis and prediction; and power, energy and checkpointing.

S. Chotvijit, M. Thiarai, S. Jarvis (2018)Maintaining social care provision in the context of financial austerity, In: International journal of population data science3(1)A4
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Hong Jiang, Donna N. Dillenberger, Graham R. Nudd (2006)Allocating non-real-time and soft real-time jobs in multiclusters, In: IEEE transactions on parallel and distributed systems17(2)pp. 99-111
Kelvin K. F. Li, Stephen A. Jarvis, Fayyaz Minhas (2021)Elementary effects analysis of factors controlling COVID-19 infections in computational simulation reveals the importance of social distancing and mask usage, In: Computers in biology and medicine134104369pp. 104369-104369 Elsevier

COVID-19 was declared a pandemic by the World Health Organisation (WHO) on March 11th, 2020. With half of the world's countries in lockdown as of April 2020 due to this pandemic, monitoring and understanding the spread of the virus and infection rates, and how these factors relate to behavioural and societal parameters, is crucial for developing control strategies. This paper aims to investigate the effectiveness of masks, social distancing, lockdown and self-isolation for reducing the spread of SARS-CoV-2 infections. Our findings from agent-based simulation modelling showed that whilst requiring a lockdown is widely believed to be the most efficient method to quickly reduce infection numbers, the practice of social distancing and the usage of surgical masks can potentially be more effective than requiring a lockdown. Our multivariate analysis of simulation results using the Morris Elementary Effects Method suggests that if a sufficient proportion of the population uses surgical masks and follows social distancing regulations, then SARS-CoV-2 infections can be controlled without requiring a lockdown.
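The Morris Elementary Effects Method referenced above screens input factors by averaging one-at-a-time finite differences over random base points. A minimal sketch, using an invented closed-form stand-in for the agent-based model (the real study's model, factors and parameter ranges differ):

```python
import numpy as np

def infections(x):
    """Toy stand-in for the simulation output (final infections, arbitrary units);
    x = (mask_usage, social_distancing, lockdown_strictness), each in [0, 1]."""
    mask, dist, lock = x
    return 1000.0 * (1 - 0.6 * mask) * (1 - 0.5 * dist) * (1 - 0.2 * lock)

def morris_mu_star(f, k=3, r=50, delta=0.1, seed=1):
    """mu*: mean absolute elementary effect per input factor."""
    rng = np.random.default_rng(seed)
    ee = np.zeros((r, k))
    for t in range(r):
        x = rng.uniform(0, 1 - delta, k)   # base point, leaving room for the +delta step
        base = f(x)
        for i in range(k):
            xp = x.copy()
            xp[i] += delta
            ee[t, i] = abs(f(xp) - base) / delta
    return ee.mean(axis=0)

mu_star = morris_mu_star(infections)   # larger mu* => more influential factor
```

With the coefficients chosen here, mask usage has the largest mu*, echoing the paper's qualitative finding that masks and distancing can dominate lockdown strictness.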

L. He, S. A. Jarvis, D. P. Spooner, X. Chen, G. R. Nudd (2004)Hybrid performance-based workload management for multiclusters and grids, In: IEE proceedings. Software151(5)pp. 224-231
D. P. Spooner, J. Cao, S. A. Jarvis, L. He, G. R. Nudd (2005)Performance-aware workflow management for grid computing, In: Computer journal48(3)pp. 347-357
Stephen I. Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis (2017)Metrics for Energy-Aware Software Optimisation, In: J M Kunkel, R Yokota, P Balaji, D Keyes (eds.), HIGH PERFORMANCE COMPUTING (ISC HIGH PERFORMANCE 2017)10266pp. 413-430 Springer Nature

Energy consumption is rapidly becoming a limiting factor in scientific computing. As a result, hardware manufacturers increasingly prioritise energy efficiency in their processor designs. Performance engineers are also beginning to explore software optimisation and hardware/software co-design as a means to reduce energy consumption. Energy efficiency metrics developed by the hardware community are often re-purposed to guide these software optimisation efforts. In this paper we argue that established metrics, and in particular those in the Energy Delay Product (E·t^n) family, are unsuitable for energy-aware software optimisation. A good metric should provide meaningful values for a single experiment, allow fair comparison between experiments, and drive optimisation in a sensible direction. We show that E·t^n metrics are unable to fulfil these basic requirements and present suitable alternatives for guiding energy-aware software optimisation. We finish with a practical demonstration of the utility of our proposed metrics.
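The E·t^n family criticised above is simple to compute, which makes its pitfalls easy to demonstrate. A small illustration with two invented runs shows how the choice of n alone flips which "optimisation" looks better:

```python
# Two hypothetical runs of the same code after different optimisations:
run_a = (100.0, 10.0)   # (energy in joules, runtime in seconds): lower energy, slower
run_b = (120.0, 8.0)    # higher energy, faster

def et_n(energy, time, n):
    """The E*t^n family: n=0 is plain energy, n=1 the Energy Delay Product (EDP),
    n=2 (ED2P) weights runtime even more heavily."""
    return energy * time ** n

for n in (0, 1, 2):
    print(n, "A" if et_n(*run_a, n) < et_n(*run_b, n) else "B")
```

Plain energy (n=0) prefers run A, while EDP and ED2P prefer run B; since neither run dominates, every E·t^n metric silently encodes a different energy/performance trade-off, which is one reason the paper argues they are unsuitable as general optimisation targets.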

Ligang He, Stephen A. Jarvis, David Bacigalupo, Daniel P. Spooner, Xinuo Chen, Graham R. Nudd (2004)Queueing network-based optimisation techniques for workload allocation in clusters of computers, In: 2004 IEEE International Conference on Services Computing : (SCC 2004) : proceedings : Shanghai, China : September 15-18, 2004pp. 207-216
Ligang He, Deqing Zou, Zhang Zhang, Kai Yang, Hai Jin, Stephen A. Jarvis (2011)Optimizing resource consumptions in clouds, In: Proceedings - 2011 12th IEEE/ACM International Conference on Grid Computing, Grid 20116076497pp. 42-49
Ligang He, Nadeem Chaudhary, Stephen A. Jarvis, Kenli Li (2012)Allocating Resources for Workflows Running under Authorization Control, In: 2012 ACM/IEEE 13TH INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID)6319155pp. 58-65 IEEE

Automating the execution of workflows (or business processes) on computer resources has been the subject of much research. However, many workflow scenarios still require human involvement, which introduces additional authorization concerns. Role-Based Authorization Control (RBAC), under which users are assigned to certain roles while the roles are associated with prescribed permissions, is a popular authorization control scheme. When we allocate resources for workloads and plan system capacities, it is often assumed that when a task is allocated to a resource, the resource will accept the task and start execution once the processor becomes available. However, authorization policies impose further constraints on task executions, and may therefore incur a performance penalty and affect both application- and system-oriented performance. This paper investigates the issue of allocating resources for running workflows under role-based authorization control. Resource allocation strategies are developed for both human resources and computing resources. The allocation strategy for human resources takes into account the authorization constraints and establishes an optimization equation subject to the constraint of the budget available to hire human resources. The optimization equation is then solved to obtain the number of human resources allocated to each authorization role. The allocation strategy for computing resources also takes into account authorization constraints, calculating not only the number of computing resources, but also the proportion of processing capacity in each resource allocated to serve the tasks assuming each role. Simulation experiments have been conducted to verify the effectiveness of the developed allocation strategies.
The experimental results show that the allocation strategy developed in this paper outperforms the traditional allocation strategies, which do not consider authorization constraints, in terms of both average response time and resource utilization.

S. A. Jarvis, D. P. Spooner, H. N.L.C. Keung, J. R.D. Dyson, Lei Zhao, G. R. Nudd (2003)Performance-based middleware services for grid computing, In: 5th Annual International Workshop on Active Middleware Services1210215pp. 151-159
H. N.L.C. Keung, J. R.D. Dyson, S. A. Jarvis, G. R. Nudd (2003)Predicting the performance of globus monitoring and discovery service (MDS-2) queries, In: Proceedings - 4th International Workshop on Grid Computing, GRID 20032003-1261713pp. 176-183
S. J. Pennycook, S. D. Hammond, G. R. Mudalige, S. A. Wright, S. A. Jarvis (2012)On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures, In: Computer journal55(2)pp. 138-153 Oxford Univ Press

In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used for the solution of a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed those of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
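The dependency structure that defines pipelined wavefront algorithms can be sketched in a few lines (a 2D analogue for illustration; the LU solver sweeps a 3D grid, and this is not the paper's code):

```python
def wavefront_sweep(nx, ny):
    """Yield anti-diagonal 'waves' of a 2D grid. Cell (i, j) depends on (i-1, j)
    and (i, j-1), so all cells within one wave are mutually independent and can
    be processed in parallel; waves themselves must execute in order."""
    for wave in range(nx + ny - 1):
        yield [(i, wave - i) for i in range(max(0, wave - ny + 1), min(nx, wave + 1))]

waves = list(wavefront_sweep(3, 3))
```

The wave width grows and then shrinks over a sweep, so available parallelism is limited during the fill and drain phases; this is one reason mapping wavefront codes efficiently onto wide parallel hardware such as GPUs is hard.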

Giannis Moutsinas, Choudhry Shuaib, Weisi Guo, Stephen Jarvis (2021)Graph hierarchy: a novel framework to analyse hierarchical structures in complex networks, In: Scientific reports11(1)13943pp. 13943-11 NATURE PORTFOLIO

Trophic coherence, a measure of a graph's hierarchical organisation, has been shown to be linked to a graph's structural and dynamical aspects such as cyclicity, stability and normality. Trophic levels of vertices can reveal their functional properties, and partition and rank the vertices accordingly. Trophic levels, and hence trophic coherence, can only be defined on graphs with basal vertices, i.e. vertices with zero in-degree. Consequently, trophic analysis of graphs had been restricted until now. In this paper we introduce a hierarchical framework which can be defined on any simple graph. Within this general framework, we develop several metrics: hierarchical levels, a generalisation of the notion of trophic levels; influence centrality, a measure of a vertex's ability to influence dynamics; and democracy coefficient, a measure of overall feedback in the system. We discuss how our generalisation relates to previous attempts and what new insights it offers into the topological and dynamical aspects of graphs. Finally, we show how the hierarchical structure of a network relates to the incidence rate in an SIS epidemic model and the economic insights we can gain through it.
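For graphs that do have basal vertices, the classical trophic levels that the paper's hierarchical levels generalise solve a small linear system: each non-basal vertex sits one level above the mean level of its in-neighbours. A minimal sketch with an invented 4-vertex graph (this is the classical computation, not the paper's generalised framework):

```python
import numpy as np

# Directed edges u -> v; basal vertices are those with no incoming edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = 1.0

in_deg = A.sum(axis=0)

# h_v = 1 + mean of h over in-neighbours of v; basal vertices get h = 1.
# Written as the linear system M h = 1.
M = np.eye(n)
for v in range(n):
    if in_deg[v] > 0:
        M[v] -= A[:, v] / in_deg[v]
h = np.linalg.solve(M, np.ones(n))
print(h)   # vertex 0 is basal; vertex 3 sits two levels above it
```

Because the system requires at least one basal vertex (otherwise M is singular), this construction fails on, e.g., a directed cycle; removing that restriction is precisely the gap the paper's framework addresses.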

R. A. Bunt, S. A. Wright, S. A. Jarvis, Y. K. Ho, M. J. Street (2016)Predictive Evaluation of Partitioning Algorithms Through Runtime Modelling, In: Proceedings - International Conference on High Performance Computingpp. 351-361 IEEE

Performance modelling of unstructured mesh codes is a challenging process, due to the difficulty of capturing their memory access patterns and their communication patterns at varying scale. In this paper we first develop extensions to an existing runtime performance model, aimed at overcoming the former, which we validate on up to 1,024 cores of a Haswell-based cluster, using both a geometric partitioning algorithm and ParMETIS to partition the input deck, with maximum absolute runtime errors of 12.63% and 11.55% respectively. To overcome the latter, we develop an application representative of the mesh partitioning process internal to an unstructured mesh code. This application is able to generate partitioning data that is usable with the performance model to produce predicted application runtimes within 7.31% of those produced using empirically collected data. We then demonstrate the use of the performance model by undertaking a predictive comparison among several partitioning algorithms on up to 30,000 cores. Additionally, we correctly predict the ineffectiveness of the geometric partitioning algorithm at 512 and 1,024 cores.

Ligang He, Chenlin Huang, Kenli Li, Hao Chen, Jianhua Sun, Bo Gao, Kewei Duan, Stephen A. Jarvis (2011)Modelling and analyzing the authorization and execution of video workflows, In: 2011 18th International Conference on High Performance Computing6152727pp. 1-10
Guang Tan, Stephen A. Jarvis, Anne-Marie Kermarrec (2008)Connectivity-Guaranteed and Obstacle-Adaptive Deployment Schemes for Mobile Sensor Networks, In: 28TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, VOLS 1 AND 2, PROCEEDINGS4595912pp. 429-437 IEEE

Mobile sensors can move and self-deploy into a network. While focusing on the problem of coverage, existing deployment schemes mostly over-simplify the conditions for network connectivity: they either assume that the communication range is large enough for sensors in geometric neighborhoods to obtain each other's location by local communications, or assume a dense network that remains connected. At the same time, an obstacle-free field or full knowledge of the field layout is often assumed. We present new schemes that are not restricted by these assumptions, and thus adapt to a much wider range of application scenarios. While maximizing sensing coverage, our schemes can achieve connectivity for a network with arbitrary sensor communication/sensing ranges or node densities, at the cost of a small moving distance; the schemes do not need any knowledge of the field layout, which can be irregular and have obstacles/holes of arbitrary shape. Simulation results show that the proposed schemes achieve the targeted properties.

Xinuo Chen, Stephen A. Jarvis, Jenny Shuangyan Liu (2009)ppBlast: A computational service over peer-to-peer network for BLAST, In: 1st International Conference on Advances in P2P Systems, AP2PS 20095359042pp. 122-131
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Xinuo Chen, Graham R. Nudd (2004)Dynamic scheduling of parallel jobs with QoS demands in multiclusters and grids, In: Proceedings - Fifth IEEE/ACM International Workshop on Grid Computingpp. 402-409
Guang Tan, Stephen A. Jarvis, James W.J. Xue, Simon D. Hammond (2007)Distributed broadcast scheduling in mobile ad hoc networks with unknown topologies, In: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM4228318pp. 1-7
S. J. Pennycook, C. J. Hughes, M. Smelyanskiy, S. A. Jarvis (2013)Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors, In: Proceedings - IEEE International Parallel and Distributed Processing Symposium6569887pp. 1085-1097 IEEE

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia's miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256- and 512-bit). The applicability of these optimisations to wider SIMD is discussed, and we show that the conventional approach of exposing more parallelism through redundant computation is not necessarily best. In single precision, our optimised implementation is up to 5× faster than the original scalar code running on Intel® Xeon® processors with 256-bit SIMD, and adding a single Intel® Xeon Phi™ coprocessor provides up to an additional 2× performance increase. These results demonstrate: (i) the importance of effective SIMD utilisation for molecular dynamics codes on current and future hardware; and (ii) the considerable performance increase afforded by the use of Intel® Xeon Phi™ coprocessors for highly parallel workloads.

Weisi Guo, Neha Gupta, Ganna Pogrebna, Stephen Jarvis (2016)Understanding Happiness in Cities using Twitter: Jobs, Children, and Transport, In: IEEE SECOND INTERNATIONAL SMART CITIES CONFERENCE (ISC2 2016)pp. 329-335 IEEE

The demographics and landscape of cities are changing rapidly, and there is an emphasis on better understanding the factors which influence citizen happiness in order to design smarter urban systems. Few studies have attempted to understand how large-scale sentiment maps to urban human geography. Inferring sentiment from social media data is one such scalable solution. In this paper, we apply natural language processing (NLP) techniques to 0.4 million geo-tagged Tweets in the Greater London area to understand the influence of socioeconomic and urban geography parameters on happiness. Our results not only verify established thinking: that job opportunities correlate with positive sentiments; but also reveal two insights: (1) happiness is negatively correlated with number of children, and (2) happiness has a U-shaped (parabolic) relationship with access to public transportation. The latter implies that the happiest people are those who have good access to public transport, or such poor access that they use private transportation. The number of jobs, children, and transportation availability are everyday facets of urban living and individually account for up to 47% of the variation in people's happiness. Our results show that they influence happiness more significantly than long-term socioeconomic parameters such as degradation, education, income, housing, and crime. This study will enable urban planners and system designers to move beyond the traditional cost-benefit methodology and to incorporate citizens' happiness.

David Truby, Carlo Bertolli, Steven A. Wright, Gheorghe-Teodor Bercea, Kevin O'Brien, Stephen A. Jarvis (2018)Pointers Inside Lambda Closure Objects in OpenMP Target Offload Regions, In: PROCEEDINGS OF LLVM-HPC 2018: IEEE/ACM 5TH WORKSHOP ON THE LLVM COMPILER INFRASTRUCTURE IN HPC (LLVM-HPC)pp. 10-17 IEEE

With the diversification of HPC architectures beyond traditional CPU-based clusters, a number of new frameworks for performance portability across architectures have arisen. One way of implementing such frameworks is to use C++ templates and lambda expressions to design loop-like functions. However, the lower-level programming APIs that these implementations must use are often designed with C in mind and do not specify how they interact with C++ features such as lambda expressions. This paper discusses a change to the behavior of the OpenMP specification with respect to lambda expressions such that when functions generated by lambda expressions are called inside GPU regions, any pointers used in the lambda expression correctly refer to device pointers. This change has been implemented in a branch of the Clang C++ compiler and demonstrated with two representative codes. This change has also been accepted into the draft OpenMP® specification for inclusion in OpenMP 5. Our results show that the implicit mapping of lambda expressions always exhibits identical performance to an explicit mapping, but without breaking the abstraction provided by the high-level frameworks.

Oliver Perks, Simon D. Hammond, Simon J. Pennycook, Stephen A. Jarvis (2011)WMTools - Assessing Parallel Application Memory Utilisation at Scale, In: N Thomas (eds.), COMPUTER PERFORMANCE ENGINEERING6977pp. 148-162 Springer Nature

The divergence between processor and memory performance has been a well discussed aspect of computer architecture literature for some years. The recent use of multi-core processor designs has, however, brought new problems to the design of memory architectures - as more cores are added to each successive generation of processor, equivalent improvement in memory capacity and memory sub-systems must be made if the compute components of the processor are to remain sufficiently supplied with data. These issues combined with the traditional problem of designing cache-efficient code help to ensure that memory remains an on-going challenge for application and machine designers. In this paper we present a comprehensive discussion of WMTools - a trace-based toolkit designed to support the analysis of memory allocation for parallel applications. This paper features an extended discussion of the WMTrace tracing tool presented in previous work including a revised discussion on trace-compression and several refinements to the tracing methodology to reduce overheads and improve tool scalability. The second half of this paper features a case study in which we apply WMTools to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes. This study demonstrates the use of WMTools in elucidating memory allocation issues in high-performance scientific codes.

Stephen A. Jarvis, Guang Tan, Daniel P. Spooner, Graham R. Nudd (2006)Constructing reliable and efficient overlays for P2P live media streaming, In: International journal of simulation systems, science & technology7(2)pp. 54-62
S. D. Hammond, G. R. Mudalige, J. A. Smith, J. A. Davis, A. B. Mills, S. A. Jarvis, J. Holt, I. Miller, J. A. Herdman, A. Vadgama (2009)Performance prediction and procurement in practice: assessing the suitability of commodity cluster components for wavefront codes, In: IET software3(6)pp. 509-521 Inst Engineering Technology-Iet

The cost of state-of-the-art supercomputing resources makes each individual purchase a lengthy and expensive process. Often each candidate architecture will need to be benchmarked using a variety of tools to assess likely performance. However, benchmarking alone provides only a limited insight into the suitability of each architecture for key codes and can give potentially misleading results when assessing their scalability. In this study the authors present a case study of the application of recently developed performance models of the Chimaera benchmarking code written by the United Kingdom Atomic Weapons Establishment (AWE), with a view to analysing how the code will perform and scale on a medium-sized, commodity-based InfiniBand cluster. The models are validated and demonstrate a greater than 90% accuracy for an existing InfiniBand machine; the models are then used as the basis for predicting code performance on a variety of alternative hardware configurations, which include changes in the underlying network, the use of faster processors and the use of a higher core density per processor. The results demonstrate the compute-bound nature of Chimaera and its sensitivity to network latency at increased processor counts. Using these insights the authors are able to discuss potential strategies which may be employed during the procurement of future mid-range clusters for wavefront-rich workloads.

Konstantin Klemmer, Daniel B. Neill, Stephen A. Jarvis (2021)Understanding spatial patterns in rape reporting delays, In: Royal Society open science8(2)201795 Royal Soc London

Under-reporting and delayed reporting of rape crime are severe issues that can complicate the prosecution of perpetrators and prevent rape survivors from receiving needed support. Building on a massive database of publicly available criminal reports from two US cities, we develop a machine learning framework to predict delayed reporting of rape to help tackle this issue. Motivated by large and unexplained spatial variation in reporting delays, we build predictive models to analyse spatial, temporal and socio-economic factors that might explain this variation. Our findings suggest that we can explain a substantial proportion of the variation in rape reporting delays using only openly available data. The insights from this study can be used to motivate targeted, data-driven policies to assist vulnerable communities. For example, we find that younger rape survivors and crimes committed during holiday seasons exhibit longer delays. Our insights can thus help organizations focused on supporting survivors of sexual violence to provide their services at the right place and time. Due to the non-confidential nature of the data used in our models, even community organizations lacking access to sensitive police data can use these findings to optimize their operations.

Jonathan M.D. Hill, Stephen A. Jarvis, Constantinos J. Siniolakis, Vasil P. Vasilev (1998)Portable and architecture independent parallel performance tuning using a call-graph profiling tool, In: Workshop on Parallel and Distributed Processingpp. 286-292
Bo Gao, Ligang He, Limin Liu, Kenli Li, Stephen A. Jarvis (2012)From Mobiles to Clouds: Developing Energy-aware Offloading Strategies for Workflows, In: 2012 ACM/IEEE 13TH INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID)6319164pp. 139-146 IEEE

Cloud computing and mobile computing are two of the most influential technologies that look set to change the face of computing in the coming years. Combining the two provides an unprecedented opportunity to deliver highly portable yet content-rich and computation-intensive services to the end user. In this paper we investigate the possibility of using code/task offload techniques between mobile and cloud in order to reduce the energy cost of workflows deployed on mobile devices. We first present a vision in which mobile devices are coordinated over a network equipped with a layer of cloud-like infrastructures, which we term cloudlets, whose computational resources can be leveraged by the mobile devices to host the execution of mission-critical mobile workflows in an energy-aware manner. We then build a model that encompasses various characteristics of the workflow's software and the network's hardware devices. With this model, we construct the objective functions that guide the offload decisions. We then present a heuristic algorithm that produces offload plans according to these objective functions and their variations, both statically and dynamically. We conclude the paper with a series of simulation studies, the results of which give insight into the offload-ability of workflows with different characteristics. The results also illustrate how different hardware specifications can affect offload efficiency. These studies indicate that our offload algorithm can significantly improve the energy efficiency and execution speed of mobile workflows.
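At its core, an offload decision compares the device's energy cost of computing locally against the radio cost of shipping the task's data. A highly simplified sketch (the parameter values are invented, and the paper's model includes more terms, such as workflow structure and cloudlet placement):

```python
def offload_saves_energy(cycles, data_bytes, f_local_hz, p_compute_w,
                         bandwidth_bps, p_radio_w):
    """Return (offload?, E_local, E_offload) for one task, from the device's view:
    E_local   = compute power * (cycles / local CPU frequency)
    E_offload = radio power * (bits to transfer / bandwidth); cloud-side energy ignored."""
    e_local = p_compute_w * cycles / f_local_hz
    e_offload = p_radio_w * 8.0 * data_bytes / bandwidth_bps
    return e_offload < e_local, e_local, e_offload

# Compute-heavy, data-light task: offloading wins.
heavy = offload_saves_energy(5e9, 1e5, 1e9, 2.0, 1e7, 1.0)
# Compute-light, data-heavy task: local execution wins.
light = offload_saves_energy(1e7, 1e7, 1e9, 2.0, 1e6, 1.0)
```

This captures the intuition behind the abstract's "offload-ability": whether offloading pays off depends on a task's compute-to-data ratio and on the hardware's power and bandwidth figures.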

Liam Steadman, Nathan Griffiths, Stephen Jarvis, Stuart McRobbie, Caroline Wallbank (2019)2D-STR: Reducing Spatio-temporal Traffic Datasets by Partitioning and Modelling, In: C Grueau, R Laurini, L Ragia (eds.), PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON GEOGRAPHICAL INFORMATION SYSTEMS THEORY, APPLICATIONS AND MANAGEMENT (GISTAM 2019)pp. 41-52 Scitepress

Spatio-temporal data generated by sensors in the environment, such as traffic data, is widely used in the transportation domain. However, learning from and analysing such data is increasingly problematic as the volume of data grows. Therefore, methods are required to reduce the quantity of data needed for multiple types of subsequent analysis without losing significant information. In this paper, we present the 2-Dimensional Spatio-Temporal Reduction method (2D-STR), which partitions the spatio-temporal matrix of a dataset into regions of similar instances, and reduces each region to a model of its instances. The method is shown to be effective at reducing the volume of a traffic dataset to

P. Coetzee, M. Leeke, S. Jarvis (2014)Towards unified secure on- and off-line analytics at scale, In: Parallel computing40(10)pp. 738-753 Elsevier

Data scientists have applied various analytic models and techniques to address the oft-cited problems of large volume, high velocity data rates and diversity in semantics. Such approaches have traditionally employed analytic techniques in a streaming or batch processing paradigm. This paper presents CRUCIBLE, a first-in-class framework for the analysis of large-scale datasets that exploits both streaming and batch paradigms in a unified manner. The CRUCIBLE framework includes a domain specific language for describing analyses as a set of communicating sequential processes, a common runtime model for analytic execution in multiple streamed and batch environments, and an approach to automating the management of cell-level security labelling that is applied uniformly across runtimes. This paper shows the applicability of CRUCIBLE to a variety of state-of-the-art analytic environments, and compares a range of runtime models for their scalability and performance against a series of native implementations. The work demonstrates the significant impact of runtime model selection, including improvements of between 2.3× and 480× across runtime models, with an average performance gap of just 14× between CRUCIBLE and a suite of equivalent native implementations.

Chao Chen, Ligang He, Hao Chen, Jianhua Sun, Bo Gao, Stephen A. Jarvis (2013)Developing Communication-aware Service Placement Frameworks in the Cloud Economy, In: 2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)6702668pp. 1-8 IEEE

In a Cloud system, a number of services are often deployed with each service being hosted by a collection of Virtual Machines (VMs). The services may interact with each other, and the interaction patterns may be dynamic, varying according to the system information at runtime. These impose a challenge in determining the amount of resources required to deliver a desired level of QoS for each service. In this paper, we present a method to determine the sufficient number of VMs for the interacting Cloud services. The proposed method borrows ideas from the Leontief Open Production Model in economics. Further, this paper develops a communication-aware strategy to place the VMs on Physical Machines (PMs), aiming to minimize the communication costs incurred by the service interactions. The developed communication-aware placement strategy is formalized in such a way that it does not require knowledge of the specific communication pattern between individual VMs. A genetic algorithm is developed to find a VM-to-PM placement with low communication costs. Simulation experiments have been conducted to evaluate the performance of the developed communication-aware placement framework. The results show that, compared with a placement framework aiming to use the minimal number of PMs to host VMs, the proposed communication-aware framework is able to reduce the communication cost significantly with only a very small increase in PM usage.
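The placement objective can be made concrete on a toy instance: for a handful of VMs, exhaustive search over VM-to-PM assignments finds the placement minimising cross-PM traffic. (The paper uses a genetic algorithm precisely because exhaustive search does not scale; the traffic matrix below is invented.)

```python
from itertools import product

# Pairwise traffic between 4 VMs (arbitrary units); intra-PM communication is free.
traffic = {(0, 1): 10, (0, 2): 1, (0, 3): 1, (1, 2): 1, (1, 3): 1, (2, 3): 10}
n_vms, n_pms, capacity = 4, 2, 2

def comm_cost(placement):
    """Total traffic crossing between physical machines under this placement."""
    return sum(w for (a, b), w in traffic.items() if placement[a] != placement[b])

best_cost, best_placement = None, None
for placement in product(range(n_pms), repeat=n_vms):
    if all(placement.count(pm) <= capacity for pm in range(n_pms)):
        cost = comm_cost(placement)
        if best_cost is None or cost < best_cost:
            best_cost, best_placement = cost, placement
```

The chatty pairs (0,1) and (2,3) end up co-located, cutting the communication cost from 22 (the worst balanced split) to 4; a genetic algorithm searches the same assignment space with a population of candidate placements instead of enumerating it.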

Yi Pan, Ahmad Khonsari, Alireza Shahrabi, Abdel Elah Al-Ayyoub, V. Gianuzzi, Hadi Larijani, Geyong Min, Albert Y. Zomaya, Stephen A. Jarvis, Azzedine Boukerche, Irfan Awan, Rod Fatoohi, Pradip K. Srimani, Mirela Sechi Moretti Annoni Notare, Hamid Sarbazi-Azad, Lewis M. Mackenzie, M. Naimi, Abderezak Touzene, Khalid Al-Begain, Behrooz Parhami, H. Karatza, K. Joe, Ahmed Al-Dubai, Khaled Day, S. Loucif, Mohamed Ould-Khaoua, Nigel Thomas, Sotirios G. Ziavras, H. Ahrabian (2006)Message from the PEN-PCGCS Workshop Co-chairs, In: 2006 International Conference on Parallel Processing Workshops (ICPPW'06)pp. xiv-xiv IEEE
Wilson M. Tan, Stephen A. Jarvis (2013)Self-determination of Maximum Supportable Receiver Wakeup Intervals in Energy Harvesting WSN Nodes, In: IFIP Wireless Days6686440pp. 1-7 IEEE

Energy harvesting wireless sensor network nodes would not be able to operate without duty cycling. In TinyOS, duty cycling is supported through Low Power Listening (LPL). LPL is sender-centric: the longer the wakeup interval, the more power a receiver saves, at the cost of more energy per transmission for the sender. Due to the limitations of energy storage technologies, there is a limit to the receiver wakeup interval which energy harvesting senders can support. Currently, this limit can be derived computationally or experimentally. Computational derivation is overly conservative, while manual experimentation is labour intensive. In this paper, we present a protocol which enables sensor nodes to determine the wakeup interval limit experimentally, without human intervention or the aid of other nodes. Not only does the protocol allow for easier determination of this limit, it also allows network nodes to adjust to environmental changes encountered while in deployment, such as capacitor ageing.
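As a rough illustration of self-determining such a limit, the sketch below binary-searches for the largest interval that an abstract trial accepts. The `supports` callback, the bounds, and the search strategy are hypothetical stand-ins for the node's in-situ experiment, not the protocol from the paper.

```python
def max_supportable_interval(supports, lo=1, hi=4096):
    """Binary search for the largest wakeup interval (arbitrary units)
    that a trial deems sustainable. Assumes `supports` is monotone:
    if an interval fails, all longer intervals fail too. Returns None
    if even the shortest interval is unsupportable."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if supports(mid):       # hypothetical in-situ trial succeeds
            best, lo = mid, mid + 1
        else:                   # trial fails: try shorter intervals
            hi = mid - 1
    return best

# Example: a node whose energy budget sustains intervals up to 700.
limit = max_supportable_interval(lambda interval: interval <= 700)
```

The monotonicity assumption keeps the number of trials logarithmic in the search range, which matters when each trial drains a harvesting node's stored energy.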

Guang Tan, Stephen A. Jarvis, Xinuo Chen, Daniel P. Spooner, Graham R. Nudd (2005)Performance analysis and improvement of overlay construction for peer-to-peer live media streaming, In: Proceedings - International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems20051521130pp. 169-176
Guang Tan, Stephen A. Jarvis (2006)Inter-overlay cooperation in high-bandwidth overlay multicast, In: W C Feng (eds.), 2006 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS1690645pp. 417-424 IEEE

The cooperation of end users can be exploited to boost the performance of high-bandwidth multicast. While intra-overlay cooperation, the mechanism for cooperation within a single overlay (multicast group), has been extensively studied, little attention has been paid to inter-overlay cooperation. In this paper we explore the possibility and effects of cooperation among co-existing heterogeneous overlays in the context of live media streaming, where bandwidth is the bottleneck resource. To motivate this kind of cooperation, we design a reputation-based incentive mechanism that differentiates users' streaming qualities based on the amount of data actually forwarded by individual users. This not only stimulates users to contribute as much forwarding bandwidth as possible, but also motivates those with spare bandwidth in resource-rich overlays to find downstream users in external, often resource-poor, overlays so as to accumulate more reputation scores. Under this mechanism, an adaptive bandwidth exporting/reclaiming algorithm is developed which allows users to dynamically allocate bandwidth according to the resource availability of multiple overlays. Simulation results show enhanced system performance in terms of users' average media quality.

Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2003)Dynamic scheduling of parallel real-time jobs by modelling spare capabilities in heterogeneous clusters, In: Proceedings - IEEE International Conference on Cluster Computing, CLUSTER 20032003-1253293pp. 2-10
Xiaolong Jin, Ahmed Y. Al-Dubai, Shoukat Ali, Stephen Jarvis (2015)Guest Editorial: Ubiquitous Multimedia Systems and Applications, In: Multimedia tools and applications74(8)pp. 2743-2744 Springer Nature
Malkiat Thiarai, Sarunkorn Chotvijit, Stephen Jarvis (2019)Balancing information governance obligations when accessing social care data for collaborative research, In: Records management journal (London, England)29(1-2)pp. 194-209 Emerald Group Publishing

Purpose There is significant national interest in tackling issues surrounding the needs of vulnerable children and adults. This paper aims to argue that much value can be gained from the application of new data-analytic approaches to assist with the care provided to vulnerable children. This paper highlights the ethical and information governance issues raised in the development of a research project that sought to access and analyse children's social care data. Design/methodology/approach The paper documents the process involved in identifying, accessing and using data held in Birmingham City Council's social care system for collaborative research with a partner organisation. This includes identifying the data, its structure and format; understanding the Data Protection Act 1998 and 2018 (DPA) exemptions that are relevant to ensure that legal obligations are met; data security and access management; and the ethical and governance approval process. Findings The findings include approaches to understanding the data, its structure and accessibility; the tasks involved in addressing ethical and legal obligations; and the requirements of the ethical and governance processes. Originality/value The aim of this research is to highlight the potential of new data-analytic techniques to examine the flow of children's social care data from referral, through the assessment process, to the resulting service provision. Data held by Birmingham City Council are used throughout, and this paper highlights key ethical and information governance issues which were addressed in preparing and conducting the research. The findings provide insight for other data-led studies of a similar nature.

Stephen Jarvis, Nigel Thomas, Aad van Moorsel (2004)Open issues in grid performability, In: International journal of simulation systems, science & technology5(5)pp. 3-12
Ligang He, Kewei Duan, Xueguang Chen, Deqing Zou, Zongfen Han, Ali Fadavinia, Stephen A. Jarvis (2011)Modelling workflow executions under role-based authorisation control, In: Proceedings - 2011 IEEE International Conference on Services Computing, SCC 20116009262pp. 200-208
Stephen Jarvis, Massimo Coppola, Junwei Cao, Darren Kerbyson (2010)Performance Prediction and Evaluation, In: P DAmbar, M Guarracino, D Talia (eds.), EURO-PAR 2010 PARALLEL PROCESSING, PT I6271(1)pp. 86-87 Springer Nature
D A Bacigalupo, J D Turner, S A Jarvis, G R Nudd (2003)A dynamic predictive framework for e-business workload management, In: N Callaos, A M DiSciullo, T Ohta, T K Liu (eds.), 7TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL I, PROCEEDINGSpp. 390-395 Int Inst Informatics & Systemics

Distributed e-business application platforms such as the web services framework will require sophisticated workload management infrastructures for the routing of client requests to the most appropriate services. This paper introduces a novel and dynamic predictive framework that uses both analytical performance modelling and historical data to extrapolate a request's predicted performance under varying workloads and conditions on different systems. Such a framework can support a routing algorithm in meeting quality-of-service metric requirements. An e-business benchmark is used as a demonstrator to provide insight into the techniques used in predicting request response times.

Ligang He, Mark Calleja, Mark Hayes, Stephen A. Jarvis (2009)Performance Prediction for Running Workflows under Role-based Authorization Mechanisms, In: 2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5 IEEE

When investigating the performance of running scientific/commercial workflows in parallel and distributed systems, we often take into account only the resources allocated to the tasks constituting the workflow, assuming that computational resources will accept the tasks and execute them to completion once the processors are available. In reality, and in particular in Grid or e-business environments, security policies may be implemented in the individual organisations in which the computational resources reside. It is therefore expedient to have methods to calculate the performance of executing workflows under security policies. Authorisation control, which specifies who is allowed to perform which tasks when, is one of the most fundamental security considerations in distributed systems such as Grids. Role-Based Access Control (RBAC), under which users are assigned to certain roles while the roles are associated with prescribed permissions, remains one of the most popular authorisation control mechanisms. This paper presents a mechanism to theoretically compute the performance of running scientific workflows under RBAC authorisation control. Various performance metrics are calculated, including both system-oriented metrics (such as system utilisation, throughput and mean response time) and user-oriented metrics (such as mean response time of the workflows submitted by a particular client). With this work, if a client informs an organisation of the workflows they are going to submit, the organisation is able to predict the performance of these workflows running in its local computational resources (e.g. a high-performance cluster) enforced with RBAC authorisation control, and can also report client-oriented performance to each individual user.

James W. J. Xue, Stephen A. Jarvis (2010)QoS-Aware Service Selection, In: N Griffiths, K M Chao (eds.), Advanced Information and Knowledge Processingpp. 225-251 Springer Nature

With the widespread use of the Internet, the number of web services that can provide similar functionality has increased rapidly in recent years. Web service selection has to be based on some non-functional attributes of the services, such as the quality of service (QoS). In this chapter, we use a server switching service that is commonly used in Internet hosting environments to explain how an agent can use a performance model to evaluate services and select the most suitable among a number of functionally similar services returned by service discovery. The various criteria that can be used to assess QoS are introduced in this chapter, including mean response time, throughput, system utilisation and others closely related to business, such as revenue and operating costs. Service selection in the chosen case study depends on the quality and suitability of various switching policies; in other words, different switching policies can be selected depending on the QoS of the services and the run-time system state. Since system performance can be evaluated using an analytic model, the QoS of services is assessed based on the output of that model.

G. R. Mudalige, S. D. Hammond, J. A. Smith, S. A. Jarvis (2009)Predictive Analysis and Optimisation of Pipelined Wavefront Computations, In: 2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5pp. 1208-1215 IEEE

Pipelined wavefront computations are a ubiquitous class of parallel algorithm used for the solution of a number of scientific and engineering applications. This paper investigates three optimisations to the generic pipelined wavefront algorithm, which are evaluated through the use of predictive analytic models. The modelling of potential optimisations is supported by a recently developed reusable LogGP-based analytic performance model, which allows the speculative evaluation of each optimisation within the context of an industry-strength pipelined wavefront benchmark, developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The paper details the quantitative and qualitative benefits of: (1) parallelising the computation blocks of the wavefront algorithm using OpenMP; (2) a novel restructuring/shifting of computation within the wavefront code; and (3) performing simultaneous multiple sweeps through the data grid.

Justin R.D. Dyson, Nathan E. Griffiths, Hélène N.Lim Choi Keung, Stephen A. Jarvis, Graham R. Nudd (2004)Trusting agents for grid computing, In: 2004 IEEE International Conference on Systems, Man and Cybernetics, SMC 20044pp. 3187-3192
Wilson M. Tan, Stephen A. Jarvis (2013)Determination of Maximum Supportable Receiver Wakeup Intervals in Energy Harvesting WSN Nodes Using a Client-Server Setup, In: 2013 IEEE Conference on Wireless Sensor (ICWISE)6728780pp. 61-67 IEEE

Energy harvesting wireless sensor network nodes would not be able to operate without duty cycling. In TinyOS, duty cycling is supported through Low Power Listening (LPL). LPL is sender-centric: the longer the wakeup interval, the more power a receiver saves, at the cost of more energy per transmission for the sender. Due to the limitations of energy storage technologies, there is a limit to the receiver wakeup interval which energy harvesting senders can support. Currently, this limit can be derived computationally or experimentally. Computational derivation is overly conservative, while manual experimentation is labor intensive. In this paper, we present a protocol which enables sensor nodes to autonomously determine the wakeup interval limit experimentally, using a client-server architecture. Not only does the protocol allow for easier determination of this limit, it also allows network nodes to adjust to environmental changes encountered while in deployment, such as capacitor ageing.

Wilson M. Tan, Stephen A. Jarvis (2014)Quality over Quantity: Target Identifiability in Directional Sensor Networks, In: 2014 IEEE CONFERENCE ON WIRELESS SENSORS (ICWISE)pp. 41-48 IEEE

Existing algorithms for orienting sensors in directional sensor networks have primarily concerned themselves with the problem of maximizing the number of covered targets, assuming that target identification is a non-issue. This assumption, however, does not hold true in all situations. In this paper, three heuristic algorithms for choosing active sensors and orienting them with the goal of balancing coverage and identifiability are presented. The first algorithm is parameterizable, allowing users to specify the desired level of trade-off between coverage and identifiability. The other two algorithms are not parameterizable, and are more suited to situations where coverage is preferred over identifiability. The performance of the algorithms is verified via simulation, and they are shown to confer increased target identifiability compared to algorithms originally designed simply to maximize the number of targets covered.

Stephen Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis (2019)The Power-optimised Software Envelope, In: ACM transactions on architecture and code optimization16(3)21pp. 1-27 Assoc Computing Machinery

Advances in processor design have delivered performance improvements for decades. As physical limits are reached, refinements to the same basic technologies are beginning to yield diminishing returns. Unsustainable increases in energy consumption are forcing hardware manufacturers to prioritise energy efficiency in their designs. Research suggests that software modifications may be needed to exploit the resulting improvements in current and future hardware. New tools are required to capitalise on this new class of optimisation. In this article, we present the Power-Optimised Software Envelope (POSE) model, which allows developers to assess the potential benefits of power optimisation for their applications. The POSE model is metric-agnostic, and in this article we provide derivations using the established Energy-Delay Product metric and the novel Energy-Delay Sum and Energy-Delay Distance metrics that we believe are more appropriate for energy-aware optimisation efforts. We demonstrate POSE on three platforms by studying the optimisation characteristics of applications from the Mantevo benchmark suite. Our results show that the Pathfinder application has very little scope for power optimisation while TeaLeaf has the most, with all other applications in the benchmark suite falling between the two. Finally, we extend our POSE model with a formulation known as System Summary POSE, a meta-heuristic that allows developers to assess the scope a system has for energy-aware software optimisation, independent of the code being run.
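Of the metrics named in this abstract, only the Energy-Delay Product has a widely established textbook definition (E · t^w); the Energy-Delay Sum and Distance metrics are defined in the paper itself and are not reproduced here. A minimal sketch with invented numbers shows why the choice of metric changes whether a power optimisation "pays":

```python
def edp(power_w, time_s, w=1):
    """Energy-Delay Product E * t^w with E = P * t.
    w=1 gives the classic EDP; w=2 the ED^2P variant, which weights
    delay more heavily."""
    energy_j = power_w * time_s
    return energy_j * time_s ** w

# A hypothetical power optimisation: 30% less power, 30% more runtime.
baseline = edp(100.0, 10.0)   # 1000 J over 10 s
tuned    = edp(70.0, 13.0)    # 910 J over 13 s
# Energy drops (910 J vs 1000 J) yet the EDP score worsens, because
# the runtime penalty outweighs the saving; whether the change "pays"
# depends entirely on the metric chosen.
```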

Guang Tan, Stephen A. Jarvis, Ligang He, Xinuo Chen, Daniel P. Spooner, Graham R. Nudd (2005)Modelling web transfer performance over asymmetric networks, In: Proceedings - 11th International Conference on Parallel and Distributed Systems Workshops, ICPADS 20052pp. 110-114
Ligang He, Nadeem Chaudhary, Stephen A. Jarvis (2014)Developing security-aware resource management strategies for workflows, In: Future generation computer systems38pp. 61-68 Elsevier

This paper investigates the resource allocation problem for a type of workflow in pervasive computing. These workflows are abstracted from enterprise-level applications in the business or commerce area. The activities in these workflows require not only computing resources, but also human resources. Human involvement introduces additional security concerns. When we plan or allocate resource capacities, we often assume that when a task is allocated to a resource, the resource will accept the task and start the execution once the processor becomes available. However, security policies impose further constraints on task executions, and therefore may affect both application- and system-oriented performance. Authorization is an important aspect of security. This paper investigates the issue of allocating resources for running workflows under role-based authorization control, one of the most popular authorization mechanisms. By taking the authorization constraints into account, resource allocation strategies are developed in this paper for both human resources and computing resources. In the allocation strategy for human resources, an optimization equation is constructed subject to the constraint of the budget available to hire human resources. The optimization equation is then solved to obtain the number of human resources allocated to each authorization role. The allocation strategy for computing resources calculates not only the number of computing resources, but also the proportion of processing capacity in each resource allocated to serve the tasks assuming each role. Simulation experiments have been conducted to verify the effectiveness of the developed allocation strategies. The experimental results show that the allocation strategy developed in this paper outperforms traditional allocation strategies, which do not consider authorization constraints, in terms of both average response time and resource utilization.

Peter Coetzee, Stephen Jarvis (2013)CRUCIBLE: Towards unified secure on- and off-line analytics at scale, In: DISCS 2013 : proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems : November 18, 2013pp. 43-48
Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2006)Performance evaluation of scheduling applications with DAG topologies on multiclusters with independent local schedulers, In: 20th International Parallel and Distributed Processing Symposium : proceedings : April 25-29, 2006, Rhodes Island, Greece20061639635
S. A. Jarvis, B. P. Foley, P. J. Isitt, D. P. Spooner, D. Rueckert, G. R. Nudd (2008)Performance prediction for a code with data-dependent runtimes, In: Concurrency and computation20(3)pp. 195-206 Wiley

In this paper we present a predictive performance model for a key biomedical imaging application found as part of the U.K. e-Science Information eXtraction from Images (IXI) project. This code represents a significant challenge for our existing performance prediction tools as it has internal structures that exhibit highly variable runtimes depending on qualities in the input data provided. Since the runtime can vary by more than an order of magnitude, it has been difficult to apply meaningful quality of service criteria to workflows that use this code. The model developed here is used in the context of an interactive scheduling system which provides rapid feedback to the users, allowing them to tailor their workloads to available resources or to allocate extra resources to scheduled workloads.

Gihan R. Mudalige, Mary K. Vernon, Stephen A. Jarvis (2008)A plug-and-play model for evaluating wavefront computations on parallel architectures, In: 2008 IEEE International Symposium on Parallel and Distributed Processing IEEE

This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and scaling behavior of different MPI-based pipelined wavefront applications running on modern parallel platforms with multicore nodes. A key new feature of the model is that it requires only a few simple input parameters to project performance for wavefront codes with different structure to the sweeps in each iteration as well as different behavior during each wavefront computation and/or between iterations. We apply the model to three key benchmark applications that are used in high performance computing procurement, illustrating that the model parameters yield insight into the key differences among the codes. We also develop new, simple and highly accurate models of MPI send, receive, and group communication primitives on the dual-core Cray XT system. We validate the reusable model applied to each benchmark on up to 8192 processors on the XT3/XT4. Results show excellent accuracy for all high performance application and platform configurations that we were able to measure. Finally we use the model to assess application and hardware configurations, develop new metrics for procurement and configuration, identify bottlenecks, and assess new application design modifications that, to our knowledge, have not previously been explored.
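The point-to-point cost that LogGP-based models of this kind build upon can be sketched generically. This is the textbook LogGP formula with invented parameter values, not the paper's calibrated Cray XT model:

```python
def loggp_p2p(k_bytes, L, o, G):
    """End-to-end time for one k-byte message under LogGP: sender
    overhead o, (k-1)*G to inject the remaining bytes at the per-byte
    gap G, network latency L, then receiver overhead o."""
    return o + (k_bytes - 1) * G + L + o

# e.g. a 1 KiB message with L=5us, o=1us, G=0.01us/byte (all invented):
t = loggp_p2p(1024, L=5.0, o=1.0, G=0.01)   # time in microseconds
```

Plug-and-play models of the paper's sort compose many such message costs with per-node compute times to project whole-sweep runtimes, which is why accurately measured L, o and G values matter.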

Fernando Munoz-Mendez, Ke Han, Konstantin Klemmer, Stephen Jarvis (2018)Community Structures, Interactions and Dynamics in London's Bicycle Sharing Network, In: Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computerspp. 1015-1023 Assoc Computing Machinery

We apply a novel clustering technique to London's bikesharing network, deriving distinctive behavioral patterns and assessing community interactions and spatio-temporal dynamics. The analyses reveal self-contained, interconnected and hybrid clusters that mimic London's physical structure. Exploring changes over time, we find geographically isolated and specialized communities to be relatively consistent, while the remaining system exhibits volatility. We increase understanding of the collective behavior of the bikesharing users.

Elth Ogston, Stephen A. Jarvis (2009)Peer sampling with improved accuracy, In: Peer-to-peer networking and applications2(1)pp. 24-36 Springer Nature

Node sampling services provide peers in a peer-to-peer system with a source of randomly chosen addresses of other nodes. Ideally, samples should be independent and uniform. The restrictions of a distributed environment, however, introduce various dependencies between samples. We review gossip-based sampling protocols proposed in previous work, and identify sources of inaccuracy. These include replicating the items from which samples are drawn, and imprecise management of the process of refreshing items. Based on this analysis, we propose a new protocol, Eddy, which aims to minimize temporal and spatial dependencies between samples. We demonstrate, through extensive simulation experiments, that these changes lead to an improved sampling service. Eddy maintains a balanced distribution of items representing active system nodes, even in the face of realistic levels of message loss and node churn. As a result, it behaves more like a centralized random number generator than previous protocols. We demonstrate this by showing that using Eddy improves the accuracy of a simple algorithm that uses random samples to estimate the size of a peer-to-peer network.
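As one concrete example of sample-based size estimation of the kind mentioned in the closing sentence, the classic Lincoln-Petersen capture-recapture estimator can be sketched as follows. This generic estimator is an illustration of why sample uniformity matters; it is not the Eddy protocol or the specific algorithm from the paper:

```python
def estimate_network_size(sample_a, sample_b):
    """Lincoln-Petersen capture-recapture estimate: given two
    independent uniform samples of node IDs with m IDs in common,
    the network size is approximately |A| * |B| / m. Biased or
    correlated samples (the inaccuracies the paper targets) skew
    the overlap and hence the estimate."""
    overlap = len(set(sample_a) & set(sample_b))
    if overlap == 0:
        return None  # too few samples to estimate
    return len(sample_a) * len(sample_b) / overlap

# Two batches of 100 node IDs drawn from a 1000-node network would be
# expected to share about 10 IDs, giving 100 * 100 / 10 = 1000.
```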

Ligang He, Huanzhou Zhu, Stephen A. Jarvis (2016)Developing Graph-Based Co-Scheduling Algorithms on Multicore Computers, In: IEEE transactions on parallel and distributed systems27(6)pp. 1617-1632 IEEE

It is common that multiple cores reside on the same chip and share the on-chip cache. As a result, resource sharing can cause performance degradation of co-running jobs. Job co-scheduling is a technique that can effectively alleviate this contention, and many co-schedulers have been reported in the literature. Most solutions however do not aim to find the optimal co-scheduling solution. Being able to determine the optimal solution is critical for evaluating co-scheduling systems. Moreover, most co-schedulers only consider serial jobs, and there often exist both parallel and serial jobs in real-world systems. In this paper a graph-based method is developed to find the optimal co-scheduling solution for serial jobs; the method is then extended to incorporate parallel jobs, including both multi-process and multi-threaded parallel jobs. A number of optimization measures are also developed to accelerate the solving process. Moreover, a flexible approximation technique is proposed to strike a balance between the solving speed and the solution quality. Extensive experiments are conducted to evaluate the effectiveness of the proposed co-scheduling algorithms. The results show that the proposed algorithms can find the optimal co-scheduling solution for both serial and parallel jobs. The proposed approximation technique is also shown to be flexible in the sense that we can control the solving speed by setting the requirement for the solution quality.
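For intuition, the optimal co-schedule for serial jobs on dual-core chips can be found by exhaustive search on small inputs. The sketch below pairs jobs to minimise the summed co-run degradation; the degradation matrix is invented, and this brute-force search is not the paper's graph-based algorithm, though both find the same optimum on such inputs:

```python
def min_cost_pairing(degrade):
    """Pair an even number of serial jobs onto dual-core chips so the
    summed co-run degradation is minimal. degrade[i][j] (symmetric,
    values invented) is the penalty when jobs i and j share a chip.
    Memoised brute force over all pairings."""
    memo = {}
    def best(rem):
        if not rem:
            return 0.0, ()
        if rem in memo:
            return memo[rem]
        i = min(rem)                 # always pair the smallest job left
        res = None
        for j in rem - {i}:
            cost, pairs = best(rem - {i, j})
            cand = (cost + degrade[i][j], ((i, j),) + pairs)
            if res is None or cand[0] < res[0]:
                res = cand
        memo[rem] = res
        return res
    return best(frozenset(range(len(degrade))))
```

Because the number of pairings grows factorially, exact search like this is only feasible for tiny job sets, which is precisely why the paper's optimisation measures and approximation technique are needed at scale.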

J. A. Davis, G. R. Mudalige, S. D. Hammond, J. A. Herdman, I. Miller, S. A. Jarvis (2011)Predictive analysis of a hydrodynamics application on large-scale CMP clusters, In: Computer science (Berlin, Germany)26(3-4)pp. 175-185 Springer Nature

We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The developed model elucidates the parallel computation of Hydra, with which it is possible to predict its run-time and scaling performance on varying large-scale chip multiprocessor (CMP) clusters. A key feature of the model is its granularity; with the model we are able to separate the contributing costs, including computation, point-to-point communications, collectives, message buffering and message synchronisation. The predictions are validated on two contrasting large-scale HPC systems, an AMD Opteron/InfiniBand cluster and an IBM BlueGene/P, both of which are located at the Lawrence Livermore National Laboratory (LLNL) in the US. We validate the model on up to 2,048 cores, where it achieves >85% accuracy in weak-scaling studies. We also demonstrate use of the model in exposing the increasing costs of collectives for this application, and also the influence of node density on network accesses, therefore highlighting the impact of machine choice when running this hydrodynamics application at scale.

Wilson M. Tan, Stephen A. Jarvis (2015)A Distributed Heuristic Solution to the Target Identifiability Problem in Directional Sensor Networks, In: 2015 International Conference on Computing, Networking and Communications (ICNC)7069337pp. 176-182 IEEE

Existing algorithms for orienting sensors in directional sensor networks have primarily concerned themselves with the problem of maximizing the number of covered targets, assuming that target identification is a non-issue. This assumption, however, does not hold true in all situations. In this paper, a distributed heuristic algorithm for choosing active sensors and orienting them with the goal of balancing coverage and identifiability is presented. The performance of the algorithm is verified via simulation, and it is shown to confer increased target identifiability compared to algorithms originally designed simply to maximize the number of targets covered.

A. M. B. Owenson, S. A. Wright, R. A. Bunt, Y. K. Ho, M. J. Street, S. A. Jarvis (2020)An unstructured CFD mini-application for the performance prediction of a production CFD code, In: Concurrency and computation32(10)5443 Wiley

Maintaining the performance of large scientific codes is a difficult task. To aid in this task, a number of mini-applications have been developed that are more tractable to analyze than large-scale production codes, while retaining their performance characteristics. These "mini-apps" also enable faster hardware evaluation and, for sensitive commercial codes, allow evaluation of code and system changes outside of access approval processes. In this paper, we develop MG-CFD, a mini-application that represents a geometric multigrid, unstructured computational fluid dynamics (CFD) code, designed to exhibit similar performance characteristics without sharing commercially sensitive code. We detail our experiences of developing this application using guidelines detailed in existing research, and contribute further to these. Our application is validated against the inviscid flux routine of HYDRA, a CFD code developed by Rolls-Royce plc for turbomachinery design. This paper (1) documents the development of MG-CFD, (2) introduces an associated performance model with which it is possible to assess the performance of HYDRA on new HPC architectures, and (3) demonstrates that it is possible to use MG-CFD and the performance models to predict the performance of HYDRA with a mean error of 9.2% for strong-scaling studies.

Zhenyu Li, James Davis, Stephen A. Jarvis (2018)Optimizing Machine Learning on Apache Spark in HPC Environments, In: PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018)pp. 95-105 IEEE

Machine learning has established itself as a powerful tool for the construction of decision-making models and algorithms through the use of statistical techniques on training data. However, a significant impediment to its progress is the time spent training and improving the accuracy of these models; this is a data- and compute-intensive process, which can often take days, weeks or even months to complete. A common approach to accelerate this process is to employ multiple machines simultaneously, a trait shared with the field of High Performance Computing (HPC) and its clusters. However, existing distributed frameworks for data analytics and machine learning are designed for commodity servers, which do not realize the full potential of a HPC cluster, and thus deny the effective use of a readily available and potentially useful resource. In this work we adapt Apache Spark, a distributed data-flow framework, to support machine learning in HPC environments. There are inherent challenges to using Spark in this context; memory management, communication costs and synchronization overheads all pose challenges to its efficiency. To this end we introduce: (i) the application of MapRDD, a fine-grained distributed data representation; (ii) a task-based all-reduce implementation; and (iii) a new asynchronous Stochastic Gradient Descent (SGD) algorithm using non-blocking all-reduce. We demonstrate up to a 2.6x overall speedup (or an 11.2x theoretical speedup with a Nvidia K80 graphics card), an 82-91% compute ratio, and an 80% reduction in memory usage, when training the GoogLeNet model to classify 10% of the ImageNet dataset on a 32-node cluster. We also demonstrate a comparable convergence rate using the new asynchronous SGD with respect to the synchronous method.
With increasing use of accelerator cards, larger cluster computers and deeper neural network models, we predict a 2x further speedup (i.e. 22.4x accumulated speedup) is obtainable with the new asynchronous SGD algorithm on heterogeneous clusters.
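
As an illustrative aside, the overlap that a non-blocking all-reduce affords can be sketched in a few lines of Python: the gradient reduction runs on a background thread while the worker would continue with its next mini-batch. The names here (`NonBlockingAllReduce`, `async_sgd_step`) are hypothetical stand-ins, not the paper's Spark implementation.

```python
import threading

def allreduce_mean(grads):
    # Blocking mean-reduction across workers (a stand-in for an MPI all-reduce).
    n = len(grads)
    return [sum(g) / n for g in zip(*grads)]

class NonBlockingAllReduce:
    """Launch the reduction on a background thread; the caller retrieves the
    averaged gradient later, overlapping communication with computation."""
    def __init__(self, grads):
        self.result = None
        self._t = threading.Thread(target=self._run, args=(grads,))
        self._t.start()

    def _run(self, grads):
        self.result = allreduce_mean(grads)

    def wait(self):
        self._t.join()
        return self.result

def async_sgd_step(weights, local_grads, lr=0.1):
    # Start the reduction, overlap with (simulated) local work, then apply it.
    handle = NonBlockingAllReduce(local_grads)
    # ... the forward/backward pass of the next mini-batch would run here ...
    avg = handle.wait()
    return [w - lr * g for w, g in zip(weights, avg)]
```

The design point being illustrated is only that the reduction need not block the compute thread; staleness handling and convergence guarantees are where the real algorithm does its work.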

Henry Crosby, Stephen A. Jarvis, Paul Davis (2016) Spatially-Intensive Decision Tree Prediction of Traffic Flow across the entire UK Road Network, In: Proceedings / IEEE International Symposium on Distributed Simulation and Real-Time Applications, 7789880, pp. 116-119, IEEE

This paper introduces a novel approach to predicting UK-wide daily traffic counts on all roads in England and Wales, irrespective of sensor data availability. A key finding of this research is that many roads in a network may have no local connection but may still obey common laws, and this fact can be exploited to improve simulation. In this paper we show that: (1) traffic counts are a function of dependent spatial, temporal and neighbourhood variables; (2) large open-source datasets, such as school locations and public transport hubs, can, with appropriate GIS and machine learning, assist the prediction of traffic counts; (3) real-time simulation can be scaled up to large networks with the aid of machine learning; and (4) such techniques can be employed in real-world tools. Validation of the proposed approach demonstrates an 88.2% prediction accuracy on traffic counts across the UK.
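
As a toy illustration of tree-based count prediction of the kind described above, the following fits a one-split regression tree (a decision stump) by exhaustive search over feature thresholds, minimising squared error. The feature vectors stand in for the spatial/temporal/neighbourhood variables; the function name and data are hypothetical, not the authors' code.

```python
def fit_stump(X, y):
    """Fit a one-split regression tree by exhaustive threshold search.
    Returns a predictor mapping a feature row to the mean of its leaf."""
    best = None
    for j in range(len(X[0])):                      # each candidate feature
        for t in sorted(set(row[j] for row in X)):  # each candidate threshold
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = (sum((yi - ml) ** 2 for yi in left)
                   + sum((yi - mr) ** 2 for yi in right))
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr
```

A full decision-tree model would recurse on each leaf; the stump is the single building block.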

S. J. Pennycook, S. D. Hammond, S. A. Wright, J. A. Herdman, I. Miller, S. A. Jarvis (2013) An investigation of the performance portability of OpenCL, In: Journal of parallel and distributed computing, 73(11), pp. 1439-1450, Elsevier

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single-source application is benchmarked on a number of different architectures, and is shown to be 1.3-1.5x slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3-3.1x slower on multiple nodes. We also explore the potential performance gains of OpenCL's device fission capability, demonstrating up to a 3x speed-up over our original OpenCL implementation.

Stephen Jarvis, Steven Wright, Simon Hammond (2014) High performance computing systems, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8551, Springer

This book constitutes the refereed proceedings of the 4th International Workshop, PMBS 2013, held in Denver, CO, USA, in November 2013. The 14 papers presented in this volume were carefully reviewed and selected from 37 submissions. The selected articles broadly cover topics in massively parallel and high-performance simulation, modeling and simulation, model development and analysis, performance optimization, power estimation and optimization, high performance computing, reliability, performance analysis, and network simulation.

Junwei Cao, S. A. Jarvis, S. Saini, G. R. Nudd (2003) GridFlow: Workflow management for grid computing, In: 3rd IEEE International Symposium on Cluster Computing and the Grid, 1199369, pp. 198-205
Konstantin Klemmer, Daniel B. Neill, Stephen A. Jarvis (2018) Modeling rape reporting delays using spatial, temporal and social features, In: arXiv
Malik Shahzad K. Awan, Stephen A. Jarvis (2012) MalikCredit - A New Credit Unit for P2P Computing, In: G Min, L Lefevre, J Hu, L Liu, L T Yang, S Seelam (eds.), ICESS ... International Conference on Embedded Software and Systems (Online), 6332291, pp. 1060-1065, IEEE

The peer-to-peer (P2P) paradigm operates in an uncontrolled, voluntary environment and, as the shared resources contributed by the general public carry no legal obligation with respect to resource provisioning, an incentive-based approach is needed to motivate existing participants to continue with a project and to attract new volunteers. The Berkeley Open Infrastructure for Network Computing (BOINC), one of the most widely used middleware platforms in P2P systems, provides an Accounting System that rewards project participants in order to attract new participants and retain existing ones. The primary incentive provided by the BOINC Accounting System is the award of 'credits', which themselves have no monetary value. However, one problem with this system is its reliance on the significantly inconsistent performance results obtained from the Dhrystone and Whetstone benchmarks when calculating credits. This study analyses the existing BOINC credit calculation system and proposes a new credit unit, MalikCredit, based on a more consistent synthetic lightweight benchmark, MalikStone, which is specifically designed for the dynamic challenges of the P2P paradigm. The results show the proposed system to be markedly more consistent than the BOINC credit calculation system.
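
For illustration only, a benchmark-based credit unit of this general kind amounts to some scaling of a sustained benchmark rate multiplied by the CPU time contributed. The sketch below is a hypothetical formula, not the actual BOINC or MalikCredit calculation.

```python
def credits_awarded(benchmark_flops, cpu_seconds, scale=1e-9):
    """Hypothetical credit formula: work done = sustained benchmark rate
    (floating-point ops per second) x CPU time contributed, scaled down to a
    unit credit. Consistency of the benchmark directly determines fairness."""
    return benchmark_flops * cpu_seconds * scale
```

The paper's point is precisely that if `benchmark_flops` varies wildly between benchmarks (as with Dhrystone vs Whetstone), the credits awarded for identical work diverge.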

Ligang He, Deqing Zou, Zhang Zhang, Chao Chen, Hai Jin, Stephen A. Jarvis (2014) Developing resource consolidation frameworks for moldable virtual machines in clouds, In: Future generation computer systems, 32(1), pp. 69-81, Elsevier

This paper considers the scenario in which multiple clusters of Virtual Machines (termed Virtual Clusters) are hosted in a Cloud system consisting of a cluster of physical nodes. Multiple Virtual Clusters (VCs) cohabit in the physical cluster, with each VC offering a particular type of service for incoming requests. In this context, VM consolidation, which strives to use a minimal number of nodes to accommodate all VMs in the system, plays an important role in saving resource consumption. Most existing consolidation methods proposed in the literature regard VMs as "rigid" during consolidation, i.e., the VMs' resource capacities remain unchanged. In VC environments, QoS is usually delivered by a VC as a single entity; there is therefore no reason why VMs' resource capacities cannot be adjusted as long as the whole VC is still able to maintain the desired QoS. Treating VMs as "moldable" during consolidation may allow VMs to be packed onto even fewer nodes. This paper investigates this issue and develops a Genetic Algorithm (GA) to consolidate moldable VMs. The GA is able to evolve an optimized system state, which represents the VM-to-node mapping and the resource capacity allocated to each VM. After the new system state is calculated by the GA, the Cloud transitions from the current system state to the new one. The transition time represents overhead and should be minimized. In this paper, a cost model is formalized to capture the transition overhead, and a reconfiguration algorithm is developed to transition the Cloud to the optimized system state with low overhead. Experiments have been conducted to evaluate the performance of the GA and the reconfiguration algorithm.
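
As an illustrative sketch of GA-based consolidation (deliberately simplified to rigid VMs and mutation-only evolution, so not the paper's algorithm), the following evolves VM-to-node mappings that minimise the number of active nodes subject to a per-node capacity; all names are hypothetical.

```python
import random

def nodes_used(mapping):
    return len(set(mapping))

def feasible(mapping, demand, capacity):
    # Sum each node's load and check it against the per-node capacity.
    load = {}
    for vm, node in enumerate(mapping):
        load[node] = load.get(node, 0) + demand[vm]
    return all(l <= capacity for l in load.values())

def consolidate(demand, capacity, n_nodes, generations=200, pop_size=20, seed=0):
    """Toy GA over VM->node mappings: fitness is the number of active nodes,
    with infeasible (over-capacity) mappings penalised to infinity."""
    rng = random.Random(seed)
    def fitness(m):
        return nodes_used(m) if feasible(m, demand, capacity) else float('inf')
    pop = [[rng.randrange(n_nodes) for _ in demand] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(len(child))] = rng.randrange(n_nodes)  # mutate
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)
```

The moldable variant in the paper additionally evolves each VM's resource capacity, which enlarges the search space but enables tighter packing.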

Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Graham R. Nudd (2004) Performance-based dynamic scheduling of hybrid real-time applications on a cluster of heterogeneous workstations, In: Lecture notes in computer science, 2790, pp. 195-200
M. Al-Ghamdi, A. P. Chester, S. A. Jarvis (2010) Predictive and dynamic resource allocation for enterprise applications, In: Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010, 5578546, pp. 2776-2783
Gihan R. Mudalige, Istvan Z. Reguly, Arun Prabhakar, Dario Amirante, Leigh Lapworth, Stephen A. Jarvis (2022) Towards Virtual Certification of Gas Turbine Engines With Performance-Portable Simulations, In: 2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), pp. 206-217, IEEE

We present the large-scale computational fluid dynamics (CFD) simulation of a full gas-turbine engine compressor, demonstrating capability towards overcoming current limitations for the virtual certification of aero-engine designs. The simulation is carried out through a performance-portable code-base on multi-core/many-core HPC clusters with a CFD-to-CFD coupled execution, combining an industrial CFD solver linked using custom coupler software. The application innovates in its design for performance portability through the OP2 domain-specific library for the CFD components, allowing the automatic generation of highly optimized platform-specific parallelizations for both multi-core (CPU) and many-core (GPU) clusters from a single high-level source. The code is used for the simulation of a 4.58B-node, full-annulus, 10-row production-grade test compressor (DLR's Rig250), using a coupled sliding-plane setup on the ARCHER2 and Cirrus supercomputers at EPCC. The OP2-generated parallelizations, together with optimized coupler configurations in heterogeneous/hybrid settings, achieve, for the first time, execution of one revolution in less than 6 hours on 512 nodes of ARCHER2 (65k cores), with a parallel scaling efficiency of over 80% compared to a 107-node run. Results indicate a speedup of the CFD suite by an order of magnitude (approximately 30x) relative to current production capability. Benchmarking and performance modelling project a time-to-solution of less than 5 hours on a cluster of 488 NVIDIA V100 GPUs, a 3x-4x speedup over CPU clusters. The work demonstrates a step-change towards achieving virtual certification of aircraft engines with the requisite fidelity and a tractable time-to-solution that was previously out of reach under production settings.
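
As a side note, strong-scaling efficiency of the kind quoted above is conventionally the ratio of node-time products between a reference run and the scaled run; a one-line sketch (illustrative numbers only, not the paper's data):

```python
def scaling_efficiency(n_ref, t_ref, n, t):
    """Strong-scaling efficiency of a run on n nodes taking time t, relative
    to a reference run on n_ref nodes taking time t_ref. Perfect scaling
    keeps the node-time product constant, giving an efficiency of 1.0."""
    return (n_ref * t_ref) / (n * t)

# e.g. a 4-node run finishing in 30 time units versus a 1-node run in 100
# gives 100/120, i.e. roughly 83% efficiency
```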