Simulating Data Access Profiles of Computational Jobs in Data Grids

Abstract

The data access patterns of applications running in computing grids are changing due to the recent proliferation of high-speed local and wide area networks. Data-intensive jobs are no longer strictly required to run at the computing sites where their input data are located. Instead, jobs may access data through arbitrary combinations of data placement, stage-in and remote data access. These data access profiles exhibit partially non-overlapping throughput bottlenecks, which can be exploited to minimize the time jobs spend waiting for input data. In this work we present a novel grid computing simulator that places heavy emphasis on the various data access profiles. Its purpose is to enable reproducible performance studies of data access patterns. The fundamental assumptions underlying the simulator are justified by empirical experiments performed in the Worldwide LHC Computing Grid (WLCG) at CERN. We demonstrate how to calibrate the simulator parameters against the real system using posterior inference with likelihood-free Markov Chain Monte Carlo. We then validate the simulator's output against authentic production workloads from WLCG, demonstrating its remarkable accuracy.

Authors
  • Begy, Volodimir
  • Hermans, Joeri
  • Barisits, Martin
  • Lassnig, Mario
  • Schikuta, Erich
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
15th IEEE International Conference on eScience 2019
Divisions
Workflow Systems and Technology
Subjects
Databases
Computer simulation
Event Location
San Diego, USA
Event Type
Conference
Event Dates
September 24 – 27, 2019
Date
24 September 2019