Data Allocation Based on Evolutionary Data Popularity Clustering

Abstract

This study is motivated by the high-energy physics experiment ATLAS, one of the four major experiments at the Large Hadron Collider at CERN. It comprises 130 data centers worldwide with datasets in the petabyte range. When data is processed across the grid, transfer delays and the resulting performance loss become an issue. The two major costs are the waiting time until input data is ready and the job computation time. In the ATLAS workflows, the input to computational jobs is based on grouped datasets. The waiting time stems mainly from WAN transfers between data centers when job properties require execution at one data center but the dataset is distributed among other data centers. Our novel data allocation algorithm redistributes the constituent files of datasets such that job efficiency increases in terms of the cost metric. We propose an evolutionary algorithm that addresses the data allocation problem in a network based on data popularity and clustering. We use the job's file transfers as the main metric and show that we can gradually improve job waiting times through faster input data readiness.
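
The idea of counting a job's file transfers as the cost and improving an allocation step by step can be illustrated with a toy sketch. It is not the algorithm from the paper: it is a plain (1+1) evolutionary loop that omits the popularity clustering, and all names (`remote_transfers`, `evolve`, the site labels and file names) are hypothetical. A job is assumed to be a pair of its execution site and its input files, and a file counts as one transfer whenever it is stored at a different site.

```python
import random


def remote_transfers(allocation, jobs):
    """Count input files that jobs must fetch from a site other than their own."""
    return sum(
        1
        for site, files in jobs
        for f in files
        if allocation[f] != site
    )


def evolve(allocation, jobs, sites, generations=200, seed=0):
    """Toy (1+1) evolutionary loop: move one file, keep the move if cost does not rise."""
    rng = random.Random(seed)
    best = dict(allocation)          # work on a copy, leave the input untouched
    best_cost = remote_transfers(best, jobs)
    files = sorted(best)
    for _ in range(generations):
        candidate = dict(best)
        # Mutation: reassign one randomly chosen file to a randomly chosen site.
        candidate[rng.choice(files)] = rng.choice(sites)
        cost = remote_transfers(candidate, jobs)
        # Selection: accept the candidate only if it is at least as good.
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical workload: two jobs at site A read f1 and f2, one job at site B reads f3.
jobs = [("A", ["f1", "f2"]), ("A", ["f1", "f2"]), ("B", ["f3"])]
allocation = {"f1": "B", "f2": "A", "f3": "A"}

initial_cost = remote_transfers(allocation, jobs)
best, best_cost = evolve(allocation, jobs, sites=["A", "B"])
print(initial_cost, best_cost)
```

Because the job list repeats frequently requested files, popular files weigh more in the cost, which is a crude stand-in for data popularity. The sketch also ignores storage capacity, transfer sizes, and network topology.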

Authors
  • Vamosi, Ralf
  • Lassnig, Mario
  • Schikuta, Erich
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
18th Annual International Conference on Computational Science (ICCS 2018)
Divisions
Workflow Systems and Technology
Subjects
Data processing management
Databases
Data storage
Event Location
Wuxi, China
Event Type
Conference
Event Dates
11-13 June 2018
Series Name
Lecture Notes in Computer Science
ISSN/ISBN
0302-9743/978-3-319-93697-0
Page Range
pp. 153-166
Date
2018