Data Allocation Based on Evolutionary Data Popularity Clustering
This study is motivated by the high-energy physics experiment ATLAS, one of the four major experiments at the Large Hadron Collider at CERN. It comprises 130 data centers worldwide with datasets in the petabyte range. When data is processed across the grid, transfer delays and the resulting performance loss have become an issue. The two major costs are the waiting time until input data is ready and the job computation time. In the ATLAS workflows, the input to computational jobs is based on grouped datasets. The waiting time stems mainly from WAN transfers between data centers, which occur when job properties require execution at one data center but the dataset is distributed among other data centers. Our novel data allocation algorithm redistributes the constituent files of datasets such that job efficiency increases in terms of the cost metric. We propose an evolutionary algorithm that addresses the data allocation problem in a network based on data popularity and clustering. We use the jobs' file transfers as the main metric and show that we can gradually reduce job waiting times by making input data ready sooner.
- Vamosi, Ralf
- Lassnig, Mario
- Schikuta, Erich
Category |
Paper in Conference Proceedings or in Workshop Proceedings (Paper) |
Event Title |
18th Annual International Conference on Computational Science (ICCS 2018) |
Divisions |
Workflow Systems and Technology |
Subjects |
Data processing management, Databases, Data storage |
Event Location |
Wuxi, China |
Event Type |
Conference |
Event Dates |
11/06/18 - 13/06/18 |
Series Name |
Lecture Notes in Computer Science |
ISSN/ISBN |
0302-9743/978-3-319-93697-0 |
Page Range |
pp. 153-166 |
Date |
2018 |