Data Allocation with Neural Similarity Estimation for Data-Intensive Computing

Abstract

Science collaborations such as ATLAS at the high-energy particle accelerator at CERN use a computer grid to run expensive computational tasks on massive, distributed data sets. Handling big data on a grid demands workload management and data allocation to maintain a continuous workflow. Data allocation in a computer grid requires a data placement policy that is conditioned on the resources of the system and the usage of data; such a policy decides at which locations subsets of data are to be placed. Automatic and manual data policies aim, in part, to achieve a short time-to-result, and there are ongoing efforts to improve them. Data placement is vital to coping with the increasing amount of data processing in different data centers. This paper presents a novel approach to the bottleneck caused by wide-area file transfers between data centers and by large, distributed data sets with high dimensionality. The model estimates similar data with a neural network on sparse and uncertain observations and then proceeds with the allocation process. The allocation process uses evolutionary data allocation to find near-optimal solutions and improves network transfers by over 5% for the given data centers.

Authors
  • Vamosi, Ralf
  • Schikuta, Erich
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
ICCS 2022: International Conference on Computational Science
Divisions
Workflow Systems and Technology
Subjects
Data processing management
Databases
Data storage
Event Location
London, United Kingdom
Event Type
Conference
Event Dates
21-23 June, 2022
Date
2022