Learning Spatiotemporal Failure Dependencies for Resilient Edge Computing Services
Edge computing services are exposed to infrastructural failures due to geographical dispersion, ad hoc deployment, and rudimentary support systems. Two unique characteristics of the edge computing paradigm necessitate a novel failure resilience approach. First, edge servers, unlike their cloud counterparts with reliable data center networks, are typically connected via ad hoc networks. Thus, link failures need more attention to ensure truly resilient services. Second, network delay is a critical factor for the deployment of edge computing services. This restricts replication decisions to geographical proximity and necessitates joint consideration of delay and resilience. In this article, we propose a novel machine-learning-based mechanism that evaluates the failure resilience of a service deployed redundantly on the edge infrastructure. Our approach learns the spatiotemporal dependencies between edge server failures and combines them with topological information to incorporate link failures. Ultimately, we infer the probability that a certain set of servers fails or disconnects concurrently during service runtime. Furthermore, we introduce Dependency- and Topology-aware Failure Resilience (DTFR), a two-stage scheduler that minimizes either failure probability or redundancy cost while maintaining low network delay. Extensive evaluation with various real-world failure traces and workload configurations demonstrates superior performance in terms of availability, number of failures, network delay, and cost with respect to state-of-the-art schedulers.
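To make the quantity in the abstract concrete, the following is a minimal sketch of how one might compute the probability that every replica of a service is either failed or disconnected from a client at the same time. It is not the authors' method: it assumes independent server and link failures with fixed probabilities, whereas the article learns spatiotemporal dependencies between failures. The function name `service_failure_probability`, the toy topology, and all probabilities are hypothetical.

```python
from itertools import product


def service_failure_probability(client, replicas, server_fail, link_fail):
    """Exact probability that no replica is both alive and reachable from client.

    server_fail: {node: failure probability} for every node (client included)
    link_fail:   {(u, v): failure probability} for undirected links
    ASSUMPTION: all failures are independent (a simplification; the article
    instead learns dependencies between server failures).
    """
    nodes = list(server_fail)
    links = list(link_fail)
    total = 0.0
    # Enumerate every joint up/down state; feasible only for small topologies.
    for state in product([False, True], repeat=len(nodes) + len(links)):
        node_down = dict(zip(nodes, state[:len(nodes)]))
        link_down = dict(zip(links, state[len(nodes):]))
        # Probability of this joint state under independence.
        p = 1.0
        for n in nodes:
            p *= server_fail[n] if node_down[n] else 1.0 - server_fail[n]
        for l in links:
            p *= link_fail[l] if link_down[l] else 1.0 - link_fail[l]
        if p == 0.0:
            continue
        # Build the surviving graph and search it from the client.
        up = {n for n in nodes if not node_down[n]}
        adj = {n: set() for n in up}
        for (u, v) in links:
            if not link_down[(u, v)] and u in up and v in up:
                adj[u].add(v)
                adj[v].add(u)
        seen = set()
        frontier = [client] if client in up else []
        while frontier:
            n = frontier.pop()
            if n in seen:
                continue
            seen.add(n)
            frontier.extend(adj[n] - seen)
        # The service fails when no replica is reachable in this state.
        if not any(r in seen for r in replicas):
            total += p
    return total


# Hypothetical toy topology: client "c" linked directly to servers "a" and "b".
server_fail = {"c": 0.0, "a": 0.1, "b": 0.1}
link_fail = {("c", "a"): 0.2, ("c", "b"): 0.2}
one_replica = service_failure_probability("c", ["a"], server_fail, link_fail)
two_replicas = service_failure_probability("c", ["a", "b"], server_fail, link_fail)
```

In this toy case a single replica is unavailable with probability 1 - 0.9 * 0.8 = 0.28, and two replicas on disjoint paths are both unavailable with probability 0.28^2 = 0.0784. Under correlated failures or shared links, the product rule no longer holds, which is the situation the article's dependency- and topology-aware model targets.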
- Aral, Atakan
- Brandić, Ivona
Category | Journal Paper
Divisions | Scientific Computing
Subjects | Data Processing Management, Artificial Intelligence, Parallel Data Processing, System Architecture, General
Journal or Publication Title | IEEE Transactions on Parallel and Distributed Systems (TPDS)
ISSN | 1045-9219
Publisher | IEEE
Page Range | pp. 1578-1590
Number | 7
Volume | 32
Date | 22 December 2020