Local Fast Failover Routing With Low Stretch

Local Fast Failover Routing With Low Stretch

Abstract

Network failures are frequent and disruptive, and can significantly reduce the throughput even in highly connected and regular networks such as datacenters. While many modern networks support some kind of local fast failover to quickly reroute flows encountering link failures to new paths, employing such mechanisms is known to be non-trivial, as conditional failover rules can only depend on local failure information. While over the last years, important insights have been gained on how to design failover schemes providing high resiliency, existing approaches have the shortcoming that the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce loads, which may cause the rerouted flows to interfere with existing flows and harm throughput. This paper presents the first deterministic local fast failover algorithms providing provable resiliency and failover route lengths, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for different network topologies, including multi-dimensional grids, hypercubes and Clos networks, as they are frequently deployed in the context of HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures

Grafik Top
Authors
  • Foerster, Klaus-Tycho
  • Pignolet, Yvonne-Anne
  • Schmid, Stefan
  • Tredan, Gilles
Grafik Top
Supplemental Material
Shortfacts
Category
Journal Paper
Divisions
Communication Technologies
Subjects
Informatik Allgemeines
Journal or Publication Title
ACM SIGCOMM Computer Communication Review (2018)
ISSN
0146-4833
Page Range
pp. 35-41
Number
1
Volume
48
Date
January 2018
Export
Grafik Top