Optimization of Data Remapping in Data-Parallel Languages

Abstract

The user-controlled mapping of data across the local memories of processing nodes is one of the central features of data-parallel languages such as High Performance Fortran (HPF). Many scientific applications consist of several computational phases, each with its own best data mapping, so dynamic remappings have proven useful for maintaining good data locality and workload balance. HPF supports remappings through procedure calls and through executable REDISTRIBUTE/REALIGN directives. However, remappings can be quite expensive: communication is required to migrate array elements to their new owning processors, which can significantly degrade a program's performance. Eliminating unnecessary remappings is therefore of key importance, especially since even well-written HPF programs may perform unnecessary remappings.

In this thesis we optimize the overall time spent on dynamic data remappings during a program run by reducing the number of remappings executed. Eliminating redundant and dead remappings is not sufficient; code motion is also required to eliminate further remappings. Our approach eliminates unnecessary remappings by combining partial dead code elimination and partial redundancy elimination. In this way we succeed in sinking and hoisting remappings out of loops as well as eliminating them on straight-line code. For both partial dead code elimination and partial redundancy elimination we present a bidirectional and a unidirectional approach and discuss their differences. It has been shown that partial dead code elimination and partial redundancy elimination can each be solved optimally when applied separately. Here we show that, in general, an optimal solution no longer exists if both algorithms are combined. Furthermore, it turns out that a straightforward interprocedural adaptation of the intraprocedural approach fails.

For both the intraprocedural and the interprocedural approach we arrive at a uniform framework that applies to ordinary assignments as well as mapping assignments. The framework provides a hierarchy of algorithms of varying power and efficiency, supporting user-customized solutions. The power and flexibility of our approach are demonstrated by illustrative examples.
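
The abstract notes that eliminating redundant and dead remappings on straight-line code is a necessary first step, though not sufficient on its own. The following is a minimal sketch of just that step, not the thesis's combined framework: it performs no code motion and does not handle loops or branches. It assumes a hypothetical toy program representation (a list of `remap` and `use` statements, with names such as `BLOCK` and `CYCLIC` standing in for HPF distributions), and it assumes no array is live after the last statement.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy IR: ("remap", array, mapping) or ("use", array, None).
Stmt = Tuple[str, str, Optional[str]]


def remove_dead(prog: List[Stmt]) -> List[Stmt]:
    """Drop remappings whose new mapping is never observed: the next
    reference to the same array is another remap, or there is none.
    Assumption: no array is live after the last statement."""
    keep = []
    for i, (kind, arr, mapping) in enumerate(prog):
        if kind == "remap":
            nxt = next((k for k, a, _ in prog[i + 1:] if a == arr), None)
            if nxt != "use":
                continue  # dead: overwritten by a later remap or never used
        keep.append((kind, arr, mapping))
    return keep


def remove_redundant(prog: List[Stmt], initial: Dict[str, str]) -> List[Stmt]:
    """Drop remappings to the mapping the array already has (forward pass)."""
    current = dict(initial)
    keep = []
    for kind, arr, mapping in prog:
        if kind == "remap":
            if current.get(arr) == mapping:
                continue  # redundant: no data would move
            current[arr] = mapping
        keep.append((kind, arr, mapping))
    return keep


def eliminate(prog: List[Stmt], initial: Dict[str, str]) -> List[Stmt]:
    """Alternate both passes until a fixed point; each round only removes
    statements, so the loop terminates."""
    while True:
        new = remove_redundant(remove_dead(prog), initial)
        if new == prog:
            return new
        prog = new


prog: List[Stmt] = [
    ("remap", "A", "CYCLIC"),  # dead: A is remapped again before any use
    ("remap", "A", "BLOCK"),   # becomes redundant once the line above is gone
    ("use", "A", None),
    ("remap", "B", "CYCLIC"),
    ("use", "B", None),
    ("remap", "B", "CYCLIC"),  # redundant: B is already CYCLIC
    ("use", "B", None),
]
initial = {"A": "BLOCK", "B": "BLOCK"}
print(eliminate(prog, initial))
# → [('use', 'A', None), ('remap', 'B', 'CYCLIC'), ('use', 'B', None), ('use', 'B', None)]
```

Even in this toy setting the two eliminations interact: removing the dead `CYCLIC` remapping of `A` exposes the following `BLOCK` remapping as redundant, so the passes are repeated until nothing changes.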

Authors
  • Mehofer, Eduard
Shortfacts
Category
Technical Report
Divisions
Scientific Computing
Publisher
PhD thesis, Institute for Software Technology and Parallel Systems, University of Vienna
Date
November 1998
Official URL
http://www.par.univie.ac.at/publications/download/...