Optimization of Data Remapping in Data-Parallel Languages
The user-controlled mapping of data across the local memories of processing nodes is one of the central features of data-parallel languages such as High Performance Fortran (HPF). Since many scientific applications consist of different computational phases, each with its own best data mapping, dynamic remappings have proven useful in maintaining good data locality and workload balance. HPF supports remappings through procedure calls and through the execution of redistribute/realign directives. Remappings can be quite expensive, however: communication is required to migrate the array elements to their new owning processors, which can significantly degrade a program's performance. Hence, eliminating unnecessary remappings is of key importance, all the more so because even well-written HPF programs may give rise to them. In this thesis we reduce the overall time spent on dynamic data remappings in a program run by reducing the number of executed remappings. Eliminating redundant and dead remappings does not suffice; code motion is required as well to eliminate further remappings. Our approach eliminates unnecessary remappings by combining partial dead code elimination and partial redundancy elimination. In this way we succeed in sinking and hoisting remappings out of loops and in eliminating them on straight-line code as well. For both partial dead code elimination and partial redundancy elimination we present a bidirectional and a unidirectional approach and discuss the differences. It has been shown that partial dead code elimination and partial redundancy elimination can each be solved optimally when applied separately. Here we show that, in general, an optimal solution no longer exists if both algorithms are combined. Furthermore, it turns out that a straightforward interprocedural adaptation of the intraprocedural approach fails.
For the intraprocedural as well as the interprocedural approach we arrive at a uniform framework that can be applied to both ordinary and mapping assignments. The framework provides a hierarchy of algorithms of varying power and efficiency, supporting user-customized solutions. The power and flexibility of our approach are demonstrated by illustrative examples.
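To make the notions of dead and redundant remappings concrete, the following is a minimal Python sketch restricted to straight-line code. It is not the algorithm of the thesis, which additionally handles partially dead and partially redundant remappings by code motion on control flow. The operation encoding, the function name `eliminate_remappings`, and the assumption that every array is live at the end of the sequence are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation (not taken from the thesis):
#   ("remap", array, mapping)  changes the mapping of an array
#   ("use",   array, None)     a computation that references the array
Op = Tuple[str, str, Optional[str]]


def eliminate_remappings(ops: List[Op], initial: Dict[str, str]) -> List[Op]:
    """Remove dead, then redundant, remappings from straight-line code."""
    # Backward pass: a remapping is dead if the same array is remapped
    # again before it is used. Arrays are conservatively assumed to be
    # live at the end of the sequence, so a trailing remapping is kept.
    overwritten = set()
    kept_reversed: List[Op] = []
    for kind, array, mapping in reversed(ops):
        if kind == "remap":
            if array in overwritten:
                continue  # dead: overwritten before any use
            overwritten.add(array)
        else:
            overwritten.discard(array)
        kept_reversed.append((kind, array, mapping))

    # Forward pass: a remapping is redundant if the array already has
    # the requested mapping when the remapping is reached.
    current = dict(initial)
    result: List[Op] = []
    for kind, array, mapping in reversed(kept_reversed):
        if kind == "remap":
            if current.get(array) == mapping:
                continue  # redundant: mapping is already in place
            current[array] = mapping
        result.append((kind, array, mapping))
    return result


example_ops: List[Op] = [
    ("remap", "A", "CYCLIC"),  # dead: overwritten before any use
    ("remap", "A", "BLOCK"),   # redundant once the dead remap is gone
    ("use", "A", None),
    ("remap", "A", "BLOCK"),   # redundant: A is already BLOCK
    ("use", "A", None),
    ("remap", "A", "CYCLIC"),  # needed: changes the mapping before a use
    ("use", "A", None),
]
optimized = eliminate_remappings(example_ops, {"A": "BLOCK"})
```

With array `A` initially mapped as `BLOCK`, only the final remapping to `CYCLIC` survives, so one of the four remappings in `example_ops` is still executed. In this example the order of the two passes matters: the second remapping becomes redundant only after the dead one before it has been removed.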
Mehofer, Eduard
Category: Technical Report
Divisions: Scientific Computing
Publisher: PhD thesis, Institute for Software Technology and Parallel Systems, University of Vienna
Date: November 1998
Official URL: http://www.par.univie.ac.at/publications/download/...