Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition
The LU decomposition is a building block of many linear algebra applications and is used in LINPACK to benchmark the performance of modern multi-core processor environments. These processors offer a deep memory hierarchy comprising registers and several levels of cache. Registers and the L1 data cache are small but very fast. The L2 and L3 caches are larger but slower, and are usually shared among cores. For the LU decomposition, the latency of fetching data from main memory into registers to perform a calculation depends on the memory access pattern over the input matrix. Here, we look at the block factorization algorithm, in which the performance of the LU decomposition depends on the performance of the matrix multiplication. In both cases, the LU decomposition and the matrix multiplication, the matrix is traversed by three nested loops. In this paper, we propose to traverse these loops in an order defined by a space-filling curve. This traversal dramatically improves data locality and exploits the memory hierarchy effectively. Besides the canonical (line-by-line) access pattern, we demonstrate traversal in Hilbert, Peano, and Morton order. Our extensive experiments show that the Morton order (or Z-order) and the inverse Morton order (or И-order) achieve better runtime performance than the others.
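As a rough illustration of the Morton (Z-order) traversal mentioned in the abstract, the following Python sketch enumerates the cells of a square grid in Z-order and uses that order for the two outer loops of a naive matrix multiplication. It is not the authors' implementation: the function names (`deinterleave`, `morton_decode`, `morton_order`, `matmul_morton`) are hypothetical, the bit convention (column index in the even bits) is an assumption, and only the (row, column) pairs are reordered, whereas the paper applies space-filling curves to the full three-loop nest.

```python
from __future__ import annotations


def deinterleave(z: int) -> int:
    """Collect the bits at even positions of z into a compact integer."""
    result = 0
    bit = 0
    while z:
        result |= (z & 1) << bit
        z >>= 2
        bit += 1
    return result


def morton_decode(z: int) -> tuple[int, int]:
    """Map a Z-order index to (row, col).

    Assumed convention: column bits sit at even positions, row bits at odd ones.
    """
    return deinterleave(z >> 1), deinterleave(z)


def morton_order(n: int) -> list[tuple[int, int]]:
    """Return all (row, col) pairs of an n x n grid in Z-order (n a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    return [morton_decode(z) for z in range(n * n)]


def matmul_morton(a, b):
    """Naive product of two n x n matrices, visiting output cells in Z-order."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i, j in morton_order(n):
        # The innermost loop over k stays canonical in this sketch.
        c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
    return c
```

For a 2 x 2 grid, `morton_order(2)` visits the cells in the Z shape (0, 0), (0, 1), (1, 0), (1, 1). Swapping the two values returned by `morton_decode` would give the transposed visiting pattern. In pure Python the reordering only demonstrates the index pattern; any cache benefit would have to be measured in a compiled implementation.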
- Perdacher, Martin
- Plant, Claudia
- Böhm, Christian
Category | Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title | IEEE International Conference on Big Data (Big Data)
Divisions | Data Mining and Machine Learning
Subjects | Applied Computer Science, Miscellaneous, Parallel Data Processing, Computer Architecture
Event Location | Atlanta, Georgia, USA
Event Type | Conference
Event Dates | 10-13 December 2020
Date | 14 December 2020