Space-filling curves for improved cache-locality in shared memory environments
Today’s microprocessors consist of multiple cores, each of which can perform multiple additions, multiplications, or other operations simultaneously in one clock cycle. In shared-memory environments, at least two types of parallelism must be applied to reach the maximum performance of an algorithm: MIMD (Multiple Instruction Multiple Data), where each core simultaneously performs different operations on different input data streams, and SIMD (Single Instruction Multiple Data), where, within a core, the same operation is applied to several data elements at once. Additionally, modern microprocessors offer a rich memory hierarchy, including various levels of cache and registers. Some of these memories (like main memory and L3 cache) are big but slow and shared among all cores. Others (registers, cache lines, L1 cache) are fast and exclusively assigned to a single core, but small. Only if data access has high locality can we avoid excessive data transfers between the different levels of the memory hierarchy. Algorithms in linear algebra are often defined by three or more nested loops. In this thesis, we propose to traverse such loops in an order defined by a space-filling curve, such as the Hilbert or the Morton-order curve. The low-level kernels used in this work are based on Advanced Vector Extensions (AVX), which allow the exploitation of several levels of parallelism in shared-memory environments. We apply our space-filling curves in several algorithms, ranging from linear algebra (matrix multiplication, Cholesky decomposition, LU factorization) and clustering (K-means) to database queries (similarity join).
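To make the idea of traversing nested loops along a space-filling curve concrete, the following is a minimal sketch in plain Python, not the thesis implementation. It assumes the Morton (Z-order) curve, a square matrix whose tile count per side is a power of two, and scalar arithmetic in place of the AVX kernels mentioned above. All function names (`morton_decode`, `morton_order`, `matmul_morton`) are hypothetical and chosen for illustration only.

```python
def morton_decode(z):
    """Split a Morton index z into (i, j) by de-interleaving its bits.

    Even-numbered bits of z form j, odd-numbered bits form i.
    """
    i = j = 0
    bit = 0
    while z:
        j |= (z & 1) << bit
        i |= ((z >> 1) & 1) << bit
        z >>= 2
        bit += 1
    return i, j


def morton_order(n):
    """Yield every (i, j) of an n x n grid in Morton order (n a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    for z in range(n * n):
        yield morton_decode(z)


def matmul_morton(a, b, n, block):
    """Blocked n x n matrix product; output tiles are visited in Morton order.

    Assumes n is divisible by block and n // block is a power of two.
    """
    c = [[0.0] * n for _ in range(n)]
    tiles = n // block
    for ti, tj in morton_order(tiles):        # curve replaces the two outer loops
        for tk in range(tiles):               # reduction dimension stays linear
            for i in range(ti * block, (ti + 1) * block):
                for j in range(tj * block, (tj + 1) * block):
                    s = c[i][j]
                    for k in range(tk * block, (tk + 1) * block):
                        s += a[i][k] * b[k][j]
                    c[i][j] = s
    return c
```

Consecutive tiles on the curve tend to be close together in both `i` and `j`, so rows of `a` and columns of `b` loaded for one tile are more likely to still be in cache when a nearby tile is processed. A Hilbert-order traversal would follow the same pattern with a different index-to-coordinate mapping.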
Perdacher, Martin
Category | Thesis (PhD)
Divisions | Data Mining and Machine Learning
Subjects | Databases; Artificial Intelligence; Data Storage; Parallel Data Processing
Date | 21 January 2021
Official URL | http://othes.univie.ac.at/65584/