Cache-oblivious High-performance Similarity Join
A similarity join combines vectors based on a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this paper, we propose to refine the pairs in an order defined by a space-filling curve, which dramatically improves data locality. Modern multi-core microprocessors are supported by a deep memory hierarchy including RAM, various levels of cache, and registers. The space-filling curve makes our proposed algorithm cache-oblivious, so that it fully exploits the memory hierarchy and can reach the possible peak performance of a multi-core processor. Our novel space-filling curve, called Fast General Form (FGF) Hilbert, solves a number of limitations of well-known approaches: it is non-recursive, it is not restricted to traversing squares, and it has constant time and space complexity. As we demonstrate the easy transformation from conventional into cache-oblivious loops, we believe that many algorithms for complex joins and other database operators could be transformed systematically into cache-oblivious SIMD and MIMD parallel algorithms.
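To illustrate the loop transformation described in the abstract, here is a minimal sketch that refines candidate pairs (i, j) in the order of a space-filling curve instead of row by row. It is not the paper's FGF Hilbert curve: it uses the textbook Hilbert index-to-coordinate mapping, which is restricted to squares whose side is a power of two, so the pair grid is padded and the padding cells are skipped. The names `hilbert_d2xy` and `similarity_join` are hypothetical and chosen only for this example, and the filter step (indexing or sorting) is omitted.

```python
import math


def hilbert_d2xy(order, d):
    """Map position d on a standard Hilbert curve to a cell (x, y).

    The curve covers a square grid of side 2**order. This is the
    classic iterative mapping, not the FGF Hilbert curve of the paper.
    """
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def similarity_join(points, eps):
    """Return all index pairs (i, j), i < j, with distance <= eps.

    The conventional nested loops over i and j are replaced by one
    loop along the Hilbert curve, so consecutive iterations touch
    neighbouring rows and columns of the pair grid.
    """
    n = len(points)
    # Assumption: pad the n x n pair grid to the next power-of-two side.
    order = max(1, (n - 1).bit_length())
    side = 1 << order
    result = []
    for d in range(side * side):
        i, j = hilbert_d2xy(order, d)
        if i < j < n:  # upper triangle only, padding cells skipped
            if math.dist(points[i], points[j]) <= eps:
                result.append((i, j))
    return result
```

Calling `similarity_join(points, eps)` on a list of equal-length coordinate tuples yields the same set of pairs as the nested-loop version, only in a different order. Padding to a power of two wastes iterations when the number of points is far from one, which is one of the limitations the abstract says FGF Hilbert removes.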
- Perdacher, Martin
- Plant, Claudia
- Böhm, Christian
Category |
Paper in Conference Proceedings or in Workshop Proceedings (Paper) |
Event Title |
Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019 |
Divisions |
Data Mining and Machine Learning |
Subjects |
Databases, Parallel Data Processing |
Event Location |
Amsterdam |
Event Type |
Conference |
Event Dates |
June 30 - July 5, 2019 |
Series Name |
Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019 |
ISSN/ISBN |
978-1-4503-5643-5 |
Page Range |
pp. 87-104 |
Date |
2019 |
Official URL |
http://dx.doi.org/10.1145/3299869.3319859 |