Cache-oblivious High-performance Similarity Join

Abstract

A similarity join combines pairs of vectors that satisfy a distance condition. Typically, such algorithms apply a filter step (by indexing or sorting) and then refine the remaining candidate pairs. In this paper, we propose to refine the pairs in an order defined by a space-filling curve, which dramatically improves data locality. Modern multi-core microprocessors are supported by a deep memory hierarchy including RAM, various levels of cache, and registers. The space-filling curve makes our proposed algorithm cache-oblivious, so that it fully exploits the memory hierarchy and approaches the peak performance of a multi-core processor. Our novel space-filling curve, called Fast General Form (FGF) Hilbert, overcomes a number of limitations of well-known approaches: it is non-recursive, it is not restricted to traversing squares, and it has constant time and space complexity. Since we demonstrate how easily conventional loops can be transformed into cache-oblivious ones, we believe that many algorithms for complex joins and other database operators could be transformed systematically into cache-oblivious SIMD and MIMD parallel algorithms.
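The loop transformation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's FGF-Hilbert curve: it assumes the classic recursive-free index-to-coordinate mapping of the standard Hilbert curve on a power-of-two square, and it omits the filter step, SIMD, and multi-threading. The function names `hilbert_d2xy`, `join_nested`, and `join_hilbert` are hypothetical and chosen only for this illustration. Both joins return the same pairs; only the order in which the pairs (i, j) are refined differs.

```python
import math


def hilbert_d2xy(order, d):
    """Map position d on a standard Hilbert curve covering a
    2^order x 2^order grid to cell coordinates (x, y).
    Assumption: classic Hilbert curve, not the paper's FGF-Hilbert."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def join_nested(points, eps):
    """Conventional nested-loop similarity join (row-by-row pair order)."""
    result = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                result.add((i, j))
    return result


def join_hilbert(points, eps):
    """Same join, but pairs (i, j) are visited along a Hilbert curve
    over the index grid, so consecutive pairs touch nearby indices."""
    n = len(points)
    order = max(1, (n - 1).bit_length())  # smallest power-of-two grid >= n
    side = 1 << order
    result = set()
    for d in range(side * side):
        i, j = hilbert_d2xy(order, d)
        # Skip padding cells and the redundant lower triangle / diagonal.
        if i < j < n and math.dist(points[i], points[j]) <= eps:
            result.add((i, j))
    return result
```

Calling `join_hilbert(points, eps)` on a list of equal-length coordinate tuples yields the same set of index pairs as `join_nested(points, eps)`. The sketch pads the index grid to a power-of-two square and skips the unused cells, which is exactly the kind of restriction the abstract says FGF-Hilbert avoids.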

Authors
  • Perdacher, Martin
  • Plant, Claudia
  • Böhm, Christian
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019
Divisions
Data Mining and Machine Learning
Subjects
Databases
Parallel Data Processing
Event Location
Amsterdam
Event Type
Conference
Event Dates
June 30 - July 5, 2019
Series Name
Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019
ISSN/ISBN
978-1-4503-5643-5
Page Range
pp. 87-104
Date
2019
Official URL
http://dx.doi.org/10.1145/3299869.3319859