Classification Framework for the Parallel Hash Join with a Performance Analysis on the GPU
The hash join operator is one of the most important relational operators in database applications and a prominent research topic in the domain of parallel processing. However, to date, no consistent algorithm design guidelines for high-performance implementations on parallel platforms have been derived from the available experimental results. In this work we define a taxonomy of the parallel hash join operator landscape and categorize state-of-the-art research accordingly. Moreover, we implement and benchmark three taxonomy types: a sequential implementation on the CPU, a hybrid CPU-GPU implementation, and a fully parallel version on the GPU. The results show that (1) the hybrid CPU-GPU type outperforms the other two, showcasing the benefits of a good fit between algorithm type and hardware platform choice, (2) the poor end-to-end performance of the GPU-only type highlights the impact of GPU-specific synchronization and contention issues that appear with an unfit design choice, and (3) parallelization improves runtime by a factor of 2.2X in the end-to-end algorithm and by a factor of 83X in the join phase, and shows good scaling behavior with an increasing number of threads. This proves that the GPU is a valuable co-processor option for computation offloading in database applications. We anticipate this classification framework to be a starting point for design decisions for parallel big data hash join operators on other heterogeneous systems.
- Wozniak, Kinga Anna
- Schikuta, Erich
Category | Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title | 15th IEEE International Symposium on Parallel and Distributed Processing with Applications ISPA 2017
Divisions | Workflow Systems and Technology
Subjects | Databases; Parallel Data Processing
Event Location | Guangzhou, China
Event Type | Conference
Event Dates | December 12-15, 2017
Date | December 2017