Exploring the performance of fine-grained synchronization and data exchange across process boundaries on modern multi-core architectures

Abstract

Whether to use multiple threads in one process (MPI+X) or multiple processes (pure MPI) has long been an important question in HPC. Techniques such as in situ analysis and visualization further complicate matters, because it can be very difficult to couple the different components so that they run in the same process. Given the growing interest in task-based programming models, which often rely on fine-grained tasks and synchronization, a question arises: can two tightly coupled task-based applications run efficiently in two separate processes, or do they have to be combined into one application? Through a range of experiments on the latest Intel Xeon Scalable (Skylake) and AMD EPYC (Zen) many-core architectures, we compared the performance of fine-grained synchronization and data exchange between threads in the same process and between threads in two different processes. Our experiments show that although using two processes may carry a small performance penalty, very good performance is still achievable. The key factors are using shared memory, choosing the right thread affinity, and carefully selecting how the processes are synchronized.
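
To make the idea of fine-grained synchronization across a process boundary concrete, here is a minimal sketch. It is not the authors' benchmark code. Two processes share an anonymous memory mapping and pass a token back and forth through a C11 atomic flag, exchanging a small payload on each handoff. It assumes a POSIX system (for example Linux) on which `atomic_int` is lock-free and therefore usable from shared memory; the names `channel`, `wait_for`, `hand_over`, and `ping_pong` are hypothetical, and thread affinity is not set.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <sched.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical channel placed in memory shared by parent and child. */
struct channel {
    atomic_int turn; /* whose turn it is: 0 = parent, 1 = child */
    long payload;    /* data handed across the process boundary */
};

/* Spin until it is our turn; sched_yield() keeps this progressing
   even if both processes end up on the same core. */
static void wait_for(struct channel *ch, int me) {
    while (atomic_load_explicit(&ch->turn, memory_order_acquire) != me)
        sched_yield();
}

/* Publish our writes to payload and pass the token to the other side. */
static void hand_over(struct channel *ch, int other) {
    atomic_store_explicit(&ch->turn, other, memory_order_release);
}

/* Runs `rounds` ping-pong exchanges between a parent and a forked child.
   Each side increments the payload once per round, so the result is
   2 * rounds. Returns -1 if mmap or fork fails. */
long ping_pong(int rounds) {
    struct channel *ch = mmap(NULL, sizeof *ch, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ch == MAP_FAILED)
        return -1;
    atomic_init(&ch->turn, 0);
    ch->payload = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(ch, sizeof *ch);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait for the token, update the payload, hand it back. */
        for (int i = 0; i < rounds; i++) {
            wait_for(ch, 1);
            ch->payload += 1;
            hand_over(ch, 0);
        }
        _exit(0);
    }

    /* Parent: same protocol with the roles swapped. */
    for (int i = 0; i < rounds; i++) {
        wait_for(ch, 0);
        ch->payload += 1;
        hand_over(ch, 1);
    }
    wait_for(ch, 0); /* wait for the child's final reply */
    long result = ch->payload;
    waitpid(pid, NULL, 0);
    munmap(ch, sizeof *ch);
    return result;
}
```

For example, `ping_pong(1000)` returns 2000. Yield-based spinning is only one design choice. Pure spinning, blocking primitives such as process-shared mutexes, and pinning each process to a chosen core (for example with `sched_setaffinity` on Linux) trade latency against CPU usage differently, which corresponds to the synchronization and affinity choices the abstract identifies as key factors.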

Authors

Dokulil, Jiri
Benkner, Siegfried

Projects

Dynamic Runtime Systems For Future Parallel Architectures

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Poster)
Event Title	International Conference on Computational Science (ICCS 2019)
Divisions	Scientific Computing
Subjects	Parallel Data Processing
Event Location	Faro, Portugal
Event Type	Conference
Event Dates	12-14 Jun 2019
Date	June 2019