Exploring the performance of fine-grained synchronization and data exchange across process boundaries on modern multi-core architectures

Abstract

Whether to use multiple threads in one process (MPI+X) or multiple processes (pure MPI) has long been an important question in HPC. Techniques like in situ analysis and visualization further complicate matters, as it may be very difficult to couple the different components in a way that allows them to run in the same process. Combined with the growing interest in task-based programming models, which often rely on fine-grained tasks and synchronization, a question arises: is it possible to run two tightly coupled task-based applications efficiently in two separate processes, or do they have to be combined into one application? Through a range of experiments on the latest Intel Xeon Scalable (Skylake) and AMD EPYC (Zen) many-core architectures, we compared the performance of fine-grained synchronization and data exchange between threads in the same process with that between threads in two different processes. Our experiments show that although there may be a small price to pay for having two processes, it is still possible to achieve very good performance. The key factors are utilizing shared memory, selecting the right thread affinity, and carefully selecting the way the processes are synchronized.

Authors
  • Dokulil, Jiri
  • Benkner, Siegfried
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Poster)
Event Title
International Conference on Computational Science (ICCS 2019)
Divisions
Scientific Computing
Subjects
Parallel Data Processing
Event Location
Faro, Portugal
Event Type
Conference
Event Dates
12-14 Jun 2019
Date
June 2019