Exploring the performance of fine-grained synchronization and data exchange across process boundaries on modern multi-core architectures
Whether to use multiple threads in one process (MPI+X) or multiple processes (pure MPI) has long been an important question in HPC. Techniques like in situ analysis and visualization further complicate matters, as it may be very difficult to couple the different components in a way that would allow them to run in the same process. Given also the growing interest in task-based programming models, which often rely on fine-grained tasks and synchronization, the question arises: is it possible to run two tightly coupled task-based applications efficiently in two separate processes, or do they have to be combined into one application? Through a range of experiments on the latest Intel Xeon Scalable (Skylake) and AMD EPYC (Zen) many-core architectures, we compared the performance of fine-grained synchronization and data exchange between threads in the same process and between threads in two different processes. Our experiments show that although there may be a small price to pay for having two processes, it is still possible to achieve very good performance. The key factors are using shared memory, choosing the right thread affinity, and carefully choosing how the processes are synchronized.
- Dokulil, Jiri
- Benkner, Siegfried
Category: Paper in Conference Proceedings or in Workshop Proceedings (Poster)
Event Title: International Conference on Computational Science (ICCS 2019)
Divisions: Scientific Computing
Subjects: Parallel Data Processing
Event Location: Faro, Portugal
Event Type: Conference
Event Dates: 12-14 Jun 2019
Date: June 2019