University of Illinois at Chicago

Configurable Algorithms for All-to-All Collectives

conference contribution
posted on 2024-07-08, 20:26 authored by Ke Fan, Steve Petruzza, Thomas Gilray, Sidharth Kumar
MPI_Alltoall is a commonly used collective in which every pair of processes exchanges a fixed-size data block. It can be implemented with a logarithmic number of point-to-point communication rounds, where the exact number of rounds and the total data exchanged among processes depend on the base (radix) of the logarithm. This paper presents a mathematical foundation for studying all communication patterns of the all-to-all collective by deriving parameterized formulas for the total number of communication rounds and the total data exchanged. The model is used to identify a radix of √P (where P is the process count) that effectively balances latency and bandwidth costs and yields optimal performance, as also confirmed by evaluation on the Theta and Polaris supercomputers at Argonne National Laboratory (ANL). We also present a novel two-layer tunable-radix algorithm that takes advantage of the shared-memory parallelism offered by modern systems. The algorithm decouples the communication rounds into two phases that can be optimized individually, so that shared memory and the high-speed interconnect are each exploited separately. For a fast Fourier transform (FFT) application, our approach demonstrates improvements of up to 3.8× on Theta and 4.2× on Polaris over the vendor-optimized, MPICH-based implementation of MPI_Alltoall.
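To make the radix trade-off concrete, the following is a minimal sketch that counts communication rounds and blocks sent per process for a radix-r Bruck-style all-to-all, plus a simple alpha-beta (Hockney) time estimate. The counting scheme, the function names `bruck_cost` and `alpha_beta_time`, and the cost parameters are illustrative assumptions modeled on the classic Bruck algorithm; they are not the paper's parameterized formulas.

```python
def digit(i: int, k: int, r: int) -> int:
    """Return the k-th base-r digit of i (k = 0 is the least significant)."""
    return (i // r**k) % r


def bruck_cost(P: int, r: int) -> tuple[int, int]:
    """Count (rounds, blocks sent per process) for an assumed radix-r
    Bruck-style all-to-all over P processes."""
    if P < 2 or not 2 <= r <= P:
        raise ValueError("need P >= 2 and 2 <= r <= P")
    # Number of base-r digits needed to index relative offsets 0..P-1.
    w = 0
    while r**w < P:
        w += 1
    rounds = 0
    blocks = 0
    for k in range(w):
        for z in range(1, r):
            # All blocks whose relative index has digit z at position k
            # are forwarded together in one point-to-point round.
            count = sum(1 for i in range(P) if digit(i, k, r) == z)
            if count > 0:
                rounds += 1
                blocks += count
    return rounds, blocks


def alpha_beta_time(P: int, r: int, alpha: float, beta: float, m: int) -> float:
    """Hockney-style estimate: alpha per round (latency), beta per byte
    (inverse bandwidth), m bytes per block. Parameters are hypothetical."""
    rounds, blocks = bruck_cost(P, r)
    return rounds * alpha + blocks * m * beta


if __name__ == "__main__":
    # For P = 16, compare radix 2, sqrt(P) = 4, and P = 16.
    for r in (2, 4, 16):
        print(r, bruck_cost(16, r))
```

With P = 16 this prints `2 (4, 32)`, `4 (6, 24)` and `16 (15, 15)`: radix 2 minimizes rounds (latency) but sends the most data, radix P sends the least data but needs P - 1 rounds, and radix √P = 4 sits between the two extremes, which is the kind of balance the abstract describes.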

Funding

Collaborative Research: SHF: Small: Scalable and Extensible I/O Runtime and Tools for Next Generation Adaptive Data Layouts | Funder: National Science Foundation | Grant ID: CCF-2401274

Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale | Funder: University of Alabama at Birmingham | Grant ID: 2316157

Citation

Fan, K., Petruzza, S., Gilray, T., & Kumar, S. (2024, May). Configurable Algorithms for All-to-All Collectives. In ISC High Performance 2024 Research Paper Proceedings (39th International Conference) (pp. 1-12). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.23919/isc.2024.10528936

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
