University of Illinois Chicago
Browse

Scalable and Tunable Algorithms for Adaptive All-to-all Data Exchange

thesis
posted on 2025-08-01, 00:00 authored by Ke Fan
All-to-all data exchange, involving bidirectional data transfer among all participating processes, is a fundamental class of data movement in HPC systems. This contains two modalities: uniform (fixed-size) and non-uniform (variable-size) exchange patterns. The significance of these exchange protocols has grown considerably with the advent of exascale computing, as they serve as critical communication primitives for numerous large-scale applications. These include data-driven machine learning, graph mining, fast Fourier transform, quantum computer simulations, and certain advanced preconditioners and solvers. Implementing all-to-all in a manner that scales for different workload sizes, distribution patterns, process counts, and hardware configurations is very challenging. To facilitate scalability across various configurations, tunability emerges as the principal consideration. The ability to dynamically tune execution parameters for maximum bandwidth utilization while minimizing latency overhead across different input configurations is key to achieving scalability. This requires an analytical focus on two critical metrics: the total number of communication rounds (latency-relevant) and the total volume of data exchanged (bandwidth-relevant). Existing implementations typically rely on algorithms with extremal radix (log-base) values of 2 (latency minimization) and process count (bandwidth maximization). This limits the ability to dynamically optimize performance across the latency–bandwidth spectrum and significantly constrains adaptability to diverse demands. Additionally, current approaches illustrate a fundamental gap between theoretical algorithm design and practical hardware constraints, forfeiting performance gains available through multi-tiered parallelism in modern distributed systems. This thesis addresses these challenges by designing scalable and tunable algorithms for adaptive all-to-all data exchange, enabling optimal performance across various communication demands. To achieve it, we first formulate a mathematical model that characterizes communication patterns in uniform all-to-all through parameterized expressions quantifying total communication rounds and data exchanged. This model facilitates informed radix selection and effectively balances latency and bandwidth considerations. Subsequently, we develop a tunable radix-based algorithm for uniform all-to-all, which is then extended to non-uniform cases via a novel two-phase communication scheme. We then design hierarchical parameterized algorithms for both uniform and non-uniform all-to-all, strategically leveraging the multi-tiered parallelism of modern HPC architectures. Lastly, we integrate machine learning techniques to autonomously calibrate algorithmic parameters, enabling dynamic and adaptive all-to-all optimization.

History

Advisor

Sidharth Kumar

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Michael Papka Zhiling Lan Thomas Gilray Rajeev Thakur Venkatram Vishwanath

Thesis type

application/pdf

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC