Characterization of NCCL and Unified Memory Under Normal and Oversubscribed Memory Conditions
thesis
posted on 2024-08-01, 00:00 authored by Riccardo Strina
The NVIDIA Collective Communications Library (NCCL) is a multi-GPU communication library widely used in applications such as deep learning, molecular dynamics, linear algebra, and graph processing. NCCL aims to replace CUDA-Aware MPI by enabling direct communication between GPUs, thereby reducing latency compared to MPI’s GPU-to-CPU communication, which is a significant bottleneck in many applications. Unified Memory (UM) provides a single memory address space shared by all CPUs and GPUs within a system, eliminating the need for explicit memory transfers between CPUs and GPUs. It also supports memory oversubscription, which allows a program to allocate more memory than is physically available on the GPU. This capability is invaluable in scenarios where program requirements exceed available GPU memory. This research evaluates the performance and energy consumption of NCCL in conjunction with Unified Memory, both under normal conditions and when oversubscription is used. The objective is to gain a better understanding of the practical benefits and potential limitations of integrating these technologies in diverse applications.