posted on 2025-08-01, 00:00 authored by Alessandro Martinolli
SUMMARY
High-Performance Computing (HPC) systems play a pivotal role in modern scientific and engineering advancements. However, the increasing demand for computational power comes with significant energy consumption challenges. This thesis investigates the impact of GPU frequency tuning and power capping on performance and energy efficiency across three generations of NVIDIA GPUs: Pascal (P100), Volta (V100), and Ampere (A100). The study uses the Altis Benchmark Suite to evaluate the performance and energy behavior of diverse workloads. By applying power capping and frequency tuning systematically, we aim to identify configurations that balance computational performance and energy consumption. The results reveal that power capping is particularly effective for compute-bound workloads, while frequency tuning provides finer-grained control over energy efficiency, especially for architecture-specific optimizations. Additionally, the experiments demonstrate that the Ampere architecture (A100) offers more precise control over power-performance trade-offs than its predecessors, highlighting the evolution of power management capabilities in modern GPUs. A novel contribution of this research is the integration of predictive modeling to estimate energy consumption under varying frequency and power cap configurations. Three machine learning models were employed: Gaussian Process Regression (GPR), Long Short-Term Memory (LSTM) neural networks, and Random Forest (RF). These models were trained and validated on experimental data to forecast energy consumption trends, allowing for adaptive energy-aware strategies in HPC environments. The results demonstrate that GPR excels at capturing smooth, non-linear patterns, LSTM effectively handles workloads with temporal dependencies, and RF provides robust performance across diverse benchmarks. This predictive framework not only enables real-time optimization but also assists in developing energy-efficient scheduling strategies for future HPC systems. Furthermore, this work introduces a classification of benchmark behaviors into consistent and non-consistent categories, aiding in the identification of workloads that benefit the most from specific power management techniques. By establishing clear guidelines for frequency tuning and power capping, this thesis provides practical recommendations for optimizing energy efficiency without compromising performance. In conclusion, this research enhances the understanding of GPU power management techniques and introduces a predictive framework for dynamic energy optimization in HPC clusters. The findings pave the way for more efficient and sustainable high-performance computing systems, enabling better resource allocation, reduced operational costs, and reduced environmental impact.
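The trade-off described above, choosing a power cap that saves energy without giving up too much performance, can be illustrated with a minimal Python sketch. It is not the thesis's method: the `Run` record, the `best_config` helper, the 10% slowdown budget, and all measurements below are hypothetical and made up for illustration. Energy is taken as average power multiplied by runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run under a given GPU power cap (hypothetical record)."""
    power_cap_w: int    # configured power cap in watts
    runtime_s: float    # measured wall-clock time in seconds
    avg_power_w: float  # measured average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy (joules) = average power (watts) * time (seconds)
        return self.avg_power_w * self.runtime_s


def best_config(runs, max_slowdown=0.10):
    """Return the lowest-energy run whose runtime is within
    max_slowdown (as a fraction) of the fastest run."""
    fastest = min(r.runtime_s for r in runs)
    allowed = [r for r in runs if r.runtime_s <= fastest * (1.0 + max_slowdown)]
    return min(allowed, key=lambda r: r.energy_j)


# Made-up measurements, for illustration only.
runs = [
    Run(power_cap_w=250, runtime_s=10.0, avg_power_w=240.0),
    Run(power_cap_w=200, runtime_s=10.5, avg_power_w=195.0),
    Run(power_cap_w=150, runtime_s=12.5, avg_power_w=148.0),
]

chosen = best_config(runs)
print(chosen.power_cap_w, chosen.energy_j)
```

With these invented numbers, the 150 W run uses the least energy but is 25% slower than the fastest run, so a 10% slowdown budget selects the 200 W cap instead. Loosening the budget (for example `max_slowdown=0.30`) admits the 150 W run.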
Language
en
Advisor
Zhiling Lan
Department
Computer Science
Degree Grantor
University of Illinois Chicago
Degree Level
Masters
Degree name
MS, Master of Science
Committee Member
Marco Domenico Santambrogio
Antonio Rosario Miele
Michael Papka