Tools and Models for Evaluating Cloud and On-Premises HPC Resources
thesis
posted on 2025-05-01, 00:00 authored by Marco Bonafini
Deciding between on-premises and cloud resources presents a significant challenge for organizations with diverse computational needs, ranging from small clusters to supercomputers. This thesis offers a comprehensive analysis and develops quantitative models to support this decision-making process. Central to this research is a Total Cost of Ownership (TCO) model that accounts for both capital expenditures (CAPEX) and operational expenditures (OPEX) associated with on-premises HPC facilities. Implemented as a web-based tool, this model provides efficient and detailed cost assessments that complement popular cloud pricing calculators. This integration enables users to perform direct and customizable comparisons between on-premises and cloud-based solutions seamlessly.
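As a rough illustration of the kind of comparison such a TCO model supports, the following minimal sketch treats on-premises TCO as a one-time CAPEX plus a constant annual OPEX and normalizes it to a cost per delivered core-hour. This is not the thesis's model or its web tool: the function names, the flat annual OPEX, the 8,760-hour year, and the single utilization factor are all simplifying assumptions made here for illustration.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year, system available all year


def on_prem_tco(capex, annual_opex, years):
    """Hypothetical simplified TCO: one-time CAPEX plus flat annual OPEX."""
    if years <= 0:
        raise ValueError("years must be positive")
    return capex + annual_opex * years


def cost_per_core_hour(tco, cores, years, utilization):
    """Spread the TCO over the core-hours actually used.

    `utilization` is the assumed fraction of available core-hours
    that run useful work (0 < utilization <= 1).
    """
    if cores <= 0 or years <= 0:
        raise ValueError("cores and years must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    used_core_hours = cores * years * HOURS_PER_YEAR * utilization
    return tco / used_core_hours


def cloud_cost(core_hours, price_per_core_hour):
    """Hypothetical pay-as-you-go cloud cost for the same amount of work."""
    return core_hours * price_per_core_hour


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the thesis.
    tco = on_prem_tco(capex=1_000_000, annual_opex=200_000, years=5)
    rate = cost_per_core_hour(tco, cores=1000, years=5, utilization=0.5)
    print(f"on-prem TCO: {tco}, cost per used core-hour: {rate:.4f}")
```

Under these assumptions, the on-premises cost per core-hour falls as utilization rises, which is why a like-for-like comparison against a cloud price depends on how heavily the system is actually used.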
However, determining the most suitable infrastructure involves more than identifying the cheaper option. Therefore, this thesis also conducts an in-depth analysis of real-world applications and workloads, with a particular focus on High Performance Computing (HPC). By examining workload-specific requirements and infrastructure considerations, this study provides practical insights into the cost-performance trade-offs between cloud and on-premises deployments. Case studies, including examples from Argonne National Laboratory, demonstrate how workload characteristics and job mixes influence the overall TCO and inform strategic resource planning. The findings of this research aim to guide stakeholders in optimizing cost and performance for future infrastructure investments, supporting the ideal allocation and provisioning of computational resources across various organizational needs.