University of Illinois Chicago

Declarative Analytics on Heterogeneous HPC Systems

thesis
posted on 2025-08-01, 00:00 authored by Ahmedur Rahman Shovon
The emergence of exascale systems marks a transformative era in high-performance computing (HPC), powered by extensive use of GPUs. The popularity of GPGPUs in HPC, driven by performance gains and power efficiency, demands redesigning traditional algorithms to exploit GPU parallelism. Declarative languages such as Datalog, however, can leverage these advancements directly: they express complex problems through simple rules and queries, which can be efficiently compiled into relational algebra operations for execution on GPGPUs. Integrating Datalog's declarative syntax with the computational power of GPGPUs enables scalable declarative analytics across big data, graph mining, and program analysis on HPC systems. While recent advancements have focused on multi-threaded and multi-core implementations of Datalog, the evolution of exascale systems presents a compelling opportunity to extend Datalog's capabilities to multi-node, multi-GPU environments. This thesis addresses this gap by developing the first multi-GPU, multi-node Datalog engine. First, we investigate the parallelization of iterated operations involving relational algebra primitives on GPUs, which are fundamental to Datalog operations. Then, we address challenges specific to heterogeneous architectures, including optimized communication strategies, recursive aggregation techniques, and efficient join operations, all tailored for a heterogeneous Datalog backend. We focus on optimizing specialized Datalog implementations for graph algorithms, including path-finding and topology-based feature extraction. For testing and benchmarking the algorithms, we use publicly available datasets from the Stanford Large Network Dataset Collection and the SuiteSparse Matrix Collection. Our research extends beyond traditional graph mining and program analysis, exploring Datalog's potential in emerging domains such as topological data analysis, machine learning, and visual analytics for high-dimensional data.
Evaluating power consumption alongside performance is increasingly vital in HPC systems, as energy efficiency significantly impacts operational sustainability and cost-effectiveness. Thus, we conduct power analysis across GPU-based Datalog engines, which differ primarily in their recursive join strategies and underlying data structures. We evaluate how variations in implementation techniques for the same application, executed on identical hardware and datasets, influence power consumption. By advancing Datalog's applicability in exascale environments, we aim to demonstrate its scalability and suitability for high-performance, energy-efficient analysis of complex data on next-generation computing platforms.
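
As a rough illustration of what "iterated operations involving relational algebra primitives" means for a recursive Datalog rule, the sketch below evaluates transitive closure (a basic path-finding query) as a repeated join, projection, and set difference. It is a minimal, sequential, CPU-only sketch using the standard semi-naive strategy, not the thesis's multi-GPU engine; the function name `transitive_closure` and the sample graph are hypothetical and chosen only for this example.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    `edges` is an iterable of (source, destination) pairs.
    """
    edges = set(edges)

    # Index the edge relation by its join column (the source vertex),
    # playing the role of a hash index on edge(y, z).
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    path = set(edges)    # all path facts derived so far
    delta = set(edges)   # facts that were new in the previous iteration

    while delta:
        # Join delta(x, y) with edge(y, z) on y, then project to (x, z).
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        # Keep only genuinely new facts (set difference), then merge them.
        delta = derived - path
        path |= delta

    return path


# Hypothetical sample graph: a simple chain 1 -> 2 -> 3 -> 4.
closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Joining only the newly derived facts in `delta` against `edge` on each iteration avoids recomputing earlier results, and the loop stops at the fixed point, when an iteration derives nothing new. A GPU backend would have to parallelize the same join, deduplication, and merge steps.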

Language

  • en

Advisor

Sidharth Kumar

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Michael E. Papka, Zhiling Lan, Stavros Sintos, Gopikrishna Deshpande, Thomas Gilray

Thesis type

application/pdf
