Posted on 2025-08-01, 00:00. Authored by Hengrui Zhang.
The practical deployment of Graph Neural Networks (GNNs), a primary form of deep learning on graphs, is hindered by intertwined challenges of effectiveness and scalability. This thesis, "Towards Effective and Scalable Deep Learning on Graph-Structured Data," proposes novel methodologies to address these limitations across four main research thrusts.
To address scalability in learning node embeddings, one paper introduces CCA-SSG, a self-supervised framework for learning robust node embeddings. It avoids the computational burden of negative sampling by using a feature decorrelation objective to prevent representational collapse.
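As a rough illustration of how a feature decorrelation objective can discourage collapse without negative samples, the following NumPy sketch combines an invariance term between two views with a penalty that pushes each view's feature correlation matrix toward the identity. The function names, the weight `lam`, and the toy data are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def standardize(z, eps=1e-8):
    # Zero mean and unit variance per feature dimension (column).
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

def decorrelation_loss(z_a, z_b, lam=1e-3):
    # z_a, z_b: (n_nodes, dim) embeddings of two augmented views (hypothetical inputs).
    n, d = z_a.shape
    z_a = standardize(z_a) / np.sqrt(n)
    z_b = standardize(z_b) / np.sqrt(n)
    # Invariance term: the two views of the same node should agree.
    invariance = np.sum((z_a - z_b) ** 2)
    # Decorrelation term: each view's correlation matrix should be close to identity.
    eye = np.eye(d)
    decorrelation = (np.sum((z_a.T @ z_a - eye) ** 2)
                     + np.sum((z_b.T @ z_b - eye) ** 2))
    return invariance + lam * decorrelation

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
# A "collapsed" embedding: every dimension carries the same signal.
collapsed = np.repeat(z[:, :1], 8, axis=1)

loss_diverse = decorrelation_loss(z, z + 0.01 * rng.normal(size=z.shape))
loss_collapsed = decorrelation_loss(collapsed, collapsed)
print(loss_diverse, loss_collapsed)
```

In this toy setting the collapsed embedding is penalized only through the decorrelation term, since its two views are identical. That is the sense in which no negative pairs are needed.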
To enhance MLP-based models, which are faster but less accurate than GNNs, two papers are presented. OrthoReg tackles an "over-correlation" issue with a soft orthogonality constraint, making MLPs competitive with leading GNNs. The second framework achieves true end-to-end MLP efficiency by offloading graph computations to a one-time pre-processing step, eliminating iterative complexity.
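One common way to move graph computation into a one-time pre-processing step is to propagate node features over a normalized adjacency matrix ahead of training, then feed the result to a plain MLP. The sketch below shows that general idea only. It is not necessarily the scheme used in the thesis, and the function names, the `hops` parameter, and the toy path graph are all assumptions.

```python
import numpy as np

def normalized_adjacency(adj):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def precompute_features(adj, x, hops=2):
    # Runs once before training; the MLP then never touches the graph.
    a_hat = normalized_adjacency(adj)
    propagated = [x]
    for _ in range(hops):
        propagated.append(a_hat @ propagated[-1])
    # Concatenate the 0-hop, 1-hop, ..., k-hop views of every node.
    return np.concatenate(propagated, axis=1)

# Toy 4-node path graph with one-hot node features (illustrative only).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
features = precompute_features(adj, x, hops=2)
print(features.shape)
```

After this step, training and inference cost depends only on the MLP and the width of the precomputed feature matrix, not on repeated neighborhood aggregation.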
Finally, for training on massive graphs, this thesis proposes Data-Centric Graph Condensation (DCGC). This framework recasts condensation as a distribution matching problem, creating a small, task-agnostic synthetic graph. This approach significantly improves cross-architecture generalization and reduces condensation time compared to traditional gradient-matching techniques. The proposed models are validated on public benchmarks, demonstrating significant improvements in performance and computational efficiency.
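To make the distribution matching idea concrete, the sketch below optimizes a handful of synthetic feature vectors so that their mean matches the mean of a much larger real set. This is a deliberately simplified first-moment matching on features alone. It ignores graph structure and is not the DCGC objective itself. The names `condense_class`, `n_syn`, `steps`, and `lr` are hypothetical.

```python
import numpy as np

def mean_matching_loss(real, syn):
    # Squared distance between the feature means of the two sets.
    diff = real.mean(axis=0) - syn.mean(axis=0)
    return float(diff @ diff)

def condense_class(real, n_syn=5, steps=200, lr=0.5, seed=0):
    # Learn n_syn synthetic points whose mean matches that of `real`.
    rng = np.random.default_rng(seed)
    syn = rng.normal(size=(n_syn, real.shape[1]))
    real_mean = real.mean(axis=0)
    for _ in range(steps):
        diff = syn.mean(axis=0) - real_mean
        # Gradient of the loss with respect to every synthetic row.
        grad = 2.0 * diff / n_syn
        syn -= lr * grad
    return syn

rng = np.random.default_rng(1)
real = rng.normal(loc=3.0, size=(1000, 4))   # stand-in for one class of node features
syn = condense_class(real)
print(mean_matching_loss(real, syn))
```

Because the objective depends only on the data and not on the gradients of a particular model, a matching procedure of this kind does not tie the synthetic set to one architecture, which is the intuition behind the cross-architecture claim above.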