Distributed ML Training with Large-scale Accelerator Datasets for Automated Performance Optimization | Funder: Discovery Partner Institute (DPI)
Optimizing Distributed Training for Large and Noisy Data | Funder: CAHSI-Google Institutional Research Program (NSF, Google)
Citation
Almasi, H., Mishra, H., Vamanan, B., & Ravi, S. N. (2024, May). Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization. International Conference on Learning Representations (ICLR), Vienna, Austria.