Fast Parameter Synchronization for Distributed Learning with Selective Multicast
DOI number: 10.1109/ICC45855.2022.9838266
Journal: ICC 2022 - IEEE International Conference on Communications
Abstract: Recent advances in distributed machine learning show, theoretically and empirically, that for many models, provided all workers eventually participate in the synchronizations, i) training still converges even if only p workers take part in each round of synchronization, and ii) a larger p generally leads to a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, in which workers generally broadcast or multicast their updated parameters to others for synchronization, and propose SELMCAST, an expressive and Pareto-optimal multicast receiver selection algorithm, to achieve this goal. Compared with the state-of-the-art design, which randomly selects exactly p receivers for each worker's multicast in a bandwidth-agnostic way, SELMCAST chooses receivers based on a global view of their available bandwidth and loads, yielding two advantages. First, it optimizes the bottleneck sending rate, cutting down the time cost of parameter synchronization. Second, when more than p receivers have sufficient bandwidth, as many of them as possible are selected, which benefits the convergence of training. Extensive evaluations show that SELMCAST is efficient and always achieves near-optimal performance.
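The idea summarized above, choosing multicast receivers from a global view of bandwidth and load rather than picking exactly p at random, can be illustrated with a short sketch. The Python snippet below is a simplified, hypothetical illustration of bandwidth-aware receiver selection, not the paper's actual SELMCAST algorithm; the names select_receivers, bandwidth, load, and rate_threshold are assumptions made for this example.

# Hypothetical sketch of bandwidth-aware multicast receiver selection,
# in the spirit of (but not identical to) SELMCAST.
def select_receivers(sender, workers, bandwidth, load, p, rate_threshold):
    """Pick receivers for `sender`'s multicast.

    - Always return at least p receivers (needed for convergence).
    - Take every candidate whose effective spare bandwidth meets
      `rate_threshold`, so more than p receivers are used when possible.
    - Otherwise fall back to the p candidates with the most effective
      spare bandwidth, which bounds the bottleneck sending rate.
    """
    candidates = [w for w in workers if w != sender]

    # Candidates with enough spare bandwidth to sustain the target rate.
    fast = [w for w in candidates
            if bandwidth[w] / (load[w] + 1) >= rate_threshold]
    if len(fast) >= p:
        return fast  # take as many well-provisioned receivers as possible

    # Not enough fast receivers: take the p best by effective spare bandwidth.
    ranked = sorted(candidates,
                    key=lambda w: bandwidth[w] / (load[w] + 1),
                    reverse=True)
    return ranked[:p]

# Example usage with made-up numbers (bandwidth in Gbps, load = number of
# multicasts already assigned to each worker).
workers = ["w0", "w1", "w2", "w3", "w4"]
bandwidth = {"w0": 10.0, "w1": 8.0, "w2": 2.0, "w3": 9.0, "w4": 1.5}
load = {"w0": 0, "w1": 1, "w2": 0, "w3": 0, "w4": 2}
print(select_receivers("w0", workers, bandwidth, load, p=2, rate_threshold=4.0))
# -> ['w1', 'w3']: the receivers with sufficient effective bandwidth

In contrast, the bandwidth-agnostic baseline described in the abstract would sample exactly p receivers uniformly at random, regardless of their available bandwidth or current load.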
Co-authors: Shouxi Luo, Pingzhi Fan, Ke Li, Huanlai Xing, Long Luo, Hongfang Yu
ISSN No.: 1938-1883
Translation or Not: no
Date of Publication: 2022-03-20