---
title: Revisiting the Noise Model of Stochastic Gradient Descent
abstract: The effectiveness of stochastic gradient descent (SGD) in neural network optimization is significantly influenced by stochastic gradient noise (SGN). Following the central limit theorem, SGN was initially described as Gaussian, but recently Simsekli et al. (2019) demonstrated that the $S\alpha S$ Lévy distribution provides a better fit for the SGN. This assertion was subsequently disputed, with later work reverting to the previously proposed Gaussian noise model. This study provides robust, comprehensive empirical evidence that SGN is heavy-tailed and is better represented by the $S\alpha S$ distribution. Our experiments cover several datasets and multiple models, both discriminative and generative. Furthermore, we argue that different network parameters preserve distinct SGN properties. We develop a novel framework based on a Lévy-driven stochastic differential equation (SDE), in which each parameter is described by a one-dimensional Lévy process. This leads to a more accurate characterization of the dynamics of SGD around local minima. We use our framework to study SGD properties near local minima, including the mean escape time and preferable exit directions.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: battash24a
month: 0
tex_title: Revisiting the Noise Model of Stochastic Gradient Descent
firstpage: 4780
lastpage: 4788
page: 4780-4788
order: 4780
cycles: false
bibtex_author: Battash, Barak and Wolf, Lior and Lindenbaum, Ofir
author:
- given: Barak
  family: Battash
- given: Lior
  family: Wolf
- given: Ofir
  family: Lindenbaum
date: 2024-04-18
address:
container-title: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
volume: '238'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 4
  - 18
pdf:
extras:
---
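The abstract's central claim is that stochastic gradient noise is better modeled by a heavy-tailed $S\alpha S$ (symmetric alpha-stable) law than by a Gaussian. As a rough, illustrative sketch only (not code from the paper, and using a toy linear model, sample sizes, and seeds chosen purely for demonstration), one way to probe this empirically is to collect minibatch-gradient deviations from the full-batch gradient for a single parameter and fit an alpha-stable distribution with `scipy.stats.levy_stable`; a fitted stability index alpha near 2 is consistent with Gaussian noise, while alpha < 2 indicates heavier tails.

```python
# Illustrative sketch (not from the paper): estimate the stability index alpha of
# stochastic gradient noise (SGN) for one coordinate of a toy linear model.
import numpy as np
import torch
from scipy.stats import levy_stable

torch.manual_seed(0)
np.random.seed(0)

# Synthetic regression data and a single linear parameter vector (assumed toy setup).
n, d, batch = 4096, 16, 32
X = torch.randn(n, d)
y = X @ torch.randn(d) + 0.1 * torch.randn(n)
w = torch.randn(d, requires_grad=True)

def grad(idx):
    """Mean-squared-error gradient of w on the rows selected by idx."""
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()
    g, = torch.autograd.grad(loss, w)
    return g.detach()

full_grad = grad(torch.arange(n))

# One SGN sample = minibatch gradient minus full-batch gradient (first coordinate of w).
noise = []
for _ in range(500):
    idx = torch.randint(0, n, (batch,))
    noise.append((grad(idx) - full_grad)[0].item())
noise = np.array(noise)

# Fit an alpha-stable law; levy_stable.fit can be slow, so the sample is kept small.
alpha, beta, loc, scale = levy_stable.fit(noise)
print(f"estimated alpha = {alpha:.2f} (alpha = 2 corresponds to the Gaussian case)")
```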