Presented By: Industrial & Operations Engineering
IOE 899: Finding the needle in the haystack: How gradient descent converges to low-dimensional solutions in over-parameterized models
Salar Fattahi

About the speaker: Salar Fattahi is an assistant professor of Industrial and Operations Engineering at the University of Michigan. He received his PhD from UC Berkeley in 2020. He is a recipient of a National Science Foundation CAREER Award and the Deans' MLK Spirit Award. His research has been recognized by six INFORMS best paper awards, including the 2023 INFORMS Junior Faculty Interest Group Best Paper Award, the 2024 INFORMS Data Mining Best General Paper Award, and the 2022 INFORMS Computing Society Best Student Paper Award. He serves as an Associate Editor for the INFORMS Journal on Data Science and as an area chair for several premier conferences, including NeurIPS, ICML, and ICLR. His research is currently supported by two grants from the National Science Foundation and one from the Office of Naval Research.
Abstract: In contemporary machine learning, realistic models exhibit increasing nonconvexity and overwhelming overparameterization. This nonconvex nature often leads to numerous undesirable or "spurious" local solutions, while overparameterization exacerbates the risk of overfitting. Yet, simple "short-sighted" algorithms, such as gradient descent (GD) or its variants, often find the needle in the haystack: they converge to low-dimensional solutions even when such structures are neither explicitly encoded in the model nor required by the algorithm. This talk explains this desirable behavior of GD-based algorithms by studying their fine-grained trajectories on over-parameterized models, spanning low-rank models to deep neural networks.
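To give a concrete flavor of the phenomenon described in the abstract, the minimal sketch below (an illustration, not material from the talk) runs plain gradient descent on an over-parameterized symmetric matrix factorization. The ground truth has rank 2, the factor X is full width, and nothing in the loss or the algorithm enforces low rank; yet, starting from a small random initialization, GD converges to a solution whose effective rank matches the ground truth. The problem sizes, step size, and initialization scale are illustrative choices.

```python
# Minimal sketch: GD on an over-parameterized matrix factorization
# converges to a low-rank solution without any explicit rank constraint.
import numpy as np

rng = np.random.default_rng(0)
d, true_rank = 30, 2

# Low-rank ground truth M* = U U^T (rank 2).
U = rng.standard_normal((d, true_rank))
M_star = U @ U.T

# Over-parameterized factor: X is d x d, so X X^T could be full rank.
# Small initialization is what steers GD toward a low-rank solution.
X = 1e-3 * rng.standard_normal((d, d))

step = 2e-3 / np.linalg.norm(M_star, 2)  # conservative step size (illustrative choice)
for _ in range(20000):
    residual = X @ X.T - M_star          # gradient of 0.25 * ||X X^T - M*||_F^2 is residual @ X
    X -= step * residual @ X

svals = np.linalg.svd(X @ X.T, compute_uv=False)
effective_rank = int((svals > 1e-3 * svals[0]).sum())
print("relative error:", np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star))
print("effective rank of X X^T:", effective_rank)  # typically 2, matching the ground truth
```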