Presented By: Industrial & Operations Engineering
IOE 899: High-dimensional Optimization with Applications to Compute-Optimal Neural Scaling Laws
Courtney Paquette, McGill University
About the speaker: Courtney Paquette is an assistant professor at McGill University and a CIFAR Canada AI Chair at Mila. She was awarded a Sloan Research Fellowship in Computer Science in 2024. Paquette’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, motivated by applications in data science. She is also interested in scaling limits of stochastic learning algorithms. She received her PhD from the mathematics department at the University of Washington (2017), held postdoctoral positions at Lehigh University (2017-2018) and the University of Waterloo (NSF postdoctoral fellowship, 2018-2019), and spends 20% of her time as a research scientist at Google DeepMind, Montreal.
Abstract: Given the massive scale of modern ML models, we now only get a single shot to train them effectively. This restricts our ability to test multiple architectures and hyper-parameter configurations. Instead, we need to understand how these models scale, allowing us to experiment with smaller problems and then apply those insights to larger-scale models. In this talk, I will present a framework for analyzing scaling laws in stochastic learning algorithms using a power-law random features model, leveraging high-dimensional probability and random matrix theory. I will then use this scaling law to address the compute-optimal question: How should we choose model size and hyper-parameters to achieve the best possible performance in the most compute-efficient manner?
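For readers unfamiliar with the setup named in the abstract, the sketch below shows one common way a power-law random features least-squares problem is set up and trained with one-pass SGD. This is not the speaker's code or model; the exponents `alpha` and `beta`, the dimensions, the step size, and the Gaussian feature map are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not from the talk): power-law random
# features regression trained with one-pass SGD.
import numpy as np

rng = np.random.default_rng(0)

d = 2000        # ambient data dimension (assumed)
v = 400         # number of random features, i.e. model size (assumed)
alpha = 1.2     # power-law decay of the data spectrum (assumed)
beta = 0.6      # power-law decay of the target coefficients (assumed)

# Data covariance with power-law spectrum: eigenvalue_j ~ j^{-alpha}
eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
# Target coefficients with power-law decay: b_j ~ j^{-beta}
b = np.arange(1, d + 1, dtype=float) ** (-beta)

# Random features map x -> W x with a Gaussian matrix W of shape (v, d)
W = rng.standard_normal((v, d)) / np.sqrt(d)

def sample_batch(n):
    """Draw n points x ~ N(0, diag(eigs)) and noiseless labels y = <b, x>."""
    x = rng.standard_normal((n, d)) * np.sqrt(eigs)
    return x, x @ b

# One-pass SGD on the random-features least-squares objective
theta = np.zeros(v)
lr = 0.5 / v            # illustrative step size
batch, steps = 8, 5000
for _ in range(steps):
    x, y = sample_batch(batch)
    feats = x @ W.T                     # (batch, v) random features
    resid = feats @ theta - y
    theta -= lr * feats.T @ resid / batch

# Estimate the population risk on a large held-out sample
x_test, y_test = sample_batch(20000)
risk = np.mean((x_test @ W.T @ theta - y_test) ** 2)
print(f"model size v={v}, steps={steps}, estimated risk {risk:.4f}")
```

Sweeping the model size `v` and the number of SGD steps at a fixed compute budget in a toy setup like this is one way to visualize the compute-optimal trade-off the abstract asks about.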