Presented By: Industrial & Operations Engineering
IOE 899: Optimization methods for compressing large neural networks
Rahul Mazumder, MIT Sloan School of Management
About the speaker: Rahul Mazumder is the NTU Associate Professor of Operations Research and Statistics at the MIT Sloan School of Management. He is affiliated with the MIT Operations Research Center and the MIT Center for Statistics. His research interests lie at the intersection of statistics, machine learning, and mathematical programming (large-scale convex and mixed-integer optimization), and their applications to industry, government, and the sciences. He is a recipient of the Leo Breiman Junior Award from the American Statistical Association, the International Indian Statistical Association Early Career Award in Statistics and Data Science, the INFORMS Donald P. Gaver, Jr. Early Career Award for Excellence in Operations Research, the INFORMS Optimization Society Young Researchers Prize, the Office of Naval Research Young Investigator Award, and the INFORMS ICS Prize (Honorable Mention). He currently serves as an Associate/Action Editor of the Annals of Statistics, Bernoulli, Operations Research, and the Journal of Machine Learning Research.
Abstract: Foundation models have achieved remarkable performance across various domains, but their large model sizes lead to high computational costs (storage, inference latency, memory, etc.). Neural network pruning, roughly categorized as unstructured or structured, aims to reduce these costs by removing less-important parameters while retaining model utility as much as possible. Structured pruning is a practical way to improve inference latency on standard hardware, in contrast to unstructured pruning, which typically requires specialized hardware and software. In this talk, I will discuss discrete optimization methods to address such problems. Interestingly, algorithms from sparse regression and high-dimensional statistics can be useful here. I'll discuss how model compression tools can aid interpretability in black-box decision tree ensembles, and how our investigations in large model pruning motivate new algorithms to accelerate branch-and-bound (integer programming) solvers with GPUs.
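To make the unstructured-vs-structured distinction concrete, here is a minimal NumPy sketch (not from the talk; the toy layer size, the 50% pruning ratio, and the magnitude/column-norm scoring rules are illustrative assumptions). Unstructured pruning zeros individual weights, leaving a scattered sparsity pattern; structured pruning removes whole columns (e.g., neurons), so the matrix itself shrinks and standard hardware sees the speedup directly.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy weight matrix of one layer

# Unstructured pruning: zero out the 50% of individual weights
# with smallest magnitude. Fine-grained sparsity; realizing a
# speedup typically needs specialized sparse kernels.
k = W.size // 2
thresh = np.sort(np.abs(W), axis=None)[k - 1]
W_unstructured = np.where(np.abs(W) > thresh, W, 0.0)

# Structured pruning: drop the half of the columns with the
# smallest L2 norm. Coarse sparsity; the matrix gets smaller,
# so dense matrix multiplies are directly cheaper.
col_norms = np.linalg.norm(W, axis=0)
keep = np.sort(np.argsort(col_norms)[W.shape[1] // 2:])
W_structured = W[:, keep]

print(W_unstructured.shape, int((W_unstructured == 0).sum()))
print(W_structured.shape)
```

The talk's point is that choosing *which* weights or columns to remove is itself a discrete optimization problem; the magnitude and column-norm heuristics above are only the simplest baselines that such methods improve upon.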