Presented By: Applied Interdisciplinary Mathematics (AIM) Seminar - Department of Mathematics

AIM Seminar: Mean-Field Dynamics of Transformers: From Modeling to Clustering and Critical Scaling

Shi Chen (MIT)

Abstract: Self-attention is a central component of modern transformer architectures and is one of the key mechanisms behind the success of Large Language Models. Understanding its mathematical structure is therefore essential for explaining how these models process information and learn useful representations. In this talk, I will describe an interacting-particle perspective on self-attention, which has been developed in recent work by several authors as a fruitful framework for analyzing transformer dynamics. Unlike the classical mean-field theory for deep neural networks, where the particles are neurons and the mean-field limit is tied to overparameterization, here the particles are tokens whose representations evolve through attention interactions. This mean-field perspective leads to a new viewpoint on transformer dynamics, with consequences for both theory and practice. In particular, I will explain how it helps illuminate clustering behavior in deep transformers and the critical temperature scaling laws that arise in many frontier models.
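The interacting-particle viewpoint sketched in the abstract can be illustrated with a toy simulation. The snippet below is illustrative and not from the talk: it takes the simplified continuous-time attention dynamics studied in this line of work (tokens as points on the unit sphere, query/key/value matrices set to the identity, a single inverse-temperature parameter `beta`) and integrates them with explicit Euler steps; all function and parameter names are my own. Each token drifts toward a softmax-weighted average of the others and, after projection back to the sphere, the tokens tend to cluster over time.

```python
import numpy as np

# Toy sketch (illustrative assumption, not the speaker's code): n tokens as
# particles on the unit sphere, evolving by
#     x_i' = sum_j softmax_j(beta * <x_i, x_j>) * x_j,
# followed by projection back onto the sphere. Q = K = V = I for simplicity;
# beta plays the role of an inverse temperature.

def attention_step(x, beta=1.0, dt=0.1):
    """One explicit Euler step of the simplified self-attention dynamics."""
    logits = beta * (x @ x.T)                       # pairwise similarities
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax weights
    x = x + dt * (w @ x)                            # drift toward attended tokens
    return x / np.linalg.norm(x, axis=1, keepdims=True)  # back onto the sphere

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)       # 16 random tokens on S^2

spread_before = np.linalg.norm(x - x.mean(axis=0), axis=1).mean()
for _ in range(200):
    x = attention_step(x)
spread_after = np.linalg.norm(x - x.mean(axis=0), axis=1).mean()
# spread_after < spread_before: the tokens have drawn together
```

Raising `beta` (lowering the temperature) makes attention more local, so far-apart groups of tokens merge much more slowly, which gives a rough feel for why the temperature parameter matters in the clustering and scaling questions the talk addresses.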

Contact: Zhiyan Ding
