BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//UM//UM*Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Detroit
TZURL:http://tzurl.org/zoneinfo/America/Detroit
X-LIC-LOCATION:America/Detroit
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260415T132944Z
DTSTART;TZID=America/Detroit:20260428T113000
DTEND;TZID=America/Detroit:20260428T130000
SUMMARY:Lecture / Discussion: Topics on the Generalization and Learnability of Modern Machine Learning
DESCRIPTION:Learning theory is a subfield of machine learning research in which we analyze the theoretical properties of machine learning problems and algorithms through mathematics. This thesis is the culmination of three standalone works in the field of learning theory.\n\nFirst\, we analyze the generalization bounds of the Transformer architecture to show that they do not depend on maximum sequence length. To do this\, we analyze the Rademacher complexity of the architecture and derive novel covering number bounds on linear functions that do not depend on the number of samples. We also run a simulation and show that the results support our theoretical findings.\n\nIn the next chapter\, we analyze a quirk seen in the training of modern large language models. Most of these models are trained only to predict the next token of the output\; however\, the output of the model is a sequence of tokens. We study this mismatch between training optimization and output through the lens of the surrogate loss consistency framework. We analyze different ways of decoding these next-token predictors to see when we achieve asymptotic consistency for two use cases when encoded as loss functions.\n\nIn the final work of this thesis\, the theoretical learnability of multiclass forgiving 0-1 loss functions is studied through the PAC-learnability framework. We show that a generalization of the Natarajan dimension [Natarajan\, 1989] characterizes the learnability of many instantiations of learning problems that use forgiving 0-1 loss functions. We also show how this setting can be used to model other known settings in the literature.
UID:147770-21901951@events.umich.edu
URL:https://events.umich.edu/event/147770
CLASS:PUBLIC
STATUS:CONFIRMED
CATEGORIES:Dissertation
LOCATION:West Hall - 335
END:VEVENT
END:VCALENDAR