BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//UM//UM*Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Detroit
TZURL:http://tzurl.org/zoneinfo/America/Detroit
X-LIC-LOCATION:America/Detroit
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20251119T082050Z
DTSTART;TZID=America/Detroit:20251118T160000
DTEND;TZID=America/Detroit:20251118T170000
SUMMARY:Workshop / Seminar: Statistics Department Seminar Series: Will Wei Sun\, Associate Professor\, Department of Quantitative Methods\, Department of Statistics (by courtesy)\, Purdue University
DESCRIPTION:Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the leading approach to aligning large language models (LLMs) with human preferences. Despite its success\, two challenges remain fundamental: feedback is costly and heterogeneous across annotators\, and the resulting reward models often lack principled measures of uncertainty. This talk presents recent advances that address these challenges by integrating tools from optimal design and statistical inference into the RLHF framework. First\, I introduce a dual active learning approach\, inspired by optimal design\, that adaptively selects both conversations and annotators to maximize information gain\, improving the efficiency of limited feedback budgets. Second\, I present a framework for uncertainty quantification in reward learning\, enabling valid statistical comparisons across LLMs and more reliable best-of-n alignment policies. Together\, these results illustrate how statistics can enable trustworthy and data-efficient LLM alignment.
UID:141342-21888654@events.umich.edu
URL:https://events.umich.edu/event/141342
CLASS:PUBLIC
STATUS:CONFIRMED
CATEGORIES:seminar
LOCATION:West Hall - 411
END:VEVENT
END:VCALENDAR