BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//UM//UM*Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Detroit
TZURL:http://tzurl.org/zoneinfo/America/Detroit
X-LIC-LOCATION:America/Detroit
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20241202T110327Z
DTSTART;TZID=America/Detroit:20241210T160000
DTEND;TZID=America/Detroit:20241210T170000
SUMMARY:Workshop / Seminar: Statistics Department Seminar Series: Kihyuk Hong\, PhD Candidate\, Department of Statistics\, University of Michigan
DESCRIPTION:Abstract: The infinite-horizon average-reward Markov decision process (MDP) provides a natural framework for sequential decision-making under uncertainty in settings where an agent interacts with the environment continuously\, such as inventory management and network routing systems. Unlike episodic settings\, where the environment is periodically reset\, continuous interaction without resets introduces unique challenges. For example\, it necessitates assumptions on the underlying MDP to avoid pathological scenarios where the agent becomes trapped in an unfavorable state with no path to recovery. Additionally\, the average-reward optimality criterion complicates algorithm design\, as the corresponding Bellman operator is not a contraction\, preventing the straightforward application of optimistic value iteration algorithms.\n\nWe address reinforcement learning in infinite-horizon average-reward settings\, focusing on MDPs where both the reward function and the transition probability kernel are linear in a feature representation of state-action pairs. Existing approaches in this setting either face computational inefficiencies or rely on strong assumptions\, such as ergodicity\, to achieve order-optimal regret bounds. In this talk\, we introduce a computationally efficient algorithm that attains an order-optimal regret bound under a mild assumption on the underlying MDP. The algorithm learns a discounted-reward MDP as a surrogate for the average-reward problem. Leveraging the contraction property of the associated Bellman operator for the surrogate problem\, we design an optimistic value iteration algorithm and employ a value function clipping technique to improve statistical efficiency. We show that appropriately tuning the discount factor for the surrogate problem achieves an order-optimal regret bound for the original average-reward problem.\n\nhttps://kihyukh.github.io/
UID:124601-21853253@events.umich.edu
URL:https://events.umich.edu/event/124601
CLASS:PUBLIC
STATUS:CONFIRMED
CATEGORIES:seminar
LOCATION:West Hall - 411
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20241202T110721Z
DTSTART;TZID=America/Detroit:20241210T170000
DTEND;TZID=America/Detroit:20241210T180000
SUMMARY:Livestream / Virtual: 2025 CCSFP Summer Fellowship Information Session
DESCRIPTION:The Undergraduate Research Opportunity Program (UROP) at the University of Michigan (UM)\, Ann Arbor offers a 10-week paid State-Wide Summer Research Fellowship for currently enrolled Michigan community college students who are interested in transferring to U-M\, Ann Arbor\, or any other institution in the future. The fellowship runs from Tuesday\, May 27\, to August 1\, 2025.
UID:129546-21863591@events.umich.edu
URL:https://events.umich.edu/event/129546
CLASS:PUBLIC
STATUS:CONFIRMED
CATEGORIES:Workshop,Urop,Undergraduate Students,Undergraduate,Transfer Students
LOCATION:Off Campus Location
END:VEVENT
END:VCALENDAR