Presented By: Department of Statistics
Statistics Department Seminar Series: Kihyuk Hong, PhD Candidate, Department of Statistics, University of Michigan
"Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation"
Abstract: The infinite-horizon average-reward Markov decision process (MDP) provides a natural framework for sequential decision-making under uncertainty in settings where an agent interacts with the environment continuously, such as inventory management and network routing systems. Unlike episodic settings, where the environment is periodically reset, continuous interaction without resets introduces unique challenges. For example, it necessitates assumptions on the underlying MDP to avoid pathological scenarios where the agent becomes trapped in an unfavorable state with no path to recovery. Additionally, the average-reward optimality criterion complicates algorithm design, as the corresponding Bellman operator is not a contraction, preventing the straightforward application of optimistic value iteration algorithms.
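To make the contraction point concrete, recall the standard discounted Bellman optimality operator and the average-reward optimality equation (textbook background, included here only as an illustration):

\[
(T_\gamma V)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big],
\qquad
\lVert T_\gamma V - T_\gamma V' \rVert_\infty \le \gamma\, \lVert V - V' \rVert_\infty,
\]

\[
\rho^* + h^*(s) = \max_{a} \Big[ r(s,a) + \sum_{s'} P(s' \mid s,a)\, h^*(s') \Big].
\]

The discounted operator T_\gamma is a \gamma-contraction in the sup norm, whereas the average-reward equation corresponds to \gamma = 1, where no such contraction is available.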
We address reinforcement learning in infinite-horizon average-reward settings, focusing on MDPs where both the reward function and the transition probability kernel are linear in a feature representation of state-action pairs. Existing approaches in this setting either face computational inefficiencies or rely on strong assumptions, such as ergodicity, to achieve order-optimal regret bounds. In this talk, we introduce a computationally efficient algorithm that attains an order-optimal regret bound under a mild assumption on the underlying MDP. The algorithm learns a discounted-reward MDP as a surrogate for the average-reward problem. Leveraging the contraction property of the associated Bellman operator for the surrogate problem, we design an optimistic value iteration algorithm and employ a value function clipping technique to improve statistical efficiency. We show that appropriately tuning the discount factor for the surrogate problem achieves an order-optimal regret bound for the original average-reward problem.
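As an informal sketch of the surrogate idea, the snippet below runs optimistic value iteration on a learned discounted model with least-squares estimation and value clipping. It is not the speaker's algorithm; the synthetic linear MDP, the bonus scale beta, the ridge parameter lam, and the discount gamma are all assumptions chosen for illustration.

# Minimal, illustrative sketch (not the talk's exact algorithm): optimistic
# discounted value iteration with least-squares estimation and value clipping
# on a small synthetic linear MDP.  All constants below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear MDP: P(s'|s,a) = phi(s,a)^T mu(s'), r(s,a) = phi(s,a)^T theta
n_states, n_actions, d = 5, 3, 4
phi = rng.dirichlet(np.ones(d), size=(n_states, n_actions))   # features, rows sum to 1
mu = rng.dirichlet(np.ones(n_states), size=d)                 # d x S, rows are distributions
theta = rng.uniform(0, 1, size=d)
P = phi @ mu                                                  # S x A x S transition kernel
R = phi @ theta                                               # S x A rewards in [0, 1]

gamma = 0.99                 # discount factor of the surrogate discounted MDP (assumed)
V_max = 1.0 / (1.0 - gamma)  # clipping threshold for value estimates
lam, beta = 1.0, 1.0         # ridge parameter and optimism bonus scale (assumed)

# Collect transitions with a uniformly random behaviour policy.
T = 5000
s = 0
data = []                    # (s, a, r, s') tuples
for _ in range(T):
    a = rng.integers(n_actions)
    s_next = rng.choice(n_states, p=P[s, a])
    data.append((s, a, R[s, a], s_next))
    s = s_next

Phi = np.array([phi[si, ai] for si, ai, _, _ in data])        # T x d design matrix
Lam = lam * np.eye(d) + Phi.T @ Phi                           # regularized Gram matrix
Lam_inv = np.linalg.inv(Lam)

# Optimistic value iteration on the learned discounted surrogate.
V = np.zeros(n_states)
for _ in range(200):
    # Least-squares estimate of w with phi(s,a)^T w ~ E[V(s') | s, a]
    targets = np.array([V[s_next] for _, _, _, s_next in data])
    w = Lam_inv @ (Phi.T @ targets)
    # Exploration bonus: beta * ||phi(s,a)||_{Lam^{-1}}
    bonus = beta * np.sqrt(np.einsum('sad,de,sae->sa', phi, Lam_inv, phi))
    Q = R + gamma * (phi @ w + bonus)
    V = np.clip(Q.max(axis=1), 0.0, V_max)                    # value clipping step

print("optimistic value estimates:", np.round(V, 2))
print("greedy policy:", Q.argmax(axis=1))

In this sketch, clipping the values to [0, 1/(1 - gamma)] keeps the optimistic estimates bounded, and the bonus term shrinks as the Gram matrix accumulates coverage of the feature directions; taking gamma close to 1 trades the surrogate's bias against the 1/(1 - gamma) scale of its value function.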
https://kihyukh.github.io/