Skip to Content

Sponsors

No results

Keywords

No results

Types

No results

Search Results

Events

No results
Search events using: keywords, sponsors, locations or event type
When / Where
All occurrences of this event have passed.
This listing is displayed for historical purposes.

Presented By: Department of Statistics Dissertation Defenses

Theoretical Advances in Reinforcement Learning: Online Average-Reward and Offline Constrained Settings

Kihyuk Hong

This dissertation presents theoretical advancements in reinforcement learning (RL), focusing on two key settings: online RL under the infinite-horizon average-reward criterion and offline constrained RL under partial data coverage.
The first part addresses the online setting, where the objective is to optimize long-run average rewards through interaction with the environment. A family of value-iteration-based algorithms is proposed by approximating the average-reward objective using a carefully tuned discounted surrogate. This part resolves an open problem by establishing a computationally efficient algorithm for linear Markov decision processes under a weak structural assumption. The proposed algorithm employs span-constrained value clipping and a decoupled planning strategy that mitigates statistical inefficiencies arising from the complexity of the function class.
The second part is motivated by safety-critical applications and studies the offline setting, where the agent must learn a policy from a fixed dataset without further interaction. Primal-dual algorithms are developed for both linear MDPs and general function-approximation regimes, based on the linear programming formulation of RL. These methods are oracle-efficient and provably sample-efficient under partial data coverage. Moreover, they extend to the constrained RL setting, where the policy must satisfy additional safety constraints defined by auxiliary reward signals and threshold levels.
Together, the contributions of this dissertation advance the theoretical foundations of reinforcement learning in settings that prioritize safety and long-term performance.

Explore Similar Events

  •  Loading Similar Events...

Keywords


Back to Main Content