Presented By: Department of Statistics Dissertation Defenses

Thesis Defense: Advances in Sequential Decision Making Problems with Causal and Low-rank Structures

Yangyi Lu

Abstract: Bandits and Markov Decision Processes are powerful sequential decision-making paradigms that have been widely applied to solve real-world problems. However, existing algorithms often suffer from high sample complexity due to the large action space. In this thesis, we present several contributions that reduce the sample complexity by exploiting problem structure.
In the first part, we study how to utilize the given causal information represented as a causal graph along with associated conditional distributions for bandit problems. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. Further, we extend C-UCB and C-TS to the linear bandit setting. We also show that under certain causal structures, our algorithms scale better than the standard bandit algorithms as the number of interventions increases.
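
To make the idea concrete, below is a minimal causal-UCB-style sketch under strong simplifying assumptions: each arm is an intervention do(X=x), the post-intervention distribution P(Z | do(X=x)) over a discrete parent Z of the reward is known, and rewards lie in [0, 1], so every pull informs every arm through shared estimates of E[R | Z=z]. The class name, bonus, and interface are illustrative and not the thesis's exact C-UCB algorithm.

    import numpy as np

    class CausalUCBSketch:
        """Illustrative causal-UCB-style bandit; not the thesis's exact C-UCB."""

        def __init__(self, p_z_given_do, n_z, horizon):
            self.p = np.asarray(p_z_given_do)   # (num_arms, n_z): known P(Z | do(X=x))
            self.counts = np.zeros(n_z)          # observations per value of Z
            self.sums = np.zeros(n_z)            # reward sums per value of Z
            self.horizon = horizon

        def _optimistic_mu(self):
            # Optimistic estimate of E[R | Z=z] with a Hoeffding-style bonus;
            # unseen z-values get the maximal value 1.0.
            mean = self.sums / np.maximum(self.counts, 1)
            bonus = np.sqrt(2.0 * np.log(self.horizon) / np.maximum(self.counts, 1))
            return np.where(self.counts > 0, np.minimum(mean + bonus, 1.0), 1.0)

        def select_arm(self):
            # Arm value is the known mixture of per-z optimistic estimates.
            return int(np.argmax(self.p @ self._optimistic_mu()))

        def update(self, z, reward):
            # After pulling any arm, observe the realized Z=z and the reward.
            self.counts[z] += 1
            self.sums[z] += reward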

In the second part, we further explore how to utilize the given causal information for Markov Decision Processes. We introduce causal Markov Decision Processes, a new formalism for sequential decision making that combines the standard Markov Decision Process formulation with causal structures over the state transition and reward functions. We propose the causal upper confidence bound value iteration (C-UCBVI) algorithm, which exploits the causal structure and improves upon the performance of standard reinforcement learning algorithms that do not take causal knowledge into account. To tackle the large state space problem in Markov Decision Processes, we further formulate causal factored Markov Decision Processes and design new algorithms with reduced regret. Lastly, we explore the connection between linear Markov Decision Processes and causal Markov Decision Processes.
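
For background, here is a compact sketch of optimistic value iteration in the UCBVI style; C-UCBVI additionally exploits the causal and factored structure of the transition and reward functions, which this sketch omits. All names and the bonus constant are illustrative.

    import numpy as np

    def optimistic_value_iteration(counts, reward, horizon, delta=0.05):
        """UCBVI-style backup; counts[s, a, s'] are transition counts, reward in [0, 1]."""
        S, A, _ = counts.shape
        n_sa = counts.sum(axis=2)
        p_hat = counts / np.maximum(n_sa[:, :, None], 1)        # empirical P(s' | s, a)
        bonus = np.sqrt(np.log(S * A * horizon / delta) / np.maximum(n_sa, 1))
        q = np.zeros((horizon + 1, S, A))
        v = np.zeros((horizon + 1, S))
        for h in range(horizon - 1, -1, -1):
            # Optimism: add an exploration bonus to the Bellman backup.
            q[h] = np.clip(reward + bonus + p_hat @ v[h + 1], 0.0, horizon)
            v[h] = q[h].max(axis=1)
        return q  # acting greedily w.r.t. q[h] gives the next episode's policy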

In the third part, we tackle the challenging setting where the causal information is unknown. We propose mild identifiability conditions and design new causal bandit algorithms for causal trees, causal forests, and a general class of causal graphs. We prove that the regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit algorithms. Lastly, we prove that these mild conditions are necessary: without them, one cannot do better than standard bandit algorithms.
In the fourth part, we investigate a challenging problem associated with the causal structure: unobserved confounders. We study to what extent unobserved confounders affect estimation in the offline policy evaluation problem in reinforcement learning. We give the first minimax lower bound on the error due to unobserved confounders. We also analyze two algorithms and show that they are minimax optimal. Lastly, we propose a new model-based method and show that it is never worse than the model-free method proposed in prior work.
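
As a point of reference, a generic tabular model-based offline policy evaluation routine, with no correction for confounding, looks roughly as follows; it is background only and not the thesis's estimator.

    import numpy as np

    def model_based_ope(logged, pi, S, A, horizon):
        """Evaluate target policy pi[s, a] from logged tuples (s, a, r, s_next)."""
        counts = np.zeros((S, A, S))
        r_sum = np.zeros((S, A))
        n_sa = np.zeros((S, A))
        for s, a, r, s2 in logged:
            counts[s, a, s2] += 1
            r_sum[s, a] += r
            n_sa[s, a] += 1
        p_hat = counts / np.maximum(n_sa[:, :, None], 1)   # fitted transition model
        r_hat = r_sum / np.maximum(n_sa, 1)                 # fitted reward model
        v = np.zeros(S)
        for _ in range(horizon):
            q = r_hat + p_hat @ v          # Bellman backup under the fitted model
            v = (pi * q).sum(axis=1)       # expectation under the target policy
        return v  # v[s]: estimated value of the target policy from state s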

In the last part, we explore another problem structure: the low-rank property of the ground-truth parameter. We study linear bandits and generalized linear bandits, and we present algorithms via a novel combination of online-to-confidence-set conversion and an exponentially weighted average forecaster constructed over a covering of low-rank matrices. To get around the computational intractability of covering-based approaches, we propose an efficient algorithm using a subspace exploration technique. Our theoretical and empirical results demonstrate the effectiveness of exploiting low-rank structure in reducing regret.
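
A rough sketch of the subspace exploration idea, assuming actions are d x d matrices and rewards are linear in an unknown rank-r parameter matrix: explore uniformly to estimate the parameter, extract its top-r subspaces, and then run a LinUCB-style bandit on the projected (r*r)-dimensional features. The two-stage split, names, and constants are illustrative rather than the thesis's exact algorithm.

    import numpy as np

    def subspace_then_linucb(actions, reward_fn, r, n_explore, horizon, lam=1.0):
        """actions: list of d x d matrices; reward_fn(x) returns a noisy scalar reward."""
        d = actions[0].shape[0]
        # Stage 1: uniform exploration to form a rough estimate of the parameter matrix.
        samples = [actions[np.random.randint(len(actions))] for _ in range(n_explore)]
        rewards = np.array([reward_fn(x) for x in samples])
        Phi = np.stack([x.ravel() for x in samples])
        theta_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d * d), Phi.T @ rewards)
        U, _, Vt = np.linalg.svd(theta_hat.reshape(d, d))
        U_r, V_r = U[:, :r], Vt[:r].T                       # top-r row/column subspaces
        # Stage 2: LinUCB on the projected (r*r)-dimensional features.
        feats = [(U_r.T @ x @ V_r).ravel() for x in actions]
        A = lam * np.eye(r * r)
        b = np.zeros(r * r)
        total = 0.0
        for _ in range(horizon - n_explore):
            A_inv = np.linalg.inv(A)
            w = A_inv @ b
            ucb = [f @ w + np.sqrt(f @ A_inv @ f) for f in feats]
            i = int(np.argmax(ucb))
            rew = reward_fn(actions[i])
            A += np.outer(feats[i], feats[i])
            b += rew * feats[i]
            total += rew
        return total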
