When / Where
All occurrences of this event have passed.
This listing is displayed for historical purposes.

Presented By: Department of Statistics Dissertation Defenses

Thesis Defense: Random Perturbations in Stochastic Bandit Problems

Baekjin Kim

Abstract:

Two standard exploration strategies in stochastic bandits are Upper Confidence Bound (UCB) and Thompson sampling. The former is a deterministic algorithm built upon optimism in the face of uncertainty, and the latter is a Bayesian solution that maximizes expected rewards via random sampling from the posterior distribution. The controlled injection of randomness into bandit algorithms has significant benefits: (1) randomized exploration empirically performs well and is robust to corrupted or delayed feedback, and (2) perturbation often serves as a means to achieve oracle-efficient computation, whereas implementing UCB algorithms can lead to NP-hard optimization problems even for convex action sets. However, the empirical and computational benefits of randomness come at a cost: theoretical guarantees are often unavailable, or weaker than those of UCB algorithms, since the innate randomness makes the mathematical analysis difficult.
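To make the contrast concrete, the following is a minimal sketch (not taken from the thesis) of the two strategies on a K-armed Gaussian bandit: UCB adds a deterministic confidence bonus to each empirical mean, while Thompson sampling draws each arm's mean from its Gaussian posterior. The horizon, number of arms, and reward model are illustrative assumptions.

```python
# Minimal sketch: deterministic (UCB) vs. randomized (Thompson sampling)
# exploration on a K-armed Gaussian bandit.
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 2000
true_means = rng.normal(0.0, 1.0, size=K)

def pull(arm):
    # Unit-variance Gaussian reward: a modeling assumption for this sketch.
    return true_means[arm] + rng.normal()

def run(policy):
    counts = np.ones(K)                          # one forced pull per arm
    sums = np.array([pull(a) for a in range(K)])
    for t in range(K, T):
        means = sums / counts
        if policy == "ucb":
            # Deterministic optimism: empirical mean plus a confidence bonus.
            index = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        else:
            # Gaussian Thompson sampling: sample each mean from its posterior.
            index = rng.normal(means, 1.0 / np.sqrt(counts))
        arm = int(np.argmax(index))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return T * true_means.max() - sums.sum()     # crude realized-regret proxy

print("UCB regret:", round(run("ucb"), 1), " TS regret:", round(run("ts"), 1))
```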

First, we present the first unified regret analysis for perturbation algorithms in the stochastic multi-armed bandit problem. Our analysis relies on the simple but powerful observation that Thompson sampling with Gaussian priors and rewards can also be interpreted as a perturbation algorithm with Gaussian perturbations. We are able to generalize both the upper and lower bounds from the work of Agrawal and Goyal (2013) in two respects: (1) from the special Gaussian perturbations to general sub-Weibull or bounded perturbations, and (2) from the special Gaussian rewards to general sub-Gaussian rewards.
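As an illustration of this perturbation view, here is a hedged sketch (a simplification for exposition, not the paper's exact algorithm): each arm's index is its empirical mean plus a zero-mean perturbation scaled by 1/sqrt(n); drawing the perturbation from N(0, 1) recovers Gaussian Thompson sampling, while a bounded draw (e.g. uniform) gives one example of the more general perturbation algorithms analyzed.

```python
# Minimal sketch of the "perturbation" view of randomized exploration:
# index(arm) = empirical mean + Z / sqrt(n_arm), with Z a zero-mean draw.
import numpy as np

rng = np.random.default_rng(1)

def perturbed_index(means, counts, perturbation="gaussian"):
    if perturbation == "gaussian":
        z = rng.normal(size=means.shape)               # Gaussian TS as a special case
    elif perturbation == "uniform":
        z = rng.uniform(-1.0, 1.0, size=means.shape)   # an example of a bounded perturbation
    else:
        raise ValueError(perturbation)
    return means + z / np.sqrt(counts)

# Example: choose the next arm from running statistics.
means = np.array([0.4, 0.55, 0.1])
counts = np.array([20, 20, 5])
next_arm = int(np.argmax(perturbed_index(means, counts, "uniform")))
```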

Next, we examine stationary stochastic linear bandits and explicate the role of two perturbation approaches in overcoming the conservatism that UCB-type algorithms chronically suffer from in practice. In one approach, we replace optimism with simple randomization when using confidence sets. In the other, we add random perturbations to the current estimate before maximizing the expected reward. In non-stationary stochastic linear bandits, we propose two randomized algorithms, D-RandLinUCB and D-LinTS, that gracefully adjust to time variation in the true parameter by combining a weighted least squares estimator with the two perturbation approaches, respectively, and we investigate the trade-off between statistical optimality and computational efficiency that they exhibit.
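The two randomization ideas can be sketched as follows (a hypothetical illustration under standard linear-bandit assumptions, not the defended D-RandLinUCB/D-LinTS algorithms): given a regularized least squares estimate and its covariance, one either scales the usual confidence width by a random multiplier, or perturbs the estimate itself and then acts greedily, as in linear Thompson sampling.

```python
# Minimal sketch of two randomization ideas for linear bandits:
# (a) randomize the confidence width, or (b) perturb the estimate itself.
import numpy as np

rng = np.random.default_rng(2)

def choose_arm(arms, theta_hat, V_inv, beta, mode="perturb-estimate"):
    if mode == "randomize-width":
        # (a) Randomized confidence set: scale the exploration bonus randomly.
        z = abs(rng.normal())
        scores = [a @ theta_hat + z * beta * np.sqrt(a @ V_inv @ a) for a in arms]
    else:
        # (b) Perturb the estimate, then maximize the (perturbed) expected reward.
        theta_tilde = rng.multivariate_normal(theta_hat, beta ** 2 * V_inv)
        scores = [a @ theta_tilde for a in arms]
    return int(np.argmax(scores))

# Toy usage with d = 2 and three candidate actions.
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
theta_hat = np.array([0.3, 0.5])
V_inv = np.linalg.inv(np.eye(2) * 10.0)
print(choose_arm(arms, theta_hat, V_inv, beta=1.0))
```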