When / Where
All occurrences of this event have passed.
This listing is displayed for historical purposes.

Presented By: Financial/Actuarial Mathematics Seminar - Department of Mathematics

Convergence Analysis of Discrete Sampling in Continuous-Time Reinforcement Learning and High-Dimensional Numerical Integration

Du Ouyang, Tsinghua

Stochastic policies (also known as relaxed controls) are widely used in continuous-time Reinforcement Learning (RL) algorithms. However, a critical disconnect remains between theory and practice. The theoretical aggregated dynamics, driven by averaged coefficients, provide a convenient basis for deriving RL algorithms but cannot be directly implemented. Physical execution requires the agent to sample concrete actions from the policy. Since continuously sampling independent actions poses significant mathematical and computational challenges, practical implementation must rely on discrete sampling. Yet, for general diffusion processes, the accuracy of such discretely sampled dynamics has lacked rigorous theoretical justification.

In this talk, I will bridge this gap by introducing and rigorously analyzing a policy execution framework that samples actions from a stochastic policy at discrete time points and implements them as piecewise constant controls. We prove that as the sampling mesh size tends to zero, the controlled state process converges weakly to the dynamics with coefficients aggregated according to the stochastic policy. We explicitly quantify the convergence rate based on the regularity of the coefficients and establish an optimal first-order convergence rate for sufficiently regular coefficients. Additionally, we prove a 1/2-order weak convergence rate that holds uniformly over the sampling noise with high probability, and establish a 1/2-order pathwise convergence rate for each realization of the system noise in the absence of volatility control. Building on these results, we analyze the bias and variance of various policy evaluation and policy gradient estimators based on discrete-time observations. Our results provide theoretical justification for the exploratory stochastic control framework in [H. Wang, T. Zariphopoulou, and X.Y. Zhou, J. Mach. Learn. Res., 21 (2020), pp. 1-34].
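As a rough, self-contained illustration of the setup described above (not code from the talk), the Python sketch below compares an Euler-Maruyama simulation of the aggregated dynamics, which use the policy-averaged drift, against a discretely sampled execution that draws a concrete action at each grid point and holds it constant over the step. The toy model, the Gaussian policy, and all parameter values are hypothetical choices for illustration; the printed gap in a test statistic should shrink as the sampling mesh is refined, mirroring the weak convergence result.

```python
import numpy as np

# Toy illustration (not code from the talk): controlled SDE
#   dX_t = a_t dt + sigma dW_t,  with actions a_t ~ pi(.|x) = N(theta * x, lam).
# "Aggregated" dynamics use the policy-averaged drift theta * x directly;
# "sampled" execution draws a concrete action at each grid point t_k and
# holds it constant on [t_k, t_{k+1}) (a piecewise constant control).
# All coefficients, the Gaussian policy, and parameter values are hypothetical.

theta, lam, sigma, x0, T = -1.0, 0.5, 0.3, 1.0, 1.0

def simulate(n_steps, n_paths, sampled, seed=0):
    rng_w = np.random.default_rng(seed)      # Brownian increments (common random numbers)
    rng_a = np.random.default_rng(seed + 1)  # action-sampling noise
    dt = T / n_steps
    x = np.full(n_paths, x0)
    for _ in range(n_steps):
        dw = rng_w.normal(scale=np.sqrt(dt), size=n_paths)
        if sampled:
            a = rng_a.normal(theta * x, np.sqrt(lam))  # draw an action, freeze it over the step
        else:
            a = theta * x                              # policy-averaged (aggregated) drift
        x = x + a * dt + sigma * dw                    # Euler-Maruyama step
    return x

# Weak-error proxy: gap in E[f(X_T)] for f(x) = x^2; it should shrink as the mesh is refined.
for n in (10, 100, 1000):
    gap = abs(np.mean(simulate(n, 50_000, True) ** 2) -
              np.mean(simulate(n, 50_000, False) ** 2))
    print(f"steps={n:5d}  |E[X_T^2] gap| ~ {gap:.5f}")
```

Because both runs share the same Brownian seed, the Monte Carlo noise largely cancels in the comparison, so the printed gap tracks the effect of discrete action sampling rather than simulation error.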

Finally, I will also briefly discuss my research on Quasi-Monte Carlo sampling methods for efficient computation in high-dimensional numerical integration.
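As a small illustrative aside (again hypothetical, not material from the talk), the sketch below contrasts plain Monte Carlo with scrambled Sobol Quasi-Monte Carlo sampling on a toy 16-dimensional integrand whose exact integral is 1, using SciPy's qmc module.

```python
import numpy as np
from scipy.stats import qmc  # Sobol sequences (SciPy >= 1.7)

# Toy 16-dimensional integrand over [0,1]^d with known integral 1:
#   f(u) = prod_j (1 + (u_j - 0.5) / j)
# The integrand and dimension are hypothetical choices for illustration.
d = 16

def f(u):
    j = np.arange(1, d + 1)
    return np.prod(1.0 + (u - 0.5) / j, axis=1)

n = 2 ** 12  # a power of two keeps the Sobol point set well balanced

mc_points = np.random.default_rng(0).random((n, d))           # plain Monte Carlo points
qmc_points = qmc.Sobol(d=d, scramble=True, seed=0).random(n)  # scrambled Sobol points

print("plain MC error :", abs(f(mc_points).mean() - 1.0))
print("Sobol QMC error:", abs(f(qmc_points).mean() - 1.0))
```

At the same sample size, the low-discrepancy Sobol points typically yield a noticeably smaller integration error than plain Monte Carlo, which is the kind of efficiency gain Quasi-Monte Carlo methods target in high-dimensional integration.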

Livestream Information

Livestream: January 28, 2026 (Wednesday), 4:00pm
Joining Information Not Yet Available
