Presented By: Department of Statistics
Statistics Department Seminar Series: Yuting Wei, Associate Professor, Department of Statistics & Data Science, University of Pennsylvania
Efficient Sampling with Diffusion Models: Sharp and Adaptive Guarantees
Score-based diffusion models have become a cornerstone of modern generative AI. While recent works aim to develop sharp convergence guarantees, the iteration complexity in existing analyses typically scales with the ambient data dimension $d$ of the target distribution, leading to overly conservative theory that fails to explain their practical efficiency. This motivates us to understand how diffusion models achieve sampling speed-ups by automatically exploiting the intrinsic low dimensionality of data, for both continuous and discrete distributions.
This talk explores two key scenarios: (1) For a broad class of continuous distributions with intrinsic dimension $k$, we show that the iteration complexity of the denoising diffusion probabilistic model (DDPM) scales nearly linearly with $k$, which is optimal under the KL divergence metric; (2) For masking discrete diffusions, under a continuous-time Markov chain (CTMC) formulation, we introduce a modified $\tau$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity, termed the \emph{effective total correlation}, which is upper bounded by $d \log S$ (with $S$ the vocabulary size) but can be sublinear or even constant for structured discrete distributions.
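For intuition on the $d \log S$ benchmark, one may compare against the classical total correlation of a discrete random vector $X = (X_1, \dots, X_d)$ whose coordinates each take values in a vocabulary of size $S$; the \emph{effective} total correlation introduced in the talk refines this classical quantity. A standard chain of bounds gives
$$
\mathrm{TC}(X) \;=\; \mathrm{KL}\!\left(p_X \,\Big\|\, \textstyle\prod_{i=1}^{d} p_{X_i}\right) \;=\; \sum_{i=1}^{d} H(X_i) \;-\; H(X_1, \dots, X_d) \;\le\; \sum_{i=1}^{d} H(X_i) \;\le\; d \log S,
$$
since the joint entropy is nonnegative and each marginal entropy is at most $\log S$.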