Presented By: Department of Statistics Dissertation Defenses

Bayesian Generative Modeling of Latent Subpopulations with Nonparametric Distributions

Yilei Zhang

Across many scientific domains, researchers increasingly collect large heterogeneous datasets containing multiple meaningful subpopulations whose labels are unavailable. These subpopulations may be related in complex ways, and each may exhibit rich internal structure. Scientific analysis often requires not only assigning observations to latent subpopulations, but also characterizing the distributional structure within each subpopulation. Mixture models provide a natural framework for this goal. However, most existing work assumes that component distributions belong to specified parametric families, which are almost always misspecified in practice. Capturing complex subpopulation structures therefore requires extending mixture models to allow nonparametric component distributions. This extension immediately raises fundamental challenges of identifiability and inference: since only the overall population distribution is observed, it is unclear what should count as a distinct subpopulation; when components are highly flexible, it is unclear whether they can be separated, especially in overlapping regions; and even when separation is theoretically possible, reliably estimating latent subpopulations remains a major inferential challenge. In this dissertation, we address these theoretical and methodological challenges within a systematic Bayesian nonparametric framework.
First, we develop a unified framework based on mixtures of Dirichlet process mixtures (MDPMs) for two classes of nonparametric mixture structures: one in which components’ high-density regions are spatially differentiated, and another in which components may fully overlap but are distinguished by contrasting density levels. We develop scalable algorithms and evaluate them through simulations and real-data applications in univariate and multivariate settings, showing that component distributions can be accurately recovered under mild conditions.
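To make the MDPM generative structure concrete, the following is a rough illustrative sketch only, not the dissertation's model: an outer finite mixture over latent subpopulations in which each component is itself a truncated (stick-breaking) Dirichlet process mixture of Gaussians. All function names, parameter values, and the choice of Gaussian kernels are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining
    weights /= weights.sum()  # renormalize after truncation
    return weights

def sample_mdpm(n, outer_weights, alpha, truncation, rng):
    """Draw n observations from a mixture of (truncated) DP mixtures
    of Gaussians; each outer component has its own DP weights and atoms."""
    K = len(outer_weights)
    comps = []
    for k in range(K):
        w = stick_breaking(alpha, truncation, rng)
        mus = rng.normal(loc=3.0 * k, scale=1.0, size=truncation)  # atoms near subpopulation k
        comps.append((w, mus))
    z = rng.choice(K, size=n, p=outer_weights)  # latent subpopulation labels
    x = np.empty(n)
    for i, k in enumerate(z):
        w, mus = comps[k]
        j = rng.choice(truncation, p=w)  # inner DP atom for this draw
        x[i] = rng.normal(mus[j], 0.3)
    return x, z

x, z = sample_mdpm(500, [0.6, 0.4], alpha=1.0, truncation=20, rng=rng)
```

In this sketch the outer labels `z` play the role of the latent subpopulations, while each inner DP mixture gives that subpopulation a flexible, nonparametric within-component density.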
Second, we extend the approach to multivariate settings where component high-density regions are spatially differentiated but not convexly separable. To handle complex density-contour geometry, we approximate these regions by unions of hypercubes and construct MDPMs over the resulting coverings, allowing the model to learn component distributions with complex latent-support geometries. Simulation studies demonstrate strong performance across diverse settings.
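As a rough, self-contained illustration of covering a non-convex high-density region with axis-aligned hypercubes (not the dissertation's construction; `hypercube_cover` and its grid-counting rule are invented here), one can snap points to a regular grid and keep the cells that hold enough mass:

```python
import numpy as np

rng = np.random.default_rng(1)

def hypercube_cover(points, side, threshold):
    """Cover the high-density region of a point cloud with axis-aligned
    hypercubes: assign each point to a grid cell of width `side` and keep
    the cells containing at least `threshold` points."""
    cells = np.floor(points / side).astype(int)
    uniq, counts = np.unique(cells, axis=0, return_counts=True)
    kept = uniq[counts >= threshold]
    # each row is the lower corner of one kept hypercube
    return kept * side

# a crescent-shaped (non-convex) cloud in 2D
theta = rng.uniform(0.0, np.pi, size=2000)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
pts += rng.normal(0.0, 0.05, size=(2000, 2))
cover = hypercube_cover(pts, side=0.2, threshold=10)
```

The union of the kept cells traces the crescent even though no single convex set separates it from its complement, which is the geometric situation the second contribution targets.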
Third, we provide theoretical support for the framework by establishing identifiability conditions for the first class of mixture structures. We further derive posterior contraction rates under the MDPM framework. These results show that MDPMs preserve the efficiency of learning the overall population density relative to a single Dirichlet process mixture, while enabling latent nonparametric component distributions to be learned at a nearly polynomial rate, substantially faster than the typical rates of nonparametric deconvolution.
