BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//UM//UM*Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Detroit
TZURL:http://tzurl.org/zoneinfo/America/Detroit
X-LIC-LOCATION:America/Detroit
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260504T113544
DTSTART;TZID=America/Detroit:20260506T140000
DTEND;TZID=America/Detroit:20260506T160000
SUMMARY:Lecture / Discussion: Bayesian Generative Modeling of Latent Subpopulations with Nonparametric Distributions
DESCRIPTION:Across many scientific domains\, researchers increasingly collect large heterogeneous datasets containing multiple meaningful subpopulations whose labels are unavailable. These subpopulations may be related in complex ways\, and each may exhibit rich internal structure. Scientific analysis often requires not only assigning observations to latent subpopulations\, but also characterizing the distributional structure within each subpopulation. Mixture models provide a natural framework for this goal. However\, most existing work assumes that component distributions belong to specified parametric families\, which are almost always misspecified in practice. Capturing complex subpopulation structures therefore requires extending mixture models to allow nonparametric component distributions. This extension immediately raises fundamental challenges of identifiability and inference: since only the overall population distribution is observed\, it is unclear what should count as a distinct subpopulation\; when components are highly flexible\, it is unclear whether they can be separated\, especially in overlapping regions\; and even when separation is theoretically possible\, reliably estimating latent subpopulations remains a major inferential challenge. In this dissertation\, we address these theoretical and methodological challenges within a systematic Bayesian nonparametric framework.\nFirst\, we develop a unified framework based on mixtures of Dirichlet process mixtures (MDPMs) for two classes of nonparametric mixture structures: one in which components’ high-density regions are spatially differentiated\, and another in which components may fully overlap but are distinguished by contrasting density levels. We develop scalable algorithms and evaluate them through simulations and real-data applications in univariate and multivariate settings\, showing that component distributions can be accurately recovered under mild conditions.\nSecond\, we extend the approach to multivariate settings where component high-density regions are spatially differentiated but not convexly separable. To handle complex density-contour geometry\, we approximate these regions by unions of hypercubes and construct MDPMs over the resulting coverings\, allowing the model to learn component distributions with complex latent-support geometries. Simulation studies demonstrate strong performance across diverse settings.\nThird\, we provide theoretical support for the framework by establishing identifiability conditions for the first class of mixture structures. We further derive posterior contraction rates under the MDPM framework. These results show that MDPMs preserve the efficiency of learning the overall population density relative to a single Dirichlet process mixture\, while enabling latent nonparametric component distributions to be learned at a nearly polynomial rate\, substantially faster than the typical rates of nonparametric deconvolution.
UID:148073-21902919@events.umich.edu
URL:https://events.umich.edu/event/148073
CLASS:PUBLIC
STATUS:CONFIRMED
CATEGORIES:Dissertation
LOCATION:West Hall - 438
CONTACT:
END:VEVENT
END:VCALENDAR