Presented By: Department of Statistics Dissertation Defenses
Statistical Foundations for Microplastic Identification: Efficient Sampling and Distribution-Free Uncertainty Quantification
Eduardo Ochoa Rivera
Microplastics are an emerging pollutant of global concern, with particles documented in environments across the world. Reliable identification of microplastic particles is essential for quantifying their prevalence and assessing environmental exposure, yet current spectroscopic identification pipelines face several statistical challenges. Measurements can be costly and time-consuming, commonly used spectral matching procedures often lack formal guarantees, and environmental samples can vary over time and across locations. This dissertation develops statistical and machine learning methods for adaptive sampling and uncertainty quantification, with a focus on improving the reliability and efficiency of microplastic spectral identification.
First, we study adaptive sampling through pure exploration problems in logistic bandits. We introduce Logistic Track-and-Stop, the first track-and-stop algorithm for general pure exploration problems under a logistic bandit model. The method combines adaptive sampling with a stopping rule based on generalized likelihood ratio statistics and asymptotically matches an approximation to the instance-specific lower bound on expected sample complexity.
Second, we develop and apply conformal prediction methods for microplastic spectral identification. We first apply conformal prediction to popular database matching pipelines, highlighting the limitations of practitioner-selected similarity thresholds. We then extend the conformal prediction framework to ensemble and multi-view settings by aggregating nonconformity scores across multiple models or data modalities. In particular, we apply multi-view conformal prediction to photothermal infrared and Raman spectra, producing more efficient and robust prediction sets than those obtained from single-view methods.
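As a rough illustration of the idea, the sketch below runs split conformal prediction on nonconformity scores aggregated across two views. It is not the dissertation's method: the synthetic data, the helper names (`conformal_quantile`, `simulate_view`), the choice of "one minus similarity" as the nonconformity score, and the plain average used to combine views are all assumptions made for this example.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = cal_scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: every label must be kept
    return np.sort(cal_scores)[k - 1]

def simulate_view(rng, labels, n_classes, noise):
    """Hypothetical view: noisy similarities, higher on average for the true class."""
    sims = rng.normal(0.0, noise, size=(labels.size, n_classes))
    sims[np.arange(labels.size), labels] += 1.0
    return 1.0 - sims  # nonconformity score = 1 - similarity (an assumption)

rng = np.random.default_rng(0)
n_cal, n_test, n_classes, alpha = 500, 200, 5, 0.1
y_cal = rng.integers(0, n_classes, size=n_cal)
y_test = rng.integers(0, n_classes, size=n_test)

# Aggregate two synthetic views by averaging their scores (assumed rule).
cal_scores_all = 0.5 * (simulate_view(rng, y_cal, n_classes, 0.3)
                        + simulate_view(rng, y_cal, n_classes, 0.5))
test_scores_all = 0.5 * (simulate_view(rng, y_test, n_classes, 0.3)
                         + simulate_view(rng, y_test, n_classes, 0.5))

# Calibrate on the aggregated score of the true label, then threshold test scores.
qhat = conformal_quantile(cal_scores_all[np.arange(n_cal), y_cal], alpha)
pred_sets = test_scores_all <= qhat  # boolean mask, shape (n_test, n_classes)

coverage = pred_sets[np.arange(n_test), y_test].mean()
avg_size = pred_sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_size:.2f}")
```

Because the threshold is calibrated from held-out data rather than chosen by hand, the resulting sets target coverage of at least 1 - alpha under exchangeability, which is the contrast with a practitioner-selected similarity cutoff.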
Third, we study online conformal prediction across multiple coverage levels. We leverage online optimization algorithms to enforce nestedness of prediction sets across the full risk spectrum while controlling quantile estimation error. Beyond improving interpretability, jointly estimating multiple coverage levels can improve statistical efficiency by enforcing non-crossing constraints and sharing information across quantiles.
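One simple way to picture this setting is sketched below: each coverage level tracks its own threshold with an online subgradient step on the pinball loss, and the thresholds are re-sorted after every step so that the implied prediction sets stay nested. This is a minimal sketch under stated assumptions, not the algorithm developed in the dissertation. The function name `online_nested_thresholds`, the step size, the sorting (rearrangement) step, and the synthetic score stream are all hypothetical.

```python
import numpy as np

def online_nested_thresholds(scores, levels, eta=0.05, q0=0.0):
    """Track one score threshold per coverage level; `levels` must be ascending."""
    levels = np.asarray(levels, dtype=float)
    q = np.full(levels.size, q0, dtype=float)
    history = np.empty((len(scores), levels.size))
    covered = np.empty((len(scores), levels.size), dtype=bool)
    for t, s in enumerate(scores):
        history[t] = q        # thresholds in force before seeing score s
        covered[t] = s <= q   # would the set at each level have covered?
        # Pinball-loss subgradient step: raise q after a miss, lower it after a hit.
        q = q + eta * (levels - covered[t])
        # Rearrangement step (assumed): sorting restores non-crossing thresholds.
        q = np.sort(q)
    return history, covered

rng = np.random.default_rng(1)
levels = np.array([0.80, 0.90, 0.95])
scores = np.abs(rng.normal(size=5000))  # synthetic nonconformity scores

history, covered = online_nested_thresholds(scores, levels)
print("long-run coverage per level:", covered.mean(axis=0).round(3))
```

Since the thresholds are ordered at every step, a label included at a lower coverage level is always included at every higher level, so the prediction sets are nested by construction.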
Together, these contributions provide tools for accelerating microplastic identification and improving the reliability of the resulting scientific conclusions.