Presented By: Department of Statistics Dissertation Defenses
Design-Based Causal Inference for Clustered Randomized Experiments and Observational Studies
Xinhe Wang
Modern empirical research increasingly relies on comparative studies with complex designs, including stratified and clustered treatment assignment, multiple treatment arms, and observational samples. These features arise naturally in education, public health, policy evaluation, and many other fields, but they also complicate causal estimation and inference by undermining the validity of familiar estimators and standard errors.
The first part of the dissertation studies clustered randomized trials with heterogeneous cluster sizes. We show that the commonly used estimators that average stratum-specific treatment-control contrasts can be inconsistent for the average treatment effect in such settings, a problem that has received limited attention. We establish consistency of a simple alternative, the Hájek estimator, under standard asymptotic regimes, develop an asymptotically conservative variance estimator valid under arbitrary stratum sizes, and propose a score-type test with improved small- to moderate-sample performance.
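For readers unfamiliar with the Hájek estimator, the following is a minimal sketch of the textbook ratio form for a clustered experiment, not the dissertation's own definition or code. It assumes each cluster contributes an outcome total and a size, and that each cluster's assignment probability is known from the design. The function name `hajek_ate` and all argument names are hypothetical.

```python
def hajek_ate(cluster_totals, cluster_sizes, treated, prob_treated):
    """Hajek (ratio) estimate of the individual-level average treatment effect.

    Illustrative sketch only. Each arm's mean is an inverse-probability-weighted
    sum of cluster outcome totals divided by the same weighted sum of cluster
    sizes, so large clusters count in proportion to the units they contain.
    """
    num_t = den_t = num_c = den_c = 0.0
    for total, size, z, p in zip(cluster_totals, cluster_sizes, treated, prob_treated):
        if z:
            w = 1.0 / p              # weight for a treated cluster
            num_t += w * total
            den_t += w * size
        else:
            w = 1.0 / (1.0 - p)      # weight for a control cluster
            num_c += w * total
            den_c += w * size
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("each arm needs at least one cluster with positive size")
    return num_t / den_t - num_c / den_c


# Four clusters of unequal size, two treated, each assigned with probability 0.5.
estimate = hajek_ate(
    cluster_totals=[10.0, 30.0, 4.0, 12.0],
    cluster_sizes=[2, 6, 2, 6],
    treated=[1, 1, 0, 0],
    prob_treated=[0.5, 0.5, 0.5, 0.5],
)
print(estimate)  # 3.0
```

In this toy input the treated arm's weighted mean is 80/16 = 5 and the control arm's is 32/16 = 2, giving an estimate of 3. Because both numerator and denominator are weighted sums over clusters, the estimate is a ratio of cluster-level totals rather than an average of per-stratum contrasts.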
The second part of the dissertation extends this framework to multi-arm stratified clustered experiments, where inference must account for not only treatment-versus-control comparisons, but also comparisons among active treatments. We show that regression adjustment admits a unified two-stage representation, allowing adjustment models to be fit on the full sample, a subset of units, or external data. We establish multivariate asymptotic theory for vectors of covariate-adjusted Hájek estimators and develop a covariance matrix estimator that is asymptotically conservative in the positive semidefinite order.
The third part of the dissertation connects design-based inference for randomized experiments with matched and stratified observational studies. We compare the observational assignment mechanism to an emulated stratified clustered randomized trial with the same realized blocking structure. Assuming sufficient within-block homogeneity in treatment propensities, we show that a sandwich-type variance estimator for the covariate-adjusted Hájek estimator is asymptotically conservative.
Together, these results provide a unified design-based framework for estimation and inference in randomized and observational comparative studies. The dissertation contributes both diagnostic insight, showing when common estimators fail, and constructive methodology for valid causal inference in complex empirical designs.