Presented By: Department of Statistics Dissertation Defenses
Topics in Causal Inference Addressing Practical Data Challenges
Charlotte Mann
Abstract: Evaluating the effects of interventions is important to inform policy decisions across many disciplines. This dissertation is motivated by applications including evaluating medical treatments, state-level policy decisions, and educational curricular interventions. In each of these applications, there are practical barriers to using previously developed effect estimation methods with available data. We present estimation strategies that overcome these challenges or offer practical benefits of their own, beyond those of typical methods.
First, we apply a method that integrates experimental (RCT) and observational data for treatment effect estimation to a paired cluster-randomized field experiment in education (the CTAI study). In this data integration approach, we first fit an outcome model on the auxiliary observational data, and then use predictions from that model as a “super covariate” when estimating treatment effects with the RCT. The application to the CTAI study demonstrates the process and practical benefits of fitting an auxiliary model, as well as the efficacy of the data integration method in a relevant applied setting.
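To make the workflow concrete, here is a minimal sketch in Python of the “super covariate” idea, using hypothetical observational and RCT data and off-the-shelf modeling and regression routines; it illustrates the general procedure, not the dissertation's exact estimator.

```python
# Minimal sketch of the "super covariate" workflow, assuming hypothetical
# observational data (obs_X, obs_y) and RCT data (rct_X, rct_y, rct_t).
# Illustrative only; the dissertation's estimator may differ.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
obs_X, obs_y = rng.normal(size=(500, 3)), rng.normal(size=500)   # auxiliary data
rct_X = rng.normal(size=(100, 3))                                # RCT covariates
rct_t = rng.integers(0, 2, size=100)                             # treatment indicator
rct_y = rng.normal(size=100) + 0.5 * rct_t                       # RCT outcomes

# Step 1: fit an outcome model on the auxiliary observational data.
aux_model = GradientBoostingRegressor().fit(obs_X, obs_y)

# Step 2: use its predictions as a single "super covariate" in a
# covariate-adjusted treatment effect estimate within the RCT.
super_cov = aux_model.predict(rct_X)
design = sm.add_constant(np.column_stack([rct_t, super_cov - super_cov.mean()]))
fit = sm.OLS(rct_y, design).fit(cov_type="HC2")
print("adjusted treatment effect estimate:", fit.params[1])
```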
While we were able to access both the auxiliary and RCT data for the CTAI study, this may not always be possible in practice. For example, during the first year of the COVID-19 pandemic, large patient databases existed, but access to them was restricted. To overcome this challenge, we aimed to share a useful auxiliary model for analyses of COVID-19 RCTs. In collaboration with physicians, we developed a risk model for in-hospital COVID-19 mortality. We discuss considerations for developing a model that would be useful both for data integration in diverse (and unknown) RCTs and for physicians in practice.
We additionally explore more general approaches to integrating observational and experimental data for causal inference when data privacy is a concern. Specifically, we consider ways to release observational datasets that limit disclosure of confidential information yet can still be used in two treatment effect estimators that leverage auxiliary data for generalizability or precision. We find that integrating privacy-transformed observational data still improves the generalizability or precision of treatment effect estimates beyond what is achieved using the RCT data alone.
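As a rough illustration of the setting, the sketch below applies a simple noise-addition transformation to a hypothetical observational dataset before release; the specific release mechanisms and estimators studied in the dissertation are not reproduced here.

```python
# Minimal sketch of releasing privacy-transformed observational data,
# using additive noise purely for illustration. The released dataset can
# then stand in for the original auxiliary data in estimators that borrow
# it for precision or generalizability (e.g., the "super covariate"
# workflow sketched above).
import numpy as np

rng = np.random.default_rng(1)
obs_X, obs_y = rng.normal(size=(500, 3)), rng.normal(size=500)  # confidential data

noise_scale = 0.5                                   # hypothetical privacy parameter
obs_X_release = obs_X + rng.normal(scale=noise_scale, size=obs_X.shape)
obs_y_release = obs_y + rng.normal(scale=noise_scale, size=obs_y.shape)
```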
Researchers may also want to use data that cannot be fully released due to privacy concerns for treatment effect estimation directly. For example, analyses of health policies often rely on data provided by US data agencies, but publicly available mortality outcomes in the US are censored: counts of 10 or fewer deaths are suppressed. This can result in substantial missing data for county-level analyses. We address this common challenge to policy evaluations by presenting a rank-sum test statistic that accommodates outcomes censored in this way. We apply the rank-sum test to estimate the county-level effect of Medicaid expansion under the Affordable Care Act on US mortality in 2014, with attention to observational study design and additional statistical challenges that arise with county-level mortality counts.
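One simple way a rank-sum statistic can accommodate suppressed counts is to treat all censored counties as tied below every observed count and assign midranks; the sketch below implements that idea with a permutation null on hypothetical data, as an illustration rather than the dissertation's exact test.

```python
# Minimal sketch of a rank-sum test with suppressed (censored) counts:
# censored counties are tied below all observed counts, ties get midranks,
# and the null distribution comes from permuting treatment labels.
import numpy as np
from scipy.stats import rankdata

def censored_rank_sum(y, censored, treated, n_perm=10_000, seed=0):
    """y: counts (ignored where censored); censored, treated: boolean masks."""
    # Place censored values below the smallest observed count so they share
    # the lowest midrank; rankdata assigns midranks to ties by default.
    filled = np.where(censored, np.min(y[~censored]) - 1, y)
    ranks = rankdata(filled)
    stat = ranks[treated].sum()

    rng = np.random.default_rng(seed)
    perm_stats = np.array([
        ranks[rng.permutation(treated)].sum() for _ in range(n_perm)
    ])
    p = np.mean(np.abs(perm_stats - perm_stats.mean()) >= abs(stat - perm_stats.mean()))
    return stat, p

# Hypothetical example: 6 counties, two with suppressed counts.
y        = np.array([25, 0, 14, 0, 40, 18])      # 0 = placeholder where suppressed
censored = np.array([False, True, False, True, False, False])
treated  = np.array([True, True, True, False, False, False])
print(censored_rank_sum(y, censored, treated))
```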
Finally, motivated by the CTAI study, we present a general framework for design-based estimation of average individual-level effects in paired cluster-randomized experiments (pCRTs). Although pCRTs are common, it is surprisingly unclear how best to analyze them. Our framework clarifies the bias-variance trade-off among different treatment effect estimators and emphasizes the benefits of covariate adjustment for estimation with pCRTs. This analysis, together with extensive simulation studies, provides guidance for how to analyze pCRTs in practice.
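To give a flavor of the estimator choices involved, the sketch below contrasts two simple design-based estimators on hypothetical pair-level summaries: one weights each pair equally, the other weights pairs by the number of individuals they contain. It illustrates the kind of weighting trade-off the framework compares, not its exact formulation.

```python
# Minimal sketch of two design-based estimators for a paired
# cluster-randomized experiment, using hypothetical pair-level summaries.
import numpy as np

ybar_t = np.array([5.2, 4.8, 6.1, 5.5])   # treated-cluster mean outcomes, per pair
ybar_c = np.array([4.9, 4.5, 5.3, 5.0])   # control-cluster mean outcomes, per pair
n_t    = np.array([30, 45, 25, 60])       # treated-cluster sizes
n_c    = np.array([35, 40, 30, 55])       # control-cluster sizes

pair_diffs = ybar_t - ybar_c

# Estimator 1: unweighted average of pair differences (equal pair weights).
tau_pair = pair_diffs.mean()

# Estimator 2: pairs weighted by the number of individuals they contain,
# targeting an average individual-level effect more directly.
w = n_t + n_c
tau_indiv = np.sum(w * pair_diffs) / w.sum()

print(f"equal-pair-weight estimate: {tau_pair:.3f}")
print(f"size-weighted estimate:     {tau_indiv:.3f}")
```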