Presented By: Department of Statistics
Statistics Department Seminar Series: Richard Guo, Research Associate, Statistical Laboratory, University of Cambridge
"Harnessing Extra Randomness: Replicability, Flexibility and Causality"
Abstract: Many modern statistical procedures are randomized in the sense that the output is a random function of data. For example, many procedures employ data splitting, which randomly divides the dataset into disjoint parts for separate purposes. Despite their flexibility and popularity, data splitting and other constructions of randomized procedures have obvious drawbacks. First, two analyses of the same dataset may lead to different results due to the extra randomness introduced. Second, randomized procedures typically lose statistical power because the entire sample is not fully utilized.
To address these drawbacks, in this talk, I will study how to properly combine the results from multiple realizations (such as through multiple data splits) of a randomized procedure. I will introduce rank-transformed subsampling as a general method for delivering large-sample inference of the combined result under minimal assumptions. I will illustrate the method with three applications: (1) a “hunt-and-test” procedure for detecting cancer subtypes using high-dimensional gene expression data, (2) testing the hypothesis of no direct effect in a sequentially randomized trial, and (3) calibrating cross-fit “double machine learning” confidence intervals. For these problems, our method is able to de-randomize and improve power or coverage. Moreover, in contrast to existing approaches for combining p-values, our method enjoys type-I error control that asymptotically approaches the nominal level. This new development opens up the possibility of designing procedures that explicitly randomize and de-randomize: extra randomness is introduced to make the problem easier before being removed.
This talk is based on joint work with Rajen Shah.
Bio: Richard Guo is a research associate in the Statistical Laboratory at the University of Cambridge, mentored by Rajen Shah. In Spring 2022, he was the Richard M. Karp Research Fellow in the causality program at the Simons Institute for the Theory of Computing. He received his PhD in Statistics from the University of Washington in 2021, advised by Thomas Richardson, for which he received the Z. W. Birnbaum Award. His research interests include graphical models, causal inference and replicability of data analysis.
https://unbiased.co.in/