Presented By: Michigan Program in Survey and Data Science
MPSDS JPSM Seminar Series - Flexible Formal Privacy for Public Data Curation
Jeremy Seeman - Michigan Institute for Data Science (MIDAS) and Institute for Social Research (ISR), University of Michigan
MPSDS JPSM Seminar Series
November 1, 2023
12:00 - 1:00 pm EDT
In person in Room 1070, Institute for Social Research, and via Zoom.
The Zoom call will be locked 10 minutes after the start of the presentation.
Flexible Formal Privacy for Public Data Curation
Researchers rely extensively on public datasets disseminated by official statistics agencies, universities, non-governmental organizations, and other data curators. With the increasing availability of data and computing power come increased threats to privacy, as published statistics can more easily be used to reconstruct sensitive personal data. Formal privacy (FP) methods, like differential privacy (DP), provably limit such information leakage by injecting carefully chosen randomized noise into published statistics. However, the way DP accounts for privacy degradation requires that this noise be injected into every statistic that depends on the confidential dataset. This requirement fails to reflect data curators' needs; social, legal, or ethical requirements; and complex dependency structures between public and confidential datasets. In this talk, I'll discuss statistical methodology that addresses these problems. We propose an FP framework with novel characterizations of disclosure risk for collections of statistics in which only some statistics are published with DP guarantees. We demonstrate the FP properties maintained by our proposed framework, propose data release mechanisms that satisfy our proposed definition, and prove optimality properties of downstream statistical estimators based on these mechanisms' outputs. I'll also present a few end-to-end data analysis examples in public health and surveys, showing how theoretical trade-offs between privacy, utility, and computation time manifest in practice when assessing disclosure risks and statistical utility. I'll conclude with a discussion of the implications of this work for survey researchers, focusing on opportunities to incorporate privacy by design in survey planning, experimental design, and other data collection operations.
Jeremy Seeman is a Michigan Data Science Fellow at the Michigan Institute for Data Science (MIDAS) and MPSDS. He recently received his PhD in statistics from Penn State University. Jeremy's research focuses on statistical data privacy, quantitative methods in the social sciences, and social values in data governance. He is the recipient of the U.S. Census Bureau Dissertation Fellowship and the ASA Pride Scholarship. Before joining Penn State, Jeremy completed his BS in Physics and MS in Statistics at the University of Chicago, where he was a research fellow at the Center for Data Science and Public Policy.
Livestream Information
Zoom: November 1, 2023 (Wednesday), 12:00 pm EDT
Meeting ID: 97217806877
Meeting Password: 2324