Presented By: Biomedical Engineering
Biomedical Engineering Seminar Series
"Stress testing simulation and machine learning models for virtual screening," with Matthew O'Meara
Abstract:
Generative AI has led to breakthroughs in protein structure prediction and design, building on high-quality data from the Protein Data Bank and Sequence Read Archive. An outstanding question is how effective GenAI will be for small-molecule drug discovery, and what data these models will train on. First, I will describe our work in physics-based, ultra-large-scale virtual screening and preliminary benchmarking of state-of-the-art co-folding methods for virtual screening. Then I will describe our work exploring the challenges and opportunities in leveraging diverse bioactivity data as training data: large-scale data curation, development of large-scale synthetic data sets, and a statistical framework for testing the impact of data contamination.