Presented By: Biomedical Engineering
Biomedical Engineering Seminar Series
"Machine Learning for Small Molecule Drug Development and Delivery," with Daniel Reker, PhD.

Abstract:
Machine learning algorithms are becoming increasingly powerful tools for scientific research and development. A large emphasis in the pharmaceutical community has been on applying these tools to drug discovery, where algorithms promise to accelerate the earliest stages of drug development. However, a key challenge in applying machine learning to drug discovery is the scarcity of large, high-quality datasets for many important applications. To combat this issue, active learning workflows can be deployed in which the machine learning algorithm directs additional data acquisition. These approaches can dramatically improve model performance by directly querying the data most useful for model development instead of relying on biased, human-driven data generation. In particular, pairing machine learning algorithms in yoked learning campaigns, with one model serving as the selection algorithm and another as the predictive model, can further improve these workflows, especially when using currently popular deep neural networks. Even if no new data is acquired, smart active learning algorithms can also serve as a data selection approach for subsampling existing data, improving model performance especially on low-quality datasets.
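As a rough illustration of the yoked active-learning idea described above, the sketch below pairs a random-forest selection model (using the spread of its per-tree predictions as an uncertainty signal) with a neural-network predictive model. The dataset, descriptors, and model choices are hypothetical placeholders, not the speaker's actual implementation.

```python
# Minimal sketch of a yoked active-learning loop (illustrative only).
# A cheap "selector" model chooses which compounds to label next, while a
# separate "predictor" model is the one trained for downstream use.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 64))                   # hypothetical molecular descriptors
y_pool = X_pool[:, 0] + 0.1 * rng.normal(size=500)    # hypothetical property values

labeled = list(rng.choice(len(X_pool), size=10, replace=False))  # small seed set
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

selector = RandomForestRegressor(n_estimators=100, random_state=0)
predictor = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)

for _ in range(20):                                   # 20 acquisition rounds
    selector.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty estimate: spread of per-tree predictions on the unlabeled pool.
    per_tree = np.stack([t.predict(X_pool[unlabeled]) for t in selector.estimators_])
    pick = unlabeled[int(np.argmax(per_tree.std(axis=0)))]
    labeled.append(pick)                              # "measure" the selected compound
    unlabeled.remove(pick)

# The deep predictive model is trained on the data the selector acquired.
predictor.fit(X_pool[labeled], y_pool[labeled])
```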
Beyond selecting data, data processing approaches can further improve model performance to better support drug development. For example, novel pairwise deep learning approaches analyze the relationships between molecules to predict property differences rather than the property values of individual molecules. By quadratically increasing dataset sizes, such approaches are particularly beneficial for ADMET and drug development tasks where data availability can be severely limited when relying on in vivo readouts. These approaches can also integrate bounded measurement values, enabling algorithms to incorporate data on incompletely characterized compounds to further improve model performance.
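The pairwise idea can be made concrete with a small sketch: property differences between all ordered pairs of molecules serve as training labels, and a new molecule's value is recovered by anchoring predicted differences to reference compounds with known values. Everything here (descriptors, property, model) is a hypothetical stand-in, not the specific architecture discussed in the talk.

```python
# Minimal sketch of pairwise ("delta") learning on molecular property data.
# Instead of learning f(x_i) ~ y_i, the model learns g(x_i, x_j) ~ y_i - y_j,
# which turns N labeled molecules into roughly N^2 training pairs.
import numpy as np
from itertools import permutations
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 32))                   # hypothetical molecular descriptors
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)    # hypothetical property values

# Build all ordered pairs: features are concatenated, labels are differences.
pairs = list(permutations(range(len(X)), 2))     # ~N^2 training examples
X_pair = np.array([np.concatenate([X[i], X[j]]) for i, j in pairs])
y_pair = np.array([y[i] - y[j] for i, j in pairs])

model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
model.fit(X_pair, y_pair)

# To estimate the property of a new molecule, predict its differences to
# reference molecules with known values and average the implied estimates.
x_new = rng.normal(size=32)
refs = range(10)
estimates = [y[r] + model.predict(np.concatenate([x_new, X[r]])[None, :])[0] for r in refs]
print(np.mean(estimates))
```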
While active learning and novel data processing techniques can significantly improve model performance in drug discovery, similar advancements are needed for drug delivery applications. The development of machine learning approaches to de-risk and improve drug delivery holds immense promise for creating superior therapeutics. However, available datasets for drug delivery applications are often even more limited, a problem further exacerbated by the complexity of delivery challenges, which often involve intricate materials and interactions. To circumvent some of these challenges, data from drug discovery can be harnessed to predict drug-excipient interactions and identify functional formulations that improve drug absorption and metabolism. Alternatively, data can be generated specifically for the task via high-throughput laboratory automation or curated from the literature using text mining. We have prototyped such advanced workflows in the context of nanoparticle development, for example by creating predictive workflows that model the in vivo tumor reduction achieved by inorganic nanoparticles and by using machine learning to guide the synthesis of novel drug-excipient nanoparticles with applications in anti-fungal and anti-cancer drug delivery. Despite current challenges and surely overinflated expectations, such case studies serve as testimony that the strategic integration of machine learning into drug development pipelines holds immense promise to accelerate, de-risk, and optimize the creation of life-saving therapeutics.
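One way to picture the drug-excipient idea is a simple pair classifier that ranks candidate formulations for experimental follow-up. The sketch below uses concatenated drug and excipient fingerprints with a random forest; the data and labels are entirely hypothetical placeholders, not a curated interaction dataset or the speaker's published workflow.

```python
# Minimal sketch of flagging drug-excipient pairs worth testing (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n_pairs = 400
drug_fp = rng.integers(0, 2, size=(n_pairs, 256))       # hypothetical drug fingerprints
excipient_fp = rng.integers(0, 2, size=(n_pairs, 256))   # hypothetical excipient fingerprints
interacts = rng.integers(0, 2, size=n_pairs)              # hypothetical interaction labels

# Represent each drug-excipient pair by concatenating the two fingerprints.
X = np.hstack([drug_fp, excipient_fp])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, interacts)

# Rank candidate formulations by predicted interaction probability.
candidates = np.hstack([rng.integers(0, 2, size=(50, 256)),
                        rng.integers(0, 2, size=(50, 256))])
ranking = np.argsort(clf.predict_proba(candidates)[:, 1])[::-1]
print(ranking[:5])   # top-ranked pairs to prioritize for experimental testing
```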