Fast uncertainty estimates in deep learning interatomic potentials DOI Open Access
Albert Zhu, Simon Batzner, Albert Musaelian, et al.

The Journal of Chemical Physics, Journal Year: 2023, Volume and Issue: 158(16)

Published: April 27, 2023

Deep learning has emerged as a promising paradigm to give access to highly accurate predictions of molecular and material properties. A common shortcoming shared by current approaches, however, is that neural networks only give point estimates of their predictions and do not come with predictive uncertainties associated with these estimates. Existing uncertainty quantification efforts have primarily leveraged the standard deviation of predictions across an ensemble of independently trained neural networks. This incurs a large computational overhead in both training and prediction, resulting in order-of-magnitude more expensive predictions. Here, we propose a method to estimate the predictive uncertainty based on a single neural network without the need for an ensemble. This allows us to obtain uncertainty estimates with virtually no additional computational overhead over standard training and inference. We demonstrate that the quality of the uncertainty estimates matches those obtained from deep ensembles. We further examine the uncertainty estimates of our methods and deep ensembles across the configuration space of our test system and compare the uncertainties to the potential energy surface. Finally, we study the efficacy of the method in an active learning setting and find the results to match an ensemble-based strategy at reduced cost.
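The ensemble baseline mentioned in this abstract (standard deviation of predictions across independently trained networks) can be sketched as follows. This is a minimal illustration, not the authors' code: `ensemble_predict` and the toy "models" are hypothetical names, each model is assumed to be a plain callable returning a scalar energy, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple


def ensemble_predict(
    models: Sequence[Callable[[float], float]], x: float
) -> Tuple[float, float]:
    """Return (mean prediction, standard deviation across ensemble members)."""
    # Every member must be evaluated, so cost grows linearly with ensemble size.
    preds = [model(x) for model in models]
    return mean(preds), pstdev(preds)


# Toy stand-ins for independently trained networks (hypothetical).
toy_models = [lambda x, s=s: s * x for s in (0.9, 1.0, 1.1)]

energy, sigma = ensemble_predict(toy_models, 2.0)
print(f"prediction={energy:.3f}, uncertainty={sigma:.3f}")
```

The loop over `models` is the overhead the abstract refers to: every prediction costs one forward pass per ensemble member.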

Language: English

Leveraging molecular structure and bioactivity with chemical language models for de novo drug design DOI Creative Commons
Michaël Moret, Irène Pachón-Angona, Leandro Cotos, et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: Jan. 7, 2023

Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating the efficacy of PI3Kγ ligands for PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.

Language: English

Citations

97

Characterizing Uncertainty in Machine Learning for Chemistry DOI Creative Commons
Esther Heid, Charles J. McGill, Florence H. Vermeire, et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 63(13), P. 4012 - 4029

Published: June 20, 2023

Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many different distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.

Language: English

Citations

43

Graph neural networks DOI
Gabriele Corso, H. Stärk, Stefanie Jegelka, et al.

Nature Reviews Methods Primers, Journal Year: 2024, Volume and Issue: 4(1)

Published: March 7, 2024

Language: English

Citations

43

Pareto optimization to accelerate multi-objective virtual screening DOI Creative Commons
Jenna C. Fromer, David Graff, Connor W. Coley, et al.

Digital Discovery, Journal Year: 2024, Volume and Issue: 3(3), P. 467 - 481

Published: Jan. 1, 2024

Pareto optimization is suited to multi-objective problems when the relative importance of objectives is not known a priori. We report an open source tool to accelerate docking-based virtual screening with strong empirical performance.
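As a rough illustration of the Pareto idea referred to above, the sketch below extracts the non-dominated set from a handful of candidates. It is not the tool reported in the paper: `dominates`, `pareto_front`, and the example scores are hypothetical, and every objective is assumed to be minimized (for instance, docking scores where lower is better).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical docking scores against two targets (lower is better for both).
scores = [(-9.1, -5.0), (-8.0, -7.5), (-7.0, -4.0), (-9.5, -3.0)]
front = pareto_front(scores)
print(front)
```

No weighting between objectives is needed here, which is the point of the first sentence of the abstract: the front keeps every candidate that represents a defensible trade-off.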

Language: English

Citations

16

Deep learning and knowledge-based methods for computer-aided molecular design—toward a unified approach: State-of-the-art and future directions DOI
Abdulelah S. Alshehri, Rafiqul Gani, Fengqi You, et al.

Computers & Chemical Engineering, Journal Year: 2020, Volume and Issue: 141, P. 107005 - 107005

Published: July 2, 2020

Language: English

Citations

100

Accelerating high-throughput virtual screening through molecular pool-based active learning DOI Creative Commons
David Graff, Eugene I. Shakhnovich, Connor W. Coley, et al.

Chemical Science, Journal Year: 2021, Volume and Issue: 12(22), P. 7866 - 7881

Published: Jan. 1, 2021

Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of 10⁸ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques, previously employed in other scientific discovery problems, can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we explore the application of these techniques to computational docking datasets and assess the impact of surrogate model architecture, acquisition function, and acquisition batch size on optimization performance. We observe significant reductions in computational costs; for example, using a directed-message passing neural network we can identify 94.8% or 89.3% of the top-50 000 ligands in a 100M member library after testing only 2.4% of candidate ligands using an upper confidence bound or greedy acquisition strategy, respectively. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
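The two acquisition strategies named in this abstract, greedy and upper confidence bound (UCB), differ only in how surrogate predictions are ranked before the next batch is sent to docking. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `beta` weight, the dictionary layout, and the toy pool are hypothetical, and a higher predicted score is assumed to be better (for example, a negated docking score).

```python
from typing import Callable, Dict, List


def greedy(mean: float, std: float) -> float:
    """Rank by the surrogate's predicted score alone."""
    return mean


def ucb(mean: float, std: float, beta: float = 2.0) -> float:
    """Rank by predicted score plus a bonus for predictive uncertainty."""
    return mean + beta * std


def select_batch(
    pool: List[Dict], acquisition: Callable[[float, float], float], batch_size: int
) -> List[str]:
    """Pick the `batch_size` unevaluated candidates with the highest utility."""
    ranked = sorted(pool, key=lambda c: acquisition(c["mean"], c["std"]), reverse=True)
    return [c["id"] for c in ranked[:batch_size]]


# Hypothetical surrogate outputs for four untested ligands.
pool = [
    {"id": "A", "mean": 9.0, "std": 0.1},
    {"id": "B", "mean": 8.5, "std": 1.0},
    {"id": "C", "mean": 8.0, "std": 0.3},
    {"id": "D", "mean": 7.0, "std": 2.0},
]

greedy_batch = select_batch(pool, greedy, batch_size=2)
ucb_batch = select_batch(pool, ucb, batch_size=2)
print(greedy_batch, ucb_batch)
```

In a full loop, the selected batch would be docked, added to the training set, and the surrogate retrained before the next round of selection.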

Language: English

Citations

95

Comparative analysis of molecular fingerprints in prediction of drug combination effects DOI Creative Commons
Bulat Zagidullin, Ziyan Wang, Yuanfang Guan, et al.

Briefings in Bioinformatics, Journal Year: 2021, Volume and Issue: 22(6)

Published: Aug. 9, 2021

Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in the prediction of drug combination sensitivity and synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cell lines. We evaluate the clustering performance of the molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.

Language: English

Citations

74

Multi-fidelity prediction of molecular optical peaks with deep learning DOI Creative Commons
Kevin P. Greenman, William H. Green, Rafael Gómez‐Bombarelli, et al.

Chemical Science, Journal Year: 2022, Volume and Issue: 13(4), P. 1152 - 1162

Published: Jan. 1, 2022

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of

Language: English

Citations

59

Critical assessment of AI in drug discovery DOI
W. Patrick Walters, Regina Barzilay

Expert Opinion on Drug Discovery, Journal Year: 2021, Volume and Issue: 16(9), P. 937 - 947

Published: April 19, 2021

Introduction: Artificial Intelligence (AI) has become a component of our everyday lives, with applications ranging from recommendations on what to buy to the analysis of radiology images. Many of the techniques originally developed for other fields such as language translation and computer vision are now being applied in drug discovery. AI has enabled multiple aspects of drug discovery including the analysis of high content screening data, and the design and synthesis of new molecules. Areas covered: This perspective provides an overview of the application of AI in several areas relevant to drug discovery including property prediction, molecule generation, image analysis, and organic synthesis planning. Expert opinion: While a variety of machine learning methods are now being routinely used to predict biological activity and ADME properties, methods of representing molecules continue to evolve. Molecule generation methods are relatively unproven but hold the potential to access new, unexplored areas of chemical space. The application of AI in drug discovery will continue to benefit from dedicated research, as well as developments in other fields. With this pairing of algorithmic advancements and high-quality data, the impact of AI in drug discovery will continue to grow in the coming years.

Language: English

Citations

58

Uncertainty-aware prediction of chemical reaction yields with graph neural networks DOI Creative Commons
Youngchun Kwon, Dongseon Lee, Youn-Suk Choi, et al.

Journal of Cheminformatics, Journal Year: 2022, Volume and Issue: 14(1)

Published: Jan. 10, 2022

In this paper, we present a data-driven method for the uncertainty-aware prediction of chemical reaction yields. The reactants and products in a chemical reaction are represented as a set of molecular graphs. The predictive distribution of the yield is modeled as a graph neural network that directly processes a set of graphs with permutation invariance. Uncertainty-aware learning and inference are applied to the model to make accurate predictions and to evaluate their uncertainty. We demonstrate the effectiveness of the proposed method on benchmark datasets with various settings. Compared to the existing methods, the proposed method improves the prediction and uncertainty quantification performance in most

Language: English

Citations

47