Journal of Catalysis, Year: 2023, Vol. 429, p. 115240
Published: Dec. 5, 2023
Language: English
Journal of Chemical Information and Modeling, Year: 2024, Vol. 64(8), pp. 2955-2970
Published: March 15, 2024
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the frontier of pretraining, holding promise as an innovative yet unexplored avenue.
Language: English
Cited by: 10
Journal of the American Chemical Society, Year: 2024, Vol. 146(22), pp. 15070-15084
Published: May 20, 2024
Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The model can be used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of drug-like molecules.
Language: English
Cited by: 9
Nature Communications, Year: 2025, Vol. 16(1)
Published: April 10, 2025
Language: English
Cited by: 1
Communications Chemistry, Year: 2024, Vol. 7(1)
Published: June 14, 2024
Recent years have seen a rapid growth in the application of various machine learning methods for reaction outcome prediction. Deep learning models have gained popularity due to their ability to learn representations directly from the molecular structure. Gaussian processes (GPs), on the other hand, provide reliable uncertainty estimates but are unable to learn representations from the data. We combine the feature learning ability of neural networks (NNs) with the uncertainty quantification of GPs in a deep kernel learning (DKL) framework to predict the reaction outcome. The DKL model is observed to obtain very good predictive performance across different input representations. It significantly outperforms standard GPs and provides performance comparable to graph neural networks, but with uncertainty estimation. Additionally, the uncertainty estimates on the predictions provided by the DKL model facilitated its incorporation as a surrogate model for Bayesian optimization (BO). The proposed method, therefore, has great potential towards accelerating reaction discovery by integrating accurate predictive models that provide reliable uncertainty estimates with BO.
Language: English
Cited by: 4
Nature Communications, Year: 2024, Vol. 15(1)
Published: Nov. 27, 2024
Discovery of new types of reactions is essential to organic chemistry because it expands the scope of accessible molecular scaffolds and can enable more economical syntheses of existing structures. In this context, the so-called multicomponent reactions, MCRs, are of particular interest because they can build complex scaffolds from multiple starting materials in just one step, without purification of intermediates. However, for over a century of active research, MCRs have been discovered rather than designed, and their number remains limited to only several hundred. This work demonstrates that computers taught the knowledge of reaction mechanisms and the rules of physical-organic chemistry can design, completely autonomously and in large numbers, mechanistically distinct MCRs. Moreover, when supplemented by models to approximate kinetic rates, the algorithm can predict reaction yields and identify reactions that have potential for organocatalysis. These predictions are validated by experiments spanning different modes of reactivity and diverse product scaffolds. Multicomponent reactions (MCRs) can build complex scaffolds from multiple starting materials in just one step, without purification of intermediates, but until now they have been discovered rather than designed. Here, the authors demonstrate an algorithmic approach, based on knowledge of reaction mechanisms, that can design MCRs in large numbers.
Language: English
Cited by: 4
LWT, Year: 2025, Vol. unknown, p. 117412
Published: Jan. 1, 2025
Language: English
Cited by: 0
Journal of Chemical Information and Modeling, Year: 2025, Vol. unknown
Published: May 1, 2025
Predicting reaction yields in synthetic chemistry remains a significant challenge. This study systematically evaluates the impact of tokenization, molecular representation, pretraining data, and adversarial training on a BERT-based model for yield prediction of Buchwald-Hartwig and Suzuki-Miyaura coupling reactions using publicly available HTE data sets. We demonstrate that the choice of molecular representation (SMILES, DeepSMILES, SELFIES, Morgan fingerprint-based notation, IUPAC names) has minimal impact on model performance, while BPE and SentencePiece tokenization typically outperform other methods. WordPiece is strongly discouraged for SELFIES notation. Furthermore, pretraining with relatively small data sets (<100 K reactions) achieves performance comparable to larger data sets containing millions of examples. The use of artificially generated domain-specific pretraining data is proposed. The artificially generated data sets prove to be a good surrogate for the reaction schemes extracted from reaction data sets such as Pistachio or Reaxys. The best performance was observed for hybrid pretraining sets combining the real and the domain-specific, artificial data. Finally, we show that a novel adversarial training approach, perturbing input embeddings dynamically, improves robustness and generalizability for reaction success prediction. These findings provide valuable insights for developing robust and practical machine learning models for yield prediction in synthetic chemistry. GSK's BERT code base is made available to the community with this work.
Language: English
Cited by: 0
Chemical Science, Year: 2025, Vol. unknown
Published: Jan. 1, 2025
Amide coupling, a key medicinal chemistry reaction, benefits from AI to minimize trial-and-error.
Language: English
Cited by: 0
The Journal of Physical Chemistry A, Year: 2024, Vol. unknown
Published: Nov. 20, 2024
The application of machine learning (ML) to predict reaction yields has shown remarkable accuracy when based on high-throughput computational and experimental data. However, the accuracy significantly diminishes when leveraging literature-derived data, highlighting a gap in the predictive capability of current ML models. This study, focusing on Pd-catalyzed carbonylation reactions, reveals that even with a data set of 2512 reactions, the best-performing model reaches only an R² of 0.51. Further investigations show that the models' effectiveness is predominantly confined to predictions within narrow subsets of closely related reactions from the same literature sources, rather than across the broader, heterogeneous data sets available in the literature. The reliance on reaction similarity, coupled with small sample sizes, makes the models highly sensitive to the inherent fluctuations typical of small data sets, adversely impacting stability, accuracy, and generalizability. The findings underscore the limitations of current ML techniques for predicting chemical reaction yields, highlighting the need for more sophisticated approaches to handle the complexity and diversity of the data.
Language: English
Cited by: 1
Nature Communications, Year: 2024, Vol. 15(1)
Published: Nov. 27, 2024
This work describes the estimation of yields of complex, multicomponent reactions (MCRs) based on the modeled networks of mechanistic steps spanning both the main reaction pathway as well as immediate and downstream side reactions. Because experimental values of the kinetic rate constants for individual mechanistic transforms are extremely sparse, these constants are approximated here using Mayr's nucleophilicity and electrophilicity parameters fine-tuned by correction terms grounded in linear free-energy relationships. With this formalism, the model trained on only 20, but mechanistically- and yield-diverse, MCRs transfers to newly discovered MCRs that are based on markedly different mechanisms and types of mechanistic transforms. These results suggest that a mechanistic-level approach to yield estimation may be a useful alternative to models that are derived from full-reaction data and lack information about yield-lowering side reactions. The ability to predict the yields of organic reactions is of tremendous value to synthetic chemistry, limiting the number of unproductive experiments. Here, the authors describe the estimation of yields of MCRs based on modeled networks of mechanistic steps.
Language: English
Cited by: 0