Cited by Boosting the generality of catalytic systems by the synergetic ligand effect in Pd-catalyzed C-N cross-coupling

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective

Yuheng Ding, Bo Qiang, Qixuan Chen, et al.

Journal of Chemical Information and Modeling, Year: 2024, Vol. 64(8), pp. 2955-2970

Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the frontier of pretraining, which holds promise as an innovative yet unexplored avenue.

Language: English

Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie’s 15-Year Parallel Library Data Set

Priyanka Raghavan, Alexander J. Rago, Pritha Verma, et al.

Journal of the American Chemical Society, Year: 2024, Vol. 146(22), pp. 15070-15084

Published: May 20, 2024

Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.

Language: English

Transfer learning across different photocatalytic organic reactions

Naoki Noto, Ryuga Kunisada, Tabea Rohlfs, et al.

Nature Communications, Year: 2025, Vol. 16(1)

Published: April 10, 2025

Language: English

Deep Kernel learning for reaction outcome prediction and optimization

Sukriti Singh, José Miguel Hernández-Lobato

Communications Chemistry, Year: 2024, Vol. 7(1)

Published: June 14, 2024

Recent years have seen a rapid growth in the application of various machine learning methods for reaction outcome prediction. Deep learning models have gained popularity due to their ability to learn representations directly from the molecular structure. Gaussian processes (GPs), on the other hand, provide reliable uncertainty estimates but are unable to learn representations from the data. We combine the feature learning ability of neural networks (NNs) with the uncertainty quantification of GPs in a deep kernel learning (DKL) framework to predict the reaction outcome. The DKL model is observed to obtain very good predictive performance across different input representations. It significantly outperforms standard GPs and provides performance comparable to graph neural networks, but with uncertainty estimation. Additionally, the uncertainty estimates on the predictions provided by the DKL model facilitated its incorporation as a surrogate model for Bayesian optimization (BO). The proposed method, therefore, has great potential towards accelerating reaction discovery by integrating accurate predictive models that provide reliable uncertainty estimates with BO.
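For orientation, the combination described in this abstract can be written compactly. The expression below is the generic deep kernel learning construction rather than a formula quoted from the paper. The symbols are notation assumed here: g_w is a neural network feature extractor with weights w, and k_θ is a base kernel, with the RBF form shown only as one common choice.

```latex
k_{\mathrm{DKL}}(x, x') = k_{\theta}\bigl(g_{w}(x),\, g_{w}(x')\bigr),
\qquad
k_{\theta}(z, z') = \sigma^{2} \exp\!\left(-\frac{\lVert z - z' \rVert^{2}}{2\ell^{2}}\right)
```

In this construction the network weights w and the kernel hyperparameters (here σ and ℓ) are typically fitted jointly by maximizing the GP marginal likelihood, which is what lets the model learn a representation while still returning a predictive variance.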

Language: English

Systematic, computational discovery of multicomponent and one-pot reactions

Rafał Roszak, Louis Gadina, Agnieszka Wołos, et al.

Nature Communications, Year: 2024, Vol. 15(1)

Published: Nov. 27, 2024

Discovery of new types of reactions is essential to organic chemistry because it expands the scope of accessible molecular scaffolds and can enable more economical syntheses of existing structures. In this context, the so-called multicomponent reactions, MCRs, are of particular interest because they can build complex scaffolds from multiple starting materials in just one step, without purification of intermediates. However, for over a century of active research, MCRs have been discovered rather than designed, and their number remains limited to only several hundred. This work demonstrates that computers taught the knowledge of reaction mechanisms and the rules of physical-organic chemistry can design, completely autonomously and in large numbers, mechanistically distinct MCRs. Moreover, when supplemented by models to approximate kinetic rates, the algorithm can predict reaction yields and identify reactions that have potential for organocatalysis. These predictions are validated by experiments spanning different modes of reactivity and diverse product scaffolds. Multicomponent reactions (MCRs) build complex scaffolds in one step without purification of intermediates, but until now they have been discovered rather than designed. Here, the authors demonstrate an algorithmic approach, based on knowledge of reaction mechanisms, that designs MCRs in large numbers.

Language: English

Integrative hyperspectral imaging and artificial intelligence approaches for identifying sucrose substitutes and assessing cookie qualities

Sungmin Jeong, S.-J. Cho, Suyong Lee, et al.

LWT, Year: 2025, Vol. unknown, p. 117412

Published: Jan. 1, 2025

Language: English

Exploring BERT for Reaction Yield Prediction: Evaluating the Impact of Tokenization, Molecular Representation, and Pretraining Data Augmentation

Adrian Krzyzanowski, Stephen D. Pickett, Péter Pogány, et al.

Journal of Chemical Information and Modeling, Year: 2025, Vol. unknown

Published: May 1, 2025

Predicting reaction yields in synthetic chemistry remains a significant challenge. This study systematically evaluates the impact of tokenization, molecular representation, pretraining data, and adversarial training on a BERT-based model for yield prediction of Buchwald-Hartwig and Suzuki-Miyaura coupling reactions using publicly available HTE data sets. We demonstrate that the choice of molecular representation (SMILES, DeepSMILES, SELFIES, Morgan fingerprint-based notation, IUPAC names) has minimal impact on model performance, while BPE and SentencePiece tokenization typically outperform other methods. WordPiece is strongly discouraged for SELFIES notation. Furthermore, pretraining with relatively small data sets (<100 K reactions) achieves performance comparable to larger data sets containing millions of examples. The use of artificially generated domain-specific data is proposed. The artificially generated sets prove to be a good surrogate for the reaction schemes extracted from reaction data sets such as Pistachio or Reaxys. The best performance was observed for hybrid pretraining sets combining real and domain-specific, artificial data. Finally, we show that a novel adversarial training approach, perturbing input embeddings dynamically, improves the robustness and generalizability of the model for yield and reaction success prediction. These findings provide valuable insights for developing robust and practical machine learning models for yield prediction in synthetic chemistry. GSK's BERT training code base is made available to the community with this work.
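For contrast with the learned subword tokenizers named in this abstract (BPE, SentencePiece, WordPiece), a rule-based atom-level split of a SMILES string can be sketched in a few lines. This is an illustrative baseline only, not the tokenization used in the study; the pattern `SMILES_TOKEN` and the function `tokenize_smiles` are names assumed here, and the pattern covers only common SMILES syntax.

```python
import re

# Simplified atom-level SMILES pattern (an assumption for illustration;
# it does not validate chemistry, it only segments the string).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [Pd] or [C@H]
    r"|Br|Cl"                # two-letter organic-subset atoms
    r"|%[0-9]{2}"            # two-digit ring-closure labels
    r"|[A-Za-z]"             # single-letter atoms (aromatic ones are lowercase)
    r"|[0-9]"                # single-digit ring-closure labels
    r"|[=#\-+\\/:~@.()>*$]"  # bonds, branches, stereo marks, separators
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if any character is not covered by the pattern,
    so that nothing is dropped silently.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


# Bromobenzene: the two-letter atom Br stays together as one token.
print(tokenize_smiles("Brc1ccccc1"))
```

A subword tokenizer would instead learn merges over sequences like these from a corpus, so its vocabulary depends on the pretraining data rather than on fixed rules.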

Language: English

Intermediate knowledge enhanced the performance of the amide coupling yield prediction model

Chonghuan Zhang, Qianghua Lin, Chenxi Yang, et al.

Chemical Science, Year: 2025, Vol. unknown

Published: Jan. 1, 2025

Amide coupling, a key medicinal chemistry reaction, benefits from AI to minimize trial-and-error.

Language: English

Challenges with Literature-Derived Data in Machine Learning for Yield Prediction: A Case Study on Pd-Catalyzed Carbonylation Reactions

Dongzhi Li, Xue‐Qing Gong

The Journal of Physical Chemistry A, Year: 2024, Vol. unknown

Published: Nov. 20, 2024

The application of machine learning (ML) to predict reaction yields has shown remarkable accuracy when based on high-throughput computational and experimental data. However, the accuracy significantly diminishes when leveraging literature-derived data, highlighting a gap in the predictive capability of current ML models. This study, focusing on Pd-catalyzed carbonylation reactions, reveals that even with a data set of 2512 reactions, the best-performing model reaches only an R² of 0.51. Further investigations show that the models' effectiveness is predominantly confined to predictions within narrow subsets of closely related data from the same literature sources, rather than across the broader, heterogeneous data sets available in the literature. The reliance on data similarity, coupled with small sample sizes, makes the model highly sensitive to the inherent fluctuations typical of small data sets, adversely impacting stability, accuracy, and generalizability. The findings underscore the limitations of current ML techniques for predicting chemical reaction yields, highlighting the need for more sophisticated approaches to handle complexity and diversity.
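To make the R² figure in this abstract concrete, the coefficient of determination can be computed as one minus the ratio of the residual to the total sum of squares. The sketch below is a plain-Python illustration; the function name `r_squared` and the yield values are made up for the example and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations from the mean observed value.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative yields in percent (made-up numbers, not from the study).
observed_yields = [10.0, 35.0, 50.0, 80.0, 95.0]
predicted_yields = [30.0, 30.0, 60.0, 60.0, 90.0]
print(round(r_squared(observed_yields, predicted_yields), 2))
```

On this scale a perfect model scores 1 and a model that always predicts the mean yield scores 0, so a value of 0.51 corresponds to accounting for roughly half of the variance in the observed yields.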

Language: English

Estimation of multicomponent reactions’ yields from networks of mechanistic steps

Sara Szymkuć, Agnieszka Wołos, Rafał Roszak, et al.

Nature Communications, Year: 2024, Vol. 15(1)

Published: Nov. 27, 2024

This work describes estimation of the yields of complex, multicomponent reactions (MCRs) based on modeled networks of mechanistic steps spanning both the main reaction pathway as well as immediate and downstream side reactions. Because experimental values of the kinetic rate constants for individual mechanistic transforms are extremely sparse, these constants are approximated here using Mayr's nucleophilicity and electrophilicity parameters, fine-tuned by correction terms grounded in linear free-energy relationships. With this formalism, the model, trained on only 20 mechanistically and yield-diverse MCRs, transfers to newly discovered MCRs with markedly different mechanisms and types of mechanistic transforms. These results suggest that a mechanistic-level approach to yield estimation may be a useful alternative to models derived from full-reaction data, which lack information about yield-lowering side reactions. The ability to predict the yields of organic reactions is of tremendous value to synthetic chemistry, limiting the number of unproductive experiments. Here, the authors describe
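For reference, the nucleophilicity and electrophilicity parameters mentioned in this abstract enter the standard Mayr-Patz relationship shown below. This is the general textbook form only; the paper's correction terms are not reproduced here, and no parameter values are implied.

```latex
\log k_{20\,^{\circ}\mathrm{C}} = s_{N}\,(N + E)
```

Here k is the second-order rate constant at 20 °C, E is the electrophilicity parameter, N is the nucleophilicity parameter, and s_N is a nucleophile-specific sensitivity parameter.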

Language: English