The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions (Creative Commons)
Zhen Liu, Yurii S. Moroz, Olexandr Isayev et al.

Chemical Science, Year: 2023, Vol. 14(39), pp. 10835–10846

Published: Jan. 1, 2023

A sensitive model captures the reactivity cliffs but overfits to yield outliers. On the other hand, a robust model disregards the yield outliers but underfits the reactivity cliffs.

Language: English
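
The sensitivity versus robustness trade-off summarized in the entry above can be illustrated with the choice of regression loss. The following is a minimal sketch under stated assumptions, not the method of the benchmarking study: the residual values and the `delta` threshold are made up for illustration, and the Huber loss is swapped in here only as a familiar example of a robust loss.

```python
def squared_loss(residual):
    """Squared error: grows quadratically, so large residuals dominate training."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=10.0):
    """Huber loss: quadratic for small residuals, linear beyond `delta`.

    `delta` is a hypothetical threshold in yield percentage points.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)


# Illustrative residuals (predicted minus observed yield, in percentage points).
# The last value could be a reactivity cliff or a yield outlier; the loss alone
# cannot tell the two apart.
residuals = [2.0, -5.0, 60.0]

squared = [squared_loss(r) for r in residuals]
huber = [huber_loss(r) for r in residuals]

print(squared)
print(huber)
```

A model trained with the squared loss is pulled strongly toward the large residual, which helps when it is a genuine cliff and hurts when it is noise. The Huber loss damps that pull, with the opposite consequences.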

Unified Deep Learning Model for Multitask Reaction Predictions with Explanation
Jieyu Lü, Yingkai Zhang

Journal of Chemical Information and Modeling, Year: 2022, Vol. 62(6), pp. 1376–1387

Published: March 10, 2022

There is significant interest and importance to develop robust machine learning models to assist organic chemistry synthesis. Typically, task-specific machine learning models for distinct reaction prediction tasks have been developed. In this work, we develop a unified deep learning model, T5Chem, for a variety of chemical reaction predictions by adapting the "Text-to-Text Transfer Transformer" (T5) framework in natural language processing (NLP). On the basis of self-supervised pretraining with PubChem molecules, the T5Chem model can achieve state-of-the-art performances for four distinct types of task-specific reaction prediction tasks using four different open-source data sets, including reaction type classification on USPTO_TPL, forward reaction prediction on USPTO_MIT, single-step retrosynthesis on USPTO_50k, and reaction yield prediction on high-throughput C–N coupling reactions. Meanwhile, we introduced a new unified multitask reaction prediction data set, USPTO_500_MT, which can be used to train and test five different types of reaction tasks, including the four above as well as a new reagent suggestion task. Our results showed that models trained with multiple tasks are more robust and can benefit from mutual learning on related tasks. Furthermore, we demonstrated the use of SHAP (SHapley Additive exPlanations) to explain T5Chem predictions at the functional group level, which provides a way to demystify sequence-based deep learning models in chemistry. T5Chem is accessible through https://yzhang.hpc.nyu.edu/T5Chem.

Language: English

Cited by: 71

Predicting reaction conditions from limited data through active transfer learning (Creative Commons)
Eunjae Shim, Joshua Kammeraad, Ziping Xu et al.

Chemical Science, Year: 2022, Vol. 13(22), pp. 6655–6668

Published: Jan. 1, 2022

Transfer learning is combined with active learning to discover synthetic reaction conditions in a small-data regime. This strategy is tested on cross-coupling reactions from a high-throughput experimentation dataset and shows promising results.

Language: English

Cited by: 47

Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning (Creative Commons)
David F. Nippa, Kenneth Atz, Remo Hohler et al.

Nature Chemistry, Year: 2023, Vol. 16(2), pp. 239–248

Published: Nov. 23, 2023

Late-stage functionalization is an economical approach to optimize the properties of drug candidates. However, the chemical complexity of drug molecules often makes late-stage diversification challenging. To address this problem, a late-stage functionalization platform based on geometric deep learning and high-throughput reaction screening was developed. Considering borylation as a critical step in late-stage functionalization, the computational model predicted reaction yields for diverse reaction conditions with a mean absolute error margin of 4–5%, while the reactivity of novel reactions with known and unknown substrates was classified with a balanced accuracy of 92% and 67%, respectively. The regioselectivity of the major products was accurately captured with a classifier F-score of 67%. When applied to 23 diverse commercial drug molecules, the platform successfully identified numerous opportunities for structural diversification. The influence of steric and electronic information on model performance was quantified, and a comprehensive, simple, user-friendly reaction format was introduced that proved to be a key enabler for seamlessly integrating deep learning and high-throughput experimentation for late-stage functionalization.

Language: English

Cited by: 44

Dataset Design for Building Models of Chemical Reactivity (Creative Commons)
Priyanka Raghavan, Brittany C. Haas, Madeline E. Ruos et al.

ACS Central Science, Year: 2023, Vol. 9(12), pp. 2196–2204

Published: Dec. 8, 2023

Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset building.

Language: English

Cited by: 42

Graph-based Molecular Representation Learning (Open Access)
Zhichun Guo, Kehan Guo, Bozhao Nan et al.

Published: Aug. 1, 2023

Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors preserving the molecular structures and features, on top of which the downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.

Language: English

Cited by: 24

Paving the road towards automated homogeneous catalyst design (Creative Commons)
Adarsh V. Kalikadien, A.H. Mirza, Aydin Najl Hossaini et al.

ChemPlusChem, Year: 2024, Vol. 89(7)

Published: Jan. 26, 2024

In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for high-throughput in silico exploration and analysis of chemical space, their degree of automation and modularity is key. We argue that the integration of data-driven, automated, and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research.

Language: English

Cited by: 15

AI for organic and polymer synthesis
Hong Xin, Qi Yang, Kuangbiao Liao et al.

Science China Chemistry, Year: 2024, Vol. 67(8), pp. 2461–2496

Published: June 26, 2024

Language: English

Cited by: 12

Machine learning-guided strategies for reaction conditions design and optimization (Creative Commons)
Lung-Yi Chen, Yi‐Pei Li

Beilstein Journal of Organic Chemistry, Year: 2024, Vol. 20, pp. 2476–2492

Published: Oct. 4, 2024

This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.

Language: English

Cited by: 11

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective
Yuheng Ding, Bo Qiang, Qixuan Chen et al.

Journal of Chemical Information and Modeling, Year: 2024, Vol. 64(8), pp. 2955–2970

Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the new frontier of chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.

Language: English

Cited by: 10

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates (Creative Commons)
Jules Schleinitz, Alba Carretero‐Cerdán, Anjali Gurajapu et al.

Journal of the American Chemical Society, Year: 2025, Vol. 147(9), pp. 7476–7484

Published: Feb. 21, 2025

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.

Language: English

Cited by: 2
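
The contrast drawn in the entry above, between selecting training molecules by similarity alone and selecting them with model uncertainty taken into account, can be sketched as a simple scoring rule. This is a hypothetical illustration, not the acquisition functions from the paper: the additive score, the `weight` parameter, the candidate names, and all numeric values are assumptions chosen for the example.

```python
def acquisition_scores(candidates, weight=1.0):
    """Score each candidate as similarity + weight * uncertainty.

    `candidates` maps a name to a (similarity_to_target, model_uncertainty)
    pair. Both the pair layout and the additive form are hypothetical.
    """
    return {name: sim + weight * unc for name, (sim, unc) in candidates.items()}


def select_top(candidates, k, weight=1.0):
    """Return the names of the k highest-scoring candidates."""
    scores = acquisition_scores(candidates, weight)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up candidate pool: (similarity to target, model uncertainty).
candidates = {
    "mol_a": (0.9, 0.05),  # very similar, but the model is already confident
    "mol_b": (0.6, 0.40),  # less similar, but the model is unsure about it
    "mol_c": (0.3, 0.10),
}

similarity_only = select_top(candidates, k=1, weight=0.0)
with_uncertainty = select_top(candidates, k=1, weight=1.0)

print(similarity_only)
print(with_uncertainty)
```

With `weight=0.0` the rule reduces to pure similarity ranking. A positive weight lets a less similar molecule win when the model is uncertain about it, which is the general intuition behind uncertainty-aware acquisition.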