The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions
Zhen Liu, Yurii S. Moroz, Olexandr Isayev

et al.

Chemical Science, Year: 2023, Vol. 14(39), pp. 10835-10846

Published: Jan. 1, 2023

A sensitive model captures the reactivity cliffs but overfits to yield outliers. On the other hand, a robust model disregards the yield outliers but underfits the reactivity cliffs.

Language: English

HCat-GNet: a Human-Interpretable GNN Tool for Ligand Optimization in Asymmetric Catalysis
Eduardo Alberto Aguilar Bejarano, Ender Özcan, Raja K. Rit

et al.

iScience, Year: 2025, Vol. 28(3), p. 111881

Published: Jan. 24, 2025

Language: English

Cited by: 1

Recent Applications of Machine Learning in Molecular Property and Chemical Reaction Outcome Predictions
Shilpa Shilpa, Gargee Kashyap, Raghavan B. Sunoj

et al.

The Journal of Physical Chemistry A, Year: 2023, Vol. 127(40), pp. 8253-8271

Published: Sept. 28, 2023

Burgeoning developments in machine learning (ML) and its rapidly growing adaptations in chemistry are noteworthy. Motivated by the successful deployments of ML in the realm of molecular property prediction (MPP) and chemical reaction prediction (CRP), herein we highlight some of the most recent applications of ML in predictive chemistry. We present a nonmathematical and concise overview of the progression of ML implementations, ranging from an ensemble-based random forest model to advanced graph neural network algorithms. Similarly, the prospects of various feature engineering approaches that work in conjunction with ML models are described. Highly accurate predictions reported in MPP tasks (e.g., lipophilicity, solubility, distribution coefficient), using methods such as D-MPNN, MolCLR, SMILES-BERT, and MolBERT, offer promising avenues for design and drug discovery. Whereas MPP pertains to a given molecule, reactions present a different level of challenge, primarily arising from the simultaneous involvement of multiple molecules and their diverse roles in a reaction setting. The RMSEs in MPP range from 0.287 to 2.20, while those for yield prediction are well over 4.9 at the lower end, reaching thresholds of >10.0 in several examples. Our Review concludes with a set of persisting challenges in dealing with reaction data sets and an overall optimistic outlook on the benefits of ML-driven workflows for MPP and CRP tasks.

Language: English

Cited by: 20

When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges
Varvara Voinarovska, Mikhail A. Kabeshov, Dmytro Dudenko

et al.

Journal of Chemical Information and Modeling, Year: 2023, Vol. 64(1), pp. 42-56

Published: Dec. 20, 2023

Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.

Language: English

Cited by: 20

Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie’s 15-Year Parallel Library Data Set
Priyanka Raghavan, Alexander J. Rago, Pritha Verma

et al.

Journal of the American Chemical Society, Year: 2024, Vol. 146(22), pp. 15070-15084

Published: May 20, 2024

Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.

Language: English

Cited by: 9

Prediction of chemical reaction yields with large-scale multi-view pre-training
Runhan Shi, Gufeng Yu, Xiaohong Huo

et al.

Journal of Cheminformatics, Year: 2024, Vol. 16(1)

Published: Feb. 25, 2024

Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have recently been highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has an advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions. Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields by integrating sequential and geometric views and leveraging a two-stage pre-training strategy. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.

Language: English

Cited by: 8

MetaRF: attention-based random forest for reaction yield prediction with a few trails
Kexin Chen, Guangyong Chen, Junyou Li

et al.

Journal of Cheminformatics, Year: 2023, Vol. 15(1)

Published: April 10, 2023

Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, where the attention weight of a random forest is automatically optimized by the meta-learning framework and can be quickly adapted to predict the performance of new reagents while given a few additional samples. To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and acquires satisfactory performance on few-shot prediction. In high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the results of ideal yield selection.

Language: English

Cited by: 17

Interplay of Computation and Experiment in Enantioselective Catalysis: Rationalization, Prediction, and─Correction?
Michael P. Maloney, Brock A. Stenfors, Paul Helquist

et al.

ACS Catalysis, Year: 2023, Vol. 13(21), pp. 14285-14299

Published: Oct. 26, 2023

The application of computational methods in enantioselective catalysis has evolved from the rationalization of the observed stereochemical outcome to their prediction and application to the design of chiral ligands. This Perspective provides an overview of the current methods used, ranging from atomistic modeling of the transition structures involved to correlation-based methods, with particular emphasis placed on the Q2MM/CatVS method. Using three palladium-catalyzed enantioselective reactions, namely, the conjugate addition of arylboronic acids to enones, the redox relay Heck reaction, and the Tsuji–Trost allylic amination as case studies, we argue that computational methods have become truly equal partners to experimental studies in that, in some cases, they are able to correct published stereochemical assignments. Finally, the consequences of this approach to data-driven methods are discussed.

Language: English

Cited by: 15

Probing machine learning models based on high throughput experimentation data for the discovery of asymmetric hydrogenation catalysts
Adarsh V. Kalikadien, Cecile Valsecchi, Robbert van Putten

et al.

Chemical Science, Year: 2024, Vol. 15(34), pp. 13618-13630

Published: Jan. 1, 2024

Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high throughput experimentation to build a large dataset consisting of results for Rh-catalyzed asymmetric olefin hydrogenation, specially designed for applications in machine learning. We showcased its alignment with existing literature while addressing observed discrepancies. Additionally, a computational framework for the automated and reproducible quantum-chemistry based featurization of catalyst structures was created. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models provided limited efficacy. It was found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models.

Language: English

Cited by: 6

Global reactivity models are impactful in industrial synthesis applications
Paulo Neves, Kelly J. McClure, Jonas Verhoeven

et al.

Journal of Cheminformatics, Year: 2023, Vol. 15(1)

Published: Feb. 11, 2023

Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects, leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, a triple loss when accounting for the cost of opportunity in time wasted. In this work we pre-train a BERT model on 16 million reactions from 4 different data sources, and fine tune it to achieve an uncertainty calibrated global yield prediction model. This model is an improvement upon the state of the art not just from the increase in pre-train data but also by introducing a new embedding layer which solves a few limitations of SMILES and enables integration of additional information such as equivalents and molecule role into the reaction encoding, called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against state-of-the-art models focused on yield prediction, showing a near 20-point improvement in r2 score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) ran in Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an on-going drug discovery project and showing that it can be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.

Language: English

Cited by: 12

The rise of automated curiosity-driven discoveries in chemistry
Latimah Bustillo, Teodoro Laino, Tiago Rodrigues

et al.

Chemical Science, Year: 2023, Vol. 14(38), pp. 10378-10384

Published: Jan. 1, 2023

The quest for generating novel chemistry knowledge is critical in scientific advancement, and machine learning (ML) has emerged as an asset this pursuit.

Language: English

Cited by: 12