C–H Aminoalkylation of 5-Membered Heterocycles: Influence of Descriptors, Data Set Size, and Data Quality on the Predictiveness of Machine Learning Models and Expansion of the Substrate Space Beyond 1,3-Azoles
Stephanie Felten, Cyndi Qixin He, Marion H. Emmert, et al.

The Journal of Organic Chemistry, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 11, 2025

We report a general C-H aminoalkylation of 5-membered heterocycles through a combined machine learning/experimental workflow. Our work describes previously unknown C-H functionalization reactivity and creates a predictive machine learning (ML) model through iterative refinement over 6 rounds of active learning. The initial model, established with 1,3-azoles, predicts the reactivities of N-aryl indazoles, 1,2,4-triazolopyrazines, 1,2,3-thiadiazoles, and 1,3,4-oxadiazoles, while other substrate classes (e.g., pyrazoles and 1,2,4-triazoles) are not predicted well. The final model includes additional heterocyclic scaffolds in the training data, which results in high prediction accuracy across all tested cores. The prediction performance is shown both within the training set via cross-validation (CV R² = 0.81) and when predicting unseen substrates of diverse molecular weight and structure (Test R² = 0.95). The concept of feature engineering is discussed, and we benchmark mechanistically related DFT-based features that are more time-intensive and laborious in comparison to descriptors and fingerprints. Importantly, this work establishes novel reactivity for heterocycles for which C-H functionalization methods are underdeveloped. Since such heterocycles are key motifs in drug discovery and development, we expect this work to be of significant use to the synthetic and synthesis-oriented ML communities.

Language: English

Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery
Zhengkai Tu, Thijs Stuyver, Connor W. Coley, et al.

Chemical Science, Journal Year: 2022, Volume and Issue: 14(2), P. 226 - 244

Published: Nov. 28, 2022

This review outlines several organic chemistry tasks for which predictive machine learning models have been and can be applied.

Language: English

Citations: 85

Machine-Learning-Assisted Design of Highly Tough Thermosetting Polymers
Yaxi Hu, Wenlin Zhao, Liquan Wang, et al.

ACS Applied Materials & Interfaces, Journal Year: 2022, Volume and Issue: 14(49), P. 55004 - 55016

Published: Dec. 1, 2022

Despite advances in machine learning for accurately predicting material properties, forecasting the performance of thermosetting polymers remains a challenge due to the sparsity of historical experimental data and their complicated crosslinked structures. We proposed a machine-learning-assisted materials genome approach (MGA) for rapidly designing novel epoxy thermosets with excellent mechanical properties (high tensile moduli, high strength, and high toughness) through high-throughput screening of a vast chemical space. Machine-learning models were established by combining attention- and gate-augmented graph convolutional networks, multilayer perceptrons, classical gel theory, and transfer learning from small molecules to polymers. Proof-of-concept experiments were carried out, and the structures designed by the MGA were verified. Gene substructures affecting the modulus, strength, and toughness were also extracted, revealing the mechanisms behind these properties. The developed strategy can be employed to design other thermosetting polymers efficiently.

Language: English

Citations: 45

Dataset Design for Building Models of Chemical Reactivity
Priyanka Raghavan, Brittany C. Haas, Madeline E. Ruos, et al.

ACS Central Science, Journal Year: 2023, Volume and Issue: 9(12), P. 2196 - 2204

Published: Dec. 8, 2023

Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset building.

Language: English

Citations: 42

Rapid planning and analysis of high-throughput experiment arrays for reaction discovery
Babak Mahjour, Rui Zhang, Yuning Shen, et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: July 3, 2023

High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.

Language: English

Citations: 30

Machine learning-guided strategies for reaction conditions design and optimization
Lung-Yi Chen, Yi‐Pei Li

Beilstein Journal of Organic Chemistry, Journal Year: 2024, Volume and Issue: 20, P. 2476 - 2492

Published: Oct. 4, 2024

This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design and enable novel discoveries in chemistry.

Language: English

Citations: 11

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective
Yuheng Ding, Bo Qiang, Qixuan Chen, et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(8), P. 2955 - 2970

Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the frontier of reaction pretraining, holding promise as an innovative yet unexplored avenue.

Language: English

Citations: 10

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates
Jules Schleinitz, Alba Carretero‐Cerdán, Anjali Gurajapu, et al.

Journal of the American Chemical Society, Journal Year: 2025, Volume and Issue: 147(9), P. 7476 - 7484

Published: Feb. 21, 2025

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model for C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.

Language: English

Citations: 2

High‐Throughput Experimentation and Machine Learning‐Assisted Optimization of Iridium‐Catalyzed Cross‐Dimerization of Sulfoxonium Ylides
Yougen Xu, Ya-Dong Gao, Lebin Su, et al.

Angewandte Chemie International Edition, Journal Year: 2023, Volume and Issue: 62(48)

Published: Oct. 10, 2023

A novel and convenient approach that combines high-throughput experimentation (HTE) with machine learning (ML) technologies to achieve the first selective cross-dimerization of sulfoxonium ylides via iridium catalysis is presented. A variety of valuable amide-, ketone-, ester-, and N-heterocycle-substituted unsymmetrical E-alkenes are synthesized in good yields with high stereoselectivities. This mild method avoids the use of diazo compounds and is characterized by simple operation, step-economy, and excellent chemoselectivity and functional group compatibility. The combined experimental and computational studies identify an amide-sulfoxonium ylide as a carbene precursor. Furthermore, a comprehensive exploration of the reaction space is also performed (600 reactions) and a machine learning model for yield prediction has been constructed.

Language: English

Citations: 20

When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges
Varvara Voinarovska, Mikhail A. Kabeshov, Dmytro Dudenko, et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 64(1), P. 42 - 56

Published: Dec. 20, 2023

Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.

Language: English

Citations: 20

Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit
Eunjae Shim, Ambuj Tewari, Tim Cernak, et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 63(12), P. 3659 - 3668

Published: June 14, 2023

Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective reaction development.

Language: English

Citations: 18