A meta-learning approach for selectivity prediction in asymmetric catalysis (Creative Commons)
Sukriti Singh, José Miguel Hernández-Lobato

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: April 15, 2025

Transition metal-catalyzed asymmetric reactions are of high contemporary importance in organic synthesis. Recently, machine learning (ML) has shown promise in accelerating the development of newer catalytic protocols. However, the need for large amounts of experimental data can present a bottleneck in implementing ML models. Here, we propose a meta-learning workflow that harnesses literature-derived data to extract shared reaction features and requires only a few examples to predict the outcome of new reactions. Prototypical networks are used as the meta-learning method to predict the enantioselectivity of asymmetric hydrogenation of olefins. This meta-model consistently provides a significant performance improvement over other popular methods such as random forests and graph neural networks. The performance of our meta-model is analyzed with varying sizes of training data to demonstrate its utility even with limited data. Good performance on an out-of-sample test set further indicates the general applicability of the approach. We believe this work will provide a leap forward in identifying promising catalysts in the early phases of reaction development, when minimal data are available.

Language: English
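The prototypical-network step named in the abstract above can be sketched generically: each class is summarized by the mean of its support-set embeddings (its prototype), and a query is classified by a softmax over negative squared distances to those prototypes. This is a minimal sketch, not the authors' code. It assumes the embeddings are already computed (in a real prototypical network a trained encoder produces them), and the class labels `low_ee`/`high_ee` and all vectors are hypothetical.

```python
import math

def prototypes(support, labels):
    """Return the mean embedding (prototype) for each class label."""
    sums, counts = {}, {}
    for x, y in zip(support, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    labels = list(protos)
    logits = [-squared_distance(query, protos[y]) for y in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    probs = {y: e / total for y, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical 2-D embeddings of a few labeled reactions (the support set).
support = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
labels = ["low_ee", "low_ee", "high_ee", "high_ee"]
protos = prototypes(support, labels)
label, probs = classify([0.8, 0.9], protos)
print(label, probs)
```

Because only class means are stored, adding a new reaction class needs just a handful of labeled examples and no retraining of the classifier head, which is what makes the approach attractive in the few-shot setting the abstract describes.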

Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery (Creative Commons)
Zhengkai Tu, Thijs Stuyver, Connor W. Coley et al.

Chemical Science, Journal Year: 2022, Volume and Issue: 14(2), P. 226 - 244

Published: Nov. 28, 2022

This review outlines several organic chemistry tasks for which predictive machine learning models have been and can be applied.

Language: English

Citations: 77

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates (Creative Commons)
Jules Schleinitz, Alba Carretero‐Cerdán, Anjali Gurajapu et al.

Journal of the American Chemical Society, Journal Year: 2025, Volume and Issue: 147(9), P. 7476 - 7484

Published: Feb. 21, 2025

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model for C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for a specific target. Active learning-based acquisition functions that leverage predicted reactivity and uncertainty were found to outperform those based on molecular and site similarity alone. Using these acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five substrates and shown to be applicable to predicting the regioselectivity of arene radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.

Language: English

Citations: 2
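The finding above, that acquisition functions using predicted reactivity and uncertainty beat similarity-based selection, can be illustrated with a common generic rule: an upper-confidence-bound (UCB) score, ensemble mean plus beta times ensemble standard deviation. This is a sketch of that generic UCB rule, not the specific acquisition functions of Schleinitz et al.; the candidate names, the ensemble predictions, and the beta values are hypothetical, and scoring whole molecules rather than individual C-H sites is a simplifying assumption.

```python
import statistics

def ucb_scores(ensemble_preds, beta=1.0):
    """ensemble_preds[m][i] is model m's predicted reactivity for candidate i.
    Score = ensemble mean + beta * ensemble (population) standard deviation."""
    scores = []
    for i in range(len(ensemble_preds[0])):
        column = [preds[i] for preds in ensemble_preds]
        scores.append(statistics.fmean(column) + beta * statistics.pstdev(column))
    return scores

def select_batch(candidates, ensemble_preds, k=1, beta=1.0):
    """Return the k candidates with the highest UCB score."""
    scores = ucb_scores(ensemble_preds, beta)
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]

# Hypothetical predictions from a 3-model ensemble for 3 candidate molecules.
candidates = ["mol_A", "mol_B", "mol_C"]
ensemble_preds = [[0.5, 0.2, 0.9], [0.5, 0.8, 0.9], [0.5, 0.5, 0.9]]
print(select_batch(candidates, ensemble_preds, beta=1.0))  # exploitation-leaning
print(select_batch(candidates, ensemble_preds, beta=2.0))  # exploration-leaning
```

With a small beta the confidently high-scoring `mol_C` is chosen; raising beta shifts the pick to `mol_B`, where the models disagree most, which is the exploration behavior that lets a small, targeted data set stay informative.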

Machine-Learning-Assisted Design of Highly Tough Thermosetting Polymers
Yaxi Hu, Wenlin Zhao, Liquan Wang et al.

ACS Applied Materials & Interfaces, Journal Year: 2022, Volume and Issue: 14(49), P. 55004 - 55016

Published: Dec. 1, 2022

Despite advances in machine learning for accurately predicting material properties, forecasting the performance of thermosetting polymers remains a challenge due to the sparsity of historical experimental data and their complicated crosslinked structures. We proposed a machine-learning-assisted materials genome approach (MGA) for rapidly designing novel epoxy thermosets with excellent mechanical properties (high tensile moduli, high tensile strength, and high toughness) through high-throughput screening of a vast chemical space. Machine-learning models were established by combining attention- and gate-augmented graph convolutional networks, multilayer perceptrons, classical gel theory, and transfer learning from small molecules to polymers. Proof-of-concept experiments were carried out, and the structures designed by MGA were verified. Gene substructures affecting modulus and toughness were also extracted, revealing the mechanisms underlying these properties. The developed strategy can be employed to design other polymers efficiently.

Language: English

Citations: 40

Dataset Design for Building Models of Chemical Reactivity (Creative Commons)
Priyanka Raghavan, Brittany C. Haas, Madeline E. Ruos et al.

ACS Central Science, Journal Year: 2023, Volume and Issue: 9(12), P. 2196 - 2204

Published: Dec. 8, 2023

Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset building.

Language: English

Citations: 37
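The point above, that training-set diversity depends on the chosen representation, can be made concrete with a common generic technique for assembling a diverse subset: greedy MaxMin (farthest-point) selection. The sketch assumes fingerprints are represented as sets of on-bit indices and uses 1 - Tanimoto similarity as the distance. Both choices, the helper names, and the toy pool are illustrative assumptions rather than anything prescribed by Raghavan et al.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def maxmin_select(pool, k, seed_index=0):
    """Greedy farthest-point selection: each new pick maximizes its minimum
    distance (1 - Tanimoto) to everything already picked."""
    chosen = [seed_index]
    # min_dist[i]: distance from pool[i] to its nearest already-chosen item
    min_dist = [1.0 - tanimoto(fp, pool[seed_index]) for fp in pool]
    while len(chosen) < min(k, len(pool)):
        nxt = max((i for i in range(len(pool)) if i not in chosen),
                  key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, fp in enumerate(pool):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, pool[nxt]))
    return chosen

# Hypothetical fingerprints for four substrates.
pool = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 5}]
print(maxmin_select(pool, 3))
```

Swapping in a different representation changes the distances and therefore which substrates are picked, which is exactly the dependence on representation the abstract emphasizes.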

Rapid planning and analysis of high-throughput experiment arrays for reaction discovery (Creative Commons)
Babak Mahjour, Rui Zhang, Yuning Shen et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: July 3, 2023

High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the array manually or with the assistance of a liquid handling robot. After completion of the array, analytical results can be uploaded for facile evaluation and to guide the next series of experiments. All data and metadata are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.

Language: English

Citations: 30
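The step of virtually populating a wellplate with a reaction array, as described above, can be illustrated with a small layout helper. This is a hypothetical sketch, not phactor™'s API or file format. `PLATE_SHAPES` assumes the standard 4 x 6, 8 x 12, 16 x 24 and 32 x 48 grids for 24-, 96-, 384- and 1,536-well plates (rows A to AF on the largest), and the catalyst and ligand names are placeholders.

```python
import string
from itertools import product

# Assumed standard plate geometries: size -> (rows, columns).
PLATE_SHAPES = {24: (4, 6), 96: (8, 12), 384: (16, 24), 1536: (32, 48)}

def row_label(index):
    """0 -> 'A', ..., 25 -> 'Z', 26 -> 'AA', ..., 31 -> 'AF' (enough for 1,536 wells)."""
    letters = string.ascii_uppercase
    return letters[index] if index < 26 else "A" + letters[index - 26]

def well_ids(size):
    """Well names in row-major order, e.g. A1, A2, ..., H12 for a 96-well plate."""
    rows, cols = PLATE_SHAPES[size]
    return [f"{row_label(r)}{c + 1}" for r in range(rows) for c in range(cols)]

def layout_array(factor_a, factor_b, size=96):
    """Assign every (a, b) combination to one well in row-major order.
    With 12 values of factor_b on a 96-well plate, each row holds one value of factor_a."""
    combos = list(product(factor_a, factor_b))
    wells = well_ids(size)
    if len(combos) > len(wells):
        raise ValueError(f"{len(combos)} reactions do not fit a {size}-well plate")
    return dict(zip(wells, combos))

catalysts = [f"cat{i}" for i in range(1, 9)]   # 8 placeholder catalysts
ligands = [f"lig{j}" for j in range(1, 13)]    # 12 placeholder ligands
plate = layout_array(catalysts, ligands, size=96)
print(plate["A1"], plate["H12"])
```

Keeping the layout as a plain well-to-reagents mapping makes it easy to export as a machine-readable table, in the spirit of the abstract's emphasis on machine-readable storage.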

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective
Yuheng Ding, Bo Qiang, Qixuan Chen et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(8), P. 2955 - 2970

Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or to engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce the frontier of pretraining, which holds promise as an innovative yet unexplored avenue.

Language: English

Citations: 10

Recommending reaction conditions with label ranking (Creative Commons)
Eunjae Shim, Ambuj Tewari, Tim Cernak et al.

Chemical Science, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

Label ranking is introduced as a conceptually new means of prioritizing experiments. Their simplicity, ease of application, and use of aggregation allow label-ranking models to make accurate predictions with small datasets.

Language: English

Citations: 1
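A simple instance of label ranking with aggregation, as summarized above: find the training substrates most similar to a query, then merge their best-first rankings of reaction conditions with a Borda count. This is one generic label-ranking baseline chosen for illustration; the models in Shim et al. may differ. Fingerprints as sets of bit indices, Tanimoto similarity, the value of `k`, and the condition labels are all assumptions.

```python
from collections import defaultdict

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def borda_aggregate(rankings):
    """Merge best-first rankings: a label at position p in a list of n gets n - 1 - p points.
    Ties are broken alphabetically so the output is deterministic."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, label in enumerate(ranking):
            scores[label] += n - 1 - pos
    return sorted(scores, key=lambda lab: (-scores[lab], lab))

def recommend(query_fp, training, k=3):
    """training: list of (fingerprint_set, best_first_condition_ranking) pairs."""
    neighbors = sorted(training, key=lambda t: tanimoto(query_fp, t[0]), reverse=True)[:k]
    return borda_aggregate([ranking for _, ranking in neighbors])

# Hypothetical training substrates with observed condition rankings.
training = [
    ({1, 2, 3}, ["Pd/XPhos", "Ni/bipy", "Cu/phen"]),
    ({1, 2, 4}, ["Ni/bipy", "Pd/XPhos", "Cu/phen"]),
    ({1, 2, 3, 5}, ["Pd/XPhos", "Cu/phen", "Ni/bipy"]),
    ({7, 8, 9}, ["Cu/phen", "Ni/bipy", "Pd/XPhos"]),
]
print(recommend({1, 2, 3}, training, k=3))
```

Because the model only stores rankings and aggregates them, it has essentially nothing to fit, which is one reason such approaches can remain usable with small datasets.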

High‐Throughput Experimentation and Machine Learning‐Assisted Optimization of Iridium‐Catalyzed Cross‐Dimerization of Sulfoxonium Ylides (Open Access)
Yougen Xu, Ya-Dong Gao, Lebin Su et al.

Angewandte Chemie International Edition, Journal Year: 2023, Volume and Issue: 62(48)

Published: Oct. 10, 2023

A novel and convenient approach that combines high-throughput experimentation (HTE) with machine learning (ML) technologies to achieve the first selective cross-dimerization of sulfoxonium ylides via iridium catalysis is presented. A variety of valuable amide-, ketone-, ester-, and N-heterocycle-substituted unsymmetrical E-alkenes are synthesized in good yields with high stereoselectivities. This mild method avoids the use of diazo compounds and is characterized by simple operation, step-economy, excellent chemoselectivity, and functional group compatibility. Combined experimental and computational studies identify an amide-sulfoxonium ylide as a carbene precursor. Furthermore, a comprehensive exploration of the reaction space was performed (600 reactions), and a model for yield prediction has been constructed.

Language: English

Citations: 19

MetaRF: attention-based random forest for reaction yield prediction with a few trails (Creative Commons)
Kexin Chen, Guangyong Chen, Junyou Li et al.

Journal of Cheminformatics, Journal Year: 2023, Volume and Issue: 15(1)

Published: April 10, 2023

Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive amount of training samples with high-quality annotations, which seriously limits the wide usage of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which assists chemists in selecting high-yield reactions in a new chemical space with only a few experimental trials. To attack this challenge, we first put forth MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, where the attention weight of the random forest is automatically optimized by a meta-learning framework and can be quickly adapted to predict the performance of new reagents when given a few additional samples. To improve few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and achieves satisfactory few-shot prediction performance. On high-throughput experimentation (HTE) datasets, the average yield of the top 10 reactions selected by our methodology is relatively close to the results of ideal yield selection.

Language: English

Citations: 17
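The idea of an attention-weighted random forest that is quickly adapted with a few new samples, as described above, can be sketched as a softmax-weighted average of per-tree predictions, with the attention logits updated by gradient descent on the mean squared error over a small support set. This is a simplified sketch, not the MetaRF implementation. The paper optimizes the attention within a meta-learning framework, whereas only a plain few-shot adaptation step is shown here; trees are represented by precomputed predictions, and all yields (as fractions) are hypothetical.

```python
import math

def softmax(logits):
    top = max(logits)  # shift for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(tree_preds, w):
    """Attention-weighted average of the per-tree predictions for one sample."""
    return sum(a * p for a, p in zip(softmax(w), tree_preds))

def mse(w, support_preds, support_y):
    errs = [(ensemble_predict(p, w) - y) ** 2 for p, y in zip(support_preds, support_y)]
    return sum(errs) / len(errs)

def adapt_weights(w, support_preds, support_y, lr=0.5, steps=100):
    """Few-shot adaptation: gradient descent on the MSE with respect to the attention logits w.
    support_preds[j][t] is the prediction of tree t on support sample j."""
    w = list(w)
    n = len(support_y)
    for _ in range(steps):
        a = softmax(w)
        grad = [0.0] * len(w)
        for preds, y in zip(support_preds, support_y):
            pred = sum(ai * p for ai, p in zip(a, preds))
            g = 2.0 * (pred - y) / n  # dLoss/dpred for this sample
            for s in range(len(w)):
                # through the softmax: dpred/dw_s = a_s * (p_s - pred)
                grad[s] += g * a[s] * (preds[s] - pred)
        w = [ws - lr * gs for ws, gs in zip(w, grad)]
    return w

# Hypothetical: three trees, two measured yields for a new reagent.
support_preds = [[0.60, 0.30, 0.90], [0.40, 0.20, 0.80]]
support_y = [0.60, 0.40]
w0 = [0.0, 0.0, 0.0]
w1 = adapt_weights(w0, support_preds, support_y)
print(mse(w0, support_preds, support_y), mse(w1, support_preds, support_y))
```

Only the handful of attention logits are updated while the trees stay fixed, which keeps adaptation cheap and well-posed even with two or three measured yields.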

Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit
Eunjae Shim, Ambuj Tewari, Tim Cernak et al.

Journal of Chemical Information and Modeling, Journal Year: 2023, Volume and Issue: 63(12), P. 3659 - 3668

Published: June 14, 2023

Machine learning models are increasingly being utilized to predict the outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces transfer and active learning and connects them to potential opportunities and directions for further research, especially in the area of prospective reaction development.

Language: English

Citations: 17