Data Efficient Molecular Image Representation Learning using Foundation Models DOI Creative Commons
Yonatan Harnik, Hadas Shalit Peleg, Amit H. Bermano

et al.

Chemical Science, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 1, 2025

A general image foundation model was used as the basis for molecular representation learning, showcasing its benefits in chemical property prediction through a stratified pretraining workflow.

Language: English

Applying statistical modeling strategies to sparse datasets in synthetic chemistry DOI Creative Commons
Brittany C. Haas, Dipannita Kalyani, Matthew S. Sigman

et al.

Science Advances, Journal Year: 2025, Volume and Issue: 11(1)

Published: Jan. 1, 2025

The application of statistical modeling in organic chemistry is emerging as a standard practice for probing structure-activity relationships and as a predictive tool for many optimization objectives. This review is aimed as a tutorial for those entering the area of statistical modeling in chemistry. We provide case studies to highlight the considerations and approaches that can be used to successfully analyze datasets in low data regimes, a common situation encountered given the experimental demands of organic chemistry. Statistical modeling hinges on the data (what is being modeled), descriptors (how data are represented), and algorithms (how data are modeled). Herein, we focus on how various reaction outputs (e.g., yield, rate, selectivity, solubility, stability, and turnover number) and data structures (e.g., binned, heavily skewed, and distributed) influence the choice of algorithm used for constructing predictive and chemically insightful models.

Language: English

Citations

3

Enantioselective Sulfonimidamide Acylation via a Cinchona Alkaloid-Catalyzed Desymmetrization: Scope, Data Science, and Mechanistic Investigation DOI
Brittany C. Haas, Ngiap‐Kie Lim, Janis Jermaks

et al.

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(12), P. 8536 - 8546

Published: March 13, 2024

Methods to access chiral sulfur(VI) pharmacophores are of interest in medicinal and synthetic chemistry. We report the desymmetrization of unprotected sulfonimidamides via asymmetric acylation with a cinchona-phosphinate catalyst. The desired products are formed in excellent yield and enantioselectivity with no observed bis-acylation. A data-science-driven approach to substrate scope evaluation was coupled to high throughput experimentation (HTE) to facilitate statistical modeling in order to inform mechanistic studies. Reaction kinetics, catalyst structural studies, and density functional theory (DFT) transition state analysis elucidated the turnover-limiting step to be the collapse of the tetrahedral intermediate and provided key insights into the catalyst-substrate structure–activity relationships responsible for the origin of the enantioselectivity. This study offers a reliable method for accessing enantioenriched sulfonimidamides to propel their application as pharmacophores and serves as an example of the insight that can be gleaned from integrating data science and traditional physical organic techniques.

Language: English

Citations

13

Autonomous chemistry: Navigating self-driving labs in chemical and material sciences DOI
Oliver Bayley, Elia Savino, Aidan Slattery

et al.

Matter, Journal Year: 2024, Volume and Issue: 7(7), P. 2382 - 2398

Published: July 1, 2024

Language: English

Citations

11

Machine learning-guided strategies for reaction conditions design and optimization DOI Creative Commons
Lung-Yi Chen, Yi‐Pei Li

Beilstein Journal of Organic Chemistry, Journal Year: 2024, Volume and Issue: 20, P. 2476 - 2492

Published: Oct. 4, 2024

This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.

Language: English

Citations

11

Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective DOI
Yuheng Ding, Bo Qiang, Qixuan Chen

et al.

Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: 64(8), P. 2955 - 2970

Published: March 15, 2024

Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.

Language: English

Citations

10

Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates DOI Creative Commons
Jules Schleinitz, Alba Carretero‐Cerdán, Anjali Gurajapu

et al.

Journal of the American Chemical Society, Journal Year: 2025, Volume and Issue: 147(9), P. 7476 - 7484

Published: Feb. 21, 2025

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from "model substrates" that is frequently used to estimate reactivity on complex molecules.

Language: English

Citations

2
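
The abstract above describes acquisition functions that combine predicted reactivity with model uncertainty to pick the most informative molecules. A minimal sketch of that general idea follows, assuming an upper-confidence-bound style score; the `Candidate` fields, the `weight` parameter, and the scoring formula are hypothetical illustrations, not the functions reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate molecule with hypothetical model outputs."""
    name: str
    predicted_reactivity: float  # assumed model prediction, e.g. in [0, 1]
    uncertainty: float           # assumed predictive standard deviation


def acquisition_score(candidate, weight=1.0):
    """UCB-style score: prediction plus a weighted uncertainty bonus (an assumption)."""
    return candidate.predicted_reactivity + weight * candidate.uncertainty


def select_batch(candidates, k, weight=1.0):
    """Return the k candidates with the highest acquisition score."""
    ranked = sorted(
        candidates,
        key=lambda c: acquisition_score(c, weight),
        reverse=True,
    )
    return ranked[:k]


# Illustrative pool with made-up values.
pool = [
    Candidate("A", predicted_reactivity=0.9, uncertainty=0.05),
    Candidate("B", predicted_reactivity=0.5, uncertainty=0.60),
    Candidate("C", predicted_reactivity=0.2, uncertainty=0.10),
]
chosen = [c.name for c in select_batch(pool, k=2)]
```

With `weight=0.0` the selection falls back to ranking by predicted reactivity alone, which makes it easy to compare an uncertainty-aware choice against a purely exploitative one.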

Standardizing Substrate Selection: A Strategy toward Unbiased Evaluation of Reaction Generality DOI Creative Commons
Debanjan Rana, Philipp M. Pflüger, Niklas Hölter

et al.

ACS Central Science, Journal Year: 2024, Volume and Issue: unknown

Published: April 8, 2024

With over 10,000 new reaction protocols arising every year, only a handful of these procedures transition from academia to application. A major reason for this gap stems from the lack of comprehensive knowledge about a reaction's scope, i.e., to which substrates the protocol can or cannot be applied. Even though chemists invest substantial effort to assess the scope of new protocols, the resulting scope tables involve significant biases, reducing their expressiveness. Herein we report a standardized substrate selection strategy designed to mitigate these biases and evaluate the applicability, as well as the limits, of any chemical reaction. Unsupervised learning is utilized to map the chemical space of industrially relevant molecules. Subsequently, potential substrate candidates are projected onto this universal map, enabling the selection of a structurally diverse set of substrates with optimal relevance and coverage. By testing our methodology on different chemical reactions, we were able to demonstrate its effectiveness in finding general reactivity trends by using a few highly representative examples. The developed methodology empowers chemists to showcase the unbiased applicability of novel methodologies, facilitating their practical applications. We hope that this work will trigger interdisciplinary discussions about biases in synthetic chemistry, leading to improved data quality.

Language: English

Citations

9

Incorporating Synthetic Accessibility in Drug Design: Predicting Reaction Yields of Suzuki Cross-Couplings by Leveraging AbbVie’s 15-Year Parallel Library Data Set DOI Creative Commons
Priyanka Raghavan, Alexander J. Rago, Pritha Verma

et al.

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(22), P. 15070 - 15084

Published: May 20, 2024

Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective library data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The yield prediction model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.

Language: English

Citations

9
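
The abstract above compares learned features against a one-hot encoded baseline. The sketch below shows what a one-hot reaction encoding might look like and why it carries no information about reactants absent from the training set; the reactant labels and helper names are hypothetical and are not taken from the paper.

```python
def build_vocabulary(labels):
    """Map each distinct reactant label to a fixed column index."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def one_hot(label, vocab):
    """Return a 0/1 vector; an unseen label yields an all-zero vector."""
    vec = [0] * len(vocab)
    idx = vocab.get(label)
    if idx is not None:
        vec[idx] = 1
    return vec


def encode_reaction(halide, boronate, halide_vocab, boronate_vocab):
    """Concatenate the one-hot vectors of the two coupling partners."""
    return one_hot(halide, halide_vocab) + one_hot(boronate, boronate_vocab)


# Hypothetical training-set reactant identifiers.
halide_vocab = build_vocabulary(["ArBr-1", "ArCl-2", "ArBr-1"])
boronate_vocab = build_vocabulary(["B-1", "B-2", "B-3"])

seen = encode_reaction("ArCl-2", "B-1", halide_vocab, boronate_vocab)
unseen = encode_reaction("ArI-9", "B-3", halide_vocab, boronate_vocab)
```

Here `seen` is `[0, 1, 1, 0, 0]`, while the unseen halide in `unseen` contributes only zeros (`[0, 0, 0, 0, 1]`). Structure-derived descriptors such as fingerprints do not have this blind spot, which is one plausible reason a one-hot encoding serves as a baseline rather than a final model.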

Developing BioNavi for Hybrid Retrosynthesis Planning DOI Creative Commons
Tao Zeng, Zhehao Jin, Shuangjia Zheng

et al.

JACS Au, Journal Year: 2024, Volume and Issue: 4(7), P. 2492 - 2502

Published: July 3, 2024

Illuminating synthetic pathways is essential for producing valuable chemicals, such as bioactive molecules. Chemical and biological syntheses are crucial, and their integration often leads to more efficient and sustainable pathways. Despite the rapid development of retrosynthesis models, few of them consider both chemical and biological syntheses, hindering the pathway design for high-value chemicals. Here, we propose BioNavi by innovating multitask learning and reaction templates into the deep learning-driven model to design hybrid synthesis pathways in a more interpretable manner. BioNavi outperforms existing approaches on different data sets, achieving a 75% hit rate in replicating reported biosynthetic pathways and displaying superior ability in designing hybrid synthesis pathways. Additional case studies further illustrate the potential application of BioNavi in de novo pathway design. The enhanced web server (http://biopathnavi.qmclab.com/bionavi/) simplifies input operations and implements step-by-step exploration according to user experience. We show that BioNavi is a handy navigator for designing synthetic pathways for various chemicals.

Language: English

Citations

9

Development of High-Throughput Experimentation Approaches for Rapid Radiochemical Exploration DOI
Eric Webb, Kevin Cheng, Wade P. Winton

et al.

Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(15), P. 10581 - 10590

Published: April 5, 2024

Positron emission tomography is a widely used imaging platform for studying physiological processes. Despite the proliferation of modern synthetic methodologies for radiolabeling, the optimization of these reactions still primarily relies on inefficient one-factor-at-a-time approaches. High-throughput experimentation (HTE) has proven to be a powerful approach for optimizing reactions in many areas of chemical synthesis. However, to date, HTE has rarely been applied to radiochemistry. This is largely because of the short lifetime of common radioisotopes, which presents major challenges for efficient parallel reaction setup and analysis using standard equipment and workflows. Herein, we demonstrate an effective HTE workflow and apply it to the optimization of copper-mediated radiofluorination of pharmaceutically relevant boronate ester substrates. The workflow utilizes commercial equipment and allows for rapid analysis of reactions, exploring the chemical space of aryl boronates for radiofluorinations, and constructing large radiochemistry data sets.

Language: English

Citations

7