Nature Catalysis, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 3, 2024
Language: English
Digital Discovery, Journal Year: 2024, Volume and Issue: 3(5), P. 1058 - 1067
Published: Jan. 1, 2024
A generic machine learning model validation method named extrapolation validation (EV) has been proposed, which evaluates the trustworthiness of model predictions to mitigate extrapolation risk before the model transitions to applications.
Language: English
Citations: 6
Beilstein Journal of Organic Chemistry, Journal Year: 2024, Volume and Issue: 20, P. 2476 - 2492
Published: Oct. 4, 2024
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Language: English
Citations: 5
Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(22), P. 15070 - 15084
Published: May 20, 2024
Despite the increased use of computational tools to supplement medicinal chemists' expertise and intuition in drug design, predicting synthetic yields in medicinal chemistry endeavors remains an unsolved challenge. Existing design workflows could profoundly benefit from reaction yield prediction, as precious material waste could be reduced, and a greater number of relevant compounds could be delivered to advance the design, make, test, analyze (DMTA) cycle. In this work, we detail the evaluation of AbbVie's medicinal chemistry library data set to build machine learning models for the prediction of Suzuki coupling reaction yields. The combination of density functional theory (DFT)-derived features and Morgan fingerprints was identified to perform better than one-hot encoded baseline modeling, furnishing encouraging results. Overall, we observe modest generalization to unseen reactant structures within the 15-year retrospective data set. Additionally, we compare predictions made by the model to those made by expert medicinal chemists, finding that the model can often predict both reaction success and reaction yields with greater accuracy. Finally, we demonstrate the application of this approach to suggest structurally and electronically similar building blocks to replace those predicted or observed to be unsuccessful prior to or after synthesis, respectively. The yield prediction model was used to select similar monomers predicted to have higher yields, resulting in greater synthesis efficiency of relevant drug-like molecules.
Language: English
Citations: 4
Journal of the American Chemical Society, Journal Year: 2024, Volume and Issue: 146(24), P. 16375 - 16380
Published: June 5, 2024
The rate of frontal ring-opening metathesis polymerization (FROMP) using the Grubbs generation II catalyst is impacted by both the concentration and choice of monomers and inhibitors, usually organophosphorus derivatives. Herein we report a data-science-driven workflow to evaluate how these factors impact both the rate of FROMP and how long the formulation of the mixture is stable (pot life). Using this workflow, we built a classification model using a single-node decision tree to determine how a simple phosphine structural descriptor (Vbur-near) can bin long versus short pot life. Additionally, we applied a nonlinear kernel ridge regression model to predict how the inhibitor and the selection/concentration of comonomers impact the FROMP rate. The analysis provides selection criteria for material network structures that span from highly cross-linked thermosets to non-cross-linked thermoplastics as well as degradable and nondegradable materials.
Language: English
Citations: 4
Tetrahedron, Journal Year: 2025, Volume and Issue: 174, P. 134498 - 134498
Published: Jan. 25, 2025
Language: English
Citations: 0
ACS Catalysis, Journal Year: 2025, Volume and Issue: unknown, P. 6067 - 6077
Published: March 31, 2025
Language: English
Citations: 0
Nature Synthesis, Journal Year: 2025, Volume and Issue: unknown
Published: April 7, 2025
Language: English
Citations: 0
Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: April 15, 2025
Transition metal-catalyzed asymmetric reactions are of high contemporary importance in organic synthesis. Recently, machine learning (ML) has shown promise in accelerating the development of newer catalytic protocols. However, the need for a large amount of experimental data can present a bottleneck for implementing ML models. Here, we propose a meta-learning workflow that can harness literature-derived data to extract shared reaction features and requires only a few examples to predict the outcome of new reactions. Prototypical networks are used as the meta-learning method to predict the enantioselectivity of asymmetric hydrogenation of olefins. This meta-model consistently provides significant performance improvement over other popular ML methods such as random forests and graph neural networks. The performance of our meta-model is analyzed with varying sizes of training data to demonstrate its utility even with limited data. A good performance on an out-of-sample test set further indicates the general applicability of our approach. We believe this work will provide a leap forward in identifying promising reactions in the early phases of reaction development when minimal data is available.
Language: English
Citations: 0
Journal of Molecular Modeling, Journal Year: 2025, Volume and Issue: 31(5)
Published: April 23, 2025
Language: English
Citations: 0
Journal of Chemical Information and Modeling, Journal Year: 2025, Volume and Issue: unknown
Published: May 1, 2025
Predicting reaction yields in synthetic chemistry remains a significant challenge. This study systematically evaluates the impact of tokenization, molecular representation, pretraining data, and adversarial training on a BERT-based model for yield prediction of Buchwald-Hartwig and Suzuki-Miyaura coupling reactions using publicly available HTE data sets. We demonstrate that the molecular representation choice (SMILES, DeepSMILES, SELFIES, Morgan fingerprint-based notation, IUPAC names) has minimal impact on model performance, while typically BPE and SentencePiece tokenization outperform other methods. WordPiece is strongly discouraged for SELFIES notation. Furthermore, pretraining with relatively small data sets (<100 K reactions) achieves comparable performance to larger data sets containing millions of examples. The use of artificially generated domain-specific pretraining data is proposed. The artificially generated sets prove to be a good surrogate for the reaction schemes extracted from reaction data sets such as Pistachio or Reaxys. The best performance was observed for hybrid pretraining sets combining the real and the domain-specific, artificial data. Finally, we show that a novel adversarial training approach, perturbing input embeddings dynamically, improves the robustness and generalizability of the model for yield and reaction success prediction. These findings provide valuable insights for developing robust and practical machine learning models for yield prediction in synthetic chemistry. GSK's BERT training code base is made available to the community with this work.
Language: English
Citations: 0