Explaining compound activity predictions with a substructure-aware loss for graph neural networks DOI Creative Commons
Kenza Amara, Raquel Rodríguez-Pérez, José Jiménez-Luna

et al.

Journal of Cheminformatics, Journal Year: 2023, Volume and Issue: 15(1)

Published: July 25, 2023

Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established feature attribution methods have so far displayed low performance for deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.

Language: English

Machine Learning-Guided Protein Engineering DOI Creative Commons
Petr Kouba, Pavel Kohout, Faraneh Haddadi

et al.

ACS Catalysis, Journal Year: 2023, Volume and Issue: 13(21), P. 13863 - 13895

Published: Oct. 13, 2023

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline potential directions for future research.

Language: English

Citations

90

Machine learning in preclinical drug discovery DOI

Denise B. Catacutan, Jeremie Alexander, Autumn Arnold

et al.

Nature Chemical Biology, Journal Year: 2024, Volume and Issue: 20(8), P. 960 - 973

Published: July 19, 2024

Language: English

Citations

41

Advancing material property prediction: using physics-informed machine learning models for viscosity DOI Creative Commons
Alex K. Chew, Matthew Sender, Zachary Kaplan

et al.

Journal of Cheminformatics, Journal Year: 2024, Volume and Issue: 16(1)

Published: March 14, 2024

In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in constructing ML models, especially in the materials science domain where data is limited. To address this, we integrate physics-informed descriptors from molecular dynamics (MD) simulations to enhance the accuracy and interpretability of ML models. Our current study focuses on predicting viscosity in liquid systems using MD descriptors. In this work, we curated a comprehensive dataset of over 4000 small organic molecules' viscosities from scientific literature, publications, and online databases. This dataset enabled us to develop quantitative structure–property relationships (QSPR) consisting of descriptor-based and graph neural network models to predict temperature-dependent viscosities for a wide range of viscosities. The QSPR models reveal that including MD descriptors improves the prediction of experimental viscosities, particularly at the small data set scale of fewer than a thousand data points. Furthermore, feature importance tools reveal that intermolecular interactions captured by MD descriptors are most important for viscosity predictions. Finally, the QSPR models can capture the inverse relationship between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original data set. Our research highlights the effectiveness of incorporating MD descriptors into QSPR models, which leads to improved accuracy for properties that are difficult to predict when using physics-based models alone or when limited data is available.

Language: English

Citations

23

Active learning of the thermodynamics-dynamics trade-off in protein condensates DOI Creative Commons
Yaxin An, Michael Webb, William M. Jacobs

et al.

Science Advances, Journal Year: 2024, Volume and Issue: 10(1)

Published: Jan. 5, 2024

Phase-separated biomolecular condensates exhibit a wide range of dynamic properties, which depend on the sequences of the constituent proteins and RNAs. However, it is unclear to what extent condensate dynamics can be tuned without also changing the thermodynamic properties that govern phase separation. Using coarse-grained simulations of intrinsically disordered proteins, we show that the dynamics and thermodynamics of homopolymer condensates are strongly correlated, with increased condensate stability being coincident with low mobilities and high viscosities. We then apply an “active learning” strategy to identify heteropolymer sequences that break this correlation. This data-driven approach and accompanying analysis reveal how heterogeneous amino acid compositions and nonuniform sequence patterning map to a range of independently tunable dynamic and thermodynamic properties of condensates. Our results highlight key molecular determinants governing the physical properties of biomolecular condensates and establish design rules for the development of stimuli-responsive biomaterials.

Language: English

Citations

22

Surface-Enhanced Raman Spectroscopy for Biomedical Applications: Recent Advances and Future Challenges DOI Creative Commons
Li Lin, Ramón A. Álvarez‐Puebla, Luis M. Liz‐Marzán

et al.

ACS Applied Materials & Interfaces, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 24, 2025

The year 2024 marks the 50th anniversary of the discovery of surface-enhanced Raman spectroscopy (SERS). Over recent years, SERS has experienced rapid development and become a critical tool in biomedicine with its unparalleled sensitivity and molecular specificity. This review summarizes the advancements and challenges in SERS substrates, nanotags, instrumentation, and spectral analysis for biomedical applications. We highlight key developments in colloidal and solid SERS substrates, with an emphasis on surface chemistry, hotspot design, and 3D hydrogel plasmonic architectures. Additionally, we introduce recent innovations in SERS nanotags, including those with interior gaps, orthogonal Raman reporters, and near-infrared-II-responsive properties, along with biomimetic coatings. Emerging technologies such as optical tweezers, plasmonic nanopores, and wearable sensors have expanded SERS capabilities for single-cell and single-molecule analysis. Advances in spectral analysis, including signal digitalization, denoising, and deep learning algorithms, have improved the quantification of complex biological data. Finally, this review discusses biomedical applications of SERS, including nucleic acid detection, protein characterization, metabolite monitoring, and in vivo spectroscopy, emphasizing its potential for liquid biopsy, metabolic phenotyping, and extracellular vesicle diagnostics. The review concludes with a perspective on the clinical translation of SERS, addressing commercialization potential and the challenges in tissue sensing and imaging.

Language: English

Citations

3

A Review of Large Language Models and Autonomous Agents in Chemistry DOI Creative Commons
Mayk Caldas Ramos, Christopher J. Collison, Andrew Dickson White

et al.

Chemical Science, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 9, 2024

Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss them across scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.

Language: English

Citations

12

Introduction to Predicting Properties of Organic Materials DOI
Didier Mathieu

Challenges and advances in computational chemistry and physics, Journal Year: 2025, Volume and Issue: unknown, P. 27 - 63

Published: Jan. 1, 2025

Language: English

Citations

1

Comparing Explanations of Molecular Machine Learning Models Generated with Different Methods for the Calculation of Shapley Values DOI Creative Commons
Alec Lamens, Jürgen Bajorath

Molecular Informatics, Journal Year: 2025, Volume and Issue: 44(3)

Published: March 1, 2025

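Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different attribution methods has not been thoroughly investigated. We systematically compared model explanations in molecular machine learning. To this end, a test system of highly accurate compound activity predictions for different targets was generated. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and a model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was only little agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.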

Language: English

Citations

1

Identifying Substructures That Facilitate Compounds to Penetrate the Blood–Brain Barrier via Passive Transport Using Machine Learning Explainer Models DOI Creative Commons

Lucca Caiaffa Santos Rosa, Caio Oliveira Argolo, Cayque Monteiro Castro Nascimento

et al.

ACS Chemical Neuroscience, Journal Year: 2024, Volume and Issue: 15(11), P. 2144 - 2159

Published: May 9, 2024

The local interpretable model-agnostic explanation (LIME) method was used to interpret two machine learning models of compounds penetrating the blood–brain barrier. The classification models, Random Forest, ExtraTrees, and Deep Residual Network, were trained and validated using the blood–brain barrier penetration dataset, which shows the penetrability of compounds in the blood–brain barrier. LIME was able to create explanations for such penetrability, highlighting the most important substructures of molecules that affect drug penetration in the barrier. The simple and intuitive outputs prove the applicability of this explainable model to interpreting the permeability of compounds across the blood–brain barrier in terms of molecular features. LIME explanations were filtered with a weight equal to or greater than 0.1 to obtain only the most relevant explanations. The results showed several structures that are important for blood–brain barrier penetration. In general, it was found that some compounds with nitrogenous substructures are more likely to permeate the blood–brain barrier. The application of these structural explanations may help the pharmaceutical industry and potential drug synthesis research groups to synthesize more active molecules rationally.

Language: English

Citations

8

Integrating Explainability into Graph Neural Network Models for the Prediction of X-ray Absorption Spectra DOI Creative Commons
Amir Kotobi, Kanishka Singh, Daniel Höche

et al.

Journal of the American Chemical Society, Journal Year: 2023, Volume and Issue: 145(41), P. 22584 - 22598

Published: Oct. 9, 2023

The use of sophisticated machine learning (ML) models, such as graph neural networks (GNNs), to predict complex molecular properties or all kinds of spectra has grown rapidly. However, ensuring the interpretability of these models' predictions remains a challenge. For example, a rigorous understanding of the predicted X-ray absorption spectrum (XAS) generated by such ML models requires an in-depth investigation of the respective black-box ML model used. Here, this is done for different GNNs based on a comprehensive, custom-generated XAS data set for small organic molecules. We show that a thorough analysis of the different ML models with respect to the local and global environments considered in each ML model is essential for the selection of an appropriate ML model that allows a robust XAS prediction. Moreover, we employ feature attribution to determine the respective contributions of various atoms in the molecules to the peaks observed in the XAS spectrum. By comparing this peak assignment to the core and virtual orbitals from the quantum chemical calculations underlying our data set, we demonstrate that it is possible to relate the atomic contributions to the XAS spectrum via these orbitals.

Language: English

Citations

14