GEMS: A Generalizable GNN Framework For Protein-Ligand Binding Affinity Prediction Through Robust Data Filtering and Language Model Integration

David Graber, Peter Stockinger, Fabian Meyer

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: Dec. 11, 2024

The field of computational drug design requires accurate scoring functions to predict binding affinities for protein-ligand interactions. However, train-test data leakage between the PDBbind database and the CASF benchmark datasets has significantly inflated the performance metrics of currently available deep-learning-based affinity prediction models, leading to an overestimation of their generalization capabilities. We address this issue by proposing CleanSplit, a training dataset curated by a novel structure-based filtering algorithm that eliminates train-test data leakage as well as redundancies within the training set. Retraining a current best-performing model on CleanSplit caused its performance to drop to uncompetitive levels, indicating that the performance of existing models is largely driven by data leakage. In contrast, our graph neural network for efficient molecular scoring (GEMS) maintains high benchmark performance when trained on CleanSplit. Leveraging a sparse graph modeling of protein-ligand interactions and transfer learning from language models, GEMS is able to generalize to strictly independent test datasets.

Language: English
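The leakage problem described in this abstract can be illustrated with a toy filter. This is not the CleanSplit algorithm, which the abstract only describes as structure-based; as a hypothetical stand-in, each complex is reduced to a set of feature tokens, and similarity is the Jaccard (set Tanimoto) index. `filter_leakage` drops training complexes that are too similar to any test complex, and `remove_redundancy` greedily removes near-duplicates within the training set. The function names, feature sets, and the 0.8 threshold are all illustrative assumptions.

```python
from typing import Dict, FrozenSet, List

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard (set Tanimoto) similarity; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def filter_leakage(
    train: Dict[str, FrozenSet[str]],
    test: Dict[str, FrozenSet[str]],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Return IDs of training complexes whose similarity to every test complex is below threshold."""
    kept = []
    for train_id, feats in train.items():
        max_sim = max((jaccard(feats, t) for t in test.values()), default=0.0)
        if max_sim < threshold:
            kept.append(train_id)
    return kept

def remove_redundancy(
    items: Dict[str, FrozenSet[str]], threshold: float = 0.8
) -> List[str]:
    """Greedily keep an item only if it is dissimilar to every item already kept."""
    kept: Dict[str, FrozenSet[str]] = {}
    for item_id, feats in items.items():
        if all(jaccard(feats, k) < threshold for k in kept.values()):
            kept[item_id] = feats
    return list(kept)

if __name__ == "__main__":
    # Toy feature sets standing in for structural descriptors of each complex.
    train = {
        "t1": frozenset({"kinase", "hinge_hbond", "aromatic"}),
        "t2": frozenset({"protease", "zinc", "carboxylate"}),
    }
    test = {"c1": frozenset({"kinase", "hinge_hbond", "aromatic"})}
    print(filter_leakage(train, test))  # ['t2']
```

The greedy pass keeps the first member of each near-duplicate cluster, so its result depends on input order; a real curation pipeline would need a more principled choice of representative.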

Convergent Protocols for Computing Protein–Ligand Interaction Energies Using Fragment-Based Quantum Chemistry
Paige E. Bowling, Dustin R. Broderick, John M. Herbert

et al.

Journal of Chemical Theory and Computation, Year: 2025, Issue: unknown

Published: Jan. 2, 2025

Fragment-based quantum chemistry methods offer a means to sidestep the steep nonlinear scaling of electronic structure calculations so that large molecular systems can be investigated using high-level methods. Here, we use fragmentation to compute protein-ligand interaction energies in systems with several thousand atoms, using a new software platform for managing fragment-based calculations that implements a screened many-body expansion. Convergence tests using a minimal-basis semiempirical method (HF-3c) indicate that two-body calculations, with single-residue fragments and simple hydrogen caps, are sufficient to reproduce interaction energies obtained from conventional supramolecular calculations to within 1 kcal/mol, at about 1% of the computational cost. We also demonstrate that HF-3c results are illustrative of trends in density functional theory calculations with basis sets up to augmented quadruple-ζ quality. Strategic deployment of fragmentation facilitates converged biomolecular modeling alongside high-quality basis sets, bringing …

Language: English

Cited by

4
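The two-body truncation in this abstract can be written out explicitly. If the complex is split into protein fragments I plus a single ligand fragment L, a many-body expansion truncated at second order gives ΔE_int ≈ Σ_I (E_IL − E_I − E_L), because the protein-only monomer and dimer terms cancel between the complex and the isolated protein. The sketch below evaluates that sum and uses a plain distance cutoff as one simple way to screen out distant fragment-ligand pairs. The `Fragment` record, the 10 Å cutoff, and treating the ligand as one fragment are assumptions; the paper's actual screening criterion and hydrogen-capping scheme are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Fragment:
    """Hypothetical record for one protein fragment (e.g. a single residue)."""
    name: str
    e_monomer: float            # E_I: energy of the isolated (capped) fragment, kcal/mol
    e_dimer_with_ligand: float  # E_IL: energy of the fragment + ligand dimer, kcal/mol
    distance: float             # closest fragment-ligand distance, angstroms

def two_body_interaction_energy(
    e_ligand: float, fragments: Iterable[Fragment], cutoff: float = 10.0
) -> float:
    """Sum the pair terms E_IL - E_I - E_L over fragments within the cutoff."""
    total = 0.0
    for frag in fragments:
        if frag.distance > cutoff:
            continue  # screened out: this pair term is assumed negligible
        total += frag.e_dimer_with_ligand - frag.e_monomer - e_ligand
    return total

if __name__ == "__main__":
    # Made-up energies, for illustration only.
    frags = [
        Fragment("A", -200.0, -305.0, 3.5),   # pair term -5.0
        Fragment("B", -150.0, -251.5, 4.0),   # pair term -1.5
        Fragment("C", -120.0, -221.0, 15.0),  # beyond cutoff, skipped
    ]
    print(two_body_interaction_energy(-100.0, frags))  # -6.5
```

The cost advantage comes from the fact that each pair term needs only a fragment-sized and a dimer-sized calculation, rather than one calculation on the full complex.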

Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years
A.R. Sultan, Jochen Sieg, Miriam Mathea

et al.

Journal of Chemical Information and Modeling, Year: 2024, Issue: 64(16), pp. 6259-6280

Published: Aug. 13, 2024

Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.

Language: English

Cited by

7
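The review's call for standardized data splitting can be made concrete with a group-aware split, in which all molecules sharing a group label (for example a precomputed scaffold identifier) land on the same side, so close analogues cannot straddle train and test. This is a generic sketch, not a protocol from the review; the group labels, the SHA-256-based assignment, and the 20% test fraction are assumptions.

```python
import hashlib
from typing import Dict, List, Tuple

def group_split(
    groups: Dict[str, str], test_fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Split molecule IDs so that each group falls entirely in train or in test.

    `groups` maps molecule ID -> group label (e.g. a precomputed scaffold ID).
    Assignment uses a stable hash of the label, so the split is reproducible
    across runs, unlike Python's salted built-in hash() for strings.
    """
    train: List[str] = []
    test: List[str] = []
    for mol_id, group in groups.items():
        digest = hashlib.sha256(group.encode("utf-8")).digest()
        # Keep 53 bits so the float division is exact and the value lies in [0, 1).
        bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        (test if bucket < test_fraction else train).append(mol_id)
    return train, test

if __name__ == "__main__":
    groups = {f"mol{i}": f"scaffold{i % 7}" for i in range(20)}
    train, test = group_split(groups)
    print(len(train), len(test))
```

Because whole groups are assigned at once, the realized test fraction only approximates `test_fraction`, especially when there are few groups or their sizes differ widely.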

Augmented BindingNet dataset for enhanced ligand binding pose predictions using deep learning
Hui Zhu, Xuelian Li, Baoquan Chen

et al.

npj Drug Discovery, Year: 2025, Issue: 2(1)

Published: Jan. 8, 2025

High-quality data on protein-ligand complex structures and binding affinities are crucial for structure-based drug design. Existing datasets often lack diversity and quantity, limiting a comprehensive understanding of protein-ligand interactions. Here, we present BindingNet v2, an expanded dataset comprising 689,796 modeled protein-ligand complexes across 1794 protein targets. Constructed using a template-based modeling workflow enhanced from that of BindingNet v1, it incorporates pharmacophore and molecular shape similarities. BindingNet v2's effectiveness in pose generation was evaluated, showing improved generalization ability of the Uni-Mol model to novel ligands. The success rate on PoseBusters increased from 38.55% when training with PDBbind alone to 64.25% when augmenting with BindingNet v2. Coupled with physics-based refinement, the success rate rose to 74.07%, with poses passing PoseBusters validity checks. These results highlight the value of larger, more diverse datasets in enhancing the accuracy and reliability of deep learning models for binding pose prediction.

Language: English

Cited by

0
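Pose-prediction success rates like those quoted above are percentages of test complexes whose predicted pose counts as a success. A common criterion, assumed here, is a ligand RMSD of at most 2 Å to the reference pose, optionally also requiring that the pose pass physical-validity checks. The function below is an illustrative sketch, not the PoseBusters implementation; `success_rate` and the input values are hypothetical.

```python
from typing import Optional, Sequence

def success_rate(
    rmsds: Sequence[float],
    valid: Optional[Sequence[bool]] = None,
    threshold: float = 2.0,  # angstroms; assumed success criterion
) -> float:
    """Percentage of complexes with RMSD <= threshold and, if given, a passing validity flag."""
    if not rmsds:
        raise ValueError("need at least one complex")
    if valid is None:
        valid = [True] * len(rmsds)
    if len(valid) != len(rmsds):
        raise ValueError("rmsds and valid must have the same length")
    hits = sum(1 for r, ok in zip(rmsds, valid) if r <= threshold and ok)
    return 100.0 * hits / len(rmsds)

if __name__ == "__main__":
    rmsds = [0.8, 1.9, 2.5, 4.0]  # made-up RMSDs in angstroms
    print(success_rate(rmsds))                              # 50.0
    print(success_rate(rmsds, [True, False, True, True]))   # 25.0
```

Adding the validity requirement can only lower the rate for a fixed set of poses, which is why a refinement step that fixes physically implausible poses can raise the validity-checked success rate.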

GRADE and X-GRADE: Unveiling Novel Protein–Ligand Interaction Fingerprints Based on GRAIL Scores

C Fellinger, Thomas Seidel, Benjamin Merget

et al.

Journal of Chemical Information and Modeling, Year: 2025, Issue: unknown

Published: Feb. 20, 2025

Nonbonding molecular interactions, such as hydrogen bonding, hydrophobic contacts, and ionic interactions, are at the heart of many biological processes, and their appropriate treatment is essential for the successful application of numerous computational drug design methods. This paper introduces GRADE, a novel interaction fingerprint (IFP) descriptor that quantifies these interactions using floating-point values derived from GRAIL scores, encoding both the presence and the quality of interactions. GRADE is available in two versions: a basic 35-element variant and an extended 177-element variant. Three case studies demonstrate GRADE's utility: (1) dimensionality reduction for visualizing the chemical space of protein–ligand complexes using Uniform Manifold Approximation and Projection (UMAP), showing performance competitive with more complex descriptors; (2) binding affinity prediction, where GRADE achieved reasonable accuracy with minimal machine learning optimization; and (3) three-dimensional quantitative structure–activity relationship (3D-QSAR) modeling for a specific protein target, where GRADE enhanced Morgan Fingerprints.

Language: English

Cited by

0
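The difference between a binary interaction fingerprint and a float-valued one like GRADE can be sketched as follows. Each detected interaction contributes a quality score, and scores are summed per interaction type, so an element records both whether that interaction type occurs (nonzero) and how strong it is. The five-type vocabulary, the summation rule, and the scores are hypothetical; the actual GRADE variants define 35 or 177 specific elements derived from GRAIL scores, which are not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical interaction-type vocabulary, far smaller than the real GRADE variants.
FEATURE_TYPES: List[str] = [
    "hbond_donor", "hbond_acceptor", "hydrophobic", "ionic_positive", "ionic_negative",
]

def interaction_fingerprint(interactions: Iterable[Tuple[str, float]]) -> List[float]:
    """Aggregate per-interaction quality scores into a fixed-length float vector.

    Each element is the sum of scores for one interaction type: 0.0 means the
    type is absent; larger values mean more and/or better-quality contacts.
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(FEATURE_TYPES)}
    vector = [0.0] * len(FEATURE_TYPES)
    for kind, score in interactions:
        if kind not in index:
            raise KeyError(f"unknown interaction type: {kind}")
        vector[index[kind]] += score
    return vector

if __name__ == "__main__":
    contacts = [("hbond_donor", 0.75), ("hydrophobic", 0.5), ("hydrophobic", 0.25)]
    print(interaction_fingerprint(contacts))  # [0.75, 0.0, 0.75, 0.0, 0.0]
```

A binary fingerprint of the same contacts would be [1, 0, 1, 0, 0], losing the distinction between a weak and a strong hydrogen bond; the fixed length is what lets such vectors feed directly into UMAP or a regression model.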

Predicting the Binding of Small Molecules to Proteins through Invariant Representation of the Molecular Structure
Roberta Beccaria, Andrea Lazzeri, Guido Tiana

et al.

Journal of Chemical Information and Modeling, Year: 2024, Issue: 64(17), pp. 6758-6767

Published: Aug. 28, 2024

We present a computational scheme for predicting the ligands that bind to a pocket of known structure. It is based on the generation of a general abstract representation of molecules, which is invariant to rotations, translations, and permutations of atoms, and has some degree of isometry with the space of conformations. We use these representations to train a nondeep machine learning algorithm to classify the binding of pocket-molecule pairs and show that this approach has better generalization capability than existing methods.

Language: English

Cited by

0
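The invariances this abstract requires (rotations, translations, and atom permutations) can be illustrated with a much simpler descriptor than the authors'. The sketch below, an assumption and not the paper's representation, bins all interatomic distances into a normalized histogram: distances are unchanged by rigid motions, and counting unordered atom pairs ignores atom order. Unlike the representation described above, such a histogram is lossy and makes no claim of isometry with conformational space. `distance_histogram`, the 1 Å bin width, and the bin count are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def distance_histogram(
    coords: Sequence[Point], bin_width: float = 1.0, n_bins: int = 10
) -> List[float]:
    """Normalized histogram of all interatomic distances.

    Distances are unchanged by rotation and translation, and binning the
    unordered set of atom pairs removes any dependence on atom order.
    """
    counts = [0] * n_bins
    pairs = 0
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        idx = min(int(d / bin_width), n_bins - 1)  # clamp long distances into the last bin
        counts[idx] += 1
        pairs += 1
    return [c / pairs for c in counts] if pairs else [0.0] * n_bins

if __name__ == "__main__":
    mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.5, 0.0)]
    shuffled = [mol[2], mol[0], mol[1]]                   # permuted atom order
    rotated = [(-y, x, z) for x, y, z in mol]             # 90-degree rotation about z
    shifted = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in mol]  # translation
    h = distance_histogram(mol)
    print(h == distance_histogram(shuffled) == distance_histogram(rotated) == distance_histogram(shifted))  # True
```

In practice, floating-point noise can move a distance that sits exactly on a bin edge into a neighboring bin, so real descriptors often use smooth (e.g. Gaussian-broadened) bins instead of hard ones.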
