Benchmarking protein language models for protein crystallization
Raghvendra Mall, Rahul Kaushik, Zachary A. Martinez, et al.

Scientific Reports, Year: 2025, Volume: 15(1)

Published: Jan. 18, 2025

Abstract: The problem of protein structure determination is usually solved by X-ray crystallography. Several in silico deep learning methods have been developed to overcome the high attrition rate, the cost of experiments, and the extensive trial-and-error settings by predicting the crystallization propensities of proteins based on their sequences. In this work, we benchmark the power of open protein language models (PLMs) through the TRILL platform, a bespoke framework democratizing the usage of PLMs, for the task of predicting the crystallization propensities of proteins. By comparing LightGBM / XGBoost classifiers built on the average embedding representations of proteins learned by different PLMs, such as ESM2, Ankh, ProtT5-XL, ProstT5, xTrimoPGLM, and SaProt, with the performance of state-of-the-art sequence-based methods like DeepCrystal, ATTCrys, and CLPred, we identify the most effective methods for predicting crystallization outcomes. The LightGBM classifiers utilizing embeddings from the ESM2 models with 30 and 36 transformer layers and 150 and 3000 million parameters, respectively, have performance gains of 3-5% over all compared models for various evaluation metrics, including AUPR (Area Under the Precision-Recall Curve), AUC (Area Under the Receiver Operating Characteristic Curve), and F1, on independent test sets. Furthermore, we fine-tune the ProtGPT2 model available via TRILL to generate crystallizable proteins. Starting with the generated proteins and through a step of filtration processes including consensus of all open PLM-based classifiers, sequence identity through CD-HIT, secondary structure compatibility, aggregation screening, homology search, and foldability evaluation, we identified a set of 5 novel proteins as potentially crystallizable.

Language: English
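The abstract above describes classifiers built on average embedding representations and scored with metrics such as F1. As a minimal sketch of those two ideas only, the following uses plain Python; it does not call TRILL, any PLM, LightGBM, or XGBoost. The function names `mean_pool` and `f1_score` and the toy inputs are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one D-dimensional protein vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels, where 1 marks the positive (crystallizable) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: a 2-residue "protein" with 2-dimensional embeddings (hypothetical values).
protein_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In a real pipeline the pooled vectors would be the feature rows passed to a gradient-boosted classifier; that step is left out here to keep the sketch free of external dependencies.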

Machine learning for functional protein design
Pascal Notin, Nathan Rollins, Yarin Gal, et al.

Nature Biotechnology, Year: 2024, Volume: 42(2), pp. 216-228

Published: Feb. 1, 2024

Language: English

Cited by: 98

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering
Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold, et al.

ACS Central Science, Year: 2024, Volume: 10(2), pp. 226-241

Published: Feb. 5, 2024

Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.

Language: English

Cited by: 78

Atomic context-conditioned protein sequence design using LigandMPNN
Justas Dauparas, Gyu Rie Lee, Robert Pecoraro, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2023, Volume: unknown

Published: Dec. 23, 2023

Abstract: Protein sequence design in the context of small molecules, nucleotides, and metals is critical to enzyme and small molecule binder and sensor design, but current state-of-the-art deep learning-based sequence design methods are unable to model non-protein atoms and molecules. Here, we describe a deep learning-based protein sequence design method called LigandMPNN that explicitly models all non-protein components of biomolecular systems. LigandMPNN significantly outperforms Rosetta and ProteinMPNN on native backbone sequence recovery for residues interacting with small molecules (63.3% vs. 50.4% & 50.5%), nucleotides (50.5% vs. 35.2% & 34.0%), and metals (77.5% vs. 36.0% & 40.6%). LigandMPNN generates not only sequences but also sidechain conformations to allow detailed evaluation of binding interactions. Experimental characterization demonstrates that LigandMPNN can generate small molecule and DNA-binding proteins with high affinity and specificity. One-sentence summary: We present a deep learning-based protein sequence design method that allows explicit modeling of small molecule, nucleotide, metal, and other atomic contexts.

Language: English

Cited by: 49
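The LigandMPNN abstract above reports native sequence recovery for residues that interact with ligands. A minimal sketch of that metric, assuming it means the fraction of selected positions where the designed residue matches the native one, could look like the following. The function name `sequence_recovery`, its `positions` parameter, and the toy sequences are hypothetical and are not taken from the LigandMPNN code.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    `positions` holds 0-based indices (for example, residues near a ligand).
    When it is None, every position is scored.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    matches = sum(1 for i in idx if native[i] == designed[i])
    return matches / len(idx)


# Toy sequences: one mismatch at index 3 (E vs. Q).
overall = sequence_recovery("ACDEFG", "ACDQFG")
interface_only = sequence_recovery("ACDEFG", "ACDQFG", positions=[3, 4])
```

Restricting `positions` to ligand-contacting residues is what separates an interface-level figure from whole-sequence recovery; how those residues are chosen is outside this sketch.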

Sparks of function by de novo protein design
Alexander E. Chu, Tianyu Lu, Po‐Ssu Huang, et al.

Nature Biotechnology, Year: 2024, Volume: 42(2), pp. 203-215

Published: Feb. 1, 2024

Language: English

Cited by: 33

AI for targeted polypharmacology: The next frontier in drug discovery
Anna Cichońska, Balaguru Ravikumar, Rayees Rahman, et al.

Current Opinion in Structural Biology, Year: 2024, Volume: 84, p. 102771

Published: Jan. 11, 2024

Language: English

Cited by: 28

De novo design of drug-binding proteins with predictable binding energy and specificity
Lei Lü, Xuxu Gou, Sophia K. Tan, et al.

Science, Year: 2024, Volume: 384(6691), pp. 106-112

Published: April 4, 2024

The de novo design of small molecule-binding proteins has seen exciting recent progress; however, high-affinity binding and tunable specificity typically require laborious screening and optimization after computational design. We developed a computational procedure to design a protein that recognizes a common pharmacophore in a series of poly(ADP-ribose) polymerase-1 inhibitors. One of three designed proteins bound different inhibitors with affinities ranging from <5 nM to low micromolar. X-ray crystal structures confirmed the accuracy of the designed protein-drug interactions. Molecular dynamics simulations informed the role of water in binding. Binding free energy calculations performed directly on the designed models were in excellent agreement with the experimentally measured affinities. We conclude that de novo design of high-affinity small molecule-binding proteins with tuned interaction energies is feasible entirely from computation.

Language: English

Cited by: 24

Structure-based virtual screening of vast chemical space as a starting point for drug discovery
Jens Carlsson, Andreas Luttens

Current Opinion in Structural Biology, Year: 2024, Volume: 87, p. 102829

Published: June 6, 2024

Structure-based virtual screening aims to find molecules forming favorable interactions with a biological macromolecule using computational models of complexes. The recent surge of commercially available chemical space provides the opportunity to search for ligands of therapeutic targets among billions of compounds. This review offers a compact overview of structure-based virtual screens of vast chemical spaces, highlighting successful applications in early drug discovery for therapeutically important targets such as G protein-coupled receptors and viral enzymes. Emphasis is placed on strategies to explore ultra-large chemical libraries and on synergies with emerging machine learning techniques. The current opportunities and future challenges of virtual screening are discussed, indicating that this approach will play an important role in the next-generation drug discovery pipeline.

Language: English

Cited by: 22

Leveraging machine learning models for peptide–protein interaction prediction
Yin Song, Xuenan Mi, Diwakar Shukla, et al.

RSC Chemical Biology, Year: 2024, Volume: 5(5), pp. 401-417

Published: Jan. 1, 2024

A timeline showcasing the progress of machine learning and deep learning methods for peptide–protein interaction predictions.

Language: English

Cited by: 21

Advancing Ligand Docking through Deep Learning: Challenges and Prospects in Virtual Screening
Xujun Zhang, Chao Shen, Haotian Zhang, et al.

Accounts of Chemical Research, Year: 2024, Volume: 57(10), pp. 1500-1509

Published: April 5, 2024

Conspectus: Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS) used to predict the binding conformations and affinities of protein–ligand complexes. Traditional LD methodologies rely on a search and scoring framework, utilizing heuristic algorithms to explore binding conformations and scoring functions to evaluate binding strengths. However, to meet the efficiency demands of SBVS, these scoring functions are often simplified, prioritizing speed over accuracy.

The emergence of deep learning (DL) has exerted a profound impact on diverse fields, ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has impressively exhibited its ability to accurately predict protein structures solely from amino acid sequences, highlighting the remarkable potential of DL in conformation prediction. This groundbreaking advancement circumvents the traditional search-scoring frameworks in LD, enhancing both accuracy and processing speed and thereby catalyzing a broader adoption of DL algorithms in binding pose prediction. Nevertheless, a consensus on certain aspects remains elusive.

In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain. Furthermore, we discuss the challenges and future prospects, drawing insights from our scholarly investigations. Initially, we present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search-scoring frameworks. Subsequently, we delve into the challenges associated with the development of DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and the physical plausibility of the predicted conformations. In the evaluation of LD algorithms, it is essential to recognize the multifaceted nature of the metrics. While the accuracy of binding pose prediction, often measured by the success rate, is a pivotal aspect, the scoring/screening power and computational speed of these algorithms are equally important given the pivotal role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest a shift toward identifying binding sites rather than solely predicting binding poses within these models. In contrast, LD with a known pocket in VS has been shown to be more practical. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates compared to traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks like visualization. Finally, we discuss the future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformation flexibility, and enhance physical plausibility. Additionally, we delve into the comparison of generative and regression algorithms in this context, exploring their respective strengths and potential.

Language: English

Cited by: 20

Diffusion models in protein structure and docking
Jason Yim, H. Stärk, Gabriele Corso, et al.

Wiley Interdisciplinary Reviews Computational Molecular Science, Year: 2024, Volume: 14(2)

Published: March 1, 2024

Abstract: Generative AI is rapidly transforming the frontier of research in computational structural biology. Indeed, recent successes have substantially advanced protein design and drug discovery. One of the key methodologies underlying these advances is diffusion models (DM). Diffusion models originated in computer vision, rapidly taking over image generation and offering superior quality and performance. These models were subsequently extended and modified for uses in other areas, including computational structural biology. DMs are well equipped to model high dimensional, geometric data while exploiting key strengths of deep learning. In structural biology, for example, they have achieved state‐of‐the‐art results on protein 3D structure generation and small molecule docking. This review covers the basics of diffusion models, associated modeling choices regarding molecular representations, generation capabilities, prevailing heuristics, as well as key limitations and forthcoming refinements. We also provide best practices around evaluation procedures to help establish rigorous benchmarking and evaluation. The review is intended to provide a fresh view into the state‐of‐the‐art as well as highlight the potentials and current challenges of recent generative techniques in computational structural biology. This article is categorized under: Data Science > Artificial Intelligence/Machine Learning; Structure and Mechanism > Molecular Structures; Software > Molecular Modeling

Language: English

Cited by: 18