Improving functional protein generation via foundation model-derived latent space likelihood optimization
Changge Guan, Fangping Wan, Marcelo D. T. Torres

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 8, 2025

A variety of deep generative models have been adopted to perform de novo functional protein generation. Compared to 3D protein design, sequence-based generation methods, which aim to generate amino acid sequences with desired functions, remain a major approach for functional protein generation due to the abundance and quality of protein sequence data, as well as the relatively low modeling complexity for training. Although these models are typically trained to match protein sequences from the training data, exact matching of every amino acid is not always essential. Certain amino acid changes (e.g., mismatches, insertions, and deletions) may not necessarily lead to functional changes. This suggests that maximizing the training data likelihood beyond the amino acid sequence space could yield better generative models. Pre-trained protein large language models (PLMs) like ESM2 can encode protein sequences into a latent space, potentially serving as functional validators. We propose training functional protein sequence generative models by simultaneously optimizing the likelihood of the training data in both the amino acid sequence space and the latent space derived from a PLM. This training scheme can also be viewed as a knowledge distillation approach that dynamically re-weights samples during training. We applied our method to train GPT-like models (i.e., autoregressive transformers) for antimicrobial peptide (AMP) and malate dehydrogenase (MDH) generation tasks. Computational experiments confirmed that our method outperformed various deep generative models (e.g., generative adversarial net, variational autoencoder, and GPT model without the proposed training strategy) on these tasks, demonstrating the effectiveness of our multi-likelihood optimization strategy.

Language: English

PEP-FOLD4: a pH-dependent force field for peptide structure prediction in aqueous solution
Julien Rey, Samuel Murail, Sjoerd de Vries

et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 51(W1), P. W432 - W437

Published: May 11, 2023

Accurate and fast structure prediction of peptides of fewer than 40 amino acids in aqueous solution has many biological applications, but their conformations are pH- and salt concentration-dependent. In this work, we present PEP-FOLD4, which goes one step beyond many machine-learning approaches, such as AlphaFold2, TrRosetta and RaptorX. Adding the Debye-Hueckel formalism for charged-charged side chain interactions to a Mie formalism for all intramolecular (backbone and side chain) interactions, PEP-FOLD4, based on a coarse-grained representation of the peptides, performs as well as machine-learning methods on well-structured peptides, but displays significant improvements for poly-charged peptides. PEP-FOLD4 is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/PEP-FOLD4. This server is free and there is no login requirement.

Language: English

Citations: 52

Machine learning for antimicrobial peptide identification and design
Fangping Wan, Felix Wong, James J. Collins

et al.

Nature Reviews Bioengineering, Journal Year: 2024, Volume and Issue: 2(5), P. 392 - 407

Published: Feb. 26, 2024

Language: English

Citations: 52

Cyclic peptide structure prediction and design using AlphaFold
Stephen Rettie, Katelyn V. Campbell, Asim K. Bera

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Feb. 26, 2023

Deep learning networks offer considerable opportunities for accurate structure prediction and design of biomolecules. While cyclic peptides have gained significant traction as a therapeutic modality, developing deep learning methods for designing such peptides has been slow, mostly due to the small number of available structures for molecules in this size range. Here, we report approaches to modify the AlphaFold network for accurate structure prediction and design of cyclic peptides. Our results show this approach can accurately predict the structures of native cyclic peptides from a single sequence, with 36 out of 49 cases predicted with high confidence (pLDDT > 0.85) matching the native structure with root mean squared deviation (RMSD) less than 1.5 Å. Further extending our approach, we describe computational methods for designing sequences of peptide backbones generated by other backbone sampling methods and for de novo design of new macrocyclic peptides. We extensively sampled the structural diversity of cyclic peptides between 7–13 amino acids, and identified around 10,000 unique design candidates predicted to fold into the designed structures with high confidence. X-ray crystal structures for seven sequences with diverse sizes and structures designed by our approach match very closely with the design models (root mean squared deviation < 1.0 Å), highlighting the atomic level accuracy in our approach. The computational methods and scaffolds developed here provide the basis for custom-designing peptides for targeted therapeutic applications.

Language: English

Citations: 46

The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins
Vinayak Agarwal, Andrew C. McShan

Nature Chemical Biology, Journal Year: 2024, Volume and Issue: 20(8), P. 950 - 959

Published: June 21, 2024

Language: English

Citations: 28

Leveraging machine learning models for peptide–protein interaction prediction
Yin Song, Xuenan Mi, Diwakar Shukla

et al.

RSC Chemical Biology, Journal Year: 2024, Volume and Issue: 5(5), P. 401 - 417

Published: Jan. 1, 2024

A timeline showcasing the progress of machine learning and deep learning methods for peptide–protein interaction predictions.

Language: English

Citations: 21

A unified evolution-driven deep learning framework for virus variation driver prediction
Zhiwei Nie, Xudong Liu, Jie Chen

et al.

Nature Machine Intelligence, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 17, 2025

Language: English

Citations: 2

The hidden bacterial microproteome
Igor Fesenko, Harutyun Sahakyan, Rajat Dhyani

et al.

Molecular Cell, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 1, 2025

Language: English

Citations: 2

DRAVP: A Comprehensive Database of Antiviral Peptides and Proteins

Yanchao Liu, Youzhuo Zhu, Xin Sun

et al.

Viruses, Journal Year: 2023, Volume and Issue: 15(4), P. 820 - 820

Published: March 23, 2023

Viruses with rapid replication and easy mutation can become resistant to antiviral drug treatment. With novel viral infections emerging, such as the recent COVID-19 pandemic, novel antiviral therapies are urgently needed. Antiviral proteins, such as interferon, have been used for treating chronic hepatitis C infections for decades. Natural-origin antimicrobial peptides, such as defensins, have also been identified as possessing antiviral activities, including direct antiviral effects and the ability to induce indirect immune responses to viruses. To promote the development of antiviral drugs, we constructed a data repository of antiviral peptides and proteins (DRAVP). The database provides general information, antiviral activity, structure information, physicochemical information, and literature information for peptides and proteins. Because most of the proteins and peptides lack experimentally determined structures, AlphaFold was used to predict each peptide’s structure. A free website for users (http://dravp.cpu-bioinfor.org/, accessed on 30 August 2022) was constructed to facilitate data retrieval and sequence analysis. Additionally, all the data can be accessed from the web interface. The DRAVP database aims to be a useful resource for developing antiviral drugs.

Language: English

Citations: 25

AMP-Diffusion: Integrating Latent Diffusion with Protein Language Models for Antimicrobial Peptide Generation
Tianlai Chen, Pranay Vure, Rishab Pulugurta

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 6, 2024

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a potent class of generative models, demonstrating exemplary performance across diverse AI domains such as computer vision and natural language processing. In the realm of protein design, while there have been advances in structure-based, graph-based, and discrete sequence-based diffusion, the exploration of continuous latent space diffusion within protein language models (pLMs) remains nascent. In this work, we introduce AMP-Diffusion, a latent space diffusion model tailored for antimicrobial peptide (AMP) design, harnessing the capabilities of the state-of-the-art pLM, ESM-2, to de novo generate functional AMPs for downstream experimental application. Our evaluations reveal that peptides generated by AMP-Diffusion align closely in both pseudo-perplexity and amino acid diversity when benchmarked against experimentally-validated AMPs, and further exhibit relevant physicochemical properties similar to these naturally-occurring sequences. Overall, these findings underscore the biological plausibility of our generated sequences and pave the way for their empirical validation. In total, our framework motivates future exploration of pLM-based diffusion models for peptide and protein design.

Language: English

Citations: 15

AlphaFold-Multimer accurately captures interactions and dynamics of intrinsically disordered protein regions
Alireza Omidi, Mirko Möller, Nawar Malhis

et al.

Proceedings of the National Academy of Sciences, Journal Year: 2024, Volume and Issue: 121(44)

Published: Oct. 24, 2024

Interactions mediated by intrinsically disordered protein regions (IDRs) pose formidable challenges in structural characterization. IDRs are highly versatile, capable of adopting diverse structures and engagement modes. Motivated by recent strides in protein structure prediction, we embarked on exploring the extent to which AlphaFold-Multimer can faithfully reproduce the intricacies of interactions involving IDRs. To this end, we gathered multiple datasets covering the versatile spectrum of IDR binding modes and used them to probe AlphaFold-Multimer’s prediction of IDR interactions and their dynamics. Our analyses revealed that AlphaFold-Multimer is not only capable of predicting various types of bound IDR structures with high success rate, but that distinguishing true interactions from decoys, and unreliable predictions from accurate ones, is achievable by appropriate use of AlphaFold-Multimer’s intrinsic scores. We found that the quality of predictions drops for more heterogeneous, fuzzy interaction types, most likely due to lower interface hydrophobicity and higher coil content. Notably though, certain AlphaFold-Multimer scores, such as the Predicted Aligned Error and residue-ipTM, are highly correlated with the structural heterogeneity of the bound IDR, enabling clear distinctions between predictions of fuzzy and more homogeneous binding modes. Finally, our benchmarking revealed that predictions of IDR interactions can also be successful when using full-length proteins, but not as accurate as with cognate IDRs. To facilitate identification of the cognate IDR of a given partner, we established “minD,” which pinpoints potential interaction sites in a full-length protein. Our study demonstrates that AlphaFold-Multimer can correctly identify interacting IDRs and predict their mode of interaction with a given partner.

Language: English

Citations: 14