Improving functional protein generation via foundation model-derived latent space likelihood optimization
Changge Guan, Fangping Wan, Marcelo D. T. Torres, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 8, 2025

A variety of deep generative models have been adopted to perform de novo functional protein generation. Compared to 3D structure-based design, sequence-based generation methods, which aim to generate amino acid sequences with desired functions, remain a major approach owing to the abundance and quality of sequence data, as well as the relatively low modeling complexity of training. Although these models are typically trained to match sequences from the training data, exact matching of every amino acid is not always essential. Certain sequence changes (e.g., mismatches, insertions, deletions) may not necessarily lead to functional changes. This suggests that maximizing the data likelihood beyond the amino acid sequence space could yield better generative models. Pre-trained protein large language models (PLMs) such as ESM2 can encode sequences into a latent space, potentially serving as functional validators. We propose training functional protein generators by simultaneously optimizing the likelihood in both the sequence space and the latent space derived from a PLM. This training scheme can also be viewed as knowledge distillation that dynamically re-weights samples during training. We applied our method to train GPT-based models (i.e., autoregressive transformers) on antimicrobial peptide (AMP) and malate dehydrogenase (MDH) generation tasks. Computational experiments confirmed that our method outperformed various baselines (e.g., generative adversarial network, variational autoencoder, and a GPT model trained without the proposed strategy) on both tasks, demonstrating the effectiveness of the multi-likelihood optimization strategy.

Language: English
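
A minimal PyTorch sketch of the idea described in the abstract above: optimize the usual sequence-space likelihood while re-weighting samples by a likelihood computed in a PLM-derived latent space, in the spirit of the knowledge-distillation view. The `ar_model` and `plm_encoder` interfaces, the Gaussian latent density, and the softmax weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_likelihood_loss(ar_model, plm_encoder, latent_mean, latent_logvar, tokens, alpha=0.5):
    """Illustrative multi-likelihood objective (sequence space + PLM latent space).

    ar_model      : autoregressive generator (GPT-style decoder) returning next-token logits
    plm_encoder   : frozen protein language model (e.g., ESM2) mapping tokens -> pooled embeddings
    latent_mean,
    latent_logvar : parameters of a diagonal Gaussian fitted to PLM embeddings of the training set
    tokens        : LongTensor [batch, seq_len] of amino-acid token ids
    """
    # Sequence-space likelihood: standard next-token cross-entropy, kept per sample.
    logits = ar_model(tokens[:, :-1])                          # [B, L-1, vocab]
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
        reduction="none",
    ).view(tokens.size(0), -1).mean(dim=1)                     # [B]

    # Latent-space likelihood under the frozen PLM: Gaussian log-density of the embedding.
    with torch.no_grad():
        z = plm_encoder(tokens)                                # [B, d]
        latent_ll = -0.5 * (((z - latent_mean) ** 2) / latent_logvar.exp()
                            + latent_logvar).sum(dim=1)        # [B]

    # Distillation-style re-weighting: sequences the PLM deems more likely get larger weight.
    weights = torch.softmax(alpha * latent_ll, dim=0) * tokens.size(0)
    return (weights * ce).mean()
```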

Structure-aware deep learning model for peptide toxicity prediction
Hossein Ebrahimikondori, Darcy Sutherland, Anat Yanai, et al.

Protein Science, Journal Year: 2024, Volume and Issue: 33(7)

Published: June 22, 2024

Antimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides. tAMPer adopts a graph-based representation for peptides, encoding ColabFold-predicted structures, where nodes represent amino acids and edges represent spatial interactions. Structural features are extracted using graph neural networks, while recurrent neural networks capture sequential dependencies. tAMPer's performance was assessed on a publicly available protein toxicity benchmark and an AMP hemolysis dataset we generated. On the latter, tAMPer achieves an F1-score of 68.7%, outperforming the second-best method by 23.4%. On the protein toxicity benchmark, tAMPer exhibited an improvement of over 3.0% in F1-score compared with current state-of-the-art methods. We anticipate that tAMPer will accelerate the discovery and development of peptide-based therapeutics by reducing reliance on laborious toxicity screening experiments.

Language: English
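
As a rough illustration of the multi-modal design summarized above (a graph encoder over the predicted structure fused with a recurrent encoder over the sequence), here is a self-contained PyTorch sketch. The dimensions, the single hand-rolled message-passing step, and the pooling and fusion choices are assumptions for illustration, not the published tAMPer architecture.

```python
import torch
import torch.nn as nn

class StructureSequenceToxicityModel(nn.Module):
    """Sketch: fuse a graph branch (predicted structure) with a GRU branch (sequence)."""

    def __init__(self, n_residue_types=20, node_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_residue_types, node_dim)
        self.msg = nn.Linear(node_dim, node_dim)          # neighbour message transform
        self.update = nn.Linear(2 * node_dim, node_dim)   # node update after aggregation
        self.gru = nn.GRU(node_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(node_dim + 2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, residues, adjacency):
        """residues: [B, L] residue-type ids; adjacency: [B, L, L] contact map
        derived from a predicted structure (e.g., ColabFold)."""
        x = self.embed(residues)                               # [B, L, node_dim]

        # Structural branch: one message-passing step over the contact graph.
        neighbours = torch.bmm(adjacency, self.msg(x))         # [B, L, node_dim]
        x_struct = torch.relu(self.update(torch.cat([x, neighbours], dim=-1)))
        graph_repr = x_struct.mean(dim=1)                      # [B, node_dim]

        # Sequential branch: bidirectional GRU over residue embeddings.
        _, h = self.gru(x)                                     # h: [2, B, hidden_dim]
        seq_repr = torch.cat([h[0], h[1]], dim=-1)             # [B, 2*hidden_dim]

        # Fuse both views and predict a toxicity probability.
        return torch.sigmoid(self.classifier(torch.cat([graph_repr, seq_repr], dim=-1)))
```

For example, `StructureSequenceToxicityModel()(torch.randint(0, 20, (2, 30)), torch.rand(2, 30, 30))` returns a [2, 1] tensor of toxicity probabilities for two hypothetical 30-residue peptides.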

Citations: 5

Computational modeling and prediction of deletion mutants
Hope Woods, Dominic L. Schiano, Jonathan I. Aguirre, et al.

Structure, Journal Year: 2023, Volume and Issue: 31(6), P. 713 - 723.e3

Published: April 28, 2023

Language: English

Citations: 12

DCTPep, the data of cancer therapy peptides
Xin Sun, Yanchao Liu, Tianyue Ma, et al.

Scientific Data, Journal Year: 2024, Volume and Issue: 11(1)

Published: May 25, 2024

Abstract With the discovery of the therapeutic activity of peptides, they have emerged as a promising class of anti-cancer agents due to their specific targeting, low toxicity, and potential for high selectivity. In particular, as peptide-drug conjugates enter clinical use, coupling targeted peptides with traditional chemotherapy drugs or cytotoxic agents will become a new direction in cancer treatment. To facilitate drug development for cancer therapy, we constructed DCTPep, a novel, open, and comprehensive database of cancer therapy peptides. In addition to anticancer peptides (ACPs), the peptide library also includes peptides related to cancer therapy. These data were collected manually from published research articles, patents, and other protein databases. The data also cover clinically investigated and/or approved peptides for cancer therapy, which mainly come from the portal websites of regulatory authorities and organisations in different countries and regions. DCTPep has a total of 6214 entries. We believe that DCTPep will contribute to the design and screening of cancer therapy peptides in the future.

Language: English

Citations: 4

Direct conformational sampling from peptide energy landscapes through hypernetwork-conditioned diffusion
Osama Abdin, Philip M. Kim

Nature Machine Intelligence, Journal Year: 2024, Volume and Issue: 6(7), P. 775 - 786

Published: June 27, 2024

Language: English

Citations: 4
