Accurate structure prediction of immune proteins using parameter-efficient transfer learning

Zhu Tian,

Milong Ren,

Z. He

et al.

bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown

Published: Nov. 15, 2024

Abstract Accurate prediction of immune protein structures is crucial for understanding the immune system and advancing immunotherapy development. While deep learning methods have significantly advanced protein structure prediction by extracting evolutionary constraints from sequences homologous to a target protein, they struggle with immune proteins, where known structures are scarce and hypervariable regions lack evolutionary constraints. To address this challenge, we propose ImmuneFold, a transfer-learning approach that fine-tunes ESMFold specifically for immune proteins. We leverage low-rank adaptation (LoRA), a parameter-efficient fine-tuning technique that requires considerably less memory and updates substantially fewer parameters. Evaluations on various immune proteins, including T-cell receptors, antibodies, and nanobodies, demonstrate that ImmuneFold outperforms existing methods in prediction accuracy. Furthermore, we apply ImmuneFold to develop a zero-shot protocol for TCR-epitope binding prediction. Unlike previous supervised methods, which suffer from severe overfitting to limited experimental data, our protocol first predicts the structure of the TCR-epitope complex using ImmuneFold and then directly estimates binding affinity by calculating the Rosetta energy. Evaluations on experimental datasets suggest that this method provides robust and accurate predictions of TCR-epitope binding. In summary, ImmuneFold delivers accurate predictions of immune protein structure and TCR-epitope binding, highlighting its potential to advance the development of immunotherapies.
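
ImmuneFold's training code is not reproduced here; the following is a minimal sketch of how LoRA adapters can be attached to ESMFold using the Hugging Face transformers and peft libraries. The target module names and LoRA hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch: attaching LoRA adapters to ESMFold with Hugging Face peft.
# target_modules and hyperparameters below are illustrative assumptions;
# ImmuneFold's actual configuration may differ.
from transformers import EsmForProteinFolding
from peft import LoraConfig, get_peft_model

model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")

lora_config = LoraConfig(
    r=8,              # low-rank bottleneck dimension
    lora_alpha=16,    # scaling factor for the LoRA update
    lora_dropout=0.05,
    # Hypothetical choice: adapt the attention projections of the ESM trunk.
    target_modules=["query", "key", "value", "dense"],
)
model = get_peft_model(model, lora_config)

# Only the small LoRA matrices are trainable; the base weights stay frozen,
# which is what makes the fine-tuning memory- and parameter-efficient.
model.print_trainable_parameters()
```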

Language: English

SELFprot: Effective and Efficient Multitask Finetuning Methods for Protein Parameter Prediction
Marltan O. Wilson,

Thomas Coudrat,

Andrew C. Warden

et al.

Journal of Chemical Information and Modeling, journal year: 2025, issue: unknown

Published: Mar. 17, 2025

Accurately predicting protein–ligand interactions and enzymatic kinetics remains a challenge for computational biology. Here, we present SELFprot, a suite of modular transformer-based machine learning architectures that leverage the ESM2-35M model architecture with protein sequence and small-molecule embeddings to improve predictions of complex biochemical interactions. SELFprot employs multitask parameter-efficient fine-tuning through low-rank adaptation, allowing adaptive, data-driven refinement. Furthermore, ensemble techniques are used to enhance robustness and reduce prediction variance. Evaluated on the BindingDB and CatPred-DB data sets, SELFprot achieves competitive performance, with notable improvements in the prediction of kcat, Km, Ki, Kd, IC50, and EC50 values as well as in the classification of functional site residues. With accuracy comparable to existing models at an order of magnitude fewer parameters, SELFprot demonstrates versatility and efficiency, making it a valuable tool for protein-interaction studies and bioengineering.
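
SELFprot's architecture is only summarized above; the sketch below illustrates the general pattern of multitask prediction on top of ESM2-35M embeddings, with one regression head per kinetic parameter. The task list, pooling choice, and head sizes are hypothetical stand-ins, not SELFprot's actual design.

```python
# Sketch of a multitask regression head over ESM2-35M sequence embeddings.
# The task names and single-linear heads are illustrative of the general
# pattern, not SELFprot's actual architecture.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

TASKS = ["kcat", "Km", "Ki", "Kd", "IC50", "EC50"]

class MultitaskHead(nn.Module):
    def __init__(self, backbone_dim: int = 480):  # ESM2-35M hidden size
        super().__init__()
        self.backbone = AutoModel.from_pretrained("facebook/esm2_t12_35M_UR50D")
        self.heads = nn.ModuleDict(
            {task: nn.Linear(backbone_dim, 1) for task in TASKS}
        )

    def forward(self, input_ids, attention_mask, task: str):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool residue embeddings into one vector per sequence.
        mask = attention_mask.unsqueeze(-1)
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return self.heads[task](pooled)

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
batch = tokenizer(["MKTAYIAKQR"], return_tensors="pt")
model = MultitaskHead()
print(model(batch["input_ids"], batch["attention_mask"], task="Km").shape)
```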

Language: English

Cited by

0

Sequence-Based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-Tuning of Protein Language Models
Chiho Im, Ryan Zhao, Scott D. Boyd

et al.

Lecture Notes in Computer Science, journal year: 2025, issue: unknown, pp. 34-48

Published: Jan. 1, 2025

Language: English

Cited by

0

Applicability of AlphaFold2 in the modeling of dimeric, trimeric, and tetrameric coiled‐coil domains
Rafał Madaj, Mikel Martínez-Goikoetxea, Kamil Kamiński

et al.

Protein Science, journal year: 2024, issue: 34(1)

Published: Dec. 17, 2024

Coiled coils are a common protein structural motif involved in cellular functions ranging from mediating protein-protein interactions to facilitating processes such as signal transduction or the regulation of gene expression. They are formed by two or more alpha helices that wind around a central axis to form a buried hydrophobic core. Various forms of coiled-coil bundles have been reported, each characterized by the number, orientation, and degree of winding of the constituent helices. This variability is underpinned by the short sequence repeats of coiled coils, whose properties determine both their overall topology and local geometry. The strikingly repetitive sequence has enabled the development of accurate sequence-based prediction methods; however, modeling coiled-coil domains remains a challenging task. In this work, we evaluated the accuracy of AlphaFold2 in modeling coiled-coil domains, both in reproducing local geometry and in predicting global topological properties. Furthermore, we show that prediction of the oligomeric state can be achieved using the internal representations of AlphaFold2, with performance better than any previous state-of-the-art method (code available at https://github.com/labstructbioinf/dc2_oligo).
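
The abstract describes predicting oligomeric state from AlphaFold2's internal representations. The sketch below shows the general shape of that idea under loose assumptions: a per-residue "single" representation (shape [L, 384]) exported during prediction is mean-pooled and fed to a simple classifier. The pooling choice and feature source are illustrative, not the dc2_oligo implementation.

```python
# Sketch: classifying coiled-coil oligomeric state from pooled AlphaFold2
# internal representations. Random arrays stand in for real exported
# features; labels 2/3/4 denote dimer/trimer/tetramer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pool_single_repr(single: np.ndarray) -> np.ndarray:
    """Mean-pool a [L, 384] AF2 single representation to a [384] vector."""
    return single.mean(axis=0)

# Toy stand-ins for pooled representations of six coiled-coil structures.
rng = np.random.default_rng(0)
X = np.stack([pool_single_repr(rng.normal(size=(70, 384))) for _ in range(6)])
y = np.array([2, 2, 3, 3, 4, 4])  # oligomeric state labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:2]))
```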

Language: English

Cited by

2

Benchmarking text-integrated protein language model embeddings and embedding fusion on diverse downstream tasks
Young Su Ko, Jonathan Parkinson, Wei Wang

et al.

bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown

Published: Aug. 26, 2024

Abstract Protein language models (pLMs) have traditionally been trained in an unsupervised manner using large protein sequence databases with an autoregressive or masked-language-modeling training paradigm. Recent methods have attempted to enhance pLMs by integrating additional information, in the form of text, and are referred to as "text+protein" language models (tpLMs). We evaluate and compare six tpLMs (OntoProtein, ProteinDT, ProtST, ProteinCLIP, ProTrek, ESM3) against ESM2, a baseline text-free pLM, across downstream tasks designed to assess the learned protein representations. We find that while tpLMs outperform ESM2 in five out of six benchmarks, no tpLM was consistently the best. Thus, we additionally investigate the potential of embedding fusion, exploring whether combinations of tpLM embeddings can improve performance on the benchmarks by exploiting the strengths of multiple tpLMs. We find that fused embeddings outperform any single embedding in most benchmarks, highlighting embedding fusion as a useful strategy in the field of machine learning for proteins. To facilitate practical application, we outline a heuristic framework to efficiently identify the optimal combination of embeddings, reducing the exponential time complexity of an exhaustive search down to a manageable linear complexity. Using our fusion framework, we achieve state-of-the-art performances on protein-protein interaction prediction and homologous sequence recovery without any model-specific adjustments or hyperparameter tuning. Our experiments suggest embedding fusion is a useful tool in the machine-learning-for-proteins toolbox. Lastly, this study highlights future research strategies for maximizing the utility of pLMs.
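
The paper's heuristic framework is not detailed in the abstract; the sketch below shows one plausible reading of a linear-complexity selection scheme: rank models by their standalone validation score, then add embeddings (fused by concatenation) in that order, keeping each only if the fused score improves. The `evaluate` callable is a user-supplied downstream-task scorer, not the paper's code.

```python
# Sketch of a linear-complexity heuristic for selecting pLM embeddings to
# fuse by concatenation: roughly 2n evaluations instead of 2**n for an
# exhaustive search over n models.
import numpy as np

def greedy_fusion(embeddings: dict, evaluate) -> list:
    # Rank each pLM by its standalone score: n evaluations.
    ranked = sorted(embeddings, key=lambda k: evaluate(embeddings[k]),
                    reverse=True)
    chosen = [ranked[0]]
    best = evaluate(embeddings[ranked[0]])
    for name in ranked[1:]:  # at most n-1 further evaluations
        fused = np.concatenate([embeddings[k] for k in chosen + [name]],
                               axis=-1)
        score = evaluate(fused)
        if score > best:
            chosen, best = chosen + [name], score
    return chosen

# Toy usage: score by embedding variance (a stand-in for real validation).
rng = np.random.default_rng(0)
embs = {m: rng.normal(size=(100, 64)) for m in ["ProtST", "ProTrek", "ESM3"]}
print(greedy_fusion(embs, evaluate=lambda e: float(e.var())))
```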

Language: English

Cited by

0

Efficient Inference, Training, and Fine-tuning of Protein Language Models
Muhammed Hasan Çelik, Xiaohui Xie

bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown

Published: Oct. 25, 2024

Abstract Protein language models have shown great promise in predicting protein structure, function, and the fitness effects of missense variants. However, their use has been limited by the substantial computational resources required. In this work, we focus on improving the efficiency of protein language models (PLMs), specifically the Evolutionary Scale Modeling (ESM) family, to increase the accessibility of PLMs. By implementing optimizations such as FlashAttention and Partition-Attention, a novel technique designed to handle proteins of variable length, we achieved a 16-fold speedup in inference time and reduced memory usage 3 to 14 times for long proteins. Additionally, 4-bit quantization applied to billion-parameter models led to a 2-fold reduction in memory consumption with minimal performance loss on the variant effect prediction task. Training was also improved, with a 6-fold reduction in runtime through activation checkpointing and the DeepSpeed Zero-Offload strategy. For fine-tuning, we employed parameter-efficient methods, enabling state-of-the-art predictions of protein properties and functions while training only the model head or a small fraction of adapter weights. For instance, we achieved a Spearman's correlation coefficient of 70% on melting point prediction and an 87% area under the precision-recall curve (AU-PRC) on transcription factor prediction. Our efficient ESM (ESME) implementation significantly lowers the barrier to using these powerful models, making them accessible to academic laboratories with limited resources. ESME is available on PyPI (pypi.org/project/esm-efficient) and GitHub (github.com/uci-cbcl/esm-efficient).
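
To illustrate the 4-bit quantization idea in the abstract, the sketch below loads a billion-parameter ESM2 checkpoint with 4-bit weights via Hugging Face transformers and bitsandbytes. This shows the general memory-saving technique only; it is not the esm-efficient (ESME) package's own API, and it requires a CUDA device.

```python
# Sketch: 4-bit weight quantization of a large ESM2 model for inference.
# Illustrative of the general technique; not the ESME package's API.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
)
model = AutoModelForMaskedLM.from_pretrained(
    "facebook/esm2_t36_3B_UR50D",   # a billion-parameter ESM2 checkpoint
    quantization_config=quant_config,
    device_map="auto",              # requires bitsandbytes and a GPU
)
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t36_3B_UR50D")
inputs = tokenizer("MKTAYIAKQR", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits  # per-residue token logits
```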

Language: English

Cited by

0

Sequence-based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-tuning of Protein Language Models
Chiho Im, Ryan Zhao, Scott D. Boyd

et al.

bioRxiv (Cold Spring Harbor Laboratory), journal year: 2024, issue: unknown

Published: Oct. 29, 2024

Abstract Understanding T-cell receptor (TCR) and epitope interactions is critical for advancing our knowledge of the human immune system. Traditional approaches that use sequence similarity or structure data often struggle to scale and generalize across diverse TCR/epitope interactions. To address these limitations, we introduce ImmuneCLIP, a contrastive fine-tuning method that leverages pre-trained protein language models to align TCR and epitope embeddings in a shared latent space. ImmuneCLIP is evaluated on epitope ranking and binding prediction tasks, where it consistently outperforms sequence-similarity-based methods and existing deep learning models. Furthermore, ImmuneCLIP shows strong generalization capabilities even with limited training data, highlighting its potential for studying TCR-epitope interactions and uncovering patterns that improve our understanding of immune recognition systems.
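
The sketch below shows the standard CLIP-style (symmetric InfoNCE) contrastive objective that this kind of TCR/epitope alignment builds on: matched pairs in a batch are pulled together while mismatched pairs are pushed apart. The embedding dimensions are arbitrary stand-ins for the pre-trained protein language model encoders the paper fine-tunes.

```python
# Sketch of a CLIP-style contrastive loss aligning TCR and epitope
# embeddings in a shared space; the general technique, not ImmuneCLIP's code.
import torch
import torch.nn.functional as F

def clip_loss(tcr_emb, epi_emb, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of paired TCR/epitope embeddings."""
    tcr = F.normalize(tcr_emb, dim=-1)
    epi = F.normalize(epi_emb, dim=-1)
    logits = tcr @ epi.t() / temperature   # [B, B] cosine-similarity matrix
    targets = torch.arange(len(tcr))       # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage with random stand-in embeddings:
tcr_emb = torch.randn(8, 256)
epi_emb = torch.randn(8, 256)
print(clip_loss(tcr_emb, epi_emb))
```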

Language: English

Cited by

0
