Sequence-based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-tuning of Protein Language Models
Chiho Im, Ryan Zhao, Scott D. Boyd

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 29, 2024

Abstract Understanding T-cell receptor (TCR) and epitope interactions is critical for advancing our knowledge of the human immune system. Traditional approaches that use sequence similarity or structural data often struggle to scale and generalize across diverse TCR/epitope interactions. To address these limitations, we introduce ImmuneCLIP, a contrastive fine-tuning method that leverages pre-trained protein language models to align TCR and epitope embeddings in a shared latent space. ImmuneCLIP is evaluated on epitope ranking and binding prediction tasks, where it consistently outperforms sequence-similarity-based methods and existing deep learning models. Furthermore, ImmuneCLIP shows strong generalization capabilities even with limited training data, highlighting its potential for studying TCR–epitope interactions and uncovering patterns that improve our understanding of immune recognition systems.
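The contrastive alignment described above follows the CLIP recipe: paired TCR and epitope embeddings are pulled together while mismatched pairs within a batch are pushed apart. A minimal numpy sketch of such a symmetric contrastive (InfoNCE) loss — function names and the temperature value are illustrative, not taken from the paper:

```python
import numpy as np

def clip_style_loss(tcr_emb, epi_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired TCR/epitope embeddings.

    tcr_emb, epi_emb: (N, D) arrays; row i of each matrix is a binding pair.
    """
    # L2-normalize so dot products are cosine similarities
    t = tcr_emb / np.linalg.norm(tcr_emb, axis=1, keepdims=True)
    e = epi_emb / np.linalg.norm(epi_emb, axis=1, keepdims=True)
    logits = t @ e.T / temperature  # (N, N); diagonal = true pairs

    def xent(rows):
        # cross-entropy with the diagonal (true pair) as the target class
        rows = rows - rows.max(axis=1, keepdims=True)
        log_probs = rows - np.log(np.exp(rows).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the TCR->epitope and epitope->TCR directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pushes each TCR embedding toward its own epitope and away from the other epitopes in the batch, which is what makes the shared latent space usable for ranking.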

Language: English

Fine-tuning protein language models boosts predictions across diverse tasks
Robert Schmirler, Michael Heinzinger, Burkhard Rost

et al.

Nature Communications, Journal Year: 2024, Volume and Issue: 15(1)

Published: Aug. 28, 2024

Prediction methods inputting embeddings from protein language models have reached or even surpassed state-of-the-art performance on many protein prediction tasks. In natural language processing, fine-tuning large language models has become the de facto standard. In contrast, most protein language model-based predictions do not back-propagate to the language model. Here, we compare the fine-tuning of three models (ESM2, ProtT5, Ankh) on eight different tasks. Two results stand out. Firstly, task-specific supervised fine-tuning almost always improves downstream predictions. Secondly, parameter-efficient fine-tuning can reach similar improvements while consuming substantially fewer resources, at up to 4.5-fold acceleration of training over fine-tuning full models. Our results suggest to always try fine-tuning, in particular for problems with small datasets, such as the fitness landscape of a single protein. For ease of adaptability, we provide easy-to-use notebooks to fine-tune all models used during this work for per-protein (pooling) and per-residue prediction tasks.
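The per-protein (pooling) setup mentioned at the end collapses a PLM's per-residue embeddings into a single fixed-size vector before the prediction head. A minimal numpy sketch of masked mean pooling, the most common choice — array shapes and names are illustrative, not the authors' code:

```python
import numpy as np

def masked_mean_pool(residue_emb, mask):
    """Average per-residue embeddings into one per-protein vector.

    residue_emb: (B, L, D) batch of per-residue embeddings, zero-padded to L.
    mask:        (B, L) with 1 for real residues and 0 for padding.
    Returns:     (B, D) per-protein embeddings.
    """
    m = mask[..., None].astype(float)              # (B, L, 1)
    summed = (residue_emb * m).sum(axis=1)         # padding contributes nothing
    counts = np.clip(m.sum(axis=1), 1e-9, None)    # avoid division by zero
    return summed / counts
```

Padding positions are excluded from both the sum and the count, so proteins of different lengths in the same batch are pooled consistently.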

Language: English

Citations

22

Language models for biological research: a primer
Elana P. Simon, Kyle Swanson, James Zou

et al.

Nature Methods, Journal Year: 2024, Volume and Issue: 21(8), P. 1422 - 1429

Published: Aug. 1, 2024

Language: English

Citations

13

Deep learning prediction of enzyme optimum pH
Japheth E. Gado, Matthew W. Knotts, Ada Y. Shaw

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: June 22, 2023

Abstract The relationship between pH and enzyme catalytic activity, especially the optimal pH (pHopt) at which enzymes function, is critical for biotechnological applications. Hence, computational methods that predict pHopt will enhance enzyme discovery and design by facilitating the accurate identification of enzymes that function optimally at specific pH levels, and by elucidating sequence-function relationships. In this study, we proposed and evaluated various machine-learning methods for predicting pHopt, conducting extensive hyperparameter optimization and training over 11,000 model instances. Our results demonstrate that models utilizing language model embeddings markedly outperform other methods in predicting pHopt. We present EpHod, the best-performing model, making it publicly available to researchers. From sequence data, EpHod directly learns structural and biophysical features that relate to pHopt, including the proximity of residues to the catalytic center and their accessibility to solvent molecules. Overall, EpHod presents a promising advancement in pHopt prediction and could potentially speed up the development of enzyme technologies.

Language: English

Citations

14

Aggregating Residue-Level Protein Language Model Embeddings with Optimal Transport
Navid Naderializadeh, Rohit Singh

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 31, 2024

Abstract Protein language models (PLMs) have emerged as powerful approaches for mapping protein sequences into embeddings suitable for various applications. As token-based representation schemes, PLMs generate per-token (i.e., per-residue) representations, resulting in variable-sized outputs based on protein length. This variability poses a challenge for protein-level prediction tasks that require uniform-sized embeddings for consistent analysis across different proteins. Previous work has typically used average pooling to summarize token-level PLM outputs, but it is unclear whether this method effectively prioritizes the relevant information in token-level representations. We introduce a novel method utilizing optimal transport to convert variable-length PLM outputs into fixed-length representations. We conceptualize per-token embeddings as samples from a probabilistic distribution and employ sliced-Wasserstein distances to map these samples against a reference set, creating a fixed-length Euclidean embedding in the output space. The resulting embedding is agnostic to the length of the input and represents the entire protein. We demonstrate the superiority of our method over average pooling on several downstream prediction tasks, particularly with constrained PLM sizes, enabling smaller-scale PLMs to match or exceed the performance of average-pooled embeddings from larger-scale PLMs. Our aggregation scheme is especially effective for longer proteins, capturing essential information that might otherwise be lost through pooling.
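The core idea can be illustrated with a simplified sketch: project the variable-length set of residue embeddings onto a few random slice directions and record fixed quantiles of each 1D projection — the same ingredients from which sliced-Wasserstein distances are computed. This is a toy approximation under assumed shapes, not the authors' exact reference-set construction:

```python
import numpy as np

def sliced_quantile_embedding(residue_emb, directions, n_quantiles=16):
    """Fixed-length embedding of a variable-length set of residue embeddings.

    residue_emb: (L, D) per-residue embeddings; L varies per protein.
    directions:  (K, D) slice directions (e.g. random Gaussian vectors).
    Returns:     (K * n_quantiles,) length-independent protein embedding.
    """
    proj = residue_emb @ directions.T                 # (L, K) 1D projections
    q = np.linspace(0.0, 1.0, n_quantiles)            # fixed quantile levels
    # quantiles summarize each projected empirical distribution,
    # regardless of how many residues the protein has
    return np.quantile(proj, q, axis=0).T.ravel()
```

Because quantiles depend only on the empirical distribution of projections, the output has the same size for a 50-residue and a 500-residue protein, and is invariant to residue ordering — exactly the properties needed for uniform-sized protein-level inputs.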

Language: English

Citations

5

Learning the language of protein–protein interactions

Varun Ullanat, Bowen Jing, Samuel Sledzieski

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: March 10, 2025

Protein language models (PLMs) trained on large databases of protein sequences have proven effective in modeling protein biology across a wide range of applications. However, while PLMs excel at capturing individual protein properties, they face challenges in natively representing protein–protein interactions (PPIs), which are crucial to understanding cellular processes and disease mechanisms. Here, we introduce MINT, a PLM specifically designed to model sets of interacting proteins in a contextual and scalable manner. Using unsupervised training on a curated PPI dataset derived from the STRING database, MINT outperforms existing PLMs on diverse tasks relating to protein–protein interactions, including binding affinity prediction and estimation of mutational effects. Beyond these core capabilities, it excels at modeling complex protein assemblies and surpasses specialized models in antibody–antigen modeling and T cell receptor–epitope binding prediction. MINT's predictions of the impacts of mutations on oncogenic PPIs align with experimental studies, and it provides reliable estimates of the potential for cross-neutralization of antibodies against SARS-CoV-2 variants of concern. These findings position MINT as a powerful tool for elucidating complex protein interactions, with significant implications for biomedical research and therapeutic discovery.

Language: English

Citations

0

TUnA: an uncertainty-aware transformer model for sequence-based protein–protein interaction prediction
Young Su Ko, Jonathan Parkinson, Cong Liu

et al.

Briefings in Bioinformatics, Journal Year: 2024, Volume and Issue: 25(5)

Published: July 25, 2024

Abstract Protein–protein interactions (PPIs) are important for many biological processes, but predicting them from sequence data remains challenging. Existing deep learning models often cannot generalize to proteins not present in the training set and do not provide uncertainty estimates for their predictions. To address these limitations, we present TUnA, a Transformer-based uncertainty-aware model for PPI prediction. TUnA uses ESM-2 embeddings with Transformer encoders and incorporates a Spectral-normalized Neural Gaussian Process. TUnA achieves state-of-the-art performance and, importantly, evaluates uncertainty for unseen sequences. We demonstrate that TUnA's uncertainty estimates can effectively identify the most reliable predictions, significantly reducing false positives. This capability is crucial for bridging the gap between computational predictions and experimental validation.

Language: English

Citations

4

Combining Directed Evolution with Machine Learning Enables Accurate Genotype-to-Phenotype Predictions
Alexander J. Howard, Ellen Youngsoo Rim, Oscar D. Garrett

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 29, 2025

Abstract Linking sequence variation to phenotypic effects is critical for efficient exploitation of large genomic datasets. Here we present a novel approach combining directed evolution with protein language modeling to characterize naturally-evolved variants of a rice immune receptor. Using high-throughput directed evolution, we engineered the rice immune receptor Pik-1 to bind and recognize the fungal proteins Avr-PikC and Avr-PikF, which evade detection by currently characterized Pik-1 alleles. A protein language model was fine-tuned on these data to correlate receptor sequence with ligand binding behavior. This model was then used to score Pik-1 variants found in the 3,000 Rice Genomes Project dataset. Two variants scored highly against Avr-PikC, and in vitro analyses confirmed their improved binding over the wild-type receptor. Overall, this machine learning approach identified promising sources of disease resistance in rice and shows potential utility for exploring variants of other proteins of interest.

Language: English

Citations

0

Foundation models of protein sequences: A brief overview

Andreas Bjerregaard, Peter Mørch Groth, Søren Hauberg

et al.

Current Opinion in Structural Biology, Journal Year: 2025, Volume and Issue: 91, P. 103004 - 103004

Published: Feb. 20, 2025

Language: English

Citations

0

Technology-supported differentiated biology education: Trends, methods, content, and impacts

Afrizal Mammaliang Nurdin, Abdul Gofur, Murni Sapta Sari

et al.

Eurasia Journal of Mathematics Science and Technology Education, Journal Year: 2025, Volume and Issue: 21(3), P. em2598 - em2598

Published: Feb. 25, 2025

This study aims to fill the gap in understanding the trends, methods, content, and impacts of technology implementation in differentiated biology education at the secondary and higher education levels. The methodology employed is a systematic literature review on the use of technology in differentiated biology education. The search was conducted in the Scopus database using the terms ‘technology’ AND (‘differentiated instruction’ OR ‘personalized learning’ OR ‘adaptive teaching’ OR ‘learning style’) AND ‘biology education’, yielding 922 articles, of which only 18 met the criteria for further analysis. The findings indicate a rapid increase in publications, with 61% of the articles published between 2022 and 2024. The majority of publications come from journals in the fields of social sciences/education, while contributions from biochemistry, genetics, and molecular biology remain limited, suggesting a need for cross-disciplinary collaboration. Most studies (78%) used quantitative and mixed methods, with 72% focusing on secondary education. The most commonly used technologies include hands-on tools, data analysis software, and collaborative platforms, with animal anatomy and physiology as the dominant topics. These technologies support differentiated learning by enhancing understanding, engagement, and learning outcomes, as well as observation and scientific explanation skills at the school level, and bioinformatics research at the higher education level.

Language: English

Citations

0

LocPro: a deep learning-based prediction of protein subcellular localization for promoting multi-directional pharmaceutical research
Yintao Zhang, Lingyan Zheng, Nanxin You

et al.

Journal of Pharmaceutical Analysis, Journal Year: 2025, Volume and Issue: unknown, P. 101255 - 101255

Published: March 1, 2025

Language: English

Citations

0