Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution
Daphne van Ginneken, Anamay Samant, Karlis Daga-Krumins, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Dec. 11, 2024

Abstract B cell selection and evolution play crucial roles in dictating successful immune responses. Recent advancements in sequencing technologies and deep-learning strategies have paved the way for generating and exploiting an ever-growing wealth of antibody repertoire data. The self-supervised nature of protein language models (PLMs) has demonstrated the ability to learn complex representations of protein sequences and has been leveraged for a wide range of applications, including diagnostics, structural modeling, and antigen-specificity predictions. PLM-derived likelihoods have been used to improve antibody affinities in vitro, raising the question of whether PLMs can capture and predict features of B cell selection in vivo. Here, we explore how general and antibody-specific PLM-generated sequence pseudolikelihoods (SPs) relate to in vivo features such as expansion, isotype usage, and somatic hypermutation (SHM) at single-cell resolution. Our results demonstrate that the type of PLM and the region used as input significantly affect the generated SP. Contrary to previous in vitro reports, we observe a negative correlation between SPs and binding affinity, whereas SHM and antigen specificity were strongly correlated with SPs. By constructing evolutionary lineage trees of clones from human and mouse repertoires, we find that SHMs are routinely among the most likely mutations suggested by PLMs and that mutating residues have lower absolute likelihoods than conserved residues. These findings highlight the potential of PLMs to capture features of in vivo B cell selection and further suggest their ability to assist antibody discovery and engineering. Key points - In contrast to previous work (Hie et al., 2024), we observe a negative correlation between sequence pseudolikelihood (SP) and affinity. This may be explained by the inherent germline bias posed by the training data and by the difference in experimental settings. - We also reveal considerable variation of SP with V-gene family, isotype, and amount of somatic hypermutation (SHM). Moreover, in antigen-labeled data, antigen binding is consistently associated with SP. - By reconstructing evolutionary trajectories of clones, we show that SHM is partly predictable using PLMs. - We show that the region (CDR3 or full V(D)J) provided to the model, as well as the PLM used, influence the resulting SP.
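As an illustration of the sequence pseudolikelihood (SP) idea described above, the sketch below scores an antibody region with a masked protein language model by masking one residue at a time and averaging the log-probabilities of the true residues. It is a minimal sketch assuming a small HuggingFace ESM-2 checkpoint and a simple averaging convention; the paper's exact models and preprocessing may differ.

```python
# Minimal sequence pseudolikelihood (SP) sketch, assuming an ESM-style masked
# protein language model from HuggingFace transformers. Model choice and the
# averaging convention are illustrative assumptions, not the paper's pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def pseudolikelihood(sequence: str) -> float:
    """Average log-probability of each residue when it is masked in turn."""
    enc = tokenizer(sequence, return_tensors="pt")
    input_ids = enc["input_ids"]
    log_probs = []
    with torch.no_grad():
        # Skip the special tokens added at the start and end of the sequence.
        for pos in range(1, input_ids.shape[1] - 1):
            masked = input_ids.clone()
            true_id = masked[0, pos].item()
            masked[0, pos] = tokenizer.mask_token_id
            logits = model(input_ids=masked,
                           attention_mask=enc["attention_mask"]).logits
            log_probs.append(
                torch.log_softmax(logits[0, pos], dim=-1)[true_id].item()
            )
    return sum(log_probs) / len(log_probs)

# Example: score a (toy) heavy-chain CDR3 region.
print(pseudolikelihood("CARDYWGQGTLVTVSS"))
```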

Language: English

Improving antibody language models with native pairing
Sarah Burbach, Bryan Briney

Patterns, Journal year: 2024, Issue: 5(5), Pages: 100967-100967

Published: April 4, 2024

Existing antibody language models are limited by their use of unpaired sequence data. A recently published dataset ∼1.6 × 10

Language: English

Cited by: 22

Addressing the antibody germline bias and its effect on language models for improved antibody design
Tobias Hegelund Olsen, Iain H. Moal, Charlotte M. Deane, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Feb. 7, 2024

Abstract The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive and time-consuming task, with the final antibody needing not only strong and specific binding, but also to be minimally impacted by any developability issues. The success of transformer-based language models in protein sequence space, and the availability of vast amounts of antibody sequences, has led to many antibody-specific language models being developed to help guide antibody discovery and design. Antibody diversity primarily arises from V(D)J recombination, mutations within the CDRs, and/or a small number of mutations away from the germline outside the CDRs. Consequently, a significant portion of the variable domain of all natural antibody sequences remains germline. This affects the pre-training of language models, where this facet of the data introduces a prevailing bias towards germline residues. This poses a challenge, as non-germline residues are often vital for binding potently to a target, meaning that models need to be able to suggest key non-germline residues. In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new models optimised for predicting non-germline residues, and then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability. AbLang-2 is trained on both unpaired and paired data, and is freely available ( https://github.com/oxpig/AbLang2.git ).
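To make the germline-bias discussion concrete, the sketch below masks a single position in an antibody sequence and inspects the model's top-ranked residues, which is one way to check whether a model reverts to germline or proposes valid non-germline mutations. The checkpoint, sequence, and masked position are illustrative assumptions; AbLang-2 itself is distributed separately at the repository linked above.

```python
# Illustrative germline-bias probe: mask one position and list the model's top
# suggestions. The checkpoint is a generic stand-in; swap in an antibody LM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "facebook/esm2_t6_8M_UR50D"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def top_k_residues(sequence: str, position: int, k: int = 5) -> list:
    """Top-k amino acids predicted for `position` (0-based) when it is masked."""
    enc = tokenizer(sequence, return_tensors="pt")
    ids = enc["input_ids"].clone()
    tok_pos = position + 1                      # offset for the leading special token
    ids[0, tok_pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids, attention_mask=enc["attention_mask"]).logits
    top = torch.topk(logits[0, tok_pos], k).indices.tolist()
    return [tokenizer.convert_ids_to_tokens(i) for i in top]

# Hypothetical heavy-chain fragment; position 30 stands in for a somatically
# mutated, non-germline residue whose recovery we want to check.
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
print(top_k_residues(seq, position=30))
```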

Language: English

Cited by: 18

Large scale paired antibody language models
Henry Kenlay, Frédéric A. Dreyer, Aleksandr Kovaltsuk, et al.

PLoS Computational Biology, Journal year: 2024, Issue: 20(12), Pages: e1012646-e1012646

Published: Dec. 6, 2024

Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best performing antibody-specific language models developed to date, which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using more than two billion unpaired sequences and two million paired light and heavy chain sequences from the Observed Antibody Space dataset. We show that our models outperform existing protein language models on a diverse range of regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large-scale data sets and high-performance computing for enhancing therapeutic antibody development.
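The sketch below shows one plausible way to embed a natively paired heavy/light chain with a paired antibody language model. The HuggingFace model name (Exscientia/IgBert), the space-separated residue format, and the [SEP]-joined pairing convention are assumptions based on common BERT-style usage; consult the released model card for the exact input format.

```python
# Sketch of embedding a natively paired heavy/light chain. Model name and the
# input formatting below are assumptions, not a verified specification.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "Exscientia/IgBert"  # assumed model identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
light = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLN"

# Residues separated by spaces; heavy and light chains joined by [SEP].
paired = " ".join(heavy) + " [SEP] " + " ".join(light)
enc = tokenizer(paired, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state        # (1, seq_len, dim)

# Mean-pool over real tokens to obtain one vector per paired antibody.
mask = enc["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)
print(embedding.shape)
```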

Language: English

Cited by: 11

Enhancing Antibody Language Models with Structural Information
Justin Barton, Jacob D. Galson, Jinwoo Leem, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Jan. 4, 2024

Abstract The central tenet of molecular biology is that a protein's amino acid sequence determines its three-dimensional structure, and thus its function. However, proteins with similar sequences do not always fold into the same shape, and vice-versa, dissimilar sequences can adopt similar folds. In this work, we explore antibodies, a class of proteins in the immune system, whose local shapes are highly unpredictable, even with small variations in their sequence. Inspired by the CLIP method [1], we propose a multimodal contrastive learning approach, contrastive sequence-structure pre-training (CSSP), which amalgamates the representations of antibody sequences and structures in a mutual latent space. Integrating structural information leads both antibody and protein language models to show better correspondence with structural similarity and improves accuracy and data efficiency in downstream binding prediction tasks. We provide an optimised CSSP-trained model, AntiBERTa2-CSSP, for non-commercial use at https://huggingface.co/alchemab .
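The core of a CLIP-style objective such as CSSP can be written compactly: paired sequence and structure embeddings are pulled together while mismatched pairs are pushed apart with a symmetric InfoNCE loss. The sketch below uses random tensors in place of the two encoders and an assumed temperature; it illustrates the loss only, not the authors' full training setup.

```python
# Minimal CLIP-style (InfoNCE) contrastive loss between sequence and structure
# embeddings. Dimensions and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(seq_emb: torch.Tensor,
                     struct_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (sequence, structure) embeddings."""
    seq_emb = F.normalize(seq_emb, dim=-1)
    struct_emb = F.normalize(struct_emb, dim=-1)
    logits = seq_emb @ struct_emb.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.shape[0])         # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random vectors standing in for the outputs of the two encoders.
batch, dim = 8, 256
print(float(contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))))
```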

Language: English

Cited by: 9

Biophysical cartography of the native and human-engineered antibody landscapes quantifies the plasticity of antibody developability
Habib Bashour, Eva Smorodina, Matteo Pariset, et al.

Communications Biology, Journal year: 2024, Issue: 7(1)

Published: July 31, 2024

Designing effective monoclonal antibody (mAb) therapeutics faces a multi-parameter optimization challenge known as "developability", which reflects an antibody's ability to progress through development stages based on its physicochemical properties. While natural antibodies may provide valuable guidance for mAb selection, we lack a comprehensive understanding of developability parameter (DP) plasticity (redundancy, predictability, sensitivity) and of how the DP landscapes of natural and human-engineered antibodies relate to one another. These gaps hinder fundamental developability profile cartography. To chart the native and engineered antibody landscapes, we computed 40 sequence-based and 46 structure-based DPs over two million native and human-engineered single-chain antibody sequences. We find lower redundancy among structure-based compared to sequence-based DPs. Sequence sensitivity to single amino acid substitutions varied by antibody region and DP, and structure-based DP values varied across the conformational ensemble of antibody structures. We show that sequence-based DPs are more predictable than structure-based ones across different machine-learning tasks and embeddings, indicating a constrained design space. Human-engineered antibodies localize within the DP landscapes of natural antibodies, suggesting that they explore mere subspaces of the natural one. Our work quantifies the plasticity of antibody developability, providing a resource for therapeutic antibody design. This large-scale analysis allows quantification of developability plasticity, accelerating antibody drug development.
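As a small, concrete example of sequence-based developability parameters (DPs), the sketch below computes a handful of classical physicochemical descriptors for a single-chain sequence with Biopython's ProtParam module. The chosen descriptors and the toy sequence are stand-ins, not the paper's 40-parameter panel, and a reasonably recent Biopython release is assumed.

```python
# Illustrative sequence-based developability descriptors via Biopython ProtParam.
# The descriptors and toy sequence are stand-ins for the paper's DP panel.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

scfv = ("EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"
        "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY")

pa = ProteinAnalysis(scfv)
dps = {
    "length": len(scfv),
    "isoelectric_point": pa.isoelectric_point(),
    "gravy_hydrophobicity": pa.gravy(),
    "instability_index": pa.instability_index(),
    "net_charge_pH7.4": pa.charge_at_pH(7.4),
}
for name, value in dps.items():
    print(f"{name}: {value:.2f}")
```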

Language: English

Cited by: 8

Supervised fine-tuning of pre-trained antibody language models improves antigen specificity prediction
Meng Wang, Jonathan Patsenker, Henry Li, et al.

PLoS Computational Biology, Journal year: 2025, Issue: 21(3), Pages: e1012153-e1012153

Published: March 31, 2025

Antibodies play a crucial role in the adaptive immune response, with their specificity to antigens being a fundamental determinant of function. Accurate prediction of antibody-antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we present a method for supervised fine-tuning of antibody language models, which improves on pre-trained model embeddings in predicting binding to the SARS-CoV-2 spike protein and influenza hemagglutinin. We perform fine-tuning of four antibody language models to predict binding to these antigens and demonstrate that fine-tuned classifiers exhibit enhanced predictive accuracy compared to classifiers trained on pre-trained model embeddings. Additionally, we investigate changes in attention activations after fine-tuning to gain insights into the molecular basis of antigen recognition by antibodies. Furthermore, we apply the fine-tuned models to BCR repertoire data related to vaccination, demonstrating their ability to capture repertoire changes following vaccination. Overall, our study highlights the effectiveness of supervised fine-tuning of antibody language models as a valuable tool to improve antigen specificity prediction.
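A minimal sketch of the supervised fine-tuning idea: attach a binary classification head to a pretrained protein/antibody language model and update its weights on labeled binder/non-binder sequences. The checkpoint, toy data, and hyperparameters below are illustrative assumptions, not the study's configuration.

```python
# Supervised fine-tuning sketch: pretrained masked LM + classification head,
# trained end-to-end on toy binder labels. All names/values are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "facebook/esm2_t6_8M_UR50D"  # stand-in; an antibody LM could be used instead
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQESGPGLVKPSETLSLTCTVS"]
labels = torch.tensor([1, 0])        # 1 = binds antigen, 0 = does not (toy labels)

batch = tokenizer(sequences, return_tensors="pt", padding=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(3):                   # a few toy steps on one tiny batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(float(out.loss))
```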

Language: English

Cited by: 1

Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications

Dawid Chomicz, Jarosław Kończak, Sonia Wróbel, et al.

Frontiers in Molecular Biosciences, Journal year: 2024, Issue: 11

Published: March 28, 2024

Antibodies are proteins produced by our immune system that have been harnessed as biotherapeutics. The discovery of antibody-based therapeutics relies on analyzing large volumes of diverse sequences coming from phage display or animal immunizations. Identification of suitable therapeutic candidates is achieved by grouping the sequences by their similarity and subsequent selection of a diverse set of antibodies for further tests. Such groupings are typically created using sequence-similarity measures alone. Maximizing diversity in the selected antibodies is crucial to reducing the number of tests of molecules with near-identical properties. With advances in structural modeling and machine learning, antibodies can now be grouped across other dimensions, such as predicted paratopes or three-dimensional structures. Here we benchmarked antibody grouping methods using clonotype, sequence, paratope prediction, structure prediction, and embedding information. The results were assessed on two tasks: binder detection and epitope mapping. We demonstrate that in binder detection no method appears to outperform the others, while in epitope mapping, paratope-based clusterings are among the top performers. Most importantly, all methods propose orthogonal groupings, offering more diverse pools of antibodies when multiple methods are used than any single method alone. To facilitate exploring the different methods, an online tool, CLAP, available at ( clap.naturalantibody.com ), allows users to group, contrast, and visualize antibodies using the different clustering methods.
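Two of the grouping strategies compared in such benchmarks can be sketched briefly: clonotype-style grouping on V/J genes and CDR3 identity, and clustering of numeric (e.g., embedding-based) representations. The identity threshold, toy antibodies, and random embeddings below are assumptions for illustration, not the benchmark's exact settings.

```python
# Sketch of two antibody grouping strategies: clonotype-style matching and
# clustering of numeric embeddings. Threshold and data are illustrative.
from dataclasses import dataclass
import numpy as np
from sklearn.cluster import AgglomerativeClustering

@dataclass
class Antibody:
    v_gene: str
    j_gene: str
    cdr3: str

def same_clonotype(a: Antibody, b: Antibody, min_cdr3_identity: float = 0.8) -> bool:
    """Same V and J gene, equal CDR3 length, and CDR3 identity above a threshold."""
    if a.v_gene != b.v_gene or a.j_gene != b.j_gene or len(a.cdr3) != len(b.cdr3):
        return False
    matches = sum(x == y for x, y in zip(a.cdr3, b.cdr3))
    return matches / len(a.cdr3) >= min_cdr3_identity

abs_ = [Antibody("IGHV3-23", "IGHJ4", "ARDYYGSGSYFDY"),
        Antibody("IGHV3-23", "IGHJ4", "ARDYYGSGTYFDY"),
        Antibody("IGHV1-69", "IGHJ6", "ARGGWELLRYFDL")]
print(same_clonotype(abs_[0], abs_[1]))   # True: one CDR3 mismatch
print(same_clonotype(abs_[0], abs_[2]))   # False: different V/J genes

# Embedding-based grouping: cluster per-antibody vectors (e.g., PLM embeddings).
embeddings = np.random.rand(len(abs_), 64)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
print(labels)
```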

Language: English

Cited by: 6

Accurate prediction of antibody function and structure using bio-inspired antibody language model

Hongtai Jing, Zhengtao Gao, Sheng Xu, et al.

Briefings in Bioinformatics, Journal year: 2024, Issue: 25(4)

Published: May 23, 2024

Abstract In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting antibody conformation remains challenging due to the unique evolution and high flexibility of antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% non-redundant unlabeled antibody sequences, capturing both conserved and antibody-specific properties. Notably, BALM showcases exceptional performance across four tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic structures from individual antibody sequences. Remarkably, BALMFold outperforms well-established methods like AlphaFold2, IgFold, ESMFold and OmegaFold on the antibody benchmark, demonstrating its potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials. The BALMFold server is freely available at https://beamlab-sh.com/models/BALMFold.

Language: English

Cited by: 6

Language model-based B cell receptor sequence embeddings can effectively encode receptor specificity
Meng Wang, Jonathan Patsenker, Henry Li, et al.

Nucleic Acids Research, Journal year: 2023, Issue: 52(2), Pages: 548-557

Published: Dec. 18, 2023

High throughput sequencing of B cell receptors (BCRs) is increasingly applied to study the immense diversity of antibodies. Learning biologically meaningful embeddings of BCR sequences is beneficial for predictive modeling. Several embedding methods have been developed for BCRs, but no direct performance benchmarking exists. Moreover, the impact of input sequence length and paired-chain information on prediction performance remains to be explored. We evaluated multiple embedding models on their ability to predict BCR sequence properties and receptor specificity. Despite differences in model architectures, most BCR-specific embeddings effectively capture BCR-specific features and slightly outperform general protein language models in predicting specificity. In addition, incorporating full-length heavy chains and the paired light chain improves prediction performance for all embeddings. This study provides insights into how to improve downstream applications of embeddings in antibody analysis and discovery.
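The embed-then-classify workflow evaluated here can be sketched as follows: mean-pool a language model's per-residue representations into one vector per BCR and train a simple classifier on those vectors. The checkpoint and toy labels below are assumptions; a BCR-specific language model would be swapped in the same way.

```python
# Embed-then-classify sketch: mean-pooled PLM embeddings + logistic regression.
# Model name and toy specificity labels are illustrative assumptions.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL = "facebook/esm2_t6_8M_UR50D"  # stand-in for a BCR-specific language model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL).eval()

def embed(sequence: str) -> np.ndarray:
    """Mean-pooled last-hidden-state embedding of one sequence."""
    enc = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    return hidden.mean(dim=0).numpy()

sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQESGPGLVKPSETLSLTCTVS",
             "QVQLVQSGAEVKKPGASVKVSCKAS", "EVQLLESGGGLVQPGGSLRLSCAAS"]
labels = [1, 0, 0, 1]                 # toy specificity labels

X = np.stack([embed(s) for s in sequences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```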

Language: English

Cited by: 12

Reading the repertoire: Progress in adaptive immune receptor analysis using machine learning

Timothy J O'Donnell, Chakravarthi Kanduri, Giulio Isacchini, et al.

Cell Systems, Journal year: 2024, Issue: 15(12), Pages: 1168-1189

Published: Dec. 1, 2024

Language: English

Cited by: 4