Protein language model pseudolikelihoods capture features of in vivo B cell selection and evolution
Daphne van Ginneken, Anamay Samant, Karlis Daga-Krumins, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Dec. 11, 2024

Abstract B cell selection and evolution play crucial roles in dictating successful immune responses. Recent advancements in sequencing technologies and deep-learning strategies have paved the way for generating and exploiting an ever-growing wealth of antibody repertoire data. The self-supervised nature of protein language models (PLMs) has demonstrated the ability to learn complex representations of protein sequences and has been leveraged for a wide range of applications, including diagnostics, structural modeling, and antigen-specificity predictions. PLM-derived likelihoods have been used to improve antibody affinities in vitro, raising the question of whether PLMs can capture and predict features of B cell selection in vivo. Here, we explore how general and antibody-specific PLM-generated sequence pseudolikelihoods (SPs) relate to in vivo features such as expansion, isotype usage, and somatic hypermutation (SHM) at single-cell resolution. Our results demonstrate that the type of PLM and the region used as input significantly affect the generated SP. Contrary to previous in vitro reports, we observe a negative correlation between SPs and binding affinity, whereas SHM and antigen specificity were strongly correlated with SPs. By constructing evolutionary lineage trees of clones from human and mouse repertoires, we find that SHMs are routinely among the most likely mutations suggested by PLMs and that mutating residues have lower absolute likelihoods than conserved residues. These findings highlight the potential of PLMs to capture features of in vivo B cell selection and further suggest their ability to assist antibody discovery and engineering. Key points - In contrast to previous work (Hie et al., 2024), we observe a negative correlation between sequence pseudolikelihood (SP) and affinity. This may be explained by the inherent germline bias posed by the training data and by the difference in experimental settings. - We also reveal considerable variation of SP with V-gene family, isotype, and amount of somatic hypermutation (SHM). Moreover, in antigen-labeled data, antigen binding is consistently associated with SP. - By reconstructing evolutionary trajectories of clones, we show that SHM is partly predictable using PLMs. - We show that the region (CDR3 or full V(D)J) provided to the model, as well as the PLM used, influence the resulting SP.
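As an illustration of the sequence pseudolikelihood (SP) idea described above, the sketch below scores an antibody region with a masked protein language model by masking one residue at a time and averaging the log-probabilities of the true residues. It is a minimal sketch assuming a small HuggingFace ESM-2 checkpoint and a simple averaging convention; the paper's exact models and preprocessing may differ.

```python
# Minimal sequence pseudolikelihood (SP) sketch, assuming an ESM-style masked
# protein language model from HuggingFace transformers. Model choice and the
# averaging convention are illustrative assumptions, not the paper's pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def pseudolikelihood(sequence: str) -> float:
    """Average log-probability of each residue when it is masked in turn."""
    enc = tokenizer(sequence, return_tensors="pt")
    input_ids = enc["input_ids"]
    log_probs = []
    with torch.no_grad():
        # Skip the special tokens added at the start and end of the sequence.
        for pos in range(1, input_ids.shape[1] - 1):
            masked = input_ids.clone()
            true_id = masked[0, pos].item()
            masked[0, pos] = tokenizer.mask_token_id
            logits = model(input_ids=masked,
                           attention_mask=enc["attention_mask"]).logits
            log_probs.append(
                torch.log_softmax(logits[0, pos], dim=-1)[true_id].item()
            )
    return sum(log_probs) / len(log_probs)

# Example: score a (toy) heavy-chain CDR3 region.
print(pseudolikelihood("CARDYWGQGTLVTVSS"))
```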

Language: English

Improving antibody language models with native pairing
Sarah Burbach, Bryan Briney

Patterns, Journal year: 2024, Issue: 5(5), Pages: 100967-100967

Published: April 4, 2024

Existing antibody language models are limited by their use of unpaired sequence data. A recently published dataset ∼1.6 × 10

Language: English

Cited by: 22

Addressing the antibody germline bias and its effect on language models for improved antibody design
Tobias Hegelund Olsen, Iain H. Moal, Charlotte M. Deane, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Feb. 7, 2024

Abstract The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive and time-consuming task, with the final antibody needing not only strong and specific binding, but also to be minimally impacted by any developability issues. The success of transformer-based language models in protein sequence space, and the availability of vast amounts of antibody sequences, has led to many antibody-specific language models being developed to help guide antibody discovery and design. Antibody diversity primarily arises from V(D)J recombination, mutations within the CDRs, and/or a small number of mutations away from the germline outside the CDRs. Consequently, a significant portion of the variable domain of all natural antibody sequences remains germline. This affects the pre-training of language models, where this facet of the data introduces a prevailing bias towards germline residues. This poses a challenge, as non-germline residues are often vital for binding potently to a target, meaning that models need to be able to suggest key non-germline residues. In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new models optimised for predicting non-germline residues, and then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability. AbLang-2 is trained on both unpaired and paired data, and is freely available ( https://github.com/oxpig/AbLang2.git ).
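To make the germline-bias discussion concrete, the sketch below masks a single position in an antibody sequence and inspects the model's top-ranked residues, which is one way to check whether a model reverts to germline or proposes valid non-germline mutations. The checkpoint, sequence, and masked position are illustrative assumptions; AbLang-2 itself is distributed separately at the repository linked above.

```python
# Illustrative germline-bias probe: mask one position and list the model's top
# suggestions. The checkpoint is a generic stand-in; swap in an antibody LM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "facebook/esm2_t6_8M_UR50D"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def top_k_residues(sequence: str, position: int, k: int = 5) -> list:
    """Top-k amino acids predicted for `position` (0-based) when it is masked."""
    enc = tokenizer(sequence, return_tensors="pt")
    ids = enc["input_ids"].clone()
    tok_pos = position + 1                      # offset for the leading special token
    ids[0, tok_pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids, attention_mask=enc["attention_mask"]).logits
    top = torch.topk(logits[0, tok_pos], k).indices.tolist()
    return [tokenizer.convert_ids_to_tokens(i) for i in top]

# Hypothetical heavy-chain fragment; position 30 stands in for a somatically
# mutated, non-germline residue whose recovery we want to check.
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
print(top_k_residues(seq, position=30))
```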

Language: English

Cited by: 18

Large scale paired antibody language models
Henry Kenlay, Frédéric A. Dreyer, Aleksandr Kovaltsuk, et al.

PLoS Computational Biology, Journal year: 2024, Issue: 20(12), Pages: e1012646-e1012646

Published: Dec. 6, 2024

Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best performing antibody-specific language models developed to date, which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using more than two billion unpaired sequences and two million paired light and heavy chain sequences from the Observed Antibody Space dataset. We show that our models outperform existing protein language models on a diverse range of regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large-scale data sets and high-performance computing for enhancing therapeutic antibody development.
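The sketch below shows one plausible way to embed a natively paired heavy/light chain with a paired antibody language model. The HuggingFace model name (Exscientia/IgBert), the space-separated residue format, and the [SEP]-joined pairing convention are assumptions based on common BERT-style usage; consult the released model card for the exact input format.

```python
# Sketch of embedding a natively paired heavy/light chain. Model name and the
# input formatting below are assumptions, not a verified specification.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "Exscientia/IgBert"  # assumed model identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

heavy = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
light = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLN"

# Residues separated by spaces; heavy and light chains joined by [SEP].
paired = " ".join(heavy) + " [SEP] " + " ".join(light)
enc = tokenizer(paired, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state        # (1, seq_len, dim)

# Mean-pool over real tokens to obtain one vector per paired antibody.
mask = enc["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)
print(embedding.shape)
```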

Language: English

Cited by: 11

Enhancing Antibody Language Models with Structural Information
Justin Barton, Jacob D. Galson, Jinwoo Leem, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal year: 2024, Issue: unknown

Published: Jan. 4, 2024

Abstract The central tenet of molecular biology is that a protein's amino acid sequence determines its three-dimensional structure, and thus its function. However, proteins with similar sequences do not always fold into the same shape, and vice-versa, dissimilar sequences can adopt similar folds. In this work, we explore antibodies, a class of proteins in the immune system, whose local shapes are highly unpredictable, even with small variations in their sequence. Inspired by the CLIP method [1], we propose a multimodal contrastive learning approach, contrastive sequence-structure pre-training (CSSP), which amalgamates the representations of antibody sequences and structures in a mutual latent space. Integrating structural information leads both antibody and protein language models to show better correspondence with structural similarity and improves accuracy and data efficiency in downstream binding prediction tasks. We provide an optimised CSSP-trained model, AntiBERTa2-CSSP, for non-commercial use at https://huggingface.co/alchemab .
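The core of a CLIP-style objective such as CSSP can be written compactly: paired sequence and structure embeddings are pulled together while mismatched pairs are pushed apart with a symmetric InfoNCE loss. The sketch below uses random tensors in place of the two encoders and an assumed temperature; it illustrates the loss only, not the authors' full training setup.

```python
# Minimal CLIP-style (InfoNCE) contrastive loss between sequence and structure
# embeddings. Dimensions and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(seq_emb: torch.Tensor,
                     struct_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (sequence, structure) embeddings."""
    seq_emb = F.normalize(seq_emb, dim=-1)
    struct_emb = F.normalize(struct_emb, dim=-1)
    logits = seq_emb @ struct_emb.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.shape[0])         # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random vectors standing in for the outputs of the two encoders.
batch, dim = 8, 256
print(float(contrastive_loss(torch.randn(batch, dim), torch.randn(batch, dim))))
```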

Language: English

Cited by: 9

Biophysical cartography of the native and human-engineered antibody landscapes quantifies the plasticity of antibody developability
Habib Bashour, Eva Smorodina, Matteo Pariset, et al.

Communications Biology, Journal year: 2024, Issue: 7(1)

Published: July 31, 2024

Designing effective monoclonal antibody (mAb) therapeutics faces a multi-parameter optimization challenge known as "developability", which reflects an antibody's ability to progress through development stages based on its physicochemical properties. While natural antibodies may provide valuable guidance for mAb selection, we lack a comprehensive understanding of developability parameter (DP) plasticity (redundancy, predictability, sensitivity) and of how the DP landscapes of natural and human-engineered antibodies relate to one another. These gaps hinder fundamental developability profile cartography. To chart the native and engineered antibody landscapes, we computed 40 sequence-based and 46 structure-based DPs over two million native and human-engineered single-chain antibody sequences. We find lower redundancy among structure-based compared to sequence-based DPs. Sequence sensitivity to single amino acid substitutions varied by antibody region and DP, and structure-based DP values varied across the conformational ensemble of antibody structures. We show that sequence-based DPs are more predictable than structure-based ones across different machine-learning tasks and embeddings, indicating a constrained design space. Human-engineered antibodies localize within the DP landscapes of natural antibodies, suggesting that they explore mere subspaces of the natural one. Our work quantifies the plasticity of antibody developability, providing a resource for therapeutic antibody design. This large-scale analysis allows quantification of developability plasticity, accelerating antibody drug development.
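As a small, concrete example of sequence-based developability parameters (DPs), the sketch below computes a handful of classical physicochemical descriptors for a single-chain sequence with Biopython's ProtParam module. The chosen descriptors and the toy sequence are stand-ins, not the paper's 40-parameter panel, and a reasonably recent Biopython release is assumed.

```python
# Illustrative sequence-based developability descriptors via Biopython ProtParam.
# The descriptors and toy sequence are stand-ins for the paper's DP panel.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

scfv = ("EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"
        "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY")

pa = ProteinAnalysis(scfv)
dps = {
    "length": len(scfv),
    "isoelectric_point": pa.isoelectric_point(),
    "gravy_hydrophobicity": pa.gravy(),
    "instability_index": pa.instability_index(),
    "net_charge_pH7.4": pa.charge_at_pH(7.4),
}
for name, value in dps.items():
    print(f"{name}: {value:.2f}")
```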

Language: English

Cited by: 8

Supervised fine-tuning of pre-trained antibody language models improves antigen specificity prediction
Meng Wang, Jonathan Patsenker, Henry Li, et al.

PLoS Computational Biology, Journal year: 2025, Issue: 21(3), Pages: e1012153-e1012153

Published: March 31, 2025

Antibodies play a crucial role in the adaptive immune response, with their specificity to antigens being a fundamental determinant of function. Accurate prediction of antibody-antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we present a method for supervised fine-tuning of antibody language models, which improves on pre-trained model embeddings in predicting binding to the SARS-CoV-2 spike protein and influenza hemagglutinin. We perform fine-tuning of four antibody language models to predict binding to these antigens and demonstrate that fine-tuned classifiers exhibit enhanced predictive accuracy compared to classifiers trained on pre-trained model embeddings. Additionally, we investigate changes in attention activations after fine-tuning to gain insights into the molecular basis of antigen recognition by antibodies. Furthermore, we apply the fine-tuned models to BCR repertoire data related to vaccination, demonstrating their ability to capture repertoire changes following vaccination. Overall, our study highlights the effectiveness of supervised fine-tuning of antibody language models as a valuable tool to improve antigen specificity prediction.
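A minimal sketch of the supervised fine-tuning idea: attach a binary classification head to a pretrained protein/antibody language model and update its weights on labeled binder/non-binder sequences. The checkpoint, toy data, and hyperparameters below are illustrative assumptions, not the study's configuration.

```python
# Supervised fine-tuning sketch: pretrained masked LM + classification head,
# trained end-to-end on toy binder labels. All names/values are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "facebook/esm2_t6_8M_UR50D"  # stand-in; an antibody LM could be used instead
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQESGPGLVKPSETLSLTCTVS"]
labels = torch.tensor([1, 0])        # 1 = binds antigen, 0 = does not (toy labels)

batch = tokenizer(sequences, return_tensors="pt", padding=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(3):                   # a few toy steps on one tiny batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(float(out.loss))
```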

Language: English

Cited by: 1

Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications

Dawid Chomicz, Jarosław Kończak, Sonia Wróbel, et al.

Frontiers in Molecular Biosciences, Journal year: 2024, Issue: 11

Published: March 28, 2024

Antibodies are proteins produced by our immune system that have been harnessed as biotherapeutics. The discovery of antibody-based therapeutics relies on analyzing large volumes of diverse sequences coming from phage display or animal immunizations. Identification of suitable therapeutic candidates is achieved by grouping the sequences by their similarity and subsequent selection of a diverse set of antibodies for further tests. Such groupings are typically created using sequence-similarity measures alone. Maximizing diversity in the selected antibodies is crucial to reducing the number of tests of molecules with near-identical properties. With advances in structural modeling and machine learning, antibodies can now be grouped across other dimensions, such as predicted paratopes or three-dimensional structures. Here we benchmarked antibody grouping methods using clonotype, sequence, paratope prediction, structure prediction, and embedding information. The results were assessed on two tasks: binder detection and epitope mapping. We demonstrate that in binder detection no method appears to outperform the others, while in epitope mapping, paratope-based clusterings are among the top performers. Most importantly, all methods propose orthogonal groupings, offering more diverse pools of antibodies when multiple methods are used than any single method alone. To facilitate exploring the different methods, an online tool, CLAP, available at ( clap.naturalantibody.com ), allows users to group, contrast, and visualize antibodies using the different clustering methods.
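Two of the grouping strategies compared in such benchmarks can be sketched briefly: clonotype-style grouping on V/J genes and CDR3 identity, and clustering of numeric (e.g., embedding-based) representations. The identity threshold, toy antibodies, and random embeddings below are assumptions for illustration, not the benchmark's exact settings.

```python
# Sketch of two antibody grouping strategies: clonotype-style matching and
# clustering of numeric embeddings. Threshold and data are illustrative.
from dataclasses import dataclass
import numpy as np
from sklearn.cluster import AgglomerativeClustering

@dataclass
class Antibody:
    v_gene: str
    j_gene: str
    cdr3: str

def same_clonotype(a: Antibody, b: Antibody, min_cdr3_identity: float = 0.8) -> bool:
    """Same V and J gene, equal CDR3 length, and CDR3 identity above a threshold."""
    if a.v_gene != b.v_gene or a.j_gene != b.j_gene or len(a.cdr3) != len(b.cdr3):
        return False
    matches = sum(x == y for x, y in zip(a.cdr3, b.cdr3))
    return matches / len(a.cdr3) >= min_cdr3_identity

abs_ = [Antibody("IGHV3-23", "IGHJ4", "ARDYYGSGSYFDY"),
        Antibody("IGHV3-23", "IGHJ4", "ARDYYGSGTYFDY"),
        Antibody("IGHV1-69", "IGHJ6", "ARGGWELLRYFDL")]
print(same_clonotype(abs_[0], abs_[1]))   # True: one CDR3 mismatch
print(same_clonotype(abs_[0], abs_[2]))   # False: different V/J genes

# Embedding-based grouping: cluster per-antibody vectors (e.g., PLM embeddings).
embeddings = np.random.rand(len(abs_), 64)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(embeddings)
print(labels)
```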

Language: English

Cited by: 6

Accurate prediction of antibody function and structure using bio-inspired antibody language model

Hongtai Jing, Zhengtao Gao, Sheng Xu, et al.

Briefings in Bioinformatics, Journal year: 2024, Issue: 25(4)

Published: May 23, 2024

Abstract In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting antibody conformation remains challenging due to the unique evolution and high flexibility of antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% non-redundant unlabeled antibody sequences, capturing both conserved and antibody-specific properties. Notably, BALM showcases exceptional performance across four tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic structures from individual antibody sequences. Remarkably, BALMFold outperforms well-established methods like AlphaFold2, IgFold, ESMFold and OmegaFold on the antibody benchmark, demonstrating its potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials. The BALMFold server is freely available at https://beamlab-sh.com/models/BALMFold.

Language: English

Cited by: 6

Language model-based B cell receptor sequence embeddings can effectively encode receptor specificity
Meng Wang, Jonathan Patsenker, Henry Li, et al.

Nucleic Acids Research, Journal year: 2023, Issue: 52(2), Pages: 548-557

Published: Dec. 18, 2023

High throughput sequencing of B cell receptors (BCRs) is increasingly applied to study the immense diversity of antibodies. Learning biologically meaningful embeddings of BCR sequences is beneficial for predictive modeling. Several embedding methods have been developed for BCRs, but no direct performance benchmarking exists. Moreover, the impact of input sequence length and paired-chain information on prediction performance remains to be explored. We evaluated multiple embedding models on their ability to predict BCR sequence properties and receptor specificity. Despite differences in model architectures, most BCR-specific embeddings effectively capture BCR-specific features and slightly outperform general protein language models in predicting specificity. In addition, incorporating full-length heavy chains and the paired light chain improves prediction performance for all embeddings. This study provides insights into how to improve downstream applications of embeddings in antibody analysis and discovery.
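The embed-then-classify workflow evaluated here can be sketched as follows: mean-pool a language model's per-residue representations into one vector per BCR and train a simple classifier on those vectors. The checkpoint and toy labels below are assumptions; a BCR-specific language model would be swapped in the same way.

```python
# Embed-then-classify sketch: mean-pooled PLM embeddings + logistic regression.
# Model name and toy specificity labels are illustrative assumptions.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL = "facebook/esm2_t6_8M_UR50D"  # stand-in for a BCR-specific language model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL).eval()

def embed(sequence: str) -> np.ndarray:
    """Mean-pooled last-hidden-state embedding of one sequence."""
    enc = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    return hidden.mean(dim=0).numpy()

sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQESGPGLVKPSETLSLTCTVS",
             "QVQLVQSGAEVKKPGASVKVSCKAS", "EVQLLESGGGLVQPGGSLRLSCAAS"]
labels = [1, 0, 0, 1]                 # toy specificity labels

X = np.stack([embed(s) for s in sequences])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```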

Language: English

Cited by: 12

Reading the repertoire: Progress in adaptive immune receptor analysis using machine learning

Timothy J O'Donnell, Chakravarthi Kanduri, Giulio Isacchini, et al.

Cell Systems, Journal year: 2024, Issue: 15(12), Pages: 1168-1189

Published: Dec. 1, 2024

Language: English

Cited by: 4