Contrasting Sequence with Structure: Pre-training Graph Representations with PLMs
Louis Robinson, Timothy Atkinson, Liviu Copoiu, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Dec. 4, 2023

Understanding protein function is vital for drug discovery, disease diagnosis, and protein engineering. While Protein Language Models (PLMs) pre-trained on vast protein sequence datasets have achieved remarkable success, equivalent Protein Structure Models (PSMs) remain underrepresented. We attribute this to the relative lack of high-confidence structural data and suitable pre-training objectives. In this context, we introduce BioCLIP, a contrastive learning framework that pre-trains PSMs by leveraging PLMs, generating meaningful per-residue and per-chain structural representations. When evaluated on tasks such as protein-protein interaction, Gene Ontology annotation, and Enzyme Commission number prediction, BioCLIP-trained PSMs consistently outperform models trained from scratch and further enhance performance when merged with sequence embeddings. Notably, BioCLIP approaches, or exceeds, specialized methods across all benchmarks using its singular pre-trained design. Our work addresses the challenges of obtaining quality structural data and designing self-supervised objectives, setting the stage for more comprehensive models of protein function. Source code is publicly available.

Language: English

The Impact of Evolving SARS-CoV-2 Mutations and Variants on COVID-19 Vaccines
Gary R. McLean, Jeremy P. Kamil, Benhur Lee, et al.

mBio, Journal Year: 2022, Volume and Issue: 13(2)

Published: March 30, 2022

The emergence of several new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in recent months has raised concerns around the potential impact on ongoing vaccination programs. Data from clinical trials and real-world evidence suggest that current vaccines remain highly effective against the alpha variant (B.1.1.7), while some vaccines have reduced efficacy and effectiveness against symptomatic disease caused by the beta variant (B.1.351) and the delta variant (B.1.617.2); however, effectiveness against hospitalization remains high. Although data on the effectiveness of the primary regimen against the omicron variant (B.1.1.529) are limited, booster programs using mRNA vaccines have been shown to restore protection against infection and symptomatic disease (regardless of the vaccine used for the primary regimen) and maintain high effectiveness against hospitalization. However, effectiveness against infection and symptomatic disease wanes with time after the booster dose. Studies have demonstrated reductions of varying magnitude in the neutralizing activity of vaccine-elicited antibodies against a range of SARS-CoV-2 variants, with the omicron variant in particular exhibiting partial immune escape. However, evidence suggests that T-cell responses are preserved across vaccine platforms, regardless of variant of concern. Nevertheless, various mitigation strategies are under investigation to address the potential for reduced efficacy or effectiveness against current and future SARS-CoV-2 variants, including modification of vaccines for certain variants (including omicron), multivalent vaccine formulations, and different delivery mechanisms.

Language: English

Citations: 179

The Evolution and Biology of SARS-CoV-2 Variants
Amalio Telenti, Emma B. Hodcroft, David L. Robertson, et al.

Cold Spring Harbor Perspectives in Medicine, Journal Year: 2022, Volume and Issue: 12(5), P. a041390 - a041390

Published: April 20, 2022

Our understanding of the still unfolding severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic would have been extremely limited without study of the genetics and evolution of this new human coronavirus. Large-scale genome-sequencing efforts have provided close to real-time tracking of the global spread and diversification of SARS-CoV-2 since its entry into the human population in late 2019. These data have underpinned analysis of its origins, epidemiology, and adaptations to the human population: principally immune evasion and increasing transmissibility. SARS-CoV-2, despite being a new human pathogen, was highly capable of human-to-human transmission. During its rapid spread in humans, SARS-CoV-2 has evolved independent new forms, the so-called "variants of concern," that are better optimized for human-to-human transmission. The most important adaptation of the bat coronavirus progenitor of both SARS-CoV-1 and SARS-CoV-2 for human infection (and other mammals) is the use of the angiotensin-converting enzyme 2 (ACE2) receptor. Relaxed structural constraints provide plasticity to the SARS-related coronavirus spike protein, permitting it to accommodate significant amino acid replacements of antigenic consequence without compromising the ability to bind to ACE2. Although the bulk of research has justifiably concentrated on the viral spike protein as the main determinant of antigenic evolution and changes in transmissibility, there is accumulating evidence for the contribution of other regions of the viral proteome to virus-host interaction. Whereas levels of community transmission of recombinants of genetically distinct variants are at present low, when divergent variants cocirculate, recombination between SARS-CoV-2 clades is being detected, increasing the risk that viruses with new properties emerge. Applying computational and machine learning methods to genome sequence data sets to generate experimentally verifiable predictions will serve as an early warning system for novel variant surveillance and will be important in future vaccine planning. Omicron, the latest SARS-CoV-2 variant of concern, has focused attention on step change antigenic events, "shift," as opposed to incremental "drift" changes in antigenicity. Both an increase in transmissibility and antigenic shift in Omicron led to it readily causing infections in the fully vaccinated and/or previously infected. Omicron's virulence, while reduced relative to the variant of concern it replaced, Delta, is very much premised on the past immune exposure of individuals, with a clear signal that boosted vaccination protects from severe disease. Currently, SARS-CoV-2 has proven itself to be a dangerous new human respiratory pathogen with an unpredictable evolutionary capacity, leading to a risk of future variants too great not to ensure that all regions of the world are screened by viral genome sequencing, protected through available and affordable vaccines, and have non-punitive strategies in place for detecting and responding to novel variants of concern.

Language: English

Citations: 158

Learning from prepandemic data to forecast viral escape
Nicole N. Thadani, Sarah F. Gurev, Pascal Notin, et al.

Nature, Journal Year: 2023, Volume and Issue: 622(7984), P. 818 - 825

Published: Oct. 11, 2023

Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic: experimental approaches require host polyclonal antibodies to test against [1–16], and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern [17–19]. To address this, we developed EVEscape, a generalizable modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans or three-dimensional structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available before 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses including influenza, HIV and understudied viruses with pandemic potential such as Lassa and Nipah. We provide continually revised escape scores for all current strains of SARS-CoV-2 and predict probable further mutations to forecast emerging strains as a tool for continuing vaccine development (evescape.org).

Language: English

Citations: 79

Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution
Guangyu Wang, Xiaohong Liu, Kai Wang, et al.

Nature Medicine, Journal Year: 2023, Volume and Issue: 29(8), P. 2007 - 2018

Published: July 31, 2023

Language: English

Citations: 44

Inferring effects of mutations on SARS-CoV-2 transmission from genomic surveillance data
Brian Lee, Ahmed Abdul Quadeer, Muhammad S. Sohail, et al.

Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)

Published: Jan. 7, 2025

New and more transmissible variants of SARS-CoV-2 have arisen multiple times over the course of the pandemic. Rapidly identifying mutations that affect transmission could improve our understanding of viral biology and highlight new variants that warrant further study. Here we develop a generic, analytical epidemiological model to infer the transmission effects of mutations from genomic surveillance data. Applying our model to SARS-CoV-2 data across many regions, we find multiple mutations that substantially affect the transmission rate, both within and outside the Spike protein. The mutations that we infer to have the largest effects on transmission are strongly supported by experimental evidence from prior studies. Importantly, our model detects lineages with increased transmission even at low frequencies. As an example, we infer significant transmission advantages for the Alpha, Delta, and Omicron variants shortly after their appearances in regional data, when they comprised only around 1-2% of sample sequences. Our model thus facilitates the rapid identification of variants and mutations that affect transmission from genomic surveillance data.

Language: English

Citations: 2

Fitness, growth and transmissibility of SARS-CoV-2 genetic variants
Erik Volz

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 24(10), P. 724 - 734

Published: June 16, 2023

Language: English

Citations: 38

Predicting the antigenic evolution of SARS-COV-2 with deep learning
Wenkai Han, Ningning Chen, Xinzhou Xu, et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: June 13, 2023

The relentless evolution of SARS-CoV-2 poses a significant threat to public health, as it adapts to immune pressure from vaccines and natural infections. Gaining insights into potential antigenic changes is critical but challenging due to the vast sequence space. Here, we introduce the Machine Learning-guided Antigenic Evolution Prediction (MLAEP), which combines structure modeling, multi-task learning, and genetic algorithms to predict the viral fitness landscape and explore antigenic evolution via in silico directed evolution. By analyzing existing SARS-CoV-2 variants, MLAEP accurately infers variant order along antigenic evolutionary trajectories, correlating with corresponding sampling time. Our approach identified novel mutations in immunocompromised COVID-19 patients and emerging variants like XBB1.5. Additionally, MLAEP predictions were validated through in vitro neutralizing antibody binding assays, demonstrating that the predicted variants exhibited enhanced immune evasion. By profiling existing variants and predicting potential antigenic changes, MLAEP aids in vaccine development and enhances preparedness against future SARS-CoV-2 variants.

Language: English

Citations: 36

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics
Maxim Zvyagin, Alexander Brace, Kyle Hippe, et al.

The International Journal of High Performance Computing Applications, Journal Year: 2023, Volume and Issue: 37(6), P. 683 - 705

Published: Oct. 27, 2023

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first whole-genome scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.

Language: English

Citations: 35

Progressive loss of conserved spike protein neutralizing antibody sites in Omicron sublineages is balanced by preserved T cell immunity
Alexander Muik, Bonny Gaby Lui, Jasmin Quandt, et al.

Cell Reports, Journal Year: 2023, Volume and Issue: 42(8), P. 112888 - 112888

Published: July 31, 2023

Evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron variant has led to the emergence of sublineages with different patterns of neutralizing antibody evasion. We report that Omicron BA.4/BA.5 breakthrough infection of individuals immunized with SARS-CoV-2 wild-type-strain-based mRNA vaccines results in a boost of Omicron BA.4.6, BF.7, BQ.1.1, and BA.2.75 neutralization but does not efficiently boost BA.2.75.2, XBB, or XBB.1.5 neutralization. In silico analyses showed that the Omicron spike glycoprotein lost most of the neutralizing B cell epitopes, especially in XBB.1.5. In contrast, T cell epitopes are conserved across variants including XBB.1.5. T cell responses of mRNA-vaccinated, SARS-CoV-2-naive individuals against the wild-type strain, Omicron BA.1, and BA.4/BA.5 were comparable, suggesting that T cell immunity against recent sublineages including XBB.1.5 may remain largely unaffected. While some Omicron sublineages effectively evade B cell immunity, spike-protein-specific T cell immunity, due to the nature of polymorphic cell-mediated immune responses, is likely to continue to contribute to prevention/limitation of severe COVID-19 manifestation.

Language: English

Citations: 23

Using minor variant genomes and machine learning to study the genome biology of SARS-CoV-2 over time
Xiaofeng Dong, David A. Matthews, Giulia Gallo, et al.

Nucleic Acids Research, Journal Year: 2025, Volume and Issue: 53(4)

Published: Feb. 8, 2025

In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence. Since the emergence of SARS-CoV-2 in late 2019, variants have been selected that are more transmissible and capable of partial immune escape. Currently, models for projecting the evolution of SARS-CoV-2 are based on using dominant genome sequences to forecast whether a known mutation will be prevalent in the future. However, novel variants of SARS-CoV-2 (and other viruses) are driven by evolutionary pressure acting on minor variant genomes, which then become dominant and form a potential next wave of infection. In this study, sequencing data from 96 209 patients, sampled over a 3-year period, were used to analyse patterns of minor variant genomes. These data were used to develop unsupervised machine learning clusters to identify amino acids that had a greater potential for substitution than others in the Spike protein. Being able to identify amino acids that may be present in future variants would better inform the design of longer-lived medical countermeasures and allow a risk-based evaluation of viral properties, including assessment of transmissibility and immune escape, thus providing candidates with early warning signals for when a new variant of SARS-CoV-2 emerges.

Language: English

Citations: 1