The structural coverage of the human proteome before and after AlphaFold
Eduard Porta‐Pardo, Victoria Ruiz‐Serra, Samuel Valentini et al.

License: Creative Commons

PLoS Computational Biology, Journal Year: 2022, Volume and Issue: 18(1), P. e1009818 - e1009818

Published: Jan. 24, 2022

The protein structure field is experiencing a revolution. From the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes or, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models in our knowledge on human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally-derived or template-based homology models, elevates up to 76% when including AlphaFold predictions. At the same time the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before the AlphaFold release (69% of ClinVar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show how the contribution of AlphaFold models to the structural coverage of non-human organisms, including bacteria, is significantly larger than that of the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success of direct consequences for the knowledge on the human genome and the derived medical applications.

Language: English

Search and sequence analysis tools services from EMBL-EBI in 2022
Fábio Madeira, Matt Pearce, Adrian R. Tivey et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 50(W1), P. W276 - W279

Published: March 28, 2022

Abstract: The EMBL-EBI search and sequence analysis tools frameworks provide integrated access to EMBL-EBI’s data resources and core bioinformatics analytical tools. EBI Search (https://www.ebi.ac.uk/ebisearch) provides a full-text search engine across nearly 5 billion entries, while the Job Dispatcher tools framework (https://www.ebi.ac.uk/services) enables the scientific community to perform a diverse range of sequence analysis using popular bioinformatics applications. Both allow users to interact through user-friendly web applications, as well as via RESTful and SOAP-based APIs. Here, we describe recent improvements to these services and updates made to accommodate the increasing data requirements during the COVID-19 pandemic.

Language: English

Citations: 1991

Ensembl 2022
Fiona Cunningham, James E. Allen, Jamie Allen et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 50(D1), P. D988 - D995

Published: Oct. 19, 2021

Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history of Ensembl via our dedicated Ensembl Rapid Release platform (http://rapid.ensembl.org). We have also developed a new method to generate comparative analyses at scale for these assemblies and, for the first time, we have annotated non-vertebrate eukaryotes. Meanwhile, we continually improve, extend and update the annotation for our high-value reference vertebrate genomes and report the details here. We have a range of specific software tools for specific tasks, such as the Ensembl Variant Effect Predictor (VEP) and the newly developed interface for the Variant Recoder. All Ensembl data, software and tools are freely available for download and are accessible programmatically.

Language: English

Citations: 1662

Rare variant contribution to human disease in 281,104 UK Biobank exomes
Quanli Wang, Ryan S. Dhindsa, Keren Carss et al.

License: Creative Commons

Nature, Journal Year: 2021, Volume and Issue: 597(7877), P. 527 - 532

Published: Aug. 10, 2021

Abstract: Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits (1, 2). Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene–phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene–phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).

Language: English

Citations: 379

The Human Pangenome Project: a global resource to map genomic diversity
Ting Wang, Lucinda Antonacci-Fulton, Kerstin Howe et al.

Access: Open Access

Nature, Journal Year: 2022, Volume and Issue: 604(7906), P. 437 - 446

Published: April 20, 2022

Language: English

Citations: 355

ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments
Fayrouz Hammal, Pierre de Langen, Aurélie Bergon et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 50(D1), P. D316 - D325

Published: Oct. 13, 2021

Abstract: ReMap (https://remap.univ-amu.fr) aims to provide manually curated, high-quality catalogs of regulatory regions resulting from a large-scale integrative analysis of DNA-binding experiments in Human, Mouse, Fly and Arabidopsis thaliana for hundreds of transcription factors and regulators. In this 2022 update, we have uniformly processed >11 000 DNA-binding sequencing datasets from public sources across four species. The updated Human regulatory atlas includes 8103 datasets covering a total of 1210 transcriptional regulators (TRs) with a catalog of 182 million (M) peaks, while the updated Arabidopsis atlas reaches 4.8M peaks, 423 TRs and 694 datasets. Also, this ReMap release is enriched by two new regulatory catalogs for Mus musculus and Drosophila melanogaster. First, the Mouse regulatory catalog consists of 123M peaks across 648 TRs as a result of the integration and validation of 5503 ChIP-seq datasets. Second, the Drosophila melanogaster catalog contains 16.6M peaks across 550 TRs from the integration of 1205 datasets. The four regulatory catalogs are browsable through track hubs at UCSC, Ensembl and NCBI genome browsers. Finally, ReMap 2022 comes with a new Cis Regulatory Module identification method, improved quality controls, faster search results, and a better user experience with an interactive tour and video tutorials on browsing and filtering ReMap catalogs.

Language: English

Citations: 302

OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity
Dmitry Kuznetsov, Fredrik Tegenfeldt, Mosè Manni et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D445 - D451

Published: Oct. 26, 2022

Abstract: OrthoDB provides evolutionary and functional annotations of genes in a diverse sampling of eukaryotes, prokaryotes, and viruses. Genomics continues to accelerate our exploration of gene diversity, and orthology is the most precise way of bridging gene functional knowledge with the rapidly expanding universe of genomic sequences. OrthoDB samples the most diverse organisms with the best quality genomics data to provide the leading coverage of species diversity. This update of the underlying data to over 18 000 prokaryotes and almost 2000 eukaryotes with over 100 million genes propels the coverage to another level. This achievement also demonstrates the scalability of the underlying OrthoLoger software for delineation of orthologs, freely available from https://orthologer.ezlab.org. In addition to the ab-initio computations of gene orthology used for the OrthoDB release, the OrthoLoger software allows mapping of novel gene sets to precomputed orthologs and thereby links to their annotations. The LEMMI-style benchmarking of OrthoLoger ensures its state-of-the-art performance and is available from https://lemortho.ezlab.org. The OrthoDB web interface has been further developed to include a pairwise orthology view from any gene to any other sampled species. OrthoDB-computed evolutionary annotations as well as extensively collated functional annotations can be accessed via REST API or SPARQL/RDF, downloaded or browsed online from https://www.orthodb.org.

Language: English

Citations: 292

Ensembl 2024
Peter W. Harrison, M Ridwan Amode, Olanrewaju Austine-Orimoloye et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D891 - D899

Published: Nov. 11, 2023

Abstract: Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and services are freely available.

Language: English

Citations: 283

Ensembl Genomes 2022: an expanding genome resource for non-vertebrates
Andrew Yates, James E. Allen, Ridwan Amode et al.

License: Creative Commons

Nucleic Acids Research, Journal Year: 2021, Volume and Issue: 50(D1), P. D996 - D1003

Published: Nov. 10, 2021

Abstract: Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception, creating one of the world's most comprehensive genomic resources, and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.

Language: English

Citations: 274

cGAS–STING drives the IL-6-dependent survival of chromosomally instable cancers
Christy Hong, Michaël Schubert, Andréa E. Tijhuis et al.

Nature, Journal Year: 2022, Volume and Issue: 607(7918), P. 366 - 373

Published: June 15, 2022

Language: English

Citations: 240

The Earth BioGenome Project 2020: Starting the clock
Harris A. Lewin, Stephen Richards, Erez Lieberman Aiden et al.

License: Creative Commons

Proceedings of the National Academy of Sciences, Journal Year: 2022, Volume and Issue: 119(4)

Published: Jan. 18, 2022

November 2020 marked 2 y since the launch of the Earth BioGenome Project (EBP), which aims to sequence all known eukaryotic species in a 10-y timeframe. Since then, significant progress has been made across all aspects of the EBP roadmap, as outlined in the 2018 article describing the project’s goals, strategies, and challenges (1). The launch phase has ended and the clock has started on reaching the EBP’s major milestones. This Special Feature explores the many facets of the EBP, including a review of progress, a description of major scientific goals, exemplar projects, ethical, legal, and social issues, and applications of biodiversity genomics. In this Introduction, we summarize the current status of the EBP, held virtually October 5 to 9, 2020, including recent updates through February 2021. References to the nine Perspective articles included in this Special Feature are cited to guide the reader toward a deeper understanding of the goals and challenges facing the EBP. It is urgent that the EBP move forward. The year 2020 marked the global failure of meeting any of the 20 “Aichi goals” for the preservation of wildlife and ecosystems (2). The International Union for Conservation of Nature now counts more than 35,000 species (28%) of surveyed plants and animals as threatened with extinction (3). Earth may lose 50% of its biodiversity by the end of the century if nothing is done to mitigate the anthropogenic factors that drive extinction and destroy the health of the ecosystems that sustain human existence. Degradation of aquatic and terrestrial ecosystems has continued unabated, and we may soon face the possibility of massive ecosystem collapse at a global scale. Such collapse would have an enormous impact not only on biodiversity, but also on political stability, and might ultimately affect survival …

Language: English

Citations: 239