Glycoinformatics in the Artificial Intelligence Era
Daniel Bojar, Frédérique Lisacek

Chemical Reviews, Journal Year: 2022, Volume and Issue: 122(20), P. 15971 - 15988

Published: Aug. 12, 2022

Artificial intelligence (AI) methods have been and are now being increasingly integrated in prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved over the past decades, but their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data, which are notoriously hard to produce and analyze. Nonetheless, as time goes on, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on a discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.

Language: English

IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata
Antônio Pedro Camargo, Stephen Nayfach, I-Min A. Chen, et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D733 - D743

Published: Nov. 18, 2022

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These are clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotation is complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.

Language: English

Citations

241

Expansion of the global RNA virome reveals diverse clades of bacteriophages
Uri Neri, Yuri I. Wolf, Simon Roux, et al.

Cell, Journal Year: 2022, Volume and Issue: 185(21), P. 4023 - 4037.e18

Published: Sept. 28, 2022

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.

Language: English

Citations

206

Identification of mobile genetic elements with geNomad
Antônio Pedro Camargo, Simon Roux, Frederik Schulz, et al.

Nature Biotechnology, Journal Year: 2023, Volume and Issue: 42(8), P. 1303 - 1312

Published: Sept. 21, 2023

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.

Language: English

Citations

202
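
The geNomad entry above reports classification performance as a Matthews correlation coefficient (MCC). As a reminder of how that metric is computed from a binary confusion matrix, here is a minimal sketch. The function name and the counts are hypothetical illustrations, not values or code from the paper, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
from math import sqrt


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for binary confusion-matrix counts.

    Assumption: an undefined MCC (zero denominator) is reported as 0.0.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not taken from the geNomad benchmarks).
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1, so a value quoted as a percentage, as in the abstract, is the coefficient multiplied by 100.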

Database resources of the National Center for Biotechnology Information
Eric W Sayers, Jeffrey Beck, Evan Bolton, et al.

Nucleic Acids Research, Journal Year: 2023, Volume and Issue: 52(D1), P. D33 - D43

Published: Nov. 22, 2023

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Language: English

Citations

145
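
The NCBI entry above names the E-utilities as the programming interface to its databases. A minimal sketch of what a search request looks like follows. It only builds the request URL for the ESearch endpoint and sends nothing over the network; the helper name `esearch_url` and the example query terms are assumptions made for illustration and do not come from the abstract.

```python
from urllib.parse import urlencode

# Base address of the public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper; no request is made)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query terms chosen for illustration.
url = esearch_url("pubmed", "geNomad")
print(url)
```

Fetching the URL with any HTTP client would return the matching record identifiers. NCBI asks heavy users to respect its published rate limits and to register an API key.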

Metagenomic Data Assembly – The Way of Decoding Unknown Microorganisms
Alla Lapidus, Anton Korobeynikov

Frontiers in Microbiology, Journal Year: 2021, Volume and Issue: 12

Published: March 23, 2021

Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the communities using metagenomic assemblers – computational programs that put together small fragments of sequenced DNA generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly and a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable for metagenomics.

Language: English

Citations

107

Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9
Supratim Mukherjee, Dimitri Stamatis, Cindy Tianqing Li, et al.

Nucleic Acids Research, Journal Year: 2022, Volume and Issue: 51(D1), P. D957 - D963

Published: Oct. 16, 2022

The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute (DOE-JGI) continues to maintain its role as one of the flagship genomic metadata repositories of the world. The ever-increasing number of projects and metadata are freely available to the user community world-wide. GOLD’s metadata is consumed by scientists and remains an important source for large-scale comparative genomics analysis initiatives. Encouraged by this active user engagement and growth, GOLD has continued to add new components and capabilities. The new features, such as a public Application Programming Interface (API) and Ecosystem landing page, as well as the growth of different entities in the current GOLD v.9 edition, are described in detail in this manuscript.

Language: English

Citations

73

RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data
Justine Charon, Jan P. Buchmann, Sabrina Sadiq, et al.

Virus Evolution, Journal Year: 2022, Volume and Issue: 8(2)

Published: July 1, 2022

Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral 'dusk matter': viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource, RdRp-scan, that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences with identity levels as low as 10 per cent to those from known viruses and not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by RdRp-scan revealed that while most of the taxonomically unassigned RdRps fell into pre-established clusters, some fell into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B, and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function, and evolution of viral polymerases.

Language: English

Citations

72

Mining metatranscriptomes reveals a vast world of viroid-like circular RNAs
Benjamin D. Lee, Uri Neri, Simon Roux, et al.

Cell, Journal Year: 2023, Volume and Issue: 186(3), P. 646 - 661.e4

Published: Jan. 24, 2023

Viroids and viroid-like covalently closed circular (ccc) RNAs are minimal replicators that typically encode no proteins and hijack cellular enzymes for replication. The extent and diversity of viroid-like agents are poorly understood. We developed a computational pipeline to identify viroid-like cccRNAs and applied it to 5,131 metatranscriptomes and 1,344 plant transcriptomes. The search yielded 11,378 viroid-like cccRNAs spanning 4,409 species-level clusters, a 5-fold increase compared to the previously identified viroid-like elements. Within this diverse collection, we discovered numerous putative viroids, satellite RNAs, retrozymes, and ribozy-like viruses. Diverse ribozyme combinations and unusual ribozymes within the cccRNAs were identified. Self-cleaving ribozymes were identified in ambiviruses, some mito-like viruses and capsid-encoding satellite virus-like cccRNAs. The broad presence of viroid-like cccRNAs in diverse transcriptomes and ecosystems implies that their host range is far broader than currently known, and matches to CRISPR spacers suggest that some cccRNAs replicate in prokaryotes.

Language: English

Citations

65

ElasticBLAST: accelerating sequence search via cloud computing
Christiam Camacho, Grzegorz M. Boratyn, Victor Joukov, et al.

BMC Bioinformatics, Journal Year: 2023, Volume and Issue: 24(1)

Published: March 26, 2023

Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. We present ElasticBLAST, a cloud native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, demonstrating that with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.

Language: English

Citations

65

Host traits shape virome composition and virus transmission in wild small mammals
Yanmei Chen, Shu-Jian Hu, Xian-Dan Lin, et al.

Cell, Journal Year: 2023, Volume and Issue: 186(21), P. 4662 - 4675.e12

Published: Sept. 20, 2023

Language: English

Citations

62