Tree of motility – A proposed history of motility systems in the tree of life DOI Creative Commons
Makoto Miyata, Robert Robinson, Taro Q.P. Uyeda

et al.

Genes to Cells, Journal Year: 2020, Volume and Issue: 25(1), P. 6 - 21

Published: Jan. 1, 2020

Abstract Motility often plays a decisive role in the survival of species. Five systems of motility have been studied in depth: those propelled by bacterial flagella, eukaryotic actin polymerization and the eukaryotic motor proteins myosin, kinesin and dynein. However, many organisms exhibit surprisingly diverse motilities, and advances in genomics, molecular biology and imaging showed that those motilities have inherently independent mechanisms. This makes defining the breadth of motility nontrivial, because novel motilities may be driven by unknown mechanisms. Here, we classify the known motilities based on the unique classes of movement‐producing protein architectures. Based on this criterion, the current total stands at 18 types. In this perspective, we discuss these modes of motility relative to the latest phylogenetic Tree of Life and propose a history of motility. During the ~4 billion years since the emergence of life, motility arose in Bacteria with flagella and pili, and in Archaea with archaella. Newer modes of motility became possible in Eukarya with changes to the cell envelope. Presence or absence of a peptidoglycan layer, the acquisition of robust membrane dynamics, the enlargement of cells and environmental opportunities likely provided the context for the (co)evolution of novel types of motility.

Language: English

Evaluating Protein Transfer Learning with TAPE DOI Open Access
Roshan Rao, Nicholas Bhattacharya, Neil Thomas

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2019, Volume and Issue: unknown

Published: June 20, 2019

Abstract Protein modeling is an increasingly popular area of machine learning research. Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.

Language: English

Citations

469

The Evolution and Ecology of Bacterial Warfare DOI Creative Commons
Elisa T. Granato, Thomas A. Meiller-Legrand, Kevin R. Foster

et al.

Current Biology, Journal Year: 2019, Volume and Issue: 29(11), P. R521 - R537

Published: June 1, 2019

Language: English

Citations

457

CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning DOI
Alex Chklovski, Donovan H. Parks, Ben J. Woodcroft

et al.

Nature Methods, Journal Year: 2023, Volume and Issue: 20(8), P. 1203 - 1212

Published: July 27, 2023

Language: English

Citations

418

Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT DOI Creative Commons
F. A. Bastiaan von Meijenfeldt, Ksenia Arkhipova, Diego D. Cambuy

et al.

Genome biology, Journal Year: 2019, Volume and Issue: 20(1)

Published: Oct. 22, 2019

Current-day metagenomics analyses increasingly involve de novo taxonomic classification of long DNA sequences and metagenome-assembled genomes. Here, we show that the conventional best-hit approach often leads to classifications that are too specific, especially when the sequences represent novel deep lineages. We present a classification method that integrates multiple signals to classify sequences (Contig Annotation Tool, CAT) and metagenome-assembled genomes (Bin Annotation Tool, BAT). Classifications are automatically made at low taxonomic ranks if closely related organisms are present in the reference database, and at higher ranks otherwise. The result is a high classification precision even for sequences from considerably unknown organisms.

Language: English

Citations

411

Innovations to culturing the uncultured microbial majority DOI
William H. Lewis, Guillaume Tahon, Patricia Geesink

et al.

Nature Reviews Microbiology, Journal Year: 2020, Volume and Issue: 19(4), P. 225 - 240

Published: Oct. 22, 2020

Language: English

Citations

404

Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations DOI
Cindy J. Castelle, Christopher T. Brown, Karthik Anantharaman

et al.

Nature Reviews Microbiology, Journal Year: 2018, Volume and Issue: 16(10), P. 629 - 645

Published: Sept. 4, 2018

Language: English

Citations

376

Microbial diversity in extreme environments DOI
Wensheng Shu, Li‐Nan Huang

Nature Reviews Microbiology, Journal Year: 2021, Volume and Issue: 20(4), P. 219 - 235

Published: Nov. 9, 2021

Language: English

Citations

359

Small and mighty: adaptation of superphylum Patescibacteria to groundwater environment drives their genome simplicity DOI Creative Commons
Renmao Tian, Daliang Ning, Zhili He

et al.

Microbiome, Journal Year: 2020, Volume and Issue: 8(1)

Published: April 6, 2020

Abstract Background: The newly defined superphylum Patescibacteria, such as Parcubacteria (OD1) and Microgenomates (OP11), has been found to be prevalent in groundwater, sediment, lake, and other aquifer environments. Recently, increasing attention has been paid to this diverse superphylum, including > 20 candidate phyla (a large part of the candidate phylum radiation, CPR), because it refreshed our view of the tree of life. However, the adaptive traits contributing to its prevalence are still not well known. Results: Here, we investigated the genomic features and metabolic pathways of Patescibacteria in groundwater through genome-resolved metagenomics analysis of 600 Gbp of sequence data. We observed that, while the members of Patescibacteria have reduced genomes (~ 1 Mbp) exclusively, functions essential to growth and reproduction, such as genetic information processing, were retained. Surprisingly, they have sharply reduced redundant and nonessential functions, including specific metabolic activities and stress response systems. The Patescibacteria have ultra-small cells and simplified membrane structures, including flagellar assembly, transporters, and two-component systems. Despite the lack of CRISPR viral defense, the bacteria may evade predation through deletion of common phage receptors and other alternative strategies, which may explain the low representation of prophage proteins in their genomes and the lack of CRISPR. By establishing the linkages between bacterial features and environmental conditions, our results provide important insights into the functions and evolution of this CPR group. Conclusions: We found that Patescibacteria has streamlined many functions while acquiring advantages such as avoiding phage invasion, to adapt to the groundwater environment. The unique features of small genome size, ultra-small cell size, and lacking CRISPR of this large lineage are bringing new understandings on the life of Bacteria. Our results provide important insights into the mechanisms for adaptation of the superphylum to groundwater environments, and demonstrate a case where less is more, and small is mighty.

Language: English

Citations

314

Evaluating Protein Transfer Learning with TAPE DOI Creative Commons
Roshan Rao, Nicholas Bhattacharya, Neil Thomas

et al.

arXiv (Cornell University), Journal Year: 2019, Volume and Issue: unknown

Published: Jan. 1, 2019

Machine learning applied to protein sequences is an increasingly popular area of research. Semi-supervised learning for proteins has emerged as an important paradigm due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.

Language: English

Citations

312

Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea DOI Creative Commons
Qiyun Zhu, Uyen Mai, Wayne Pfeiffer

et al.

Nature Communications, Journal Year: 2019, Volume and Issue: 10(1)

Published: Dec. 2, 2019

Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer "core" genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.

Language: English

Citations

294