Investigating species boundaries in Colletotrichum
Chitrabhanu S. Bhunjun, Chayanard Phukhamsakda, Ruvishika S. Jayawardena

et al.

Fungal Diversity, Year: 2021, Volume: 107(1), pp. 107-127

Published: March 1, 2021

Language: English

Drivers and dynamics of a massive adaptive radiation in cichlid fishes
Fabrizia Ronco, Michael Matschiner, Astrid Böhne

et al.

Nature, Year: 2020, Volume: 589(7840), pp. 76-81

Published: Nov. 18, 2020

Language: English

Cited by

276

The origins and spread of domestic horses from the Western Eurasian steppes [Creative Commons]
Pablo Librado, Naveed Khan, Antoine Fages

et al.

Nature, Year: 2021, Volume: 598(7882), pp. 634-640

Published: Oct. 20, 2021

Abstract: Domestication of horses fundamentally transformed long-range mobility and warfare [1]. However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling [2–4] at Botai, Central Asia around 3500 BC [3]. Other longstanding candidate regions for horse domestication, such as Iberia [5] and Anatolia [6], have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 BC, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association [7] between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 BC [8,9] driving the spread of Indo-European languages [10]. This contrasts with the scenario in Asia, where Indo-Iranian languages, chariots and horses spread together, following the early second millennium BC Sintashta culture [11,12].

Language: English

Cited by

241

Widespread introgression across a phylogeny of 155 Drosophila genomes [Creative Commons]
Anton Suvorov, Bernard Kim, Jeremy Wang

et al.

Current Biology, Year: 2021, Volume: 32(1), pp. 111-123.e5

Published: Nov. 16, 2021

Language: English

Cited by

229

Epigenetic patterns in a complete human genome
Ariel Gershman, Michael Sauria, Xavi Guitart

et al.

Science, Year: 2022, Volume: 376(6588)

Published: March 31, 2022

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes, and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.

Language: English

Cited by

217

Evolutionary and structural analyses of SARS-CoV-2 D614G spike protein mutation now documented worldwide [Creative Commons]
Sandra Isabel, Lucía Graña-Miraglia, Jahir M. Gutierrez

et al.

Scientific Reports, Year: 2020, Volume: 10(1)

Published: Aug. 20, 2020

The COVID-19 pandemic, caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), was declared on March 11, 2020 by the World Health Organization. As of the 31st of May, 2020, there have been more than 6 million COVID-19 cases diagnosed worldwide and over 370,000 deaths, according to Johns Hopkins. Thousands of SARS-CoV-2 strains have been sequenced to date, providing a valuable opportunity to investigate the evolution of the virus on a global scale. We performed a phylogenetic analysis of 1,225 SARS-CoV-2 genomes spanning from late December 2019 to mid-March 2020. We identified a missense mutation, D614G, in the spike protein of SARS-CoV-2, which has emerged as a predominant clade in Europe (954 of 1,449 (66%) sequences) and is spreading worldwide (1,237 of 2,795 (44%) sequences). Molecular dating analysis estimated the emergence of this clade around mid-to-late January (10–25 January) 2020. We also applied structural bioinformatics to assess the potential impact of D614G on the virulence and epidemiology of SARS-CoV-2. In silico analyses of the spike protein structure suggest that the mutation is most likely neutral to protein function as it relates to its interaction with the human ACE2 receptor. The lack of clinical metadata available prevented our investigation of the association between viral clade and disease severity phenotype. Future work that can leverage clinical outcome data with both viral and human genomic diversity is needed to monitor the pandemic.

Language: English

Cited by

200

Nucleocapsid mutations R203K/G204R increase the infectivity, fitness, and virulence of SARS-CoV-2 [Creative Commons]
Haibo Wu, Na Xing, Kaiwen Meng

et al.

Cell Host & Microbe, Year: 2021, Volume: 29(12), pp. 1788-1801.e6

Published: Nov. 13, 2021

Previous work found that the co-occurring mutations R203K/G204R on the SARS-CoV-2 nucleocapsid (N) protein are increasing in frequency among emerging variants of concern or interest. Through a combination of in silico analyses, this study demonstrates that R203K/G204R are adaptive, while large-scale phylogenetic analyses indicate that R203K/G204R associate with the emergence of the high-transmissibility SARS-CoV-2 lineage B.1.1.7. Competition experiments suggest that the 203K/204R variants possess a replication advantage over the preceding R203/G204 variants, possibly related to ribonucleocapsid (RNP) assembly. Moreover, the 203K/204R virus shows increased infectivity in human lung cells and hamsters. Accordingly, we observe a positive association between increased COVID-19 severity and sample frequency of 203K/204R. Our work suggests that the 203K/204R mutations contribute to the increased transmission and virulence of select SARS-CoV-2 variants. In addition to mutations in the spike protein, mutations in the nucleocapsid protein are important for viral spreading during the pandemic.

Language: English

Cited by

195

Tracking the COVID-19 pandemic in Australia using genomics [Creative Commons]
Torsten Seemann, Courtney R. Lane, Norelle L. Sherry

et al.

Nature Communications, Year: 2020, Volume: 11(1)

Published: Sep. 1, 2020

Abstract: Genomic sequencing has significant potential to inform public health management for SARS-CoV-2. Here we report high-throughput genomics for SARS-CoV-2, sequencing 80% of cases in Victoria, Australia (population 6.24 million) between 6 January and 14 April 2020 (total 1,333 COVID-19 cases). We integrate epidemiological, genomic and phylodynamic data to identify clusters and the impact of interventions. The global diversity of SARS-CoV-2 is represented, consistent with multiple importations. Seventy-six distinct genomic clusters were identified, including large clusters associated with social venues, healthcare and cruise ships. Sequencing sequential samples from 98 patients reveals minimal intra-patient SARS-CoV-2 genomic diversity. Phylodynamic modelling indicates a significant reduction in the effective viral reproductive number (R_e) from 1.63 to 0.48 after implementing travel restrictions and physical distancing. Our data provide a concrete framework for the use of SARS-CoV-2 genomics in public health responses, including its use to rapidly identify SARS-CoV-2 transmission chains, increasingly important as social restrictions ease globally.

Language: English

Cited by

188

Phylogenomics — principles, opportunities and pitfalls of big‐data phylogenetics DOI Open Access
Andrew D. Young, Jéssica P. Gillung

Systematic Entomology, Year: 2019, Volume: 45(2), pp. 225-247

Published: Dec. 16, 2019

Phylogenetics is the science of reconstructing evolutionary history life on Earth. Traditionally, phylogenies were constructed using morphological data only, but introduction Sanger sequencing and PCR in late 1970s enabled genetic information to be incorporated into phylogenetic analyses. Early studies employing multilocus analyses contributed greatly our knowledge challenged some well-established views relationships among many groups plants animals. Since publication these pioneering studies, significant methodological advances both analytical techniques have been made, molecular are now broadly accepted represent robust hypotheses organismal relationships. Next-generation techniques, developed mid-2000s, revolutionized DNA led a dramatic reduction cost per nucleotide sharp increase generation speed. As result, unprecedented amounts sequence for model nonmodel organisms has become affordable. This development transformed field phylogenetics phylogenomics—where genome-scale obtained from multiple samples at once much reduced (Mardis, 2011). The phylogenomic pipeline can very complex, presenting an overwhelming array methodologies available acquisition, manipulation, analysis interpretation massive datasets. Researchers also overcome challenges strategy design, identification orthologous loci, selection phylogeny estimation. particularly daunting researchers new field—both students established scientists—who wish delve novel methods reconstruct evolution their study group. Here we present entry-level overview theory tools that central phylogenomics, with emphasis appropriate application useful genomic data. We focus technologies statistical estimation, software implementing large discuss tradeoffs improving accuracy analyses, including biological sources systematic error Finally, provide glossary commonly encountered terms used phylogenomics may those entering hoping sort through multitude methods, terminology inherent this relatively new, rapidly advancing field. 
word 'phylogenomics' was first introduced context prediction gene function (Eisen, 1998), soon after inference (O'Brien & Stanyon, 1999). discipline owes its existence made technology over past two decades (Metzker, 2010). It comprises several areas research interface between biology major goals: (i) infer taxa gain insights mechanisms evolution; (ii) use multispecies comparisons putative functions or protein sequences. Traditional include few loci therefore limited by stochastic sampling error. there small number phylogenetically informative characters one genes, random 'noise' influences backbone nodes, potentially leading poorly resolved supported trees. problem addressed successfully larger Modern which take advantage hundreds thousands across genome, are, average, orders magnitude than traditional size datasets significantly reduces impact availability as limiting factor, offering great promise resolving historically recalcitrant nodes tree life. High-throughput [also called next-generation (NGS)] (Fig. 1) yielded immense quantities. differ fundamentally method they allow massively parallel sequencing, providing extremely high throughput simultaneously Millions billions nucleotides sequenced parallel, yielding more minimizing need fragment-cloning 1). Recent progress NGS rapid bioinformatics any generate sequences interest. whole-genome (Lam, 2012), whole-transcriptome shotgun (also RNA RNA-seq, transcriptomics; Wang, 2009), whole-exome (Rabbani, 2014), reduced-representation genome target enrichment) (e.g., Faircloth, 2012; Lemmon, 2012). Table 1 summarizes most phylogenomics. For details different see Beginner's Handbook Next Generation Sequencing Genohub (https://genohub.com/next-generation-sequencing-handbook/) (see Ambardar, 2016; Besser et al., 2018, references therein). Choosing important effects downstream workflows, especially read length, library preparation (e.g. ultraconserved elements anchored hybrid enrichment, discussed later) requires step. 
Strict experimental reproducibility integral—albeit uncommon—aspect sciences, mainly due varied technical implementation curation procedures. Despite importance fields biology, experiments low, estimated 60% published being 'lost science' unavailability underlying (Magee, 2014). Published difficult impossible replicate expand upon, utilized software, versions, parameters, dependencies operating system versions challenging uncover recreate. promotion open reproducible create productive responsible scientific culture enabling build upon previous continuously address complex questions. philosophy encompasses sharing code produce analysis, well archiving all raw (Mork, 2015; Shade Teal, 2015). Data provenance, recording input transformation key issue reproducibility. Several recommendations guidelines promote best practices management proposed (Cranston 2014; Magee, Debiasse Ryan, 2019), ensuring provenance Dunn, 2013; Oakley, Szitenberg, To ensure bioinformatics, it vital checkpoints enforced—places workflow devoted scrutinizing integrity, so results validated iterations consistency results. Additionally, adopting iterative, branching systematically explore space crucial. Linear methodology, computational procedures lined up other, presented rarely reality analysis. Instead, estimating trees often not messy enterprise, exploration recommended order select pipelines answer question hand. good particular study, highly comprehensive notes kept throughout process. In particular, keeping 'readme' file every step helpful track used, parameter values utilized, goals each how relate indication format changes. All contribute standardization ease efforts (Shade Phylogenomic precious resource: alignments expensive generate, seemingly infinite potential synthesis reuse. phylogeneticists faced combination algorithms, models manipulation techniques. issue, here flowchart containing steps 2). meant exhaustive, merely visualization recent studies. 
Taxon extreme inference, increased taxa—coupled loci—is advocated solution Ideally, should same pace, high-throughput caused increases far outpace taxon sampling. greater evidence regarding emerge, placement lineages within clades change dramatically. thus influence (Rosenberg Kumar, 2003; Nabhan Sarkar, specific place early study. 'Sufficient' always dependent questions addressed. unravel entire taxonomic unit, most, if all, subordinate unit sampled. Even though increasing demonstrated denser improves (Heath 2008). sampling, however, choice. Transcriptomics, instance, specimens collected stored directly liquid nitrogen RNAlater, whereas other exome will require molecular-grade specimens, preferably preserved high-grade ethyl ethanol laboratory ultrafreezer. A notable exception enrichment (UCEs), old, pinned insect museum (Blaimer, 2016). Genome-scale projects vulnerable nonproportional dataset increases, does accumulation nonrandom accompanying nonphylogenetic signal (Jeffroy, 2006). Bayesian macroevolutionary patterns—including divergence-time ancestral state reconstruction, diversification rate estimation—assume proportional clade, deviations lead biases (Stadler, 2009). However, implementations enable 'corrections' uneven revbayes implements corrections birth-death various models, except fossilized birth-death). Before worth evaluating previously resources. National Center Biotechnology Information's Sequence Read Archive (NCBI SRA) contains user-uploaded alignment (Leinonen, Other resources FlyBase (Thurmond, database Drosophila genes genomes, WormBase (https://www.wormbase.org), Caenorhabditis elegans related nematodes, UCSC Genome Browser (Kent 2002), repository mostly vertebrate genomes. Utilizing databases save money and/or ongoing projects. Moreau (2014), offers detailed description extraction either commercial kits phenol/chloroform protocols. 
After extraction, deposited publicly accessible collections association unique identifier, publications utilizing specimen metadata (including collector, date collection, geographic origin). Vouchering identifiers (alphanumeric number) crucial Therefore, nondestructive partially destructive whenever possible, cases, extracted itself becomes voucher. By contrast, when such transcriptomic small-bodied organisms, photographic voucher associated Moreover, destroyed part sample conspecifics communal social insects), another serve voucher, provided clear specimen. Properly vouchering alleviating issues identity unstable taxonomy (Pleijel 2008; Turney, Although increasingly cost-efficient years, widely simply amount unambiguously resolve life, inadequacy. Appropriate locus markers branches depths still incipient. Questions remain about whether coding noncoding data, conserved variable long short (Betancur-R. Edwards Chen 2017). critical decisions project decision must priori result types sequenced. Different own characteristics, advantages, limitations, cost-effectiveness, use, quality required, filtering workflow. (Table 2) subdivided sequencing. Shotgun process fragmented random, returning depending depth achieved, uses bidirectional probes (analogous primers sequencing) recover only regions Popular skimming, transcriptome (i.e. RNA-seq). (AHE) (Lemmon, 2012) UCEs (McCormack Faircloth [see Mandel (2014) alternative family Compositae]. These reviewed briefly 2 covered detail elsewhere Lemmon McCormack, Wen Zhang 2019). 3) involves fragmenting template pieces, then randomly obtain reads. Next, overlap reads assemble them longer contig. RNA-seq considered special form where whole mRNA reverse-transcribed reverse-complement DNA, depth, average times individual base sequenced, concept Because multiple-copy mitochondrial, ribosomal, plastid DNA) frequently single-copy regions. shallow fragments sufficient quantities recovered. 
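Sequencing depth, described above as the average number of times an individual base is sequenced, can be made concrete with a rough calculation. The sketch below assumes the common approximation depth = (number of reads × read length) / genome size; the function name and the example figures are hypothetical and do not come from the paper.

```python
def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate average sequencing depth (coverage) across a genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Total sequenced bases divided by the number of bases in the genome.
    return num_reads * read_length / genome_size


# Hypothetical shallow "skimming" run: 2 million 150 bp reads, 300 Mb genome.
depth = mean_depth(2_000_000, 150, 300_000_000)
print(f"{depth:.1f}x")  # 1.0x
```

At such shallow depth, single-copy regions are covered about once on average, whereas multiple-copy regions (mitochondrial, ribosomal or plastid DNA) would be expected to reach much higher effective coverage, which is why skimming tends to recover them.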
Shallow-depth skimming (Straub time ribosomal DNA. Conversely, near-complete genomes desired required numbers genome. (or transcriptomics) resulting generated undergoing active transcription tissue preservation. genome-reduction strategy, facilitates comparison activity tissues, stages, rearing conditions, etc. One drawbacks transcriptomics required—specimens flash-frozen precluding utilization already museums. Targeted capture, 4), umbrella term efficient, cost-effective generating organisms. effectively reduce complexity (60–120 bp), single-stranded baits hybridize sequences, recovery interest coverage. recovered, although nontarget mitochondrial symbiont sequences) set Multiple multiplexed together, enables 100 simultaneously. There main animal phylogenomics: AHE (Lemmon Both approaches rely subset organisms' proposed, criteria, Hyb-Seq (Weitemier Compositae COS (Mandel 2014) RELEC (Karin, in-solution targeted anchor subsequently enriched. flanking (and informative) once. capture sets interest, exclude repetitive misleading pseudogenes paralogues. benefit consistently allows meta-analyses accumulate time. consistent, transcriptomes gathered types, accurately assess orthologues align isoforms during Anchored targets protein-coding meaning enriched comprise coding, cases intronic untranslated region). means easily coded analysed amino acids. Candidate identified reference species. Then, species included probe kit design isolated, generated. Probes based alignments, substantial control applied single copy variation efficient Full 500–800 marker legacy e.g. COI) pool integration Ultraconserved elements, turn, shared evolutionarily distant taxa. universal UCE population-level short, aligning subsequent detection (95–100%) conservation ways identifying designing them, approach described (2017). shown perform collecting (Blaimer McCormack 2016), facilitate expansion no restricted fresh specimens. 
older producing fewer shorter general, possible retrieve old phyluce, user-friendly open-source processing (Faircloth, phyluce packages tutorials accessible, beginners. resources, familiarity working command-line environment Unix Unix-like system. guide command line Happy Belly Bioinformatics (https://astrobiomike.github.io/unix/). practice arthropod al. (2019). Performing quality-control crucial, yet sometimes overlooked Reads ideally inspected quality, guanine-cytosine (GC) content, presence adapter errors calling insertions/deletions). programs fastqc Illumina (Andrews, 2010), ngsqc platforms (Dai offer detect visualize errors, complementary trimmomatic (Bolger trim low-quality bases remove contamination. While all-in-one control, similar proposed. cutadapt, trimming, Applied Biosystems' SOliD sequencer (Martin, Likewise, scythe removing adapters 3′ end degraded (https://github.com/vsbuffalo/scythe). sickle standalone program trimming (Joshi Fass, htstream consists streaming (https://github.com/ibest/HTStream). handle single-end paired-end parallelization. Once performed reads, next assembly contigs. refers merging platform necessary because cannot go, rather pieces approximately 20 000–30 000 length time, used. assembled assembly: de novo reference-based. De consist constructing, simplifying Bruijn graphs extract contigs, needed Compeau (2011) general graphs]. case reference-based independently aligned. Ultimately, almost placed likely position, contrast assembly, synergies occur. preferred fully developed, constantly evolving. Performance depends type assembled, tradeoffs, computer memory exhibited. velvet (Zerbino Birney, 2008), trinity (Grabherr, 2011), soapdenovo (Li spades (Bankevich, abyss 2.0 (Jackman Narzisi Mishra Hölzer Marz atram (Allen 2015) easy whole-genome, genome-skimming aforementioned assemblers automation parallelization tasks. addition, performs contigs scale compared full genome/transcriptome avoiding whole-genome/transcriptome assembly. 
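The de novo strategy mentioned above, building and simplifying a de Bruijn graph to extract contigs, can be sketched in a few lines. This is a toy illustration under strong assumptions (error-free reads, a single strand, no repeats) and is not how velvet, spades or any other named assembler is implemented; all function names and the reads are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads that tile one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # ATGGCGTGCA
```

Real assemblers add error correction, graph simplification (tip and bubble removal) and handling of both strands; the walk here simply stops at the first branch point or dead end.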
problems stage (incomplete redundancy) difficulties orthologue paralogue identification, matrix construction. missing final aligned matrix, ultimately coverage, lengths, needed. current do inherently possess three; good-quality while pacbio generates low Phylogenetic orthology, i.e. whose common ancestor diverged speciation (orthologues), duplication event (paralogues) (Fitch, 1970). Genes arising events complicate concatenated describing paralogues tree. Orthology assessment biologists. assumed i

Language: English

Cited by

169

Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel [Creative Commons]
Danielle Miller, Michael A. Martin, Noam Harel

et al.

Nature Communications, Year: 2020, Volume: 11(1)

Published: Nov. 2, 2020

Abstract: Full genome sequences are increasingly used to track the geographic spread and transmission dynamics of viral pathogens. Here, with a focus on Israel, we sequence 212 SARS-CoV-2 sequences and use them to perform a comprehensive analysis to trace the origins and spread of the virus. We find that travelers returning from the United States of America significantly contributed to viral spread in Israel, more than their proportion in incoming infected travelers. Using phylodynamic analysis, we estimate that the basic reproduction number of the virus was initially around 2.5, dropping by more than two-thirds following the implementation of social distancing measures. We further report high levels of transmission heterogeneity in SARS-CoV-2 spread, with between 2-10% of infected individuals resulting in 80% of secondary infections. Overall, our findings demonstrate the effectiveness of social distancing measures for reducing viral spread.
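As a quick check on the figures quoted in this abstract, the short sketch below works out what a two-thirds drop from a reproduction number of about 2.5 implies. It is only an arithmetic illustration; the function name is hypothetical and the calculation is not the authors' phylodynamic model.

```python
def after_reduction(r0: float, fraction: float) -> float:
    """Reproduction number remaining after a proportional reduction."""
    return r0 * (1 - fraction)


# A drop of two-thirds from roughly 2.5, as quoted in the abstract.
r_after = after_reduction(2.5, 2 / 3)
print(round(r_after, 2))  # 0.83
```

A value below 1 means each infection leads to fewer than one new infection on average, which is consistent with the abstract's conclusion that social distancing measures reduced viral spread.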

Language: English

Cited by

160

Towards reproducible computational drug discovery [Creative Commons]
Nalini Schaduangrat, Samuel Lampa, Saw Simeon

et al.

Journal of Cheminformatics, Year: 2020, Volume: 12(1)

Published: Jan. 28, 2020

Abstract: The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted utilization for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebook, Jupyter notebook, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share data and programming codes used for numerical calculations, not only to facilitate reproducibility but also to foster collaborations (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data/code.

Language: English

Cited by

154