Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens DOI Creative Commons

Martijn Staats, Roy H.J. Erkens, Bart van de Vossenberg

et al.

PLoS ONE, Journal Year: 2013, Volume and Issue: 8(7), P. e69189 - e69189

Published: July 29, 2013

Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both the generally limited success of DNA extraction from such specimens and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, prospects for historical DNA have changed dramatically, as most NGS methods are actually designed to take short, fragmented DNA molecules as templates. Here we show that, using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2% and 71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, which will become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

Language: English

MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization DOI Creative Commons
Kazutaka Katoh, John Rozewicki, Kazunori Yamada

et al.

Briefings in Bioinformatics, Journal Year: 2017, Volume and Issue: 20(4), P. 1160 - 1166

Published: Aug. 7, 2017

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available, and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools that allow experimental biologists to semiautomatically handle large data are becoming important. We are working on the development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) the usage of interactive tools to refine sets of MSAs.

Language: English

Citations: 6851

RepeatModeler2 for automated genomic discovery of transposable element families DOI Open Access
Jullien M. Flynn, Robert Hubley, Clément Goubert

et al.

Proceedings of the National Academy of Sciences, Journal Year: 2020, Volume and Issue: 117(17), P. 9451 - 9457

Published: April 16, 2020

The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, it incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching the curated sequences with >95% sequence identity and sequence coverage than the original RepeatModeler. As expected, the greatest improvement was for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or as a containerized package under an open license ( https://github.com/Dfam-consortium/RepeatModeler , http://www.repeatmasker.org/RepeatModeler/ ).

Language: English

Citations: 2485

The Ensembl gene annotation system DOI Creative Commons
Bronwen Aken, Sarah Ayling, Daniel Barrell

et al.

Database, Journal Year: 2016, Volume and Issue: 2016, P. baw093 - baw093

Published: Jan. 1, 2016

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe this process in detail. Database URL: http://www.ensembl.org/index.html.

Language: English

Citations: 1104

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species DOI
Keith Bradnam, Joseph Fass, Anton Alexandrov

et al.

GigaScience, Journal Year: 2013, Volume and Issue: 2(1)

Published: July 22, 2013

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Many assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly, and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

Language: English

Citations: 686

Genome Annotation and Curation Using MAKER and MAKER‐P DOI
Michael S. Campbell, Carson Holt, Barry Moore

et al.

Current Protocols in Bioinformatics, Journal Year: 2014, Volume and Issue: 48(1)

Published: Dec. 1, 2014

Abstract This unit describes how to use the genome annotation and curation tools MAKER and MAKER‐P to annotate protein‐coding and noncoding RNA genes in newly assembled genomes, update/combine legacy annotations in light of new evidence, add quality metrics to annotations from other pipelines, and map existing annotations to a new assembly. MAKER and MAKER‐P can rapidly annotate genomes of any size, and scale to match available computational resources. © 2014 by John Wiley & Sons, Inc.

Language: English

Citations: 660

The genome of Chenopodium quinoa DOI Creative Commons
David E. Jarvis, Yung Shwen Ho, Damien J. Lightfoot

et al.

Nature, Journal Year: 2017, Volume and Issue: 542(7641), P. 307 - 312

Published: Feb. 8, 2017

Abstract Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.

Language: English

Citations: 645

Using intron position conservation for homology-based gene prediction DOI Creative Commons
Jens Keilwagen, Michael Wenk, Jessica L. Erickson

et al.

Nucleic Acids Research, Journal Year: 2016, Volume and Issue: 44(9), P. e89 - e89

Published: Feb. 17, 2016

Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservation of intron positions within genes to predict related genes in other organisms. We assess the performance of GeMoMa and compare it with state-of-the-art competitors on plant and animal genomes using an extended best reciprocal hit approach. We find that GeMoMa often makes more precise predictions than its competitors, yielding a substantially increased number of correct transcripts. Subsequently, we exemplarily validate GeMoMa predictions using Sanger sequencing. Finally, we use RNA-seq data to compare GeMoMa with competing programs and find that it again performs well. Hence, we conclude that exploiting intron position conservation improves homology-based gene prediction, and we make GeMoMa freely available as a command-line tool and with Galaxy integration.

Language: English

Citations: 583

An introduction to the analysis of shotgun metagenomic data DOI Creative Commons
Thomas J. Sharpton

Frontiers in Plant Science, Journal Year: 2014, Volume and Issue: 5

Published: June 16, 2014

Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. But the analysis of metagenomic sequences is complicated due to the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes analytical strategies specific to metagenomic data that can be applied, as well as the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community diversity, assemble novel genomes, identify novel taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes and identify the taxa and functions that differentiate communities.

Language: English

Citations: 576

A simple method to control over-alignment in the MAFFT multiple sequence alignment program DOI Creative Commons
Kazutaka Katoh, Daron M. Standley

Bioinformatics, Journal Year: 2016, Volume and Issue: 32(13), P. 1933 - 1942

Published: Feb. 26, 2016

Abstract Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment has recently become greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction. Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the number of correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in protein-based benchmarks and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment. Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/ Supplementary information: Supplementary data are available at Bioinformatics online.

Language: English

Citations: 537

The genome of cultivated peanut provides insight into legume karyotypes, polyploid evolution and crop domestication DOI Creative Commons
Weijian Zhuang, Hua Chen, Meng Yang

et al.

Nature Genetics, Journal Year: 2019, Volume and Issue: 51(5), P. 865 - 876

Published: May 1, 2019

High oil and protein content make tetraploid peanut a leading oil and food legume. Here we report a high-quality peanut genome sequence, comprising 2.54 Gb with 20 pseudomolecules and 83,709 protein-coding gene models. We characterize gene functional groups implicated in seed size evolution, seed oil content, disease resistance and symbiotic nitrogen fixation. The peanut B subgenome has more genes and general expression dominance, temporally associated with long-terminal-repeat expansion in the A subgenome, which also raises questions about the A-genome progenitor. The polyploid genome sequence has provided insights into the evolution of Arachis hypogaea and other legume chromosomes. Resequencing of 52 accessions suggests that independent domestications formed peanut ecotypes. Whereas polyploidy 0.42–0.47 million years ago (Ma) constrained genetic variation, the peanut genome sequence aids mapping and candidate-gene discovery for traits such as seed size and color, foliar disease resistance and others, providing a cornerstone for functional genomics and peanut improvement. The high-quality genome sequence and gene models of cultivated peanut provide insight into mechanisms underlying leaf [...] in peanut.

Language: English

Citations: 527