PLoS ONE,
Journal Year:
2013,
Volume and Issue:
8(7), P. e69189 - e69189
Published: July 29, 2013
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, prospects have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that, using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2% and 71.0% of coding sequence regions, and hence remain somewhat cautious about de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.
Briefings in Bioinformatics,
Journal Year:
2017,
Volume and Issue:
20(4), P. 1160 - 1166
Published: Aug. 7, 2017
This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on the development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
Proceedings of the National Academy of Sciences,
Journal Year:
2020,
Volume and Issue:
117(17), P. 9451 - 9457
Published: April 16, 2020
The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all of the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a pipeline that greatly facilitates this process. This program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete long terminal repeat (LTR) retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately 3 times more consensus sequences matching the manually curated sequences with >95% sequence identity and sequence coverage than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, http://www.repeatmasker.org/RepeatModeler/).
Database,
Journal Year:
2016,
Volume and Issue:
2016, P. baw093 - baw093
Published: Jan. 1, 2016
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail.
Database URL: http://www.ensembl.org/index.html
GigaScience,
Journal Year:
2013,
Volume and Issue:
2(1)
Published: July 22, 2013
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Nature,
Journal Year:
2017,
Volume and Issue:
542(7641), P. 307 - 312
Published: Feb. 8, 2017
Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.
Nucleic Acids Research,
Journal Year:
2016,
Volume and Issue:
44(9), P. e89 - e89
Published: Feb. 17, 2016
Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservation of intron positions within genes to predict related genes in other organisms. We assess the performance of GeMoMa and compare it with state-of-the-art competitors on plant and animal genomes using an extended best reciprocal hit approach. We find that GeMoMa often makes more precise predictions than its competitors, yielding a substantially increased number of correct transcripts. Subsequently, we exemplarily validate GeMoMa predictions using Sanger sequencing. Finally, we use RNA-seq data to compare the predictions of homology-based gene prediction programs, and find again that GeMoMa performs well. Hence, we conclude that exploiting intron position conservation improves homology-based gene prediction, and we make GeMoMa freely available as a command-line tool and Galaxy integration.
Frontiers in Plant Science,
Journal Year:
2014,
Volume and Issue:
5
Published: June 16, 2014
Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. But the analysis of metagenomic sequences is complicated due to the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities.
Bioinformatics,
Journal Year:
2016,
Volume and Issue:
32(13), P. 1933 - 1942
Published: Feb. 26, 2016
Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction.
Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment.
Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Nature Genetics,
Journal Year:
2019,
Volume and Issue:
51(5), P. 865 - 876
Published: May 1, 2019
High oil and protein content make tetraploid peanut a leading oil and food legume. Here we report a high-quality peanut genome sequence, comprising 2.54 Gb with 20 pseudomolecules and 83,709 protein-coding gene models. We characterize gene functional groups implicated in seed size evolution, seed oil content, disease resistance and symbiotic nitrogen fixation. The peanut B subgenome has more genes and general expression dominance, temporally associated with long-terminal-repeat expansion in the A subgenome that also raises questions about the A-genome progenitor. The polyploid genome provided insights into the evolution of Arachis hypogaea and other legume chromosomes. Resequencing of 52 accessions suggests that independent domestications formed peanut ecotypes. Whereas 0.42–0.47 million years ago (Ma) polyploidy constrained genetic variation, the peanut genome sequence aids mapping and candidate-gene discovery for traits such as seed size and color, foliar disease resistance and others, also providing a cornerstone for functional genomics and peanut improvement.
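A high-quality genome sequence of cultivated peanut and its gene models provides insights into the mechanisms underlying seed size, leaf disease resistance and nitrogen fixation in peanut.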