Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)
Published: Sept. 22, 2023
The role of de novo evolved genes from non-coding sequences in regulating morphological differentiation between species/subspecies remains largely unknown. Here, we show that a rice de novo gene GSE9 contributes to grain shape difference between indica/xian and japonica/geng varieties. GSE9 evolves from a previous non-coding region of wild rice Oryza rufipogon through the acquisition of a start codon. This gene is inherited by most japonica varieties, while the original sequence (absence of start codon, gse9) is present in the majority of indica varieties. Knockout of GSE9 in japonica varieties leads to slender grains, whereas introgression to indica background results in round grains. Population evolutionary analyses reveal that gse9 and GSE9 are derived from wild rice Or-I and Or-III groups, respectively. Our findings uncover that the de novo GSE9 gene contributes to the genetic divergence between indica and japonica subspecies, and provide a target for precise manipulation of rice grain shape.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)
Published: April 26, 2024
Long-read sequencing data, particularly those derived from the Oxford Nanopore platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: March 12, 2023
Abstract
Long read sequencing data, particularly those derived from the Oxford Nanopore (ONT) platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. NextDenovo can rapidly correct reads; these corrected reads contain fewer errors than those of other comparable tools and are characterized by fewer chimeric alignments. We applied NextDenovo to the assembly of high quality reference genomes of 35 diverse humans from across the world using ONT data. Based on these de novo genome assemblies, we were able to identify the landscape of segmental duplications and gene copy number variation in the modern human population. The use of the NextDenovo program should pave the way for population-scale long-read assembly, thereby facilitating the construction of human pan-genomes, using Nanopore long read sequencing data.

Molecular Plant, Journal Year: 2023, Volume and Issue: 16(8), P. 1232-1236
Published: Aug. 1, 2023
In 2005, the current commonly used rice reference genome (Oryza sativa ssp. japonica cv. Nipponbare) was initially released by the International Rice Genome Sequencing Project (International Rice Genome Sequencing Project, 2005).
Thereafter, the reference genome was further updated in 2013 with an improved assembly (IRGSP-1.0) and gene annotations (MSU7, RAP-DB) (Kawahara et al., 2013; Sakai et al., 2013). In the past 10 years, this reference genome has been serving as one of the most important genetic resources for subsequent rice functional genomics efforts.
Although several rice genomes have since been assembled into gapless chromosomes with only 2–5 telomeres absent (Li et al., 2021; Song et al., 2021; Zhang et al., 2022), IRGSP-1.0 and its annotations are still widely used as the reference.
However, the limitations of sequencing technology and the intricate genomic organization led to the under-representation of complex regions in the reference, leaving a total of 72 major gaps (including 19 telomeres), 167 minor gaps, and 779 unknown bases, with an estimated length of ∼3% of the genome unsolved. To pursue a complete foundational rice genome, we applied a strategy that integrated PacBio HiFi and Oxford Nanopore Technology (ONT) ultra-long reads to generate the original contigs, which were then scaffolded onto the chromosome-level with the support of a Hi-C dataset.
Gap filling and terminal extension were conducted to resolve the remaining seven gaps and a telomere region within the scaffolds. All gap-closure regions were supported by uniform coverage of ONT reads (Supplemental Figure 1). A large rDNA array was identified beside the short arm of chromosome 9, with nearly identical repeats of 45S rDNA (Supplemental Figure 2), and was artificially filled with consecutive blocks reflecting their copy number (see supplemental materials and methods). This captured 93.8% and 93.9% of the reads containing full-length [...] in mapping, but should be treated as model sequences.
Following polishing employing Illumina PE (next-generation sequencing [NGS]) reads, we produced T2T-NIP (version AGIS-1.0), with all 12 chromosomes and 24 telomeres resolved (Figure 1A). Multiple strategies were applied to evaluate the accuracy and completeness of T2T-NIP. All available primary data (including HiFi, ONT, NGS, and Hi-C) were remapped to T2T-NIP, with high mapping rates of >99.6% for all datasets except one (93.1%). Uniform coverage was displayed across the whole genome [...] dataset because [...] centromeres [...] near two [...] (Figure 1B).
Chromatin immunoprecipitation sequencing (ChIP-seq) with a CENH3 antibody was applied to identify the location of the centromeres (Figure 1A, Supplemental Table 1, and Supplemental Figure 3). CentO-enriched regions were also identified by homology to the 155- and 165-bp CentO satellite (Figure 1A and Supplemental Table 1), and eight centromeres showed a similar or consistent size with a previous report determined by fluorescence in situ hybridization (Cheng et al., 2002). The consensus accuracy was approximately one error per 5 million bases (Q63), much higher [...] (Supplemental Table 2).
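For gene content assessment, 99.88% of the BUSCO genes in the 1614 [...] set were [...] (Supplemental Table 3), equal to or higher than previously reported [...]. A total of 1747 ribosomal RNA (rRNA) genes were identified in T2T-NIP, whereas [...] hundred [...] in IRGSP-1.0. [...] 57 359 protein-coding genes and 325 794 [...] (51.1%) were identified, both of which represent more than [...] (Supplemental Tables 4 and 5). [...] array, 1022 [...] were annotated [...] transcriptome data (Supplemental Table 6). Among the [...] 314 [...] in gap-filling regions excluding [...], 142 were confirmed to be expressed [...] tissue-specific [...] (Supplemental Figure 4). With [...], T2T-NIP achieved 385.7 million base pairs (Mbp), including abundant improvements compared with prior [...] (Supplemental Figures 4–6).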
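Compared with IRGSP-1.0, T2T-NIP contains 12.5 Mbp of newly assembled sequence, including rDNA arrays (33.2%), pericentromeric and centromeric regions (32.1%), [...] (27.1%), and subtelomeric regions (5.1%), which are necessary for fundamental cellular processes (Figures 1C–1E). Some of the largest gaps covered [...] nine chromosomes, telomeric repetitive [...] three [...] represented unresolved sequences (Supplemental Figure 7). In addition to these apparent gaps, other gap [...] were found to be artificial or otherwise incorrect (Supplemental Figure 8). We investigated the possible [...] in the 500 kb flanking regions adjacent to the gaps. Most gaps far from centromeres (39/44) showed excellent synteny, while almost all gaps close to centromeres (11/12) contained additional extensive structural differences (e.g., deletions and inversions with lengths >20 kb) (Figure 1D). Additionally, [...] could [...] well [...], resulting in continuous [...] 100–117 [...] (Figure 1D [...]). These results demonstrated that T2T-NIP is a significant update, resolving misassembled structures probably caused by [...]. T2T-NIP removes the long-standing barrier of the hidden 3% of the genome for sequence-based analysis, [...] regions.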
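Therefore, it is [...] to describe the initial analysis of the truly complete reference genome and discuss its potential applications. We have [...] a rich collection of omics data, [...] gene models, transposon elements (TEs), [...] sequencing, and methylation datasets, which are presented online (http://www.ricesuperpir.com/web/nip). To highlight the utility of these resources, we demonstrate examples of [...] duplicated genes [...] 11 [...] associated with gaps.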
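AGIS_Os10g035850 (denoted as LOC_Os10g43075 in IRGSP-1.0/MSU7) traversed the boundary of a gap at chromosome 10, with an incomplete annotation of 76.3% of the entire gene and some misannotated exons in the previous version. T2T-NIP thus [...] correction of the gene model, [...] six new [...] each [...] splicing alternatives [...]. Most TE-related genes have multiple copies (paralogs) [...] sequences, which always complicated analysis. When NGS reads are mapped to IRGSP-1.0, the absence of paralogs causes reads to incorrectly align to LOC_Os11g12240 (AGIS_Os11g010790), resulting in many false-positive variants (Figure 1F). Reads mapped to T2T-NIP show the expected [...] typical heterozygous variation pattern [...] small region. Any of these paralogs, and others like them, will be overlooked when using IRGSP-1.0, thereby promoting the importance of the release of T2T-NIP.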
To investigate how T2T-NIP affects short-read variant calling, we collected 230 cultivated (Oryza sativa) and wild (Oryza rufipogon) rice accessions from our previous study (Shang et al., 2022), which consisted of [...] populations: Xian/indica (XI), Geng/japonica (GJ), Aus (cA) [...]. The same pipeline was applied for variant calling based on T2T-NIP and IRGSP-1.0 to eliminate the interferences of software and parameters. On average, BWA-MEM mapped 1.04 × 10^7 (6.9%) more properly paired reads to T2T-NIP [...]. Interestingly, even though [...], the per-read mismatch rate was 1.2%–8.2% lower [...] populations (Figure 1G). Similarly, T2T-NIP [...] characteristics such as reducing misoriented read pairs (Figure 1H) and improving coverage uniformity (Figure 1I).
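Within [...] regions, we noted a decrease of 2.0%–4.3% in the standard deviation of [...], analogous among population groups (Figure 1I). From the alignments, [...] 741 [...] 895 [...] 221 high-quality single-nucleotide and indel variants were [...] relative to T2T-NIP (per-sample mean, 3 225 631) and [...] 744 [...] 667 [...] 800 [...] relative to IRGSP-1.0 (per-sample mean, 3 237 686), observing [...] shared [...] called [...] individual [...] (Supplemental [...] 6 [...] 9). Along with the improvement in [...] rate, we attribute the reduction in per-sample variant calls to [...] errors, especially [...] resolution [...] correct [...]. This conclusion [...] observation [...] sample [...] decreased largely [...] homozygous [...] slight increase [...] GJ [...] superiority [...] accurate [...] reads.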
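Next, we [...] the effects of T2T-NIP on structural variation (SV) [...] using published long reads. Alignment [...] reduced [...] observed (Figure 1J) [...] (Figure 1K) [...] populations. [...] corrected errors [...] facilitated alignment, [...] what [...] ([...] S10). [...] results, [...] (from −16.3% to −4.6%) SVs [...] different [...] against T2T-NIP instead of IRGSP-1.0. Similar to the [...] variations above, those [...] (Supplemental Table 7), likely due to rare [...] supplement [...] phenotype [...] genome-wide association studies (GWASs) [...] to assess the efficiency [...] 101 [...] SNPs [...] five agronomic traits, [...] detected [...]. For example, a pleiotropic locus related to yield per plant 1 (qYPP1) was significantly associated with grain [...] and plant height [...] not [...] (Figures 1L–1M [...]). Gene-editing experiments and [...] screening revealed [...] between plants [...] type [...] function-loss mutation [...] encoding [...] subunit of ADP-glucose pyrophosphorylase, OsAGPL2 (Figure 1N [...]). [...] favorable haplotype showing [...] (44.7 ± 11.8 g) [...] haplotypes (Figure 1O). [...] T2T-NIP-specific [...] width [...] enhanced [...] mining [...].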
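In summary, the T2T-NIP assembly, addressing the missing information of IRGSP-1.0, represents a [...] resource. T2T-NIP introduced ∼12.5 Mbp of new sequence and 1324 new gene predictions, which include rDNA arrays, subtelomeres, [...], unlocking [...] variational studies. The raw sequencing data were deposited in the National Center for Biotechnology Information under project accession PRJNA953663 and in the National Genomics Data Center under project accession PRJCA018610. The genome browser can be easily accessed from the website. This research was supported by the National Natural Science Foundation of China (32188102, 32101718), the Guangdong Basic and Applied Basic Research Foundation (2023B1515020053), the Youth Innovation [...] of the Chinese Academy of Agricultural Sciences (Y20230C36), and the specific research fund of the Innovation Platform for Academicians of Hainan Province (YSPTZX202303).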

Genes, Journal Year: 2023, Volume and Issue: 14(7), P. 1484
Published: July 21, 2023
Rapidly rising population and climate changes are two critical issues that require immediate action to achieve sustainable development goals. The rising population is posing increased demand for food, thereby pushing for an acceleration in agricultural production. Furthermore, increased anthropogenic activities have resulted in environmental pollution such as water pollution and soil degradation as well as alterations in the composition and concentration of environmental gases. These changes are affecting not only biodiversity loss but also the physio-biochemical processes of crop plants, resulting in a stress-induced decline in crop yield. To overcome such problems and ensure the supply of food material, consistent efforts are being made to develop strategies and techniques to increase crop yield and to enhance tolerance toward climate-induced stress. Plant breeding evolved after domestication and initially remained dependent on phenotype-based selection for crop improvement. But it has grown through cytological and biochemical methods, and the newer contemporary methods are based on DNA-marker-based strategies that help in the selection of agronomically useful traits. These are now supported by high-end molecular biology tools like PCR, high-throughput genotyping and phenotyping, data from crop morpho-physiology, statistical tools, bioinformatics, and machine learning. After establishing its worth in animal breeding, genomic selection (GS), an improved variant of marker-assisted selection (MAS), has made its way into crop-breeding programs as a powerful selection tool. To develop novel breeding programs as well as innovative marker-based models for genetic evaluation, GS makes use of molecular genetic markers. GS can amend complex traits and shorten the breeding period, making it advantageous over pedigree breeding and marker-assisted selection (MAS). It reduces the time and resources that are required for plant breeding while allowing for an increased genetic gain of complex attributes. It has been taken to new heights by integrating innovative and advanced technologies such as speed breeding, machine learning, and environmental/weather data to further harness the GS potential, an approach known as integrated genomic selection (IGS). This review highlights the IGS strategies, procedures, integrated approaches, and associated emerging issues, with a special emphasis on cereal crops. In this domain, efforts have been taken to highlight the potential of this cutting-edge innovation to develop climate-smart crops that can endure abiotic stresses with the motive of keeping production and quality at par with the global demand.

Genome Biology, Journal Year: 2023, Volume and Issue: 24(1)
Published: Jan. 26, 2023
Abstract
Background: A pangenome aims to capture the complete genetic diversity within a species and reduce bias in genetic analysis inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation. However, bioinformatics analysis tools for graph format genomes are lacking.
Results: To overcome this problem, we develop a novel strategy for pangenome construction and a downstream pangenome analysis pipeline (PSVCP) that captures genetic variants' position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome using 12 representative rice genomes and analyze an international rice panel with 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 undetected by the SNP-based genome-wide association study (GWAS).
Conclusions: Our results demonstrate that the pangenome constructed by our pipeline combined with a presence and absence variation-based GWAS can provide additional power for genomic and genetic analysis. The pangenome constructed in this study and the associated genome sequence and genetic variants data provide valuable genomic resources for rice genomics research and improvement in the future.

Molecular Plant, Journal Year: 2023, Volume and Issue: 16(4), P. 678-693
Published: Feb. 9, 2023
Structural variations (SVs) have long been described as being involved in the origin, adaption, and domestication of species. However, the underlying genetic and genomic mechanisms are poorly understood. Here, we report a high-quality genome assembly of Gossypium barbadense acc. Tanguis, a landrace that is closely related to the formation of extra-long-staple (ELS) cultivated cotton. An SV-based pan-genome (Pan-SV) was then constructed using a total of 182 593 non-redundant SVs, including 2236 inversions, 97 398 insertions, and 82 959 deletions from 11 assembled genomes of allopolyploid cotton. The utility of this Pan-SV was then demonstrated through population structure analysis and genome-wide association studies (GWASs). Using segregation mapping populations produced through crossing ELS cotton and the landrace along with an SV-based GWAS, certain SVs responsible for speciation, domestication, and improvement in tetraploid cottons were identified. Importantly, some of the SVs presently identified as associated with yield and fiber quality improvement had not been identified in previous SNP-based GWAS. In particular, a 9-bp insertion or deletion was found to associate with elimination of the interspecific reproductive isolation between Gossypium hirsutum and G. barbadense. Collectively, this study provides new insights into genome-wide, gene-scale SVs linked to important agronomic traits in a major crop species and highlights the importance of SVs during the speciation, domestication, and improvement of cultivated crop species.

Journal of Animal Science and Biotechnology, Journal Year: 2023, Volume and Issue: 14(1)
Published: May 5, 2023
Abstract
As large-scale genomic studies have progressed, it has been revealed that a single reference genome pattern cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, suggesting a possible omission of some population-specific sequences in the current reference genome. Conversely, the pangenome is a collection of all DNA sequences of a species that contains sequences shared by all individuals (core genome) and is also able to display sequence information unique to each individual (variable genome). The progress of pangenome research in humans, plants and domestic animals has proved that the missing genetic components and the identification of large structural variants (SVs) can be explored through pangenomic studies. Many individual specific sequences have been shown to be related to biological adaptability, phenotype and important economic traits. The maturity of technologies and methods such as third-generation sequencing, telomere-to-telomere genomes, graphic genomes, and reference-free assembly will further promote the development of the pangenome. In the future, the pangenome combined with long-read data and multi-omics will help to resolve large SVs and their relationship with the main economic traits of interest in domesticated animals, providing better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genetic variations in domestic animals (sheep, cattle, pigs, chickens) and their impacts on phenotypes, and how this can contribute to the understanding of species diversity. Additionally, we go through potential issues and the future perspectives of pangenome research in livestock and poultry.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)
Published: March 21, 2023
Understanding and exploiting genetic diversity is a key factor for the productive and stable production of rice. Here, we utilize 73 high-quality genomes that encompass the subpopulation structure of Asian rice (Oryza sativa), plus the genomes of two wild relatives (O. rufipogon and O. punctata), to build a pan-genome inversion index of 1769 non-redundant inversions that span an average of ~29% of the O. sativa cv. Nipponbare reference genome sequence. Using this index, we estimate an inversion rate of ~700 inversions per million years in Asian rice, which is 16 to 50 times higher than previously estimated for plants. Detailed analyses of these inversions show evidence of their effects on gene expression, recombination rate, and linkage disequilibrium. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice and hints at their largely unexplored role in functional biology and crop performance.

Nature Genetics, Journal Year: 2024, Volume and Issue: 56(5), P. 982-991
Published: April 11, 2024
Abstract
Although originally primarily a system for functional biology, Arabidopsis thaliana has, owing to its broad geographical distribution and adaptation to diverse environments, developed into a powerful model in population genomics. Here we present chromosome-level genome assemblies of 69 accessions from a global species range. We found that genomic colinearity is very conserved, even among geographically and genetically distant accessions. Along chromosome arms, megabase-scale rearrangements are rare and typically present only in a single accession. This indicates that the karyotype is quasi-fixed and that rearrangements in chromosome arms are counter-selected. Centromeric regions display higher structural dynamics, and divergences in core centromeres account for most of the genome size variations. Pan-genome analyses uncovered 32,986 distinct gene families, with 60% being present in all accessions and 40% appearing to be dispensable, including 18% private to a single accession, indicating unexplored genic diversity. These new assemblies from a global species range will empower future genetic research.
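The core, dispensable and private categories used in the pan-genome summary above can be illustrated with a small sketch. It assumes a hypothetical presence/absence table that maps each gene family to the set of accessions carrying it; the function name, the family IDs and the accession IDs are invented for illustration, and this is not the authors' pipeline.

```python
from collections import Counter


def classify_families(presence, n_accessions):
    """Label each gene family as core, dispensable, or private.

    presence: dict mapping a family ID to the set of accessions carrying it
              (hypothetical input format, assumed for this sketch).
    n_accessions: total number of accessions in the panel.
    """
    labels = {}
    for family, accessions in presence.items():
        count = len(accessions)
        if count == n_accessions:
            labels[family] = "core"         # present in every accession
        elif count == 1:
            labels[family] = "private"      # dispensable, seen in one accession only
        else:
            labels[family] = "dispensable"  # present in some, but not all, accessions
    return labels


# Toy data: three accessions and four invented gene families.
toy = {
    "fam1": {"acc1", "acc2", "acc3"},
    "fam2": {"acc1", "acc2"},
    "fam3": {"acc3"},
    "fam4": {"acc1", "acc2", "acc3"},
}
labels = classify_families(toy, n_accessions=3)
summary = Counter(labels.values())
```

In this sketch private families are reported separately, whereas the abstract counts them as a subset of the dispensable fraction; adding the two labels together gives the dispensable total.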