Genome Biology and Evolution, Journal Year: 2023, Volume and Issue: 15(2)
Published: Jan. 23, 2023
Abstract
Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data will provide further opportunities for technological advancements in the field.
Science, Journal Year: 2022, Volume and Issue: 376(6588), P. 44 - 53
Published: March 31, 2022
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324
Published: May 10, 2023
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Science, Journal Year: 2022, Volume and Issue: 376(6588)
Published: March 31, 2022
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Science, Journal Year: 2022, Volume and Issue: 376(6588)
Published: March 31, 2022
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including a reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.
Nature Genetics, Journal Year: 2022, Volume and Issue: 54(4), P. 518 - 525
Published: April 1, 2022
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
Science, Journal Year: 2021, Volume and Issue: 373(6562), P. 1499 - 1505
Published: Sept. 23, 2021
Repeats associated with phenotype
The degree to which repeated sequences within a genome affect human phenotypes has been difficult to establish. Mukamel et al. examined thousands of genomes in the UK Biobank and found that some of the largest effects of common genetic variants on human phenotypes, including those of clinical relevance, arise from protein-coding repeat polymorphisms (see the Perspective by Gymrek and Goren). Mapping the size and copy number of these repeated protein domains links repeat variation to lipoprotein(a) concentration, height, and male pattern balding. Furthermore, allele frequencies differ between individuals of African and European descent, resulting in differences between populations of clinical relevance for traits including lipoprotein(a) levels, a risk factor for coronary artery disease.
—LMZ
Cell, Journal Year: 2022, Volume and Issue: 185(11), P. 1986 - 2005.e26
Published: May 1, 2022
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10^-4 per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.