Molecular Biology and Evolution,
Journal Year:
2024,
Volume and Issue:
41(7)
Published: June 14, 2024
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying, and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, although it could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.
Genetics,
Journal Year:
2024,
Volume and Issue:
228(1)
Published: July 16, 2024
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
Nature,
Journal Year:
2025,
Volume and Issue:
637(8044), P. 118 - 126
Published: Jan. 1, 2025
Abstract
Many known and unknown historical events have remained below detection thresholds of genetic studies because subtle ancestry changes are challenging to reconstruct. Methods based on shared haplotypes [1,2] and rare variants [3,4] improve power but are not explicitly temporal and have not been possible to adopt in unbiased ancestry models. Here we develop Twigstats, an approach of time-stratified ancestry analysis that can improve statistical power by an order of magnitude by focusing on coalescences in recent times, while remaining unbiased by population-specific drift. We apply this framework to 1,556 available ancient whole genomes from Europe in the historical period. We are able to model individual-level ancestry using preceding genomes to provide high resolution. During the first half of the first millennium CE, we observe at least two different streams of Scandinavian-related ancestry expanding across western, central and eastern Europe. By contrast, during the second half of the first millennium CE, ancestry patterns suggest the regional disappearance or substantial admixture of these ancestries. In Scandinavia, we document a major ancestry influx by approximately 800 CE, when a large proportion of Viking Age individuals carried ancestry from groups related to central Europe not seen in individuals from the early Iron Age. Our findings suggest that time-stratified ancestry analysis can provide a higher-resolution lens for genetic history.
Peer Community Journal,
Journal Year:
2024,
Volume and Issue:
4
Published: March 18, 2024
The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring negligible compared to the total population size. Yet many species of plants, invertebrates, prokaryotes or fish exhibit neutrally skewed offspring distribution or strong selection events yielding few individuals to produce a number of offspring of up to the same magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. The current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first give proof of the accuracy of our methods to estimate the multiple merger parameter and past demographic history using simulated data under the β-coalescent model. Secondly, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distribution from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference but also the urgent need for more accurate ARG inference approaches.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: March 29, 2024
Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. In humans, traditional approaches for describing population genetic variation often rely on discrete genetic ancestry labels, which, despite their utility, can obscure the complex, multi-faceted nature of human genetic history. These labels risk oversimplifying ancestry by ignoring its temporal depth and geographic continuity, and may therefore conflate notions of race, ethnicity, geography, and genetic ancestry. Here, we present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We use this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatial-temporal context of genetic ancestry and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.
PLoS Genetics,
Journal Year:
2025,
Volume and Issue:
21(1), P. e1011537 - e1011537
Published: Jan. 8, 2025
Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that even poorly-inferred short IBD segments can improve estimation. Our mutation-rate estimator achieves precision similar to a previously-published method despite a 4 000-fold reduction in data used for inference, and we identify significant differences between human populations. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2025,
Volume and Issue:
unknown
Published: Jan. 14, 2025
Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating the ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce ancestry-specific expected GRM (as-eGRM), a novel framework for elucidating the relatedness among admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness based on the genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets of three-way admixed Latino cohorts showed that analysis based on as-eGRM robustly outperforms existing methods with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits in understudied populations.
Science,
Journal Year:
2025,
Volume and Issue:
387(6741), P. 1391 - 1397
Published: March 27, 2025
Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. We present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We used this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatiotemporal context of genetic ancestry when describing human genetic variation and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: May 30, 2024
Our view of genetic polymorphism is shaped by methods that provide a limited and reference-biased picture. Long-read sequencing technologies, which are starting to provide nearly complete genome sequences for population samples, should solve the problem, except that characterizing and making sense of non-SNP variation is difficult even with perfect sequence data. Here, we analyze 27 genomes of Arabidopsis thaliana in an attempt to address these issues, and illustrate what can be learned by analyzing whole-genome polymorphism data in an unbiased manner. Estimated genome sizes range from 135 to 155 Mb, with differences almost entirely due to centromeric and rDNA repeats. The completely assembled chromosome arms comprise roughly 120 Mb in all accessions, but are full of structural variants, many of which are caused by insertions of transposable elements (TEs) and subsequent partial deletions of such insertions. Even with only 27 accessions, a pan-genome coordinate system that includes the resulting variation ends up being 40% larger than the size of any one genome. Our analysis reveals an incompletely annotated mobile-ome: our ability to predict what is actually moving is poor, and we detect several novel TE families. In contrast to this, the genic portion, or “gene-ome”, is highly conserved. By annotating each genome using accession-specific transcriptome data, we find that 13% of all genes are segregating in our 27 accessions, but that most of these are transcriptionally silenced. Finally, we show that with short-read data we previously massively underestimated genetic variation of all kinds, including SNPs, mostly in regions where short reads could not be mapped reliably, but also where reads were mapped incorrectly. We demonstrate that SNP-calling errors can be biased by the choice of reference genome, and that RNA-seq and BS-seq results can be strongly affected by mapping reads to a reference genome rather than to the genome of the assayed individual. In conclusion, while whole-genome polymorphism data pose tremendous analytical challenges, they will ultimately revolutionize our understanding of genome evolution.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2023,
Volume and Issue:
unknown
Published: Nov. 4, 2023
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome leading to geographic patterns of isolation-by-distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here, we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km²/generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.