bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: April 26, 2024
Abstract
Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ARG-based approaches to demographic inference in typical empirical analyses are susceptible to mis-inference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, though it could cause mis-inference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid mis-inference due to linked effects of selection.
Genetics, Journal Year: 2024, Volume and Issue: 228(1)
Published: July 16, 2024
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
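The idea of describing an ARG through genomes and their intervals of inheritance can be illustrated with a minimal Python sketch. This is not the formalism or software of the paper: the `Edge` record, the `local_tree` helper, and the toy node IDs and coordinates are all hypothetical, chosen only to show how the genealogical tree changes along the genome when one genome inherits from different parents over different intervals.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One interval of inheritance (hypothetical record layout)."""
    left: float    # inclusive start of the inherited interval
    right: float   # exclusive end of the inherited interval
    parent: int    # ID of the ancestral genome
    child: int     # ID of the descendant genome


def local_tree(edges, position):
    """Return {child: parent} for every edge whose interval covers `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


# Toy ARG on a genome of length 10 (assumed data).
# Samples are genomes 0, 1, 2; ancestors are genomes 3, 4, 5.
# Genome 1 inherits from 3 on [0, 5) and from 5 on [5, 10).
edges = [
    Edge(0, 5, 3, 0), Edge(0, 5, 3, 1), Edge(0, 5, 4, 3), Edge(0, 5, 4, 2),
    Edge(5, 10, 5, 1), Edge(5, 10, 5, 2), Edge(5, 10, 4, 5), Edge(5, 10, 4, 0),
]

left_tree = local_tree(edges, 2.0)    # tree covering positions 0 to 5
right_tree = local_tree(edges, 7.0)   # tree covering positions 5 to 10
```

In this toy example the two local trees differ in which sample pairs share the most recent ancestor, without any explicit recombination event being recorded; only the intervals of inheritance change.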
Nature, Journal Year: 2025, Volume and Issue: 637(8044), P. 118 - 126
Published: Jan. 1, 2025
Abstract
Many known and unknown historical events have remained below detection thresholds of genetic studies because subtle ancestry changes are challenging to reconstruct. Methods based on shared haplotypes [1,2] and rare variants [3,4] improve power but are not explicitly temporal and have not been possible to adopt in unbiased ancestry models. Here we develop Twigstats, an approach of time-stratified ancestry analysis that can improve statistical power by an order of magnitude by focusing on coalescences in recent times, while remaining unbiased by population-specific drift. We apply this framework to 1,556 available ancient whole genomes from Europe in the historical period. We are able to model individual-level ancestry using preceding genomes to provide high resolution. During the first half of the first millennium CE, we observe at least two different streams of Scandinavian-related ancestry expanding across western, central and eastern Europe. By contrast, during the second half of the first millennium CE, ancestry patterns suggest the regional disappearance or substantial admixture of these ancestries. In Scandinavia, we document a major ancestry influx by approximately 800 CE, when a large proportion of Viking Age individuals carried ancestry from groups related to central Europe not seen in individuals from the early Iron Age. Our findings suggest that time-stratified ancestry analysis can provide a higher-resolution lens for genetic history.
PLoS Genetics, Journal Year: 2025, Volume and Issue: 21(1), P. e1011537 - e1011537
Published: Jan. 8, 2025
Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian inference algorithm and use it to show that even poorly-inferred short IBD segments can improve estimation. Our mutation-rate estimator achieves precision similar to a previously-published method despite a 4 000-fold reduction in data used for inference, and we identify significant differences between human populations. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Peer Community Journal, Journal Year: 2024, Volume and Issue: 4
Published: March 18, 2024
The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring negligible compared to the total population size. Yet many species of plants, invertebrates, prokaryotes or fish exhibit neutrally skewed offspring distribution or strong selection events yielding few individuals to produce a number of offspring of up to the same magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. The current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first give proof of the accuracy of our methods to estimate the multiple merger parameter and past demographic history using simulated data under the β-coalescent model. Secondly, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distribution from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference but also the urgent need for more accurate ARG inference approaches.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 14, 2025
Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating the ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce the ancestry-specific expected GRM (as-eGRM), a novel framework for ancestry-specific relatedness between individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation of the ancestry-specific GRM across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets based on three-way admixed Latino cohorts showed that analysis based on as-eGRM robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits in these understudied populations.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: March 5, 2025
Abstract
Genetic relatedness is a central concept in genetics, underpinning studies of population and quantitative genetics in human, animal, and plant settings. It is typically stored as a genetic relatedness matrix (GRM), whose elements are pairwise relatedness values between individuals. This relatedness has been defined in various contexts based on pedigree, genotype, phylogeny, coalescent times, and, recently, ancestral recombination graph (ARG). ARG-based GRMs have been found to better capture the population structure and improve association studies relative to the genotype GRM. However, calculating ARG-based GRMs and further operations with them is fundamentally challenging due to inherent quadratic time and space complexity. Here, we first discuss the different definitions of relatedness in a unifying context, making use of the additive model of a quantitative trait to provide a definition of “branch relatedness” and the corresponding “branch GRM”. We explore the relationship between branch relatedness and pedigree relatedness through a case study of French-Canadian individuals that have a known pedigree. Through the tree sequence encoding of an ARG, we then derive an efficient algorithm for computing products between the branch GRM and a general vector, without explicitly forming the branch GRM. This algorithm leverages the sparse encoding of genomes with the tree sequence and hence enables large-scale computations with the branch GRM. We demonstrate the power of this algorithm by developing a randomized principal components algorithm for tree sequences that easily scales to millions of genomes. All algorithms are implemented in the open source tskit Python package. Taken together, this work consolidates the different notions of relatedness as branch relatedness and, by leveraging the tree sequence encoding of an ARG, it provides efficient algorithms that enable computations with the branch GRM that scale to mega-scale genomic datasets.
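The benefit of multiplying a GRM by a vector without forming the matrix can be shown with a small Python sketch. It uses a plain genotype-based GRM, G = Z Zᵀ / m, as a stand-in; it is not the branch GRM and not the tree-sequence algorithm implemented in tskit. The function names, matrix sizes and random genotypes below are assumptions made for illustration only.

```python
import random


def grm_matvec(Z, v):
    """Compute (Z Z^T / m) v in two passes, never storing the n x n matrix."""
    n, m = len(Z), len(Z[0])
    # Pass 1: w = Z^T v, one value per site (length m).
    w = [sum(Z[i][j] * v[i] for i in range(n)) for j in range(m)]
    # Pass 2: Z w / m, one value per individual (length n).
    return [sum(Z[i][j] * w[j] for j in range(m)) / m for i in range(n)]


def grm_explicit(Z):
    """Reference n x n GRM (quadratic memory), used here only for checking."""
    n, m = len(Z), len(Z[0])
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


random.seed(0)
n, m = 6, 40  # hypothetical numbers of individuals and sites
genotypes = [[random.randint(0, 2) for _ in range(m)] for _ in range(n)]

# Centre each site so that columns of Z have mean zero.
means = [sum(genotypes[i][j] for i in range(n)) / n for j in range(m)]
Z = [[genotypes[i][j] - means[j] for j in range(m)] for i in range(n)]

v = [random.gauss(0.0, 1.0) for _ in range(n)]
implicit = grm_matvec(Z, v)
G = grm_explicit(Z)
explicit = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
```

The implicit product needs memory proportional to n + m beyond the genotypes, whereas the explicit route stores n² entries. Matrix-vector products of this kind are also the basic operation that randomized principal component methods rely on.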
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Sept. 22, 2024
The demographic history of a population, and the distribution of fitness effects (DFE) of newly arising mutations in functional genomic regions, are fundamental factors dictating both genetic variation and evolutionary trajectories. Although DFE inference has been performed extensively in humans, these approaches have generally either been limited to simple demographic models involving a single population, or, where a complex population history has been inferred, without accounting for the potentially confounding effects of selection at linked sites. Taking advantage of the coding-sparse nature of the genome, we propose a 2-step approach in which coalescent simulations are first used to infer a complex multi-population demographic model, utilizing large non-functional regions that are likely free from the effects of background selection. We then use forward-in-time simulations to perform DFE inference in functional regions conditional on the complex demography inferred and utilizing expected background selection effects in the estimation procedure. Throughout, recombination and mutation rate maps were used to account for the underlying empirical rate heterogeneity across the human genome. Importantly, within this framework it is possible to utilize and fit multiple aspects of the data, and this inference scheme represents a generalized approach for such large-scale inference in species with coding-sparse genomes.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Nov. 4, 2023
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Feb. 21, 2024
ABSTRACT
As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.