BMC Bioinformatics, Year: 2023, Issue: 24(1)
Published: Oct. 11, 2023
Spatial genetic variation is shaped in part by an organism's dispersal ability. We present a deep learning tool, disperseNN2, for estimating the mean per-generation dispersal distance from georeferenced polymorphism data. Our neural network performs feature extraction on pairs of genotypes, and uses the geographic information that comes with each sample. These attributes led disperseNN2 to outperform a state-of-the-art deep learning method that does not use explicit spatial information: the mean relative absolute error was reduced by 33% and 48% using sample sizes of 10 and 100 individuals, respectively. disperseNN2 is particularly useful for non-model organisms or systems with sparse genomic resources, as it uses unphased, single nucleotide polymorphisms as its input. The software is open source and available from https://github.com/kr-colab/disperseNN2, with documentation located at https://dispersenn2.readthedocs.io/en/latest/.
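The abstract above reports accuracy as a relative absolute error. As a rough illustration only, the sketch below shows one common way such a metric is computed, assuming it is defined as the average of |prediction - truth| / truth over test datasets. The function name and the example values are hypothetical and are not taken from the disperseNN2 code base.

```python
def mean_relative_absolute_error(true_values, predictions):
    """Average of |prediction - truth| / truth over paired values.

    Assumption: every true value is strictly positive, as a dispersal
    distance would be. This is an illustrative definition, not code
    from disperseNN2.
    """
    if len(true_values) != len(predictions):
        raise ValueError("inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for truth, predicted in zip(true_values, predictions):
        if truth <= 0:
            raise ValueError("true values must be positive")
        total += abs(predicted - truth) / truth
    return total / len(true_values)


# Hypothetical true and predicted dispersal distances (arbitrary units).
example_error = mean_relative_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Dividing by the true value makes errors comparable across simulations with very different dispersal distances, which is presumably why a relative rather than absolute error is reported.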
PLoS Genetics, Year: 2024, Issue: 20(2), pp. e1010657 - e1010657
Published: Feb. 20, 2024
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual’s alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled “ghost” population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method’s success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Abstract
Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being “real” do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.
bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown
Published: April 14, 2024
Abstract
Spatial patterns of genetic relatedness among samples reflect the past movements of their ancestors. Our ability to untangle this history has the potential to improve dramatically given that we can now infer the ultimate description of genetic relatedness, an ancestral recombination graph (ARG). By extending spatial theory previously applied to trees, we generalize the common model of Brownian motion to full ARGs, thereby accounting for correlations in trees along a chromosome while efficiently computing likelihood-based estimates of dispersal rate and genetic ancestor locations, with associated uncertainties. We evaluate this model’s ability to reconstruct spatial histories using individual-based simulations and unfortunately find a clear bias in the estimates of dispersal rate and ancestor locations. We investigate the causes of this bias, pinpointing a discrepancy between the model and the true spatial process at recombination events. This highlights a key hurdle in extending the ubiquitous and analytically-tractable model of Brownian motion from trees to ARGs, which otherwise has the potential to provide an efficient method for spatial inference, with uncertainties, using all the information available in the full ARG.
Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome, leading to geographic patterns of isolation-by-distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here, we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km²/generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.
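To make the units of a dispersal rate such as km²/generation concrete, here is a deliberately simplified sketch. It assumes one-dimensional Brownian motion in which a lineage's displacement after t generations is normally distributed with mean 0 and variance sigma² t, and it treats lineages as independent with known times. Under those assumptions the maximum-likelihood estimate of sigma² is the mean of d²/t. This is not the tree-based likelihood described in the abstract, which accounts for ancestry shared between lineages; the function name and input values are hypothetical.

```python
def estimate_dispersal_rate(displacements_km, times_generations):
    """Maximum-likelihood sigma^2 (km^2 per generation) for independent
    one-dimensional Brownian lineages: the mean of d^2 / t.

    Assumptions (illustrative only): lineages are independent, elapsed
    times are known exactly, and movement is along a single axis.
    """
    if len(displacements_km) != len(times_generations):
        raise ValueError("inputs must have the same length")
    if not displacements_km:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for displacement, elapsed in zip(displacements_km, times_generations):
        if elapsed <= 0:
            raise ValueError("times must be positive")
        total += displacement * displacement / elapsed
    return total / len(displacements_km)


# Hypothetical displacements (km) observed after 4 and 9 generations.
sigma_squared = estimate_dispersal_rate([2.0, -3.0], [4.0, 9.0])
```

Because variance grows linearly with time under Brownian motion, squared distance divided by elapsed generations is the natural per-generation quantity, which is why the rate carries squared-distance units.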
Abstract
Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep-learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated, instead learning their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, called Generative Adversarial Networks (GANs), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e. data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories: the first GAN capable of performing this task. After testing multiple different neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that is capable of generating artificial examples of population genetic alignments that successfully mimic key aspects of the training data, including the site-frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning. Overall, our findings highlight the ability of GANs to learn and mimic population genetic data and suggest future areas where this work can be applied in population genetics research that we discuss herein.
bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown
Published: Feb. 21, 2024
ABSTRACT
As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward, thus it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q. In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
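The rescaling procedure described in this abstract (divide the population size by a factor Q, multiply the mutation rate, selection coefficient, and recombination rate by the same Q) can be sketched as follows. This is a minimal illustration under stated assumptions: the `SimParams` container, its field names, and the example values are hypothetical, and the sketch covers only the parameters the abstract names. Real simulation setups typically involve further choices that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for the parameters named in the abstract."""
    population_size: int
    mutation_rate: float          # per site per generation
    recombination_rate: float     # per site per generation
    selection_coefficient: float


def rescale(params, q):
    """Scale population size down by q and the rates up by q.

    Products such as N * mu, N * r and N * s are left (approximately)
    unchanged, which is the usual motivation for this rescaling.
    """
    if q < 1:
        raise ValueError("scaling factor q must be at least 1")
    return SimParams(
        population_size=max(1, round(params.population_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coefficient=params.selection_coefficient * q,
    )


# Hypothetical unscaled model and a tenfold rescaling.
unscaled = SimParams(10_000, 1e-8, 1e-8, 0.01)
scaled = rescale(unscaled, 10)
```

Note that keeping these products constant does not guarantee identical outcomes: the abstract reports that fixation times, allele frequencies, and linkage disequilibrium can all be biased, so a sketch like this is a starting point for testing several values of Q rather than a safe default.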
bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, Issue: unknown
Published: Feb. 9, 2025
Abstract
As organisms adapt to environmental changes, natural selection modifies the frequency of non-neutral alleles. For beneficial mutations, the outcome of this process may be a selective sweep, in which an allele rapidly increases in frequency and perhaps reaches fixation within a population. Selective sweeps have well-studied effects on patterns of local genetic variation in panmictic populations, but much less is known about the dynamics of sweeps in continuous space. In particular, because limited movement across a landscape leads to unique patterns of population structure, spatial dynamics may influence the trajectory of selected mutations. Here, we use forward-in-time, individual-based simulations in continuous space to study the impact of space on beneficial mutations as they sweep through a population. We show that selection changes the joint distribution of allele frequency and geographic range occupied by a focal allele, and demonstrate that this signal can be used to identify selective sweeps. We then leverage this signal to identify in-progress selective sweeps within the malaria vector Anopheles gambiae, a species under strong selective pressure from vector control measures. By considering space, we identify multiple previously undescribed variants with potential phenotypic consequences, including mutations impacting IR-associated genes and altering protein structure and properties. Our results demonstrate a novel signal for detecting selection in spatial population genetic data and have implications for genomic surveillance and understanding geographic patterns of genetic variation.
Ecology and Evolution, Year: 2025, Issue: 15(4)
Published: April 1, 2025
ABSTRACT
Individual-based simulation has become an increasingly crucial tool for many fields of population biology. However, continuous geography is important to many applications, and implementing realistic and stable simulations in continuous space presents a variety of difficulties, from modeling choices to computational efficiency. This paper aims to be a practical guide to spatial simulation, helping researchers to implement realistic and efficient spatial, individual-based simulations and avoid common pitfalls. To do this, we delve into mechanisms of mating, reproduction, density-dependent feedback, and dispersal, all of which may vary across the landscape, discuss how these affect population dynamics, and describe how to parameterize simulations in convenient ways (for instance, to achieve a desired population density). We also demonstrate how to implement these models using the current version of the individual-based simulator, SLiM. We additionally discuss natural selection, in particular how genetic variation can affect demographic processes. Finally, we provide four short vignettes: simulations of pikas that shift their range up a mountain as temperatures rise; mosquitoes that live in rivers as juveniles while experiencing seasonally changing habitat; cane toads that expand across Australia, reaching 120 million individuals; and monarch butterflies whose populations are regulated by an explicitly modeled resource (milkweed).