Harnessing deep learning for population genetic inference
Nature Reviews Genetics
Journal Year: 2023
Volume and Issue: 25(1), P. 61-78
Published: Sept. 4, 2023
Language: English
Applications of machine learning in phylogenetics
Molecular Phylogenetics and Evolution
Journal Year: 2024
Volume and Issue: 196, P. 108066
Published: March 31, 2024
Language: English
Interpreting generative adversarial networks to infer natural selection from genetic data
Rebecca Riley,
Iain Mathieson,
Sara Mathieson
et al.
Genetics
Journal Year: 2024
Volume and Issue: 226(4)
Published: Feb. 22, 2024
Abstract
Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high-dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model) and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being “real” do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.
Language: English
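The interpretation step relates hidden units of the discriminator to known summary statistics. Below is a minimal sketch in Python, assuming the unit activations and the per-region statistics have already been computed. It replaces the clustering described in the abstract with a simpler rule (each unit is assigned to the statistic with the largest absolute Pearson correlation), and the helper names and statistic labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_units_by_statistic(
    activations: List[List[float]],  # activations[unit][region], assumed precomputed
    stats: Dict[str, List[float]],   # stats[name][region], e.g. diversity per region
) -> Dict[str, List[int]]:
    """Hypothetical helper: assign each hidden unit to the statistic with the largest |r|.

    This is a simplified stand-in for clustering units by their correlation profiles.
    """
    groups: Dict[str, List[int]] = {name: [] for name in stats}
    for unit, acts in enumerate(activations):
        best = max(stats, key=lambda name: abs(pearson(acts, stats[name])))
        groups[best].append(unit)
    return groups


# Toy example with four genomic regions and two hidden units.
stats = {"pi": [1.0, 2.0, 3.0, 4.0], "tajimas_d": [0.5, -0.2, 0.1, -0.9]}
units = [[2.0, 4.0, 6.0, 8.5], [0.4, -0.1, 0.2, -1.0]]
print(group_units_by_statistic(units, stats))  # {'pi': [0], 'tajimas_d': [1]}
```

Assigning by maximum absolute correlation keeps the example short; a full analysis would cluster the whole vector of correlations per unit rather than only its largest entry.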
Tree sequences as a general-purpose tool for population genetic inference
bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2024
Volume and Issue: unknown
Published: Feb. 21, 2024
ABSTRACT
As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be learned from using a GCN approach and can perform well on common population genetic inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.
Language: English
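The core operation of a graph convolutional network can be illustrated on a single local tree from a tree sequence. This is a minimal sketch of a generic GCN layer (the standard symmetric-normalization propagation rule), not the authors' architecture. It assumes the parent-child edges and per-node features of one tree (here just node times) have already been extracted, for example from a tskit tree sequence, and it treats edges as undirected.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(n_nodes: int, edges: List[Tuple[int, int]]) -> Matrix:
    """D^-1/2 (A + I) D^-1/2, treating (parent, child) edges as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]  # self-loops
    for parent, child in edges:
        a[parent][child] = a[child][parent] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def mean_pool(h: Matrix) -> List[float]:
    """Average node embeddings into one vector per tree (a simple graph-level readout)."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy tree: samples 0 and 1 coalesce in node 2; the single node feature is node time.
a_hat = normalized_adjacency(3, [(2, 0), (2, 1)])
h = [[0.0], [0.0], [1.0]]
w = [[1.0]]  # learned in practice; fixed here for illustration
out = gcn_layer(a_hat, h, w)
print(out, mean_pool(out))
```

Each layer mixes a node's features with those of its parent and children, so stacking layers lets information flow along the genealogy without first converting the tree sequence to an alignment.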
Applications of Machine Learning in Phylogenetics
Published: Oct. 14, 2023
Machine learning has increasingly been applied to a wide range of questions in phylogenetic inference. Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution models, and to perform downstream inferences of introgression and diversification. Here, we review how researchers have used several promising machine learning approaches to make phylogenetic inferences. Despite the promise of these methods, several barriers prevent supervised machine learning from reaching its full potential in phylogenetics. We discuss these barriers and potential paths forward. In the future, we expect that the application of careful network designs and data encodings will allow supervised machine learning to accommodate the complex processes that continue to confound traditional phylogenetic methods.
Language: English
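As an illustration of the kind of data encoding such networks consume, here is a minimal sketch (an assumption for illustration, not a method taken from the review) that summarizes a four-taxon alignment by counting the three site patterns that each support one unrooted quartet topology. Vectors like this, computed from alignments simulated under known topologies, could serve as training features for a supervised classifier; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def quartet_site_patterns(alignment: List[str]) -> Dict[str, int]:
    """Count the three parsimony-informative biallelic site patterns for taxa (A, B, C, D).

    'AB|CD' counts sites like xxyy, 'AC|BD' counts xyxy and 'AD|BC' counts xyyx.
    """
    if len(alignment) != 4 or len({len(s) for s in alignment}) != 1:
        raise ValueError("expected four aligned sequences of equal length")
    counts: Counter = Counter()
    for column in zip(*alignment):
        if "-" in column or len(set(column)) != 2:
            continue  # skip gapped, invariant and multiallelic sites
        a, b, c, d = column
        if a == b and c == d:
            counts["AB|CD"] += 1
        elif a == c and b == d:
            counts["AC|BD"] += 1
        elif a == d and b == c:
            counts["AD|BC"] += 1
        # singleton patterns such as xyyy fall through and are not counted
    return {split: counts[split] for split in ("AB|CD", "AC|BD", "AD|BC")}


alignment = ["AACGT", "AAGTT", "GGCTA", "GGGTA"]
print(quartet_site_patterns(alignment))  # {'AB|CD': 3, 'AC|BD': 1, 'AD|BC': 0}
```

Fixed-length summaries like these are easy to feed to a network but discard information; the review's point about careful data encodings applies to exactly this trade-off.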
Estimation of spatial demographic maps from polymorphism data using a neural network
Molecular Ecology Resources
Journal Year: 2024
Volume and Issue: 24(7)
Published: Aug. 16, 2024
Abstract
A fundamental goal in population genetics is to understand how genetic variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but it was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods of estimating past demography, and we believe they will serve as valuable tools for applications in conservation, ecology and evolutionary biology. An open source software package implementing our method is available at https://github.com/kr-colab/mapNN.
Language: English
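The training inputs pair each individual's genotypes with its sampling location. One way such an input could be assembled is sketched below; the one-row-per-individual layout, the 0/1/2 dosage coding and the min-max scaling of coordinates are assumptions for illustration and not necessarily how mapNN encodes its data. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def encode_samples(
    genotypes: Sequence[Sequence[int]],        # genotypes[individual][snp], allele dosage 0/1/2
    locations: Sequence[Tuple[float, float]],  # (x, y) sampling coordinates per individual
) -> List[List[float]]:
    """Build one feature row per individual: rescaled dosages, then min-max scaled (x, y)."""
    if len(genotypes) != len(locations):
        raise ValueError("need one location per individual")
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v: float, lo: float, hi: float) -> float:
        # Constant coordinates (all samples on one line) map to 0.0 instead of dividing by zero.
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    rows = []
    for g, (x, y) in zip(genotypes, locations):
        row = [d / 2.0 for d in g]  # dosage rescaled to [0, 1]
        row += [scale(x, x_lo, x_hi), scale(y, y_lo, y_hi)]
        rows.append(row)
    return rows


print(encode_samples([[0, 1, 2], [2, 2, 0]], [(10.0, 5.0), (20.0, 5.0)]))
# [[0.0, 0.5, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 1.0, 0.0]]
```

Scaling coordinates to a common range keeps the location features on the same footing as the genotype features; in a simulation-trained setting the matching output would be a gridded map of density and dispersal values.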
Timesweeper: Accurately Identifying Selective Sweeps Using Population Genomic Time Series
bioRxiv (Cold Spring Harbor Laboratory)
Journal Year: 2022
Volume and Issue: unknown
Published: July 7, 2022
ABSTRACT
Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper works by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional Convolutional Neural Network on said simulations, and inferring which polymorphisms in this serialized dataset were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.
Language: English
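Timesweeper's input is a time series of genomic samples and its classifier is a one-dimensional CNN. The two basic ingredients can be sketched as follows, under stated assumptions: haploid 0/1 calls at one SNP are summarized as an allele-frequency trajectory across sampling time points, and a single hand-set kernel stands in for learned convolution weights. This is not Timesweeper's actual input format or network, and the function names are hypothetical.

```python
from typing import List, Sequence


def allele_frequency_trajectory(samples: Sequence[Sequence[int]]) -> List[float]:
    """Derived-allele frequency at one SNP per sampling time point.

    samples[t] holds the 0/1 haploid calls of the individuals sampled at time point t.
    """
    return [sum(calls) / len(calls) for calls in samples]


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D cross-correlation (what CNN convolution layers compute) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


# Four time points; the derived allele rises steadily, as expected during a sweep.
samples = [[0, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 1], [1, 1, 1, 1]]
traj = allele_frequency_trajectory(samples)   # [0.25, 0.5, 0.75, 1.0]
print(conv1d_valid(traj, kernel=[-1.0, 1.0]))  # [0.25, 0.25, 0.25]
```

The difference kernel responds to increases in frequency between consecutive samples; a trained network learns many such kernels and combines them across neighbouring SNPs.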