Genetics, Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 23, 2024
Abstract
Pathogen genomics is a powerful tool for tracking infectious disease transmission. In malaria, identity-by-descent (IBD) is used to assess the genetic relatedness between parasites and has been used to study transmission and importation. In theory, IBD can be used to distinguish genealogical relationships to reconstruct transmission history or identify parasites for quantitative-trait-locus experiments. MalKinID (Malaria Kinship Identifier) is a new classification model designed to identify genealogical relationships among malaria parasites based on genome-wide IBD proportions and IBD segment distributions. MalKinID was calibrated to genomic data from three laboratory-based genetic crosses (yielding 440 parent-child [PC] and 9060 full-sibling [FS] comparisons). MalKinID identified lab-generated F1 progeny with >80% sensitivity and showed that 0.39 (95% CI 0.28, 0.49) of the second-generation progeny of a NF54 and NHP4026 cross were F1s and 0.56 (0.45, 0.67) were backcrosses of an F1 with a parental strain. In simulated outcrossed importations, MalKinID reconstructs genealogy history with high precision and sensitivity, with F1-scores exceeding 0.84. However, when importation involves inbreeding, such as during serial co-transmission, the precision and sensitivity of MalKinID declined, with F1-scores (the harmonic mean of precision and sensitivity) of 0.76 (0.56, 0.92) and 0.23 (0.0, 0.4) for PC and FS and <0.05 for second-degree and third-degree relatives. Disentangling inbred relationships required adapting MalKinID to perform multi-sample comparisons. Genealogical inference is most powered when 1) outcrossing is the norm or 2) multi-sample comparisons based on a predefined pedigree are used. MalKinID lays the foundations for using IBD to track parasite transmission history and for separating progeny for quantitative-trait-locus experiments.
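The abstract above reports F1-scores, defined there as the harmonic mean of precision and sensitivity. As a minimal sketch of that definition only (the function name and the example inputs are hypothetical and are not taken from MalKinID or its results), the computation could look like this:

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        # Convention: when both are zero the score is defined as zero.
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration, not values from the study.
    print(f1_score(0.8, 0.9))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F1-score by being precise but insensitive, or the reverse.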
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 15, 2025
Inference of Ancestral Recombination Graphs (ARGs) is of central interest in the analysis of genomic variation. ARGs can be specified in terms of topologies and coalescence times. The coalescence times are usually estimated using an informative prior derived from coalescent theory, but this may generate biased estimates and can also complicate downstream inferences based on ARGs. Here we introduce POLEGON, a novel approach for estimating branch lengths which uses an uninformative prior. Using extensive simulations, we show that this method provides improved estimates of coalescence times and leads to more accurate inferences of effective population sizes under a wide range of demographic assumptions. It also improves other downstream inferences, including estimates of mutation rates. We apply the method to data from the 1000 Genomes Project to investigate population size histories and differential mutation signatures across populations. We also estimate coalescence times in the HLA region, and show that they exceed 30 million years in multiple segments.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 14, 2025
Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating the ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking the segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce the ancestry-specific expected GRM (as-eGRM), a novel framework for estimating the relatedness within ancestry components between admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation of the ancestry-specific relatedness across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets based on three-way admixed Latino cohorts showed that as-eGRM analysis robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits for these understudied populations.
Abstract
Numerous studies have revealed a signature of strong adaptive evolution in the piwi-interacting RNA (piRNA) machinery of Drosophila melanogaster, but the cause of this pattern is not understood. Several hypotheses have been proposed. One hypothesis is that transposable element (TE) families and the piRNA machinery are co-evolving under an evolutionary arms race, perhaps due to antagonism by TEs against the piRNA machinery. A related, though not co-evolutionary, hypothesis is that recurrent TE invasion drives the piRNA machinery to adapt to novel TE strategies. A third hypothesis is that ongoing fluctuation in TE abundance leads to adaptation in the piRNA machinery, which must constantly adjust between sensitivity for detecting new elements and specificity to avoid the cost of off-target gene silencing. Rapid evolution of the piRNA machinery may also be driven independently of TEs, and instead from other functions such as the role of piRNAs in suppressing sex-chromosome meiotic drive. We sought to evaluate the impact of TE abundance on adaptive evolution of the piRNA machinery in D. melanogaster and 2 species with higher repeat content, Drosophila ananassae and Drosophila willistoni. This comparison was achieved by employing a likelihood-based hypothesis testing framework based on the McDonald–Kreitman test. We show that we can reject a faster rate of adaptive evolution in the piRNA machinery of these 2 species. We propose that the high rate of adaptation in D. melanogaster is due either to a recent influx of TEs that occurred during range expansion or to selection on other functions of the piRNA machinery.
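The abstract above builds on the McDonald–Kreitman test, which contrasts nonsynonymous and synonymous counts between species (divergence) and within species (polymorphism). The sketch below shows only the classic count-based summary statistics, the neutrality index and the derived estimate of the proportion of adaptive substitutions; it is not the likelihood-based framework used in the study. The function names and the counts are hypothetical.

```python
def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """NI = (Pn / Ps) / (Dn / Ds); values below 1 suggest an excess of
    nonsynonymous divergence, as expected under adaptive evolution."""
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("Dn, Ds and Ps must be non-zero")
    return (pn / ps) / (dn / ds)


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """alpha = 1 - NI, an estimate of the proportion of nonsynonymous
    substitutions fixed by positive selection."""
    return 1.0 - neutrality_index(dn, ds, pn, ps)


if __name__ == "__main__":
    # Hypothetical counts for a single gene, for illustration only.
    print(mk_alpha(dn=20, ds=10, pn=5, ps=10))
```

A single-gene estimate of this kind is noisy when counts are small, which is one reason likelihood-based extensions that pool information across genes are often preferred.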
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: March 5, 2025
Abstract
Genetic relatedness is a central concept in genetics, underpinning studies of population and quantitative genetics in human, animal, and plant settings. It is typically stored as a genetic relatedness matrix (GRM), whose elements are pairwise relatedness values between individuals. This relatedness has been defined in various contexts based on pedigree, genotype, phylogeny, coalescent times, and, recently, ancestral recombination graph (ARG). ARG-based GRMs have been found to better capture the population structure and improve association studies relative to the genotype GRM. However, calculating GRMs and further operations with them is fundamentally challenging due to inherent quadratic time and space complexity. Here, we first discuss different definitions of relatedness in a unifying context, making use of the additive model of a quantitative trait to provide a definition of “branch relatedness” and the corresponding “branch GRM”. We explore the relationship between branch relatedness and pedigree relatedness through a case study of French-Canadian individuals that have a known pedigree. Through the tree sequence encoding of an ARG, we then derive an efficient algorithm for computing products between the branch GRM and a general vector, without explicitly forming the branch GRM. This algorithm leverages the sparse encoding of genomes with the tree sequence and hence enables large-scale computations with the branch GRM. We demonstrate the power of this algorithm by developing a randomized principal components algorithm for tree sequences that easily scales to millions of genomes. All algorithms are implemented in the open source tskit Python package. Taken together, this work consolidates the different notions of relatedness as branch relatedness, and by leveraging the tree sequence encoding of an ARG it provides efficient algorithms that enable computations at scale with mega-scale genomic datasets.
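The abstract above describes multiplying a GRM by a vector without ever forming the GRM. The sketch below illustrates that general idea with a simpler, swapped-in case: a genotype-based GRM of the form G = Z Zᵀ / m, where Z is an assumed matrix of centered genotypes (n individuals by m sites). It does not implement the branch GRM or the tree-sequence algorithm from the study, and it does not use the tskit API; all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def grm_times_vector(Z: Matrix, v: Vector) -> Vector:
    """Compute (Z Z^T / m) v as Z (Z^T v) / m.

    Only an m-vector is held as intermediate state, so the n-by-n GRM
    is never materialized.
    """
    n, m = len(Z), len(Z[0])
    # Step 1: w = Z^T v, one entry per site.
    w = [sum(Z[i][j] * v[i] for i in range(n)) for j in range(m)]
    # Step 2: y = Z w / m, one entry per individual.
    return [sum(Z[i][j] * w[j] for j in range(m)) / m for i in range(n)]


def explicit_grm(Z: Matrix) -> Matrix:
    """Form the full n-by-n GRM; quadratic in n, used here only as a check."""
    n, m = len(Z), len(Z[0])
    return [[sum(Z[a][j] * Z[b][j] for j in range(m)) / m for b in range(n)]
            for a in range(n)]


if __name__ == "__main__":
    # Tiny hypothetical centered genotype matrix: 3 individuals, 3 sites.
    Z = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0], [-1.0, -1.0, 0.0]]
    print(grm_times_vector(Z, [1.0, 2.0, 3.0]))
```

The implicit product costs time proportional to n times m, while the explicit route needs n squared entries of storage; iterative methods such as randomized principal components only require the product, which is why a matrix-free formulation can scale.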
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: March 13, 2025
Abstract
Flax (Linum usitatissimum L.) is one of the founder crops domesticated for oil and fiber uses in the Near-Eastern Fertile Crescent, but its domestication history remains largely elusive. Genetic inferences so far have expanded our knowledge of several aspects of flax domestication, such as the wild progenitor, the first use of flax, and domestication events. However, little is known about the domestication processes involving multiple domestication events. This study applied genotyping-by-sequencing to infer flax domestication processes. Ninety-three samples representing four domestication groups (oilseed, fiber, winter and capsular dehiscence) and the wild progenitor (or pale flax; L. bienne Mill.) were sequenced. SNP calling identified 16,998 SNPs that were widely distributed across 15 chromosomes. Diversity analysis found that pale flax had the largest nucleotide diversity, followed by indehiscent, winter, oilseed and fiber cultivated flax. Pale flax seemed to be under population contraction, while the other groups were under expansion after a bottleneck. Demographic analysis showed that the five groups carried clear genetic signals of mixture events associated with multiple domestication events. Phylogenetic analysis revealed that oilseed, fiber and winter flax formed two separate phylogenetic subclades. One subclade was abundant [...] along with some [...] mainly originating in the Near East and nearby regions. The other [...] from Europe and other parts of the world. Dating the divergences with an assumption of 10,000 years before present (BP) for flax domestication indicated that the spread [...] occurred 5800 BP and that winter hardiness occurred 5100 BP. These findings provide new significant insights into flax domestication.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2021, Volume and Issue: unknown
Published: July 14, 2021
Abstract
Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome, leading to geographic patterns of isolation by distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km² per generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our approach highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 24, 2025
Foundation models for single-cell transcriptomics have the potential to augment (or replace) purpose-built tools for a variety of common analyses, especially when data are sparse. Recent work with large language models has shown that training data composition greatly shapes performance; however, to date, single-cell foundation models have ignored this aspect, opting instead to train on the largest possible corpus. We systematically investigate the consequences of training dataset composition on the behavior of deep learning models of single-cell transcriptomics, focusing on human hematopoiesis as a tractable model system and including cells from adult and developing tissues, disease states, and perturbation atlases. We find that (1) these models generalize poorly to unseen cell types, (2) adding malignant cells to a healthy cell training corpus does not necessarily improve modeling of unseen malignant cells, and (3) including an embryonic stem cell differentiation atlas during training improves performance on out-of-distribution tasks. Our results emphasize the importance of diverse training data and suggest strategies to optimize future single-cell foundation models.
Molecular Ecology Resources, Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 26, 2025
The StairwayPlot approach provides an elegant, flexible and powerful method to estimate complex demographic histories of single populations from site frequency spectrum data. It uses expected coalescent times to compute the expected site frequency spectrum within a multinomial likelihood function. Population sizes are allowed to vary freely between coalescent events but are constant within each interval. Here, we implement the StairwayPlot approach in the Bayesian software package RevBayes. We use approaches developed for Bayesian Skyline Plots, which include independent and identically distributed (i.i.d.) population sizes, Gaussian Markov random fields and Horseshoe Markov random fields as prior distributions on population sizes. Furthermore, we implement a recently developed approach for computing the leave-one-out cross-validation probability for efficient model selection. We compare the inference of our implementation to the original Maximum Likelihood implementation, StairwayPlot2. Our results show that the RevBayes implementation performs comparably to StairwayPlot2 in terms of parameter accuracy, which is expected given that both use the same underlying likelihood function. From the set of prior models, the Gaussian Markov random field performed best for smoothly varying demographic histories, while the Horseshoe Markov random field performed best for abruptly changing demographic histories. We conclude the study by exploring several choices often faced in empirical studies, including total sequence length, assumed mutation rate, as well as biases through mis-calling ancestral alleles. We show using an empirical example that as few as 10 diploid individuals are sufficient to infer demographic histories, but at least 500 k single nucleotide polymorphisms (SNPs) are required.
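The abstract above describes a multinomial likelihood in which observed site frequency spectrum (SFS) counts are compared with an expected SFS. As a minimal sketch of that likelihood, the code below uses the simplest special case: a constant-size neutral population, where the expected number of sites with i derived copies is proportional to 1/i. It does not implement the piecewise-constant size model of StairwayPlot or the RevBayes implementation; function names and the example counts are hypothetical.

```python
import math
from typing import List


def expected_sfs_constant(n: int) -> List[float]:
    """Expected unfolded SFS proportions for n sampled chromosomes under a
    constant-size neutral model: class i (i = 1..n-1) is proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def multinomial_loglik(observed: List[int], probs: List[float]) -> float:
    """Multinomial log-likelihood of observed SFS counts, omitting the
    multinomial coefficient (constant with respect to the model)."""
    return sum(x * math.log(p) for x, p in zip(observed, probs) if x > 0)


if __name__ == "__main__":
    # Hypothetical observed counts for n = 5 chromosomes (4 frequency classes).
    observed = [48, 25, 15, 12]
    print(multinomial_loglik(observed, expected_sfs_constant(5)))
```

In a full demographic method the expected proportions would be a function of the population size in each coalescent interval, and those sizes would be optimized or sampled to maximize this same likelihood.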
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 27, 2025
Ancestral recombination graphs (ARGs) are the focus of much ongoing research interest. Recent progress in inference has made ARG-based approaches feasible across a range of applications, and many new methods using inferred ARGs as input have appeared. This progress on the long-standing problem of ARG inference has proceeded in two distinct directions. First, Bayesian inference under the Sequentially Markov Coalescent (SMC) is now practical for tens-to-hundreds of samples. Second, approximate models and heuristics can scale to sample sizes three orders of magnitude larger. Although these heuristic methods are reasonably accurate under many metrics, one significant drawback is that the ARGs they estimate do not have the topological properties required to compute a likelihood under models such as the SMC under present-day formulations. In particular, heuristic inference methods typically do not estimate precise details about recombination events, which are currently required to compute a likelihood. In this paper we present a backwards-time formulation of the SMC and derive a straightforward definition of the likelihood of a general class of ARGs under this model. We show that this formulation does not require precise details of recombination events to be estimated, and is robust to the presence of polytomies. We discuss the possibilities for inference that this opens.