bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2022,
Volume and Issue:
unknown
Published: Oct. 27, 2022
Abstract
Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these unrooted trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model. Our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.
Algorithms for Molecular Biology,
Journal Year:
2023,
Volume and Issue:
18(1)
Published: Dec. 1, 2023
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
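The quartet rule described in this abstract (a mutation present in two cells and absent from the other two implies a quartet on those four cells) can be illustrated with a small sketch. This is not the authors' implementation: the input layout (a dictionary mapping each cell to a list of 0/1 states, with `None` for a missing value), the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def quartet_votes(profiles, cells):
    """Count the quartets implied by mutations on four cells.

    profiles: dict mapping cell name -> list of states (1 present, 0 absent,
              None missing). This layout is a hypothetical choice.
    cells:    four cell names.
    """
    votes = Counter()
    n_mutations = len(profiles[cells[0]])
    for j in range(n_mutations):
        states = {cell: profiles[cell][j] for cell in cells}
        if None in states.values():
            continue  # a missing value gives no quartet for this mutation
        present = frozenset(c for c, s in states.items() if s == 1)
        if len(present) != 2:
            continue  # only 2-present / 2-absent patterns imply a quartet
        absent = frozenset(cells) - present
        votes[frozenset([present, absent])] += 1
    return votes

def format_quartet(quartet):
    """Render a quartet such as {{w, x}, {y, z}} as 'wx|yz'."""
    return "|".join(sorted("".join(sorted(side)) for side in quartet))

# Toy data (made up): five mutations observed on four cells.
profiles = {
    "w": [1, 1, 0, 1, None],
    "x": [1, 1, 0, 0, 1],
    "y": [0, 0, 1, 1, 1],
    "z": [0, 0, 1, 0, 0],
}
votes = quartet_votes(profiles, ["w", "x", "y", "z"])
best, count = votes.most_common(1)[0]
print(format_quartet(best), count)
```

In the toy data, three mutations support wx|yz and one supports wy|xz, so the most frequent quartet is wx|yz. Ties are broken arbitrarily by `most_common`; a real analysis would need to handle them explicitly.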
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2023,
Volume and Issue:
unknown
Published: Dec. 7, 2023
Abstract
Gene trees often differ from the species trees that contain them due to various factors, including incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT). Several highly accurate species tree estimation methods have been introduced to explicitly address ILS, including ASTRAL, a widely used statistically consistent method, and wQFM, a quartet amalgamation approach that is experimentally shown to be more accurate than ASTRAL. Two recent advancements, ASTRAL-Pro and DISCO, have emerged in the field of phylogenomics to consider GDL events. ASTRAL-Pro introduces a refined measure of quartet similarity, accounting for both orthology and paralogy. DISCO, on the other hand, offers a general strategy to decompose multicopy gene family trees into a collection of single-copy trees, allowing the utilization of methods previously designed for species tree inference in the context of single-copy gene trees. In this study, we first introduce some variants of DISCO to examine its underlying hypotheses and present analytical results on the statistical guarantees of DISCO. In particular, DISCO-R, a variant of DISCO with an improved pruning strategy, provides robust results. We then propose wQFM-DISCO (wQFM paired with DISCO) as an adaptation of wQFM to handle multicopy gene trees resulting from GDL events. Extensive evaluation studies on simulated and real data sets demonstrate that wQFM-DISCO is significantly more accurate than competing methods.

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2022,
Volume and Issue:
unknown
Published: May 26, 2022
Abstract
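Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from species trees due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing “gene trees”) and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time), and have had high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, weighted ASTRID provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode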
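The average internode distance matrix defined in this abstract can be sketched in a few lines. This covers only the unweighted definition (the number of nodes between two leaves, averaged over gene trees); it is not the ASTRID or weighted ASTRID code, and the adjacency-dictionary tree representation, the function names, and the toy gene trees are assumptions made for illustration. A tree would then be built from the matrix with a distance method such as neighbor joining, which is not shown.

```python
from collections import deque
from itertools import combinations

def tree_from_edges(edges):
    """Build an undirected adjacency dict from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def internode_distance(adj, x, y):
    """Number of nodes strictly between leaves x and y on their path."""
    edges_from_x = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return edges_from_x[u] - 1  # nodes between = path edges - 1
        for v in adj[u]:
            if v not in edges_from_x:
                edges_from_x[v] = edges_from_x[u] + 1
                queue.append(v)
    raise ValueError(f"{x!r} and {y!r} are not connected")

def average_internode_matrix(gene_trees, species):
    """Average each pairwise distance over the gene trees containing both species."""
    matrix = {}
    for x, y in combinations(sorted(species), 2):
        values = [internode_distance(t, x, y)
                  for t in gene_trees if x in t and y in t]
        if values:  # pairs never seen together are left out
            matrix[(x, y)] = sum(values) / len(values)
    return matrix

# Two toy unrooted gene trees on species a, b, c, d (internal nodes u and v).
g1 = tree_from_edges([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")])
g2 = tree_from_edges([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")])
D = average_internode_matrix([g1, g2], "abcd")
print(D)
```

Pairs that never co-occur in any gene tree are simply omitted here; how missing entries are treated in practice is a design decision this sketch does not address.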