bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown, Published: Oct. 27, 2022
Abstract
Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these unrooted trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model. Our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: July 31, 2024
Abstract
Summary methods are becoming increasingly popular for species tree estimation from multi-locus data in the presence of gene tree discordance. ASTRAL, a leading method in this class, solves the Maximum Quartet Support Species Tree problem within a constrained solution space constructed from the input gene trees. In contrast, alternative heuristics such as wQFM and wQMC operate by taking a set of weighted quartets as input and employ a divide-and-conquer strategy to construct the species tree. Recent studies showed wQFM to be more accurate than ASTRAL and wQMC, though its scalability is hindered by the computational demands of explicitly generating and weighting Θ(n^4) quartets. Here, we introduce wQFM-TREE, a novel summary method that enhances wQFM by circumventing the need for explicit quartet generation and weighting, thereby enabling its application to large datasets. Unlike wQFM, wQFM-TREE can also handle polytomies. Extensive simulations under diverse and challenging model conditions, with hundreds or thousands of taxa and genes, consistently demonstrate that wQFM-TREE matches or improves upon the accuracy of ASTRAL. Specifically, wQFM-TREE outperformed ASTRAL in 25 of 27 model conditions analyzed in this study involving 200-1000 taxa, with statistically significant differences in 20 of these conditions. Moreover, we applied wQFM-TREE to re-analyze the green plant dataset from the One Thousand Plant Transcriptomes Initiative. Its remarkable accuracy and scalability position wQFM-TREE as a highly competitive alternative to leading methods in the field. Additionally, the algorithmic and combinatorial innovations introduced in this study will benefit various quartet-based computations, advancing the state-of-the-art in phylogenetic estimations.
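To make the Θ(n^4) scaling mentioned in this abstract concrete, the following is a minimal Python sketch, not code from wQFM or wQFM-TREE. The function names `quartet_count` and `enumerate_quartet_taxa` are hypothetical and chosen only for illustration. It counts the four-taxon subsets that an explicit quartet-generation step would have to visit.

```python
from itertools import combinations
from math import comb


def quartet_count(n_taxa: int) -> int:
    """Number of four-taxon subsets of n_taxa taxa, i.e. C(n, 4)."""
    return comb(n_taxa, 4)


def enumerate_quartet_taxa(taxa):
    """Yield every four-taxon subset explicitly (practical only for small inputs)."""
    return combinations(sorted(taxa), 4)


if __name__ == "__main__":
    # Illustrative taxon counts; 200 and 1000 are the range quoted in the abstract.
    for n in (10, 200, 1000):
        print(n, quartet_count(n))
```

At 1000 taxa there are roughly 4.1 × 10^10 four-taxon subsets, each with three possible unrooted topologies, which gives a sense of why avoiding explicit enumeration matters at that scale.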
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown, Published: Nov. 25, 2024
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. This process becomes particularly challenging due to gene tree heterogeneity (discordance), often resulting from Incomplete Lineage Sorting (ILS). Triplet- and quartet-based approaches for species tree estimation have gained substantial attention as they are provably statistically consistent in the presence of ILS. However, unlike quartet-based methods, the limitation of rooted triplet-based methods in handling unrooted trees has restricted their adoption in the systematics community. Furthermore, since the induced triplet distribution in a gene tree depends on the placement of the root, the accuracy of triplet-based methods depends on the accuracy of rooting. Despite progress in developing methods for rooting unrooted trees, the choice of rooting technique and the downstream effects on species tree inference under realistic model conditions remain greatly understudied. This study involves rigorous empirical testing with different rooting methods to establish a nuanced understanding of the impact of rooting on species tree accuracy. Moreover, we aim to investigate the conditions under which triplet-based methods provide more accurate estimations than widely-used quartet-based methods such as ASTRAL.
Lecture Notes in Computer Science, Journal Year: 2023, Volume and Issue: unknown, P. 41-57, Published: Jan. 1, 2023
Abstract
Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these unrooted trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model, and our simulation study shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.
Research Square (Research Square), Journal Year: 2023, Volume and Issue: unknown, Published: Nov. 16, 2023
Abstract
A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). It was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a "pseudo" terrace, which does not consider the topological proximity of the trees in terms of the induced subtrees resulting from certain patterns of missing data. In this study, we mathematically characterize terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that terraces are agnostic to gene tree heterogeneity. Therefore, we introduce a special type of gene tree topology-aware terrace, which we call a "peak terrace". Moreover, we empirically investigated various challenges and opportunities related to terraces through extensive empirical studies using simulated and real biological data. We demonstrate the prevalence of terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces and the trees within them can substantially enhance the accuracy of summary methods and provide reasonably accurate branch support.
BMC Ecology and Evolution, Journal Year: 2024, Volume and Issue: 24(1), Published: Nov. 4, 2024
A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). It was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a "pseudo" terrace, which does not consider the topological proximity of the trees in terms of the induced subtrees resulting from certain patterns of missing data. In this study, we mathematically characterize terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that terraces are agnostic to gene tree heterogeneity. Therefore, we introduce a special type of gene tree topology-aware terrace, which we call a "peak terrace". Moreover, we empirically investigated various challenges and opportunities related to terraces through extensive empirical studies using simulated and real biological data. We demonstrate the prevalence of terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces could potentially lead to advances that enhance the accuracy of summary methods and provide reasonably accurate branch support.
Journal of Computational Biology, Journal Year: 2022, Volume and Issue: 29(7), P. 664-678, Published: Feb. 23, 2022
Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although there is generally some information available about the species tree that could be used to speed up the estimation, only one species tree estimation method that addresses gene tree discordance (ASTRAL-J, a recent development in the ASTRAL family of methods) is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree, given partial knowledge of the species tree in the form of a nonbinary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J and prove that all three methods are statistically consistent under the multispecies coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics Project data set with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for FASTRAL-J and NJst-J, respectively).
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown, Published: June 29, 2022
Abstract
Summary methods are one of the dominant approaches for estimating species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant due to gene tree estimation error as well as biological processes, like incomplete lineage sorting. Here, we introduce a new summary method, TREE-QMC, that offers improved accuracy and scalability under these challenging scenarios. TREE-QMC builds upon the algorithmic framework of QMC (Snir and Rao 2010) and its weighted version wQMC (Avni et al. 2014). Their approach takes weighted quartets (four-leaf trees) as input and builds a species tree in a divide-and-conquer fashion, at each step constructing a graph and seeking its max cut. We improve upon this methodology in two ways. First, we address scalability by providing an algorithm to construct the graph directly from the input gene trees. By skipping the quartet weighting step, TREE-QMC has a time complexity of O(n^3 k) with some assumptions on subproblem sizes, where n is the number of species and k is the number of gene trees. Second, we address accuracy by normalizing the quartet weights to account for “artificial taxa,” which are introduced during the divide phase so that solutions on subproblems can be combined during the conquer phase. Together, these contributions enable TREE-QMC to outperform the leading methods (ASTRAL-III, FASTRAL, wQFM) in an extensive simulation study. We also present the application of TREE-QMC to an avian phylogenomics data set.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown, Published: Nov. 24, 2022
Abstract
A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). The concept of terraces was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a “pseudo” terrace. Pseudo terraces do not consider the topological proximity of the trees in terms of the induced subtrees resulting from certain patterns of missing data. In this study, we mathematically characterize terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that terraces are agnostic to gene tree topologies and the discordance therein. Therefore, we introduce a special type of gene tree topology-aware terrace, which we call a “peak terrace”, and investigate conditions on the patterns of missing data that give rise to peak terraces. In addition to the theoretical and analytical results, we empirically investigated different challenges as well as various opportunities pertaining to the multiplicity of equally good species trees in terraced landscapes. Based on an extensive experimental study involving both simulated and real biological datasets, we present the prevalence of terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces and the trees within them can substantially enhance the accuracy of summary methods. Furthermore, we demonstrate that reasonably accurate branch support can be computed by leveraging trees sourced from these terraces.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown, Published: April 6, 2023
Abstract
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. However, traditional methods assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we find that methods based on quartets (four-leaf, unrooted trees) withstand these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove that an optimal solution is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
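The statement that quartets are implied by mutations present in two cells and absent from two cells can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `quartets_from_mutations`, the dictionary-based matrix layout, and the use of `None` for missing values are all hypothetical choices made here.

```python
from collections import Counter
from itertools import combinations


def quartets_from_mutations(matrix):
    """Count quartets implied by a cell-by-mutation matrix.

    matrix maps a cell name to a list with one entry per mutation:
    1 (present), 0 (absent), or None (missing value, ignored).
    A mutation present in cells a, b and absent in cells c, d implies
    the quartet ab|cd, stored as a sorted pair of sorted cell pairs.
    """
    weights = Counter()
    if not matrix:
        return weights
    cells = sorted(matrix)
    n_mutations = len(matrix[cells[0]])
    for j in range(n_mutations):
        present = [c for c in cells if matrix[c][j] == 1]
        absent = [c for c in cells if matrix[c][j] == 0]
        for pair_present in combinations(present, 2):
            for pair_absent in combinations(absent, 2):
                weights[tuple(sorted((pair_present, pair_absent)))] += 1
    return weights


if __name__ == "__main__":
    # Toy data invented for illustration only.
    toy = {"a": [1, 1], "b": [1, 0], "c": [0, 0], "d": [0, None]}
    print(quartets_from_mutations(toy))
```

In the toy input, only the first mutation has two cells with the mutation present and two with it absent, so it contributes the single quartet ab|cd. The second mutation is present in just one cell and implies no quartet.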
Journal of Computational Biology, Journal Year: 2023, Volume and Issue: 30(11), P. 1146-1181, Published: Oct. 30, 2023
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.