Statistically Consistent Rooting of Species Trees under the Multispecies Coalescent Model DOI Creative Commons
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: Oct. 27, 2022

Abstract: Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model. Our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.

Language: English

wQFM-TREE: highly accurate and scalable quartet-based species tree inference from gene trees DOI Open Access
Abdur Rafi, Ahmed Mahir Sultan Rumi, Sheikh Azizul Hakim

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 31, 2024

Abstract: Summary methods are becoming increasingly popular for species tree estimation from multi-locus data in the presence of gene tree discordance. ASTRAL, a leading method in this class, solves the Maximum Quartet Support Species Tree problem within a constrained solution space constructed from the input gene trees. In contrast, alternative heuristics such as wQFM and wQMC operate by taking a set of weighted quartets as input and employ a divide-and-conquer strategy to construct the species tree. Recent studies showed wQFM to be more accurate than ASTRAL and wQMC, though its scalability is hindered by the computational demands of explicitly generating and weighting Θ(n^4) quartets. Here, we introduce wQFM-TREE, a novel summary method that enhances wQFM by circumventing the need for explicit quartet generation and weighting, thereby enabling its application to large datasets. Unlike wQFM, wQFM-TREE can also handle polytomies. Extensive simulations under diverse and challenging model conditions, with hundreds or thousands of taxa and genes, consistently demonstrate that wQFM-TREE matches or improves upon the accuracy of ASTRAL. Specifically, wQFM-TREE outperformed ASTRAL in 25 of the 27 model conditions analyzed in this study involving 200-1000 taxa, with statistically significant differences in 20 of these conditions. Moreover, we applied wQFM-TREE to re-analyze the green plant dataset from the One Thousand Plant Transcriptomes Initiative. Its remarkable accuracy and scalability position wQFM-TREE as a highly competitive method in the field. Additionally, the algorithmic and combinatorial innovations introduced in this study will benefit various quartet-based computations, advancing the state-of-the-art in phylogenetic estimations.

Language: English

Citations

0

On the robustness to gene tree rooting (or lack thereof) of triplet-based species tree estimation methods DOI Open Access
Tanjeem Azwad Zaman, Rabib Jahin Ibn Momin, Md. Shamsuzzoha Bayzid

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 25, 2024

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. This process becomes particularly challenging due to gene tree heterogeneity (discordance), often resulting from Incomplete Lineage Sorting (ILS). Triplet- and quartet-based methods for species tree estimation have gained substantial attention as they are provably statistically consistent in the presence of ILS. However, unlike quartet-based methods, the limitation of rooted triplet-based methods in handling unrooted gene trees has restricted their adoption in the systematics community. Furthermore, since the induced triplet distribution in a gene tree depends on the placement of the root, the accuracy of triplet-based methods depends on the accuracy of gene tree rooting. Despite progress in developing methods for rooting unrooted gene trees, the choice of rooting technique and its downstream effects on species tree inference under realistic model conditions remain greatly understudied. This study involves rigorous empirical testing with different gene tree rooting approaches to establish a nuanced understanding of the impact of rooting on species tree accuracy. Moreover, we aim to investigate the conditions under which triplet-based methods provide more accurate species tree estimations than widely-used quartet-based methods such as ASTRAL.

Language: English

Citations

0

Statistically Consistent Rooting of Species Trees Under the Multispecies Coalescent Model DOI Creative Commons
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

Lecture notes in computer science, Journal Year: 2023, Volume and Issue: unknown, P. 41 - 57

Published: Jan. 1, 2023

Abstract: Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model, and our simulation study shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.

Language: English

Citations

1

Terraces in Species Tree Inference from Gene Trees DOI Creative Commons

Mursalin Habib, Kowshic Roy, Saem Hasan

et al.

Research Square (Research Square), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 16, 2023

Abstract: A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). It was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a "pseudo" species tree terrace, which does not consider the topological proximity of the trees in terms of the induced subtrees resulting from missing data. In this study, we mathematically characterize species tree terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that species tree terraces are agnostic to gene tree heterogeneity. Therefore, we introduce a special type of gene tree topology-aware terrace which we call "peak terrace". Moreover, we empirically investigate various challenges and opportunities related to species tree terraces through extensive empirical studies using simulated and real biological data. We demonstrate the prevalence of species tree terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces and the trees within them can substantially enhance the accuracy of summary methods and provide reasonably accurate branch support.

Language: English

Citations

1

Terraces in species tree inference from gene trees DOI Creative Commons

Mursalin Habib, Kowshic Roy, Saem Hasan

et al.

BMC Ecology and Evolution, Journal Year: 2024, Volume and Issue: 24(1)

Published: Nov. 4, 2024

A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). It was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a "pseudo" species tree terrace, which does not consider the topological proximity of the trees in terms of the induced subtrees resulting from missing data. In this study, we mathematically characterize species tree terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that species tree terraces are agnostic to gene tree heterogeneity. Therefore, we introduce a special type of gene tree topology-aware terrace which we call "peak terrace". Moreover, we empirically investigate various challenges and opportunities related to species tree terraces through extensive empirical studies using simulated and real biological data. We demonstrate the prevalence of species tree terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces could potentially lead to advances that enhance the accuracy of summary methods and provide reasonably accurate branch support.

Language: English

Citations

0

Scalable Species Tree Inference with External Constraints DOI
Baqiao Liu, Tandy Warnow

Journal of Computational Biology, Journal Year: 2022, Volume and Issue: 29(7), P. 664 - 678

Published: Feb. 23, 2022

Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although there is generally some information available about the species tree that could be used to speed up the estimation, only one species tree estimation method that addresses gene tree discordance (ASTRAL-J, a recent development in the ASTRAL family of methods) is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree, given partial knowledge of the species tree in the form of a nonbinary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J, and we prove that all three methods are statistically consistent under the multispecies coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics Project data set with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for ASTRAL-J and FASTRAL-J, respectively).

Language: English

Citations

2

TREE-QMC: Improving quartet graph construction for scalable and accurate species tree estimation from gene trees DOI Open Access
Yunheng Han, Erin K. Molloy

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: June 29, 2022

Abstract: Summary methods are one of the dominant approaches for estimating species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant due to gene tree estimation error as well as biological processes, like incomplete lineage sorting. Here, we introduce a new summary method, TREE-QMC, that offers improved accuracy and scalability under these challenging scenarios. TREE-QMC builds upon the algorithmic framework of QMC (Snir and Rao 2010) and its weighted version wQMC (Avni et al. 2014). Their approach takes weighted quartets (four-leaf trees) as input and builds a species tree in a divide-and-conquer fashion, at each step constructing a graph and seeking its max cut. We improve upon this methodology in two ways. First, we address scalability by providing an algorithm to construct the graph directly from the input gene trees. By skipping the quartet weighting step, TREE-QMC has a time complexity of O(n^3 k) with some assumptions on subproblem sizes, where n is the number of species and k is the number of gene trees. Second, we address accuracy by normalizing the quartet weights to account for “artificial taxa,” which are introduced during the divide phase so that solutions on subproblems can be combined during the conquer phase. Together, these contributions enable TREE-QMC to outperform the leading methods (ASTRAL-III, FASTRAL, wQFM) in an extensive simulation study. We also present the application of these methods to an avian phylogenomics data set.

Language: English

Citations

1

Terraces in Species Tree Inference from Gene Trees DOI Creative Commons

Mursalin Habib, Kowshic Roy, Saem Hasan

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: Nov. 24, 2022

Abstract: A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees, due to certain patterns of missing data among the taxa sampled, resulting in an identical optimality score for a given data set. This was first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). The concept of terraces was later extended to the species tree inference problem from a collection of gene trees, where a set of equally optimal species trees was referred to as a “pseudo” species tree terrace. Pseudo terraces do not consider the topological proximity of the trees in terms of the induced subtrees resulting from missing data. In this study, we mathematically characterize species tree terraces and investigate the mathematical properties and conditions that lead multiple species trees to induce/display an identical set of locus-specific subtrees owing to missing data. We report that species tree terraces are agnostic to gene tree topologies and the discordance therein. Therefore, we introduce a special type of gene tree topology-aware terrace which we call “peak terrace”, and investigate conditions on the patterns of missing data that give rise to peak terraces. In addition to the theoretical and analytical results, we empirically investigate different challenges as well as various opportunities pertaining to the multiplicity of equally good species trees in terraced landscapes. Based on an extensive experimental study involving both simulated and real biological datasets, we present the prevalence of species tree terraces and the resulting ambiguity created for tree search algorithms. Remarkably, our findings indicate that the identification of terraces and the trees within them can substantially enhance the accuracy of summary methods. Furthermore, we demonstrate that reasonably accurate branch support can be computed by leveraging the trees sourced from these terraces.

Language: English

Citations

1

Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model DOI Open Access
Yunheng Han, Erin K. Molloy

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: April 6, 2023

Abstract: Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. However, traditional methods assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we find that methods based on quartets (four-leaf, unrooted trees) withstand these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove that an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.

Language: English

Citations

0

QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent DOI
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

Journal of Computational Biology, Journal Year: 2023, Volume and Issue: 30(11), P. 1146 - 1181

Published: Oct. 30, 2023

We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.

Language: English

Citations

0