QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

Journal of Computational Biology, Journal Year: 2023, Volume and Issue: 30(11), P. 1146 - 1181

Published: Oct. 30, 2023

We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that the gene trees evolve within the species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial-time algorithm that was recently proposed for this problem. It is based on theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.

Language: English
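The QR-STAR abstract describes Quintet Rooting as a polynomial-time method built on rooted 5-taxon (quintet) trees. As a rough, hedged illustration of why scoring candidate roots against quintets stays polynomial, the Python sketch below counts the 2n - 3 candidate root positions of an unrooted binary tree on n leaves (one per edge) and the C(n, 5) quintets of n taxa. The function names are hypothetical, and `naive_work` is only a naive upper bound for exhaustively scoring every quintet against every candidate root; it is not the published algorithm's actual cost, and the real implementation may score only a subset of quintets.

```python
from math import comb


def candidate_roots(n_leaves: int) -> int:
    """Edges (hence root positions) in an unrooted binary tree on n_leaves >= 2 leaves."""
    if n_leaves < 2:
        raise ValueError("an unrooted binary tree needs at least 2 leaves")
    return 2 * n_leaves - 3


def quintet_count(n_leaves: int) -> int:
    """Number of 5-taxon subsets (quintets) of n_leaves taxa."""
    return comb(n_leaves, 5)


def naive_work(n_leaves: int) -> int:
    """Hypothetical upper bound: score every quintet against every candidate root."""
    return candidate_roots(n_leaves) * quintet_count(n_leaves)


if __name__ == "__main__":
    # The count grows as O(n^6): large, but polynomial rather than exponential.
    for n in (5, 10, 20, 50):
        print(f"n={n:>3}  roots={candidate_roots(n):>4}  "
              f"quintets={quintet_count(n):>8}  naive pairs={naive_work(n):>10}")
```

Note that a single quintet already has 7 candidate rootings, which is why identifiability of rooted 5-taxon trees is the natural building block here.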

Phylogenomic branch length estimation using quartets
Yasamin Tabatabaee, Chao Zhang, Tandy Warnow

et al.

Bioinformatics, Journal Year: 2023, Volume and Issue: 39(Supplement_1), P. i185 - i193

Published: June 1, 2023

Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern phylogenomic analyses often use methods that account for the heterogeneity of evolutionary histories across the genome due to processes such as incomplete lineage sorting. However, these methods typically do not generate branch lengths in units that are usable by downstream applications, forcing phylogenomic analyses to resort to alternative shortcuts, such as estimating branch lengths by concatenating gene alignments into a supermatrix. Yet, concatenation and other available approaches for estimating branch lengths fail to address heterogeneity across the genome.

Language: English

Citations

6

DISCO+QR: rooting species trees in the presence of GDL and ILS
James K. V. Willson, Yasamin Tabatabaee, Baqiao Liu

et al.

Bioinformatics Advances, Journal Year: 2023, Volume and Issue: 3(1)

Published: Jan. 1, 2023

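Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. The estimation of species trees from sets of gene family trees is challenging, and the estimation of rooted species trees presents additional analytical challenges. Two methods have been developed for this problem: STRIDE, which roots species trees by considering GDL events, and Quintet Rooting (QR), which roots species trees by considering ILS. We present DISCO+QR, a new approach to rooting species trees that first uses DISCO to address GDL and then uses QR to perform rooting in the presence of ILS. DISCO+QR operates by taking the input gene family trees and decomposing them into single-copy trees using DISCO, and then rooting the given species tree with QR using the information in the single-copy gene trees. We show that the relative accuracy of STRIDE and DISCO+QR depends on the properties of the dataset (number of species, number of genes, rate of gene duplication, degree of ILS, and gene tree estimation error), and that each method provides advantages over the other under some conditions. DISCO and QR are available on GitHub. Supplementary data are available at Bioinformatics Advances online.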

Language: English

Citations

3

On the robustness to gene tree rooting (or lack thereof) of triplet-based species tree estimation methods
Tanjeem Azwad Zaman, Rabib Jahin Ibn Momin, Md. Shamsuzzoha Bayzid

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 25, 2024

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. This process becomes particularly challenging due to gene tree heterogeneity (discordance), often resulting from Incomplete Lineage Sorting (ILS). Triplet- and quartet-based methods for species tree estimation have gained substantial attention, as they are provably statistically consistent in the presence of ILS. However, unlike quartet-based methods, rooted triplet-based methods are limited in handling unrooted gene trees, which has restricted their adoption in the systematics community. Furthermore, since the induced triplet distribution of a gene tree depends on the placement of the root, the accuracy of triplet-based methods depends on the accuracy of gene tree rooting. Despite progress in developing methods for rooting gene trees, the choice of rooting technique and its downstream effects on species tree inference under realistic model conditions remain greatly understudied. This study involves rigorous empirical testing with different rooting techniques to establish a nuanced understanding of the impact of gene tree rooting on the accuracy of triplet-based species tree estimation. Moreover, we aim to investigate the conditions under which triplet-based methods provide more accurate species tree estimates than widely used quartet-based methods such as ASTRAL.

Language: English

Citations

0
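The triplet-robustness abstract notes that the rooted triplets induced by a gene tree depend on where the root is placed. A small, hedged Python illustration: the sketch below stores a hypothetical unrooted four-leaf gene tree as an adjacency dictionary, roots it on a chosen edge, and reads off the induced rooted triplet on leaves A, B, C as the pair with the deepest lowest common ancestor. The tree, the internal node names `x` and `y`, and the helper functions are illustrative assumptions, not code from the study.

```python
from collections import deque

Tree = dict[str, list[str]]


def root_on_edge(tree: Tree, u: str, v: str, root: str = "ROOT") -> Tree:
    """Return a rooted parent -> children map with a new root node subdividing edge u-v."""
    if v not in tree[u]:
        raise ValueError(f"{u}-{v} is not an edge of the tree")
    children: Tree = {root: [u, v]}
    seen = {root, u, v}
    queue = deque([u, v])
    # Breadth-first search away from the root; the u-v edge is never re-crossed
    # because both endpoints are already marked as seen.
    while queue:
        node = queue.popleft()
        children[node] = []
        for nbr in tree[node]:
            if nbr not in seen:
                seen.add(nbr)
                children[node].append(nbr)
                queue.append(nbr)
    return children


def root_paths(children: Tree, root: str = "ROOT") -> dict[str, list[str]]:
    """Map each leaf (node without children) to its path of nodes from the root."""
    paths: dict[str, list[str]] = {}
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            paths[node] = path
        for kid in kids:
            stack.append((kid, path + [kid]))
    return paths


def induced_triplet(children: Tree, a: str, b: str, c: str):
    """Return ((x, y), z) for the rooted triplet xy|z induced on leaves a, b, c."""
    paths = root_paths(children)

    def lca_depth(x: str, y: str) -> int:
        # Length of the shared prefix of the two root-to-leaf paths.
        depth = 0
        for p, q in zip(paths[x], paths[y]):
            if p != q:
                break
            depth += 1
        return depth

    # In a binary rooted tree exactly one pair has a strictly deepest LCA.
    x, y, z = max([(a, b, c), (a, c, b), (b, c, a)], key=lambda t: lca_depth(t[0], t[1]))
    return tuple(sorted((x, y))), z


# Hypothetical unrooted gene tree ((A,B),(C,D)) with internal nodes x and y.
unrooted: Tree = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"], "y": ["C", "D", "x"],
}

print(induced_triplet(root_on_edge(unrooted, "x", "y"), "A", "B", "C"))  # (('A', 'B'), 'C')
print(induced_triplet(root_on_edge(unrooted, "A", "x"), "A", "B", "C"))  # (('B', 'C'), 'A')
```

Rooting on the internal edge x-y yields AB|C, while rooting on the pendant edge leading to A yields BC|A, so the same unrooted gene tree contributes a different triplet depending on the rooting.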

DISCO+QR: Rooting Species Trees in the Presence of GDL and ILS
James K. V. Willson, Yasamin Tabatabaee, Baqiao Liu

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 3, 2023

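Abstract Genes evolve under processes such as gene duplication and loss (GDL), so that gene family trees are multi-copy, as well as incomplete lineage sorting (ILS); both processes produce gene trees that differ from the species tree. The estimation of species trees from sets of gene family trees is challenging, and the estimation of rooted species trees presents additional analytical challenges. Two methods have been developed for this problem: STRIDE (Emms and Kelly, MBE 2017), which roots species trees by considering GDL events, and Quintet Rooting (Tabatabaee et al., ISMB 2022 and Bioinformatics 2022), which roots species trees by considering ILS. We present DISCO+QR, a new method for rooting species trees in the presence of both GDL and ILS. DISCO+QR operates by taking the input gene family trees and decomposing them into single-copy trees using DISCO (Willson et al., Systematic Biology 2022), and then rooting the given species tree using the information in the single-copy gene trees with Quintet Rooting (QR). We show that the relative accuracy of STRIDE and DISCO+QR depends on the properties of the dataset (number of species, number of genes, rate of gene duplication, degree of ILS, and gene tree estimation error), and that each method provides advantages over the other under some conditions. Availability: DISCO and QR are available on GitHub. The supplementary materials are available at http://tandy.cs.illinois.edu/discoqr-suppl.pdf.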

Language: English

Citations

1

Statistically Consistent Rooting of Species Trees Under the Multispecies Coalescent Model
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

Lecture notes in computer science, Journal Year: 2023, Volume and Issue: unknown, P. 41 - 57

Published: Jan. 1, 2023

Abstract Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a variant of QR that has an additional step for determining the shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model, and our simulation study shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form at https://github.com/ytabatabaee/Quintet-Rooting.

Language: English

Citations

1
