Statistically Consistent Rooting of Species Trees under the Multispecies Coalescent Model
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2022

Published: Oct. 27, 2022

Abstract Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting is the only cause of gene tree discordance, except when gene tree estimation error is very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a variant of QR that has an additional step for determining the shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model. Our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting.

Language: English

Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model
Yunheng Han, Erin K. Molloy

Algorithms for Molecular Biology, 2023, 18(1)

Published: Dec. 1, 2023

Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume that the input data are error-free and that the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove that an optimal solution to this problem is a consistent estimator of the cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
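The quartet step described in this abstract can be illustrated on four cells. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical encoding of each mutation as a tuple of `1` (observed), `0` (not observed), or `None` (missing) per cell, and, as a further assumption, simply ignores mutations with any missing entry. Each mutation present in exactly two cells and absent from the other two implies one quartet, and the most frequent implied quartet is returned. The function names are illustrative only.

```python
from collections import Counter


def quartet_from_mutation(states, cells):
    """Return the quartet implied by one mutation on four cells, or None.

    `states` holds one entry per cell: 1 = mutation observed, 0 = not observed,
    None = missing value (hypothetical encoding). A quartet is implied only when
    the mutation is present in exactly two cells and absent from the other two.
    """
    if None in states:
        return None  # assumption: ignore mutations with any missing entry
    present = tuple(sorted(c for c, s in zip(cells, states) if s == 1))
    absent = tuple(sorted(c for c, s in zip(cells, states) if s == 0))
    if len(present) != 2 or len(absent) != 2:
        return None
    # Canonical form so that "ab|cd" and "cd|ab" denote the same quartet.
    left, right = sorted([present, absent])
    return f"{left[0]}{left[1]}|{right[0]}{right[1]}"


def most_frequent_quartet(matrix, cells):
    """Count implied quartets over all mutations and return the most frequent one."""
    counts = Counter()
    for states in matrix:
        quartet = quartet_from_mutation(states, cells)
        if quartet is not None:
            counts[quartet] += 1
    best = counts.most_common(1)[0][0] if counts else None
    return best, counts


cells = ["a", "b", "c", "d"]
matrix = [
    (1, 1, 0, 0),     # implies ab|cd
    (1, 1, 0, 0),     # implies ab|cd
    (0, 0, 1, 1),     # same split seen from the other side: ab|cd
    (1, 0, 1, 0),     # implies ac|bd (for example, caused by an error)
    (1, 1, 1, 0),     # present in three cells: implies no quartet
    (1, None, 0, 0),  # missing value: ignored in this sketch
]
best, counts = most_frequent_quartet(matrix, cells)
print(best, dict(counts))  # ab|cd {'ab|cd': 3, 'ac|bd': 1}
```

Storing each quartet in a canonical string form lets a plain `Counter` tally both orientations of the same bipartition together.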

Language: English

wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs
Sheikh Azizul Hakim, Md. Rownok Zahan Ratul, Md. Shamsuzzoha Bayzid, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2023

Published: Dec. 7, 2023

Abstract Gene trees often differ from the species trees that contain them due to various factors, including incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT). Several highly accurate species tree estimation methods have been introduced to explicitly address ILS, including ASTRAL, a widely used statistically consistent method, and wQFM, a quartet amalgamation approach that has been shown experimentally to be more accurate than ASTRAL. Two recent advancements, ASTRAL-Pro and DISCO, have emerged in the field of phylogenomics to consider gene duplication and loss (GDL) events. ASTRAL-Pro introduces a refined measure of quartet similarity, accounting for both orthology and paralogy. DISCO, on the other hand, offers a general strategy to decompose a multicopy gene family tree into a collection of single-copy trees, allowing the use of previously designed species tree inference methods in the context of multicopy gene trees. In this study, we first introduce some variants of DISCO to examine its underlying hypotheses and present analytical results on the statistical guarantees of DISCO. In particular, we introduce DISCO-R, a variant of DISCO with an improved pruning strategy that provides robust results. We then propose wQFM-DISCO (wQFM paired with DISCO) as an adaptation of wQFM to handle multicopy gene trees resulting from GDL events. Extensive evaluation studies on simulated and real data sets demonstrate that wQFM-DISCO is significantly more accurate than competing methods.

Language: English

Recent Progress on Methods for Estimating and Updating Large Phylogenies
Paul Zaharias, Tandy Warnow

Published: Jan. 12, 2022

With the increased availability of sequence data and even fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimation on large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into alignments or trees. Here we present these recent advances and discuss opportunities for future improvements.

Language: English

Fast and Accurate Species Trees from Weighted Internode Distances
Baqiao Liu, Tandy Warnow

bioRxiv (Cold Spring Harbor Laboratory), 2022

Published: May 26, 2022

Abstract Species tree estimation is a basic step in many biological research projects, but it is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing “gene trees”) and then using these gene trees to define a matrix of average internode distances, where the internode distance in a gene tree T between two species x and y is the number of nodes on the path between the leaves corresponding to x and y. Given such a matrix, a species tree can be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this approach) are provably statistically consistent, very fast (low-degree polynomial time), and have had high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes gene tree branch uncertainty into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID, while remaining fast. Moreover, weighted ASTRID is competitive in accuracy with the state of the art. Thus, weighted ASTRID provides a new method for species tree estimation that improves upon ASTRID and has comparable accuracy to the state of the art while being much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
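The distance-matrix step that ASTRID and NJst are described as using can be sketched as follows. This is a minimal, unweighted illustration, not the ASTRID or weighted ASTRID code: gene trees are assumed to be given as undirected edge lists with hypothetical internal-node labels (which must differ from species names), the internode distance is taken here as the number of internal nodes on the leaf-to-leaf path (edge count minus one), and each pair is averaged only over gene trees that contain both species. Weighting by branch uncertainty and the subsequent neighbor-joining step are not shown; all function names are illustrative.

```python
from collections import defaultdict, deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an undirected adjacency list for an unrooted tree from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def edge_counts_from(tree, source):
    """Breadth-first search: number of edges from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_internode_distances(gene_trees, species):
    """Average internode distance for each species pair across gene trees.

    Internode distance = internal nodes on the leaf-to-leaf path (edges - 1).
    Pairs that never co-occur in a gene tree map to None.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        leaves = {n for n, nbrs in tree.items() if len(nbrs) == 1}
        cache = {}  # BFS results reused for every pair sharing the same first species
        for x, y in combinations(species, 2):
            if x in leaves and y in leaves:
                if x not in cache:
                    cache[x] = edge_counts_from(tree, x)
                totals[(x, y)] += cache[x][y] - 1
                counts[(x, y)] += 1
    return {pair: (totals[pair] / counts[pair] if counts[pair] else None)
            for pair in combinations(species, 2)}


# Two hypothetical gene trees on four species; "u" and "v" are internal nodes.
trees = [
    tree_from_edges([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]),  # ab|cd
    tree_from_edges([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")]),  # ac|bd
]
matrix = average_internode_distances(trees, ["a", "b", "c", "d"])
print(matrix[("a", "b")], matrix[("a", "d")])  # 1.5 2.0
```

The resulting matrix would then be passed to a distance-based method such as neighbor joining to obtain the species tree; counting edges instead of internal nodes shifts every entry by the same constant per co-occurring pair, so the choice of convention is a detail of this sketch.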

Language: English
