Statistically Consistent Rooting of Species Trees under the Multispecies Coalescent Model
Yasamin Tabatabaee, Sébastien Roch, Tandy Warnow

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: Oct. 27, 2022

Abstract: Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees, and additional methods are then used to root these trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model. Our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting .

Language: English

Cycles of fusion and fission enabled rapid parallel adaptive radiations in African cichlids
Joana I. Meier, Matthew D. McGee, David A. Marques

et al.

Science, Journal Year: 2023, Volume and Issue: 381(6665)

Published: Sept. 28, 2023

Although some lineages of animals and plants have made impressive adaptive radiations when provided with ecological opportunity, the propensities to radiate vary profoundly among lineages for unknown reasons. In Africa's Lake Victoria region, one cichlid lineage radiated in every lake, with the largest radiation taking place in a lake less than 16,000 years old. We show that all of its ecological guilds evolved in situ. Cycles of lineage fusion through admixture and lineage fission through speciation characterize the history of the radiation. It was jump-started when several swamp-dwelling refugial populations, each of which were of older hybrid descent, met in the newly forming lake, where they fused into a single population, resuspending old admixture variation. Each population contributed a different set of ancient alleles from which a new adaptive radiation assembled in record time, involving additional fusion-fission cycles. We argue that repeated fusion-fission cycles in a lineage make adaptive radiation fast and predictable.

Language: English

Citations

48

Quartet-based Genome-scale Species Tree Inference using Multicopy Gene Family Trees
Abdur Rafi, Ahmed Mahir Sultan Rumi, Sheikh Azizul Hakim

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: April 10, 2025

Abstract: Species tree estimation from genome-wide data has transformed evolutionary studies, particularly in the presence of gene tree discordance. Gene trees often differ from species trees due to factors like incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Quartet-based species tree estimation methods have gained substantial popularity for their accuracy and statistical guarantee. However, most of these methods (e.g., ASTRAL, wQFM, wQMC) rely on single-copy gene trees and model ILS but not GDL, limiting their applicability to large genomic datasets. ASTRAL-Pro, a recent advancement, refined quartet similarity measures to incorporate both orthology and paralogy, improving species tree inference under GDL. Among other quartet-based methods, wQFM-DISCO converts multicopy gene family trees into single-copy trees using DISCO and applies the wQFM algorithm to the single-copy trees. ASTRAL-Pro has remained the only summary method to explicitly model gene duplication and loss. In this study, we extend wQFM (which requires decomposition) and wQFM-TREE (which operates directly on gene trees) by modeling gene duplication and loss, leveraging the concept of speciation-driven quartets introduced in ASTRAL-Pro. Our approach consistently outperforms the other methods across conditions, offering a promising alternative.

Language: English

Citations

0

Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data
Sazan Mahbub, Shashata Sawmya, Arpita Saha

et al.

Journal of Computational Biology, Journal Year: 2022, Volume and Issue: 29(11), P. 1156 - 1172

Published: Sept. 1, 2022

Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or of the reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data.

Language: English

Citations

9

Weighted ASTRID: fast and accurate species trees from weighted internode distances
Baqiao Liu, Tandy Warnow

Algorithms for Molecular Biology, Journal Year: 2023, Volume and Issue: 18(1)

Published: July 19, 2023

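Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time), and have had high accuracy under many conditions, which makes them competitive with other popular species tree estimation methods. In this study, inspired by the recent work on weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID typically shows improvements in accuracy compared to the original (unweighted) ASTRID, and shows competitive accuracy against weighted ASTRAL, the state of the art. Our re-implementation of ASTRID also improves the runtime, with marked improvements on large datasets. Weighted ASTRID is a new and very fast method for species tree estimation that typically improves upon ASTRID and has comparable accuracy to weighted ASTRAL, while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode .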
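
The average internode distance matrix described in this abstract can be illustrated with a minimal Python sketch. It is not the ASTRID or weighted ASTRID implementation: the adjacency-list tree representation and the helper names `make_tree`, `internode_distances`, and `average_internode_matrix` are assumptions made for illustration, and the branch-uncertainty weighting is not modeled. The internode distance is taken as the number of nodes strictly between two leaves, i.e. the number of edges on the path minus one.

```python
from collections import deque


def make_tree(edges):
    """Build an undirected adjacency list from (u, v) edge pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def internode_distances(adj, leaves):
    """Map each leaf pair (x, y), x < y, to the number of nodes strictly between them."""
    dist = {}
    for src in leaves:
        # Breadth-first search counts the edges from src to every node.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        for dst in leaves:
            if src < dst:
                dist[(src, dst)] = depth[dst] - 1  # edges on the path minus one
    return dist


def average_internode_matrix(trees):
    """Average each pair's internode distance over the trees containing both taxa."""
    totals, counts = {}, {}
    for adj in trees:
        leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
        for pair, d in internode_distances(adj, leaves).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two toy unrooted gene trees on taxa a, b, c, d with internal nodes u and v.
t1 = make_tree([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")])  # ab|cd
t2 = make_tree([("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")])  # ac|bd
avg = average_internode_matrix([t1, t2])
```

The resulting matrix would then be handed to a distance-based method such as neighbor joining, which is outside the scope of this sketch.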

Language: English

Citations

5

Quartet Fiduccia–Mattheyses revisited for larger phylogenetic studies
Sharmin Akter Mim, Md Zarif-Ul-Alam, Rezwana Reaz

et al.

Bioinformatics, Journal Year: 2023, Volume and Issue: 39(6)

Published: June 1, 2023

Abstract: Motivation: With the recent breakthroughs in sequencing technology, phylogeny estimation at a larger scale has become a huge opportunity. For accurate estimation of large-scale phylogeny, substantial endeavor is being devoted to introducing new algorithms or upgrading current approaches. In this work, we endeavor to improve the Quartet Fiduccia and Mattheyses (QFM) algorithm to resolve phylogenetic trees of better quality with better running time. QFM was already appreciated by researchers for its good tree quality, but fell short in larger phylogenomic studies due to its excessively slow running time. Results: We have re-designed QFM so that it can amalgamate millions of quartets over thousands of taxa into a species tree with a great level of accuracy within a short amount of time. Named "QFM Fast and Improved (QFM-FI)", our version is 20 000× faster than the previous version and 400× faster than the widely used variant of QFM implemented in PAUP* on larger datasets. We have also provided a theoretical analysis of the running time and memory requirements of QFM-FI. We have conducted a comparative study of QFM-FI with other state-of-the-art phylogeny reconstruction methods, such as QFM, QMC, wQMC, wQFM, and ASTRAL, on simulated as well as real biological datasets. Our results show that QFM-FI improves on the running time and tree quality of QFM and produces trees that are comparable with state-of-the-art methods. Availability and implementation: QFM-FI is open source and available at https://github.com/sharmin-mim/qfm_java.

Language: English

Citations

4

wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs
Sheikh Azizul Hakim, Md. Rownok Zahan Ratul, Md. Shamsuzzoha Bayzid

et al.

Bioinformatics Advances, Journal Year: 2024, Volume and Issue: 4(1)

Published: Jan. 1, 2024

Abstract: Motivation: Gene trees often differ from the species trees that contain them due to various factors, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Several highly accurate species tree estimation methods have been introduced to explicitly address ILS, including ASTRAL, a widely used statistically consistent method, and wQFM, a quartet amalgamation approach experimentally shown to be more accurate than ASTRAL. Two recent advancements, ASTRAL-Pro and DISCO, have emerged in phylogenomics to consider GDL. ASTRAL-Pro introduces a refined quartet similarity measure, accounting for orthology and paralogy. On the other hand, DISCO offers a general strategy to decompose multi-copy gene trees into a collection of single-copy trees, allowing the utilization of methods previously designed for species tree inference in the context of single-copy gene trees. Results: In this study, we first introduce some variants of DISCO to examine its underlying hypotheses and present analytical results on the statistical guarantees of DISCO. In particular, we introduce DISCO-R, a variant of DISCO with improved pruning that provides more robust results. We then demonstrate with extensive evaluation studies on simulated and real data sets that wQFM paired with DISCO consistently matches or outperforms competing methods. Availability and implementation: DISCO-R and the other variants are freely available at https://github.com/skhakim/DISCO-variants.

Language: English

Citations

1

Quintet Rooting: rooting species trees under the multi-species coalescent model
Yasamin Tabatabaee, Kowshika Sarker, Tandy Warnow

et al.

Bioinformatics, Journal Year: 2022, Volume and Issue: 38(Supplement_1), P. i109 - i117

Published: April 14, 2022

Rooted species trees are a basic model with multiple applications throughout biology, including understanding adaptation, biodiversity, phylogeography and co-evolution. Because most species tree estimation methods produce unrooted trees, methods for rooting these trees have been developed. However, most rooting methods either rely on prior biological knowledge or assume that evolution is close to clock-like, which is not usually the case. Furthermore, most prior rooting methods do not account for biological processes that create discordance between gene trees and species trees. We present Quintet Rooting (QR), a method for rooting species trees based on a proof of identifiability of the rooted species tree under the multi-species coalescent model established by Allman, Degnan and Rhodes (J. Math. Biol., 2011). We show that QR is generally more accurate than other rooting methods, except under extreme levels of gene tree estimation error. Quintet Rooting is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting. The simulated datasets used in this study are from a prior study and are available at https://www.ideals.illinois.edu/handle/2142/55319. The biological dataset used in this study is also from a prior study and is available at http://gigadb.org/dataset/101041. Supplementary data are available at Bioinformatics online.

Language: English

Citations

6

Improving quartet graph construction for scalable and accurate species tree estimation from gene trees
Yunheng Han, Erin K. Molloy

Genome Research, Journal Year: 2023, Volume and Issue: unknown

Published: May 17, 2023

Summary methods are widely used to estimate species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant because of estimation error and biological processes, such as incomplete lineage sorting. Here, we introduce TREE-QMC, a new summary method that offers accuracy and scalability under these challenging scenarios. TREE-QMC builds upon weighted Quartet Max Cut (wQMC), which takes weighted quartets as input and then constructs a species tree in a divide-and-conquer fashion, at each step forming a graph and seeking its max cut. The wQMC method has been successfully leveraged in the context of species tree estimation by weighting quartets by their frequencies in the gene trees; we improve upon this approach in two ways. First, we address accuracy by normalizing the quartet weights to account for "artificial taxa" introduced during the divide phase so that subproblem solutions can be combined during the conquer phase. Second, we address scalability by introducing an algorithm to construct the graph directly from the gene trees; this gives TREE-QMC a time complexity of O(n^3 k), where n is the number of species and k is the number of gene trees, assuming the subproblem decomposition is perfectly balanced. These contributions enable TREE-QMC to be competitive in terms of species tree accuracy and empirical runtime with the leading quartet-based methods, even outperforming them on some model conditions explored in our simulation study. We also present an application of these methods to an avian phylogenomics data set.
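
The idea of weighting quartets by their frequencies in the gene trees can be sketched as follows. This is a naive illustration, not TREE-QMC or wQMC: it enumerates every four-taxon subset of every tree, which is exactly the cost that the graph construction described in the abstract avoids. The tree representation and the helper names (`make_tree`, `path_lengths`, `induced_quartet`, `quartet_weights`) are assumptions for illustration. The quartet topology is read off with the four-point condition on unit-length path distances: the pairing with the strictly smallest distance sum is the induced split.

```python
from collections import Counter, deque
from itertools import combinations


def make_tree(edges):
    """Build an undirected adjacency list from (u, v) edge pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def path_lengths(adj, src):
    """Breadth-first search: number of edges from src to every node."""
    depth = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def induced_quartet(dist, a, b, c, d):
    """Return the split induced on {a, b, c, d}, or None if it is unresolved."""
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    ordered = sorted(sums.items(), key=lambda item: item[1])
    # Four-point condition: a resolved quartet has one strictly smallest sum.
    if ordered[0][1] == ordered[1][1]:
        return None
    return ordered[0][0]


def quartet_weights(trees):
    """Count how many gene trees induce each quartet topology."""
    weights = Counter()
    for adj in trees:
        leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
        dist = {x: path_lengths(adj, x) for x in leaves}
        for a, b, c, d in combinations(leaves, 4):
            split = induced_quartet(dist, a, b, c, d)
            if split is not None:
                weights[split] += 1
    return weights


# Three toy gene trees on taxa a, b, c, d: two support ab|cd, one supports ac|bd.
ab_cd = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
ac_bd = [("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")]
weights = quartet_weights([make_tree(ab_cd), make_tree(ab_cd), make_tree(ac_bd)])
```

Because leaves are sorted before enumeration, each topology has a single canonical key, so counts from different trees accumulate under the same entry.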

Language: English

Citations

3

Designing Weights for Quartet-Based Methods When Data are Heterogeneous Across Lineages
Marta Casanellas, Jesús Fernández-Sánchez, Marina Garrote-López

et al.

Bulletin of Mathematical Biology, Journal Year: 2023, Volume and Issue: 85(7)

Published: June 13, 2023

Abstract: Homogeneity across lineages is a general assumption in phylogenetics according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a simple enough model to make the process of sequence evolution more tractable. On the other hand, dealing successfully with the general case (heterogeneity of rates across lineages) is one of the key features of phylogenetic reconstruction methods based on algebraic tools. The goal of this paper is twofold. First, we present a new weighting system for quartets based on algebraic and semi-algebraic tools, thus especially indicated to deal with data evolving under heterogeneous rates. This method combines the weights of two previous methods by means of a test based on the positivity of the branch lengths estimated with the paralinear distance. The method is statistically consistent when applied to data generated under the general Markov model, considers rate and base composition heterogeneity among lineages, and does not assume stationarity nor time-reversibility. Second, we test and compare the performance of several quartet-based methods for phylogenetic tree reconstruction (namely QFM, wQFM, quartet puzzling, weight optimization and Willson's method) in combination with several systems of weights. These tests, on both simulated and real data, support the proposed approach as a reliable and successful reconstruction method that improves upon the accuracy of global methods (such as neighbor-joining or maximum likelihood) in the presence of long branches or on mixtures of distributions on trees.

Language: English

Citations

2

QT-GILD: Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data
Sazan Mahbub, Shashata Sawmya, Arpita Saha

et al.

Lecture notes in computer science, Journal Year: 2022, Volume and Issue: unknown, P. 159 - 176

Published: Jan. 1, 2022

Language: English

Citations

3