Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model
Yunheng Han, Erin K. Molloy

Algorithms for Molecular Biology, Journal Year: 2023, Volume and Issue: 18(1)

Published: Dec. 1, 2023

Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
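The statement that a quartet is implied by a mutation present in two cells and absent from two others can be illustrated with a minimal sketch. It assumes a cell-by-mutation matrix with entries 1 (present), 0 (absent), and None (missing); the function name `implied_quartets` and this encoding are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def implied_quartets(matrix, cells):
    """Count how often each quartet ab|cd is implied by the mutations.

    matrix[i][j] is 1 if mutation j is observed in cell i, 0 if it is
    observed absent, and None if the value is missing (assumed encoding).
    """
    counts = Counter()
    n_mutations = len(matrix[0]) if matrix else 0
    for j in range(n_mutations):
        present = sorted(cells[i] for i, row in enumerate(matrix) if row[j] == 1)
        absent = sorted(cells[i] for i, row in enumerate(matrix) if row[j] == 0)
        # Each pair of "present" cells against each pair of "absent" cells
        # implies one quartet; cells with missing values contribute nothing.
        for side_a in combinations(present, 2):
            for side_b in combinations(absent, 2):
                # frozenset makes ab|cd and cd|ab the same key.
                counts[frozenset((side_a, side_b))] += 1
    return counts


# Hypothetical toy data: four cells, three mutations, one missing value.
cells = ["A", "B", "C", "D"]
matrix = [
    [1, 1, 0],     # A
    [1, 1, 0],     # B
    [0, 0, 1],     # C
    [0, None, 1],  # D
]
quartet_counts = implied_quartets(matrix, cells)
```

In this toy input the second mutation implies no quartet, because only one cell is observed without it. Under the abstract's main result, the most frequent quartet on a given set of four cells would be the candidate for the unrooted model tree on those cells.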

Language: English

PhyKIT: A Multitool for Phylogenomics
Jacob L. Steenwyk, Gemma I. Martínez‐Redondo, Thomas J. Buida, et al.

Current Protocols, Journal Year: 2024, Volume and Issue: 4(11)

Published: Oct. 30, 2024

Abstract: Multiple sequence alignments and phylogenetic trees are rich in biological information and are fundamental to research in biology. PhyKIT is a tool for processing and analyzing the information content of multiple sequence alignments and phylogenetic trees. Here, we describe how to use PhyKIT for diverse analyses, including (i) constructing a phylogenomic supermatrix, (ii) detecting errors in orthology inference, (iii) quantifying biases in phylogenomic data sets, (iv) identifying radiation events or lack of resolution using gene support frequencies, and (v) conducting evolution‐based screens to facilitate gene function prediction. Several PhyKIT functions that streamline multiple sequence alignment and phylogenetic tree processing (such as renaming FASTA entries or tree tips) are also discussed. These protocols demonstrate how simple command‐line operations in the unified framework of PhyKIT facilitate phylogenomic analysis and processing, from supermatrix construction and diagnosis to gaining clues about gene function. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Installing PhyKIT and syntax for usage. Basic Protocol 2: Constructing a phylogenomic supermatrix. Basic Protocol 3: Detecting anomalies in orthology relationships. Basic Protocol 4: Quantifying biases in phylogenomic data matrices and related measures. Basic Protocol 5: Identifying polytomies in phylogenomic data. Basic Protocol 6: Assessing gene‐gene coevolution as a genetic screen.

Language: English

Citations

2

wQFM-TREE: highly accurate and scalable quartet-based species tree inference from gene trees
Abdur Rafi, Ahmed Mahir Sultan Rumi, Sheikh Azizul Hakim, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: July 31, 2024

Abstract: Summary methods are becoming increasingly popular for species tree estimation from multi-locus data in the presence of gene tree discordance. ASTRAL, a leading method in this class, solves the Maximum Quartet Support Species Tree problem within a constrained solution space constructed from the input gene trees. In contrast, alternative heuristics such as wQFM and wQMC operate by taking a set of weighted quartets as input and employ a divide-and-conquer strategy to construct the species tree. Recent studies showed wQFM to be more accurate than ASTRAL and wQMC, though its scalability is hindered by the computational demands of explicitly generating and weighting Θ(n^4) quartets. Here, we introduce wQFM-TREE, a novel summary method that enhances wQFM by circumventing the need for explicit quartet generation and weighting, thereby enabling its application to large datasets. Unlike wQFM, wQFM-TREE can also handle polytomies. Extensive simulations under diverse and challenging model conditions, with hundreds or thousands of taxa and genes, consistently demonstrate that wQFM-TREE matches or improves upon the accuracy of ASTRAL. Specifically, wQFM-TREE outperformed ASTRAL in 25 of 27 model conditions analyzed in this study involving 200-1000 taxa, with statistically significant differences in 20 of these conditions. Moreover, we applied wQFM-TREE to re-analyze the green plant dataset from the One Thousand Plant Transcriptomes Initiative. Its remarkable accuracy and scalability position wQFM-TREE as a highly competitive alternative to ASTRAL. Additionally, the algorithmic and combinatorial innovations introduced in this study will benefit various quartet-based computations, advancing the state-of-the-art in phylogenetic estimations.
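To make the scalability concern concrete, the following minimal sketch counts the four-taxon subsets that an explicit quartet enumeration would have to visit for n taxa, which is the binomial coefficient C(n, 4) and grows as Θ(n^4). The helper name `quartet_subsets` is an illustrative assumption and is not part of wQFM or wQFM-TREE.

```python
from math import comb


def quartet_subsets(n_taxa: int) -> int:
    """Number of distinct four-taxon subsets among n_taxa taxa, i.e. C(n, 4)."""
    return comb(n_taxa, 4)


# The taxon counts below span the 200-1000 range mentioned in the abstract.
for n in (200, 500, 1000):
    print(f"{n} taxa -> {quartet_subsets(n):,} four-taxon subsets")
```

Each subset admits three possible resolved quartet topologies, and the count applies per gene tree, so the work of explicit generation and weighting is larger still by those factors.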

Language: English

Citations

0
