bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown
Published: Dec. 21, 2023
Abstract
Methods for rapidly inferring the evolutionary history of species or populations with genome-wide data are progressing, but computational constraints still limit our abilities in this area. We developed an alignment-free method to infer phylogenies and implemented it in the Python package TopicContml. The method uses probabilistic topic modeling (specifically, Latent Dirichlet Allocation, or LDA) to extract ‘topic’ frequencies from k-mers, which are derived from multilocus DNA sequences. These extracted frequencies then serve as input to a program in the PHYLIP package, which is used to generate a tree. We evaluated the method's performance on simulated datasets with gaps and on three biological datasets: (1) 14 sequence loci from two Australian bird species distributed across nine populations, (2) 5162 loci from 80 mammal species, and (3) raw, unaligned, non-orthologous PacBio sequences from 12 species. Our empirical results suggest that the method is efficient and statistically robust. We also assessed the uncertainty of the estimated relationships among clades using a bootstrap procedure.
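The pipeline described above (k-mers, then LDA topic frequencies, then a matrix for PHYLIP) can be illustrated with a minimal, dependency-free Python sketch. This is not TopicContml's code. Several assumptions are made for illustration: the function names `kmers`, `lda_gibbs`, `topic_matrix` and `phylip_like` are hypothetical; LDA is fitted here by collapsed Gibbs sampling, which may differ from the inference method TopicContml actually uses; one LDA model is fitted per locus and each individual's topic proportions are concatenated across loci; k-mers containing gaps or ambiguous bases are skipped; and the output is a generic PHYLIP-style matrix (header `n m`, names padded to 10 characters), so the exact layout the PHYLIP program expects should be checked against its documentation.

```python
import random


def kmers(seq, k):
    """Overlapping k-mers of a DNA sequence; k-mers containing gaps or
    ambiguous bases (anything other than A, C, G, T) are skipped."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):            # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(n_topics)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][vocab[w]] += 1
            nk[t] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                wi, t = vocab[w], z[d][j]
                # Remove the token, then resample its topic from the full conditional.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                weights = [(ndk[d][q] + alpha) * (nkw[q][wi] + beta) / (nk[q] + V * beta)
                           for q in topics]
                t = rng.choices(topics, weights=weights)[0]
                z[d][j] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    # Posterior-mean topic proportions; each document's vector sums to 1.
    return [[(ndk[d][q] + alpha) / (len(doc) + n_topics * alpha) for q in topics]
            for d, doc in enumerate(docs)]


def topic_matrix(names, loci, k=4, n_topics=2, n_iter=200, seed=0):
    """Hypothetical multilocus step: one LDA fit per locus, with each
    individual's topic proportions concatenated across loci."""
    rows = [[] for _ in names]
    for locus in loci:
        docs = [kmers(locus[name], k) for name in names]
        for row, props in zip(rows, lda_gibbs(docs, n_topics, n_iter=n_iter, seed=seed)):
            row.extend(props)
    return rows


def phylip_like(names, rows):
    """Generic PHYLIP-style matrix: 'n m' header, then 10-character padded names."""
    lines = [f"{len(rows)} {len(rows[0])}"]
    for name, row in zip(names, rows):
        lines.append(name[:10].ljust(10) + " ".join(f"{x:.5f}" for x in row))
    return "\n".join(lines)


# Toy, made-up sequences: three individuals, two loci (gaps and Ns are tolerated).
names = ["popA", "popB", "popC"]
loci = [
    {"popA": "ACGTACGTTAGC", "popB": "ACGTACGATAGC", "popC": "TTGCA-GTTAGG"},
    {"popA": "GGCATTACGNAT", "popB": "GGCATTACGCAT", "popC": "GGAATTTCGCTT"},
]
rows = topic_matrix(names, loci, k=3, n_topics=2, n_iter=100)
print(phylip_like(names, rows))
```

The Gibbs sampler is used only to keep the sketch self-contained; a library implementation such as scikit-learn's `LatentDirichletAllocation` could be substituted. The values of k, the number of topics, and the priors `alpha` and `beta` are illustrative, not settings taken from TopicContml.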
Language: English