Timesweeper: Accurately Identifying Selective Sweeps Using Population Genomic Time Series
Logan S. Whitehouse, Daniel R. Schrider

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: July 7, 2022

ABSTRACT Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional Convolutional Neural Network on said simulations, and inferring which polymorphisms in this serialized dataset were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.

Language: English

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova, et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 25(1), P. 61 - 78

Published: Sept. 4, 2023

Language: English

Citations: 29

Applications of machine learning in phylogenetics
Mo Yu, Matthew W. Hahn, Megan L. Smith, et al.

Molecular Phylogenetics and Evolution, Journal Year: 2024, Volume and Issue: 196, P. 108066 - 108066

Published: March 31, 2024

Language: English

Citations: 14

Interpreting generative adversarial networks to infer natural selection from genetic data
Rebecca Riley, Iain Mathieson, Sara Mathieson, et al.

Genetics, Journal Year: 2024, Volume and Issue: 226(4)

Published: Feb. 22, 2024

Abstract Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being “real” do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.

Language: English

Citations: 6

Tree sequences as a general-purpose tool for population genetic inference
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 21, 2024

ABSTRACT As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.

Language: English

Citations: 4

Applications of Machine Learning in Phylogenetics
Mo Yu, Matthew W. Hahn, Megan Smith, et al.

Published: Oct. 14, 2023

Machine learning has increasingly been applied to a wide range of questions in phylogenetic inference. Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution models, and to perform downstream inferences of introgression and diversification. Here, we review how researchers have used several promising machine learning approaches to make phylogenetic inferences. Despite the promise of these methods, several barriers prevent supervised machine learning from reaching its full potential in phylogenetics. We discuss these barriers and potential paths forward. In the future, we expect that the application of careful network designs and data encodings will allow supervised machine learning to accommodate the complex processes that continue to confound traditional phylogenetic methods.

Language: English

Citations: 4

Estimation of spatial demographic maps from polymorphism data using a neural network
Chris C. R. Smith, Gilia Patterson, Peter L. Ralph, et al.

Molecular Ecology Resources, Journal Year: 2024, Volume and Issue: 24(7)

Published: Aug. 16, 2024

Abstract A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic based methods like ours complement other, direct methods for estimating past and present demography, and we believe will serve as valuable tools for applications in conservation, ecology and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.

Language: English

Citations: 1

Timesweeper: Accurately Identifying Selective Sweeps Using Population Genomic Time Series
Logan S. Whitehouse, Daniel R. Schrider

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2022, Volume and Issue: unknown

Published: July 7, 2022

Language: English

Citations: 4