Timesweeper: Accurately Identifying Selective Sweeps Using Population Genomic Time Series (Creative Commons)
Logan S. Whitehouse, Daniel R. Schrider

bioRxiv (Cold Spring Harbor Laboratory), Year: 2022, Number: unknown

Published: July 7, 2022

ABSTRACT Despite decades of research, identifying selective sweeps, the genomic footprints of positive selection, remains a core problem in population genetics. Of the myriad methods that have been developed to tackle this task, few are designed to leverage the potential of genomic time-series data. This is because in most population genetic studies of natural populations only a single period of time can be sampled. Recent advancements in sequencing technology, including improvements in extracting and sequencing ancient DNA, have made repeated samplings of a population possible, allowing for more direct analysis of recent evolutionary dynamics. Serial sampling of organisms with shorter generation times has also become more feasible due to improvements in the cost and throughput of sequencing. With these advances in mind, here we present Timesweeper, a fast and accurate convolutional neural network-based tool for identifying selective sweeps in data consisting of multiple genomic samplings of a population over time. Timesweeper analyzes population genomic time-series data by first simulating training data under a demographic model appropriate for the data of interest, training a one-dimensional Convolutional Neural Network on said simulations, and inferring which polymorphisms in this serialized dataset were the direct target of a completed or ongoing selective sweep. We show that Timesweeper is accurate under multiple simulated demographic and sampling scenarios, identifies selected variants with high resolution, and estimates selection coefficients more accurately than existing methods. In sum, we show that more accurate inferences about natural selection are possible when genomic time-series data are available; such data will continue to proliferate in coming years due to both the sequencing of ancient samples and repeated samplings of extant populations with faster generation times, as well as experimentally evolved populations where time-series data are often generated. Methodological advances such as Timesweeper thus have the potential to help resolve the controversy over the role of positive selection in the genome. We provide Timesweeper as a Python package for use by the community.

Language: English

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova, et al.

Nature Reviews Genetics, Year: 2023, Number: 25(1), pp. 61-78

Published: Sep. 4, 2023

Language: English

Cited by: 29

Applications of machine learning in phylogenetics
Mo Yu, Matthew W. Hahn, Megan L. Smith, et al.

Molecular Phylogenetics and Evolution, Year: 2024, Number: 196, p. 108066

Published: March 31, 2024

Language: English

Cited by: 16

Interpreting generative adversarial networks to infer natural selection from genetic data (Creative Commons)
Rebecca Riley, Iain Mathieson, Sara Mathieson, et al.

Genetics, Year: 2024, Number: 226(4)

Published: Feb. 22, 2024

Abstract Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being "real" do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.

Language: English

Cited by: 6

Tree sequences as a general-purpose tool for population genetic inference (Creative Commons)
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Number: unknown

Published: Feb. 21, 2024

ABSTRACT As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.

Language: English

Cited by: 4

Applications of Machine Learning in Phylogenetics (Creative Commons)
Mo Yu, Matthew W. Hahn, Megan Smith, et al.

Published: Oct. 14, 2023

Machine learning has increasingly been applied to a wide range of questions in phylogenetic inference. Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution models, and to perform downstream inferences of introgression and diversification. Here, we review how researchers have used several promising machine learning approaches to make phylogenetic inferences. Despite the promise of these methods, several barriers prevent supervised machine learning from reaching its full potential in phylogenetics. We discuss these barriers and potential paths forward. In the future, we expect that the application of careful network designs and data encodings will allow supervised machine learning to accommodate the complex processes that continue to confound traditional phylogenetic methods.

Language: English

Cited by: 4

Estimation of spatial demographic maps from polymorphism data using a neural network (Creative Commons)
Chris C. R. Smith, Gilia Patterson, Peter L. Ralph, et al.

Molecular Ecology Resources, Year: 2024, Number: 24(7)

Published: Aug. 16, 2024

Abstract A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic based methods like ours complement other, direct methods for estimating past and present demography, and we believe will serve as a valuable tool for applications in conservation, ecology and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.

Language: English

Cited by: 1

Timesweeper: Accurately Identifying Selective Sweeps Using Population Genomic Time Series (Creative Commons)
Logan S. Whitehouse, Daniel R. Schrider

bioRxiv (Cold Spring Harbor Laboratory), Year: 2022, Number: unknown

Published: July 7, 2022

Language: English

Cited by: 4