disperseNN2: a neural network for estimating dispersal distance from georeferenced polymorphism data
Chris C. R. Smith, Andrew D. Kern

BMC Bioinformatics, Year: 2023, Volume 24(1)

Published: Oct. 11, 2023

Spatial genetic variation is shaped in part by an organism's dispersal ability. We present a deep learning tool, disperseNN2, for estimating the mean per-generation dispersal distance from georeferenced polymorphism data. Our neural network performs feature extraction on pairs of genotypes, and uses the geographic information that comes with each sample. These attributes led disperseNN2 to outperform a state-of-the-art deep learning method that does not use explicit spatial information: the relative absolute error was reduced by 33% and 48% using sample sizes of 10 and 100 individuals, respectively. disperseNN2 is particularly useful for non-model organisms or systems with sparse genomic resources, as it uses unphased, single nucleotide polymorphisms as its input. The software is open source and available from https://github.com/kr-colab/disperseNN2, with documentation located at https://dispersenn2.readthedocs.io/en/latest/.
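As an illustration of the error metric mentioned in the abstract, the sketch below computes a mean relative absolute error over a set of predictions. It assumes the common definition, the average of |prediction - truth| / truth. The function name and the example numbers are hypothetical and are not taken from the disperseNN2 code base or paper.

```python
def mean_relative_absolute_error(true_values, predictions):
    """Average of |prediction - truth| / truth over all paired values.

    Assumed definition for illustration only; every true value must be
    nonzero (dispersal distances are positive).
    """
    if len(true_values) != len(predictions) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for truth, pred in zip(true_values, predictions):
        if truth == 0:
            raise ValueError("true values must be nonzero")
        total += abs(pred - truth) / abs(truth)
    return total / len(true_values)


# Hypothetical dispersal distances (arbitrary units) and model predictions.
example_truth = [1.0, 2.0]
example_predictions = [1.5, 1.0]
example_error = mean_relative_absolute_error(example_truth, example_predictions)
```

Under this definition, a reported 33% reduction would mean the metric for one method is 0.67 times the metric for the other on the same test set.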

Language: English

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova, et al.

Nature Reviews Genetics, Year: 2023, Volume 25(1), pp. 61-78

Published: Sep. 4, 2023

Language: English

Cited by: 27

IntroUNET: Identifying introgressed alleles via semantic segmentation
Dylan D. Ray, Lex Flagel, Daniel R. Schrider, et al.

PLoS Genetics, Year: 2024, Volume 20(2), p. e1010657

Published: Feb. 20, 2024

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.

Language: English

Cited by: 6

Interpreting generative adversarial networks to infer natural selection from genetic data
Rebecca Riley, Iain Mathieson, Sara Mathieson, et al.

Genetics, Year: 2024, Volume 226(4)

Published: Feb. 22, 2024

Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being "real" do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.

Language: English

Cited by: 6

Inferring the geographic history of recombinant lineages using the full ancestral recombination graph
Puneeth Deraje, James Kitchens, Graham Coop, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume unknown

Published: April 14, 2024

Spatial patterns of genetic relatedness among samples reflect the past movements of their ancestors. Our ability to untangle this spatial history has the potential to improve dramatically given that we can now infer the ultimate description of genetic relatedness, an ancestral recombination graph (ARG). By extending spatial theory previously applied to trees, we generalize the common model of Brownian motion to full ARGs, thereby accounting for correlations in trees along a chromosome while efficiently computing likelihood-based estimates of dispersal rate and genetic ancestor locations, with associated uncertainties. We evaluate this model's ability to reconstruct spatial histories using individual-based simulations and unfortunately find a clear bias in the estimates of dispersal rate and ancestor locations. We investigate the causes of this bias, pinpointing a discrepancy between the model and the true spatial process at recombination events. This highlights a key hurdle in extending the ubiquitous and analytically-tractable model of Brownian motion from trees to ARGs, which otherwise has the potential to provide an efficient method for spatial inference, with uncertainties, using all the information available in the full ARG.

Language: English

Cited by: 6

Estimating dispersal rates and locating genetic ancestors with genome-wide genealogies
Matthew M. Osmond, Graham Coop

eLife, Year: 2024, Volume 13

Published: Nov. 26, 2024

Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome, leading to geographic patterns of isolation-by-distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here, we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km²/generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.

Language: English

Cited by: 5

This population does not exist: learning the distribution of evolutionary histories with generative adversarial networks
William W. Booker, Dylan D. Ray, Daniel R. Schrider, et al.

Genetics, Year: 2023, Volume 224(2)

Published: April 17, 2023

Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep-learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated, instead learning their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, called Generative Adversarial Networks (GANs), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e. data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories, the first GAN capable of performing this task. After testing multiple different neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that is capable of generating artificial examples of population genetic alignments that successfully mimic key aspects of the training data, including the site-frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning. Overall, our findings highlight the ability of GANs to learn and mimic population genetic data and suggest future areas where this work can be applied in population genetics research that we discuss herein.

Language: English

Cited by: 12

Tree sequences as a general-purpose tool for population genetic inference
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume unknown

Published: Feb. 21, 2024

As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.

Language: English

Cited by: 4

Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations
Amjad Dabi, Daniel R. Schrider

Genetics, Year: 2024, Volume unknown

Published: Nov. 6, 2024

Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward, thus it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q. In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
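The rescaling procedure described in this abstract (divide the population size by a factor Q, multiply the mutation rate, selection coefficients, and recombination rate by the same Q) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `SimParams`, `rescale`, and the example parameter values are hypothetical names and numbers chosen for the sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for per-generation simulation parameters."""
    pop_size: int
    mutation_rate: float        # per site per generation
    recombination_rate: float   # per site per generation
    selection_coeff: float


def rescale(params: SimParams, q: float) -> SimParams:
    """Scale population size down by q and the rates up by q.

    Products such as N * mu, N * r and N * s are kept (approximately)
    constant, which is the usual motivation for this shortcut.
    """
    if q < 1:
        raise ValueError("scaling factor q must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recombination_rate=params.recombination_rate * q,
        selection_coeff=params.selection_coeff * q,
    )


def same_scaled_product(a: SimParams, b: SimParams) -> bool:
    """Check that N * mu matches between two parameter sets."""
    return math.isclose(a.pop_size * a.mutation_rate,
                        b.pop_size * b.mutation_rate)


# Illustrative values only: N = 10,000 rescaled with Q = 10.
unscaled = SimParams(10_000, 1e-8, 1e-8, 0.001)
scaled = rescale(unscaled, 10)
```

Keeping the population-scaled products constant does not guarantee identical simulation outcomes, which is the concern the abstract raises; a sketch like this only shows what is being held fixed.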

Language: English

Cited by: 4

Sweeps in space: leveraging geographic data to identify beneficial alleles in Anopheles gambiae
Clara T. Rehmann, Scott T. Small, Peter L. Ralph, et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, Volume unknown

Published: Feb. 9, 2025

As organisms adapt to environmental changes, natural selection modifies the frequency of non-neutral alleles. For beneficial mutations, the outcome of this process may be a selective sweep, in which an allele rapidly increases in frequency and perhaps reaches fixation within a population. Selective sweeps have well-studied effects on patterns of local genetic variation in panmictic populations, but much less is known about the dynamics of sweeps in continuous space. In particular, because limited movement across a landscape leads to unique patterns of population structure, spatial dynamics may influence the trajectory of selected mutations. Here, we use forward-in-time, individual-based simulations in continuous space to study the impact of space on beneficial mutations as they sweep through a population. We show that selection changes the joint distribution of allele frequency and geographic range occupied by a focal allele and demonstrate that this signal can be used to identify selective sweeps. We then leverage this signal to identify in-progress selective sweeps within the malaria vector Anopheles gambiae, a species under strong selection pressure from vector control measures. By considering space, we identify multiple previously undescribed variants with potential phenotypic consequences, including mutations impacting IR-associated genes and altering protein structure and properties. Our results demonstrate a novel signal for detecting selection in spatial population genetic data that may have implications for genomic surveillance and understanding geographic patterns of genetic variation.

Language: English

Cited by: 0

Population Genetics Meets Ecology: A Guide to Individual-Based Simulations in Continuous Landscapes
Elizabeth T. Chevy, Jiseon Min, Victoria Caudill, et al.

Ecology and Evolution, Year: 2025, Volume 15(4)

Published: April 1, 2025

Individual-based simulation has become an increasingly crucial tool for many fields of population biology. However, continuous geography is important to many applications, and implementing realistic and stable simulations in continuous space presents a variety of difficulties, from modeling choices to computational efficiency. This paper aims to be a practical guide to spatial simulation, helping researchers to implement realistic and efficient spatial, individual-based simulations and avoid common pitfalls. To do this, we delve into mechanisms of mating, reproduction, density-dependent feedback, and dispersal, all of which may vary across the landscape, discuss how these affect population dynamics, and describe how to parameterize simulations in convenient ways (for instance, to achieve a desired population density). We also demonstrate how to implement these models using the current version of the individual-based simulator, SLiM. We additionally discuss natural selection, in particular how genetic variation can affect demographic processes. Finally, we provide four short vignettes: simulations of pikas that shift their range up a mountain as temperatures rise; mosquitoes that live in rivers as juveniles and experience seasonally changing habitat; cane toads that expand across Australia, reaching 120 million individuals; and monarch butterflies whose populations are regulated by an explicitly modeled resource (milkweed).

Language: English

Cited by: 0