Inferring multi-locus selection in admixed populations (Creative Commons)
Nicolás Ayala, Maximilian Genetti, Russell Corbett‐Detig

et al.

PLoS Genetics, Journal Year: 2023, Volume and Issue: 19(11), P. e1011062 - e1011062

Published: Nov. 28, 2023

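Admixture, the exchange of genetic information between distinct source populations, is thought to be a major source of adaptive genetic variation. Unlike mutation events, which periodically generate single alleles, admixture can introduce many selected alleles simultaneously. As such, the effects of linkage between selected alleles may be especially pronounced in admixed populations. However, existing tools for identifying selected mutations within admixed populations only account for selection at a single site, overlooking phenomena such as linkage among proximal selected alleles. Here, we develop and extensively validate a method for identifying and quantifying the individual effects of multiple linked selected sites on a chromosome in admixed populations. Our approach numerically calculates the expected local ancestry landscape in an admixed population for a given multi-locus selection model, and then maximizes the likelihood of the model. After applying this method to admixed populations of Drosophila melanogaster and Passer italiae, we found that the impacts between linked sites may be an important contributor to natural selection in admixed populations. Furthermore, for the situations we considered, the selection coefficients and the number of selected sites are overestimated in analyses that do not consider the effects of linkage among selected sites. Our results imply that linkage among selected sites may be an important evolutionary force in admixed populations. This tool provides a powerful generalized method to investigate these crucial phenomena in diverse populations.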

Language: English

Harnessing deep learning for population genetic inference
Xin Huang, Aigerim Rymbekova, Olga Dolgova

et al.

Nature Reviews Genetics, Journal Year: 2023, Volume and Issue: 25(1), P. 61 - 78

Published: Sept. 4, 2023

Language: English

Citations: 29

IntroUNET: Identifying introgressed alleles via semantic segmentation (Creative Commons)
Dylan D. Ray, Lex Flagel, Daniel R. Schrider

et al.

PLoS Genetics, Journal Year: 2024, Volume and Issue: 20(2), P. e1010657 - e1010657

Published: Feb. 20, 2024

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual’s alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled “ghost” population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method’s success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.

Language: English

Citations: 6

Interpreting generative adversarial networks to infer natural selection from genetic data (Creative Commons)
Rebecca Riley, Iain Mathieson, Sara Mathieson

et al.

Genetics, Journal Year: 2024, Volume and Issue: 226(4)

Published: Feb. 22, 2024

Understanding natural selection and other forms of non-neutrality is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection and other local evolutionary processes that requires relatively few selection simulations during training. We build upon a generative adversarial network trained to simulate realistic neutral data. This consists of a generator (fitted demographic model), and a discriminator (convolutional neural network) that predicts whether a genomic region is real or fake. As the generator can only generate data under neutral demographic processes, regions of real data that the discriminator recognizes as having a high probability of being “real” do not fit the neutral demographic model and are therefore candidates for targets of selection. To incentivize identification of a specific mode of selection, we fine-tune the discriminator with a small number of custom non-neutral simulations. We show that this approach has high power to detect various forms of selection in simulations, and that it finds regions under positive selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.

Language: English

Citations: 6

The Biorepository and Integrative Genomics resource for inclusive genomics: insights from a diverse pediatric and admixed cohort (Creative Commons)
Silvia Buonaiuto, Franco Mársico, Akram Mohammed

et al.

medRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 3, 2025

The Biorepository and Integrative Genomics (BIG) Initiative in Tennessee has developed a pioneering resource to address gaps in genomic research by linking genomic, phenotypic, and environmental data from a diverse Mid-South population, including underrepresented groups. We analyzed 13,152 genomes from BIG and found significant genetic diversity, with 50% of participants inferred to have non-European or several types of admixed ancestry. Ancestry within the BIG cohort is stratified, with distinct geographic and demographic patterns, as African ancestry is more common in urban areas, while European ancestry is more common in suburban regions. We observe ancestry-specific rates of novel genetic variants, which are enriched for functional or clinical relevance. Disease prevalence analysis linked ancestry and environmental factors, showing higher odds ratios for asthma and obesity in minority groups, particularly in the urban area. Finally, we observe discrepancies between self-reported race and genetic ancestry, with related individuals self-identifying in differing racial categories. These findings underscore the limitations of race as a biomedical variable. BIG has proven to be an effective model for community-centered precision medicine. We integrated genomics education and fostered great trust among the contributing communities. Future goals include cohort expansion and enhanced genomic analysis, to ensure equitable healthcare outcomes.

Language: English

Citations: 0

Admixture Increases Genetic Diversity and Adaptive Potential in Australasian Killer Whales (Creative Commons)
Isabella Reeves, John Totterdell, Jonathan Sandoval‐Castillo

et al.

Molecular Ecology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 28, 2025

Admixture is the exchange of genetic variation between differentiated demes, resulting in ancestry within a population coalescing in multiple ancestral source populations. Low-latitude killer whale (Orcinus orca) populations typically have higher genetic diversity than those in more densely populated, high productivity and high-latitude regions. This has been hypothesized to be due to episodic admixture between populations with distinct genetic backgrounds. We test this hypothesis by estimating variation in local ancestry of whole genome sequences from three genetically differentiated, low-latitude killer whale populations and comparing them to global genetic variation. We find 'Antarctic-like' ancestry tracts in the genomes of southwestern Australia (SWA) killer whales, including recent (within the last 2-4 generations) admixture. Admixed individuals had, on average, shorter and fewer runs of homozygosity than unadmixed individuals and increased effective population size (Ne). Thus, connectivity between demes results in the maintenance of genetic variation and Ne in relatively small populations at a level comparable to the sum of Ne across demes. A subset of the admixed regions was inferred to be evolving under selection in the SWA population, suggesting that admixture may be contributing to the population's adaptive potential. This study provides important and rare empirical evidence that populations can maintain genetic variation through sporadic admixture between different genetic backgrounds and promote long-term stability of Ne.

Language: English

Citations: 0

PhyloCoalSimulations: A Simulator for Network Multispecies Coalescent Models, Including a New Extension for the Inheritance of Gene Flow
John Fogg, Elizabeth S. Allman, Cécile Ané

et al.

Systematic Biology, Journal Year: 2023, Volume and Issue: 72(5), P. 1171 - 1179

Published: May 31, 2023

We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.

Language: English

Citations: 9

IntroUNET: identifying introgressed alleles via semantic segmentation (Creative Commons)
Dylan D. Ray, Lex Flagel, Daniel R. Schrider

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Feb. 7, 2023

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of those individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply ...

Language: English

Citations: 7

The genomic footprint of social stratification in admixing American populations (Creative Commons)
Àlex Mas-Sandoval, Sara Mathieson, Matteo Fumagalli

et al.

eLife, Journal Year: 2023, Volume and Issue: 12

Published: Dec. 1, 2023

Cultural and socioeconomic differences stratify human societies and shape their genetic structure beyond the sole effect of geography. Despite mating being limited by sociocultural stratification, most demographic models in population genetics often assume random mating. Taking advantage of the correlation between sociocultural stratification and the proportion of genetic ancestry in admixed populations, we sought to infer the former process in the Americas. To this aim, we define a mating model where the individual proportions of the genome inherited from Native American, European, and sub-Saharan African ancestral populations constrain the mating probabilities through ancestry-related assortative mating and sex bias parameters. We simulate a wide range of admixture scenarios under this model. Then, we train a deep neural network and retrieve good performance in predicting mating parameters from genomic data. Our results show how population stratification, shaped by socially constructed racial and gender hierarchies, has constrained the admixture processes in the Americas since the European colonization and the subsequent Atlantic slave trade.

Language: English

Citations: 6

Interpreting Generative Adversarial Networks to Infer Natural Selection from Genetic Data (Creative Commons)
Rebecca Riley, Iain Mathieson, Sara Mathieson

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 8, 2023

Understanding natural selection in humans and other species is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulation of selection typically requires slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Mismatches between simulated training data and real test data can lead to incorrect inference. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection that requires relatively few selection simulations during training. We use a Generative Adversarial Network (GAN) trained to simulate realistic neutral data. The resulting GAN consists of a generator (fitted demographic model) and a discriminator (convolutional neural network). For a genomic region, the discriminator predicts whether it is "real" or "fake" in the sense that it could have been simulated by the generator. As the "real" training data includes regions that experienced selection and the generator cannot produce such regions, regions with a high probability of being "real" are likely to have experienced selection. To further incentivize this behavior, we "fine-tune" the discriminator with a small number of selection simulations. We show that this approach has high power to detect selection in simulations, and that it finds regions under selection identified by state-of-the art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics. In summary, our approach is a novel, efficient, and powerful way ...

Language: English

Citations: 4

Fast and accurate local ancestry inference with Recomb-Mix (Creative Commons)
Yuan Wei, Degui Zhi, Shaojie Zhang

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 18, 2023

The availability of large genotyped cohorts brings new opportunities for revealing the high-resolution genetic structure of admixed populations via local ancestry inference (LAI), the process of identifying the ancestry of each segment of an individual haplotype. Though current methods achieve high accuracy in standard cases, LAI is still challenging when reference populations are more similar (e.g., intra-continental), when the number of reference populations is too numerous, or when the admixture events are deep in time, all of which are increasingly unavoidable in large biobanks. Here, we present a new LAI method, Recomb-Mix. Recomb-Mix integrates the elements of existing methods of the site-based Li and Stephens model and introduces a new graph collapsing trick to simplify counting paths with the same ancestry label readout. Through comprehensive benchmarking on various simulated datasets, we show that Recomb-Mix is more accurate than existing methods in diverse sets of scenarios while being competitive in terms of resource efficiency. We expect that Recomb-Mix will be a useful method for advancing genetics studies of admixed populations.

Language: English

Citations: 4