Biases in ARG-based inference of historical population size in populations experiencing selection DOI Creative Commons
Jacob I. Marsh, Parul Johri

Molecular Biology and Evolution, Journal Year: 2024, Volume and Issue: 41(7)

Published: June 14, 2024

Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ancestral recombination graph (ARG)-based approaches to demographic inference in typical empirical analyses are susceptible to misinference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, purifying, and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference of human populations, although it could cause misinference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history with non-human populations using ARG-based approaches to avoid misinference due to the linked effects of selection.

Language: English

A general and efficient representation of ancestral recombination graphs DOI Creative Commons
Yan Wong, Anastasia Ignatieva, Jere Koskela, et al.

Genetics, Journal Year: 2024, Volume and Issue: 228(1)

Published: July 16, 2024

As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.

Language: English

Citations: 20

High-resolution genomic history of early medieval Europe DOI Creative Commons
Leo Speidel, Marina Silva, Thomas J. Booth, et al.

Nature, Journal Year: 2025, Volume and Issue: 637(8044), P. 118 - 126

Published: Jan. 1, 2025

Many known and unknown historical events have remained below detection thresholds of genetic studies because subtle ancestry changes are challenging to reconstruct. Methods based on shared haplotypes (refs. 1,2) and rare variants (refs. 3,4) improve power but are not explicitly temporal and have not been possible to adopt in unbiased ancestry models. Here we develop Twigstats, an approach of time-stratified ancestry analysis that can improve statistical power by an order of magnitude by focusing on coalescences in recent times, while remaining unbiased by population-specific drift. We apply this framework to 1,556 available ancient whole genomes from Europe in the historical period. We are able to model individual-level ancestry using preceding genomes to provide high resolution. During the first half of the first millennium CE, we observe at least two different streams of Scandinavian-related ancestry expanding across western, central and eastern Europe. By contrast, during the second half of the first millennium CE, ancestry patterns suggest the regional disappearance or substantial admixture of these ancestries. In Scandinavia, we document a major ancestry influx by approximately 800 CE, when a large proportion of Viking Age individuals carried ancestry from groups related to central Europe not seen in individuals from the early Iron Age. Our findings suggest that time-stratified ancestry analysis can provide a higher-resolution lens for genetic history.

Language: English

Citations: 2

Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent DOI Creative Commons
Kevin Korfmann, Thibaut Sellinger, Fabian Freund, et al.

Peer Community Journal, Journal Year: 2024, Volume and Issue: 4

Published: March 18, 2024

The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring negligible compared to the total population size. Yet many species of plants, invertebrates, prokaryotes or fish exhibit neutrally skewed offspring distribution or strong selection events yielding few individuals to produce a number of offspring of up to the same magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. The current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first give proof of the accuracy of our methods to estimate the multiple merger parameter and past demographic history using simulated data under the β-coalescent model. Secondly, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distribution from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference but also the urgent need for more accurate ARG inference approaches.

Language: English

Citations: 11

A geographic history of human genetic ancestry DOI Creative Commons
Michael C. Gründler, Jonathan Terhorst, Gideon S. Bradburd, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: March 29, 2024

Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. In humans, traditional approaches for describing population genetic variation often rely on discrete genetic ancestry labels, which, despite their utility, can obscure the complex, multi-faceted nature of human genetic history. These labels risk oversimplifying ancestry by ignoring its temporal depth and geographic continuity, and may therefore conflate notions of race, ethnicity, geography, and genetic ancestry. Here, we present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We use this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatial-temporal context of genetic ancestry and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.

Language: English

Citations: 9

Estimating evolutionary and demographic parameters via ARG-derived IBD DOI Creative Commons
Zhendong Huang, Jerome Kelleher, Yao-ban Chan, et al.

PLoS Genetics, Journal Year: 2025, Volume and Issue: 21(1), P. e1011537 - e1011537

Published: Jan. 8, 2025

Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that even poorly-inferred short IBD segments can improve estimation. Our mutation-rate estimator achieves precision similar to a previously-published method despite a 4,000-fold reduction in data used for inference, and we identify significant differences between human populations. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.

Language: English

Citations: 1

A genealogy-based approach for revealing ancestry-specific structures in admixed populations DOI Creative Commons

Ji Tang, Charleston W. K. Chiang

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 14, 2025

Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating the ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce ancestry-specific expected GRM (as-eGRM), a novel framework for estimating relatedness within an ancestry component between admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation of the ancestry-specific GRM across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets based on three-way admixed Latino cohorts showed that as-eGRM analysis robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits in understudied populations.

Language: English

Citations: 1

A geographic history of human genetic ancestry DOI
Michael C. Gründler, Jonathan Terhorst, Gideon S. Bradburd, et al.

Science, Journal Year: 2025, Volume and Issue: 387(6741), P. 1391 - 1397

Published: March 27, 2025

Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. We present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We used this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatiotemporal context of genetic ancestry when describing human genetic variation and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race and ancestry.

Language: English

Citations: 1

Towards an unbiased characterization of genetic polymorphism DOI Creative Commons
Anna A. Igolkina, Sebastian Vorbrugg, Fernando A. Rabanal, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: May 30, 2024

Our view of genetic polymorphism is shaped by methods that provide a limited and reference-biased picture. Long-read sequencing technologies, which are starting to provide nearly complete genome sequences for population samples, should solve the problem, except that characterizing and making sense of non-SNP variation is difficult even with perfect sequence data. Here, we analyze 27 genomes of Arabidopsis thaliana in an attempt to address these issues, and illustrate what can be learned by analyzing whole-genome polymorphism data in an unbiased manner. Estimated genome sizes range from 135 to 155 Mb, with differences almost entirely due to centromeric and rDNA repeats. The completely assembled chromosome arms comprise roughly 120 Mb in all accessions, but are full of structural variants, many of which are caused by insertions of transposable elements (TEs) and subsequent partial deletions of such insertions. Even with only 27 accessions, a pan-genome coordinate system that includes the resulting variation ends up being 40% larger than the size of any one genome. The analysis reveals an incompletely annotated mobile-ome: our ability to predict what is actually moving is poor, and we detect several novel TE families. In contrast to this, the genic portion, or “gene-ome”, is highly conserved. By annotating each genome using accession-specific transcriptome data, we find that 13% of all genes are segregating in our 27 accessions, but that most of these are transcriptionally silenced. Finally, we show that with short-read data we previously massively underestimated genetic variation of all kinds, including SNPs, mostly in regions where short reads could not be mapped reliably, but also where reads were mapped incorrectly. We demonstrate that SNP-calling errors can be biased by the choice of reference genome, and that RNA-seq and BS-seq results can be strongly affected by mapping reads to a reference genome rather than to the genome of the assayed individual. In conclusion, while whole-genome polymorphism data pose tremendous analytical challenges, they will ultimately revolutionize our understanding of genome evolution.

Language: English

Citations: 6

A general and efficient representation of ancestral recombination graphs DOI Creative Commons
Yan Wong, Anastasia Ignatieva, Jere Koskela, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 4, 2023

As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.

Language: English

Citations: 15

Estimating dispersal rates and locating genetic ancestors with genome-wide genealogies DOI Creative Commons
Matthew M. Osmond, Graham Coop

eLife, Journal Year: 2024, Volume and Issue: 13

Published: Nov. 26, 2024

Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome leading to geographic patterns of isolation-by-distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here, we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km²/generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.

Language: English

Citations: 5