Biases in ARG-based inference of historical population size in populations experiencing selection
Jacob I. Marsh, Parul Johri

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: April 26, 2024

Inferring the demographic history of populations provides fundamental insights into species dynamics and is essential for developing a null model to accurately study selective processes. However, background selection and selective sweeps can produce genomic signatures at linked sites that mimic or mask signals associated with historical population size change. While the theoretical biases introduced by the linked effects of selection have been well established, it is unclear whether ARG-based approaches to demographic inference in typical empirical analyses are susceptible to mis-inference due to these effects. To address this, we developed highly realistic forward simulations of human and Drosophila melanogaster populations, including empirically estimated variability of gene density, mutation rates, recombination rates, and purifying and positive selection, across different historical demographic scenarios, to broadly assess the impact of selection on demographic inference using a genealogy-based approach. Our results indicate that the linked effects of selection minimally impact demographic inference for human populations, though they could cause mis-inference in populations with similar genome architecture and population parameters experiencing more frequent recurrent sweeps. We found that accurate demographic inference of D. melanogaster populations by ARG-based methods is compromised by the presence of pervasive background selection alone, leading to spurious inferences of recent population expansion, which may be further worsened by recurrent sweeps, depending on the proportion and strength of beneficial mutations. Caution and additional testing with species-specific simulations are needed when inferring population history in non-human populations using ARG-based approaches, to avoid mis-inference due to the linked effects of selection.

Language: English

A general and efficient representation of ancestral recombination graphs
Yan Wong, Anastasia Ignatieva, Jere Koskela, et al.

Genetics, Journal Year: 2024, Volume and Issue: 228(1)

Published: July 16, 2024

As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance, and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.

Language: English

Citations: 21

High-resolution genomic history of early medieval Europe
Leo Speidel, Marina Silva, Thomas J. Booth, et al.

Nature, Journal Year: 2025, Volume and Issue: 637(8044), P. 118 - 126

Published: Jan. 1, 2025

Many known and unknown historical events have remained below detection thresholds of genetic studies because subtle ancestry changes are challenging to reconstruct. Methods based on shared haplotypes [1,2] and rare variants [3,4] improve power but are not explicitly temporal and have not been possible to adopt in unbiased ancestry models. Here we develop Twigstats, an approach of time-stratified ancestry analysis that can improve statistical power by an order of magnitude by focusing on coalescences in recent times, while remaining unbiased by population-specific drift. We apply this framework to 1,556 available ancient whole genomes from Europe in the historical period. We are able to model individual-level ancestry using preceding genomes to provide high resolution. During the first half of the first millennium CE, we observe at least two different streams of Scandinavian-related ancestry expanding across western, central and eastern Europe. By contrast, during the second half of the first millennium CE, ancestry patterns suggest the regional disappearance or substantial admixture of these ancestries. In Scandinavia, we document a major ancestry influx by approximately 800 CE, when a large proportion of Viking Age individuals carried ancestry from groups related to central Europe not seen in individuals from the early Iron Age. Our findings suggest that time-stratified ancestry analysis can provide a higher-resolution lens for genetic history.

Language: English

Citations: 3

Estimating evolutionary and demographic parameters via ARG-derived IBD
Zhendong Huang, Jerome Kelleher, Yao-ban Chan, et al.

PLoS Genetics, Journal Year: 2025, Volume and Issue: 21(1), P. e1011537 - e1011537

Published: Jan. 8, 2025

Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency by using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian inference algorithm and use it to show that even poorly-inferred short IBD segments can improve estimation. Our mutation-rate estimator achieves precision similar to a previously-published method despite a 4000-fold reduction in the data used for inference, and we identify significant differences between human populations. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.

Language: English

Citations: 2

Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent
Kevin Korfmann, Thibaut Sellinger, Fabian Freund, et al.

Peer Community Journal, Journal Year: 2024, Volume and Issue: 4

Published: March 18, 2024

The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring negligible compared to the total population size. Yet many species of plants, invertebrates, prokaryotes or fish exhibit a neutrally skewed offspring distribution or strong selection events, yielding a few individuals that produce a number of offspring of up to the same magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. The current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first give proof of the accuracy of our methods to estimate the multiple merger parameter and past demographic history using simulated data under the β-coalescent model. Secondly, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distribution from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference, but also the urgent need for more accurate ARG inference approaches.

Language: English

Citations: 13

A genealogy-based approach for revealing ancestry-specific structures in admixed populations
Ji Tang, Charleston W. K. Chiang

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 14, 2025

Elucidating ancestry-specific structures in admixed populations is crucial for comprehending population history and mitigating confounding effects in genome-wide association studies. Existing methods for elucidating ancestry-specific structures generally rely on frequency-based estimates of the genetic relationship matrix (GRM) among admixed individuals after masking segments from ancestry components not being targeted for investigation. However, these approaches disregard linkage information between markers, potentially limiting their resolution in revealing structure within an ancestry component. We introduce the ancestry-specific expected GRM (as-eGRM), a novel framework for estimating relatedness within an ancestry component between admixed individuals. The key design of as-eGRM consists of defining ancestry-specific pairwise relatedness between individuals based on the genealogical trees encoded in the Ancestral Recombination Graph (ARG) and local ancestry calls, and computing the expectation of the ancestry-specific GRM across the genome. Comprehensive evaluations using both simulated stepping-stone models and empirical datasets based on three-way admixed Latino cohorts showed that analysis based on as-eGRM robustly outperforms existing methods in revealing the structure in admixed populations with diverse demographic histories. Taken together, as-eGRM has the promise to better reveal the fine-scale structure within an ancestry component of admixed individuals, which can help improve the robustness and interpretation of findings from association studies of disease or complex traits for these understudied admixed populations.

Language: English

Citations: 1

On ARGs, pedigrees, and genetic relatedness matrices
Brieuc Lehmann, Hanbin Lee, Luke Anderson-Trocmé, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown

Published: March 5, 2025

Genetic relatedness is a central concept in genetics, underpinning studies of population and quantitative genetics in human, animal, and plant settings. It is typically stored as a genetic relatedness matrix (GRM), whose elements are pairwise relatedness values between individuals. This relatedness has been defined in various contexts based on pedigree, genotype, phylogeny, coalescent times, and, recently, the ancestral recombination graph (ARG). ARG-based GRMs have been found to better capture population structure and improve association studies relative to the genotype GRM. However, calculating ARG-based GRMs and performing further operations with them is fundamentally challenging due to their inherent quadratic time and space complexity. Here, we first discuss the different definitions of relatedness in a unifying context, making use of the additive model of a quantitative trait to provide a definition of "branch relatedness" and the corresponding "branch GRM". We explore the relationship between branch relatedness and pedigree relatedness through a case study of French-Canadian individuals that have a known pedigree. Through the tree sequence encoding of an ARG, we then derive an efficient algorithm for computing products between the branch GRM and a general vector, without explicitly forming the branch GRM. This algorithm leverages the sparse encoding of genomes with the tree sequence and hence enables large-scale computations with the branch GRM. We demonstrate the power of this algorithm by developing a randomized principal components algorithm for tree sequences that easily scales to millions of genomes. All algorithms are implemented in the open source tskit Python package. Taken together, this work consolidates the different notions of relatedness as branch relatedness, and by leveraging the tree sequence encoding of an ARG it provides efficient algorithms that enable computations with the branch GRM that scale to mega-scale genomic datasets.

Language: English

Citations: 1

A whole-genome scan for evidence of recent positive and balancing selection in aye-ayes (Daubentonia madagascariensis) utilizing a well-fit evolutionary baseline model
Vivak Soni, John W. Terbot, Cyril J. Versoza, et al.

Published: Nov. 11, 2024

The aye-aye (

Language: English

Citations: 8

Inferring demographic and selective histories from population genomic data using a two-step approach in species with coding-sparse genomes: an application to human data
Vivak Soni, Jeffrey D. Jensen

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 22, 2024

The demographic history of a population, and the distribution of fitness effects (DFE) of newly arising mutations in functional genomic regions, are fundamental factors dictating both genetic variation and evolutionary trajectories. Although both demographic and DFE inference have been performed extensively in humans, these approaches have generally either been limited to simple demographic models involving a single population, or, where a complex population history has been inferred, have not accounted for the potentially confounding effects of selection at linked sites. Taking advantage of the coding-sparse nature of the genome, we propose a 2-step approach in which coalescent simulations are first used to infer a complex multi-population demographic model, utilizing large non-functional regions that are likely free from the effects of background selection. We then use forward-in-time simulations to perform DFE inference in functional regions, conditional on the complex demography inferred and utilizing expected background selection effects in the estimation procedure. Throughout, recombination and mutation rate maps were used to account for the underlying empirical rate heterogeneity across the human genome. Importantly, within this framework it is possible to utilize and fit multiple aspects of the data, and this inference scheme represents a generalized approach for such large-scale inference in species with coding-sparse genomes.

Language: English

Citations: 6

A general and efficient representation of ancestral recombination graphs
Yan Wong, Anastasia Ignatieva, Jere Koskela, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: Nov. 4, 2023

As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance, and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.

Language: English

Citations: 15

Tree sequences as a general-purpose tool for population genetic inference
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 21, 2024

As population genetics data increases in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with existing data structures used for many population genetic inference methodologies, such as the use of convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to directly learn from tree sequence topology and node data, allowing for the use of neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be directly learned from using a GCN approach and can be used to perform well on these common population genetics inference tasks, with accuracies roughly matching or even exceeding that of a CNN-based method. As tree sequences become more widely used in population genetics research, we foresee developments and optimizations of this work to provide a foundation for population genetics inference moving forward.

Language: English

Citations: 4