Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests
Lukas Hübner, Alexandros Stamatakis

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: May 27, 2024

Abstract The field of population genetics attempts to advance our understanding of evolutionary processes. It has applications, for example, in medical research, in wildlife conservation, and, in conjunction with recent advances in ancient DNA sequencing technology, in studying human migration patterns over the past few thousand years. The basic toolbox of population genetics includes genealogical trees, which describe the shared evolutionary history among individuals of the same species. They are calculated on the basis of genetic variations. However, for recombining organisms, a single tree is insufficient to describe the evolutionary history of the whole genome. Instead, a collection of correlated trees can be used, where each tree describes the evolutionary history of a consecutive region of the genome. The current corresponding state-of-the-art data structure, tree sequences, compresses these genealogical trees via edit operations when moving from one tree to the next along the genome, instead of storing the full, often redundant, description of each tree. We propose a new data structure, genealogical forests, which compresses the set of genealogical trees into a DAG. In this DAG, identical subtrees that are shared across the input trees are encoded only once, thereby allowing for straightforward memoization of intermediate results. Additionally, we provide a C++ implementation of the proposed data structure, called gfkit, which is 2.1 to 11.2 (median 4.0) times faster than the state-of-the-art tool on empirical and simulated datasets at computing important population genetics statistics such as the Allele Frequency Spectrum, Patterson's f, the Fixation Index, Tajima's D, pairwise Lowest Common Ancestors, and others. On Lowest Common Ancestor queries with more than two samples as input, gfkit scales asymptotically better than the state-of-the-art, and is thus up to 990 times faster. In conclusion, our proposed data structure compresses genealogical trees by storing shared subtrees only once, thereby enabling straightforward memoization of intermediate results, yielding a substantial runtime reduction and a potentially more intuitive data representation than the state-of-the-art. Our improvements will boost the development of novel analyses and models in the field of population genetics and increase scalability to ever-growing genomic datasets. 2012 ACM Subject Classification: Applied computing → Computational genomics; Molecular sequence analysis; Bioinformatics; Population genetics
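To make the memoization idea concrete, here is a minimal Python sketch written under our own assumptions; it is not gfkit's C++ data layout or API. Local trees are given as nested tuples (an int is a sample), identical subtrees are interned once in a DAG by keying each node on its sorted child ids, and the number of samples below each distinct subtree is cached. Assuming each mutation sits above a single node (infinite sites), that cached count is the mutation's derived-allele count, which is what an unfolded Allele Frequency Spectrum tallies. The class and function names are hypothetical.

```python
class GenealogicalForestDAG:
    """Toy DAG in which identical subtrees from different local trees are stored once."""

    def __init__(self):
        self._ids = {}              # canonical subtree key -> node id
        self.children = []          # node id -> sorted tuple of child ids (empty for a leaf)
        self._count_cache = {}      # node id -> memoized number of samples below it
        self.count_evaluations = 0  # how many sample counts were actually computed

    def add(self, tree):
        """Insert a nested-tuple tree (an int is a sample leaf) and return its node id.

        Re-adding a subtree that is already present returns the existing id.
        """
        if isinstance(tree, int):
            key, kids = ("leaf", tree), ()
        else:
            # Sorting child ids makes the key independent of child order.
            kids = tuple(sorted(self.add(sub) for sub in tree))
            key = ("node", kids)
        if key not in self._ids:
            self._ids[key] = len(self.children)
            self.children.append(kids)
        return self._ids[key]

    def sample_count(self, node):
        """Number of samples below `node`; each distinct subtree is evaluated once."""
        if node not in self._count_cache:
            self.count_evaluations += 1
            kids = self.children[node]
            self._count_cache[node] = 1 if not kids else sum(self.sample_count(k) for k in kids)
        return self._count_cache[node]


def allele_frequency_spectrum(dag, mutated_nodes, num_samples):
    """Unfolded AFS: entry i counts mutations whose derived allele is carried by i samples."""
    afs = [0] * (num_samples + 1)
    for node in mutated_nodes:
        afs[dag.sample_count(node)] += 1
    return afs


# Two local trees over samples 0-4 that share the subtrees (0, 1) and ((0, 1), 2).
dag = GenealogicalForestDAG()
root_a = dag.add((((0, 1), 2), (3, 4)))
root_b = dag.add(((((0, 1), 2), 3), 4))
print(len(dag.children))  # 11 distinct subtrees, versus 18 nodes in the two trees
print(dag.sample_count(root_a), dag.sample_count(root_b))  # 5 5
print(dag.count_evaluations)  # 11: shared subtrees were not recounted
# Hypothetical mutations above (0, 1), above ((0, 1), 2) and above sample 4.
mutated = [dag.add((0, 1)), dag.add(((0, 1), 2)), dag.add(4)]
print(allele_frequency_spectrum(dag, mutated, num_samples=5))  # [0, 1, 1, 1, 0, 0]
```

Interning on sorted child ids means a subtree shared by many local trees costs one node and one count evaluation; the price is that keys must be canonical, which is why child order is normalised before lookup.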

Language: English

The era of the ARG: An introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics
Alexander L. Lewanski, Michael C. Gründler, Gideon S. Bradburd, et al.

PLoS Genetics, 2024, 20(1), e1011110

Published: January 18, 2024

In the presence of recombination, the evolutionary relationships between a set of sampled genomes cannot be described by a single genealogical tree. Instead, the genomes are related by a complex, interwoven collection of genealogies formalized in a structure called an ancestral recombination graph (ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is replete with valuable information for addressing diverse questions in evolutionary biology. Despite its potential utility, technological and methodological limitations, along with a lack of approachable literature, have severely restricted awareness and application of ARGs in evolution research. Excitingly, recent progress in ARG reconstruction and simulation has made ARG-based approaches feasible for many systems. In this review, we provide an accessible introduction to and exploration of ARGs, survey recent methodological breakthroughs, and describe the potential for ARGs to further existing research goals and to open avenues of inquiry that were previously inaccessible in evolutionary genomics. Through this discussion, we aim to more widely disseminate the promise of ARGs in evolutionary genomics and to encourage the broader development and adoption of ARG inference.

Language: English

Cited by: 35

Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent
Kevin Korfmann, Thibaut Sellinger, Fabian Freund, et al.

Peer Community Journal, 2024, vol. 4

Published: March 18, 2024

The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring that is negligible compared to the total population size. Yet many plants, invertebrates, prokaryotes or fish exhibit neutrally skewed offspring distributions or strong selection events, in which a few individuals produce a number of offspring up to the same magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. The current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first give proof of the accuracy of our methods to estimate the multiple merger parameter and past demographic history using data simulated under the β-coalescent model. Secondly, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distributions from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference, but also the urgent need for more accurate ARG inference approaches.
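For readers new to multiple-merger coalescents, the sketch below computes merger rates under the standard Beta(2 - alpha, alpha)-coalescent, in which one particular group of k among b lineages merges at rate B(k - alpha, b - k + alpha) / B(2 - alpha, alpha), with B the Beta function. This is the textbook rate formula, assumed here to match the paper's parameterisation of alpha; it is not the SMβC or GNNcoal inference code, and the function names are hypothetical. As alpha approaches 2 the process approaches Kingman's coalescent, where only pairwise mergers occur.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def merger_rate(b, k, alpha):
    """Rate at which one particular group of k out of b lineages merges,
    assuming Lambda = Beta(2 - alpha, alpha) with 0 < alpha < 2."""
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


def total_merger_rates(b, alpha):
    """Rate of any k-merger (k = 2..b) among b lineages."""
    return {k: comb(b, k) * merger_rate(b, k, alpha) for k in range(2, b + 1)}


# Alpha close to 2: almost only pairwise mergers (Kingman-like).
# Smaller alpha: mergers of more than two lineages become a larger share of events.
for alpha in (1.99, 1.5, 1.1):
    rates = total_merger_rates(10, alpha)
    multiple = sum(r for k, r in rates.items() if k > 2) / sum(rates.values())
    print(f"alpha={alpha}: share of multiple mergers = {multiple:.3f}")
```

Working in log space via `lgamma` keeps the Beta-function ratio stable when 2 - alpha is close to zero.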

Language: English

Cited by: 11

A geographic history of human genetic ancestry
Michael C. Gründler, Jonathan Terhorst, Gideon S. Bradburd, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: March 29, 2024

Describing the distribution of genetic variation across individuals is a fundamental goal of population genetics. In humans, traditional approaches for describing population genetic variation often rely on discrete genetic ancestry labels, which, despite their utility, can obscure the complex, multi-faceted nature of human genetic history. These labels risk oversimplifying ancestry by ignoring its temporal depth and geographic continuity, and may therefore conflate notions of race, ethnicity, geography, and ancestry. Here, we present a method that capitalizes on the rich genealogical information encoded in genomic tree sequences to infer the geographic locations of the shared ancestors of a sample of sequenced individuals. We use this method to infer the geographic history of genetic ancestry of a set of human genomes sampled from Europe, Asia, and Africa, accurately recovering major population movements on those continents. Our findings demonstrate the importance of defining the spatial-temporal context of genetic ancestry when describing human genetic variation, and caution against the oversimplified interpretations of genetic data prevalent in contemporary discussions of race.

Language: English

Cited by: 9

A General Framework for Branch Length Estimation in Ancestral Recombination Graphs
Yun Deng, Yun S. Song, Rasmus Nielsen, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2025

Published: February 15, 2025

Inference of Ancestral Recombination Graphs (ARGs) is of central interest in the analysis of genomic variation. ARGs can be specified in terms of topologies and coalescence times. The coalescence times are usually estimated using an informative prior derived from coalescent theory, but this may generate biased estimates and also complicate downstream inferences based on ARGs. Here we introduce POLEGON, a novel approach for estimating branch lengths in ARGs which uses an uninformative prior. Using extensive simulations, we show that the method provides improved estimates of coalescence times and leads to more accurate estimates of effective population sizes under a wide range of demographic assumptions. It also improves other downstream inferences, including estimates of mutation rates. We apply the method to data from the 1000 Genomes Project to investigate population size histories and differential mutation signatures across populations. We also estimate coalescence times in the HLA region and find that they exceed 30 million years in multiple segments.

Language: English

Cited by: 1

Inferring the geographic history of recombinant lineages using the full ancestral recombination graph
Puneeth Deraje, James Kitchens, Graham Coop, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: April 14, 2024

Abstract Spatial patterns of genetic relatedness among samples reflect the past movements of their ancestors. Our ability to untangle this history has the potential to improve dramatically, given that we can now infer the ultimate description of genetic relatedness, the ancestral recombination graph (ARG). By extending spatial theory previously applied to trees, we generalize the common model of Brownian motion to full ARGs, thereby accounting for correlations in trees along a chromosome while efficiently computing likelihood-based estimates of the dispersal rate and of genetic ancestor locations, with associated uncertainties. We evaluate the model's ability to reconstruct spatial histories using individual-based simulations and unfortunately find a clear bias in the estimated ancestor locations. We investigate the causes of this bias, pinpointing a discrepancy between the model and the true spatial process at recombination events. This highlights a key hurdle in extending the ubiquitous and analytically tractable model of Brownian motion from trees to ARGs, which otherwise provides an efficient method for spatial inference, with uncertainties, using all the information available in the full ARG.

Language: English

Cited by: 6

Tree sequences as a general-purpose tool for population genetic inference
Logan S. Whitehouse, Dylan D. Ray, Daniel R. Schrider, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: February 21, 2024

ABSTRACT As population genetic data increase in size, new methods have been developed to store genetic information in efficient ways, such as tree sequences. These data structures are computationally and storage efficient, but are not interchangeable with the existing data structures used for many population genetic inference methodologies, such as convolutional neural networks (CNNs) applied to population genetic alignments. To better utilize these new data structures, we propose and implement a graph convolutional network (GCN) to learn directly from tree sequence topology and node data, allowing neural network applications without an intermediate step of converting tree sequences to population genetic alignment format. We then compare our approach to standard CNN approaches on a set of previously defined benchmarking tasks, including recombination rate estimation, positive selection detection, introgression detection, and demographic model parameter inference. We show that tree sequences can be learned from directly using a GCN approach and perform well on these common population genetic inference tasks, with accuracies roughly matching or even exceeding those of a CNN-based method. As tree sequences become more widely used in population genetic research, we foresee developments and optimizations of this work to provide a foundation for population genetic inference moving forward.
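A single graph-convolution layer of the general kind the abstract refers to can be sketched with the widely used symmetric-normalised propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). This is our assumption about a typical GCN layer, not the authors' architecture or feature set. Here tree-sequence nodes become graph vertices, parent-child edges are treated as undirected, and each node carries a hypothetical one-dimensional feature (its time). Pure Python is used so that the arithmetic is visible; a real model would use a deep-learning library.

```python
import math


def gcn_layer(num_nodes, edges, features, weights):
    """One graph-convolution layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), in pure Python.

    edges:    (parent, child) pairs from a tree, treated as undirected
    features: one feature vector per node (rows of H)
    weights:  weight matrix W, given as a list of rows
    """
    # Neighbour sets of A + I: every node is its own neighbour (self-loop).
    neighbours = [{i} for i in range(num_nodes)]
    for parent, child in edges:
        neighbours[parent].add(child)
        neighbours[child].add(parent)
    degree = [len(n) for n in neighbours]

    # Linear projection H W.
    out_dim = len(weights[0])
    projected = [
        [sum(h[k] * weights[k][d] for k in range(len(h))) for d in range(out_dim)]
        for h in features
    ]

    # Symmetric-normalised neighbourhood aggregation followed by ReLU.
    output = []
    for i in range(num_nodes):
        row = [0.0] * out_dim
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for d in range(out_dim):
                row[d] += norm * projected[j][d]
        output.append([max(0.0, v) for v in row])
    return output


# Hypothetical local tree: samples 0-2, node 3 = parent of (0, 1), node 4 = root.
edges = [(3, 0), (3, 1), (4, 3), (4, 2)]
node_times = [[0.0], [0.0], [0.0], [1.0], [2.0]]  # one feature per node: its time
weights = [[1.0, -1.0]]  # 1 input feature -> 2 hidden channels
hidden = gcn_layer(5, edges, node_times, weights)
for node, h in enumerate(hidden):
    print(node, [round(v, 3) for v in h])
```

Stacking such layers lets information from ancestors reach samples and vice versa, which is one way a network can consume tree topology without first converting it to an alignment.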

Language: English

Cited by: 4

Towards Pandemic-Scale Ancestral Recombination Graphs of SARS-CoV-2
Shing H. Zhan, Anastasia Ignatieva, Yan Wong, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2023

Published: June 8, 2023

Abstract Recombination is an ongoing and increasingly important feature of circulating lineages of SARS-CoV-2, challenging how we represent the evolutionary history of this virus and giving rise to new variants of potential public health concern by combining the transmission and immune evasion properties of different lineages. Detection of recombinant strains is challenging, with most methods looking for breaks between sets of mutations that characterise distinct lineages. In addition, many basic approaches fundamental to the study of viral evolution assume that recombination is negligible, in that a single phylogenetic tree can represent the genetic ancestry of the circulating strains. Here we present an initial version of sc2ts, a method to automatically detect recombinants in real time and to cohesively integrate them into a genealogy in the form of an ancestral recombination graph (ARG), which jointly records mutation, recombination and genetic inheritance. We infer two ARGs under different sampling strategies and study their properties. One contains 1.27 million sequences sampled up to June 30, 2021; the second is more sparsely sampled, consisting of 657K sequences sampled up to June 30, 2022. We find that both ARGs are highly consistent with known features of SARS-CoV-2 evolution, recovering the basic backbone phylogeny and mutational spectra, and recapitulating details of the majority of known recombinant lineages. Using the well-established and feature-rich tskit library, the ARGs can also be stored concisely and processed efficiently using standard Python tools. For example, one of the ARGs, encoding the inferred reticulate ancestry, genetic variation, and extensive metadata, requires 58MB of storage and loads in less than a second. The ability to fully account for the effects of recombination in downstream analyses, to quickly detect new recombinants, and to utilise an efficient and convenient platform for computation based on well-engineered technologies makes sc2ts a promising approach.

Language: English

Cited by: 9

Estimating evolutionary and demographic parameters via ARG-derived IBD
Zhen‐Dong Huang, Jerome Kelleher, Yao-ban Chan, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: March 12, 2024

Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting an efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) there is no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency by using only the IBD segments from a set of sequence pairs whose size scales linearly with sample size. We demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to that of a previously published estimator despite a 4,000-fold reduction in the data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
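The following sketch shows why an ARG makes IBD segments easy to read off without a length threshold. It assumes one simple definition for illustration: a pair of samples is IBD over a genomic interval whenever their most recent common ancestor (MRCA) is the same ARG node throughout it, so adjacent local trees with the same MRCA node are merged into one segment even if recombination changed the tree elsewhere. This is not the authors' code or definition, nor the tskit API; the plain-dict local trees and function names are hypothetical.

```python
def path_to_root(parent, node):
    """Nodes from `node` up to the root of one local tree (child -> parent dict)."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of samples a and b in one local tree."""
    above_a = set(path_to_root(parent, a))
    for node in path_to_root(parent, b):
        if node in above_a:
            return node
    return None  # a and b are not connected in this tree


def ibd_segments(local_trees, a, b):
    """Maximal intervals over which a and b share the same MRCA node.

    local_trees: (left, right, parent_dict) tuples, sorted and contiguous.
    Returns (left, right, mrca_node) tuples; no minimum-length threshold is applied.
    """
    segments = []
    for left, right, parent in local_trees:
        node = mrca(parent, a, b)
        if segments and segments[-1][2] == node and segments[-1][1] == left:
            segments[-1] = (segments[-1][0], right, node)  # extend the current segment
        else:
            segments.append((left, right, node))
    return segments


# Hypothetical ARG summarised as three local trees over [0, 100).
local_trees = [
    (0, 40, {0: 3, 1: 3, 3: 4, 2: 4}),
    (40, 70, {0: 3, 1: 3, 3: 5, 2: 5}),  # root changes, but 0 and 1 still meet at node 3
    (70, 100, {0: 6, 2: 6, 1: 7, 6: 7}),
]
print(ibd_segments(local_trees, 0, 1))  # [(0, 70, 3), (70, 100, 7)]
```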

Language: English

Cited by: 3

tstrait: a quantitative trait simulator for ancestral recombination graphs
Daiki Tagami, Gertjan Bisschop, Jerome Kelleher, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2024

Published: March 14, 2024

Abstract Summary: Ancestral recombination graphs (ARGs) encode the ensemble of correlated genealogical trees arising from recombination in a compact and efficient structure, and are of fundamental importance in population and statistical genetics. Recent breakthroughs have made it possible to simulate and infer ARGs at biobank scale, and there is now intense interest in using ARG-based methods across a broad range of applications, particularly in genome-wide association studies (GWAS). Sophisticated methods exist to simulate ARGs using population genetics models, but there is currently no software to simulate quantitative traits directly from these ARGs. To apply existing quantitative trait simulators, users must export genotype data, losing important information about the ancestral processes and producing prohibitively large files when applied to the biobank-scale datasets currently of interest in GWAS. We present tstrait, an open-source Python library to simulate quantitative traits on ARGs, and show how this user-friendly software can quickly simulate phenotypes for biobank-scale datasets on a laptop computer.

Availability and Implementation: tstrait is available for download on the Python Package Index. Full documentation with examples and workflow templates is available at https://tskit.dev/tstrait/docs/, and the development version is maintained on GitHub (https://github.com/tskit-dev/tstrait).
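tstrait's own interface is documented at the links above and is not reproduced here. As a hedged illustration of why simulating on the genealogy avoids exporting genotypes, the sketch below assumes a simple additive model on a single local tree: each causal mutation sits on a tree node, its effect size is drawn from a standard normal, a haploid sample's genetic value is the sum of effects on its path to the root, and Gaussian environmental noise is scaled so that genetic variance over total variance equals a target heritability h2. All function names and the one-effect-per-node simplification are our assumptions.

```python
import random
import statistics


def genetic_values(parent, samples, effects):
    """Additive genetic value of each haploid sample in one local tree.

    A sample carries every causal mutation that sits on a node on its path to the
    root, so no genotype matrix is ever built. effects maps node -> effect size.
    """
    values = {}
    for sample in samples:
        total, node = 0.0, sample
        while True:
            total += effects.get(node, 0.0)
            if node not in parent:
                break
            node = parent[node]
        values[sample] = total
    return values


def simulate_phenotypes(parent, samples, causal_nodes, h2, seed=None):
    """Phenotype = genetic value + Gaussian noise scaled to heritability h2 (0 < h2 <= 1)."""
    rng = random.Random(seed)
    effects = {node: rng.gauss(0.0, 1.0) for node in causal_nodes}
    g = genetic_values(parent, samples, effects)
    var_g = statistics.pvariance(list(g.values()))
    # Choose the environmental variance so that var_g / (var_g + var_e) = h2.
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    return {s: g[s] + rng.gauss(0.0, sd_e) for s in samples}


# Hypothetical tree: 4 samples, causal mutations above nodes 4 and 2.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
print(genetic_values(parent, [0, 1, 2, 3], {4: 1.0, 2: 0.5}))
# {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.0}
print(simulate_phenotypes(parent, [0, 1, 2, 3], causal_nodes=[4, 2], h2=0.5, seed=1))
```

Because each effect is propagated down the tree rather than stored per genotype, the work grows with the number of causal mutations and tree nodes, not with a samples-by-sites matrix.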

Language: English

Cited by: 3

The length of haplotype blocks and signals of structural variation in reconstructed genealogies
Anastasia Ignatieva, Martina Favero, Jere Koskela, et al.

bioRxiv (Cold Spring Harbor Laboratory), 2023

Published: July 11, 2023

Abstract Recent breakthroughs have enabled the inference of genealogies from large sequencing data-sets, accurately reconstructing local trees that describe genetic ancestry at each locus. These genealogies should also capture the correlation structure of local trees along the genome, reflecting historical recombination events and factors like demography and natural selection. However, whether reconstructed genealogies do this has not been rigorously explored. Addressing this is crucial, as regions that deviate from expectations can reveal new biological phenomena, such as suppression of recombination allowing linked selection over broad genomic regions, as observed in humans and in adaptive introgression in various species. We use a theoretical framework to characterise properties of genealogies, such as the distribution of the genomic spans of clades and edges, and demonstrate that our results match observations in simulated scenarios. Testing reconstructed genealogies from leading approaches, we find departures from expectations for all methods. For the method Relate, a set of simple corrections results in almost complete recovery of the target distributions. Applying these corrections to Relate genealogies of 2504 human genomes, we observe an excess of clades with unexpectedly long genomic spans (125 with p < 1 × 10^-12, clustering into 50 regions), indicating localised suppression of recombination. The strongest signal corresponds to a known inversion on chromosome 17, while the second represents a previously unknown inversion on chromosome 10, which is most common (21%) in South Asians and correlates with GWAS hits for a range of phenotypes, including immunological traits. Other signals suggest additional inversions (4), copy number changes (2), and complex rearrangements or other structural variants (12), as well as 28 regions with strong support but no clear classification. Our approach can be readily applied to other species, and shows that reconstructed genealogies offer untapped potential to study structural variation and its impacts at the population level, revealing phenomena impacting evolution.
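One of the quantities characterised above, the genomic span of a clade, can be illustrated with a small sketch. It assumes that a clade is identified by the set of samples below a node and that its span is the length of a maximal run of adjacent local trees that all contain that sample set; trivial clades (single samples and the full sample set) are ignored. This is a simplified illustration, not the authors' framework or Relate's implementation, and the function names and plain-dict trees are hypothetical.

```python
def nontrivial_clades(parent, samples):
    """Sample sets below each internal node of one local tree, excluding
    singletons and the clade containing all samples."""
    below = {}
    for s in samples:
        node = s
        while True:
            below.setdefault(node, set()).add(s)
            if node not in parent:
                break
            node = parent[node]
    return {frozenset(c) for c in below.values() if 1 < len(c) < len(samples)}


def clade_spans(local_trees, samples):
    """Genomic span of each maximal run of adjacent local trees containing a clade.

    local_trees: (left, right, parent_dict) tuples, sorted and contiguous.
    Returns (clade, span) pairs; a clade that disappears and reappears yields two spans.
    """
    run_start = {}  # clade -> left coordinate where its current run began
    spans = []
    previous_right = None
    for left, right, parent in local_trees:
        current = nontrivial_clades(parent, samples)
        for clade in list(run_start):
            if clade not in current:  # the run ended at the previous breakpoint
                spans.append((clade, previous_right - run_start.pop(clade)))
        for clade in current:
            run_start.setdefault(clade, left)
        previous_right = right
    for clade, start in run_start.items():  # close runs that reach the end
        spans.append((clade, previous_right - start))
    return spans


local_trees = [
    (0, 30, {0: 3, 1: 3, 3: 4, 2: 4}),    # clade {0, 1}
    (30, 50, {0: 3, 2: 3, 3: 4, 1: 4}),   # clade {0, 2}
    (50, 100, {0: 5, 1: 5, 5: 6, 2: 6}),  # clade {0, 1} again
]
for clade, span in clade_spans(local_trees, [0, 1, 2]):
    print(sorted(clade), span)
# [0, 1] 30
# [0, 2] 20
# [0, 1] 50
```

Comparing the empirical distribution of such spans against a theoretical expectation is what lets unusually long spans, such as those produced by suppressed recombination, stand out.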

Language: English

Cited by: 8