Deep Learning in Population Genetics DOI Creative Commons
Kevin Korfmann, Oscar E. Gaggiotti, Matteo Fumagalli

et al.

Genome Biology and Evolution, Journal Year: 2023, Volume and Issue: 15(2)

Published: Jan. 23, 2023

Abstract Population genetics is transitioning into a data-driven discipline thanks to the availability of large-scale genomic data and the need to study increasingly complex evolutionary scenarios. With likelihood and Bayesian approaches becoming either intractable or computationally unfeasible, machine learning, and in particular deep learning, algorithms are emerging as popular techniques for population genetic inferences. These approaches rely on algorithms that learn non-linear relationships between the input data and the model parameters being estimated through representation learning from training data sets. Deep learning algorithms currently employed in the field comprise discriminative and generative models with fully connected, convolutional, or recurrent layers. Additionally, a wide range of powerful simulators to generate training data under complex scenarios are now available. The application of deep learning to empirical data sets mostly replicates previous findings of demography reconstruction and signals of natural selection in model organisms. To showcase the feasibility of deep learning to tackle new challenges, we designed a branched architecture to detect signals of recent balancing selection from temporal haplotypic data, which exhibited good predictive performance on simulated data. Investigations on the interpretability of neural networks, their robustness to uncertain training data, and creative representation of population genetic data will provide further opportunities for technological advancements in the field.

Language: English

The complete sequence of a human genome DOI
Sergey Nurk, Sergey Koren, Arang Rhie

et al.

Science, Journal Year: 2022, Volume and Issue: 376(6588), P. 44 - 53

Published: March 31, 2022

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Language: English

Citations

2196

A draft human pangenome reference DOI Creative Commons
Wen‐Wei Liao, Mobin Asri, Jana Ebler

et al.

Nature, Journal Year: 2023, Volume and Issue: 617(7960), P. 312 - 324

Published: May 10, 2023

Abstract Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Language: English

Citations

589

Complete genomic and epigenetic maps of human centromeres DOI
Nicolas Altemose, Glennis A. Logsdon, Andrey V. Bzikadze

et al.

Science, Journal Year: 2022, Volume and Issue: 376(6588)

Published: March 31, 2022

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Language: English

Citations

379

A complete reference genome improves analysis of human genetic variation DOI
Sergey Aganezov, Stephanie M. Yan, Daniela C. Soto

et al.

Science, Journal Year: 2022, Volume and Issue: 376(6588)

Published: March 31, 2022

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

Language: English

Citations

275

The complete sequence of a human Y chromosome DOI
Arang Rhie, Sergey Nurk, Monika Čechová

et al.

Nature, Journal Year: 2023, Volume and Issue: 621(7978), P. 344 - 354

Published: Aug. 23, 2023

Language: English

Citations

238

A cross-disorder dosage sensitivity map of the human genome DOI Creative Commons
Ryan L. Collins, Joseph Glessner, Eleonora Porcu

et al.

Cell, Journal Year: 2022, Volume and Issue: 185(16), P. 3041 - 3055.e25

Published: Aug. 1, 2022

Language: English

Citations

228

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes DOI Creative Commons
Jana Ebler, Peter Ebert, Wayne E. Clarke

et al.

Nature Genetics, Journal Year: 2022, Volume and Issue: 54(4), P. 518 - 525

Published: April 1, 2022

Abstract Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.

Language: English

Citations

165

Genomic architecture of autism from comprehensive whole-genome sequence annotation DOI Creative Commons
Brett Trost, Bhooma Thiruvahindrapuram, Ada J. S. Chan

et al.

Cell, Journal Year: 2022, Volume and Issue: 185(23), P. 4409 - 4427.e18

Published: Nov. 1, 2022

Language: English

Citations

157

Protein-coding repeat polymorphisms strongly shape diverse human phenotypes DOI
Ronen E. Mukamel, Robert E. Handsaker, Maxwell A. Sherman

et al.

Science, Journal Year: 2021, Volume and Issue: 373(6562), P. 1499 - 1505

Published: Sept. 23, 2021

Repeats associated with phenotype. The degree to which repeated sequences within a genome affect human phenotypes has been difficult to establish. Mukamel et al. examined thousands of genomes in the UK Biobank and found that some of the largest effects of common genetic variants on human phenotypes, including those of clinical relevance, arise from protein-coding repeat polymorphisms (see the Perspective by Gymrek and Goren). Mapping the size and copy number of these protein domains links repeat variation to lipoprotein(a) concentration, height, and male pattern balding. Furthermore, repeat allele frequencies differ between individuals of African and European descent, resulting in differences between populations of relevance for clinical traits such as lipoprotein(a) levels, a risk factor for coronary artery disease. (LMZ)

Language: English

Citations

136

Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders DOI Creative Commons
David Porubský, Wolfram Höps, Hufsah Ashraf

et al.

Cell, Journal Year: 2022, Volume and Issue: 185(11), P. 1986 - 2005.e26

Published: May 1, 2022

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.

Language: English

Citations

113