A de novo evolved gene contributes to rice grain shape difference between indica and japonica
Rujia Chen, Ning Xiao, Yue Lu

et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: Sept. 22, 2023

The role of de novo evolved genes from non-coding sequences in regulating morphological differentiation between species/subspecies remains largely unknown. Here, we show that a rice gene, GSE9, contributes to the grain shape difference between indica/xian and japonica/geng varieties. GSE9 evolves from a previously non-coding region of wild rice Oryza rufipogon through the acquisition of a start codon. This gene is inherited by most japonica varieties, while the original sequence (absence of the start codon, gse9) is present in the majority of indica varieties. Knockout of GSE9 in japonica varieties leads to slender grains, whereas introgression of GSE9 into an indica background results in round grains. Population evolutionary analyses reveal that GSE9 and gse9 are derived from the wild rice Or-I and Or-III groups, respectively. Our findings uncover the contribution of a de novo evolved gene to the genetic divergence between the subspecies and provide a target for the precise manipulation of grain shape.
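The abstract attributes GSE9's origin to the gain of a start codon in a previously non-coding region. The toy sketch below illustrates why a single start-codon gain can create a coding sequence. It scans the forward strand for the longest ATG-initiated open reading frame (ORF). The sequences and the helper name `longest_orf` are hypothetical. Real de novo gene birth also depends on transcription and translation, which this code does not model.

```python
# Toy sketch: how gaining an ATG start codon can turn non-coding
# sequence into an open reading frame (ORF). Sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF on the forward strand ("" if none)."""
    seq = seq.upper()
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in-frame codons from the start codon until the first stop.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orf = seq[start:pos + 3]
                if len(orf) > len(best):
                    best = orf
                break
    return best


ancestral = "CCACGGCTGCTTAAGG"  # hypothetical allele without a start codon
derived = "CCATGGCTGCTTAAGG"    # one C->T change creates an ATG
print(repr(longest_orf(ancestral)))  # ''
print(longest_orf(derived))          # ATGGCTGCTTAA
```

In this toy case the ancestral allele has no ORF at all. The single substitution produces a 12-nt ORF that is terminated by TAA.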

Language: English

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads
Jiang Hu, Zhuo Wang, Zongyi Sun

et al.

Genome Biology, Journal Year: 2024, Volume and Issue: 25(1)

Published: April 26, 2024

Long-read sequencing data, particularly those derived from the Oxford Nanopore platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly.

Language: English

Citations

148

An efficient error correction and accurate assembly tool for noisy long reads
Jiang Hu, Zhuo Wang, Zongyi Sun

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2023, Volume and Issue: unknown

Published: March 12, 2023

Long-read sequencing data, particularly those derived from the Oxford Nanopore (ONT) platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. NextDenovo can rapidly correct reads; these corrected reads contain fewer errors than those from other comparable tools and are characterized by fewer chimeric alignments. We applied NextDenovo to assemble high-quality reference genomes of 35 diverse humans from across the world using ONT data. Based on these de novo assemblies, we were able to identify the landscape of segmental duplications and gene copy number variation in the modern human population. The use of this program should pave the way for population-scale long-read assembly, thereby facilitating the construction of pan-genomes,
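The abstract does not describe how NextDenovo corrects reads. The sketch below illustrates the general idea behind consensus-based correction of noisy long reads. It assumes the reads are already aligned to one another as equal-length gapped rows, and it keeps the majority symbol in each column. The function name and input format are hypothetical. This plain majority vote is not NextDenovo's algorithm.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over reads that are already aligned.

    aligned_reads: equal-length strings, one per read, using `gap` for
    alignment gaps. Columns where the gap symbol wins are dropped, so an
    insertion supported by only a minority of reads is removed. Ties go to
    the symbol seen first in that column (Counter keeps insertion order).
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for col in range(width):
        symbol, _ = Counter(r[col] for r in aligned_reads).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Three hypothetical noisy copies of one region: read 2 carries a
# spurious inserted T, read 3 a C->G substitution.
reads = ["ACGT-A", "ACGTTA", "AGGT-A"]
print(majority_consensus(reads))  # ACGTA
```

Each error here appears in only one of three reads, so the vote removes it. Real correctors must first build the alignment and must cope with low local coverage.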

Language: English

Citations

103

A complete assembly of the rice Nipponbare reference genome

Lianguang Shang, Wenchuang He, Tianyi Wang

et al.

Molecular Plant, Journal Year: 2023, Volume and Issue: 16(8), P. 1232 - 1236

Published: Aug. 1, 2023

In 2005, the currently used rice reference genome (Oryza sativa ssp. japonica cv. Nipponbare) was initially released by the International Rice Genome Sequencing Project (International Rice Genome Sequencing Project, 2005). The genome was then updated in 2013 with an improved assembly (IRGSP-1.0) and gene annotations (MSU7, RAP-DB) (Kawahara et al., 2013; Sakai et al., 2013). Over the past 10 years, this reference genome has served as one of the most important genetic resources for subsequent functional genomics efforts. Several rice genomes have since been assembled into gapless chromosomes with only 2–5 telomeres absent (Li et al., 2021; Song et al., 2021; Zhang et al., 2022). Even so, IRGSP-1.0 and its annotations are still widely used as the reference.
However, limitations in sequencing technology and the intricate genomic organization led to the under-representation of complex regions in the reference. This left a total of 72 major gaps (including 19 telomeres), 167 minor gaps and 779 unknown bases, so that an estimated ∼3% of the genome remained unsolved. To pursue a complete foundational genome, we applied a strategy that integrated PacBio HiFi and Oxford Nanopore Technology (ONT) ultra-long reads to generate original contigs. These contigs were then scaffolded to chromosome level with the support of a Hi-C dataset. Gap filling and terminal extension were conducted to resolve the remaining seven gaps and telomere regions within scaffolds. All gap-closure regions were supported by uniform coverage of ONT reads (Supplemental Figure 1). A large rDNA array of nearly identical 45S rDNA repeats was identified beside the short arm of chromosome 9 (Supplemental Figure 2). It was artificially filled with consecutive blocks reflecting their copy number (see supplemental materials and methods). This array captured 93.8% and 93.9% of [...] reads containing full-length [...] in mapping, but should be treated as model sequences. Following polishing with Illumina paired-end (PE) next-generation sequencing (NGS) reads, we produced T2T-NIP (version AGIS-1.0), in which all 12 centromeres and 24 telomeres are resolved (Figure 1A). Multiple strategies were used to evaluate the accuracy and completeness of T2T-NIP. All available primary data, including HiFi, ONT, NGS and Hi-C reads, were remapped, with high mapping rates (>99.6%) for all datasets except one (93.1%). [...] was displayed across the whole dataset, because [...] centromeres near two [...] (Figure 1B). Chromatin immunoprecipitation sequencing (ChIP-seq) with a CENH3 antibody was used to identify the location of each centromere (Figure 1A, Supplemental Table 1 and Supplemental Figure 3). The CentO-enriched regions also showed homology to the 155–165-bp CentO satellite repeats (Figure 1A and Supplemental Table 1). Eight centromeres showed sizes similar to or consistent with a previous report based on fluorescence in situ hybridization (Cheng et al., 2002). The consensus accuracy was approximately one error per 5 million bases (Q63), much higher than that of IRGSP-1.0 (Supplemental Table 2).
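Consensus accuracy figures like the one above are conventionally reported on the Phred scale, QV = −10 log10(p), where p is the per-base error rate. On this scale Q40 means about one error per 10 000 bases and Q60 about one error per million. The conversion helpers below are a generic sketch with hypothetical names, not the evaluation tool used for T2T-NIP.

```python
import math


def qv_from_error_rate(p):
    """Phred-scaled quality value: QV = -10 * log10(p)."""
    if not 0 < p < 1:
        raise ValueError("per-base error rate must be in (0, 1)")
    return -10 * math.log10(p)


def error_rate_from_qv(qv):
    """Inverse conversion: p = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def qv_from_counts(errors, assembly_bases):
    """QV from an estimated number of erroneous bases in an assembly."""
    return qv_from_error_rate(errors / assembly_bases)


print(round(qv_from_error_rate(1e-6), 6))  # 60.0
print(round(1 / error_rate_from_qv(40)))   # 10000
```

Because the scale is logarithmic, each additional 10 QV units corresponds to a tenfold reduction in the per-base error rate.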
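For gene content assessment, 99.88% of the 1614 BUSCO genes in the set were found (Supplemental Table 3), equal to or higher than previously reported. T2T-NIP contains 1747 ribosomal RNA (rRNA) genes, whereas IRGSP-1.0 contains only a few hundred. A total of 57 359 protein-coding genes and 325 794 transposable elements (51.1% of the genome) were identified, both more than in IRGSP-1.0 (Supplemental Tables 4 and 5). In the rDNA array, 1022 genes were annotated using transcriptome data (Supplemental Table 6). Among the 314 genes in gap-filling regions (excluding the rDNA array), 142 were confirmed to be expressed, some in a tissue-specific manner (Supplemental Figure 4). T2T-NIP reached 385.7 million base pairs (Mbp), with abundant improvements compared with the prior assembly (Supplemental Tables 4–6). Compared with IRGSP-1.0, T2T-NIP contains 12.5 Mbp of newly assembled sequence. This new sequence comprises rDNA arrays (33.2%), pericentromeric and centromeric regions (32.1%), [...] (27.1%) and subtelomeric regions (5.1%), which are necessary for fundamental cellular processes (Figure 1C–1E). Some of the largest gaps covered [...] nine chromosomes, and the telomeric repetitive sequences of three chromosomes were represented by unresolved sequences (Supplemental Table 7). In addition to these apparent gaps, other gap regions were found to be artificial or otherwise incorrect (Supplemental Table 8). We investigated possible misassemblies in the 500-kb regions flanking the gaps. Regions adjacent to gaps far from centromeres (39/44) showed excellent synteny. In contrast, almost all regions close to centromeres (11/12) contained additional extensive structural differences, such as deletions and inversions longer than 20 kb (Figure 1D). Additionally, [...] could [...] well, resulting in continuous [...] 100–117 [...] (Figure 1D). These results show that T2T-NIP is a significant update. It resolves misassembled structures that were probably caused by gaps, and it removes a long-standing barrier by opening the hidden 3% of the genome to sequence-based analysis. Therefore, it is [...] to describe an initial analysis of the truly complete genome and to discuss its potential applications. We have a rich collection of omics data, including gene models, transposable element (TE) annotations, sequencing data and methylation datasets, presented online (http://www.ricesuperpir.com/web/nip). To highlight the utility of these resources, we demonstrate examples of duplicated [...] 11 associated with gaps. AGIS_Os10g035850 (denoted LOC_Os10g43075 in IRGSP-1.0/MSU7) traversed a gap boundary on chromosome 10. As a result, its previous annotation was incomplete, covering 76.3% of the entire gene, and some exons were misannotated in that version. T2T-NIP thus allows correction of the gene model, with six new [...] each [...] splicing alternatives. Most TE-related genes have multiple copies (paralogs) in gap sequences, which always complicates analysis. When NGS reads are aligned to IRGSP-1.0, the absence of these paralogs causes reads to align incorrectly to LOC_Os11g12240 (AGIS_Os11g010790), producing many false-positive variants (Figure 1F).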
Reads mapped [...] show the expected typical heterozygous variation pattern in a small region. Any such paralogs, and others like them, will be overlooked when [...], which highlights the importance of the T2T-NIP release. To investigate how the reference affects short-read variant calling, we collected 230 cultivated (O. sativa) and wild (O. rufipogon) rice accessions from our previous study (Shang et al., 2022). These accessions belong to the following populations: Xian/indica (XI), Geng/japonica (GJ), Aus (cA), [...]. The same variant-calling pipeline was used for both references to eliminate interference from software and parameters. On average, BWA-MEM aligned 1.04 × 10⁷ (6.9%) more properly paired reads [...]. Interestingly, even though [...], the per-read mismatch rate was 1.2%–8.2% lower across populations (Figure 1G). Similarly, T2T-NIP improved alignment characteristics, reducing misoriented reads (Figure 1H) and improving coverage uniformity (Figure 1I). Within [...] regions, we noted a 2.0%–4.3% decrease in the standard deviation of coverage, and this decrease was similar among population groups (Figure 1I). From these alignments, 741 895 221 high-quality single-nucleotide variants and indels were called relative to T2T-NIP (per-sample mean, 3 225 631), compared with 744 667 800 relative to IRGSP-1.0 (per-sample mean, [...] 237 686). [...] shared [...] called in individual [...] (Supplemental [...] 6 and 9). Along with the improvement in [...] rate, we attribute the reduction in per-sample calls to fewer errors, especially through the resolution of [...]; [...] sample decreased, largely in homozygous [...], with a slight increase in GJ, [...] superiority [...] accurate [...] reads. Next, we examined the effects on structural variant (SV) calling using published long reads. Alignment to T2T-NIP reduced [...] observed (Figure 1J) [...] (Figure 1K) across populations. [...] corrected errors facilitated alignment, [...] (Supplemental Figure S10). As a result, fewer SVs (from −16.3% to −4.6%) were called in the different populations against T2T-NIP than against IRGSP-1.0. As with the variants above, those [...] (Supplemental [...] 7) were likely due to rare [...]. Using supplementary phenotype data, we performed genome-wide association studies (GWASs) to assess efficiency, and 101 SNPs associated with five agronomic traits were detected [...]. For example, a pleiotropic locus related to yield per plant 1 (qYPP1) was significantly associated with grain [...] and plant height [...] not [...] (Figure 1L–1M). Gene-editing experiments and screening revealed [...] between wild-type plants and plants carrying a loss-of-function mutation in OsAGPL2, which encodes a subunit of ADP-glucose pyrophosphorylase (Figure 1N). The favorable haplotype showed [...] (44.7 ± 11.8 g) [...] haplotypes (Figure 1O).
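T2T-NIP-specific [...] width [...] enhanced mining [...]. In summary, the T2T-NIP assembly addresses missing information and represents a [...] resource. It introduced ∼12.5 Mbp of new sequence and 1324 gene predictions, which include rDNA arrays, [...] and subtelomeres, unlocking [...] variational studies. The raw data have been deposited in the National Center for Biotechnology Information under project accession PRJNA953663 and in the National Genomics Data Center under PRJCA018610. The genome browser can easily be accessed on the website. This research was supported by the National Natural Science Foundation of China (32188102, 32101718), the Guangdong Basic and Applied Basic Research Foundation (2023B1515020053), the Youth Innovation Program of the Chinese Academy of Agricultural Sciences (Y20230C36) and the specific research fund of the Innovation Platform for Academicians of Hainan Province (YSPTZX202303).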

Language: English

Citations

88

Integrated Genomic Selection for Accelerating Breeding Programs of Climate-Smart Cereals
Dwaipayan Sinha, Arun Kumar Maurya, Gholamreza Abdi

et al.

Genes, Journal Year: 2023, Volume and Issue: 14(7), P. 1484 - 1484

Published: July 21, 2023

Rapidly rising population and climate change are two critical issues that require immediate action to achieve sustainable development goals. The rising population is increasing the demand for food, thereby pushing for an acceleration in agricultural production. Furthermore, anthropogenic activities have resulted in environmental pollution such as water pollution and soil degradation. They have also altered the composition and concentration of environmental gases. These changes are causing not only biodiversity loss but also disturbances in the physio-biochemical processes of crop plants, resulting in a stress-induced decline in crop yield. To overcome these problems and ensure the supply of food, consistent efforts are being made to develop strategies and techniques that increase crop yield and enhance tolerance toward climate-induced stress. Plant breeding evolved after domestication and initially depended on phenotype-based selection for crop improvement. It has since grown through cytological and biochemical methods, and the newer contemporary methods rely on DNA-marker-based strategies that help select agronomically useful traits. These methods are now supported by high-end molecular biology tools such as PCR, high-throughput genotyping and phenotyping, data from crop morpho-physiology, statistical tools, bioinformatics and machine learning. Genomic selection (GS), an improved variant of marker-assisted selection (MAS), first established its worth in animal breeding. It has now made its way into crop-breeding programs as a powerful selection tool. GS makes use of genome-wide markers to build novel and innovative models for genetic evaluation. GS can improve complex traits and shorten the breeding period, which gives it an advantage over pedigree-based selection and MAS. It reduces the time and resources required for plant breeding while allowing greater genetic gain for complex attributes. GS has been taken to new heights by integrating advanced technologies such as speed breeding, machine learning and environmental/weather data to further harness its potential, an approach known as integrated genomic selection (IGS). This review highlights IGS strategies, procedures, integrated approaches and associated emerging issues, with special emphasis on cereal crops. In this domain, we highlight the potential of this cutting-edge innovation to develop climate-smart crops that can endure abiotic stresses, with the aim of keeping production and quality on par with global food demand.
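Genomic selection, as summarized above, predicts breeding values from genome-wide markers. The review does not prescribe a specific model. One widely used baseline is ridge regression on marker genotypes (the idea behind RR-BLUP), which shrinks all marker effects toward zero. The pure-Python sketch below estimates marker effects by solving the penalized normal equations and then predicts genomic estimated breeding values (GEBVs). The function names and toy data are hypothetical.

```python
# Minimal ridge-regression genomic selection sketch (RR-BLUP-style idea).
# Function names and toy data are hypothetical.


def ridge_marker_effects(genotypes, phenotypes, lam):
    """Estimate additive marker effects by ridge regression.

    genotypes:  n x p rows of marker codes (e.g. 0/1/2 allele copies)
    phenotypes: n trait values
    lam:        ridge penalty; larger values shrink effects toward 0
    Returns (intercept, effects) for use on uncentred genotype codes.
    """
    n, p = len(genotypes), len(genotypes[0])
    # Centre markers and phenotypes so the intercept is not penalised.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    X = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [v - y_mean for v in phenotypes]
    # Augmented system [X'X + lam*I | X'y].
    A = [
        [sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
         for b in range(p)]
        + [sum(X[i][a] * y[i] for i in range(n))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        scale = A[c][c]
        A[c] = [v / scale for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    effects = [A[j][p] for j in range(p)]
    intercept = y_mean - sum(m * b for m, b in zip(col_means, effects))
    return intercept, effects


def predict_gebv(genotypes, intercept, effects):
    """GEBV = intercept + sum(marker code * marker effect)."""
    return [intercept + sum(g * b for g, b in zip(row, effects)) for row in genotypes]


# Noise-free toy training set generated as y = 1 + 2*m1 - 1*m2.
train_g = [[0, 1], [1, 0], [2, 2], [1, 1]]
train_y = [0.0, 3.0, 3.0, 2.0]
b0, b = ridge_marker_effects(train_g, train_y, lam=0.0)
print(round(b0, 6), [round(x, 6) for x in b])  # 1.0 [2.0, -1.0]
print(predict_gebv([[2, 0]], b0, b))           # [5.0]
```

With `lam=0.0` the model recovers the generating effects exactly. With a positive penalty and many more markers than lines, which is the usual GS setting, the penalty keeps the system solvable and stabilizes the estimates.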

Language: English

Citations

56

A pangenome analysis pipeline provides insights into functional gene identification in rice
Jian Wang, Yang Wu, Shaohong Zhang

et al.

Genome Biology, Journal Year: 2023, Volume and Issue: 24(1)

Published: Jan. 26, 2023

Background: A pangenome aims to capture the complete genetic diversity within a species and to reduce the bias in genetic analysis that is inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation, but bioinformatics tools for analyzing graph genomes are lacking. Results: To address this problem, we develop a strategy for pangenome construction and a downstream analysis pipeline (PSVCP) that captures variants' position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome from 12 representative rice genomes. We then analyze an international rice panel of 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 that is undetected by a SNP-based genome-wide association study (GWAS). Conclusions: We demonstrate that the pangenome constructed by our pipeline, combined with a presence and absence variation-based GWAS, can provide additional power for genetic analysis. The pangenome, genome sequences and variant data are valuable resources for rice genomics research and improvement in the future.
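A presence/absence variation (PAV) GWAS tests whether carrying a sequence is associated with a trait. At its simplest, this compares trait means between accessions that have the sequence and those that lack it. The sketch below does this for one hypothetical PAV with a Welch t statistic. A real GWAS, including whatever PSVCP feeds into, would also correct for population structure and kinship, typically in a mixed model. The function name and data are hypothetical.

```python
# Toy presence/absence variation (PAV) association sketch. Names and data
# are hypothetical; real PAV-GWAS uses mixed models with kinship/structure.
from math import sqrt
from statistics import mean, variance


def pav_association(presence, phenotype):
    """Compare trait means of accessions with vs. without a PAV sequence.

    Returns (mean difference, Welch t statistic) for present minus absent.
    """
    present = [y for has, y in zip(presence, phenotype) if has]
    absent = [y for has, y in zip(presence, phenotype) if not has]
    if len(present) < 2 or len(absent) < 2:
        raise ValueError("each group needs at least two accessions")
    diff = mean(present) - mean(absent)
    # Welch standard error: group variances are not assumed equal.
    se = sqrt(variance(present) / len(present) + variance(absent) / len(absent))
    if se == 0:
        raise ValueError("no within-group variation; t is undefined")
    return diff, diff / se


# Hypothetical plant heights (cm) for six accessions.
has_insertion = [True, True, True, False, False, False]
height = [110.0, 112.0, 111.0, 105.0, 106.0, 107.0]
d, t = pav_association(has_insertion, height)
print(d, round(t, 3))  # 5.0 6.124
```

In a genome-wide scan this test would be repeated for every PAV, followed by multiple-testing correction.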

Language: English

Citations

53

Structural variation (SV)-based pan-genome and GWAS reveal the impacts of SVs on the speciation and diversification of allotetraploid cottons
Shangkun Jin, Zegang Han, Yan Hu

et al.

Molecular Plant, Journal Year: 2023, Volume and Issue: 16(4), P. 678 - 693

Published: Feb. 9, 2023

Structural variations (SVs) have long been described as being involved in the origin, adaptation and domestication of species. However, the underlying genetic and genomic mechanisms are poorly understood. Here, we report a high-quality genome assembly of Gossypium barbadense acc. Tanguis, a landrace that is closely related to the formation of extra-long-staple (ELS) cultivated cotton. An SV-based pan-genome (Pan-SV) was then constructed from a total of 182 593 non-redundant SVs across 11 assembled genomes of allopolyploid cotton. These SVs include 2236 inversions, 97 398 insertions and 82 959 deletions. The utility of this Pan-SV was demonstrated through population structure analysis and genome-wide association studies (GWASs). Segregating mapping populations were produced by crossing ELS cotton. Using these populations together with an SV-based GWAS, we identified certain SVs responsible for the speciation, domestication and improvement of tetraploid cottons. Importantly, some of the SVs identified here as associated with yield and fiber quality had not been detected in previous SNP-based GWAS. In particular, a 9-bp insertion or deletion was found to be associated with the elimination of interspecific reproductive isolation between G. hirsutum and G. barbadense. Collectively, this study provides new insights into genome-wide, gene-scale SVs linked to important agronomic traits in a major crop species and highlights the importance of SVs during

Language: English

Citations

47

A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals?
Ying Gong, Yefang Li, Xuexue Liu

et al.

Journal of Animal Science and Biotechnology, Journal Year: 2023, Volume and Issue: 14(1)

Published: May 5, 2023

As large-scale genomic studies have progressed, it has become clear that a single reference genome cannot represent genetic diversity at the species level. Domestic animals tend to have complex routes of origin and migration, so some population-specific sequences may be missing from the current reference genome. A pangenome, by contrast, is the collection of all DNA sequences of a species. It contains the sequences shared by all individuals (core genome) and can also display sequence information unique to each individual (variable genome). Progress in pangenome research in humans, animals and plants has shown that pangenomic studies can recover the missing components of the reference genome and identify large structural variants (SVs). Many specific sequences have been shown to be related to biological adaptability, phenotype and important economic traits. Maturing technologies and methods, such as third-generation sequencing, telomere-to-telomere genomes, graph genomes and reference-free assembly, will further promote the development of pangenome research. In the future, pangenomes combined with long-read data and multi-omics will help resolve SVs and their relationship with the main traits of interest in domesticated animals. This will provide better insights into animal domestication, evolution and breeding. In this review, we mainly discuss how pangenome analysis reveals genomic variation in domestic animals (sheep, cattle, pigs and chickens) and its impacts on phenotypes, and how pangenomes contribute to understanding genetic diversity. Additionally, we discuss potential issues and future perspectives for pangenome research in livestock and poultry.

Language: English

Citations

45

Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice
Yong Zhou, Zhichao Yu, Dmytro Chebotarov

et al.

Nature Communications, Journal Year: 2023, Volume and Issue: 14(1)

Published: March 21, 2023

Understanding and exploiting genetic diversity is a key factor for the productive and stable production of rice. Here, we use 73 high-quality genomes that encompass the subpopulation structure of Asian rice (Oryza sativa), plus two wild relatives (O. rufipogon and O. punctata). From these, we build a pan-genome inversion index of 1769 non-redundant inversions that span an average of ~29% of the O. sativa cv. Nipponbare reference genome sequence. Using this index, we estimate a rate of ~700 inversions per million years in Asian rice, which is 16 to 50 times higher than previously estimated for plants. Detailed analyses of these inversions show evidence of their effects on gene expression, recombination rate and linkage disequilibrium. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice and hints at their largely unexplored role in functional biology and crop performance.

Language: English

Citations

43

A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range
Qichao Lian, Bruno Hüettel, Birgit Walkemeier

et al.

Nature Genetics, Journal Year: 2024, Volume and Issue: 56(5), P. 982 - 991

Published: April 11, 2024

Although Arabidopsis thaliana was originally a model system primarily for functional biology, its broad geographical distribution and adaptation to diverse environments have made it a powerful model in population genomics. Here we present chromosome-level genome assemblies of 69 accessions from across the global species range. We found that genomic colinearity is very conserved, even among geographically and genetically distant accessions. Along chromosome arms, megabase-scale rearrangements are rare and typically present in only a single accession. This indicates that the karyotype is quasi-fixed and that rearrangements in chromosome arms are counter-selected. Centromeric regions display higher structural dynamics, and divergence in the core centromeres accounts for most of the chromosome size variation. Pan-genome analyses uncovered 32,986 distinct gene families. Of these, 60% are present in all accessions and 40% appear to be dispensable, including 18% that are private to a single accession, indicating unexplored genic diversity. These new chromosome-level genomes will empower future genetic research.
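The core, dispensable and private fractions quoted above come from a gene-family presence table across accessions. As in the abstract, private families are counted as a subset of the dispensable ones. The sketch below computes these fractions from a toy table. The data layout and function name are hypothetical and not taken from the study's pipeline.

```python
# Sketch of core/dispensable/private gene-family fractions from a
# presence table. The data layout and function name are hypothetical.


def pan_genome_fractions(presence, accessions):
    """presence: gene family -> set of accessions carrying it.

    core        = family present in every accession
    dispensable = every non-core family (this includes private ones)
    private     = family found in exactly one accession
    """
    if len(accessions) < 2:
        raise ValueError("need at least two accessions")
    n_families = len(presence)
    core = sum(1 for carriers in presence.values() if carriers >= accessions)
    private = sum(1 for carriers in presence.values() if len(carriers) == 1)
    return {
        "core": core / n_families,
        "dispensable": (n_families - core) / n_families,
        "private": private / n_families,
    }


accs = {"acc1", "acc2", "acc3"}
table = {
    "famA": {"acc1", "acc2", "acc3"},
    "famB": {"acc1", "acc2", "acc3"},
    "famC": {"acc1", "acc2"},
    "famD": {"acc3"},
}
print(pan_genome_fractions(table, accs))
# {'core': 0.5, 'dispensable': 0.5, 'private': 0.25}
```

In the study's terms, the same computation over 69 accessions and 32,986 families would yield the reported 60%, 40% and 18%.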

Language: English

Citations

42

Technology-enabled great leap in deciphering plant genomes
Lingjuan Xie, Xiaojiao Gong, Kun Yang

et al.

Nature Plants, Journal Year: 2024, Volume and Issue: 10(4), P. 551 - 566

Published: March 20, 2024

Language: English

Citations

32