FGeneBERT: function-driven pre-trained gene language model for metagenomics DOI Creative Commons

Chenrui Duan, Zelin Zang, Yongjie Xu

et al.

Briefings in Bioinformatics, Journal Year: 2025, Volume and Issue: 26(2)

Published: March 1, 2025

Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel pre-trained model that employs a protein-based representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked modeling to enhance the understanding of inter-gene contextual relationships and triplet enhanced contrastive learning to elucidate sequence-function relationships. Pre-trained on over 100 million sequences, FGeneBERT demonstrates superior performance on datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.

Language: English

Chromosome-scale genome assembly reveals how repeat elements shape non-coding RNA landscapes active during newt limb regeneration DOI Creative Commons
Tom Brown, Ketan Mishra, Ahmed Elewa

et al.

Cell Genomics, Journal Year: 2025, Volume and Issue: unknown, P. 100761 - 100761

Published: Jan. 1, 2025

Language: English

Citations

1

The genome awakens: transposon-mediated gene regulation DOI
Ileana Tossolini, Regina Mencia, A. Arce

et al.

Trends in Plant Science, Journal Year: 2025, Volume and Issue: unknown

Published: March 1, 2025

Language: English

Citations

1

The haplotype-resolved genome assembly of an ancient citrus variety provides insights into the domestication history and fruit trait formation of loose-skin mandarins DOI Creative Commons
Minqiang Yin, Xiuling Song, Chi He

et al.

Genome Biology, Journal Year: 2025, Volume and Issue: 26(1)

Published: March 17, 2025

Loose-skin mandarins (LSMs) are among the oldest domesticated horticultural crops, yet their domestication history and the genetic basis underlying the formation of key selected traits remain unclear. We provide a chromosome-scale, haplotype-resolved assembly for the ancient Chinese citrus variety Nanfengmiju tangerine. Through the integration of 77 resequenced and 114 published germplasm genomes, we categorize LSMs into 12 distinct groups based on population genomic analyses. We infer that the ancestors of modern cultivated LSMs diverged from wild LSMs in Daoxian approximately 500,000 years ago, when they entered the Yangtze and Pearl River Basins. There, they were domesticated into four cultivation groups, forming the cornerstone of LSM cultivation. We identify selective sweeps, quantitative trait loci and genes related to important fruit quality traits, including sweetness and size. We reveal the co-selection of sugar transporter and metabolism genes associated with increased sweetness. Significant alterations in auxin and gibberellin signaling networks may contribute to the enlargement of fruits. We also provide a comprehensive, high-spatiotemporal-resolution atlas of allelic gene expression during fruit development. We detect 5890 allele pairs showing specific expression patterns and a significant increase in variation levels. Our study provides valuable resources and further revises the origin of LSMs, offering insights for the improvement of plants.

Language: English

Citations

1

The brittle star genome illuminates the genetic basis of animal appendage regeneration DOI Creative Commons
Elise Parey, Olga Ortega‐Martinez, Jérôme Delroisse

et al.

Nature Ecology & Evolution, Journal Year: 2024, Volume and Issue: 8(8), P. 1505 - 1521

Published: July 19, 2024

Species within nearly all extant animal lineages are capable of regenerating body parts. However, it remains unclear whether the gene expression programme controlling regeneration is evolutionarily conserved. Brittle stars are a species-rich class of echinoderms with outstanding regenerative abilities, but investigations into the genetic bases of regeneration in this group have been hindered by limited genomic resources. Here we report a chromosome-scale genome assembly for the brittle star Amphiura filiformis. We show that the brittle star genome is the most rearranged among echinoderms sequenced so far, featuring a reorganized Hox cluster reminiscent of the rearrangements observed in sea urchins. In addition, we performed an extensive profiling of gene expression during adult arm regeneration and identified sequential waves of gene expression governing wound healing, proliferation and differentiation. We conducted comparative transcriptomic analyses with other invertebrate and vertebrate models of appendage regeneration and uncovered hundreds of genes with conserved expression dynamics, particularly during the proliferative phase of regeneration. Our findings emphasize the crucial importance of echinoderms to detect long-range expression conservation between vertebrates and classical invertebrate regeneration model systems.

Language: English

Citations

6

A high-quality, haplotype-phased genome reconstruction reveals unexpected haplotype diversity in a pearl oyster DOI Creative Commons
Takeshi Takeuchi, Yoshihiko Suzuki, Shugo Watabe

et al.

DNA Research, Journal Year: 2022, Volume and Issue: 29(6)

Published: Sept. 12, 2022

Homologous chromosomes in the diploid genome are thought to contain equivalent genetic information, but this common concept has not been fully verified in animal genomes with high heterozygosity. Here we report a near-complete, haplotype-phased genome assembly of the pearl oyster, Pinctada fucata, using hi-fidelity (HiFi) long reads and chromosome conformation capture data. This assembly includes 14 pairs of long scaffolds (>38 Mb) corresponding to chromosomes (2n = 28). The accuracy of the assembly, as measured by an analysis of k-mers, is estimated to be 99.99997%. Moreover, the haplotypes contain 95.2% and 95.9%, respectively, of the complete and single-copy BUSCO genes, demonstrating the high quality of the assembly. Transposons comprise 53.3% of the assembly and are a major contributor to structural variations. Despite overall collinearity between haplotypes, one of the chromosomal scaffolds contains megabase-scale non-syntenic regions, which necessarily have never been detected and resolved in conventional haplotype-merged assemblies. These regions encode expanded gene families of NACHT, DZIP3/hRUL138-like HEPN, and immunoglobulin domains, multiplying the immunity gene repertoire, which we hypothesize is important for the innate immune capability of pearl oysters. The pearl oyster genome provides insight into remarkable haplotype diversity in animals.

Language: English

Citations

24

Near-gapless genome and transcriptome analyses provide insights into fruiting body development in Lentinula edodes DOI
Nan Shen, Haoyu Xie, Kefang Liu

et al.

International Journal of Biological Macromolecules, Journal Year: 2024, Volume and Issue: 263, P. 130610 - 130610

Published: March 5, 2024

Language: English

Citations

5

A butterfly pan-genome reveals that a large amount of structural variation underlies the evolution of chromatin accessibility DOI Creative Commons
Angelo Alberto Ruggieri, Luca Livraghi, James J. Lewis

et al.

Genome Research, Journal Year: 2022, Volume and Issue: 32(10), P. 1862 - 1875

Published: Sept. 15, 2022

Despite insertions and deletions being the most common structural variants (SVs) found across genomes, not much is known about how these SVs vary within populations and between closely related species, nor about their significance in evolution. To address these questions, we characterized the evolution of indel SVs using genome assemblies of three

Language: English

Citations

20

A chromosome-level phased genome enabling allele-level studies in sweet orange: a case study on citrus Huanglongbing tolerance DOI Creative Commons
Bo Wu, Qibin Yu, Zhanao Deng

et al.

Horticulture Research, Journal Year: 2022, Volume and Issue: 10(1)

Published: Nov. 3, 2022

Sweet orange originated from the introgressive hybridizations of pummelo and mandarin, resulting in a highly heterozygous genome. How alleles from the two species cooperate in shaping sweet orange phenotypes under distinct circumstances is unknown. Here, we assembled a chromosome-level phased diploid Valencia sweet orange (DVS) genome with over 99.999% base accuracy and 99.2% gene annotation BUSCO completeness. DVS enables allele-level studies for sweet orange and other hybrids between pummelo and mandarin. We first configured an allele-aware transcriptomic profiling pipeline and applied it to 740 sweet orange transcriptomes. On average, 32.5% of genes have significantly biased allelic expression in the transcriptomes. Different cultivars, transgenic lineages, tissues, development stages, and disease status all impacted allelic expressions and resulted in diversified allelic expression patterns in sweet orange, but particularly citrus Huanglongbing (HLB) shifted the allelic expression of hundreds of genes in leaves and calyx abscission zones. In addition, we detected allelic structural mutations in an HLB-tolerant mutant (T19) and a more sensitive mutant (T78) through long-read sequencing. The irradiation-induced structural mutations mostly involved double-strand breaks, while most spontaneous structural mutations were transposon insertions. In the mutants, most genes with significant allelic expression ratio alterations (≥1.5-fold) were directly affected by those structural mutations. In T19, alleles located at a translocated segment terminal were upregulated, including

Language: English

Citations

19

DCiPatho: deep cross-fusion networks for genome scale identification of pathogens DOI Creative Commons
Gaofei Jiang, Jiaxuan Zhang, Yaozhong Zhang

et al.

Briefings in Bioinformatics, Journal Year: 2023, Volume and Issue: 24(4)

Published: May 30, 2023

Pathogen detection from biological and environmental samples is important for global disease control. Despite advances in pathogen detection using deep learning, current algorithms have limitations in processing long genomic sequences. Through the deep cross-fusion of cross, residual and deep neural networks, we developed DCiPatho for accurate pathogen detection based on the integrated frequency features of 3-to-7 k-mers. Compared with existing state-of-the-art algorithms, DCiPatho can be used to accurately identify distinct pathogenic bacteria infecting humans, animals and plants. We evaluated DCiPatho on both learned and unlearned pathogen species using both genomics and metagenomics datasets. DCiPatho is an effective tool for the genomic-scale identification of pathogens by integrating the frequency of k-mers into deep cross-fusion networks. The source code is publicly available at https://github.com/LorMeBioAI/DCiPatho.
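The "integrated frequency features of 3-to-7 k-mers" mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the DCiPatho source code: the function names are hypothetical, and the choices to count plain (non-canonical) overlapping k-mers over A/C/G/T, to skip windows containing other symbols, and to normalize each k separately are assumptions made only for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_index(k):
    """Map every possible k-mer over ACGT to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers in seq.

    Assumption: windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    index = kmer_index(k)
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def kmer_feature_vector(seq, k_values=range(3, 8)):
    """Concatenate the per-k frequency vectors for k = 3..7 (assumed layout)."""
    features = []
    for k in k_values:
        features.extend(kmer_frequencies(seq, k))
    return features
```

Under these assumptions a genome of any length maps to a fixed-size vector with 4^3 + 4^4 + 4^5 + 4^6 + 4^7 = 21,824 entries, which is what allows long sequences to be fed to a fixed-input network.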

Language: English

Citations

12

A nuclear genome assembly of an extinct flightless bird, the little bush moa DOI Creative Commons
Scott V. Edwards, Alison Cloutier, Glenn Cockburn

et al.

Science Advances, Journal Year: 2024, Volume and Issue: 10(21)

Published: May 23, 2024

We present a draft genome of the little bush moa (

Language: English

Citations

4