Nature Biotechnology, Journal Year: 2025, Volume and Issue: unknown
Published: April 29, 2025
Language: English
Chemical Engineering Journal, Journal Year: 2025, Volume and Issue: unknown, P. 160738 - 160738
Published: Feb. 1, 2025
Language: English
Citations
1
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 5, 2025
ABSTRACT Genome foundation models hold transformative potential for precision medicine, drug discovery, and understanding complex biological systems. However, existing models are often inefficient, constrained by suboptimal tokenization and architectural design, and biased toward reference genomes, limiting their representation of low-abundance, uncultured microbes in the rare biosphere. To address these challenges, we developed GenomeOcean, a 4-billion-parameter generative genome foundation model trained on over 600 Gbp of high-quality contigs derived from 220 TB of metagenomic datasets collected from diverse habitats across Earth’s ecosystems. A key innovation of GenomeOcean is training directly on large-scale co-assemblies of metagenomic samples, enabling enhanced representation of rare microbial species and improving generalizability beyond genome-centric approaches. We implemented a byte-pair encoding (BPE) tokenization strategy for genome sequence generation, alongside architectural optimizations, achieving up to 150× faster sequence generation while maintaining high fidelity. GenomeOcean excels in representing microbial species and generating protein-coding genes constrained by evolutionary principles. Additionally, its fine-tuned model demonstrates the ability to discover novel biosynthetic gene clusters (BGCs) in natural genomes and perform zero-shot synthesis of biochemically plausible, complete BGCs. GenomeOcean sets a new benchmark for metagenomic research, natural product discovery, and synthetic biology, offering a robust foundation for advancing these fields.
Language: English
Citations
0
Research Square (Research Square), Journal Year: 2025, Volume and Issue: unknown
Published: April 14, 2025
Language: English
Citations
0
Nature Biotechnology, Journal Year: 2025, Volume and Issue: unknown
Published: April 29, 2025
Language: English
Citations
0