Current Opinion in Plant Biology, Journal Year: 2024, Volume and Issue: 82, P. 102665 - 102665
Published: Nov. 22, 2024
Language: English
Nature Methods, Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 28, 2024
The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often driven by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million up to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 diverse species. These transformer models yield context-specific representations of nucleotide sequences, which allow for accurate predictions even in low-data settings. We show that the developed models can be fine-tuned at low cost to solve a variety of genomics applications. Despite no supervision, the models learned to focus attention on key genomic elements and can be used to improve the prioritization of genetic variants. The training and application of foundational models in genomics provides a widely applicable approach for phenotype prediction from DNA sequence. Nucleotide Transformer is a series of models with different parameter sizes and training datasets that can be applied to various downstream tasks through fine-tuning.
Language: English
Citations
36
Trends in Genetics, Journal Year: 2025, Volume and Issue: unknown
Published: Jan. 1, 2025
Language: English
Citations
4
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: March 4, 2024
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues with conventional pre-training strategies for the non-coding genome.
Language: English
Citations
12
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: June 5, 2024
Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pre-trained on large-scale biological sequences can learn evolutionary conservation and offer better cross-species prediction than supervised models through fine-tuning on limited labeled data. We introduce PlantCaduceus, a plant DNA LM based on the Caduceus and Mamba architectures, pre-trained on a curated dataset of 16 Angiosperm genomes. Fine-tuning PlantCaduceus on limited labeled Arabidopsis data for four tasks, including predicting translation initiation/termination sites and splice donor and acceptor sites, demonstrated high transferability to maize, which diverged 160 million years ago, outperforming the best existing DNA LM by 1.45- to 7.23-fold. PlantCaduceus is competitive with state-of-the-art protein LMs in terms of deleterious mutation identification and is threefold better than PhyloP. Additionally, PlantCaduceus successfully identifies well-known causal variants in both Arabidopsis and maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications.
Language: English
Citations
6
Computational and Structural Biotechnology Journal, Journal Year: 2024, Volume and Issue: 23, P. 3454 - 3466
Published: Sept. 17, 2024
Language: English
Citations
5
Tropical Plants, Journal Year: 2025, Volume and Issue: 4(1), P. 0 - 0
Published: Jan. 1, 2025
Language: English
Citations
0
Nature Communications, Journal Year: 2025, Volume and Issue: 16(1)
Published: Jan. 24, 2025
Orphan crops are important sources of nutrition in developing regions, and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of available resources. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used for crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from related species using machine learning could improve the accuracy and efficiency of orphan crop improvement. Here, the authors review transferring knowledge to improve orphan crop breeding.
Language: English
Citations
0
Published: Jan. 1, 2025
Language: English
Citations
0
Computers and Electronics in Agriculture, Journal Year: 2025, Volume and Issue: 235, P. 110396 - 110396
Published: April 19, 2025
Language: English
Citations
0
Molecular Plant, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 1, 2024
Language: English
Citations
2