Current Opinion in Plant Biology, Year: 2024, Volume 82, pp. 102665 - 102665
Published: Nov. 22, 2024
Language: English
Nature Methods, Year: 2024, Volume unknown
Published: Nov. 28, 2024
The prediction of molecular phenotypes from DNA sequences remains a longstanding challenge in genomics, often driven by limited annotated data and the inability to transfer learnings between tasks. Here, we present an extensive study of foundation models pre-trained on DNA sequences, named Nucleotide Transformer, ranging from 50 million up to 2.5 billion parameters and integrating information from 3,202 human genomes and 850 genomes from diverse species. These transformer models yield context-specific representations of nucleotide sequences, which allow for accurate predictions even in low-data settings. We show that the developed models can be fine-tuned at low cost to solve a variety of genomics applications. Despite no supervision, the models learned to focus attention on key genomic elements and can be used to improve the prioritization of genetic variants. The training and application of foundational models in genomics provides a widely applicable approach for accurate molecular phenotype prediction from DNA sequence. Nucleotide Transformer is a series of genomics foundation models with different parameter sizes and training datasets that can be applied to various downstream tasks by fine-tuning.
Language: English
Cited by: 36
Trends in Genetics, Year: 2025, Volume unknown
Published: Jan. 1, 2025
Language: English
Cited by: 4
bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume unknown
Published: Mar. 4, 2024
The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
Language: English
Cited by: 12
bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume unknown
Published: June 5, 2024
Interpreting function and fitness effects in diverse plant genomes requires transferable models. Language models (LMs) pre-trained on large-scale biological sequences can learn evolutionary conservation and offer better cross-species prediction than supervised models through fine-tuning on limited labeled data. We introduce PlantCaduceus, a plant DNA LM based on the Caduceus and Mamba architectures, pre-trained on a curated dataset of 16 Angiosperm genomes. Fine-tuning PlantCaduceus on limited labeled Arabidopsis data for four tasks, including predicting translation initiation/termination sites and splice donor and acceptor sites, demonstrated high transferability to maize, which diverged 160 million years ago, outperforming the best existing DNA LM by 1.45 to 7.23-fold. PlantCaduceus is competitive with state-of-the-art protein LMs in terms of deleterious mutation identification, and is threefold better than PhyloP. Additionally, PlantCaduceus successfully identifies well-known causal variants in both Arabidopsis and maize. Overall, PlantCaduceus is a versatile DNA LM that can accelerate plant genomics and crop breeding applications.
Language: English
Cited by: 6
Computational and Structural Biotechnology Journal, Year: 2024, Volume 23, pp. 3454 - 3466
Published: Sept. 17, 2024
Language: English
Cited by: 5
Tropical Plants, Year: 2025, Volume 4(1), pp. 0 - 0
Published: Jan. 1, 2025
Language: English
Cited by: 0
Nature Communications, Year: 2025, Volume 16(1)
Published: Jan. 24, 2025
Orphan crops are important sources of nutrition in developing regions and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of resources available. There are orphan crop representatives across major crop types, and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops using machine learning can improve the accuracy and efficiency of orphan crop improvement. Here, the authors review transferring knowledge from major crops to orphan crops using machine learning for crop breeding.
Language: English
Cited by: 0
Published: Jan. 1, 2025
Language: English
Cited by: 0
Computers and Electronics in Agriculture, Year: 2025, Volume 235, pp. 110396 - 110396
Published: Apr. 19, 2025
Language: English
Cited by: 0
Molecular Plant, Year: 2024, Volume unknown
Published: Dec. 1, 2024
Language: English
Cited by: 2