iPro-CSAF: identification of promoters based on convolutional spiking neural networks and spiking attention mechanism DOI Creative Commons
Qian Zhou, Jie Meng, Hao Luo

et al.

PeerJ Computer Science, Year: 2025, Volume: 11, p. e2761

Published: March 26, 2025

A promoter is a DNA segment that plays a key role in regulating gene expression. Accurate identification of promoters is significant for understanding the regulatory mechanisms involved in gene expression and for genetic disease treatment. Therefore, it is an urgent challenge to develop computational methods for identifying promoters. Most current methods were designed for promoter recognition on a few species and required complex feature extraction in order to attain high accuracy. Spiking neural networks have inherent recurrence and use spike-based sparse coding. Therefore, they have the good property of processing spatio-temporal information and are well suited for learning sequence information. In this study, we propose iPro-CSAF, a convolutional spiking neural network combined with a spiking attention mechanism for promoter recognition. The method extracts promoter features by two parallel branches, including a spiking attention mechanism and a convolutional spiking layer. iPro-CSAF is evaluated by exhaustive experiments on both prokaryotic and eukaryotic promoter recognition from seven species. Our results show that iPro-CSAF outperforms methods that used parallel CNN layers, methods that combined CNNs with capsule networks, an attention mechanism, LSTM or BiLSTM, and CNN-based methods that needed a priori biological or text feature extraction, while our method has far fewer parameters. It indicates that iPro-CSAF is an effective computational method with low complexity and good generalization.

Language: English

Genomic language models: opportunities and challenges DOI
Gonzalo Benegas, Chengzhong Ye, Carlos Albors

et al.

Trends in Genetics, Year: 2025, Volume: unknown

Published: Jan. 1, 2025

Language: English

Cited by: 4

A DNA language model based on multispecies alignment predicts the effects of genome-wide variants DOI
Gonzalo Benegas, Carlos Albors, Alan J. Aw

et al.

Nature Biotechnology, Year: 2025, Volume: unknown

Published: Jan. 2, 2025

Language: English

Cited by: 4

GENA-LM: a family of open-source foundational DNA language models for long sequences DOI Creative Commons
Veniamin Fishman, Yuri Kuratov, Aleksei Shmelev

et al.

Nucleic Acids Research, Year: 2025, Volume: 53(2)

Published: Jan. 11, 2025

Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of the complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths up to 36 000 base pairs. Notably, integrating the newly developed recurrent memory mechanism allows these models to process even larger DNA segments. We provide pre-trained versions of GENA-LM, including multispecies and taxon-specific models, demonstrating their capability for fine-tuning and addressing a spectrum of complex biological tasks with modest computational demands. While language models have already achieved significant breakthroughs in protein biology, GENA-LM showcases a similarly promising potential for reshaping the landscape of genomics and multi-omics data analysis. All models are publicly available on GitHub (https://github.com/AIRI-Institute/GENA_LM) and on HuggingFace (https://huggingface.co/AIRI-Institute). In addition, we provide a web service (https://dnalm.airi.net/) allowing user-friendly DNA annotation with GENA-LM models.

Language: English

Cited by: 3

Recent advances in deep learning and language models for studying the microbiome DOI Creative Commons
Binghao Yan, Yunbi Nam, Lingyao Li

et al.

Frontiers in Genetics, Year: 2025, Volume: 15

Published: Jan. 7, 2025

Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein/genomic language modeling and their contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.

Language: English

Cited by: 2

Large language models in plant biology DOI
Hilbert Yuen In Lam, Xing Er Ong, Marek Mutwil

et al.

Trends in Plant Science, Year: 2024, Volume: 29(10), pp. 1145-1155

Published: May 26, 2024

Language: English

Cited by: 14

How to build the virtual cell with artificial intelligence: Priorities and opportunities DOI Creative Commons
Charlotte Bunne, Yusuf Roohani, Yanay Rosen

et al.

Cell, Year: 2024, Volume: 187(25), pp. 7045-7063

Published: Dec. 1, 2024

Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision on their design and on how collaborative efforts to build AIVCs will transform biological research by allowing high-fidelity simulations, accelerating discoveries, and guiding experimental studies, offering new opportunities for understanding cellular functions and fostering interdisciplinary collaborations in open science.

Language: English

Cited by: 13

Evaluating the representational power of pre-trained DNA language models for regulatory genomics DOI Creative Commons
Ziqi Tang, Nikunj V. Somia, Yiyang Yu

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Volume: unknown

Published: March 4, 2024

The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that probing the representations of pre-trained gLMs does not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major gap with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.

Language: English

Cited by: 12

Toward a comprehensive profiling of alternative splicing proteoform structures, interactions and functions DOI Creative Commons
Élodie Laine, María I. Freiberger

Current Opinion in Structural Biology, Year: 2025, Volume: 90, p. 102979

Published: Jan. 7, 2025

The mRNA splicing machinery has been estimated to generate 100,000 known protein-coding transcripts for 20,000 human genes (Ensembl, Sept. 2024). However, this set is expanding with the massive and rapidly growing data coming from high-throughput technologies, particularly single-cell and long-read sequencing. Yet, the implications of splicing complexity at the protein level remain largely uncharted. In this review, we describe current advances toward systematically assessing the contribution of alternative splicing to proteome and function diversification. We discuss the potential and challenges of using artificial intelligence-based techniques in identifying alternative splicing proteoforms and characterising their structures, interactions, and functions.

Language: English

Cited by: 1

DNALONGBENCH: A Benchmark Suite for Long-Range DNA Prediction Tasks DOI Creative Commons
W. Cheng, Zhenqiao Song, Yang Zhang

et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2025, Volume: unknown

Published: Jan. 8, 2025

Modeling long-range DNA dependencies is crucial for understanding genome structure and function across a wide range of biological contexts. However, effectively capturing these extensive dependencies, which may span millions of base pairs in tasks such as three-dimensional (3D) chromatin folding prediction, remains a significant challenge. Furthermore, a comprehensive benchmark suite for evaluating tasks that rely on long-range dependencies is notably absent. To address this gap, we introduce DNALongBench, a benchmark dataset encompassing five important genomics tasks that consider long-range dependencies up to 1 million base pairs: enhancer-target gene interaction, expression quantitative trait loci, 3D genome organization, regulatory sequence activity, and transcription initiation signals. To comprehensively assess DNALongBench, we evaluate the performance of five methods: a task-specific expert model, a convolutional neural network (CNN)-based model, and three fine-tuned DNA foundation models: HyenaDNA, Caduceus-Ph, and Caduceus-PS. We envision DNALongBench as a standardized resource with the potential to facilitate comprehensive comparisons and rigorous evaluations of emerging DNA sequence-based deep learning models that account for long-range dependencies.

Language: English

Cited by: 1

From GPUs to AI and quantum: three waves of acceleration in bioinformatics DOI Creative Commons
Bertil Schmidt, Andreas Hildebrandt

Drug Discovery Today, Year: 2024, Volume: 29(6), p. 103990

Published: April 23, 2024

The enormous growth in the amount of data generated by the life sciences is continuously shifting the field from model-driven science towards data-driven science. The need for efficient processing has led to the adoption of massively parallel accelerators such as graphics processing units (GPUs). Consequently, the development of bioinformatics methods nowadays often heavily depends on the effective use of these powerful technologies. Furthermore, progress in computational techniques and architectures continues to be highly dynamic, involving novel deep neural network models and artificial intelligence (AI) accelerators, and potentially quantum processing units in the future. These are expected to be disruptive for the life sciences as a whole and for drug discovery in particular. Here, we identify three waves of acceleration and their applications in a bioinformatics context: (i) GPU computing, (ii) AI and (iii) next-generation quantum computers.

Language: English

Cited by: 7