Bridging biomolecular modalities for knowledge transfer in bio-language models DOI Creative Commons

Mangal Prakash, Artem Moskalev, Peter A. DiMaggio et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: Oct. 17, 2024

Abstract: In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Utilizing bio-language foundation models allows for leveraging large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient predictive modeling in mRNA-focused tasks. In contrast, the DNA and protein modalities have numerous general-purpose foundation models trained on billions of sequences. This paper explores the potential for adaptation of existing DNA and protein bio-language models to mRNA-focused tasks. Through experiments using various mRNA datasets curated from both the public domain and an internal proprietary database, we demonstrate that pre-trained DNA and protein models can be effectively transferred to mRNA-focused tasks using adaptation techniques such as probing, full-rank, and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when general-purpose DNA and protein models are likely to perform well on mRNA-focused tasks. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones in cross-modal knowledge transfer. We conclude that by leveraging the interconnectedness of DNA, mRNA, and proteins, as outlined by the central dogma of molecular biology, knowledge in foundation models can be effectively transferred across modalities, enhancing the repertoire of computational tools available for mRNA analysis.

Language: English

ML-Based RNA Secondary Structure Prediction Methods: A Survey DOI Creative Commons
Qi Zhao, Jingjing Chen, Zheng Zhao et al.

AI Medicine, Year: 2024, Issue: unknown

Published: Oct. 29, 2024

Article: ML-Based RNA Secondary Structure Prediction Methods: A Survey. Qi Zhao 1, Jingjing Chen 1, Zheng Zhao 2, Qian Mao 3, Haoxuan Shi 1 and Xiaoya Fan 4,∗. 1 School of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110000, China; 2 School of Artificial Intelligence, Dalian Maritime University, Dalian 116000, China; 3 Department of Food Science, College of Light Industry, Liaoning University; 4 School of Software, Dalian University of Technology, Key Laboratory for Ubiquitous Network and Service. ∗ Correspondence: [email protected] Received: 6 May 2024; Revised: 17 October 2024; Accepted: 22 October 2024; Published: 29 October 2024.

Abstract: The secondary structure of noncoding RNAs (ncRNA) is significantly related to their functions, emphasizing the importance and value of identifying ncRNA secondary structure. Computational prediction methods have been widely used in this field. However, the performance of existing computational methods has plateaued in recent years despite various advancements. Fortunately, the emergence of machine learning, particularly deep learning, has brought new hope to this field. In this review, we present a comprehensive overview of machine learning-based methods for predicting RNA secondary structures, with particular emphasis on deep learning approaches. Additionally, we discuss the current challenges and prospects in RNA secondary structure prediction.

Language: English

Cited by: 0

DGRNA: a long-context RNA foundation model with bidirectional attention Mamba2 DOI Creative Commons
Ye‐Fei Yuan, Q Chen, Xiaoyong Pan et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: Nov. 3, 2024

Abstract: Ribonucleic acid (RNA) is an important biomolecule with diverse functions, i.e., genetic information transfer, regulation of gene expression, and cellular functions. In recent years, the rapid development of sequencing technology has significantly enhanced our understanding of RNA biology and advanced RNA-based therapies, resulting in a huge volume of RNA data. Data-driven methods, particularly unsupervised large language models, have been used to automatically learn hidden semantic information from these RNA data. Current RNA large language models are primarily based on the Transformer architecture, which cannot efficiently process long RNA sequences, while the Mamba architecture can effectively alleviate the quadratic complexity associated with Transformers. In this study, we propose a large foundational model, DGRNA, based on bidirectional Mamba and trained on 100 million RNA sequences, which demonstrated exceptional performance across six RNA downstream tasks compared to existing RNA language models.

Language: English

Cited by: 0

MethylQUEEN: A Methylation Encoded DNA Foundation Model DOI Creative Commons
Mingyang Li, Ruichu Gu, Shiyu Fan et al.

bioRxiv (Cold Spring Harbor Laboratory), Year: 2024, Issue: unknown

Published: Dec. 26, 2024

Abstract: DNA 5-methylcytosine (5mC) modification plays a pivotal role in many biological processes, yet the 5mC information and the patterns hidden behind it remain to be explored. Here, we develop the Methylation Language Model based on Quintuple Bidirectional Transformer (MethylQUEEN), a novel pre-trained methylation foundation model capable of sensing DNA methylation states and covering the genome-wide methylation landscape. Through tailored methylation-prone pre-training, MethylQUEEN effectively captured epigenetic information within DNA sequences: it accurately traces DNA's tissue-of-origin and successfully recovers the expression profile through methylation states. Integrative analysis of MethylQUEEN's attention scores also enables us to reveal the unique methylation status of each tissue for precise disease detection, identifying key regulatory sites for intervention. As a result, MethylQUEEN signifies a new paradigm for various methylation problems. Besides, our study demonstrates the effectiveness of directly integrating methylation information into foundation models, offering new perspectives and methodologies for a range of methylation-related processes. It serves as an initial exploration for the development of more comprehensive epigenomic foundation models.

Language: English

Cited by: 0
