Bridging biomolecular modalities for knowledge transfer in bio-language models DOI Creative Commons

Mangal Prakash, Artem Moskalev, Peter A. DiMaggio

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Oct. 17, 2024

In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Utilizing bio-language foundation models allows for leveraging large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient predictive modeling in mRNA-focused tasks. In contrast, DNA and protein modalities have numerous general-purpose foundation models trained on billions of sequences. This paper explores the potential for adaptation of existing DNA and protein bio-language models to mRNA-focused tasks. Through experiments using various mRNA datasets curated from both the public domain and our internal proprietary database, we demonstrate that pre-trained DNA and protein models can be effectively transferred to mRNA-focused tasks using adaptation techniques such as probing, full-rank, and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when general-purpose DNA and protein models are likely to perform well for mRNA-focused tasks. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones for cross-modal knowledge transfer. We conclude that by leveraging the interconnectedness of DNA, mRNA, and proteins, as outlined by the central dogma of molecular biology, the knowledge in foundation models can be effectively transferred across modalities, enhancing the repertoire of computational tools available for mRNA analysis.

Language: English

Ribonanza: deep learning of RNA structure through dual crowdsourcing DOI Creative Commons
Shujun He, Rui Huang, Jill Townley

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Feb. 27, 2024

Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.

Language: English

Citations: 9

Language models enable zero-shot prediction of RNA secondary structures including pseudoknots DOI Creative Commons
Tiansu Gong, Dongbo Bu

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Jan. 29, 2024

Current deep learning-based models for predicting RNA secondary structures face challenges in achieving high generalization ability. At the same time, a vast repository of unlabeled non-coding RNA (ncRNA) sequences remains untapped for structure prediction tasks. To address this challenge, we trained RNA-km, a foundation language model that enables zero-shot prediction of RNA secondary structures, including pseudoknots. For this end, we incorporated specific modifications into the training process, including a k-mer masking strategy and relative positional encoding. RNA-km models are trained on 23 million ncRNA sequences in a self-supervised manner, gaining the advantages of high generalization ability. For a target RNA sequence, we make structure predictions with the attention maps provided by RNA-km and a specified minimum-cost flow algorithm. Our results on popular benchmark datasets demonstrate that RNA-km exhibits high generalization abilities, excelling in zero-shot predictions of RNA secondary structures. In addition, RNA-km models capture intricate structural relationships, as evidenced by accurate pseudoknot prediction and precise identification of long-distance base pairs. We anticipate that RNA-km enhances the predictive capacity and robustness of existing models, thereby improving their ability to accurately predict structures for novel RNA sequences.

Language: English

Citations: 5

RNA Sequence Analysis Landscape: A Comprehensive Review of Task Types, Databases, Datasets, Word Embedding Methods, and Language Models DOI Creative Commons
Muhammad Nabeel Asim, Muhammad Ali Ibrahim, Tayyaba Asif

et al.

Heliyon, Journal Year: 2025, Volume and Issue: 11(2), P. e41488 - e41488

Published: Jan. 1, 2025

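Deciphering information of RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequence, such as dysregulation and mutations, can drive a spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential for transforming traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into the biological functions of RNA, to detect diseases at early stages, and to develop potent therapeutics, researchers are performing diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error prone. To enable large-scale RNA sequence analysis, empowerment of experimental methods with Artificial Intelligence (AI) applications necessitates that scientists have comprehensive knowledge of both DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets related to the 47 distinct RNA sequence analysis tasks by facilitating cruxes of 64 different databases. It presents word embeddings and language models applications across the 47 distinct RNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of 58 word embeddings and 70 language models based predictive pipelines and their performance values, as well as top-performing traditional sequence encoding based predictors and their performances.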

Language: English

Citations: 0

RNA function follows form – why is it so hard to predict? DOI Creative Commons

Diana Kwon

Nature, Journal Year: 2025, Volume and Issue: 639(8056), P. 1106 - 1108

Published: March 24, 2025

Language: English

Citations: 0

Advances in the field of RNA 3D structure prediction and modeling, with purely theoretical approaches, and with the use of experimental data DOI
Sunandan Mukherjee, S. Naeim Moafinejad, Nagendar Goud Badepally

et al.

Structure, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 1, 2024

Language: English

Citations: 3

Comprehensive translational profiling and STE AI uncover rapid control of protein biosynthesis during cell stress DOI Creative Commons
Attila Horváth, Yoshika Janapala, Katrina Woodward

et al.

Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(13), P. 7925 - 7946

Published: May 9, 2024

Translational control is important in all life, but it remains a challenge to accurately quantify. When ribosomes translate messenger (m)RNA into proteins, they attach to the mRNA in series, forming poly(ribo)somes, and can co-localize. Here, we computationally model new types of co-localized ribosomal complexes on mRNA and identify them using enhanced translation complex profile sequencing (eTCP-seq) based on rapid in vivo crosslinking. We detect long disome footprints outside regions of non-random elongation stalls and show these are linked to translation initiation and protein biosynthesis rates. We subject footprints of disomes and other translation complexes to artificial intelligence (AI) analysis and construct a new, accurate and self-normalized measure of translation, termed stochastic translation efficiency (STE). We then apply STE to investigate rapid changes to mRNA translation in yeast undergoing glucose depletion. Importantly, we show that, well beyond tagging elongation stalls, footprints of co-localized ribosomes provide rich insight into translational mechanisms, polysome dynamics and topology. STE AI ranks cellular mRNAs by absolute translation rates under given conditions, can assist in identifying its control elements and will facilitate the development of next-generation synthetic biology designs and mRNA-based therapeutics.

Language: English

Citations: 1

Structural and biophysical dissection of RNA conformational ensembles DOI Creative Commons
Steve Bonilla, Alisha Jones, Danny Incarnato

et al.

Current Opinion in Structural Biology, Journal Year: 2024, Volume and Issue: 88, P. 102908 - 102908

Published: Aug. 14, 2024

RNA's ability to form and interconvert between multiple secondary and tertiary structures is critical to its functional versatility, and the traditional view of RNA structures as static entities has shifted towards understanding them as dynamic conformational ensembles. In this review, we discuss RNA structural ensembles and their dynamics, highlighting the concept of conformational energy landscapes as a unifying framework for understanding RNA processes such as folding, misfolding, conformational changes, and complex formation. Ongoing advancements in cryo-electron microscopy and chemical probing techniques are significantly enhancing our ability to investigate multiple structures adopted by conformationally dynamic RNAs, while traditional methods such as nuclear magnetic resonance spectroscopy continue to play a crucial role in providing high-resolution, quantitative spatial and temporal information. We discuss how these methods, when used synergistically, can provide a comprehensive understanding of RNA conformational ensembles, offering new insights into RNA regulatory functions.

Language: English

Citations: 1

A Large-Scale Foundation Model for RNA Function and Structure Prediction DOI Creative Commons

S. Zou, Tianhua Tao, Parvez Mahbub

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Nov. 29, 2024

Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding of fundamental cellular processes. Yet RNA is as mysterious as it is prolific, serving as an information store, a messenger, and a catalyst, spanning many undercharacterized functional and structural classes. Deciphering the language of RNA is important not only for a mechanistic understanding of its biological functions but also for accelerating drug design. Toward this goal, we introduce AIDO.RNA, a pre-trained module for RNA in an AI-driven Digital Organism [1]. AIDO.RNA contains a scale of 1.6 billion parameters, is trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution, and achieves state-of-the-art performance on a comprehensive set of tasks, including RNA structure prediction, genetic regulation, molecular function across species, and RNA sequence design. AIDO.RNA after domain adaptation learns to model essential parts of protein translation that protein language models, which have received widespread attention in recent years, do not. More broadly, AIDO.RNA hints at the generality of biological sequence modeling and the ability to leverage the central dogma to improve biomolecular representations. Models and code are available through ModelGenerator at https://github.com/genbio-ai/AIDO and on Hugging Face.

Language: English

Citations: 1

RNAGenesis: Foundation Model for Enhanced RNA Sequence Generation and Structural Insights DOI Creative Commons
Zaixi Zhang, Chao Liu, Ruofan Jin

et al.

bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 31, 2024

RNA molecules play an essential role in a wide range of biological processes. Gaining a deeper understanding of their functions can significantly advance our knowledge of life's mechanisms and drive the development of drugs for various diseases. Recently, advances in foundation models have enabled new approaches to RNA engineering, yet existing methods fall short in generating novel sequences with specific functions. Here, we introduce RNAGenesis, a foundation model that combines RNA sequence understanding and de novo design through latent diffusion. With a Bert-like Transformer encoder with Hybrid N-Gram tokenization for encoding, a Query Transformer for latent space compression, and an autoregressive decoder for sequence generation, RNAGenesis reconstructs RNA sequences from learned representations. Specifically, a score-based denoising diffusion model is trained to capture the latent distribution of RNA sequences. RNAGenesis outperforms current methods in RNA sequence understanding, achieving the best results in 9 of 13 benchmarks (especially in RNA structure prediction), and further excels in designing natural-like aptamers and optimized CRISPR sgRNAs with desirable properties. Our work establishes RNAGenesis as a powerful tool for RNA-based therapeutics and biotechnology.

Language: English

Citations: 1

Editor’s pick: Atomic AI DOI

Vivien Marx

Nature Biotechnology, Journal Year: 2024, Volume and Issue: 42(9), P. 1341 - 1342

Published: Aug. 26, 2024

Language: English

Citations: 0