bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Oct. 17, 2024
Abstract
In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Utilizing bio-language foundation models allows for leveraging large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient predictive modeling in mRNA-focused tasks. In contrast, DNA and protein modalities have numerous general-purpose foundation models trained on billions of sequences. This paper explores the potential for adaptation of existing DNA and protein bio-language models to mRNA-focused tasks. Through experiments using various mRNA datasets curated from both the public domain and an internal proprietary database, we demonstrate that pre-trained DNA and protein models can be effectively transferred to mRNA-focused tasks using adaptation techniques such as probing, full-rank finetuning, and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when general-purpose DNA and protein models are likely to perform well for mRNA-focused tasks. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones for cross-modal knowledge transfer. We conclude that, by leveraging the interconnectedness of DNA, mRNA, and proteins as outlined by the central dogma of molecular biology, the knowledge in foundation models can be effectively transferred across modalities, significantly enhancing the repertoire of computational tools available for mRNA analysis.
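The adaptation techniques named above differ mainly in which weights are trained: probing trains only a new head, full-rank finetuning updates every weight, and low-rank finetuning learns a small additive update. The sketch below illustrates the low-rank case in the style of LoRA, assuming a single frozen linear layer `W`; the helper names (`lora_forward`, `init_lora`, `trainable_params`) are hypothetical, and this is not the authors' implementation. It uses plain Python lists rather than a deep learning framework.

```python
# Minimal low-rank (LoRA-style) adaptation sketch in pure Python.
# Assumption: one frozen linear layer W (d_out x d_in); only A (r x d_in)
# and B (d_out x r) would be trained.
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); W itself is never modified."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # rank-r update, never built as a d_out x d_in matrix
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def init_lora(d_in, d_out, r, seed=0):
    """A gets small random values; B starts at zero, so the adapted layer
    initially reproduces the pretrained layer exactly."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def trainable_params(d_in, d_out, r=None):
    """Full-rank finetuning trains d_out * d_in weights; low-rank trains r * (d_in + d_out)."""
    return d_out * d_in if r is None else r * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, r = 4, 3, 2
    rng = random.Random(1)
    W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    x = [1.0, 0.5, -0.5, 2.0]
    A, B = init_lora(d_in, d_out, r)
    print(lora_forward(W, A, B, x, alpha=8, r=r) == matvec(W, x))  # True, because B = 0
    print(trainable_params(1024, 1024), trainable_params(1024, 1024, r=8))  # 1048576 16384
```

The zero initialization of `B` is the usual design choice: training starts from the pretrained behaviour, and the parameter count shows why low-rank finetuning is much cheaper than full-rank finetuning for large layers.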
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Feb. 27, 2024
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza enabled the solicitation, training, and prospective evaluation of deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Jan. 29, 2024
Current deep learning-based models for predicting RNA secondary structures face challenges in achieving high generalization ability. At the same time, a vast repository of unlabeled non-coding RNA (ncRNA) sequences remains untapped for structure prediction tasks. To address this challenge, we trained RNA-km, a foundation language model that enables zero-shot prediction of RNA secondary structures, including pseudoknots. To this end, we incorporated specific modifications into the training process, including a k-mer masking strategy and relative positional encoding. RNA-km models are trained on 23 million ncRNA sequences in a self-supervised manner, gaining advantages for structure prediction: for a target sequence, predictions are made from the attention maps provided by RNA-km together with a specified minimum-cost flow algorithm. Our results on popular benchmark datasets demonstrate that RNA-km exhibits strong generalization abilities, excelling in zero-shot predictions of RNA secondary structures. In addition, RNA-km can capture intricate structural relationships, as evidenced by accurate pseudoknot prediction and the precise identification of long-distance base pairs. We anticipate that RNA-km will enhance the predictive capacity and robustness of existing models, thereby improving their ability to accurately predict the structures of novel RNA sequences.
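A k-mer masking strategy masks runs of k consecutive nucleotides instead of isolated positions, so the model cannot trivially recover a masked base from its immediate neighbours. The following is a minimal sketch under stated assumptions: the function name `kmer_mask`, the `<mask>` token, and the choice to place spans on a non-overlapping, k-aligned grid are illustrative choices, not details taken from RNA-km.

```python
# Hypothetical k-mer (span) masking for masked-language-model pretraining on RNA.
# Assumption: spans start at multiples of k so they never overlap.
import random

MASK = "<mask>"


def kmer_mask(seq, k=3, mask_ratio=0.15, seed=0):
    """Mask whole contiguous k-mers rather than single nucleotides.

    Returns (tokens, masked_positions). The number of spans is chosen so that
    roughly mask_ratio of the sequence is masked (at least one span).
    """
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    rng = random.Random(seed)
    tokens = list(seq)
    n_spans = max(1, round(len(seq) * mask_ratio / k))
    # Candidate span starts on a k-grid, so sampled spans cannot overlap.
    candidates = list(range(0, len(seq) - k + 1, k))
    starts = rng.sample(candidates, min(n_spans, len(candidates)))
    masked = set()
    for s in starts:
        for i in range(s, s + k):
            tokens[i] = MASK
            masked.add(i)
    return tokens, sorted(masked)


if __name__ == "__main__":
    toks, pos = kmer_mask("ACGUACGUACGUACGUACGU", k=3, mask_ratio=0.5, seed=0)
    print(pos)
    print(" ".join(toks))
```

During pretraining, the loss would be computed only at `masked_positions`; masking whole k-mers makes the reconstruction task depend on longer-range context, which is the motivation usually given for span-style masking.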
Heliyon, Journal Year: 2025, Volume and Issue: 11(2), P. e41488 - e41488
Published: Jan. 1, 2025
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and therapies. To gain insights into the biological functions of RNA, to detect diseases at early stages, and to develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis with conventional wet-lab methods is expensive, time-consuming, and error prone. To enable large-scale analysis, empowering experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both the DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis. Considering the absence of literature that bridges this research gap and promotes AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold. It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets related to these 47 tasks by facilitating access to the cruxes of 64 different biological databases. It presents word embeddings and language models across the 47 RNA sequence analysis tasks. It streamlines the development of new predictors by providing a survey of 58 word embedding and 70 language model based predictive pipelines with their performance values, as well as the top-performing traditional sequence encoding based predictors and their performances.
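Many of the traditional sequence encodings used by such predictors start from k-mer statistics, and k-mers also serve as the "words" for word-embedding approaches. As an illustration only, the sketch below computes a normalized k-mer frequency vector for an RNA sequence; the function name `kmer_frequency`, the `ACGU` alphabet, and the T-to-U conversion are assumptions, not a pipeline taken from the review.

```python
# Illustrative k-mer frequency encoding for RNA sequences (not from the review).
from itertools import product


def kmer_frequency(seq, k=2, alphabet="ACGU"):
    """Return (vocab, freqs): all 4**k k-mers in lexicographic order and the
    fraction of sliding windows of length k that match each one."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA input as RNA
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1
            total += 1
    freqs = [c / total for c in counts] if total else [0.0] * len(vocab)
    return vocab, freqs


if __name__ == "__main__":
    vocab, freqs = kmer_frequency("AUGGCUAGCUAA", k=2)
    for kmer, f in zip(vocab, freqs):
        if f > 0:
            print(kmer, round(f, 3))
```

The resulting fixed-length vector (16 values for k = 2) can feed any classical classifier; learned embeddings and language models replace this hand-crafted step with representations trained on large sequence collections.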
Nucleic Acids Research, Journal Year: 2024, Volume and Issue: 52(13), P. 7925 - 7946
Published: May 9, 2024
Abstract
Translational control is important in all life, but it remains a challenge to quantify accurately. When ribosomes translate messenger (m)RNA into proteins, they attach to the mRNA in series, forming poly(ribo)somes, and can co-localize. Here, we computationally model new types of co-localized ribosomal complexes on mRNA and identify them using enhanced translation complex profile sequencing (eTCP-seq) based on rapid in vivo crosslinking. We detect long disome footprints outside regions of non-random elongation stalls and show that these are linked to translation initiation and protein biosynthesis rates. We subject footprints of disomes and other translation complexes to artificial intelligence (AI) analysis and construct a new, accurate, and self-normalized measure of translation, termed stochastic translation efficiency (STE). We then apply STE to investigate changes in translation in yeast undergoing glucose depletion. Importantly, we show that, well beyond tagging elongation stalls, footprints of co-localized ribosomes provide rich insight into translational mechanisms, polysome dynamics, and topology. STE AI ranks cellular mRNAs by absolute translation rates under given conditions, can assist in identifying its control elements, and will facilitate the development of next-generation synthetic biology designs and mRNA-based therapeutics.
Current Opinion in Structural Biology, Journal Year: 2024, Volume and Issue: 88, P. 102908 - 102908
Published: Aug. 14, 2024
RNA's ability to form and interconvert between multiple secondary and tertiary structures is critical for its functional versatility. The traditional view of RNAs as static entities has shifted towards understanding them as dynamic conformational ensembles. In this review, we discuss RNA structural ensembles and their dynamics, highlighting the concept of energy landscapes as a unifying framework for processes such as folding, misfolding, conformational changes, and complex formation. Ongoing advancements in cryo-electron microscopy and chemical probing techniques are significantly enhancing our ability to investigate the structural ensembles adopted by conformationally dynamic RNAs, while methods such as nuclear magnetic resonance spectroscopy continue to play a crucial role in providing high-resolution, quantitative spatial and temporal information. We discuss how these methods, when used synergistically, can provide a comprehensive understanding of RNA structural ensembles, offering new insights into their regulatory functions.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Nov. 29, 2024
Abstract
Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding of fundamental cellular processes. Yet RNA is as mysterious as it is prolific, serving as an information store, a messenger, and a catalyst, and spanning many undercharacterized functional and structural classes. Deciphering the language of RNA is important not only for a mechanistic understanding of its biological functions but also for accelerating drug design. Toward this goal, we introduce AIDO.RNA, a pre-trained module for RNA in an AI-driven Digital Organism [1]. AIDO.RNA contains 1.6 billion parameters, is trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution, and achieves state-of-the-art performance on a comprehensive set of tasks, including structure prediction, genetic regulation, molecular function across species, and RNA sequence design. After domain adaptation, AIDO.RNA learns to model essential parts of protein translation that protein language models, which have received widespread attention in recent years, do not. More broadly, AIDO.RNA hints at the generality of biological sequence modeling and the ability to leverage the central dogma to improve many biomolecular representations. Models and code are available through ModelGenerator at https://github.com/genbio-ai/AIDO and on Hugging Face.
bioRxiv (Cold Spring Harbor Laboratory), Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 31, 2024
ABSTRACT
RNA molecules play an essential role in a wide range of biological processes. Gaining a deeper understanding of their functions can significantly advance our knowledge of life's mechanisms and drive the development of drugs for various diseases. Recently, advances in RNA foundation models have enabled new approaches to RNA engineering, yet existing methods fall short in generating novel sequences with specific functions. Here, we introduce RNAGenesis, an RNA foundation model that combines RNA sequence understanding and de novo design through latent diffusion. With a BERT-like Transformer encoder and hybrid N-gram tokenization for encoding, a Query Transformer for latent space compression, and an autoregressive decoder for generation, RNAGenesis reconstructs RNA sequences from learned representations. Specifically, a score-based denoising diffusion model is trained to capture the latent distribution of RNA sequences. RNAGenesis outperforms current methods in RNA sequence understanding, achieving the best results on 9 of 13 benchmarks (especially in RNA structure prediction), and further excels in designing natural-like aptamers and optimized CRISPR sgRNAs with desirable properties. Our work establishes RNAGenesis as a powerful tool for RNA-based therapeutics and biotechnology.
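Latent diffusion of the kind described here trains a denoiser on noisy copies of encoder latents. The sketch below shows only the standard DDPM-style forward (noising) process with a linear beta schedule, plus the conversion from a predicted noise vector to a score estimate; the schedule, parameter values, and all function names are assumptions, and RNAGenesis may use a different formulation.

```python
# Generic forward-diffusion sketch on a latent vector (not RNAGenesis code).
# Assumption: DDPM-style linear beta schedule and epsilon-prediction.
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t increasing linearly over T >= 2 steps."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0, t, abar, rng):
    """Draw z_t ~ N(sqrt(abar_t) z_0, (1 - abar_t) I); also return the noise used,
    which is the regression target for an epsilon-predicting denoiser."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def score_from_eps(eps_hat, t, abar):
    """Score estimate grad log q(z_t | z_0) = -eps / sqrt(1 - abar_t)."""
    s = math.sqrt(1.0 - abar[t])
    return [-e / s for e in eps_hat]


if __name__ == "__main__":
    abar = alpha_bars(linear_beta_schedule(1000))
    z0 = [0.3, -1.2, 0.8, 0.0]  # stand-in for a compressed sequence latent
    zt, eps = q_sample(z0, t=500, abar=abar, rng=random.Random(0))
    print([round(v, 3) for v in zt])
```

At sampling time a trained denoiser would run this process in reverse from pure noise, and the resulting latent would be passed to the decoder to generate a sequence.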