bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown. Published: Oct. 17, 2024
Abstract
In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Utilizing bio-language foundation models allows for leveraging large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient mRNA-focused tasks. In contrast, DNA modalities have numerous general-purpose models trained on billions of sequences. This paper explores the potential adaptation of existing DNA foundation models for mRNA tasks. Through experiments using various datasets curated from both the public domain and an internal proprietary database, we demonstrate that pre-trained DNA models can be effectively transferred to mRNA tasks with techniques such as probing, full-rank, and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when such transfer is likely to perform well. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones in cross-modal knowledge transfer. We conclude by highlighting the interconnectedness of DNA, mRNA, and proteins, as outlined in the central dogma of molecular biology, and the potential for knowledge transfer across modalities, enhancing the repertoire of computational tools available for mRNA analysis.
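The abstract above names probing, full-rank, and low-rank finetuning as transfer techniques. A minimal sketch of the low-rank (LoRA-style) idea follows; all shapes, names, and the zero-initialization scheme are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W plus a trainable low-rank update.

    x: (batch, d_in) activations
    W: (d_in, d_out) frozen pretrained weight
    A: (r, d_in), B: (d_out, r) trainable factors with rank r << min(d_in, d_out)
    """
    delta = B @ A                        # (d_out, d_in) low-rank update
    return x @ W + alpha * (x @ delta.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
x = rng.normal(size=(8, d_in))
W = rng.normal(size=(d_in, d_out))
A = np.zeros((r, d_in))                  # one factor starts at zero, so the
B = rng.normal(size=(d_out, r))          # adapted model begins at the frozen one
y = lora_forward(x, W, A, B)
assert np.allclose(y, x @ W)             # zero init: output matches frozen model
```

Only r * (d_in + d_out) parameters are trained instead of d_in * d_out, which is why low-rank finetuning is attractive for adapting large pretrained models.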
ACM Computing Surveys,
Journal year: 2025, Issue: unknown. Published: Jan. 26, 2025
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of “scientific language”, whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with recent advances. By offering a comprehensive overview of the technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.
Computers in Biology and Medicine,
Journal year: 2025, Issue 188, p. 109845. Published: Feb. 20, 2025
In computational biology, accurate RNA structure prediction offers several benefits, including facilitating a better understanding of RNA functions and RNA-based drug design. Implementing deep learning techniques for RNA structure prediction has led to tremendous progress in this field, resulting in significant improvements in prediction accuracy. This comprehensive review aims to provide an overview of the diverse strategies employed in predicting RNA secondary structures, emphasizing deep learning methods. The article categorizes the discussion into three main dimensions: feature extraction methods, existing state-of-the-art model architectures, and prediction approaches. We present a comparative analysis of various models, highlighting their strengths and weaknesses. Finally, we identify gaps in the literature, discuss current challenges, and suggest future approaches to enhance model performance and applicability in RNA secondary structure prediction tasks. This review provides a deeper insight into the subject and paves the way for further research at the dynamic intersection of the life sciences and artificial intelligence.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown. Published: Mar. 17, 2024
Abstract
With large amounts of unlabeled RNA sequence data produced by high-throughput sequencing technologies, pre-trained RNA language models have been developed to estimate the semantic space of RNA molecules, which facilitates the understanding of the grammar of RNA language. However, existing models overlook the impact of structure when modeling the semantic space, resulting in incomplete feature extraction and suboptimal performance across various downstream tasks. In this study, we present a model named ERNIE-RNA (Enhanced Representations with base-pairing restriction for RNA modeling) based on a modified BERT (Bidirectional Encoder Representations from Transformers) architecture incorporating a base-pairing restriction with no MSA (Multiple Sequence Alignment) information. We found that the attention maps of ERNIE-RNA without fine-tuning are able to capture RNA structure in a zero-shot experiment more precisely than conventional methods such as RNAfold and RNAstructure, suggesting that ERNIE-RNA can provide comprehensive RNA structural representations. Furthermore, ERNIE-RNA achieved SOTA (state-of-the-art) performance after fine-tuning on various downstream tasks, including structural and functional predictions. In summary, our model provides general features that can be widely and effectively applied in subsequent research. Our results indicate that introducing key knowledge-based prior information into the pre-training framework may be a useful strategy to enhance the performance of other pre-trained models.
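One way to picture a "base-pairing restriction" like the one described above is as a prior bias added to the attention scores for sequence positions whose bases can form canonical pairs (A-U, G-C, G-U). The sketch below illustrates that general idea only; the bias scheme, bonus value, and function names are assumptions, not ERNIE-RNA's actual mechanism:

```python
import numpy as np

# Canonical Watson-Crick and wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairing_bias(seq, bonus=2.0):
    """(L, L) matrix with `bonus` wherever bases i and j could pair."""
    L = len(seq)
    bias = np.zeros((L, L))
    for i in range(L):
        for j in range(L):
            if (seq[i], seq[j]) in PAIRS:
                bias[i, j] = bonus
    return bias

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, bias):
    # Scaled dot-product attention with the pairing prior added to the scores
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    return softmax(scores) @ v

seq = "GGGAAACCC"
rng = np.random.default_rng(1)
d = 16
q, k, v = (rng.normal(size=(len(seq), d)) for _ in range(3))
out = biased_attention(q, k, v, pairing_bias(seq))
```

The bias steers attention weight toward pairable positions, which is one plausible reading of how a structural prior could sharpen attention maps toward base pairs.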
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2024, Issue: unknown. Published: Jun. 9, 2024
Predicting the 3D structure of RNA is an ongoing challenge that has yet to be completely addressed despite continuous advancements. RNA structures rely not only on distances between residues and base interactions but also on backbone torsional angles. Knowing the torsional angles for each residue could help reconstruct its global folding, which is what we tackle in this work. This paper presents a novel approach for directly predicting torsional angles from raw sequence data. Our method draws inspiration from the successful application of language models in various domains and adapts them to RNA. We have developed a language-based model, RNA-TorsionBERT, which predicts torsional and pseudo-torsional angles from the sequence only. Through extensive benchmarking, we demonstrate that our model improves torsional angle prediction compared to state-of-the-art methods. In addition, using our predictive model, we have inferred a torsion angle-dependent scoring function, called RNA-Torsion-A, which replaces the true reference with the model prediction. We show that it accurately evaluates the quality of near-native predicted structures in terms of pseudo-torsion angle values. Our work demonstrates promising results, suggesting the potential utility of language models for advancing RNA 3D structure prediction. The source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/RNA-TorsionBERT.
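Comparing predicted and reference torsion angles requires a wrap-around error, since angles live on a circle: 359° and 1° are 2° apart, not 358°. The sketch below shows that kind of metric; it is an assumed illustration of angular error in general, not the paper's RNA-Torsion-A scoring function:

```python
def angular_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def mean_angular_error(pred, ref):
    """Mean wrap-around error over paired angle lists."""
    return sum(angular_diff_deg(p, r) for p, r in zip(pred, ref)) / len(pred)

print(angular_diff_deg(359.0, 1.0))   # 2.0, not 358.0
```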
Computational and Structural Biotechnology Journal,
Journal year: 2025, Issue: unknown. Published: Mar. 1, 2025
The Transformer is a deep neural network based on the self-attention mechanism, designed to handle sequential data. Given its tremendous advantages in natural language processing, it has gained traction for other applications. As the primary structure of RNA is a sequence of nucleotides, researchers have applied Transformers to predict secondary and tertiary structures from RNA sequences. The number of Transformer-based models for RNA structure prediction tasks is rapidly increasing, as they have performed on par with or better than other deep learning networks, such as Convolutional and Recurrent Neural Networks. This article thoroughly examines these Transformer-based models. Through an in-depth analysis of the models, we aim to explain how their architectural innovations improve their performances and what they still lack. As these techniques continue to evolve, this review serves as both a record of past achievements and a guide to future avenues.
bioRxiv (Cold Spring Harbor Laboratory),
Journal year: 2023, Issue: unknown. Published: Dec. 14, 2023
Abstract
RNA-based medicines and RNA-targeting drugs are emerging as promising new approaches for treating disease. Optimizing these therapeutics by naive experimental screening is a time-consuming and expensive process, while rational design requires an accurate understanding of the structure and function of RNA. To address this challenge, we present ATOM-1, the first RNA foundation model trained on chemical mapping data, enabled by data collection strategies purposely developed for machine learning training. Using small probe neural networks on top of ATOM-1 embeddings, we demonstrate that the model has developed rich internal representations of RNA. Trained on limited amounts of additional data, these probes achieve state-of-the-art accuracy on key RNA structure prediction tasks, suggesting that this approach can enable RNA therapies across the therapeutic landscape.