bioRxiv (Cold Spring Harbor Laboratory),
Year: 2024,
Issue: unknown
Published: Oct. 17, 2024
Abstract
In biology, messenger RNA (mRNA) plays a crucial role in gene expression and protein synthesis. Accurate predictive modeling of mRNA properties can greatly enhance our understanding and manipulation of biological processes, leading to advancements in medical and biotechnological applications. Utilizing bio-language foundation models allows for leveraging large-scale pretrained knowledge, which can significantly improve the efficiency and accuracy of these predictions. However, mRNA-specific foundation models are notably limited, posing challenges for efficient predictive modeling in mRNA-focused tasks. In contrast, the DNA and protein modalities have numerous general-purpose foundation models trained on billions of sequences. This paper explores the potential for adapting existing DNA and protein bio-language models to mRNA-focused tasks. Through experiments using various mRNA datasets curated from both the public domain and an internal proprietary database, we demonstrate that pre-trained DNA and protein models can be effectively transferred to mRNA-focused tasks using adaptation techniques such as probing, full-rank finetuning and low-rank finetuning. In addition, we identify key factors that influence successful adaptation, offering guidelines on when DNA and protein models are likely to perform well on mRNA-focused tasks. We further assess the impact of model size on adaptation efficacy, finding that medium-scale models often outperform larger ones for cross-modal knowledge transfer. We conclude that, by leveraging the interconnectedness of DNA, mRNA, and proteins as outlined by the central dogma of molecular biology, knowledge in foundation models can be effectively transferred across modalities, enhancing the repertoire of computational tools available for mRNA analysis.
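The abstract names three adaptation strategies (probing, full-rank finetuning and low-rank finetuning) without describing them. As a rough illustration only, the sketch below counts how many parameters each strategy trains, assuming the common LoRA-style formulation in which a frozen weight W of shape d_out × d_in is adapted as W + BA, with B of shape d_out × r and A of shape r × d_in. The `Layer` type, the helper functions and all layer sizes and ranks are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """A dense layer with a d_out x d_in weight matrix (hypothetical sizes)."""
    d_out: int
    d_in: int


def full_rank_params(layers):
    # Full-rank finetuning: every weight entry of every layer is trainable.
    return sum(l.d_out * l.d_in for l in layers)


def low_rank_params(layers, r):
    # Low-rank (LoRA-style) finetuning: W stays frozen and only the update
    # B @ A is trained, with B of shape d_out x r and A of shape r x d_in.
    return sum(r * (l.d_out + l.d_in) for l in layers)


def probing_params(hidden_dim, n_outputs):
    # Probing: the whole backbone is frozen; only a linear head
    # (weight matrix plus bias) on top of the embeddings is trained.
    return hidden_dim * n_outputs + n_outputs


if __name__ == "__main__":
    # Hypothetical backbone: 12 blocks, each with four 768 x 768 projections.
    layers = [Layer(768, 768) for _ in range(12 * 4)]
    print("full-rank:", full_rank_params(layers))
    print("low-rank (r=8):", low_rank_params(layers, r=8))
    print("probing (1 output):", probing_params(768, 1))
```

With these assumed sizes, full-rank finetuning trains 28,311,552 weights, a rank-8 update trains 589,824 (about 2% of that) and a single-output probe trains 769; the models evaluated in the study will have different sizes and counts.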
Article
ML-Based RNA Secondary Structure Prediction Methods: A Survey
Qi Zhao 1, Jingjing Chen Zheng 2, Qian Mao 3, Haoxuan Shi 1 and Xiaoya Fan 4,∗
1 School of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110000, China
2 School of Artificial Intelligence, Dalian Maritime University, Dalian 116000, China
3 Department of Food Science, College of Light Industry, Liaoning University
4 School of Software, Dalian University of Technology, Key Laboratory for Ubiquitous Network and Service Software
∗ Correspondence: [email protected]
Received: 6 May 2024; Revised: 17 October 2024; Accepted: 22 October 2024; Published: 29 October 2024
Abstract:
The secondary structure of noncoding RNAs (ncRNAs) is closely related to their functions, which underscores the importance and value of identifying ncRNA secondary structure. Computational prediction methods have been widely used in this field. However, the performance of existing computational methods has plateaued in recent years despite various advancements. Fortunately, the emergence of machine learning, particularly deep learning, has brought new hope to this field. In this review, we present a comprehensive overview of machine learning-based methods for predicting RNA secondary structures, with particular emphasis on deep learning approaches. Additionally, we discuss the current challenges and prospects of RNA secondary structure prediction.
bioRxiv (Cold Spring Harbor Laboratory),
Year: 2024,
Issue: unknown
Published: Nov. 3, 2024
Abstract
Ribonucleic acid (RNA) is an important biomolecule with diverse functions, such as genetic information transfer and the regulation of gene expression and cellular functions. In recent years, the rapid development of sequencing technology has significantly enhanced our understanding of RNA biology and advanced RNA-based therapies, resulting in a huge volume of RNA data. Data-driven methods, particularly unsupervised large language models, have been used to automatically capture the hidden semantic information in these data. Current RNA language models are primarily based on the Transformer architecture, which cannot efficiently process long sequences, whereas the Mamba architecture can effectively alleviate the quadratic complexity associated with Transformers. In this study, we propose DGRNA, a foundational model based on bidirectional Mamba and trained on 100 million RNA sequences, which demonstrated exceptional performance across six downstream tasks compared with existing models.
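To make the "quadratic complexity" point concrete: self-attention over n tokens forms an n × n score matrix, whereas a state-space scan of the kind Mamba builds on updates a fixed-size hidden state once per token. The sketch below is a back-of-the-envelope cost model under those simplifying assumptions (it ignores hidden size, heads and constant factors) plus a toy linear recurrence; it omits Mamba's input-dependent "selective" parameters, and the bidirectional combination shown (summing a forward and a reversed scan) is an illustrative assumption, not DGRNA's actual design. All function names are hypothetical.

```python
def attention_score_entries(n: int) -> int:
    # Full self-attention compares every token with every token: n * n scores.
    return n * n


def scan_state_updates(n: int) -> int:
    # A recurrent/state-space scan visits each token once: n updates of a
    # fixed-size state (the state size is treated as a constant here).
    return n


def toy_linear_scan(xs, a=0.9, b=1.0):
    # Minimal linear recurrence h_t = a * h_{t-1} + b * x_t, the basic
    # building block of state-space models; O(n) time, O(1) state.
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.9, b=1.0):
    # Illustrative bidirectional variant: a forward scan plus a scan over the
    # reversed sequence (re-reversed to align positions), summed per position.
    fwd = toy_linear_scan(xs, a, b)
    bwd = toy_linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        ratio = attention_score_entries(n) // scan_state_updates(n)
        print(f"n={n}: attention/scan cost ratio = {ratio}")
    print(bidirectional_scan([1.0, 0.0, 0.0], a=0.5))
```

Under this simplified model the cost ratio equals n itself, so the gap widens linearly with sequence length, which is why long RNA sequences favour linear-time scans.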
bioRxiv (Cold Spring Harbor Laboratory),
Year: 2024,
Issue: unknown
Published: Dec. 26, 2024
Abstract
DNA 5-methylcytosine (5mC) modification plays a pivotal role in many biological processes, yet the information and patterns hidden behind 5mC remain to be explored. Here, we develop the Methylation Language Model based on Quintuple Bidirectional Transformer (MethylQUEEN), a novel pre-trained methylation foundation model capable of sensing methylation states across the genome-wide landscape. Through tailored methylation-prone pre-training, MethylQUEEN effectively captured the epigenetic information within DNA sequences: it accurately traces DNA’s tissue-of-origin and successfully recovers expression profiles from methylation states.
Integrative analysis of MethylQUEEN’s attention scores also enables us to reveal the unique methylation status of each tissue for precise disease detection and to identify key regulatory sites for intervention. As a result, MethylQUEEN represents a new paradigm for various methylation problems. Moreover, our study demonstrates the effectiveness of directly integrating methylation information into a language model, offering new perspectives and methodologies for a range of methylation-related processes. It serves as an initial exploration toward the development of more comprehensive epigenomic models.