Abstract
Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy.
Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is based on 5 popular transformer-based language models. The framework identifies methylation sites for 3 different types of DNA methylation: N6-adenine, N4-cytosine, and 5-hydroxymethylcytosine. Each of the employed language models is adapted to the task using the “pretrain and fine-tune” paradigm. Pretraining is performed on a custom corpus of DNA fragments and taxonomy lineages using self-supervised learning. Fine-tuning aims at predicting the DNA methylation status of each type. The 5 models are used to collectively predict the DNA methylation status.
We report excellent performance of MuLan-Methyl on a benchmark dataset. Moreover, we argue that the model captures characteristic differences between different species that are relevant for methylation. This work demonstrates that language models can be successfully adapted to applications in biological sequence analysis and that joint utilization of different language models improves model performance. MuLan-Methyl is open source, and we provide a web server that implements the approach.
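The abstract says only that the 5 fine-tuned models "collectively predict" the methylation status; it does not state how their outputs are combined. As a minimal sketch, assuming each model emits a probability that a site is methylated and assuming the combination is a plain average followed by a 0.5 threshold (both are assumptions for illustration, not the authors' documented method), the idea could look like this. The function names and example probabilities are hypothetical.

```python
from statistics import fmean


def ensemble_probability(model_probs):
    """Average the per-model probabilities that a site is methylated.

    Assumption: simple unweighted averaging; the source does not say
    how the five models are actually combined.
    """
    if not model_probs:
        raise ValueError("need at least one model probability")
    return fmean(model_probs)


def ensemble_predict(model_probs, threshold=0.5):
    """Call a site methylated when the averaged probability reaches the threshold."""
    return ensemble_probability(model_probs) >= threshold


# Hypothetical outputs of five fine-tuned models for one DNA fragment.
probs = [0.75, 0.5, 1.0, 0.25, 0.5]
print(ensemble_probability(probs))
print(ensemble_predict(probs))
```

Averaging is only one option; weighted averaging or majority voting would fit the same interface.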
International Journal of Molecular Sciences, Year: 2025, No. 26(6), p. 2468
Published: March 10, 2025
In recent years, many approved drugs have been discovered using phenotypic screening, which helps elucidate the exact mechanisms of action or molecular targets of drugs. Drug susceptibility prediction is an important type of phenotypic screening. Large-scale pharmacogenomics studies have provided us with large amounts of drug sensitivity data. By analyzing these data using computational methods, we can effectively build models to predict drug susceptibility. However, due to differences in data distribution among databases, researchers cannot directly utilize data from multiple sources.
In this study, we propose a deep transfer learning model. We integrate the genomic characterization of cancer cell lines with chemical information on compounds, combined with the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets, through a domain-adapted approach to predict the half-maximal inhibitory concentrations (IC50 values). Afterward, the validity of the prediction results of our model is verified.
This study addresses the challenge of cross-database distribution discrepancies in drug sensitivity prediction by integrating multi-source heterogeneous data and constructing a deep transfer learning model. This model serves as a reliable computational tool for precision drug development. Its widespread application can facilitate the optimization of therapeutic strategies in personalized medicine while also providing technical support for high-throughput drug screening and the discovery of new drug targets.
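To make the prediction target concrete: the IC50 is the drug concentration at which the measured response drops to half of its uninhibited value. The sketch below illustrates this with a two-parameter Hill curve. The Hill form, the assumption of complete inhibition at high dose, and the function and parameter names are illustrative assumptions; the abstract does not say how the IC50 values in CCLE or GDSC were fitted.

```python
def hill_viability(conc, ic50, hill_slope=1.0):
    """Fraction of the uninhibited response remaining at a drug concentration.

    Assumption: a two-parameter Hill curve that falls from 1 toward 0.
    conc and ic50 must be in the same concentration units.
    """
    if conc < 0 or ic50 <= 0:
        raise ValueError("conc must be >= 0 and ic50 must be > 0")
    return 1.0 / (1.0 + (conc / ic50) ** hill_slope)


# At a concentration equal to the IC50, half of the response remains.
print(hill_viability(2.0, 2.0))  # 0.5
# With no drug, the response is uninhibited.
print(hill_viability(0.0, 2.0))  # 1.0
```

A lower IC50 therefore means a more sensitive cell line: less drug is needed to halve the response.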
Nucleic Acids Research, Year: 2025, No. 53(6)
Published: March 20, 2025
Abstract
Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model.
By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets for three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us gain a deeper understanding of methylation patterns across various species.