Symmetry, 2023, 15(3), p. 731
Published: March 15, 2023
As the most abundant RNA methylation modification, N6-methyladenosine (m6A) could regulate asymmetric and symmetric division of hematopoietic stem cells and play an important role in various diseases. Therefore, the precise identification of m6A sites around the genomes of different species is a critical step to further revealing their biological functions and influence on these diseases. However, traditional wet-lab experimental methods for identifying m6A sites are often laborious and expensive. In this study, we proposed an ensemble deep learning model called m6A-BERT-Stacking, a powerful predictor for the detection of m6A sites in various tissues of three species. First, we utilized two encoding methods, i.e., di ribonucleotide index of RNA (DiNUCindex_RNA) and k-mer word segmentation, to extract RNA sequence features. Second, the two encoding matrices together with the original sequences were respectively input into three different deep learning models in parallel to train three sub-models, namely residual networks with convolutional block attention module (Resnet-CBAM), bidirectional long short-term memory with attention (BiLSTM-Attention), and the pre-trained bidirectional encoder representations from transformers model for DNA-language (DNABERT). Finally, the outputs of all sub-models were ensembled based on the stacking strategy to obtain the final prediction of m6A sites through a fully connected layer. The experimental results demonstrated that m6A-BERT-Stacking outperformed most existing methods on the same independent datasets.
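The k-mer word segmentation mentioned in this abstract can be illustrated with a minimal sketch. The function name `kmer_tokens`, the default `k = 3`, and the `stride` parameter are assumptions made for illustration; the abstract does not state which k or stride the authors used.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mer "words".

    Hypothetical helper: k and stride are illustrative defaults,
    not values taken from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence, moving `stride` bases at a time.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count how often each k-mer word occurs in the sequence."""
    return Counter(kmer_tokens(sequence, k))


tokens = kmer_tokens("GGACUGGA", k=3)
counts = kmer_counts("GGACUGGA", k=3)
```

The resulting list of words can be fed to a tokenizer or turned into a count vector; which of the two a given model expects depends on that model.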
Briefings in Bioinformatics, 2021, 22(5)
Published: January 6, 2021
Abstract
Anticancer peptides constitute one of the most promising therapeutic agents for combating common human cancers. Using wet experiments to verify whether a peptide displays anticancer characteristics is time-consuming and costly. Hence, in this study, we proposed a computational method named identify anticancer peptides via deep representation learning features (iACP-DRLF) using the light gradient boosting machine algorithm and deep representation learning features. Two kinds of sequence embedding technologies were used, namely soft symmetric alignment embedding and unified representation (UniRep) embedding, both of which involved deep neural network models based on long short-term memory networks and their derived networks. The results showed that the use of deep representation learning features greatly improved the capability of the models to discriminate anticancer peptides from other peptides. Also, UMAP (uniform manifold approximation and projection for dimension reduction) and SHAP (shapley additive explanations) analysis proved that UniRep features have an advantage over other features for anticancer peptide identification. The python script and pretrained models could be downloaded from https://github.com/zhibinlv/iACP-DRLF or from http://public.aibiochem.net/iACP-DRLF/.
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification methods and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data and its applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
IEEE Journal of Biomedical and Health Informatics, 2024, 28(4), pp. 2362-2372
Published: January 24, 2024
As a pivotal post-transcriptional modification of RNA, N6-methyladenosine (m6A) has a substantial influence on gene expression modulation and cellular fate determination. Although a variety of computational models have been developed to accurately identify potential m6A modification sites, few of them are capable of interpreting the identification process with insights gained from consensus knowledge. To overcome this problem, we propose a deep learning model, namely M6A-DCR, by discovering consensus regions for interpretable identification of m6A modification sites. In particular, M6A-DCR first constructs an instance graph for each RNA sequence by integrating specific positions and types of nucleotides. The discovery of consensus regions is then formulated as a clustering problem in light of aggregating all instance graphs. After that, M6A-DCR adopts a motif-aware graph reconstruction optimization process to learn high-quality embeddings of input RNA sequences, thus achieving the identification of m6A modification sites in an end-to-end manner. Experimental results demonstrate the superior performance of M6A-DCR by comparing it with several state-of-the-art identification models. The consideration of consensus regions empowers our model to make interpretable predictions at the motif level. The analysis of cross validation through different species and tissues further verifies the consistency between the identification results of M6A-DCR and the evolutionary relationships among species.
Briefings in Bioinformatics, 2020, 22(4)
Published: September 22, 2020
Abstract
Origins of replication sites (ORIs), which refer to the initiation locations of genomic DNA replication, play essential roles in the DNA replication process. Detection of the ORIs' distribution at genome scale is one of the key steps for an in-depth understanding of their regulation mechanisms. In this study, we presented a novel machine learning-based approach called Stack-ORI encompassing 10 cell-specific prediction models for identifying ORIs from four different eukaryotic species (Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana). For each cell-specific model, we employed 12 feature encoding schemes that cover nucleic acid composition and position-specific and physicochemical property information. The optimal feature set was identified individually for each encoding, and a respective baseline model was developed using the eXtreme Gradient Boosting (XGBoost) classifier. Subsequently, the predicted scores of the baseline models are integrated as a feature vector to train XGBoost and develop the final model. Extensive experimental results show that Stack-ORI achieves significantly better performance compared with the baseline models on both training and independent datasets. Interestingly, Stack-ORI consistently outperforms the existing predictor in all cell-specific models, not only on training but also on the independent test. Moreover, our novel approach provides the necessary interpretations that help in understanding model success by leveraging the powerful SHapley Additive exPlanation algorithm, thus underlining the most important feature encoding schemes significant for predicting cell-specific ORIs.
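The stacking idea in this abstract, where the predicted scores of baseline models become the feature vector of a final model, can be sketched as follows. This is only an illustration under loud assumptions: the two "baseline models" are made-up scoring rules rather than trained XGBoost models, the final model is a small logistic regression standing in for XGBoost, and the sequences and labels are toy data invented for the example.

```python
import math


def gc_content(seq: str) -> float:
    """Hypothetical baseline score 1: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


def at_pair_rate(seq: str) -> float:
    """Hypothetical baseline score 2: fraction of adjacent pairs built only from A/T."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(set(pair) <= {"A", "T"} for pair in pairs) / len(pairs)


# Stand-ins for the per-encoding baseline models (assumption, not the paper's models).
BASELINE_MODELS = [gc_content, at_pair_rate]


def stack_features(seq: str) -> list[float]:
    """Level-1 feature vector: one predicted score per baseline model."""
    return [model(seq) for model in BASELINE_MODELS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_model(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression meta-learner with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(z) - label  # gradient of the log-loss w.r.t. z
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict_proba(seq: str, weights, bias) -> float:
    """Probability assigned by the meta-learner to the positive class."""
    features = stack_features(seq)
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


# Toy data, invented for illustration only.
train_seqs = ["ATATATTA", "TTAATATA", "AATTATAT", "GCGCGGCC", "CCGGCGCG", "GGCCGCGC"]
labels = [1, 1, 1, 0, 0, 0]
X = [stack_features(s) for s in train_seqs]
weights, bias = train_meta_model(X, labels)
```

In a realistic stacking setup the level-1 scores would normally come from out-of-fold predictions, so that the final model is not trained on scores the baseline models produced for their own training examples.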
Nature Communications, 2021, 12(1)
Published: June 29, 2021
Abstract
Recent studies suggest that epi-transcriptome regulation via post-transcriptional RNA modifications is vital for all RNA types. Precise identification of RNA modification sites is essential for understanding the functions and regulatory mechanisms of RNAs. Here, we present MultiRM, a method for the integrated prediction and interpretation of post-transcriptional RNA modifications from RNA sequences. Built upon an attention-based multi-label deep learning framework, MultiRM not only simultaneously predicts the putative sites of twelve widely occurring transcriptome modifications (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), but also returns the key sequence contents that contribute most to the positive predictions. Importantly, our model revealed a strong association among different types of RNA modifications from the perspective of their associated sequence contexts. Our work provides a solution for detecting multiple RNA modifications, enabling an integrated analysis of these RNA modifications, and gaining a better understanding of sequence-based RNA modification mechanisms.
Computational and Structural Biotechnology Journal, 2021, 19, pp. 5762-5790
Published: January 1, 2021
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overthrow astronomy as the paradigmatic representative of big data producer. This has been made possible by huge advancements in the field of high throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist in studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Bioinformatics, 2020, 36(24), pp. 5600-5609
Published: December 14, 2020
The Golgi apparatus has a key functional role in protein biosynthesis within the eukaryotic cell, with malfunction resulting in various neurodegenerative diseases. For a better understanding of the Golgi apparatus, it is essential to identify sub-Golgi protein localization. Although some machine learning methods have been used to identify sub-Golgi localization proteins by sequence representation fusion, more accurate sub-Golgi protein identification is still challenging for existing methodology. We developed a protocol using deep representation learning features with 107 dimensions. By this protocol, we demonstrated that instead of multi-type protein sequence feature representation fusion as in previous state-of-the-art sub-Golgi-protein localization classifiers, it is sufficient to exploit only one type of feature representation for more accurately identifying sub-Golgi proteins. Compared with independent testing results for benchmark datasets, our protocol is able to perform generally, reliably and robustly for sub-Golgi protein localization prediction. A user-friendly webserver is freely accessible at http://isGP-DRLF.aibiochem.net and the prediction code is accessible at https://github.com/zhibinlv/isGP-DRLF. Supplementary data are available at Bioinformatics online.
Computational and Structural Biotechnology Journal, 2021, 19, pp. 4123-4131
Published: January 1, 2021
Cyclin proteins are capable of regulating the cell cycle by forming a complex with cyclin-dependent kinases to activate the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction for sequence similarity-based methods. Thus, it is urgent to construct a machine learning model to identify cyclin proteins. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features, namely amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model would identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of the two recent studies on the same data.
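Two of the sequence encodings listed in this abstract, amino acid composition and the composition of k-spaced amino acid pairs, are simple enough to sketch directly. The function names and the default gap `k = 1` are assumptions for illustration; the abstract does not give the gap values or any normalization details used by the authors.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Fraction of each standard residue in the sequence (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def k_spaced_pair_composition(seq: str, k: int = 1) -> dict[str, float]:
    """Frequency of residue pairs separated by exactly k other residues (400 features).

    Hypothetical helper: k = 1 is an illustrative default, not a value from the paper.
    """
    seq = seq.upper()
    n_pairs = len(seq) - k - 1  # number of (i, i + k + 1) position pairs
    if n_pairs <= 0:
        raise ValueError("sequence too short for this gap size")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return {pair: count / n_pairs for pair, count in counts.items()}


aac = amino_acid_composition("AACD")
pairs = k_spaced_pair_composition("ACAC", k=1)
```

Concatenating the values of such dictionaries in a fixed key order gives a numeric feature vector that a feature selection step and a classifier can consume.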