Computational and Structural Biotechnology Journal, Journal year: 2022, Vol. 20, pp. 3503-3510
Published: Jan. 1, 2022
Proteins are the executors of cellular physiological activities, and accurate structural and function elucidation is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important method for protein structure and function analysis, but also opens a broad horizon for the complex field of machine learning. Representing sequences with fewer amino acid types greatly reduces the complexity and noise of traditional feature engineering in dimension, and provides more interpretable predictive models for machine learning to capture key features. In this paper, we systematically reviewed the strategy and method studies of reduced amino acid (RAA) alphabets, and summarized the main research in sequence alignment, functional classification, and prediction of structural properties, respectively. In the end, we gave a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
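To make the idea of a reduced amino acid alphabet concrete, the following minimal sketch maps a sequence onto fewer residue types and computes the reduced composition. The six-group clustering used here is a hypothetical grouping chosen only for illustration; it is not claimed to be one of the 672 alphabets analysed in the review.

```python
from collections import Counter

# Hypothetical 6-letter reduced alphabet (illustrative assumption only):
# each key is the representative letter, each value lists the residues it absorbs.
GROUPS = {
    "A": "AVLIMC",  # small / aliphatic hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # conformationally special
}
RESIDUE_TO_GROUP = {aa: rep for rep, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in seq.upper())


def composition(seq, alphabet):
    """Return the relative frequency of each letter of `alphabet` in `seq`."""
    counts = Counter(seq)
    total = len(seq) or 1  # avoid division by zero for empty input
    return [counts[letter] / total for letter in alphabet]


reduced = reduce_sequence("MKVLDEGS")
features = composition(reduced, sorted(GROUPS))
```

With this grouping the feature vector has 6 dimensions instead of 20, which is the kind of dimensionality reduction the abstract refers to.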
Diagnostics, Journal year: 2023, Vol. 13(14), p. 2465
Published: July 24, 2023
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first HBP recognition framework based on machine learning to accurately identify HBP. By using four sequence descriptors, HBP and non-HBP samples were represented by discrete numbers. By inputting these features into a support vector machine (SVM) and a random forest (RF) algorithm and comparing the prediction performances of these methods on training data and independent test data, it is found that the SVM-based classifier has the greatest potential to identify HBP. The model could produce an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it will provide some help for the study of infectious diseases and stimulate further research in related fields.
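The auROC reported above can be computed from classifier scores without plotting a curve. The sketch below uses the rank-sum (Mann-Whitney U) identity; the function name `au_roc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 class labels (1 = positive, e.g. HBP).
    scores: classifier scores, higher meaning "more likely positive".
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of samples sharing the same score.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (made-up values): two negatives, two positives.
auc = au_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a 10-fold cross-validation setting, this value would be computed once per fold and then summarised as a mean and standard deviation.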
IEEE Access, Journal year: 2023, Vol. 11, pp. 137099-137114
Published: Jan. 1, 2023
Mycobacterium tuberculosis, a highly perilous pathogen in humans, serves as the causative agent of tuberculosis (TB), affecting nearly 33% of the global population. With the increasing prevalence of multidrug-resistant TB, there is a need for novel and efficacious alternative therapies. Peptide therapies have emerged as a favorable alternative due to their remarkable specificity in targeting affected cells without affecting healthy cells. However, the experimental identification of anti-tubercular peptides (AtbPs) is labor-intensive and costly. Therefore, the accurate prediction of AtbPs has become challenging due to the large number of peptide samples. In this paper, we propose an ensemble learning model to enhance prediction outcomes by addressing the limitations of individual models. We formulate the training samples utilizing four distinct representation methods, AAindex, Composition/Transition/Distribution, Dipeptide Deviation from Expected Mean, and Enhanced Grouped Amino Acid Composition, to numerically encode the peptide samples. The feature vectors extracted by these methods are fused to develop a compact feature vector. We evaluate the prediction rates using three different classification models, employing both individual and heterogeneous vectors. Furthermore, to enhance the prediction capabilities of the proposed model, the predicted labels of the individual classifiers are fused by implementing deep ensemble learning via a genetic algorithm. Through evaluation on training datasets and independent datasets, our ensemble learner achieves impressive accuracies of 97.80%, 95.13%, 93.91%, and 94.17% on the RD training, MD training, RD independent, and MD independent datasets, respectively. Our findings demonstrate that the proposed pAtbP-EnC model outperforms existing predictors, reporting approximately 11% higher accuracy. We conclude that the pAtbP-EnC predictor will be a considerable tool in the field of pharmaceutical design and research academia. The used datasets and source code are publicly available at https://github.com/Intelligent-models/pAtbP-EnC2023.
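Two steps mentioned in this abstract, fusing descriptor vectors and combining the labels predicted by individual classifiers, can be sketched in a few lines. This is a deliberately simplified stand-in: the fusion is plain concatenation, and the label combination is an unweighted majority vote rather than the genetic-algorithm ensemble the authors describe. The function names are hypothetical and do not come from the pAtbP-EnC repository.

```python
from collections import Counter


def fuse_features(*vectors):
    """Concatenate per-descriptor feature vectors into one fused vector."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused


def majority_vote(predictions):
    """Combine label predictions from several classifiers, sample by sample.

    predictions: list of per-classifier label lists, all of equal length.
    Ties go to the tied label predicted by the earliest classifier in the list.
    """
    combined = []
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        best = max(counts.values())
        combined.append(next(lab for lab in sample_labels if counts[lab] == best))
    return combined


# Toy usage with made-up vectors and labels (assumptions for illustration).
fused = fuse_features([0.1, 0.2], [0.7], [0.0, 1.0])
labels = majority_vote([[1, 0, 0], [1, 0, 1], [0, 1, 0]])
```

A weighted vote, with weights tuned by an optimiser such as a genetic algorithm, would replace `majority_vote` in a closer approximation of the described approach.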
Briefings in Bioinformatics, Journal year: 2020, Vol. 22(2), pp. 2020-2031
Published: Feb. 17, 2020
Breast cancer is one of the most common human malignant diseases and the leading cause of cancer-related death in the world. However, the prognostic and therapeutic benefits of breast cancer patients cannot be predicted accurately by the current stratifying system. In this study, an immune-related prognostic score was established in 22 cohorts with a total of 6415 samples. An extensive immunogenomic analysis was conducted to explore the relationships between immune score, prognostic significance, infiltrating immune cells, cancer genotypes and potential immune escape mechanisms. Our analysis revealed that this immune score was a promising biomarker for estimating overall survival in breast cancer. This immune score was associated with important immunophenotypic factors, such as mutation load. Further analysis revealed that patients with high immune scores exhibited therapeutic benefits from chemotherapy and immunotherapy. Based on these results, we can conclude that this immune score may be a useful tool for overall survival prediction and treatment guidance for patients with breast cancer.
Computational and Structural Biotechnology Journal, Journal year: 2020, Vol. 18, pp. 1084-1091
Published: Jan. 1, 2020
N6-methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position, which is the most abundant RNA modification and is involved in a series of important biological processes. Accurate identification of m6A sites genome-wide is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on the independent dataset showed that our proposed method could produce excellent generalization ability. We also established a user-friendly webserver called iRNA-m6A, which can be freely accessed at http://lin-group.cn/server/iRNA-m6A. This tool will provide more convenience to users for studying m6A modification in different tissues.
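Two of the encodings named in this abstract are simple per-base lookups and can be sketched as follows. The nucleotide chemical property triplets follow the commonly used scheme (ring structure, functional group, hydrogen bonding); whether iRNA-m6A uses exactly this column order, and the A, C, G, U order of the one-hot code, are assumptions made for illustration.

```python
# Nucleotide chemical property: (purine ring, amino group, weak hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}

# Mono-nucleotide binary (one-hot) code; the A, C, G, U column order is a convention.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}


def encode_rna(seq):
    """Encode an RNA (or DNA) sequence as 7 numbers per base: one-hot plus NCP.

    'T' is read as 'U'; any other unknown symbol is encoded as all zeros.
    """
    features = []
    for base in seq.upper().replace("T", "U"):
        features.extend(BINARY.get(base, (0, 0, 0, 0)))
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


vector = encode_rna("GGACU")  # toy 5-nt fragment, not a real m6A site window
```

The resulting fixed-length vector is the kind of input that a feature selection step such as mRMR and an SVM would then operate on.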
Frontiers in Genetics, Journal year: 2021, Vol. 12
Published: June 21, 2021
Exploring drug–target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug–target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug–target interaction prediction research. In this review, details of the specific applications of machine learning in drug–target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored in future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance machine learning models.
Computational and Structural Biotechnology Journal, Journal year: 2021, Vol. 19, pp. 4123-4131
Published: Jan. 1, 2021
Cyclin proteins are capable of regulating the cell cycle by forming a complex with cyclin-dependent kinases to activate the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, it is urgent to construct a machine learning model to identify cyclin proteins. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features, that is, amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model would identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of two recent studies on the same data.
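The ANOVA step in such a pipeline scores each feature by how well it separates the classes, and incremental feature selection then adds features in ranked order while a classifier is retrained. The sketch below covers only the scoring and ranking part with a one-way ANOVA F-value. The function names and the toy matrix are illustrative assumptions, not the authors' code or data.

```python
def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across class labels.

    F = (between-class sum of squares / (k - 1)) /
        (within-class sum of squares / (n - k)),
    with n samples and k classes (requires k >= 2 and n > k).
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if within == 0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(X, y):
    """Return feature indices sorted from highest to lowest F-value."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (made up): 4 samples, 2 features, binary labels.
X = [[1, 5], [2, 3], [3, 4], [4, 6]]
y = [0, 0, 1, 1]
order = rank_features(X, y)
```

In an IFS loop, the top 1, 2, 3, ... features of `order` would be evaluated in turn with cross-validation, and the subset giving the best score kept.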