PLoS ONE, Journal Year: 2024, Volume and Issue: 19(12), P. e0316454 - e0316454
Published: Dec. 31, 2024
Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model's decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. These findings highlight NATE's capability in managing class imbalance, improving model performance, and enhancing interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
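The abstract above reports MCC and F1 Score, two metrics that stay informative when one class dominates. The following is a minimal sketch of how both are computed from confusion-matrix counts. The function names and the example counts are illustrative assumptions, not values or code from the paper.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    num = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / denom if denom else 0.0


# Hypothetical imbalanced test set: 90 positives (defaults), 910 negatives.
tp, tn, fp, fn = 40, 900, 10, 50
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy={accuracy:.3f}  F1={f1_score(tp, fp, fn):.3f}  MCC={mcc(tp, tn, fp, fn):.3f}")
```

With these made-up counts, accuracy looks high because the majority class is mostly classified correctly, while F1 and MCC are noticeably lower because more than half of the positives are missed.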
Scientific Reports, Journal Year: 2025, Volume and Issue: 15(1)
Published: Jan. 2, 2025
Worldwide, cancer remains a significant health concern due to its high mortality rates. Despite numerous traditional therapies and wet-laboratory methods for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently, the high selectivity of peptides has garnered significant attention from scientists due to their reliable targeted actions and minimal adverse effects. Furthermore, building on the outcomes of existing computational models, we propose a highly effective model, namely pACP-HybDeep, for the accurate prediction of anticancer peptides. In this model, the training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder to extract semantic features along with CTDT-based structural information. A k-nearest neighbor-based binary tree growth (BTG) algorithm is employed to select the optimal feature set from the multi-perspective vector. The selected feature vector is subsequently trained using a CNN + RNN-based deep learning model. Our proposed model demonstrated an accuracy of 95.33% and an AUC of 0.97. To validate its generalization capabilities, our model was evaluated on the independent datasets Ind-S1, Ind-S2, and Ind-S3, where it achieved accuracies of 94.92%, 92.26%, and 91.16%, respectively. The efficacy and reliability of the model on test datasets establish it as a valuable tool for researchers in academia and pharmaceutical drug design.
PLoS ONE, Journal Year: 2025, Volume and Issue: 20(2), P. e0317396 - e0317396
Published: Feb. 10, 2025
In recent years, the challenge of imbalanced data has become increasingly prominent in machine learning, affecting the performance of classification algorithms. This study proposes a novel data-level oversampling method called Cluster-Based Reduced Noise SMOTE (CRN-SMOTE) to address this issue. CRN-SMOTE combines SMOTE for oversampling minority classes with a novel cluster-based noise reduction technique. In this noise reduction approach, it is crucial that samples from each category form one or two clusters, a feature that conventional noise reduction methods do not achieve. The proposed method is evaluated on four imbalanced datasets (ILPD, QSAR, Blood, and Maternal Health Risk) using five metrics: Cohen's kappa, Matthews correlation coefficient (MCC), F1-score, precision, and recall. Results demonstrate that CRN-SMOTE consistently outperformed the state-of-the-art Reduced Noise SMOTE (RN-SMOTE), SMOTE-Tomek Link, and SMOTE-ENN methods across all datasets, with particularly notable improvements observed in the QSAR and Maternal Health Risk datasets, indicating its effectiveness in enhancing imbalanced classification performance. Overall, the experimental findings indicate that CRN-SMOTE outperformed RN-SMOTE in 100% of the cases, achieving average improvements of 6.6% in Kappa, 4.01% in MCC, 1.87% in F1-score, 1.7% in precision, and 2.05% in recall, with SMOTE's number of neighbors set to 5.
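The oversampling step that CRN-SMOTE builds on can be sketched as follows. This is plain SMOTE interpolation only, using the 5 neighbors mentioned in the abstract. The cluster-based noise reduction of CRN-SMOTE is not reproduced here, and the function name, parameters, and sample data are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority class inside the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
new_points = smote(minority, n_new=10, k=5)
```

Because each synthetic point lies on a segment between two existing minority samples, noisy or mislabeled samples can pull new points into the wrong region, which is the motivation the abstract gives for adding a noise reduction stage.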
Journal of Chemical Information and Modeling, Journal Year: 2024, Volume and Issue: unknown
Published: Dec. 3, 2024
Inflammation is a biological response to harmful stimuli, playing a crucial role in facilitating tissue repair by eradicating pathogenic microorganisms. However, when inflammation becomes chronic, it leads to numerous serious disorders, particularly autoimmune diseases. Anti-inflammatory peptides (AIPs) have emerged as promising therapeutic agents due to their high specificity, potency, and low toxicity. However, identifying AIPs using traditional in vivo methods is time-consuming and expensive. Recent advancements in computational-based intelligent models for peptides have offered a cost-effective alternative for various inflammatory diseases, owing to their selectivity toward targeted cells with low side effects. In this paper, we propose a novel computational model, namely DeepAIPs-Pred, for the accurate prediction of AIP sequences. The training samples are represented using LBP-PSSM- and LBP-SMR-based evolutionary image transformation methods. Additionally, to capture contextual semantic features, we employed attention-based ProtBERT-BFD embedding and QLC for structural features. Furthermore, differential evolution (DE)-based weighted feature integration is utilized to produce a multiview feature vector. SMOTE-Tomek Links are introduced to address the class imbalance problem, and a two-layer feature selection technique is proposed to reduce and select the optimal features. Finally, self-normalized bidirectional temporal convolutional networks (SnBiTCN) are trained using the optimal features, achieving a significant predictive accuracy of 94.92% and an AUC of 0.97. The generalization of our model is validated using two independent datasets, demonstrating accuracy improvements of ∼2% and ∼10% over existing state-of-the-art models on Ind-I and Ind-II, respectively. The efficacy and reliability of DeepAIPs-Pred highlight its potential as a valuable tool for drug development and research in academia.
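The Tomek-link cleaning half of SMOTE-Tomek Links can be illustrated with a short sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor. The function name and toy data below are assumptions for illustration, and the brute-force search is chosen for clarity rather than speed.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


# Hypothetical 1-D data: samples 1 and 2 sit on the class boundary.
points = [[0.0], [1.0], [1.2], [5.0], [5.1]]
labels = [0, 0, 1, 1, 1]
links = tomek_links(points, labels)
print(links)
```

In a SMOTE-Tomek pipeline, samples that take part in such links are typically removed after oversampling so that the class boundary is less cluttered. Which member of the pair is dropped varies by implementation.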
BMC Bioinformatics, Journal Year: 2024, Volume and Issue: 25(1)
Published: Nov. 19, 2024
RNA 5-methyluridine (m5U) modifications play a crucial role in biological processes, making their accurate identification a key focus in computational biology. This paper introduces Deep-m5U, a robust predictor designed to enhance the prediction of m5U modifications. The proposed method utilizes a hybrid pseudo-K-tuple nucleotide composition (PseKNC) for sequence formulation, a Shapley Additive exPlanations (SHAP) algorithm for discriminant feature selection, and a deep neural network (DNN) as the classifier. The model was evaluated using two benchmark datasets, i.e., Full Transcript and Mature mRNA. Deep-m5U achieved overall accuracies of 91.47% and 95.86% on the Full Transcript and Mature mRNA datasets with 10-fold cross-validation, and for independent samples, it attained 92.94% and 95.17% accuracy. Compared to existing models, Deep-m5U showed approximately 5.23% and 3.73% higher accuracy on the training data and 3.95% and 3.26% higher accuracy on the independent samples, respectively. The reliability and effectiveness of Deep-m5U make it a valuable tool for scientists and a potential asset in pharmaceutical design and research.
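As a rough illustration of the sequence formulation step, the sketch below computes only the K-tuple nucleotide composition, that is, normalized k-mer frequencies. The pseudo components of PseKNC, which encode physicochemical correlations along the sequence, are deliberately left out. The function name, the handling of T as U, and the example sequence are assumptions.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Return the normalised frequency of every k-mer over `alphabet`,
    in lexicographic order of itertools.product (AA, AC, AG, AU, ...)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical short RNA fragment; dinucleotides are AU, UG, GG, GC, CU.
vec = kmer_composition("AUGGCU", k=2)
```

The result is a fixed-length vector (16 entries for k = 2) regardless of sequence length, which is what allows a downstream classifier to consume sequences of varying size.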
International Journal of Medical Informatics, Journal Year: 2025, Volume and Issue: 195, P. 105806 - 105806
Published: Jan. 23, 2025
Segmentation models for clinical data experience severe performance degradation when trained on a single client from one domain and distributed to other clients from a different domain. Federated Learning (FL) provides a solution by enabling multi-party collaborative learning without compromising the confidentiality of clients' private data. In this paper, we propose a cross-domain FL method for Weakly Supervised Semantic Segmentation (FL-W3S) of white blood cells in microscopic images. We perform model training on multiple clients with different data distributions to obtain a global aggregated model using only image-level class labels for semantic segmentation of white blood cells. A multi-class token transformer learns the relationship between class tokens and patch tokens during training and generates class-specific localization maps for mask predictions. To rectify the localization maps, we use patch-level pairwise affinity obtained from patch-to-patch transformer attention. We evaluate the proposed method on two white blood cell datasets from different domains. Our experimental results show that, on the two datasets, there is a 2.56% and 1.39% increase over existing state-of-the-art methods. The combination of federated learning, which preserves privacy, with white blood cell segmentation techniques for precise identification enhances diagnostic accuracy and personalized treatment strategies in clinical applications, particularly in hematology and pathology. More specifically, it involves isolating white blood cells from a blood smear for further analysis such as automated cell counting, morphological analysis, cell classification, and disease diagnosis and monitoring.
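The abstract does not state how client models are combined into the global aggregated model. One common choice is a sample-size-weighted average of client parameters (often called FedAvg), and the sketch below assumes that rule purely for illustration. The function name and the flat-list parameter representation are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighting each client by the
    number of local training samples it holds (assumed aggregation rule)."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size per client is required")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only parameter vectors and sample counts leave each client in this scheme, which is how raw images can stay private while a shared model is still learned.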
PLoS ONE, Journal Year: 2025, Volume and Issue: 20(1), P. e0317999 - e0317999
Published: Jan. 27, 2025
As people's material living standards continue to improve, the types and quantities of household garbage they generate rapidly increase. Therefore, it is urgent to develop a reasonable and effective method for garbage classification. This is important for resource recycling and environmental improvement and contributes to the sustainable development of production and the economy. However, existing deep learning-based garbage image classification models generally suffer from low classification accuracy, insufficient robustness, and slow detection speed due to the large number of model parameters. To this end, a new garbage image classification model is proposed, with the ResNet-50 network as the core architecture. Specifically, first, a redundancy-weighted feature fusion module is proposed, enabling the model to fully leverage valuable feature information, thereby improving its performance. At the same time, the module filters out redundant information from multi-scale features, reducing the number of model parameters. Second, the standard 3×3 convolutions in ResNet-50 are replaced with depth-separable convolutions, significantly improving the model's computational efficiency while preserving the feature extraction capability of the original convolutional structure. Finally, to address the issue of class imbalance, a weighting factor is added to the Focal Loss, aiming to mitigate the negative impact of class imbalance on model performance and enhance the model's robustness. Experimental results on the TrashNet dataset show that the proposed model effectively reduces the number of parameters, improves detection speed, and achieves an accuracy of 94.13%, surpassing the vast majority of existing deep learning-based waste classification models, demonstrating its solid practical value.
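The abstract does not give the exact form of the weighting factor added to the Focal Loss. The sketch below assumes the widely used class-weighted form, loss = -w_c * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class and w_c is a per-class weight. The function name, the weights, and the example probabilities are illustrative assumptions.

```python
import math


def weighted_focal_loss(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one sample.

    probs         : predicted class probabilities (should sum to 1)
    target        : index of the true class
    class_weights : per-class weighting factor (larger for rare classes)
    gamma         : focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical 3-class problem where class 2 is rare and weighted up.
weights = [0.5, 0.5, 2.0]
easy = weighted_focal_loss([0.9, 0.05, 0.05], target=0, class_weights=weights)
hard = weighted_focal_loss([0.6, 0.3, 0.1], target=2, class_weights=weights)
```

In this form, the (1 - p_t)^gamma term shrinks the loss of samples the model already classifies confidently, while the class weight raises the contribution of under-represented classes.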