Detecting code smells through machine learning (ML) poses challenges owing to the unbalanced nature of the data and potential interpretation bias. While previous studies focused on severity tended to categorize each code smell's specific types, this research aims to detect and classify code smell severity in a single dataset containing instances of four distinct types: God-Class, Data-Class, Feature-Envy, and Long-Method. This study also explores the impact of applying data scaling, feature selection techniques, and ensemble methods to enhance ML models for the purpose above. Evaluating the techniques in combination reveals that pairing standardization with Chi-square feature selection outperforms the other combinations, achieving 81.04% and 81.41% accuracy with the XGBoost and CatBoost models, respectively. Additionally, one of these algorithms attains its highest accuracy, 80.67%, even without data preprocessing. Compared with the state of the art, the results obtained by the proposed approach, an accuracy of 85%, are promising and suggest that further improvements to ML approaches and techniques could raise their effectiveness and reliability in real-world scenarios.
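The combination of Chi-square feature selection and standardization mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the metric columns, the labels, and the helper names (`chi2_scores`, `select_top_k`, `standardize`) are hypothetical, and the ordering (select first, then standardize) is an assumption made here because the Chi-square statistic in this count-based form expects non-negative inputs, which standardized values are not.

```python
from statistics import mean, pstdev

def chi2_scores(X, y):
    """Chi-square score per feature, treating non-negative values as counts."""
    classes = sorted(set(y))
    n_features = len(X[0])
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            # Observed feature mass in class c vs. mass expected from class frequency.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = totals[j] * y.count(c) / len(y)
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

def select_top_k(X, scores, k):
    """Keep the k highest-scoring columns, preserving their original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep

def standardize(X):
    """Z-score each column; constant columns are left centred at zero."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in X]

# Hypothetical toy data: three code metrics per sample, binary severity label.
X = [[10, 2, 5], [12, 3, 5], [200, 40, 5], [220, 45, 5]]
y = [0, 0, 1, 1]

scores = chi2_scores(X, y)
X_selected, kept = select_top_k(X, scores, k=2)
X_scaled = standardize(X_selected)
```

The scaled matrix would then be passed to whichever classifier is being evaluated; the third column is constant across classes, so it scores zero and is dropped.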
Scientific Reports, Journal Year: 2023, Volume and Issue: 13(1)
Published: Sept. 27, 2023
Abstract
Detecting code smells may be highly helpful for reducing maintenance costs and raising source code quality. Code smells help developers and researchers understand several types of design flaws. Code smells with high severity can cause significant problems for the software and may challenge the system's maintainability. It is essential to assess the severity of the code smells detected in software, because severity determines the priority of refactoring efforts. The class imbalance problem further increases the difficulty of code smell severity detection. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are selected to detect code smell severity. In this work, an effort is made to address the issue of class imbalance, for which the Synthetic Minority Oversampling Technique (SMOTE) class balancing technique is applied. Each dataset's relevant features are chosen using a feature selection technique based on principal component analysis. The severity of code smells is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. This study obtained an accuracy score of 0.99 using the Random forest and Decision tree approaches on the Long method code smell. The models' performance is also compared on three other measurements (Precision, Recall, and F-measure) to assess the classification models. The impact on performance is presented for models trained with and without applying SMOTE. The results are promising and beneficial, paving the way for further studies in this area.
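To make the class-balancing step concrete, here is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. It uses only the standard library; the function name `smote`, the parameter defaults, and the toy data are assumptions for illustration and do not reproduce the study's setup or any particular library's implementation.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical imbalanced toy data: six majority samples, three minority samples.
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.0, 1.2]]
minority = [[5.0, 5.0], [5.5, 4.5], [6.0, 5.5]]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.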
IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 53664 - 53676
Published: Jan. 1, 2024
(1) Background: Code smell is the most popular and reliable method for detecting potential errors in code. In real-world circumstances, a single source code may have multiple code smells. Multi-label code smell detection is a popular research study. However, limited studies are available on it, and there is a need for a standardized classifier for reliably identifying various multi-label code smells that belong to the method-level code smell category. The primary goal of this study is to develop a rule-based method for detecting multi-label code smells.
(2) Methods: Binary Relevance, Label Powerset, and Classifier Chain methods are utilized with tree-based single-label algorithms, including some ensemble algorithms, in this paper. The chi-square feature selection technique is applied to select the relevant features. The proposed model is trained using 10-fold cross-validation, Random Search cross-validation is used for parameter tuning, and different performance measures are used to evaluate the model.
(3) Results: The proposed model achieves 99.54% as the best Jaccard accuracy, using the Decision Tree. The Decision Tree model incorporating multi-label methods outperforms the alternative approaches to classification. Single-label classifiers produced better results after considering the correlation factor.
(4) Conclusion: This study will facilitate scientists and programmers by providing a systematic method for detecting various code smells in software projects, saving time and effort during code reviews by detecting multiple problems simultaneously. After detecting multi-label code smells, programmers can create more organized, easier-to-understand, and trustworthy programs.
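As a rough illustration of the multi-label terms used above, the sketch below shows the target transformations behind Binary Relevance and Label Powerset, and a sample-averaged Jaccard accuracy. It is a standard-library-only sketch, not the paper's implementation: the two label columns (taken here to be two method-level smells), the predictions, and the helper names are hypothetical, and scoring two empty label sets as 1.0 is a convention assumed for this example.

```python
def binary_relevance_targets(Y):
    """Split a label matrix into one binary target vector per label."""
    return [list(col) for col in zip(*Y)]

def label_powerset_targets(Y):
    """Map each distinct label combination to a single class id."""
    class_ids = {}
    targets = []
    for row in Y:
        combo = tuple(row)
        # Assign ids in order of first appearance.
        class_ids.setdefault(combo, len(class_ids))
        targets.append(class_ids[combo])
    return targets, class_ids

def jaccard_accuracy(Y_true, Y_pred):
    """Mean over samples of |true AND pred| / |true OR pred|."""
    total = 0.0
    for t, p in zip(Y_true, Y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        total += inter / union if union else 1.0  # assumed convention
    return total / len(Y_true)

# Hypothetical labels per method: [smell A present, smell B present].
Y_true = [[1, 0], [1, 1], [0, 0], [1, 0]]
Y_pred = [[1, 0], [1, 0], [0, 0], [1, 1]]

br_targets = binary_relevance_targets(Y_true)
lp_targets, lp_classes = label_powerset_targets(Y_true)
score = jaccard_accuracy(Y_true, Y_pred)
```

Binary Relevance trains one independent classifier per column of `br_targets`, Label Powerset trains a single multi-class classifier on `lp_targets`, and a Classifier Chain differs from Binary Relevance by feeding each classifier the predictions of the earlier labels as extra features, which is how label correlation can be taken into account.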