Good unit tests play a paramount role when it comes to fostering and evaluating software quality. However, writing effective unit tests is an extremely costly and time-consuming practice. To reduce such a burden for developers, researchers have devised ingenious techniques to automatically generate test suites for existing code bases. Nevertheless, how automatically generated test cases fare against manually written ones is still an open research question. In 2008, Bacchelli et al. conducted an initial case study comparing automatically generated and manually written test suites. Since in the last ten years we have witnessed a huge amount of work on novel approaches and tools for test generation, this paper revises their study using current tools, as well as complementing the method by evaluating these tools' ability to find regressions.
Preprint [https://doi.org/10.5281/zenodo.2595232], dataset [https://doi.org/10.6084/m9.figshare.7628642].
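To make the regression-finding evaluation mentioned above concrete, the following minimal Python sketch simulates how a generated suite with regression oracles (assertions recorded from observed behaviour, as tools such as EvoSuite do) can flag a behavioural change between two revisions. The functions `price_current` and `price_old` are toy stand-ins, not code from the study.

```python
# Minimal sketch: a "generated" test suite is a set of input/expected-output
# pairs recorded on the current (assumed-correct) version of a function.
# Re-running the same assertions on an older revision flags regressions.

def price_current(quantity: int, unit: float) -> float:
    """Current, assumed-correct implementation with a bulk discount."""
    total = quantity * unit
    return round(total * 0.9, 2) if quantity >= 10 else round(total, 2)

def price_old(quantity: int, unit: float) -> float:
    """Hypothetical past revision that forgets the bulk discount."""
    return round(quantity * unit, 2)

def generate_regression_suite(func, inputs):
    """Record the behaviour of `func` as executable assertions
    (regression-oracle style: expected values come from observed behaviour)."""
    return [(args, func(*args)) for args in inputs]

def find_regressions(suite, candidate):
    """Run the recorded assertions against another revision of the code."""
    return [(args, expected, candidate(*args))
            for args, expected in suite
            if candidate(*args) != expected]

suite = generate_regression_suite(price_current, [(1, 5.0), (10, 5.0), (20, 2.5)])
for args, expected, actual in find_regressions(suite, price_old):
    print(f"regression on {args}: expected {expected}, got {actual}")
```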
Code smells are symptoms of poor design and implementation choices weighing heavily on the quality of the produced source code. During the last decades, several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature of the detection approach. In a recent work, the use of Machine-Learning (ML) techniques for code smell detection has been proposed, possibly solving the issue of tool subjectivity by giving the learner the ability to discern between smelly and non-smelly elements. While this opened a new perspective for code smell detection, it only considered the case where instances affected by a single smell type are contained in each dataset used to train and test the machine learners. In this work, we replicate that study with a different dataset configuration containing instances affected by more than one type of smell. The results reveal critical limitations of machine learning in the current state of the art, which deserve further research.
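As an illustration of the machine-learning setup discussed above, the sketch below trains a classifier over code metrics to separate smelly from non-smelly elements. It uses synthetic data and placeholder metric names (LOC, WMC, LCOM, ...); it is not the configuration or dataset used in the replication.

```python
# Minimal sketch of metric-based code smell classification with scikit-learn.
# The metric names and the synthetic data are illustrative placeholders; real
# experiments use metrics computed from actual projects.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

FEATURES = ["loc", "wmc", "lcom", "cbo", "rfc", "num_methods"]

# Synthetic stand-in for a dataset whose instances may be affected by more
# than one smell type (here collapsed to smelly = 1 vs. non-smelly = 0).
X, y = make_classification(n_samples=600, n_features=len(FEATURES),
                           n_informative=4, weights=[0.8, 0.2], random_state=7)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)

clf = RandomForestClassifier(n_estimators=200, random_state=7)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["non-smelly", "smelly"]))
```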
Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on code maintainability and comprehensibility has been widely shown in the past, and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics: they compute a set of code metrics and combine them to create detection rules. While they achieve reasonable accuracy, a recent trend is represented by the use of machine learning, where metrics are used as predictors of the smelliness of code artefacts. Despite the advances in the field, there is still a noticeable lack of knowledge on whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of machine-learning-based techniques with that of metric-based heuristics for code smell detection. We consider five smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need for further research aimed at improving the effectiveness of both heuristic and machine learning approaches for code smell detection: while DECOR generally achieves better performance than the machine learning baseline, its precision is still too low to make it usable in practice.
IEEE Transactions on Software Engineering, Journal Year: 2022, Volume and Issue: 49(1), P. 44 - 63. Published: Jan. 6, 2022.
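A minimal sketch of the heuristic side of this comparison: a DECOR-style rule that flags a Blob/God Class candidate when a handful of metrics exceed fixed thresholds, in contrast with a learned model such as the one sketched earlier. The thresholds and metric names are hypothetical, not the actual DECOR rule cards.

```python
# Illustrative threshold-based "rule card" detector over class-level metrics.
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    name: str
    loc: int      # lines of code
    wmc: int      # weighted methods per class
    lcom: float   # lack of cohesion of methods (0..1)

def blob_rule(m: ClassMetrics) -> bool:
    """Heuristic rule flagging a Blob/God Class candidate (hypothetical thresholds)."""
    return m.loc > 500 and m.wmc > 47 and m.lcom > 0.8

candidates = [
    ClassMetrics("OrderManager", loc=812, wmc=63, lcom=0.91),
    ClassMetrics("PriceFormatter", loc=120, wmc=9, lcom=0.35),
]
for m in candidates:
    print(m.name, "-> smelly" if blob_rule(m) else "-> clean")
```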
Software vulnerabilities are weaknesses in source code that can be potentially exploited to cause loss or harm. While researchers have been devising a number of methods to deal with vulnerabilities, there is still a noticeable lack of knowledge on their software engineering life cycle, for example how vulnerabilities are introduced and removed by developers. This information can be exploited to design more effective methods for vulnerability prevention and detection, as well as to understand the granularity at which these methods should aim. To investigate the life cycle of known software vulnerabilities, we focus on how, when, and under which circumstances the contributions to the introduction of vulnerabilities in software projects are made, as well as how long it takes before they are removed. We consider 3,663 public vulnerability-fixing patches from the National Vulnerability Database, pertaining to 1,096 open-source projects on GitHub, and define an eight-step process involving both automated parts (e.g., using a procedure based on the SZZ algorithm to find the vulnerability-contributing commits) and manual analyses (e.g., to establish how the vulnerabilities were fixed). The investigated vulnerabilities can be classified into 144 categories, take on average at least 4 contributing commits before being introduced, and half of them remain unfixed for more than one year. Most contributions are done by developers with a high workload, often when doing maintenance activities, and mostly through the addition of new code aiming at implementing further checks on inputs. We conclude by distilling practical implications on how vulnerability detectors should work to assist developers in timely identifying these issues.
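The automated step based on the SZZ algorithm can be illustrated with a simplified sketch: take the lines deleted by a vulnerability-fixing commit and blame them in the parent revision to recover candidate vulnerability-contributing commits. This is only the core idea under simplifying assumptions; the study's eight-step process adds filtering and manual validation, and the example commit hash and file path are placeholders.

```python
# Simplified SZZ-style sketch: map the lines deleted by a fixing commit back
# to the commits that last touched them in the parent revision.
import re
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

def deleted_ranges(fix_commit: str, path: str):
    """Yield (start, count) ranges of lines removed by the fixing commit."""
    diff = git("diff", "-U0", f"{fix_commit}^", fix_commit, "--", path)
    for m in re.finditer(r"^@@ -(\d+)(?:,(\d+))? ", diff, re.MULTILINE):
        start, count = int(m.group(1)), int(m.group(2) or 1)
        if count:  # skip pure additions (count == 0)
            yield start, count

def contributing_commits(fix_commit: str, path: str):
    """Blame the parent revision at the deleted lines (SZZ core idea)."""
    commits = set()
    for start, count in deleted_ranges(fix_commit, path):
        blame = git("blame", "-L", f"{start},{start + count - 1}",
                    f"{fix_commit}^", "--", path)
        commits.update(line.split()[0] for line in blame.splitlines())
    return commits

# Example (hypothetical commit hash and path inside a cloned repository):
# print(contributing_commits("a1b2c3d", "src/auth/session.c"))
```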
Software testing is a key activity to control the reliability of production code. Unfortunately, the effectiveness of test cases can be threatened by the presence of faults in the tests themselves. Recent work showed that static indicators can be exploited to identify test-related issues. In particular, test smells, i.e., sub-optimal design choices applied by developers when implementing test cases, have been shown to be related to test case effectiveness. While some approaches for the automatic detection of test smells have been proposed so far, they generally suffer from poor performance: as a consequence, current detectors cannot properly provide support when diagnosing the quality of test cases. In this paper, we aim at making a step ahead toward the automated detection of test smells by devising a novel textual-based detector, coined TASTE (Textual AnalySis for Test smEll detection), with the goal of evaluating the usefulness of textual analysis for detecting three test smell types, i.e., General Fixture, Eager Test, and Lack of Cohesion of Test Methods. We evaluate TASTE in an empirical study that involves a manually-built dataset composed of 494 test smell instances belonging to 12 software projects, comparing the capabilities of our detector with those of the two code metrics-based techniques by Van Rompaey et al. and Greiler et al. Our results show that the structural-based detectors available in the literature miss most of the smell instances in our dataset, while TASTE is up to 44% more effective. Finally, we find that structural and textual-based techniques identify different sets of test smells, thereby indicating their complementarity.
Empirical Software Engineering, Journal Year: 2020, Volume and Issue: 25(2), P. 1294 - 1340. Published: Feb. 4, 2020.
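The textual intuition behind a detector like TASTE can be sketched as follows: statements of a test that share few terms (low textual cohesion) hint at smells such as Eager Test or Lack of Cohesion of Test Methods. The term-splitting, similarity measure, and threshold below are a simplification for illustration, not the actual TASTE algorithm.

```python
# Toy textual-cohesion check over the statements of a single test method.
import itertools
import re

def terms(statement: str) -> set:
    """Split identifiers (camelCase / snake_case) into lower-case terms."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", statement):
        parts += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
    return {p.lower() for p in parts if len(p) > 2}

def textual_cohesion(statements) -> float:
    """Average pairwise Jaccard similarity between statement term sets."""
    sets = [terms(s) for s in statements]
    pairs = list(itertools.combinations(sets, 2))
    sims = [len(a & b) / len(a | b) if a | b else 0.0 for a, b in pairs]
    return sum(sims) / len(sims) if sims else 1.0

test_body = [
    "Cart cart = new Cart()",
    "cart.addItem(book)",
    "assertEquals(1, cart.itemCount())",
    "assertTrue(invoicePrinter.formatHeader().contains(\"Invoice\"))",
]
score = textual_cohesion(test_body)
print(f"textual cohesion = {score:.2f}",
      "(low: possible Eager Test / low cohesion)" if score < 0.2 else "")
```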
When identifying the origin of software bugs, many studies assume that "a bug was introduced by the lines of code that were modified to fix it". However, this assumption does not always hold and, at least in some cases, these lines are not responsible for introducing the bug: for example, when the bug was caused by a change in an external API. The lack of empirical evidence makes it impossible to assess how important these cases are and, therefore, to which extent the assumption is valid. To advance in this direction, and to better understand how bugs "are born", we propose a model defining criteria to identify the first snapshot of an evolving software system that exhibits a bug. This model, based on the perfect test idea, decides whether a bug is observed after a change to the software. Furthermore, we studied the model's application by carefully analyzing 116 bugs from two different open source projects. The manual analysis helped us classify the root cause of those bugs and create manually curated datasets with bug-introducing changes and with bugs that were not introduced by any change in the code. Finally, we used these datasets to evaluate the performance of four existing SZZ-based algorithms for detecting bug-introducing changes. We found that they are not very accurate, especially when multiple commits are found; the F-Score varies from 0.44 to 0.77, while the percentage of true positives does not exceed 63%. Our results show that the prevalent assumption, "a bug was introduced by the lines of code that were modified to fix it", is just one case of how bugs are introduced in a system. Finding what introduced a bug is not trivial: it can be introduced by developers in the code, or be caused irrespective of it. Thus, further research towards understanding how bugs are introduced in software projects could help to improve the design of integration tests and other procedures to make software development more robust.
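The evaluation of SZZ-based algorithms against the curated datasets boils down to comparing predicted and true bug-introducing commits with precision, recall, and F-score. A minimal sketch, with placeholder commit identifiers:

```python
# Compare the bug-introducing commits proposed by an SZZ-like algorithm with
# a manually curated ground truth and report precision, recall, and F-score.

def prf(predicted: set, ground_truth: set):
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

ground_truth = {"c101", "c205", "c311"}        # curated bug-introducing commits
szz_output = {"c101", "c205", "c999", "c742"}  # commits flagged by the algorithm

p, r, f = prf(szz_output, ground_truth)
print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```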
Flaky tests are software tests that exhibit a seemingly random outcome (pass or fail) when run against the same, identical code. Previous work has examined fixes to flaky tests and proposed automated solutions to locate as well as fix flaky tests; we complement it by examining the perceptions of software developers about the nature, relevance, and challenges of this phenomenon. We asked 21 professional developers to classify 200 flaky tests they previously fixed, in terms of the nature of the flakiness, the origin, and the fixing effort. We complement this analysis with information on the fixing strategy. Subsequently, we conducted an online survey with 121 developers with a median industrial programming experience of five years. Our research shows that: (1) flakiness is due to several different causes, four of which have never been reported before, despite being among the most costly to fix; (2) flakiness is perceived as significant by the vast majority of developers, regardless of their team's size and project's domain, and it can have effects on resource allocation, scheduling, and the perceived reliability of the test suite; (3) the challenges developers report to face regard mostly the reproduction of the flaky behavior and the identification of the cause of the flakiness.
Data and materials [https://doi.org/10.5281/zenodo.3265785].
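The defining property of flakiness used above (different outcomes on identical code) suggests a simple rerun-based check. The toy test below fails or passes depending on simulated timing, so repeated executions expose it as flaky; a real setup would rerun the actual suite through its test runner.

```python
# Minimal rerun-based flakiness check on a toy, timing-dependent test.
import random
import time

def flaky_test() -> bool:
    """Toy test: passes only when a simulated async operation is 'fast enough'."""
    simulated_latency = random.uniform(0.0, 0.02)
    time.sleep(simulated_latency)
    return simulated_latency < 0.015   # brittle timing assumption

def is_flaky(test, reruns: int = 20) -> bool:
    outcomes = {test() for _ in range(reruns)}
    return len(outcomes) > 1           # both pass and fail observed

print("flaky" if is_flaky(flaky_test) else "stable over reruns")
```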
The name of a unit test helps developers to understand the purpose and scenario of the test, and names support developers when navigating amongst large sets of tests. When tests are generated automatically, however, they tend to be given non-descriptive names such as "test0", which provide none of the benefits a descriptive name can give a test. The underlying challenge is that automatically generated tests typically do not represent real scenarios and have no clear purpose other than covering code, which makes naming them difficult. In this paper, we present an automated approach that generates descriptive names for automatically generated unit tests by summarizing API-level coverage goals. The synthesized names are optimized to be short, to have a clear relation to the covered code under test, and to allow developers to uniquely distinguish the tests in a suite. An empirical evaluation with 47 participants shows that developers agree with the synthesized names and consider them equally descriptive as manually written names. Study participants were even more accurate and faster at matching tests and code with the synthesized names compared to names derived
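As a rough illustration of naming tests after what they cover, the sketch below derives a descriptive name from a coverage goal consisting of the method under test and its observed outcome. The `CoverageGoal` structure is a hypothetical simplification of the API-level summarization described above.

```python
# Derive a descriptive test name from a simple, hypothetical coverage goal.
from dataclasses import dataclass

@dataclass
class CoverageGoal:
    method: str   # method under test, e.g. "withdraw"
    outcome: str  # "returns", "throws", ...
    detail: str   # return description or exception type

def synthesize_name(goal: CoverageGoal) -> str:
    detail = "".join(part.capitalize() for part in goal.detail.split())
    return f"test{goal.method.capitalize()}{goal.outcome.capitalize()}{detail}"

print(synthesize_name(CoverageGoal("withdraw", "throws", "insufficient funds error")))
# -> testWithdrawThrowsInsufficientFundsError
```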