Open-source software (OSS) projects rely on core and peripheral developers to develop, release, and maintain software. The former group plays a crucial role in initiating the project and making key decisions, while the latter contributes less frequently and has little decision-making power. Prior studies have explored the relationship between developer experience and test code quality. However, there is limited empirical evidence regarding the survivability of test smells during software evolution and maintenance. In this study, we investigate developers' test case refactorings in OSS projects. We empirically studied four Java projects, in which we identified test smells using manual and automated approaches and analyzed the authorship of the insertion and removal of those smells. Our findings reveal that test smells are commonly inserted at test class creation, and 10.39% of them are removed, after between 366 and 2,911 days. While core developers remove more smells, the two groups insert and remove different types.
Journal of Software Evolution and Process, Journal Year: 2023, Volume and Issue: 36(4), Published: Jan. 18, 2023
Abstract
Test smells are considered bad practices that can reduce test code quality, thus harming software testing goals and maintenance activities. Prior studies have investigated the diffusion of test smells and their impact on maintainability. However, we cannot directly compare their outcomes, as most of them use customized datasets. In response, we introduced the TSSM (Test Smells Structural Metrics) dataset, containing test smells detected using the JNose tool and structural metrics (for test and production code) calculated with the CK tool for 13,703 open-source Java systems from GitHub. In addition, we perform an empirical study to investigate the relationship between test smells and structural metrics on a large-scale dataset. We split the projects into three clusters and analyze the distribution of test smells, the co-occurrences among them, and their correlation with code metrics. The ratio of smelly classes per specific smell is similar across clusters, but we could observe a significant difference in the number of them. Sleepy Test, Mystery Guest, and Resource Optimism rarely occur; the last two are strongly correlated, indicating those smells are more severe than others. Our results point out that smelly test classes tend to have moderate to high complexity, large size, and coupling with production code, and that test smells also negatively affect test code quality. To support further studies, we made our dataset publicly available.
To ensure the quality of a software system, developers perform an activity known as unit testing, where they write code (known as test cases) that verifies the individual units that make up the system. Like production code, test cases are subject to bad programming practices, known as test smells, that hurt maintenance activities. An essential part of most maintenance activities is program comprehension, which involves reading code to understand its behavior in order to fix issues or update features. In this study, we conduct a controlled experiment with 96 undergraduate computer science students to investigate the impact of two common types of test smells, namely Assertion Roulette and Eager Test, on a student's ability to debug and troubleshoot test case failures. Our findings show that students take longer to correct errors in production code when smells are present in the associated test cases, especially Assertion Roulette. We envision our findings supporting academia in better equipping students with the knowledge and resources for writing and maintaining high-quality test cases. Our experimental materials are available online at https://wajdialjedaani.github.io/testsmellstd/
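The two smells studied in this experiment are easy to see in a small sketch. This is an illustrative Python example, not taken from the experiment's Java materials; the function under test is hypothetical:

```python
import unittest

def make_account(owner, balance):
    # Hypothetical code under test.
    return {"owner": owner, "balance": balance, "active": True}

class SmellyTest(unittest.TestCase):
    def test_account(self):
        # Eager Test: one test method exercises several behaviours at once.
        # Assertion Roulette: several assertions with no explanatory message,
        # so a failure does not say which check "lost the roulette".
        acc = make_account("alice", 100)
        self.assertEqual(acc["owner"], "alice")
        self.assertEqual(acc["balance"], 100)
        self.assertTrue(acc["active"])

class RefactoredTest(unittest.TestCase):
    # Fix: one behaviour per test, each assertion documented with a message.
    def test_owner_is_stored(self):
        acc = make_account("alice", 100)
        self.assertEqual(acc["owner"], "alice", "owner should be kept as given")

    def test_new_account_is_active(self):
        acc = make_account("alice", 100)
        self.assertTrue(acc["active"], "new accounts should start active")
```

Both versions pass here, but when `make_account` regresses, only the refactored tests point directly at the broken behaviour, which is the debugging effect the experiment measures.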
Background: Test smells indicate potential problems in the design and implementation of automated software tests that may negatively impact test code maintainability, coverage, and reliability. When poorly described, manual tests written in natural language suffer from related problems, which enable their analysis from the point of view of test smells. Despite the possible prejudice to manually tested products, little is known about smells in manual tests, which results in many open questions regarding their types, frequency, and harm to tests written in natural language. Aims: Therefore, this study aims to contribute a catalog of test smells for manual tests. Method: We perform a two-fold empirical strategy. First, an exploratory study of the manual tests of three systems: the Ubuntu Operational System, the Brazilian Electronic Voting Machine, and the User Interface of a large smartphone manufacturer. We use our findings to propose eight test smells and identification rules based on syntactical and morphological text analysis, validating them with 24 in-company test engineers. Second, using our proposals, we create a tool based on Natural Language Processing (NLP) and analyze the subject systems' tests to validate the results. Results: We observed the occurrence of the eight cataloged test smells. A survey of in-company professionals showed that 80.7% agreed with our definitions and examples. Our NLP-based tool achieved a precision of 92%, recall of 95%, and f-measure of 93.5%, and its execution evidenced 13,169 occurrences of the cataloged smells in the analyzed systems. Conclusion: We contribute a catalog of manual test smells and novel detection strategies that better explore the capabilities of current NLP mechanisms, with promising results and reduced effort to analyze tests written in different idioms.
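The identification rules above operate on the syntax and morphology of natural-language test steps. As a rough sketch of that idea (the rule, the verb list, and the "vague action" smell are illustrative assumptions, not the paper's actual eight smells or rules), even a keyword-based check can flag imprecise steps:

```python
import re

# Hypothetical rule: a manual test step is flagged as imprecise when it
# uses a vague verb instead of a concrete, checkable action.
VAGUE_VERBS = re.compile(r"\b(check|verify|ensure|handle)\b", re.IGNORECASE)

def flag_imprecise_steps(steps):
    """Return (1-based index, step) pairs matching the vague-verb rule."""
    return [(i, s) for i, s in enumerate(steps, start=1) if VAGUE_VERBS.search(s)]

steps = [
    "Open the settings dialog",
    "Check that everything works",  # vague: what exactly is checked, and how?
    "Click 'Save' and confirm the dialog closes",
]
print(flag_imprecise_steps(steps))  # flags step 2
```

Real rules of this kind additionally need morphological analysis (lemmatization, part-of-speech tagging) to generalize beyond a fixed word list and across idioms, which is where the paper's NLP tooling comes in.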
ACM Transactions on Software Engineering and Methodology, Journal Year: 2023, Volume and Issue: 33(3), P. 1 - 32, Published: Nov. 20, 2023
Test smell refers to poor programming and design practices in testing code and widely spreads throughout software projects. Considering that test smells have negative impacts on the comprehension and maintenance of test code and even make the code-under-test more defect-prone, it is thus of great importance to mine, detect, and refactor them. Since Deursen et al. introduced the definition of "test smell", several studies have worked on discovering new test smells from specifications and practitioners' experience. Indeed, many bad testing practices are "observed" by developers during the creation of test scripts rather than through academic research, and are discussed in the software engineering community (e.g., Stack Overflow) [70, 94]. However, no prior study has explored those discussions, formally defined them as test smell types, and analyzed their characteristics, which plays an important role in knowing these smells and avoiding them during development. Therefore, we pick up those challenges and act by working out systematic methods to explore test smell types from one of the most mainstream developers' Q&A platforms, i.e., Stack Overflow. We further investigate their harmfulness and analyze possible solutions for eliminating them. We find that some smells make it hard to fix failed test cases and trace failing reasons. To exacerbate matters, we identified two smells that pose a risk to the accuracy of test cases. Next, we develop a detector to detect the smells in software. The detector is composed of six detection rules for the different smell types. These rules are wrapped with a set of syntactic rules based on patterns extracted from coding styles. We manually construct a dataset from seven popular Java projects to evaluate the effectiveness of our detector on it. The experimental results show that the detector achieves high performance in precision, recall, and F1 score. Then, we utilize the detector on 919 real-world projects to explore whether the smells are prevalent in practice. We observe that the smells spread across 722 out of the 919 projects, which demonstrates that they are prevalent in practice. Finally, to validate the usefulness of our work in practice, we submit 56 issue reports to 53 projects containing the smells. Our issue reports achieve a 76.4% acceptance rate after conducting sentiment analysis on the replies. These evaluations confirm the prevalence and practicality of the detected test smells.
Empirical Software Engineering, Journal Year: 2024, Volume and Issue: 29(2), Published: Feb. 26, 2024
Abstract
Context: The readability of source code is key for understanding and maintaining software systems and tests. Although several studies investigate the readability of source code, there is limited research specifically on the readability of test code and its related influence factors. Objective: In this paper, we aim at investigating the factors that influence the readability of test code from an academic perspective based on scientific literature sources, complemented by practical views, as discussed in grey literature. Methods: First, we perform a Systematic Mapping Study (SMS) with a focus on the scientific literature. Second, we extend this study by reviewing grey literature on aspects of test code readability and understandability. Finally, we conduct a controlled experiment on a selected set of test cases to collect additional knowledge about factors discussed in practice. Results: The result of the SMS includes 19 primary studies for further analysis. The grey literature search reveals 62 sources of information on test code readability. Based on the analysis of these sources, we identified a combined set of 14 factors that influence the readability of test code. 7 of these factors were found in both the academic and grey literature, while some factors were mainly discussed in academia (2) or industry (5) with only limited overlap. The practically relevant factors we investigated showed a significant impact on readability in half of the cases. Conclusion: Our review showed interest in, and consensus on, factors influencing test code readability. However, some factors are discussed only by practitioners. For selected factors, we were able to confirm their impact in a first experiment. Therefore, we see the need to bring together academic and practical viewpoints to achieve a common view on the readability of test code.
IEEE Transactions on Software Engineering, Journal Year: 2024, Volume and Issue: 50(5), P. 1264 - 1280, Published: March 22, 2024
Test amplification makes systematic changes to existing, manually written tests to provide tests complementary to an automated test suite. We consider developer-centric test amplification, where the developer explores, judges, and edits the amplified tests before adding them to their maintained test suite. However, it is as yet unclear which kind of selection and editing steps developers take before including amplified tests into their suite. In this paper we conduct an open source contribution study, amplifying the tests of Java projects from GitHub. We report the deficiencies we observe in the amplified tests while filtering them, and open 39 pull requests with amplified tests. We present a detailed analysis of the maintainers' feedback regarding the proposed changes, the requested information, and the expressed judgment. Our observations provide a basis for practitioners to make an informed decision on whether to adopt test amplification. As several of our observations are based on the developer's understanding of the amplified test, we conjecture that test amplification tools should invest in supporting developers to understand the amplified tests.
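Test amplification, as studied above, systematically transforms an existing test, for example by varying its inputs or adding assertions for observed values. A minimal Python sketch of the input-variation idea (the function under test and the boundary-mutation rule are hypothetical; the tools used in such studies operate on Java):

```python
def clamp(x, lo, hi):
    # Hypothetical code under test: restrict x to the interval [lo, hi].
    return max(lo, min(x, hi))

def amplify_inputs(base):
    """Derive new test inputs from an existing one by boundary mutations."""
    x, lo, hi = base
    return [(lo, lo, hi), (hi, lo, hi), (lo - 1, lo, hi), (hi + 1, lo, hi)]

# Existing, manually written test input ...
base = (5, 0, 10)
assert clamp(*base) == 5

# ... and the amplified variants, each checked against the invariant
# that the result stays within [lo, hi].
for case in amplify_inputs(base):
    result = clamp(*case)
    assert 0 <= result <= 10
```

In the developer-centric setting studied here, these generated variants are not added blindly: the developer inspects each one, discards uninteresting cases, and edits the rest before they enter the maintained suite.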
Test smells are coding issues that typically arise from inadequate practices, a lack of knowledge about effective testing, or deadline pressures to complete projects. The presence of test smells can negatively impact the maintainability and reliability of software. While there are tools that use advanced static analysis or machine learning techniques to detect test smells, these tools often require effort to be used. This study aims to evaluate the capability of Large Language Models (LLMs) in automatically detecting test smells. We evaluated ChatGPT-4, Mistral Large, and Gemini Advanced using 30 types of test smells across codebases in seven different programming languages collected from the literature. ChatGPT-4 identified 21 types of test smells and Gemini Advanced identified 17 types, while Mistral Large detected 15. The LLMs demonstrated potential as a valuable tool for identifying test smells.
Block-based programming environments like Scratch are widely used in introductory programming courses. They facilitate learning pivotal programming concepts by eliminating syntactical errors, but logical errors that break the desired program behaviour are nevertheless possible. Finding such errors requires testing, i.e., running the program and checking its behaviour. In many programming environments, this step can be automated by providing executable tests as code; in Scratch, testing can only be done manually by invoking events through user input and observing the rendered stage. While this is arguably sufficient for learners, the lack of automated testing may be inhibitive for teachers wishing to provide feedback on their students' solutions. In order to address this issue, we introduce a new category of blocks that enables the creation of automated tests. With these blocks, students and teachers alike can create tests and receive feedback directly within the Scratch environment using familiar block-based programming logic. To enable the batch processing of sets of student solutions, we extend the Scratch interface with an accompanying test interface. We evaluated this testing framework with 28 teachers who created tests for a popular Scratch game and subsequently used them to assess student implementations. An overall accuracy of 0.93 of the teachers' tests, compared to manually evaluating the functionality of 21 student solutions, demonstrates that teachers are able to effectively use the testing blocks. A subsequent survey confirms that teachers consider the approach useful.