Open-source software (OSS) projects rely on core and peripheral developers to develop, release, and maintain software. The former group plays a crucial role in initiating the project and making key decisions, while the latter contributes less frequently and has little decision-making power. Prior studies have explored the relationship between developer experience and test code quality. However, there is limited empirical evidence regarding the survivability of test smells during software evolution and maintenance. In this study, we investigate developers' test case refactorings in OSS projects. We empirically studied four Java projects, in which we identified test smells using manual and automated approaches and analyzed the authorship of the insertion and removal of those smells. Our findings reveal that test smells are commonly inserted at class creation and that 10.39% of them are removed, surviving between 366 and 2,911 days. While core developers remove more smells than peripheral ones, different smell types vary in how long they survive.
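To make the survivability measure concrete, the sketch below computes how long each smell instance survives, from the commit date that introduced it to the date it was removed. It is a minimal illustration, assuming the insertion and removal dates were already mined from version history; the SmellRecord type and the sample values are hypothetical, not the study's actual tooling.

    import java.time.LocalDate;
    import java.time.temporal.ChronoUnit;
    import java.util.List;

    // Minimal sketch: survivability of a test smell measured as the number of
    // days between the commit that inserted it and the commit that removed it.
    // SmellRecord and the sample data are hypothetical, not the study's tooling.
    public class SmellSurvivability {

        record SmellRecord(String type, LocalDate inserted, LocalDate removed) {}

        public static void main(String[] args) {
            List<SmellRecord> smells = List.of(
                    new SmellRecord("Assertion Roulette",
                            LocalDate.of(2015, 1, 10), LocalDate.of(2016, 1, 11)),
                    new SmellRecord("Eager Test",
                            LocalDate.of(2012, 3, 5), LocalDate.of(2020, 2, 20)));

            for (SmellRecord s : smells) {
                // Survivability: days elapsed between insertion and removal.
                long days = ChronoUnit.DAYS.between(s.inserted(), s.removed());
                System.out.printf("%s survived %d days%n", s.type(), days);
            }
        }
    }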
Software Testing, Verification and Reliability, 33(3), published Dec. 20, 2022.
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for different languages (e.g., Java, C#, or Python) and platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit-testing of libraries versus system-testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators and identify the one most suited to their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large-scale evaluations of different generators. However, such an empirical evaluation is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE), which supports generators (search-based, random-based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance knowledge transfer between academia and industry by standardizing the evaluation process. Since 2013, several editions of a tool competition, co-located with the Search-Based Software Testing Workshop, have taken place, where JUGE was used and evolved. As a result, an increasing number of tools (over 10) from academia and industry have been evaluated, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving approaches to test generation in industry. Indeed, the infrastructure demonstrated an implementation and design that is flexible enough to enable the integration of additional tools, which is practical for developers and allows researchers to experiment with new and advanced approaches.
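To illustrate what standardizing the benchmarking process can look like, here is a minimal sketch of a generator adapter and runner; the UnitTestGenerator interface and BenchmarkRunner below are hypothetical illustrations of the idea, not JUGE's actual API.

    import java.nio.file.Path;
    import java.time.Duration;
    import java.util.List;

    // Hypothetical adapter that a benchmarking infrastructure could standardize
    // on so that every generator is driven the same way; NOT JUGE's actual API.
    interface UnitTestGenerator {
        // Human-readable tool name, e.g., "EvoSuite" or "Randoop".
        String name();

        // Generate JUnit tests for one class under test within a time budget,
        // returning the paths of the generated test sources.
        List<Path> generateTests(String classUnderTest, Path classpath, Duration budget);
    }

    // The runner treats every tool uniformly: same benchmark classes, same
    // budget, same result collection, which is what makes comparisons fair.
    class BenchmarkRunner {
        void run(List<UnitTestGenerator> tools, List<String> benchmarkClasses,
                 Path classpath, Duration budget) {
            for (UnitTestGenerator tool : tools) {
                for (String cut : benchmarkClasses) {
                    List<Path> tests = tool.generateTests(cut, classpath, budget);
                    // A real infrastructure would now compile and execute the
                    // tests and measure coverage and mutation score per tool.
                    System.out.printf("%s generated %d test files for %s%n",
                            tool.name(), tests.size(), cut);
                }
            }
        }
    }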
Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into our daily lives, such as assistance for visually-impaired people and caption generation in Microsoft PowerPoint. However, even cutting-edge services (e.g., Azure Cognitive Services) and algorithms (e.g., OFA) could produce erroneous captions, leading to the incorrect description of important objects, misunderstanding, and threats to personal safety. The existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Different from approaches that generate test cases by inserting objects, which can easily make the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the object set in the caption of an image includes the object set in the caption of the image after melting. Given an image, Rome can recursively melt its objects to generate different pairs of images.
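This metamorphic relation is mechanically checkable: every object mentioned in the caption of the melted image must also appear in the caption of the original. Below is a minimal sketch of such a check, in which the hypothetical extractObjects helper stands in for real caption parsing.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch of Rome's metamorphic relation: after melting (removing and
    // inpainting) an object, the objects mentioned in the new caption should be
    // a subset of those mentioned in the original caption. extractObjects is a
    // hypothetical stand-in for real caption parsing.
    public class MeltingRelation {

        static Set<String> extractObjects(String caption) {
            // Hypothetical: a real implementation would extract object nouns
            // with NLP tooling; splitting on whitespace keeps the sketch short.
            return new HashSet<>(Arrays.asList(caption.toLowerCase().split("\\s+")));
        }

        // The relation holds when melting introduced no new object.
        static boolean relationHolds(String originalCaption, String meltedCaption) {
            return extractObjects(originalCaption)
                    .containsAll(extractObjects(meltedCaption));
        }

        public static void main(String[] args) {
            // "cat" was never in the original caption, so this pair violates
            // the relation and would be reported as a captioning issue.
            System.out.println(relationHolds(
                    "a dog and a ball on the grass",
                    "a cat on the grass")); // prints false
        }
    }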
We use Rome to test one widely-adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those of the SOTA approach, and they achieve comparable naturalness to the original images. Meanwhile, by generating test cases using 226 seed images, Rome reports a total of 9,121 issues with high precision (86.47%-92.17%). In addition, we further utilize the test cases to retrain Oscar, which improves its performance across multiple evaluation metrics.
Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for an assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages contained in the test methods of 20 open-source Java systems. Our findings show that while developers rarely utilize the option of supplying a message, those who do either compose it of only string literals, only identifiers, or a combination of both types.
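For illustration, the hypothetical JUnit 4 test below shows the three compositions observed: a message built from a string literal only, from an identifier only, and from a combination of both. The class and the tested values are illustrative, not drawn from the studied systems.

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    // Hypothetical examples of the three assertion-message compositions the
    // study observed; the tested values are illustrative only.
    public class AssertionMessageExamples {

        @Test
        public void messageStyles() {
            String expectedStatus = "ACTIVE";
            String actualStatus = "ACTIVE";
            String failureReason = "account status changed unexpectedly";

            // 1. Message composed of a string literal only.
            assertEquals("account status should be ACTIVE after login",
                    expectedStatus, actualStatus);

            // 2. Message composed of an identifier only.
            assertEquals(failureReason, expectedStatus, actualStatus);

            // 3. Message combining a string literal and an identifier.
            assertEquals("unexpected status: " + actualStatus,
                    expectedStatus, actualStatus);
        }
    }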
Using standard English readability measuring techniques, we observe that a beginner's level of English is required to understand messages containing identifiers, and a 4th-grade education level is required to understand messages composed of string literals. We also discuss the shortcomings of using such techniques and common anti-patterns in assertion message construction. We envision our results being incorporated into code quality tools that appraise the understandability of assertion messages.
Tests play a crucial role in software development by ensuring code quality. However, test code can suffer from “smells”: poor implementation choices that hinder maintainability and evolution. Numerous studies have addressed test smells in various programming languages, proposing tools for detecting them in Java, C++, Scala, and others. These tools employ techniques such as information retrieval, metrics analysis, and abstract syntax tree (AST) parsing. However, their focus on specific languages limits their generalizability and applicability to other languages and frameworks. This challenge is similar to issues found in code smell detection and static analysis. Therefore, this work proposes a language-agnostic approach to detect test smells. Our approach leverages AST parsing to extract relevant information from the code, followed by detection based on the extracted data. The method aims to facilitate the detection of test smells across languages and frameworks, enhancing the tool’s usability. To check the viability of our approach, we created a proof of concept using two different languages.
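As a sketch of the idea, the code below separates language-specific AST extraction from language-independent detection rules; the TestMethodFacts record and the Assertion Roulette rule are hypothetical illustrations, not the authors' implementation.

    import java.util.List;

    // Sketch of a language-agnostic smell check. A language-specific front end
    // parses the AST of, say, Java or Python test code and emits neutral facts;
    // the detection rules then operate only on those facts. TestMethodFacts and
    // the rule below are hypothetical, not the authors' tool.
    public class LanguageAgnosticDetector {

        // Language-neutral facts extracted by any front end from its AST.
        record TestMethodFacts(String name, int assertionCount,
                               int assertionsWithMessage) {}

        // Assertion Roulette: several assertions, none of them explained.
        static boolean hasAssertionRoulette(TestMethodFacts m) {
            return m.assertionCount() > 1 && m.assertionsWithMessage() == 0;
        }

        public static void main(String[] args) {
            List<TestMethodFacts> methods = List.of(
                    new TestMethodFacts("testLogin", 5, 0),
                    new TestMethodFacts("testLogout", 1, 0));
            for (TestMethodFacts m : methods) {
                if (hasAssertionRoulette(m)) {
                    System.out.println("Assertion Roulette in " + m.name());
                }
            }
        }
    }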
Diverse studies have analyzed the quality of automatically generated test cases by using test smells as the main attribute. But recent work reported that generated tests might suffer from a number of issues not considered previously, thus suggesting that not all of them have been identified yet. Little is known about these issues and their frequency within generated tests. In this paper, we report on a manual analysis of an external dataset consisting of 2,340 automatically generated tests. This analysis aimed at detecting new issues, not covered by past recognized smells. We use thematic analysis to group and categorize the issues found. As a result, we propose a taxonomy of 13 issues grouped into four categories. We also present eight recommendations that test generators may consider to improve the usefulness of the generated tests. As an additional contribution, our results suggest that (i) generated tests should be evaluated not only by themselves, but also considering the tested code; and (ii) some flaws are unlikely to be found in manually created tests and require specific checking tools.
Automated test generation tools, such as EvoSuite, typically aim to generate tests that maximize code coverage and do not adequately consider non-coverage aspects that may be relevant for developers, e.g., a test's quality. Hence, automatically generated tests are often affected by test-specific bad programming practices, i.e., test smells, that hinder the quality of the test source code and, ultimately, of the code under test. Although EvoSuite uses secondary criteria and a post-processing procedure to optimize and improve the readability of the generated tests, it does not explicitly consider the usage of good programming practices. Thus, in this paper, we propose a novel approach to assist EvoSuite's search algorithm in generating smell-free tests out of the box. To this aim, we first compile a set of 54 test smell metrics from several sources. Secondly, we systematically identify 30 smells that may affect generated tests, eight of which cannot be computed automatically. Thirdly, we incorporate 16 of the remaining metrics into the search and empirically find that only 14 can be optimized by the tool (e.g., Indirect Testing). Fourthly, we describe how to integrate the approach into an extended version of EvoSuite. Finally, we conduct an empirical study to (i) understand to what extent EvoSuite's default mechanisms lead to fewer smelly tests; (ii) assess whether our approach generates fewer smelly tests than the default configuration; and (iii) determine how it affects the fault detection effectiveness of the generated tests. Our results show that our approach can reduce the smelliness of generated tests by 8.58% without significantly compromising their coverage or fault detection effectiveness.
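To illustrate the general idea of treating smells as a secondary criterion (a sketch of the concept, not the paper's actual implementation), the comparator below prefers, among candidate tests with equal coverage, the one with the lower aggregated smell score; the Candidate type and its metrics are hypothetical.

    import java.util.Comparator;

    // Hypothetical sketch of using test smells as a secondary search criterion:
    // among candidate tests with equal coverage, prefer the less smelly one.
    // The Candidate type and the smellScore aggregation are illustrative only.
    public class SmellAwareComparator {

        record Candidate(String id, double coverage, int rottenLines, int magicNumbers) {}

        // Aggregate smell metrics into a single penalty; lower is better.
        static int smellScore(Candidate c) {
            return c.rottenLines() + c.magicNumbers();
        }

        // Primary criterion: coverage (higher is better).
        // Secondary criterion: smell score (lower is better) breaks ties.
        static final Comparator<Candidate> FITNESS =
                Comparator.comparingDouble(Candidate::coverage).reversed()
                        .thenComparingInt(SmellAwareComparator::smellScore);

        public static void main(String[] args) {
            Candidate a = new Candidate("t1", 0.80, 3, 5);
            Candidate b = new Candidate("t2", 0.80, 0, 1);
            // Equal coverage, so the less smelly candidate wins the comparison.
            System.out.println(FITNESS.compare(a, b) > 0 ? "prefer t2" : "prefer t1");
        }
    }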