IEEE Transactions on Software Engineering, Journal Year: 2019, Volume and Issue: 47(11), P. 2332 - 2347
Published: Oct. 11, 2019
Automated test case generation is an effective technique to yield high-coverage test suites. While the majority of research effort has been devoted to satisfying coverage criteria, a recent trend has emerged towards optimizing other non-coverage aspects. In this regard, runtime and memory usage are two essential dimensions: less expensive tests reduce the resource demands for the generation process and for later regression testing phases. This study shows that performance-aware test case generation requires solving two main challenges: providing a good approximation of resource usage with minimal overhead, and avoiding detrimental effects on both final coverage and fault detection effectiveness. To tackle these challenges, we conceived a set of performance proxies (inspired by previous work on performance testing) that provide a reasonable estimation of test execution costs (i.e., runtime and memory usage). Thus, we propose an adaptive strategy, called aDynaMOSA, which leverages these proxies by extending DynaMOSA, a state-of-the-art evolutionary algorithm in unit testing. Our empirical study involving 110 non-trivial Java classes reveals that our adaptive approach generates test suites with statistically significant improvements in runtime (-25%) and heap memory consumption (-15%) compared to DynaMOSA. Additionally, aDynaMOSA achieves results comparable to DynaMOSA over seven different coverage criteria, with similar fault detection effectiveness. Our investigation also highlights that using performance proxies alone (i.e., without adaptiveness) is not sufficient to generate more performant test cases without compromising overall coverage.

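The abstract does not spell out how the proxies enter the search. As a minimal sketch under stated assumptions: in MOSA-style preference sorting, the test closest to each target is preferred and ties are broken by test length; here a hypothetical `proxy_cost` estimate is used as an extra tie-breaker while an equally hypothetical stagnation rule (`AdaptiveSwitch`) decides whether proxies are active. None of these names or rules come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TestCase:
    name: str
    length: int                # number of statements in the test
    proxy_cost: float          # hypothetical performance-proxy estimate (lower = cheaper)
    distances: Dict[str, float] = field(default_factory=dict)  # 0.0 means target covered


def best_for_target(tests: List[TestCase], target: str, use_proxies: bool) -> Optional[TestCase]:
    """Pick the test closest to `target`; break ties by proxy cost (if enabled), then length."""
    candidates = [t for t in tests if target in t.distances]
    if not candidates:
        return None

    def sort_key(t: TestCase):
        if use_proxies:
            return (t.distances[target], t.proxy_cost, t.length)
        return (t.distances[target], t.length)

    return min(candidates, key=sort_key)


class AdaptiveSwitch:
    """Hypothetical rule: disable proxies after `patience` generations without new coverage."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.stale = 0
        self.best_covered = 0

    def update(self, covered_count: int) -> bool:
        """Record this generation's coverage and return whether proxies stay enabled."""
        if covered_count > self.best_covered:
            self.best_covered = covered_count
            self.stale = 0
        else:
            self.stale += 1
        return self.stale < self.patience
```

The switch reflects the abstract's point that proxies without adaptiveness can hurt coverage: when the search stops making progress, selection falls back to plain distance-then-length ordering.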
IEEE Transactions on Software Engineering, Journal Year: 2017, Volume and Issue: 44(2), P. 122 - 158
Published: Feb. 7, 2017
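Test case generation is intrinsically a multi-objective problem, since the goal is covering multiple test targets (e.g., branches). Existing search-based approaches either consider one target at a time or aggregate all targets into a single fitness function (whole-suite approach). Multi- and many-objective optimisation algorithms (MOAs) have never been applied to this problem, because existing algorithms do not scale to the number of coverage objectives that are typically found in real-world software. In addition, the final goal for MOAs is to find alternative trade-off solutions in the objective space, while in test generation the interesting solutions are only those test cases covering one or more uncovered targets. In this paper, we present the Dynamic Many-Objective Sorting Algorithm (DynaMOSA), a novel many-objective solver specifically designed to address the test case generation problem in the context of coverage testing. DynaMOSA extends our previous many-objective technique (MOSA) with dynamic selection of the coverage targets based on the control dependency hierarchy. This extension makes the approach more effective and efficient in the case of a limited search budget. We carried out an empirical study on 346 Java classes using three coverage criteria (i.e., statement, branch, and strong mutation coverage) to assess the performance of DynaMOSA with respect to the whole-suite approach (WS), its archive-based variant (WSA), and MOSA. The results show that DynaMOSA outperforms WSA in 28 percent of the classes for branch coverage (+8 percent more coverage on average) and in 27 percent of the classes for mutation coverage (+11 percent more killed mutants on average). It outperforms WS in 51 percent of the classes for statement coverage, leading to +11 percent more coverage on average. Moreover, DynaMOSA outperforms its predecessor MOSA for all three coverage criteria in 19 percent of the classes, with +8 percent more code coverage on average.
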
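The abstract says DynaMOSA selects coverage targets dynamically from the control dependency hierarchy. A simplified sketch of that idea, assuming the control dependency graph is given as a dict from each target to the targets directly control dependent on it (the `current_targets` function and this encoding are hypothetical illustrations, not DynaMOSA's implementation): uncovered targets are handed to the search only once the targets controlling them have been covered.

```python
from collections import deque
from typing import Dict, Set


def current_targets(cdg: Dict[str, Set[str]], covered: Set[str]) -> Set[str]:
    """Return the uncovered targets the search should optimise right now.

    `cdg` maps every target to the targets that are control dependent on it.
    Roots (targets that depend on nothing) are always eligible; the dependents
    of a covered target are expanded in turn.
    """
    children = {child for kids in cdg.values() for child in kids}
    queue = deque(t for t in cdg if t not in children)  # start from the roots
    seen: Set[str] = set()
    targets: Set[str] = set()
    while queue:
        t = queue.popleft()
        if t in seen:
            continue  # guards against cycles in the graph (e.g., loop predicates)
        seen.add(t)
        if t in covered:
            queue.extend(cdg.get(t, ()))  # covered: expose its dependents
        else:
            targets.add(t)  # uncovered frontier target: stop descending here
    return targets
```

For example, with `b2` control dependent on `b1`, `b2` does not become an objective until some test covers `b1`, which keeps the number of simultaneous objectives small even for classes with many branches.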
IEEE Transactions on Software Engineering, Journal Year: 2024, Volume and Issue: 50(6), P. 1340 - 1359
Published: March 29, 2024
Recent advancements in large language models (LLMs) have demonstrated exceptional success in a wide range of general domain tasks, such as question answering and following instructions. Moreover, LLMs have shown potential in various software engineering applications. In this study, we present a systematic comparison of test suites generated by the ChatGPT LLM and the state-of-the-art SBST tool EvoSuite. Our comparison is based on several critical factors, including correctness, readability, code coverage, and bug detection capability. By highlighting the strengths and weaknesses of LLMs (specifically ChatGPT) in generating unit test cases compared to EvoSuite, this work provides valuable insights into the performance of LLMs in solving software engineering problems. Overall, our findings underscore the potential of LLMs in software engineering and pave the way for further research in this area.

Automated unit test generation has been extensively studied in the literature in recent years. Previous studies on open source systems have shown that test generation tools are quite effective at detecting faults, but how effective and applicable are they in an industrial application? In this paper, we investigate this question using a life insurance and pension products calculator engine owned by SEB Life & Pension Holding AB Riga Branch. To study fault-finding effectiveness, we extracted 25 real faults from the version history of this software project and applied two up-to-date unit test generation tools for Java, EVOSUITE and RANDOOP, which implement search-based and feedback-directed random test generation, respectively. Automatically generated test suites detected up to 56.40% (EVOSUITE) and 38.00% (RANDOOP) of these faults. The analysis of our results demonstrates challenges that need to be addressed in order to improve fault detection in test generation tools. In particular, a classification of the undetected faults shows that 97.62% of them depend on either "specific primitive values" (50.00%) or the construction of a "complex state configuration of objects" (47.62%). To study applicability, we surveyed the developers of the application under test on their experience with and opinions about the test generation tools and the generated test cases. This leads to insights on requirements that academic prototypes must meet for successful technology transfer from research to practice, such as the need to integrate with popular build tools and to improve the readability of the generated tests.

Information and Software Technology, Journal Year: 2018, Volume and Issue: 104, P. 207 - 235
Published: Aug. 22, 2018
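Evolutionary algorithms have been shown to be effective at generating unit test suites optimised for code coverage. While many specific aspects of these algorithms have been evaluated in detail (e.g., test length and different kinds of techniques aimed at improving performance, like seeding), the influence of the choice of evolutionary algorithm has to date seen less attention in the literature. Since it is theoretically impossible to design an algorithm that is the best on all possible problems, a common approach in software engineering problems is to first try the most common algorithm, a genetic algorithm, and only afterwards refine it or compare it with other algorithms to see if any of them is more suited to the addressed problem. The objective of this paper is to perform this analysis, in order to shed light on the influence of the search algorithm applied for unit test generation. We empirically evaluate thirteen different evolutionary algorithms and two random approaches on a selection of non-trivial open source classes. All algorithms are implemented in the EvoSuite test generation tool, which includes recent optimisations such as the use of an archive during the search and optimisation for multiple coverage criteria. Our study shows that the use of a test archive makes evolutionary algorithms clearly better than random testing, and it confirms that the DynaMOSA many-objective search algorithm is the most effective algorithm for unit test generation. Our results show that the choice of algorithm can have a substantial influence on the performance of whole test suite optimisation. Although we can make a recommendation on which algorithm to use in practice, no algorithm is clearly superior in all cases, suggesting future work on improved search algorithms for unit test generation.
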
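The name of a unit test helps developers to understand the purpose and scenario of the test, and test names support developers when navigating amongst sets of unit tests. When unit tests are generated automatically, however, they tend to be given non-descriptive names such as "test0", which provide none of the benefits a descriptive name can give a test. The underlying challenge is that automatically generated tests typically do not represent real scenarios and have no clear purpose other than covering code, which makes naming them difficult. In this paper, we present an automated approach which generates descriptive names for automatically generated unit tests by summarizing API-level coverage goals. The names are optimized to be short, descriptive of the test, and clearly related to the covered code under test, and to allow developers to uniquely distinguish tests in a test suite. An empirical evaluation with 47 participants shows that developers agree with the synthesized names, and that the synthesized names are as descriptive as manually written names. Study participants were even more accurate and faster at matching code and tests with synthesized names compared to matching with manually derived names.
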
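The naming idea above (summarize each test's API-level coverage goals so that the name is short yet unique within the suite) can be sketched as a search for the smallest goal subset that no other test fully covers. This is an illustration under assumptions, not the paper's algorithm: the goal strings, the `And` joiner, and the numeric-suffix fallback are all hypothetical choices.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple


def to_camel(goal: str) -> str:
    """Capitalize the first letter so goals concatenate into a camelCase name."""
    return goal[:1].upper() + goal[1:]


def distinguishing_goals(goals: FrozenSet[str],
                         others: List[FrozenSet[str]]) -> Optional[Tuple[str, ...]]:
    """Smallest (then alphabetically first) goal subset that no other test fully covers."""
    for k in range(1, len(goals) + 1):
        for combo in combinations(sorted(goals), k):
            if not any(set(combo) <= other for other in others):
                return combo
    return None


def synthesize_names(tests: Dict[str, FrozenSet[str]]) -> Dict[str, str]:
    """Map each test id to a name built from its distinguishing coverage goals."""
    names: Dict[str, str] = {}
    used: Dict[str, int] = {}
    for test_id, goals in tests.items():
        others = [g for tid, g in tests.items() if tid != test_id]
        chosen = distinguishing_goals(goals, others)
        if chosen is None:
            # No subset is unique to this test: fall back to all of its goals.
            chosen = tuple(sorted(goals))
        base = "test" + "And".join(to_camel(g) for g in chosen)
        # Append a numeric suffix only if the name is already taken.
        count = used.get(base, 0)
        names[test_id] = base if count == 0 else f"{base}{count}"
        used[base] = count + 1
    return names
```

Preferring the smallest distinguishing subset keeps names short, while the uniqueness check is what lets a developer tell tests apart in the suite.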
IEEE Transactions on Software Engineering, Journal Year: 2018, Volume and Issue: 46(12), P. 1294 - 1317
Published: Oct. 24, 2018
Software systems fail. These failures are often reported to issue tracking systems, where they are prioritized and assigned to responsible developers to be investigated. When developers debug software, they need to reproduce the reported failure in order to verify whether their fix actually prevents the failure from happening again. Since manually reproducing each failure could be a complex task, several automated techniques have been proposed to tackle this problem. Despite showing advancements in this area, the proposed techniques showed various types of limitations. In this paper, we present EvoCrash, a new approach to automated crash reproduction based on a novel evolutionary algorithm, called Guided Genetic Algorithm (GGA). We report on our empirical study using EvoCrash to reproduce 54 real-world crashes, as well as the results of a controlled experiment, involving human participants, to assess the impact of EvoCrash tests in debugging. Based on our results, EvoCrash outperforms state-of-the-art techniques in crash reproduction and uncovers failures that are undetected by classical coverage-based unit test generation tools. In addition, we observed that using EvoCrash helps developers provide fixes more often and take less time when debugging, compared to developers debugging and fixing code without using EvoCrash tests.

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(7)
Published: Sept. 20, 2022
Test smells aim to capture design issues in test code that reduce its maintainability. They have been extensively studied and generally found to be quite prevalent in both human-written and automatically generated test cases. However, most evidence of prevalence is based on specific static detection rules. Although those rules are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive the warnings raised by detection tools as overly strict and non-representative of the maintainability and quality of test suites. This leads us to re-assess test smell detection tools' detection accuracy and to investigate the prevalence and detectability of test smells more broadly. Specifically, we construct a hand-annotated dataset spanning hundreds of test suites, both written by developers and generated by two test generation tools (EvoSuite and JTExpert), and perform a multi-stage, cross-validated manual analysis to identify the presence of six types of test smells in them. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal of matching developer perceptions of test smells. Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous in developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests actually often scored better but, in reality, suffered from a host of problems not well captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool's detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definitions of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics that match development practice, and (ii) more accurate detection strategies, evaluated primarily in industrial contexts.

ACM Transactions on Computer-Human Interaction, Journal Year: 2022, Volume and Issue: 29(4), P. 1 - 44
Published: May 5, 2022
From automated customer support to virtual assistants, conversational agents have transformed everyday interactions, yet despite phenomenal progress, no conversational agent exists for programming tasks. To understand the design space of such an agent, we prototyped PairBuddy, an interactive pair programming partner, based on research from conversational agents, software engineering, education, human-robot interactions, psychology, and artificial intelligence. We iterated on PairBuddy's design using a series of Wizard-of-Oz studies. Our pilot study with six programmers showed promising results and provided insights toward PairBuddy's interface design. In our second study, with 14 programmers, PairBuddy was positively praised across all skill levels. PairBuddy's active application of soft skills (adaptability, motivation, and social presence) as a navigator increased participants' confidence and trust, while its technical skills (code contributions, just-in-time feedback, and creativity support) as a driver helped participants realize their own solutions. PairBuddy takes the first step towards an Alexa-like programming partner.

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(2)
Published: Jan. 11, 2022
Search-based test generation is guided by feedback from one or more fitness functions, i.e., scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals, such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage, do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current population of test suites. To test this hypothesis, we implemented two reinforcement learning algorithms in the EvoSuite unit test generation framework, and used these algorithms to dynamically set the fitness functions used during generation for the three goals identified above. We evaluated our framework, EvoSuiteFIT, on a set of Java case examples. EvoSuiteFIT techniques attain significant improvements for two of the three goals, and show limited improvements on the third when the number of generations of evolution is fixed. Additionally, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows strategic choices that efficiently produce more effective test suites, and examining these choices offers insight into how to attain testing goals. We find that adaptive fitness function selection is a powerful technique to apply when an effective fitness function does not already exist for achieving a testing goal.
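The abstract frames fitness function selection as a reinforcement learning problem but does not name the two algorithms used. As a stand-in, the sketch below uses UCB1, a standard multi-armed bandit rule: each arm is a candidate fitness function, and the reward is whatever improvement toward the tester's goal one generation produced. The arm names, the `reward` callback, and the one-arm-per-generation loop are hypothetical and are not EvoSuiteFIT's implementation.

```python
import math
from typing import Callable, Dict, List


def ucb1_select(counts: Dict[str, int], totals: Dict[str, float], step: int,
                c: float = math.sqrt(2)) -> str:
    """Choose the arm with the highest mean reward plus exploration bonus."""
    for arm, n in counts.items():
        if n == 0:
            return arm  # try every fitness function once before trusting estimates
    return max(counts, key=lambda a: totals[a] / counts[a]
               + c * math.sqrt(math.log(step) / counts[a]))


def run_adaptive_search(arms: List[str], reward: Callable[[str], float],
                        generations: int) -> Dict[str, int]:
    """Pick one fitness function per generation and return how often each was used."""
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for step in range(1, generations + 1):
        arm = ucb1_select(counts, totals, step)
        r = reward(arm)  # hypothetical: goal improvement after one generation
        counts[arm] += 1
        totals[arm] += r
    return counts
```

With a fixed reward per arm, the selector quickly concentrates on the most useful fitness function while still occasionally re-checking the others, which is the kind of adjustment "throughout the generation process" the abstract describes.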