IEEE Transactions on Software Engineering, 2019, 47(11), pp. 2332-2347. Published: Oct. 11, 2019.
Automated test case generation is an effective technique to yield high-coverage test suites. While the majority of research effort has been devoted to satisfying coverage criteria, a recent trend has emerged towards optimizing other, non-coverage aspects. In this regard, runtime and memory usage are two essential dimensions: less expensive tests reduce the resource demands of the generation process and of later regression testing phases. This study shows that performance-aware test case generation requires solving two main challenges: providing a good approximation of execution costs with minimal overhead, and avoiding detrimental effects on both final coverage and fault detection effectiveness. To tackle these challenges, we conceived a set of performance proxies -- inspired by previous work -- that provide a reasonable estimation of execution costs (i.e., runtime and memory usage). On top of these proxies, we propose an adaptive strategy, called aDynaMOSA, which leverages them by extending DynaMOSA, a state-of-the-art evolutionary algorithm for unit testing. Our empirical study, involving 110 non-trivial Java classes, reveals that our approach generates test suites with statistically significant improvements in runtime (-25%) and heap memory consumption (-15%) compared to DynaMOSA. Additionally, aDynaMOSA achieves results comparable to DynaMOSA over seven different coverage criteria, with similar fault detection effectiveness. Our investigation also highlights that the performance proxies alone (i.e., without the adaptiveness) are not sufficient to generate more performant test cases without compromising overall coverage.
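To make the proxy idea concrete, here is a minimal sketch of how cheap execution counters could stand in for precise profiling and act as a secondary, tie-breaking objective next to coverage. The counter names and weights are our illustrative assumptions, not the instrumentation the paper actually uses.

```python
# Illustrative sketch (not the paper's implementation): estimate a test's
# execution cost from cheap counters gathered during a single run, instead
# of repeated precise profiling. Counters and weights are assumptions.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    covered_targets: set = field(default_factory=set)
    statements_executed: int = 0   # cheap proxy for runtime
    objects_instantiated: int = 0  # cheap proxy for heap usage
    method_calls: int = 0          # cheap proxy for runtime

def proxy_cost(t: TestCase, w_stmt=1.0, w_alloc=2.0, w_call=0.5) -> float:
    """Weighted sum of cheap counters standing in for runtime/memory cost."""
    return (w_stmt * t.statements_executed
            + w_alloc * t.objects_instantiated
            + w_call * t.method_calls)

def prefer(a: TestCase, b: TestCase) -> TestCase:
    """Coverage first; among equally covering tests, pick the cheaper one.
    This mirrors using proxies as a secondary, tie-breaking objective."""
    if len(a.covered_targets) != len(b.covered_targets):
        return a if len(a.covered_targets) > len(b.covered_targets) else b
    return a if proxy_cost(a) <= proxy_cost(b) else b

t1 = TestCase({"b1", "b2"}, statements_executed=120, objects_instantiated=3)
t2 = TestCase({"b1", "b2"}, statements_executed=40, objects_instantiated=1)
assert prefer(t1, t2) is t2  # same coverage, t2 is cheaper
```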
Generating unit tests automatically saves time over writing them manually and can lead to higher code coverage. However, generated tests are usually not based on realistic scenarios and are therefore generally considered to be less readable. This places a question mark over their practical value: every time a test fails, a developer has to decide whether this failure revealed a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that generated tests are harder to read outweigh the time-savings gained by automated generation, and render them more of a hindrance than a help for software maintenance? In order to answer this question, we performed an empirical study in which participants were presented with automatically generated or manually written failing tests and asked to identify and fix the cause of the failure. Our experiment and its two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with generated tests, we found that developers are equally effective with generated and manually written tests. This has implications for how automated test generation is best used in practice, and it indicates a need for research into the readability of generated tests.

Journal of Software: Evolution and Process, 2019, 31(9). Published: March 8, 2019.
Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed and, moreover, new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the CI pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes has to be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help in taking informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We consider four different categories of source-code features and assess the prediction on a large data set involving more than 3'000 Java classes. We compare several machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact prediction accuracy. Moreover, we extend our investigation to different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 in nested cross-validation over the budgets for EVOSUITE and RANDOOP, respectively. Finally, the discussion of the results demonstrates the relevance of coupling-related features.
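The prediction setup the abstract describes can be sketched as follows: a regression model over static source-code metrics, scored with MAE under nested cross-validation. The feature names, model choice, and synthetic data below are our illustrative assumptions; only the evaluation protocol (nested CV, MAE) comes from the abstract.

```python
# Sketch (assumed features and model, not the paper's exact pipeline):
# predict the branch coverage a test generator reaches on a class from
# static source-code metrics, evaluated with nested cross-validation.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
n = 300  # stand-in for the >3'000 Java classes in the study
# Hypothetical metric columns: LOC, cyclomatic complexity, CBO (coupling), fan-in
X = rng.random((n, 4))
# Synthetic target in [0, 1]; tied to the coupling column only to mirror,
# qualitatively, the paper's finding that coupling-related features matter.
y = np.clip(0.9 - 0.5 * X[:, 2] + 0.05 * rng.standard_normal(n), 0, 1)

inner = GridSearchCV(                      # hyper-parameter tuning (inner loop)
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 8]},
    scoring="neg_mean_absolute_error",
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
)
outer_scores = cross_val_score(            # unbiased estimate (outer loop)
    inner, X, y,
    scoring="neg_mean_absolute_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
print(f"nested-CV MAE: {-outer_scores.mean():.3f}")  # paper reports 0.15-0.21
```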
Unit testing is reported as one of the skills that graduating students lack, yet it is an essential skill for professional software developers. Understanding the challenges students face during unit testing can help inform testing practices and education. To this end, we conduct an exploratory study to reveal students' perceptions of unit testing and the challenges they encounter when practicing it. We surveyed 54 students from two universities and gave them testing tasks involving black-box test design and white-box test implementation. For the tasks, we used projects from prior work studying test-first development among students. We quantitatively analyzed the survey responses and test code properties, and qualitatively identified mistakes and test smells in the students' code. We further report on our experience running this study with students.
Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages and prevent regressions. Machine learning has emerged as a viable approach to help developers generate automated unit tests. However, generating reliable test cases that are semantically correct and capable of catching bugs or unintended behavior via machine learning requires large, metadata-rich datasets. In this paper we present Methods2Test: a supervised dataset of test cases mapped to their corresponding methods under test (i.e., focal methods). This dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers' best practices in software testing, which identify the likely focal method for a given test case. To facilitate further analysis, we store rich metadata for each method-test pair in JSON-formatted files. Additionally, we extract a textual corpus at different context levels, which we provide in both raw and tokenized forms, in order to enable researchers to train and evaluate models for Automated Test Generation. The dataset is publicly available at: https://github.com/microsoft/methods2test
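One widely known mapping heuristic of the kind the abstract mentions is name matching between a JUnit test and its focal method. The toy version below is our sketch; the actual Methods2Test pipeline combines several richer heuristics than this.

```python
# Toy sketch of one mapping heuristic in the spirit of Methods2Test: match a
# JUnit test to a likely focal method by stripping the "test" prefix/suffix
# and comparing names. The real dataset combines richer heuristics.

import re

def candidate_focal_names(test_name: str) -> set[str]:
    """'testParseDate' or 'parseDateTest' -> {'ParseDate', 'parseDate'}."""
    name = re.sub(r"^test_?", "", test_name)   # drop leading 'test'/'test_'
    name = re.sub(r"_?[Tt]est$", "", name)     # drop trailing 'Test'/'_test'
    return {name, name[:1].lower() + name[1:]} if name else set()

def map_test_to_focal(test_name: str, methods_under_test: list[str]) -> str | None:
    """Return the first method whose name matches a candidate, else None."""
    candidates = candidate_focal_names(test_name)
    for method in methods_under_test:
        if method in candidates:
            return method
    return None

print(map_test_to_focal("testParseDate", ["format", "parseDate"]))  # parseDate
```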

Empirical Software Engineering, 2022, 27(4). Published: May 2, 2022.
Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with tests amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important to design a developer-centric test amplification approach, one that provides amplified test cases developers are willing to take over into their test suite. We conduct 16 semi-structured interviews with developers, supported by our prototypical designs of a developer-centric amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot so that the amplified tests it generates are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified tests in their familiar environment. From our interviews, we gather 52 observations that we summarize into 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.

ACM Transactions on Software Engineering and Methodology, 2025, volume/issue unknown. Published: Jan. 27, 2025.
Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing such systems is falsification, which uses optimization algorithms to find an input signal that causes the system to violate a formal specification. However, applying falsification poses two significant challenges: (1) it requires executing numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple requirements are typically defined as a conjunctive specification, which is difficult for existing approaches to comprehensively cover. This paper introduces Synthify, a falsification framework tailored to AI-enabled control systems, i.e., systems equipped with AI controllers. Our approach performs falsification in a two-phase process. At the start, Synthify synthesizes a program that implements one or a few linear controllers to serve as a proxy for the AI controller. This proxy mimics the AI controller's functionality but is computationally more efficient. Then, Synthify employs an ε-greedy strategy to sample a promising sub-specification from the conjunctive specification. It then uses a Simulated Annealing-based algorithm to search for violations of the sampled sub-specification in the system under test. To evaluate Synthify, we compare it with PSY-TaLiRo, a state-of-the-art and industrial-strength falsification tool, on 8 publicly available control systems. On average, Synthify achieves an 83.5% higher falsification success rate than PSY-TaLiRo under the same budget of trials. Additionally, our method is 12.8× faster at finding a single violation than the baseline. The violations found by Synthify are also more diverse than those found by the baseline, covering 137.7% more sub-specifications.
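The two-phase search loop the abstract describes can be sketched schematically: ε-greedy selection of a sub-specification from the conjunction, then one simulated-annealing move per iteration minimizing that sub-specification's robustness. All callables and parameters below are our placeholders, not Synthify's actual interfaces; a negative robustness value signals a violation.

```python
# Schematic of the falsification loop (our sketch, not Synthify's code).
# `robustness(spec, signal)`, `random_signal()`, and `neighbor(signal)` are
# placeholder callables supplied by the caller.

import math
import random

def falsify(sub_specs, robustness, random_signal, neighbor,
            epsilon=0.1, budget=1000, temp=1.0, cooling=0.995):
    avg_rob = {s: 0.0 for s in sub_specs}  # running mean robustness per sub-spec
    pulls = {s: 0 for s in sub_specs}
    signal = random_signal()
    for _ in range(budget):
        if random.random() < epsilon:          # explore
            spec = random.choice(sub_specs)
        else:                                  # exploit; unexplored specs first
            spec = min(sub_specs,
                       key=lambda s: avg_rob[s] if pulls[s] else float("-inf"))
        cand = neighbor(signal)
        cur, new = robustness(spec, signal), robustness(spec, cand)
        if min(cur, new) < 0:                  # violation found
            return spec, (cand if new < cur else signal)
        if new < cur or random.random() < math.exp((cur - new) / temp):
            signal = cand                      # SA: downhill always, uphill sometimes
        temp *= cooling
        pulls[spec] += 1
        avg_rob[spec] += (min(cur, new) - avg_rob[spec]) / pulls[spec]
    return None                                # budget exhausted, no violation
```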

Proceedings of the ACM on Software Engineering, 2025, 2(ISSTA), pp. 1234-1256. Published: June 22, 2025.
Synchronizing production and test code, known as PT co-evolution, is critical for software quality. Given the significant manual effort involved, researchers have tried to automate PT co-evolution using predefined heuristics or machine learning models. However, existing solutions are still incomplete. Most approaches only detect and flag obsolete test cases, leaving developers to manually update them. Meanwhile, learning-based approaches may suffer from low accuracy, especially when applied to real-world projects. In this paper, we propose ReAccept, a novel approach leveraging large language models (LLMs), retrieval-augmented generation (RAG), and dynamic validation to fully automate PT co-evolution with high accuracy. ReAccept employs an experience-guided approach to generate prompt templates for obsolete-test identification and the subsequent update process. After updating a test case, ReAccept performs dynamic validation by checking syntax, verifying semantics, and assessing coverage. If validation fails, ReAccept leverages the error messages to iteratively refine the patch. To evaluate ReAccept's effectiveness, we conducted extensive experiments on a dataset of 537 Java projects and compared its performance against several state-of-the-art methods. The evaluation results show that ReAccept achieved an update accuracy of 60.16% on correctly identified obsolete tests, surpassing the state-of-the-art technique CEPROT by 90%. These findings demonstrate that ReAccept can effectively maintain test code, improve overall software quality, and significantly reduce maintenance effort.
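The generate-validate-refine loop that the abstract outlines can be sketched as follows; the prompt layout, validator interface, and retry limit are our placeholder assumptions, not ReAccept's implementation.

```python
# Schematic of an LLM-based generate-validate-refine loop (our sketch):
# an LLM proposes an updated test, checks run in order (e.g., syntax,
# semantics, coverage), and any error message is fed back for refinement.

def update_obsolete_test(llm, production_diff, old_test, retrieved_examples,
                         validators, max_rounds=3):
    # RAG: retrieved, similar co-evolution examples are inlined in the prompt
    prompt = (f"Production change:\n{production_diff}\n"
              f"Obsolete test:\n{old_test}\n"
              f"Similar past updates:\n{retrieved_examples}\n"
              "Update the test accordingly.")
    patch = llm(prompt)
    for _ in range(max_rounds):
        error = validate(patch, validators)
        if error is None:
            return patch                   # all checks pass
        # feed the error back so the model can refine its patch
        patch = llm(prompt + f"\nYour previous patch failed:\n{error}\nFix it.")
    return None                            # give up; flag for manual review

def validate(patch, validators):
    """Run checks in order; return the first error message, or None."""
    for check in validators:
        error = check(patch)
        if error is not None:
            return error
    return None
```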

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022, vol. 10, pp. 1167-1174. Published: March 1, 2022.
The readability of software code is a key success criterion for understanding and maintaining software systems and their tests. In industry practice, a limited number of guidelines aim at improving and assessing the readability of (test) code. Although several studies focus on investigating the readability of source code, we observed little research work that focuses on the readability of test code. In this paper we systematically investigate the characteristics, factors, and assessment criteria that have an impact on test code readability. We build on a Systematic Mapping Study (SMS) to identify approaches to the readability, legibility, and understandability of test code that support and improve maintenance tasks. The result set includes 16 publications for further analysis. The majority of the publications are investigations of automatically generated tests (88%), often evaluated with surveys to assess readability (44%). Most approaches look at isolated aspects; combining the different aspects within an assessment framework can help to better assess and justify test code readability during system maintenance.