Good unit tests play a paramount role when it comes to fostering and evaluating software quality. However, writing effective unit tests is an extremely costly and time-consuming practice. To reduce this burden for developers, researchers have devised ingenious techniques to automatically generate test suites for existing code bases. Nevertheless, how generated test cases fare against manually written ones remains an open research question. In 2008, Bacchelli et al. conducted an initial case study comparing automatically generated and manually written test suites. Since in the last ten years we have witnessed a huge amount of work on novel approaches and tools for test generation, this paper revises their study using current tools, as well as complementing the original method by evaluating these tools' ability to find regressions.
Preprint
[https://doi.org/10.5281/zenodo.2595232],
dataset
[https://doi.org/10.6084/m9.figshare.7628642].
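To make the regression-finding criterion concrete (a minimal Python stand-in for the paper's Java setting, with hypothetical names): a generated test detects a regression if it passes on the fixed version of the code but fails on the regressed one.

    # Illustrative sketch (hypothetical names, Python stand-in for the
    # Java setting): a generated test finds a regression if it passes on
    # the fixed version of a function but fails on the regressed one.
    def fixed_abs(x):
        return -x if x < 0 else x

    def regressed_abs(x):
        return x  # regression: negative inputs no longer handled

    def generated_test(abs_impl) -> bool:
        """A generated regression test; returns True if it passes."""
        return abs_impl(-3) == 3 and abs_impl(4) == 4

    detects = generated_test(fixed_abs) and not generated_test(regressed_abs)
    print("regression detected:", detects)  # True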
ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown, Published: Jan. 27, 2025
Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing such systems is falsification, which aims to find an input signal that causes the system to violate a formal specification, using optimization algorithms. However, applying falsification to AI-enabled control systems poses two significant challenges: (1) it requires executing numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple requirements are typically defined as a conjunctive specification, which is difficult for existing approaches to comprehensively cover. This paper introduces Synthify, a falsification framework tailored to AI-enabled control systems, i.e., systems equipped with AI controllers. Our framework performs falsification in a two-phase process. At the start, Synthify synthesizes a program that implements one or a few linear controllers to serve as a proxy for the AI controller; the proxy mimics the AI controller's functionality but is computationally more efficient. Then, Synthify employs an ε-greedy strategy to sample a promising sub-specification from the conjunctive specification, and uses a Simulated Annealing-based falsification algorithm to search for violations of the sampled sub-specification in the control system.
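For intuition, here is a minimal sketch of that second phase, assuming a robustness-style semantics where a negative score means the sampled sub-specification is violated (all names, the noise model, and the cooling schedule are illustrative assumptions, not Synthify's actual API):

    import math
    import random

    # Illustrative sketch of the second phase: epsilon-greedy sampling of
    # a sub-specification, then simulated-annealing search for a violating
    # input. All names here are hypothetical, not Synthify's actual API.

    def pick_sub_spec(sub_specs, reward, epsilon=0.1):
        """Epsilon-greedy: mostly exploit the most promising sub-spec."""
        if random.random() < epsilon:
            return random.choice(sub_specs)                 # explore
        return max(sub_specs, key=lambda s: reward[s])      # exploit

    def falsify(sub_spec, robustness, initial_input, steps=1000, temp=1.0):
        """Simulated annealing: minimize robustness; < 0 is a violation."""
        x, best = initial_input, robustness(sub_spec, initial_input)
        for i in range(steps):
            t = temp * (1 - i / steps)                      # cooling schedule
            candidate = [v + random.gauss(0, 0.1) for v in x]
            score = robustness(sub_spec, candidate)
            # Accept improvements always; worse moves with Boltzmann probability.
            if score < best or random.random() < math.exp((best - score) / max(t, 1e-9)):
                x, best = candidate, score
            if best < 0:
                return x                                    # spec violated
        return None

The ε-greedy step trades off revisiting sub-specifications that already yielded low robustness against occasionally exploring neglected ones, which is what lets the search cover a conjunctive specification more broadly.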
To evaluate Synthify, we compare it with PSY-TaLiRo, a state-of-the-art, industrial-strength falsification tool, on 8 publicly available control systems. On average, Synthify achieves an 83.5% higher success rate than PSY-TaLiRo under the same budget of falsification trials. Additionally, our method is 12.8× faster at finding a single violation than the baseline. The violations found by Synthify are also more diverse than those of the baseline, covering 137.7% more sub-specifications.
Generating unit tests automatically saves time over writing them manually and can lead to higher code coverage. However, generated tests are usually not based on realistic scenarios and are therefore generally considered to be less readable. This places a question mark over their practical value: every time a test fails, a developer has to decide whether this failure revealed a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that generated tests are harder to read outweigh the time savings gained by automated test generation, and render them more of a hindrance than a help for software maintenance? In order to answer this question, we performed an empirical study in which participants were presented with manually written and automatically generated failing tests and were asked to identify and fix the cause of each failure. Our experiment and its two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with generated tests, we found that developers are equally effective with both kinds of tests. This has implications for how test generation is best used in practice, and it indicates a need for research into improving the readability of generated tests.
Journal of Software Evolution and Process, Journal Year: 2019, Volume and Issue: 31(9), Published: March 8, 2019
Abstract
Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed and, moreover, new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the CI pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes has to be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help developers take informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We use four different categories of source-code features to assess the prediction on a large data set involving more than 3,000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact the prediction accuracy. Moreover, we extend our investigation to different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 in nested cross-validation over the search budgets for EVOSUITE and RANDOOP, respectively. Finally, our discussion of the results demonstrates the relevance of coupling-related features for the prediction.
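For intuition, the prediction setup described above can be sketched as a small regression pipeline (a hedged sketch using scikit-learn as a stand-in; the feature rows and coverage values below are invented for illustration and are not the paper's data or exact feature set):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Illustrative sketch: predict the branch coverage a test generator
    # would reach on a class from static source-code metrics. The rows
    # and metric choices are invented examples of the feature categories.
    X = np.array([
        # [LOC, cyclomatic complexity, #branches, coupling (CBO)]
        [120,  8, 14,  3],
        [450, 31, 72, 11],
        [ 60,  4,  6,  2],
        # ... one row per Java class
    ])
    y = np.array([0.91, 0.42, 0.97])  # coverage achieved by the tool

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    # Mean absolute error under cross-validation, mirroring the paper's
    # MAE-based evaluation (the paper uses nested cross-validation).
    mae = -cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_absolute_error").mean()
    print(f"MAE: {mae:.2f}")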
Unit testing is reported as one of the skills that graduating students lack, yet it is an essential skill for professional software developers. Understanding the challenges students face during unit testing can help inform industry practices and education. To this end, we conduct an exploratory study to reveal students' perceptions of unit testing and the challenges they encounter when practicing it. We surveyed 54 students from two universities and gave them two unit testing tasks, involving black-box test design and white-box test implementation. For the tasks, we used projects from prior work on studying test-first development among students. We quantitatively analyzed the survey responses and test code properties, and qualitatively identified the mistakes and test smells in the students' test code. We further report on our experience of running this study with students.
Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages and prevent regressions. Machine learning has emerged as a viable approach to help developers generate automated unit tests. However, generating reliable test cases that are semantically correct and capable of catching bugs or unintended behavior via machine learning requires large, metadata-rich, datasets. In this paper we present Methods2Test: a large, supervised dataset of test cases mapped to corresponding methods under test (i.e., focal methods). This dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a reliable mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers' best practices in software testing, which identify the likely focal method for a given test case.
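As a flavor of such heuristics, a common convention-based rule pairs a JUnit test with its focal method via class and method names; the following is a simplified sketch under that assumption (Python 3.9+, illustrative only, not the dataset's actual implementation):

    # Simplified sketch of a name-based mapping heuristic (illustrative,
    # not Methods2Test's actual implementation): JUnit convention pairs
    # FooTest.testBar() with the focal method Foo.bar().
    def find_focal_method(test_class: str, test_method: str, methods: dict):
        """methods maps class name -> list of method names in that class."""
        # Heuristic 1: the class under test drops the 'Test' suffix/prefix.
        focal_class = test_class.removesuffix("Test").removeprefix("Test")
        candidates = methods.get(focal_class, [])
        # Heuristic 2: the focal method name is embedded in the test name.
        name = test_method.removeprefix("test").lstrip("_")
        for m in candidates:
            if m.lower() == name.lower():
                return focal_class, m
        return None

    # Example: FooTest.testAdd -> Foo.add
    methods = {"Foo": ["add", "remove"]}
    print(find_focal_method("FooTest", "testAdd", methods))  # ('Foo', 'add')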
To facilitate further analysis, we store a rich set of metadata for each method-test pair in JSON-formatted files. Additionally, we extract textual corpora from the dataset at different context levels, which we provide in both raw and tokenized forms, in order to enable researchers to train and evaluate machine learning models for Automated Test Generation. Methods2Test is publicly available at: https://github.com/microsoft/methods2test
Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(4), Published: May 2, 2022
Abstract
Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important to design a developer-centric test amplification approach that provides test cases developers are willing to take over into their test suite. We conduct 16 semi-structured interviews with software developers, supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot to generate amplified tests that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases from within their familiar environment.
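To illustrate the underlying idea of test amplification (a toy Python sketch; DSpot itself works on Java/JUnit and is considerably more sophisticated): amplification perturbs the inputs of an existing test and turns the behavior observed on the current code into new regression assertions.

    import random

    # Toy sketch of input amplification (not DSpot's implementation):
    # perturb a literal input of an existing test, run the code under
    # test, and emit a new test asserting the observed behavior.
    def amplify(original_input: int, function, n_variants: int = 3):
        amplified_tests = []
        for i in range(n_variants):
            new_input = original_input + random.choice([-1, 1, 10, -10])
            observed = function(new_input)  # observe current behavior
            amplified_tests.append(
                f"def test_amplified_{i}():\n"
                f"    assert square({new_input}) == {observed}\n"
            )
        return amplified_tests

    def square(x: int) -> int:
        return x * x

    for test in amplify(4, square):
        print(test)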
From our interviews, we gather 52 observations that we summarize in 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.
2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Journal Year: 2022, Volume and Issue: 10, P. 1167 - 1174, Published: March 1, 2022
The readability of software code is a key success criterion for understanding and maintaining software systems and their tests. In industry practice, a limited number of guidelines aim at improving and assessing the readability of (test) code. Although several studies focus on investigating the readability of source code, we observed little research work that focuses on the readability of test code. In this paper we systematically identify characteristics, influence factors, and assessment criteria that have an impact on test code readability. We build on a Systematic Mapping Study (SMS) to identify work on the readability, legibility, and understandability of test code that can support developers in improving test code and in maintenance tasks. The result set includes 16 publications selected for further analysis. The majority of these publications report investigations of automatically generated tests (88%), often evaluated with surveys to assess readability (44%). Most approaches aim at isolated readability aspects; a combination of the different aspects within one framework can help to better assess and justify the readability of test code for system maintenance.
The impact of developers' experience on several development practices has been widely investigated in the past. One of the most promising research fields is software testing, as many researchers have found significant correlations between experience and testing effectiveness. In this paper, we aim at further studying this relation, by focusing on how teams' experience is associated with assertion density, i.e., the number of assertions per test class KLOC, which has previously been shown to be an effective way to decrease fault density.
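For concreteness, the metric can be computed per test class roughly as follows (a minimal sketch; the line-based counting rule is a simplifying assumption, not the paper's exact tooling):

    # Sketch of the assertion-density metric as defined above: assertions
    # per thousand lines of test code (KLOC). Counting lines that start
    # with 'assert' is a simplifying assumption.
    def assertion_density(test_source: str) -> float:
        lines = [ln for ln in test_source.splitlines() if ln.strip()]
        assertions = sum(1 for ln in lines if ln.lstrip().startswith("assert"))
        kloc = len(lines) / 1000
        return assertions / kloc if kloc else 0.0

    # Example: a 2-line test with 1 assertion -> 500 assertions per KLOC.
    print(assertion_density("x = add(2, 2)\nassertEquals(4, x)"))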
We perform a mixed-methods empirical study. First, we devise a statistical model relating teams' experience and other control factors to the assertion density of test classes belonging to 12 projects. This enables us to investigate whether experience comes out as a statistically significant factor to explain assertion density. Second, we contrast our findings with a survey study conducted with 57 developers, who were asked their opinions on how a developer's experience relates to the assertions they add to test code. Our results suggest the existence of a relationship: on the one hand, a team's experience helps explain the assertion density of the systems we have investigated; on the other hand, developers confirm the importance of team composition for the production of assertions.