Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(4). Published: May 2, 2022.
Abstract
Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important to design a developer-centric test amplification approach that provides amplified tests developers are willing to take over into their test suite. We conduct 16 semi-structured interviews supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot so that the amplified tests it generates are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified tests in their familiar environment. From our interviews, we gather 52 observations that we summarize into 23 result categories and give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.
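To make the notion of test amplification concrete, the sketch below shows a hand-written JUnit test and a hypothetical amplified variant of the kind a tool such as DSpot might derive. The class `ShoppingCart`, its methods, and the added assertions are illustrative assumptions, not examples from the paper.

```java
import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

// Hypothetical class under test, used only to illustrate amplification.
class ShoppingCart {
    private int items = 0;
    void add(int n) { if (n < 0) throw new IllegalArgumentException("n < 0"); items += n; }
    int size() { return items; }
}

class ShoppingCartTest {

    // Original, manually written developer test.
    @Test
    void addIncreasesSize() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(2);
        assertEquals(2, cart.size());
    }

    // Amplified variant: a new input and extra assertions that observe
    // behavior the original test left unchecked (illustrative only).
    @Test
    void addIncreasesSize_amplified() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(0);                               // amplified input
        assertEquals(0, cart.size());              // new assertion on observable state
        assertThrows(IllegalArgumentException.class,
                     () -> cart.add(-1));          // new assertion on the error path
    }
}
```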
Search-based software testing (SBST) generates high-coverage test cases for programs under test with a combination of test case generation and mutation. SBST's performance relies on there being a reasonable probability of generating test cases that exercise the core logic of the program under test. Given such test cases, SBST can then explore the space around them to cover various parts of the program. This paper explores whether Large Language Models (LLMs) of code, such as OpenAI's Codex, can be used to help this exploration. Our proposed algorithm, CodaMosa, conducts SBST until its coverage improvements stall, then asks Codex to provide example test cases for under-covered functions. These examples redirect the search to more useful areas of the search space. On an evaluation over 486 benchmarks, CodaMosa achieves statistically significantly higher coverage on many more benchmarks (173 and 279) than those on which it reduces coverage (10 and 4), compared to SBST-only and LLM-only baselines.
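CodaMosa itself targets Python programs, but the control loop it describes is language-agnostic. The Java sketch below illustrates the idea under stated assumptions: `SearchEngine` and `LlmClient` are hypothetical interfaces standing in for the SBST engine and the LLM, not CodaMosa's actual API.

```java
import java.util.List;

/** Illustrative coverage-plateau loop in the spirit of CodaMosa (not its real implementation). */
public class LlmAssistedSearch {

    interface SearchEngine {
        void runSearchIteration();                      // one SBST iteration (generate + mutate)
        double coverage();                              // current coverage in [0, 1]
        List<String> underCoveredFunctions();           // functions the search struggles to reach
        void seedTestCases(List<String> exampleTests);  // inject example tests into the population
    }

    interface LlmClient {
        List<String> askForExampleTests(String functionName); // hypothetical LLM call
    }

    static void run(SearchEngine search, LlmClient llm, int maxIterations, int stallLimit) {
        double best = 0.0;
        int stalled = 0;
        for (int i = 0; i < maxIterations; i++) {
            search.runSearchIteration();
            double cov = search.coverage();
            if (cov > best) { best = cov; stalled = 0; }
            else if (++stalled >= stallLimit) {
                // Coverage has plateaued: ask the LLM for example tests of
                // under-covered functions and use them to redirect the search.
                for (String fn : search.underCoveredFunctions()) {
                    search.seedTestCases(llm.askForExampleTests(fn));
                }
                stalled = 0;
            }
        }
    }
}
```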
IEEE Transactions on Software Engineering, Journal Year: 2023, Volume and Issue: 50(1), P. 85 - 105. Published: Nov. 28, 2023.
Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, but while requiring additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without such additional training or manual effort. Concretely, we consider an approach where the LLM is provided with prompts that include the signature and implementation of the function under test, along with usage examples extracted from the documentation. Furthermore, if a generated test fails, our approach attempts to generate a new test that fixes the problem by re-prompting the model with the failing test and error message. We implement this approach in TestPilot, an adaptive LLM-based test generation tool for JavaScript that automatically generates unit tests for the methods in a given project's API. We evaluate TestPilot using OpenAI's gpt3.5-turbo on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, Nessie, the state-of-the-art feedback-directed JavaScript test generation technique, achieves only 51.3% statement coverage and 25.6% branch coverage. Experiments excluding parts of the information included in the prompts show that all components contribute towards the generation of effective test suites. We also find that 92.8% of TestPilot's generated tests have ≤ 50% similarity to existing tests (as measured by normalized edit distance), with none of them being exact copies. Finally, we run TestPilot with two additional LLMs, OpenAI's older code-cushman-002 and StarCoder, for which the training process is publicly documented. Overall, we observed similar results with the former (68.2% coverage) and somewhat worse results with the latter (54.0% coverage), suggesting that the effectiveness of the approach is influenced by the size and training set of the LLM, but does not fundamentally depend on the specific model.
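TestPilot itself targets JavaScript; the Java sketch below only illustrates the general prompt-and-refine loop the abstract describes. The `Llm` and `TestRunner` interfaces and the prompt wording are assumptions made for illustration, not TestPilot's actual implementation.

```java
import java.util.List;

/** Illustrative prompt construction and refinement loop for LLM-based test generation. */
public class PromptBasedTestGen {

    interface Llm { String complete(String prompt); }                 // hypothetical LLM client
    interface TestRunner { String runAndGetError(String testCode); }  // returns null if the test passes

    /** Build an initial prompt from the function's signature, body, and documentation examples. */
    static String initialPrompt(String signature, String implementation, List<String> docExamples) {
        StringBuilder p = new StringBuilder("Write a unit test for the following function.\n");
        p.append("Signature:\n").append(signature).append("\n");
        p.append("Implementation:\n").append(implementation).append("\n");
        for (String example : docExamples) {
            p.append("Usage example from the documentation:\n").append(example).append("\n");
        }
        return p.toString();
    }

    /** Generate a test; if it fails, re-prompt with the failing test and its error message. */
    static String generate(Llm llm, TestRunner runner, String prompt, int maxAttempts) {
        String test = llm.complete(prompt);
        for (int i = 0; i < maxAttempts; i++) {
            String error = runner.runAndGetError(test);
            if (error == null) return test;  // test passes, keep it
            test = llm.complete(prompt
                    + "\nThe previous test failed:\n" + test
                    + "\nError message:\n" + error
                    + "\nPlease fix the test.\n");
        }
        return test;  // best effort after maxAttempts refinements
    }
}
```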
Software testing is an essential part of the software lifecycle and requires a substantial amount of time and effort. It has been estimated that developers spend close to 50% of their time on testing the code they write. For these reasons, it has been a long-standing goal within the research community to (partially) automate software testing. While several techniques and tools have been proposed to automatically generate test methods, recent work has criticized the quality and usefulness of the assert statements they generate. Therefore, we employ a Neural Machine Translation (NMT) based approach called Atlas (AuTomatic Learning of Assert Statements) to generate meaningful assert statements for test methods. Given a test method and its focal method (i.e., the main method under test), Atlas can predict an assert statement to assess the correctness of the focal method. We applied Atlas to thousands of test methods from GitHub projects, and it was able to predict the exact assert statement manually written by developers in 31% of the cases when only considering the top-1 predicted assert. When considering the top-5 predicted assert statements, Atlas produces exact matches in an even larger share of the cases. These promising results hint at the potential usefulness of our approach as (i) a complement to automatic test case generation techniques, and (ii) a code completion support for developers, who can benefit from the recommended assert statements while writing test code.
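As an illustration of the task Atlas addresses, consider the hypothetical focal method and test method below; the assert statement marked as predicted shows the kind of output an NMT model would produce for this input. The class and method names are invented for illustration.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical focal method (the main method under test).
class StringUtils {
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) return s;
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}

class StringUtilsTest {

    // The test method is given to the model without its assert statement;
    // the model predicts a meaningful assert from the test body plus the focal method.
    @Test
    void testCapitalize() {
        String result = StringUtils.capitalize("hello");
        // Predicted assert statement (illustrative output):
        assertEquals("Hello", result);
    }
}
```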
Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static bug detectors find? This paper addresses this question by studying the results of applying three static bug detectors to an extended version of the Defects4J dataset that consists of 15 Java projects with 594 known bugs. To decide which of these bugs the tools detect, we use a novel methodology that combines an automatic analysis of warnings and bugs with a manual validation of each candidate of a detected bug. The results of the study show that: (i) static bug detectors find a non-negligible amount of all bugs, (ii) different tools are mostly complementary to each other, and (iii) current bug detectors miss the large majority of the studied bugs. A detailed analysis of the missed bugs shows that some could have been found by variants of the existing detectors, while others are domain-specific problems that do not match any existing bug pattern. These findings help potential users of such tools to assess their utility, motivate and outline directions for future work on static bug detection, and provide a basis for comparisons of static bug detection with other bug-finding techniques, such as automated testing.
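The candidate-matching step of such a methodology can be approximated as follows. This is a minimal sketch under the assumption that a bug is represented by the file and lines changed in its fix, and a warning by a file and line; it is not necessarily how the paper's pipeline is implemented.

```java
import java.util.List;
import java.util.Set;

/** Minimal sketch: flag warnings that fall on lines changed by a bug fix as detection candidates. */
public class WarningBugMatcher {

    record Warning(String file, int line, String message) {}
    record Bug(String id, String file, Set<Integer> fixedLines) {}
    record Candidate(Bug bug, Warning warning) {}

    static List<Candidate> candidates(List<Warning> warnings, List<Bug> bugs) {
        return bugs.stream()
                .flatMap(bug -> warnings.stream()
                        .filter(w -> w.file().equals(bug.file())
                                  && bug.fixedLines().contains(w.line()))
                        .map(w -> new Candidate(bug, w)))
                .toList();
        // Each candidate would then be manually validated to confirm that the
        // warning actually points at the root cause of the bug.
    }
}
```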
Unit testing represents the foundational basis of the software testing pyramid, beneath integration and end-to-end testing. Automated testing researchers have proposed a variety of techniques to assist developers in this time-consuming task.
Proceedings of the 44th International Conference on Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 163 - 174. Published: May 21, 2022.
Unit testing could be used to validate the correctness of the basic units of a software system under test. To reduce the manual effort of conducting unit testing, the research community has contributed tools that automatically generate test cases, including test inputs and oracles (e.g., assertions). Recently, ATLAS, a deep learning (DL) based approach, was proposed to generate assertions for a unit test based on other, already written tests. Despite being promising, the effectiveness of ATLAS is still limited. To improve its effectiveness, in this work we make the first attempt to leverage Information Retrieval (IR) for assertion generation and propose an IR-based technique that includes assertion retrieval and retrieved-assertion adaptation. In addition, we propose an integration approach to combine our IR-based technique with a DL-based approach (e.g., ATLAS) to further improve the effectiveness. Our experimental results show that our IR-based technique outperforms the state-of-the-art DL-based approach, and that integrating the two can achieve even higher accuracy. Our results convey an important message: information retrieval remains competitive and worthwhile to pursue for software engineering tasks such as assertion generation, and should be seriously considered by the community, given that in recent years DL-based solutions have been over-popularly adopted for such tasks.
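A minimal sketch of the retrieval-and-adaptation idea is given below, assuming that assertions from a corpus of existing tests are retrieved by token-level similarity of the test bodies and then adapted by substituting an identifier. The corpus format, similarity measure, and adaptation rule are illustrative assumptions rather than the paper's exact technique.

```java
import java.util.*;

/** Illustrative IR-based assertion generation: retrieve the most similar test, adapt its assertion. */
public class IrAssertionGenerator {

    record CorpusEntry(Set<String> testTokens, String assertion, String receiverName) {}

    /** Jaccard similarity between the token sets of two test bodies. */
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Retrieve the nearest neighbor's assertion and adapt its receiver identifier. */
    static String generate(Set<String> newTestTokens, String newReceiverName, List<CorpusEntry> corpus) {
        CorpusEntry best = Collections.max(corpus,
                Comparator.comparingDouble((CorpusEntry e) -> similarity(e.testTokens(), newTestTokens)));
        // Adaptation: rename the retrieved assertion's receiver to the one used in the new test.
        return best.assertion().replace(best.receiverName(), newReceiverName);
    }
}
```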
IEEE Transactions on Quantum Engineering, Journal Year: 2022, Volume and Issue: 3, P. 1 - 17. Published: Jan. 1, 2022.
As quantum computing is still in its infancy, there is an inherent lack of knowledge and technology to test a quantum program properly. In the classical realm, mutation testing has been successfully used to evaluate how well a program's test suite detects seeded faults (i.e., mutants). In this paper, building on the definition of syntactically equivalent quantum operations, we propose a novel set of mutation operators to generate mutants based on qubit measurements and quantum gates. To ease the adoption of quantum mutation testing, we further propose QMutPy, an extension of the well-known and fully automated open-source mutation tool MutPy. To evaluate QMutPy's performance, we conducted a case study on 24 real quantum programs written in IBM's Qiskit library. Furthermore, we show how better test coverage and improvements to test assertions can increase the test suites' mutation score and quality. QMutPy has proven to be an effective quantum mutation tool, providing insight into the current state of quantum tests and how to improve them.
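QMutPy itself operates on Python programs that use Qiskit, but the core idea of a gate-replacement mutation operator can be sketched independently of that stack. The example below represents a circuit as a plain list of gate names and swaps one single-qubit gate for a syntactically equivalent one; the representation and the replacement table are simplifications made for illustration.

```java
import java.util.*;

/** Illustrative gate-replacement mutation operator on a toy circuit representation. */
public class QuantumGateMutator {

    // Syntactically equivalent single-qubit gates: each takes one qubit operand,
    // so one can be substituted for another without breaking the program's syntax.
    private static final Map<String, List<String>> REPLACEMENTS = Map.of(
            "x", List.of("y", "z", "h"),
            "y", List.of("x", "z", "h"),
            "z", List.of("x", "y", "h"),
            "h", List.of("x", "y", "z"));

    /** Generate mutants by replacing one gate at a time with an equivalent-arity gate. */
    static List<List<String>> mutants(List<String> circuit) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < circuit.size(); i++) {
            for (String replacement : REPLACEMENTS.getOrDefault(circuit.get(i), List.of())) {
                List<String> mutant = new ArrayList<>(circuit);
                mutant.set(i, replacement);   // seed a single fault
                result.add(mutant);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A toy "circuit": a Hadamard gate, an X gate, and a measurement.
        System.out.println(mutants(List.of("h", "x", "measure")));
    }
}
```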
Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve existing test suites. Based on our industrial and open source studies, state-of-the-art code coverage tools are only used during unit and integration testing, due to issues like engineering challenges, performance overhead, and incomplete results. To resolve these issues, in this paper we have proposed an automated approach, called LogCoCo, for estimating code coverage measures using the readily available execution logs. Using program analysis techniques, LogCoCo matches execution logs with their corresponding code paths and estimates coverage under three different criteria: method coverage, statement coverage, and branch coverage. Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (>96% in seven out of nine experiments) under a variety of testing activities (unit testing and benchmarking); and (2) the results of LogCoCo can be used to evaluate and improve existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and using it on a daily basis.
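The core matching idea can be sketched as follows, assuming that the logging statements in the source are known along with the set of statements that must have executed for each of them to be reached. The data model and the matching rule here are simplifications for illustration, not LogCoCo's actual analysis.

```java
import java.util.*;

/** Illustrative log-based coverage estimation: mark statements on the paths of observed log lines. */
public class LogCoverageEstimator {

    /** A logging statement in the source and the statements that must execute to reach it. */
    record LogPoint(String template, Set<String> statementsOnPath) {}

    /** Estimate statement coverage from raw log lines and known log points. */
    static double estimateStatementCoverage(List<String> logLines,
                                            List<LogPoint> logPoints,
                                            Set<String> allStatements) {
        Set<String> covered = new HashSet<>();
        for (String line : logLines) {
            for (LogPoint lp : logPoints) {
                // Match the log line against the template derived by static analysis.
                if (line.contains(lp.template())) {
                    covered.addAll(lp.statementsOnPath());  // these statements must have run
                }
            }
        }
        return allStatements.isEmpty() ? 0.0 : (double) covered.size() / allStatements.size();
    }
}
```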