SSRN Electronic Journal
Journal Year: 2019, Volume and Issue: unknown
Published: Jan. 1, 2019
Continuation of article is available at: https://ssrn.com/abstract=3499202
Rigorous tests are being used every day to develop effective medical treatments, drive consumer engagement, and, more generally, discover what works. But so far, rigorous policy piloting (temporarily introducing a change in law or policy in order to learn from it using well-designed and well-implemented methods) has not been widely used because of the perception that experimentation is unfair, possibly illegal, difficult, and rare. This Essay draws upon case law and agency practice to show that, to the contrary, rigorous policy pilots are presumptively legal, feasible, and increasingly common, proceeding in several steps. First, it finds that many kinds of pilots, including those that vary internal processes and those which are opt-in, are unlikely to be controversial. A review of relevant cases suggests that courts are likely to uphold even pilots that treat like members of the population differently, through randomization, when they advance learning. Further, experimentation, by itself, does not create special procedural or substantive hurdles. Second, agencies are engaging in a range of activities to fill informational gaps in policy- and law-making, some of which simulate pilots and others of which effect variation on a temporary basis, and developments such as the growth of open data are making these forms of information gathering easier. The Essay uses this experience to set out a framework for proposing a pilot and to identify steps that would further support the use of pilots. A companion online appendix applies this framework to propose how the United States Patent and Trademark Office (“USPTO”), building on its already strong tradition of piloting, could try pilots to evolve its own policies and practices with respect to patent quality (through robust vetting of applications in view of non-patent literature and team/time examination on demand) and to inclusion and innovation (through automated error correction and addressing gender bias in examination).
A "search first" pilot would give applicants the option of requesting that all prior art relevant to their application be provided "up front," with the examiner's initial search covering the entire specification rather than just the claims, following the approach of jurisdictions that bifurcate search and examination. The early certainty would help applicants determine whether the application was worth pursuing. More generally, quality metrics could also explicitly measure robustness, given the "prior art gap" between US examiners and other examiners reviewing the same patent. Piloting the development of "error detection technology" to check compliance with the Patent Act's Section 112 disclosure requirements, building on previous examples of adopting outside technology, could close the applicant readiness gap that the essay documents between smaller and larger inventors, enhancing both inclusion and quality. Finally, the essay proposes testing for the presence of implicit bias in the award of patents to get at the root causes of the 7-21% difference in grant rate between male- and female-led inventions.
Context: Software testing ensures software quality, but developers often disregard it. Automated test generation is pursued to reduce the consequences of overlooked test cases in a software project.
Problem: In the context of Java programs, several tools can completely automate the generation of unit test sets. Additionally, studies have been conducted to offer evidence regarding the quality of the generated test sets. However, it is worth noting that these tools rely on machine learning and other AI algorithms rather than incorporating the latest advancements in Large Language Models (LLMs).
Solution: This work aims to evaluate the quality of Java unit tests generated by an OpenAI LLM algorithm, using metrics like code coverage and mutation score.
Method: For this study, 33 programs used by other researchers in the field were selected. This approach was employed to establish a baseline for comparison purposes. For each program, unit test sets were generated automatically, without human interference, by changing OpenAI API parameters. After executing each test set, metrics such as code line coverage, mutation score, and success rate of test execution were collected to assess the efficiency and effectiveness of each set.
Summary of Results: Our findings revealed that the OpenAI LLM test sets demonstrated similar performance across all evaluated aspects compared to the traditional automated Java test generation tools used in previous research. These results are particularly remarkable considering the simplicity of the experiment and the fact that the generated test code did not undergo human analysis.
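The metrics named in the Method paragraph can be made concrete with a small sketch. The class `TestSetMetrics` and its methods are hypothetical names, not part of the study's tooling. The sketch assumes that mutation score is computed over non-equivalent mutants and that "success rate of test execution" means the fraction of executed tests that pass.

```java
import java.util.Set;

// Hypothetical helper summarising one executed test set (names are illustrative only).
final class TestSetMetrics {
    private TestSetMetrics() {}

    // Line coverage: fraction of executable lines hit by at least one test.
    static double lineCoverage(Set<Integer> coveredLines, int executableLines) {
        if (executableLines <= 0) throw new IllegalArgumentException("no executable lines");
        return (double) coveredLines.size() / executableLines;
    }

    // Mutation score (assumed definition): killed mutants over non-equivalent mutants.
    static double mutationScore(int killed, int generated, int equivalent) {
        int killable = generated - equivalent;
        if (killable <= 0) throw new IllegalArgumentException("no killable mutants");
        return (double) killed / killable;
    }

    // Success rate (assumed definition): fraction of executed tests that pass.
    static double successRate(int passing, int executed) {
        if (executed <= 0) throw new IllegalArgumentException("no tests executed");
        return (double) passing / executed;
    }
}
```

For example, a set covering lines {1, 2, 3} out of 4 executable lines has a line coverage of 0.75. A set that kills 6 of 10 mutants, 2 of which are equivalent, also scores 0.75.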
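Despite the recent improvements in automatic test case generation, handling complex data structures as inputs is still an open problem. Search-based approaches can generate sequences of method calls that instantiate structured inputs to exercise a relevant portion of the code, but fall short in building inputs to execute program elements whose reachability is determined by the structural features of the input structures themselves. Symbolic execution techniques can effectively handle structured inputs, but do not identify the sequences of method calls that instantiate the input structures through legal interfaces. In this paper, we propose a new approach to automatically generate test cases for programs with complex data structures as inputs. We use symbolic execution to generate path conditions that characterise the dependencies between the program paths and the input structures, and convert the path conditions to optimisation problems that we solve with search-based techniques to produce sequences of method calls that instantiate those inputs. Our preliminary results show that the approach is indeed effective in generating test cases for programs with complex data structures as inputs, thus opening a promising research direction.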
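The idea of turning a path condition into an optimisation problem can be illustrated with classic branch distances. Each conjunct of the condition gets a distance that is zero exactly when the conjunct holds, and a search over method-call sequences drives the sum of these distances to zero. This is a minimal sketch under stated assumptions, not the authors' tool:

- the path condition is hypothetical and hand-written rather than produced by symbolic execution;
- the penalty constants are arbitrary;
- greedy hill climbing stands in for whatever search algorithm the paper actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a (hypothetical) path condition over a list input, minimised by choosing method calls.
final class PathConditionSearch {
    private PathConditionSearch() {}

    static final double K = 1.0; // penalty added when a relational predicate is false (arbitrary)

    // Branch distance for a == b: zero exactly when the predicate holds.
    static double eq(double a, double b) { return Math.abs(a - b); }

    // Branch distance for a > b: zero when true, otherwise the gap plus K.
    static double gt(double a, double b) { return a > b ? 0.0 : (b - a) + K; }

    // Hypothetical path condition: list.size() > 3 && list.get(0) == 42.
    // Conjunct distances are summed; get(0) is undefined on an empty list, so that
    // conjunct receives a large fixed penalty instead.
    static double fitness(List<Integer> list) {
        double sizeTerm = gt(list.size(), 3);
        double headTerm = list.isEmpty() ? 100.0 : eq(list.get(0), 42);
        return sizeTerm + headTerm;
    }

    // Inputs reachable with one more call through the list's public interface.
    static List<List<Integer>> neighbours(List<Integer> current) {
        List<List<Integer>> result = new ArrayList<>();
        for (int value : new int[] {0, 42}) {
            List<Integer> next = new ArrayList<>(current);
            next.add(value);                  // call: add(value)
            result.add(next);
        }
        if (!current.isEmpty()) {
            List<Integer> next = new ArrayList<>(current);
            next.remove(next.size() - 1);     // call: remove(int index)
            result.add(next);
        }
        return result;
    }

    // Greedy hill climbing: apply the call that lowers the distance most; stop at zero.
    static List<Integer> search(int maxSteps) {
        List<Integer> current = new ArrayList<>();
        for (int step = 0; step < maxSteps && fitness(current) > 0.0; step++) {
            List<Integer> best = current;
            for (List<Integer> candidate : neighbours(current)) {
                if (fitness(candidate) < fitness(best)) best = candidate;
            }
            if (best == current) break;       // local optimum: no call improves the distance
            current = best;
        }
        return current;
    }
}
```

Starting from an empty list, `search(10)` first appends 42 to satisfy the head conjunct. It then appends elements until the size conjunct holds, ending at a four-element list whose distance is zero.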
IEEE Transactions on Software Engineering
Journal Year: 2018, Volume and Issue: 46(12), P. 1294 - 1317
Published: Oct. 24, 2018
Software systems fail. These failures are often reported to issue tracking systems, where they are prioritized and assigned to responsible developers to be investigated. When developers debug software, they need to reproduce the failure in order to verify whether their fix actually prevents the failure from happening again. Since manually reproducing each failure could be a complex task, several automated techniques have been proposed to tackle this problem. Despite showing advancements in this area, these techniques have shown various types of limitations. In this paper, we present EvoCrash, a new approach to automated crash reproduction based on a novel evolutionary algorithm, called Guided Genetic Algorithm (GGA). We report on our empirical study using EvoCrash to reproduce 54 real-world crashes, as well as the results of a controlled experiment, involving human participants, to assess the impact of EvoCrash tests on debugging. Based on our results, EvoCrash outperforms state-of-the-art techniques in crash reproduction and uncovers failures that are undetected by classical coverage-based unit test generation tools. In addition, we observed that using EvoCrash helps developers provide fixes more often and take less time when debugging, compared to developers debugging and fixing code without using EvoCrash tests.
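Search-based crash reproduction typically scores a candidate test by how close it comes to reproducing the reported crash. The sketch below combines three terms into one weighted sum: a line-distance term, an exception-type term and a stack-trace term. The weights 3, 2 and 1 follow the formulation we recall from the EvoCrash publications and should be treated as an assumption. The in-order frame matching is a deliberate simplification of a per-frame comparison. The class and method names are hypothetical.

```java
import java.util.List;

// Sketch of a weighted crash-reproduction fitness (lower is better; 0 means reproduced).
final class CrashFitness {
    private CrashFitness() {}

    // Fraction of target frames NOT matched, in order, by the observed stack trace.
    // Simplification: frames are compared as whole strings, not field by field.
    static double stackTraceDistance(List<String> targetFrames, List<String> observedFrames) {
        if (targetFrames.isEmpty()) return 0.0;
        int matched = 0;
        int from = 0;
        for (String frame : targetFrames) {
            int idx = indexFrom(observedFrames, frame, from);
            if (idx >= 0) { matched++; from = idx + 1; }
        }
        return 1.0 - (double) matched / targetFrames.size();
    }

    private static int indexFrom(List<String> frames, String frame, int from) {
        for (int i = from; i < frames.size(); i++) {
            if (frames.get(i).equals(frame)) return i;
        }
        return -1;
    }

    // lineDistance is assumed normalised to [0, 1] and is 0 once the crashing line runs.
    // Exception and stack terms only count after the line is reached (and the stack term
    // only once the right exception type is thrown); until then they take their maximum 1.
    // Weights 3, 2, 1 are an assumption based on the EvoCrash literature.
    static double fitness(double lineDistance, boolean sameExceptionType, double stackDistance) {
        double de = 1.0;
        double ds = 1.0;
        if (lineDistance == 0.0) {
            de = sameExceptionType ? 0.0 : 1.0;
            ds = sameExceptionType ? stackDistance : 1.0;
        }
        return 3.0 * lineDistance + 2.0 * de + ds;
    }
}
```

In this sketch, a test that never reaches the crashing line scores at least 3 (when it is arbitrarily close to the line) and at most 6. A test that reproduces the exact exception and stack trace scores 0.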
IEEE Transactions on Software Engineering
Journal Year: 2021, Volume and Issue: 48(7), P. 2295 - 2316
Published: Feb. 9, 2021
Java projects are often built on top of various third-party libraries. If multiple versions of a library exist on the classpath, the JVM will only load one version and shadow the others, which we refer to as dependency conflicts. This would give rise to semantic conflict (SC) issues if the library APIs referenced by a project have identical method signatures but inconsistent semantics across the loaded and shadowed versions of the libraries. SC issues are difficult for developers to diagnose in practice, since understanding them typically requires domain knowledge. Although adapting the existing test generation technique for dependency conflict issues, Riddle, to detect SC issues is feasible, its effectiveness is greatly compromised. This is mainly because Riddle randomly generates test inputs, while SC issues typically require specific arguments in the tests to be exposed. To address that, we conducted an empirical study of 316 real SC issues to understand the characteristics of such specific arguments in the test cases that can capture SC issues. Inspired by our empirical findings, we propose an automated testing technique, Sensor, which synthesizes test cases using ingredients from the project under test to trigger inconsistent behaviors of the APIs with the same signatures in conflicting library versions. Our evaluation results show that Sensor is effective and useful. It achieved a Precision of 0.898 and a Recall of 0.725 on open-source projects and a Precision of 0.821 on industrial projects. It detected 306 semantic conflict issues in 50 projects, 70.4 percent of which had been confirmed as real bugs, and 84.2 percent of the confirmed issues were fixed quickly.
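A semantic conflict can be pictured as two versions of one API that share an identical signature but return different outputs for some inputs. The sketch below uses hypothetical classes `LibV1` and `LibV2` and checks a list of candidate arguments for divergence. It only illustrates why specific arguments matter: purely lowercase strings never expose this particular conflict. It is not Sensor's implementation. On a real classpath the JVM loads only one version, so comparing both would require separate class loaders; here the two versions simply sit side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.function.Function;

// HYPOTHETICAL library version 1: String normalize(String) only trims.
final class LibV1 {
    static String normalize(String s) { return s.trim(); }
}

// HYPOTHETICAL library version 2: same signature, but also lower-cases.
final class LibV2 {
    static String normalize(String s) { return s.trim().toLowerCase(Locale.ROOT); }
}

final class SemanticConflictCheck {
    private SemanticConflictCheck() {}

    // Returns the candidate arguments on which the loaded and shadowed versions disagree.
    static List<String> divergingInputs(List<String> inputs,
                                        Function<String, String> loaded,
                                        Function<String, String> shadowed) {
        List<String> diverging = new ArrayList<>();
        for (String in : inputs) {
            if (!Objects.equals(loaded.apply(in), shadowed.apply(in))) diverging.add(in);
        }
        return diverging;
    }
}
```

With candidate arguments "abc", " ABC " and "x", only " ABC " exposes the conflict. This mirrors the abstract's point that randomly generated inputs can easily miss an SC issue.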
IEEE Transactions on Reliability
Journal Year: 2018, Volume and Issue: 67(3), P. 771 - 785
Published: June 8, 2018
Automated test suite generation (ATSG) is an important topic in software engineering, with a wide range of techniques and tools being used in academia and industry. While their usefulness is widely recognized, due to the labor-intensive nature of the task, the effectiveness of the different techniques in automatically generating test cases for different software systems is not thoroughly understood. Despite many studies introducing various ATSG techniques, much remains to be learned, however, about what makes a particular technique work well (or not) for a specific software system. In this paper, we seek to answer the question: “What features of a software system impact the effectiveness of ATSG techniques?” Once these features are identified, can they be used to select the most effective ATSG technique for a given software system? To that end, we have implemented the Mapping Effectiveness of Test Automation (META) tool, a new framework that identifies the most suitable ATSG techniques to apply to software systems. We evaluate META on a large set of open-source projects and three ATSG techniques. The evaluation indicates that the number of methods in a class, coupling between object classes, and response for a class are indicative of classes that are hard to test by ATSG techniques, and the decision tree for technique selection generated by META has 88% accuracy, as shown by n-fold cross validation.
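Once a decision tree has been learned, selecting a technique for a class amounts to comparing a few metric values against thresholds. The sketch below takes the three metrics named in the evaluation as inputs. Its thresholds and the labels `TECHNIQUE_A` to `TECHNIQUE_C` are hypothetical placeholders, not the tree META produced. The `accuracy` helper shows the quantity that a single cross-validation fold would report.

```java
import java.util.List;

// Per-class metrics named in the META evaluation.
final class ClassMetrics {
    final int numberOfMethods;        // methods declared in the class
    final int couplingBetweenObjects; // CBO: number of other classes this class is coupled to
    final int responseForClass;       // RFC: methods that may run in response to a message

    ClassMetrics(int numberOfMethods, int couplingBetweenObjects, int responseForClass) {
        this.numberOfMethods = numberOfMethods;
        this.couplingBetweenObjects = couplingBetweenObjects;
        this.responseForClass = responseForClass;
    }
}

final class TechniqueSelector {
    private TechniqueSelector() {}

    // HYPOTHETICAL tree: thresholds and labels are placeholders, not META's learned model.
    static String select(ClassMetrics m) {
        if (m.couplingBetweenObjects > 10) {
            return m.responseForClass > 50 ? "TECHNIQUE_C" : "TECHNIQUE_B";
        }
        return m.numberOfMethods > 20 ? "TECHNIQUE_B" : "TECHNIQUE_A";
    }

    // Fraction of held-out classes for which the selector picks the known best technique.
    static double accuracy(List<ClassMetrics> classes, List<String> bestTechniques) {
        if (classes.isEmpty() || classes.size() != bestTechniques.size()) {
            throw new IllegalArgumentException("need exactly one label per class");
        }
        int correct = 0;
        for (int i = 0; i < classes.size(); i++) {
            if (select(classes.get(i)).equals(bestTechniques.get(i))) correct++;
        }
        return (double) correct / classes.size();
    }
}
```

In n-fold cross validation, a tree would be trained on n - 1 folds and `accuracy` computed on the remaining fold. The reported figure is the average over all n folds.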
EvoSuite is a search-based tool that automatically generates executable unit tests for Java code (JUnit tests). This paper summarises the results and experiences of EvoSuite's participation at the seventh unit testing competition at SBST 2019, where EvoSuite achieved the highest overall score (255.43 points) for the sixth time in seven editions of the competition.
EvoSuite is a search-based tool that automatically generates executable unit tests for Java code (JUnit tests). This paper summarizes the results and experiences of EvoSuite's participation at the eighth unit testing competition at SBST 2020, where EvoSuite achieved the highest overall score (406.14 points) for the seventh time in eight editions of the competition.
EvoSuite is a search-based tool that automatically generates unit tests for Java code. This paper summarises the results and experiences of EvoSuite's participation at the fifth unit testing competition at SBST 2017, where EvoSuite achieved the highest overall score.
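Search-based tools such as EvoSuite guide the search with a fitness function evaluated over a whole test suite. As one illustration, the sketch below implements the whole-test-suite branch coverage fitness as we understand it from the EvoSuite literature:

- each method that is never executed adds 1;
- each covered branch adds 0;
- an uncovered branch whose predicate was executed at least twice adds its normalised distance x/(x+1);
- any other uncovered branch adds 1.

This is a simplified sketch rather than EvoSuite's code, and the class names are hypothetical.

```java
import java.util.List;

// Sketch of a whole-suite branch coverage fitness (lower is better; 0 = full coverage).
final class WholeSuiteFitness {
    private WholeSuiteFitness() {}

    // One branch goal, aggregated over all tests in the suite.
    static final class BranchObservation {
        final double minDistance;      // smallest raw branch distance seen (0 = branch taken)
        final int predicateExecutions; // how often the enclosing predicate was evaluated

        BranchObservation(double minDistance, int predicateExecutions) {
            this.minDistance = minDistance;
            this.predicateExecutions = predicateExecutions;
        }
    }

    // Maps a raw distance in [0, inf) into [0, 1).
    static double normalize(double x) {
        return x / (x + 1.0);
    }

    static double branchTerm(BranchObservation o) {
        if (o.minDistance == 0.0) return 0.0;             // branch covered
        // A predicate evaluated only once cannot take both of its branches,
        // so its distance is only trusted after at least two evaluations.
        if (o.predicateExecutions >= 2) return normalize(o.minDistance);
        return 1.0;
    }

    static double fitness(int totalMethods, int executedMethods, List<BranchObservation> branches) {
        double sum = totalMethods - executedMethods;      // methods never called
        for (BranchObservation o : branches) sum += branchTerm(o);
        return sum;
    }
}
```

For example, consider a suite that calls 2 of 3 methods, covers one branch, and reaches another branch's predicate twice with a minimal distance of 1, leaving a third branch's predicate reached only once. That suite scores 1 + 0 + 0.5 + 1 = 2.5.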
Testing is a mandatory activity for software quality assurance. Knowledge about the software under testing is necessary to generate high-quality test cases, but executing more than 80% of its source code is not an easy task and demands in-depth knowledge of the business rules it implements. In this article, we investigate the adequacy, effectiveness, and cost of manually generated test sets versus automatically generated test sets for Java programs. We observed that, in general, manual test sets achieve higher statement coverage and mutation score than automatically generated test sets. One interesting aspect we recognized, however, is that the automatically generated test sets are complementary to the manual test set. When the manual test sets were combined with the automated ones, the resulting test sets exceeded the coverage and mutation score rates of the manual test set by 10%, on average, while keeping a reasonable cost. Therefore, we advocate that testers should concentrate the use of manual test generation on the essential and critical parts of the software.
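The complementarity finding is easiest to see with sets. A combined suite covers the union of the statements covered by each suite and kills the union of the mutants each suite kills, so its rates can only match or exceed those of either suite alone. The helper below is a hypothetical sketch, and the numbers in the usage note are made up rather than taken from the study.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for combining the results of two test sets.
final class CombinedSuites {
    private CombinedSuites() {}

    // Statements covered (or mutants killed) by at least one of the two sets.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    // Coverage or mutation-score rate: achieved items over the total number of items.
    static double rate(Set<?> achieved, int total) {
        if (total <= 0) throw new IllegalArgumentException("total must be positive");
        return (double) achieved.size() / total;
    }
}
```

Suppose a manual set covers statements {1, 2, 3, 4} and an automated set covers {3, 4, 5}, out of 10 statements. Their rates are 0.4 and 0.3, while the combined set reaches 0.5, because statement 5 is reached only by the automated tests.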