IEEE Transactions on Software Engineering, Journal Year: 2022, Volume and Issue: 49(1), P. 419 - 436, Published: Feb. 16, 2022
Automatic program repair (APR) offers significant potential for automating some coding tasks. Using APR could reduce the high costs historically associated with fixing code faults and deliver benefits to software engineering. Adopting APR could also have profound implications for developers' daily activities, transforming their work practices. To realise the potential of APR, it is vital that we consider how developers feel about the impact it may have on their work. Developing APR tools without consideration of developers is likely to undermine the success of their deployment. In this paper, we critically review how developers are considered in APR research by analysing how human factors are treated in 260 studies from Monperrus's Living Review of APR. Over half of these studies were motivated by a problem faced by developers (e.g., the difficulty of fixing faults). Despite these human-oriented motivations, fewer than 7% included a human study. We looked in detail at these studies and found that their quality was mixed (for example, one study was based on input from only one developer). Our results suggest that developers are often talked about in APR studies, but rarely talked with. A more comprehensive and reliable understanding of developers in relation to APR is needed. Without this understanding, it will be difficult to develop APR techniques which integrate effectively into developers' workflows. We recommend a future research agenda to advance this understanding.
Automated Program Repair (APR) helps improve the efficiency of software development and maintenance. Recent APR techniques use deep learning, particularly the encoder-decoder architecture, to generate patches. Though existing DL-based APR approaches have proposed different encoder architectures, the decoder remains the standard one, which generates a sequence of tokens one by one to replace the faulty statement. This decoder has multiple limitations: 1) allowing syntactically incorrect programs to be generated, 2) inefficiently representing small edits, and 3) not being able to generate project-specific identifiers.
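
To make the first limitation concrete, the following is a minimal sketch of a standard token-by-token decoding loop; the "model" here is a stand-in table of next-token scores rather than any real neural decoder, and the vocabulary is invented for illustration. Nothing in such a loop constrains the output to be syntactically complete code.

```python
# Minimal sketch of plain token-by-token decoding. toy_next_token_scores()
# is a placeholder for a learned decoder; the loop itself is the point:
# it emits one token at a time with no syntactic guarantees.

VOCAB = ["if", "(", ")", "{", "}", "x", ">", "0", "return", ";", "<EOS>"]

def toy_next_token_scores(prefix):
    """Stand-in for a learned decoder: scores every vocabulary token."""
    # A real model would condition on the encoder output and the prefix.
    preferred = {0: "if", 1: "(", 2: "x", 3: ">", 4: "0", 5: ")", 6: "{"}
    scores = {tok: 0.1 for tok in VOCAB}
    scores[preferred.get(len(prefix), "<EOS>")] = 1.0
    return scores

def greedy_decode(max_len=12):
    prefix = []
    for _ in range(max_len):
        scores = toy_next_token_scores(prefix)
        token = max(scores, key=scores.get)  # pick the highest-scoring token
        if token == "<EOS>":
            break
        prefix.append(token)
    return " ".join(prefix)

if __name__ == "__main__":
    # The generated replacement may stop mid-construct (an opening "{"
    # with no body or closing brace), which plain decoding cannot rule out.
    print(greedy_decode())  # -> "if ( x > 0 ) {"
```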
Proceedings of the 44th International Conference on Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 1506 - 1518, Published: May 21, 2022
Neural
machine
translation
(NMT)
architectures
have
achieved
promising
results
for
automatic
program
repair.
Yet,
they
the
limitation
of
generating
low-quality
patches
(e.g.,
not
compilable
patches).
This
is
because
existing
works
only
optimize
a
purely
syntactic
loss
function
based
on
characters
and
tokens
without
incorporating
program-specific
information
during
neural
network
weight
optimization.
In
this
paper,
we
propose
novel
repair
model
called
RewardRepair.
The
core
novelty
RewardRepair
to
improve
NMT-based
with
compilation
test
execution
information,
rewarding
produce
that
compile
do
overfit.
We
conduct
several
experiments
evaluate
showing
it
feasible
effective
use
underlying
model.
correctly
repairs
207
bugs
over
four
benchmarks.
report
success
121
are
fixed
first
time
in
literature.
Also,
produces
up
45.3%
patches,
an
improvement
39%
by
state-of-the-art.
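
As a rough illustration of the idea of adding program-level feedback to a purely syntactic training objective, the sketch below mixes a token-level loss with penalties derived from compilation and test execution. This is not RewardRepair's actual formulation: the penalty weights, the booleans standing in for real compiler and test runs, and the function names are assumptions for illustration only.

```python
# Illustrative sketch of augmenting a syntactic token loss with discrete
# feedback from compiling and testing the decoded candidate patch.

def semantic_training_loss(token_loss, compile_ok, tests_ok,
                           compile_penalty=1.0, overfit_penalty=0.5):
    """Combine a cross-entropy-style token loss with program-level feedback.

    compile_ok / tests_ok would come from actually building and running the
    candidate patch; here they are plain booleans supplied by the caller.
    """
    loss = token_loss
    if not compile_ok:
        # Non-compilable candidates are pushed away hardest.
        loss += compile_penalty
    elif not tests_ok:
        # Compilable but test-failing candidates get a smaller penalty.
        loss += overfit_penalty
    return loss

if __name__ == "__main__":
    print(semantic_training_loss(0.8, compile_ok=False, tests_ok=False))  # 1.8
    print(semantic_training_loss(0.8, compile_ok=True,  tests_ok=True))   # 0.8
```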
Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 959 - 971, Published: Nov. 7, 2022
Due to the promising future of Automated Program Repair (APR), researchers have proposed various APR techniques, including heuristic-based, template-based, and constraint-based techniques. Among these classic techniques, template-based techniques have been widely recognized as the state of the art. However, they require predefined templates to perform repair, and their effectiveness is thus limited. To this end, researchers have leveraged recent advances in Deep Learning to further improve APR. Such learning-based techniques typically view APR as a Neural Machine Translation problem, using buggy/fixed code snippets as the source/target languages for translation. In this way, they heavily rely on large numbers of high-quality bug-fixing commits, which can be extremely costly/challenging to construct and may limit the edit variety and context representation.
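
To illustrate the translation framing described above, here is a minimal sketch of turning mined bug-fixing commits into source/target training pairs. The commit records and the whitespace tokenization are placeholders; a real pipeline would mine version-control history and use the model's own tokenizer.

```python
# Minimal sketch of the NMT framing of APR: buggy snippets are the
# "source language", fixed snippets the "target language".

def build_translation_pairs(bug_fix_commits):
    """Turn (buggy, fixed) snippet pairs into NMT-style training examples."""
    pairs = []
    for commit in bug_fix_commits:
        src = commit["buggy"].split()   # naive whitespace tokenization
        tgt = commit["fixed"].split()
        pairs.append({"source_tokens": src, "target_tokens": tgt})
    return pairs

if __name__ == "__main__":
    commits = [
        {"buggy": "if ( i <= list.size() )", "fixed": "if ( i < list.size() )"},
    ]
    for p in build_translation_pairs(commits):
        print(p["source_tokens"], "->", p["target_tokens"])
```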
Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line-level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by the LLMs to produce LLMAO, a language-model-based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMAO improves Top-1 results over state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%--54.4%, and Top-5 results by 14.4%-35.6%. LLMAO is also trained using a language model architecture that can detect security vulnerabilities down to the code line level.
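
The sketch below conveys the architectural idea described in this abstract, not the authors' released code: keep a pretrained left-to-right code LLM frozen, pool its hidden states per source line, and train a small bidirectional head that scores each line's suspiciousness without any test coverage input. A bidirectional LSTM stands in for the paper's bidirectional adapter layers, and all dimensions and layer counts are illustrative assumptions.

```python
# Sketch of a bidirectional scoring head over frozen per-line LLM states.

import torch
import torch.nn as nn

class LineFaultScorer(nn.Module):
    def __init__(self, llm_hidden=1024, adapter_hidden=256, num_layers=2):
        super().__init__()
        # Bidirectional adapter over the per-line LLM representations
        # (an LSTM here; the paper describes bidirectional adapter layers).
        self.adapter = nn.LSTM(llm_hidden, adapter_hidden, num_layers,
                               batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * adapter_hidden, 1)

    def forward(self, line_embeddings):
        # line_embeddings: (batch, num_lines, llm_hidden), produced by a
        # frozen LLM and pooled per line (e.g., mean over the line's tokens).
        states, _ = self.adapter(line_embeddings)
        logits = self.head(states).squeeze(-1)   # (batch, num_lines)
        return torch.sigmoid(logits)             # per-line suspiciousness

if __name__ == "__main__":
    scorer = LineFaultScorer()
    fake_file = torch.randn(1, 40, 1024)   # 40 lines of pooled hidden states
    print(scorer(fake_file).shape)         # torch.Size([1, 40])
```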
Test-based automated program repair has been a prolific field of research in software engineering over the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers have pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation, with regard to the practicality of program repair. In this paper, we fill that gap in the literature by providing an extensive review of the efficiency of test-suite-based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated with (1) the strategy to traverse the search space efficiently in order to select sensical repair attempts, (2) the strategy to minimize the effort for identifying a plausible patch, as well as (3) the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study of the efficiency, in terms of the quantity of generated patch candidates, of 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same configurations to limit biases.
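
A simple way to picture the efficiency measure discussed above is to count how many patch candidates a tool generates (and validates) before reaching its first plausible patch. The candidate stream in the sketch below is a made-up stand-in for a real tool's generation log.

```python
# Illustrative sketch: candidates generated until the first plausible patch.

def candidates_to_first_plausible(candidate_results):
    """candidate_results: iterable of booleans, True = candidate passes the
    whole test suite (plausible). Returns how many candidates were generated
    before (and including) the first plausible one, or None if none found."""
    for count, is_plausible in enumerate(candidate_results, start=1):
        if is_plausible:
            return count
    return None

if __name__ == "__main__":
    tool_log = [False, False, False, True, False]   # 4th candidate is plausible
    print(candidates_to_first_plausible(tool_log))  # 4
```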
A large body of the literature on automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, generated patches, although validated by the oracle, may actually be incorrect. While the state of the art explores research directions that require dynamic information or rely on manually-crafted heuristics, we study the benefit of learning code representations in order to learn deep features that may encode the properties of patch correctness. Our empirical work mainly investigates different representation learning approaches for code changes to derive embeddings that are amenable to similarity computations. We report findings based on embeddings produced by pre-trained and re-trained neural networks. Experimental results demonstrate the potential of embeddings to empower learning algorithms in reasoning about patch correctness: a machine learning predictor with BERT transformer-based embeddings associated with logistic regression yielded an AUC value of about 0.8 in the prediction of patch correctness on a deduplicated dataset of 1000 labeled patches. Our investigations show that learned representations can lead to reasonable performance when comparing against the state-of-the-art, PATCH-SIM, which relies on dynamic information. These representations may further be complementary to features that were carefully (manually) engineered in the literature.
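
The sketch below mirrors the overall shape of this setup under stated assumptions: the embed() function is a placeholder for a real pre-trained encoder (e.g., a BERT-style model over code tokens), and the tiny labeled dataset is synthetic. The pipeline derives embeddings for the buggy and patched code, builds similarity-style features, and trains a logistic regression to predict patch correctness.

```python
# Sketch: embeddings of buggy/patched code + logistic regression.

import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(code: str, dim: int = 32) -> np.ndarray:
    """Placeholder embedding: a real pipeline would call a pre-trained model."""
    rng = np.random.default_rng(abs(hash(code)) % (2**32))
    return rng.normal(size=dim)

def patch_features(buggy: str, patched: str) -> np.ndarray:
    b, p = embed(buggy), embed(patched)
    cosine = float(b @ p / (np.linalg.norm(b) * np.linalg.norm(p)))
    return np.concatenate([b, p, [cosine]])

if __name__ == "__main__":
    # Synthetic labeled patches: 1 = correct, 0 = overfitting/incorrect.
    data = [("a=b+1", "a=b-1", 1), ("a=b+1", "return 0", 0),
            ("i<=n", "i<n", 1), ("i<=n", "pass", 0)] * 5
    X = np.stack([patch_features(b, p) for b, p, _ in data])
    y = np.array([label for _, _, label in data])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:2])[:, 1])  # predicted correctness probabilities
```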
ACM Transactions on Software Engineering and Methodology, Journal Year: 2022, Volume and Issue: 31(3), P. 1 - 29, Published: May 18, 2022
Despite their capability in successfully fixing more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests may actually be incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that need to execute tests) are time-consuming, while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding-based techniques have been proposed recently, which assess patch correctness via token sequences extracted from the changed code of a patch. However, they have rarely considered the context information and program structures of a patch, which are crucial as revealed by existing studies. In this study, we explore the idea of context-aware code change embedding considering program structures for patch correctness assessment. Specifically, given a patch, we not only focus on the changed code but also take the correlated unchanged part into consideration, through which the context information can be extracted and leveraged. We then utilize the AST path technique for representation, where the structure information from AST nodes can be captured. Finally, based on several pre-defined heuristics, we build a deep learning classifier to predict patch correctness. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation-learning-based techniques (e.g., Cache relatively outperforms them by approximately 6%, 3%, and 16%, respectively, under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones, including PATCH-SIM (92.9% vs. 83.0%). Further analyses reveal that the context information and program structures leveraged by Cache contributed significantly to its outstanding performance.
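
To illustrate what an AST-path representation looks like, the sketch below extracts root-to-leaf node-type paths from a toy snippet using Python's own ast module. Cache itself targets Java patches and uses a richer path scheme; this is only meant to show how identical token sequences in different structural contexts yield different paths.

```python
# Sketch: root-to-leaf AST paths as a structure-aware representation.

import ast

def root_to_leaf_paths(code: str):
    tree = ast.parse(code)
    paths = []

    def walk(node, prefix):
        prefix = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(tree, [])
    return paths

if __name__ == "__main__":
    changed = "if x > 0:\n    y = x"
    for p in root_to_leaf_paths(changed):
        print(" -> ".join(p))
```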
ACM Transactions on Software Engineering and Methodology, Journal Year: 2023, Volume and Issue: 33(2), P. 1 - 69, Published: Nov. 6, 2023
Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed that leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., the source language) are translated into fixed code snippets (i.e., the target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this article, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss the widely adopted datasets and evaluation metrics, and outline existing empirical studies. We also discuss several critical aspects of existing learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight several practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our article can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at the repository: https://github.com/iSEngLab/AwesomeLearningAPR.
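
The general workflow the survey describes can be pictured as the pipeline sketched below. Every stage is a stub standing in for a real component (spectrum- or learning-based fault localization, an NMT/LLM patch generator, test-based validation, an APCA classifier); only the control flow between stages is meaningful.

```python
# Schematic sketch of the learning-based APR workflow: fault localization,
# patch generation, patch ranking, patch validation, correctness assessment.

def localize_faults(program):              # e.g., spectrum- or learning-based FL
    return ["line 42"]

def generate_patches(program, location):   # e.g., an NMT/LLM patch generator
    return ["candidate_patch_1", "candidate_patch_2"]

def rank_patches(candidates):              # e.g., by model likelihood
    return sorted(candidates)

def validate(program, patch):              # e.g., compile + run the test suite
    return patch == "candidate_patch_1"

def assess_correctness(patch):             # e.g., an APCA classifier
    return True

def repair(program):
    for location in localize_faults(program):
        for patch in rank_patches(generate_patches(program, location)):
            if validate(program, patch) and assess_correctness(patch):
                return patch
    return None

if __name__ == "__main__":
    print(repair("buggy_program.java"))    # -> candidate_patch_1
```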
Test-based automated program repair (APR) has attracted huge attention from both industry and academia. Despite the significant progress made in recent studies, the overfitting problem (i.e., a generated patch that is plausible but overfitting) is still a major and long-standing challenge. Therefore, plenty of techniques have been proposed to assess the correctness of patches, either during the patch generation phase or in the evaluation of APR techniques. However, the effectiveness of existing techniques has not been systematically compared, and little is known about their advantages and disadvantages. To fill this gap, we performed a large-scale empirical study in this paper. Specifically, we investigated existing patch correctness assessment techniques, including both static and dynamic ones, based on 902 patches automatically generated by 21 APR tools from 4 different categories. Our study revealed the following findings: (1) static code features with respect to syntax and semantics are generally effective in differentiating overfitting patches from correct ones; (2) static techniques can achieve high precision while dynamic heuristics lean more towards recall; (3) patches from certain projects and of certain types are less detectable than others; and (4) existing techniques are highly complementary to each other. For instance, a single technique can only detect at most 53.5% of the overfitting patches, while 93.3% of them can be detected by at least one technique when oracle information is available. Based on our findings, we designed an integration strategy that first integrates static code features via learning, and then combines the result with other techniques via a majority voting strategy. Our experiments show that the strategy can enhance performance significantly.
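
The sketch below shows the shape of the integration idea described above: one verdict comes from a classifier learned over static code features, and it is then combined with the verdicts of other assessment techniques by majority voting. The boolean encoding (True = "overfitting") and the set of voters are illustrative assumptions, not the study's exact setup.

```python
# Sketch: combining a learned static-feature verdict with other techniques
# via majority voting.

def majority_vote(verdicts):
    """Return True (overfitting) if strictly more than half the voters say so."""
    return sum(verdicts) * 2 > len(verdicts)

def integrated_assessment(static_feature_verdict, other_technique_verdicts):
    return majority_vote([static_feature_verdict] + list(other_technique_verdicts))

if __name__ == "__main__":
    # Learned static-feature classifier says "overfitting"; two of three
    # other techniques disagree -> the integrated verdict is "not overfitting".
    print(integrated_assessment(True, [False, False, True]))  # False
```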
IEEE Transactions on Software Engineering, Journal Year: 2021, Volume and Issue: 48(8), P. 2920 - 2938, Published: April 9, 2021
Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, those patches that fail to correctly repair the defect beyond making the tests pass. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level, designed for the single programming language Java. Then, ODS uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned model can then finally be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from the Defects4J, Bugs.jar, and Bears benchmarks. The empirical evaluation shows that ODS is able to correctly classify 71.9 percent of the patches from 26 projects, which improves on the state-of-the-art. ODS is applicable in practice and can be employed as a post-processing procedure to classify the patches generated by different APR systems.
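
In the spirit of the approach described above, the sketch below compares buggy and patched code, derives a few static features of the change, and fits a probabilistic classifier on labeled patches. It is not ODS's actual feature set (which is extracted at the Java AST level); the token-level features, the classifier choice, and the toy data are assumptions for illustration.

```python
# Sketch: static change features + a probabilistic patch classifier.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def change_features(buggy: str, patched: str) -> list:
    b_tok, p_tok = buggy.split(), patched.split()
    return [
        len(p_tok) - len(b_tok),                   # size of the edit
        len(set(p_tok) - set(b_tok)),              # tokens introduced
        len(set(b_tok) - set(p_tok)),              # tokens removed
        int("return" in set(p_tok) - set(b_tok)),  # inserts an early return?
    ]

if __name__ == "__main__":
    # Toy labeled patches: 1 = correct, 0 = overfitting.
    labeled = [("i <= n", "i < n", 1), ("i <= n", "return 0 ;", 0),
               ("x + 1", "x - 1", 1), ("x + 1", "return null ;", 0)] * 5
    X = np.array([change_features(b, p) for b, p, _ in labeled])
    y = np.array([lab for _, _, lab in labeled])
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    # Probability that an unseen patch is correct (class 1).
    print(model.predict_proba([change_features("i <= n", "return 0 ;")])[0][1])
```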