2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Journal Year: 2023, Volume and Issue: unknown, P. 1881 - 1886
Published: Sept. 11, 2023
Developing effective data-driven automated bug-fixing approaches relies heavily on large bug-fix datasets. However, the granularity of current repository-mined datasets is usually at the function level, without meta-information such as fault type. To alleviate the open challenge of precisely mining code snippets with bugs, their fixes, locations, and types from source repositories, in this paper we propose a flexible, extensible, multilingual dataset construction system, namely the Multilingual Bug-Fix Constructor (MBFC). Furthermore, we release a large-scale, fine-grained multi-lingual bug-fix dataset (M-BF) automatically built using the proposed MBFC, which in its initial version includes a total of 921,825 bug-fix pairs mined from 442,164 different open-source software projects, collected from January 2020 through September. It is expected that our system can benefit the development of innovative and practical program repair methods, thereby improving the efficiency of debugging and code review processes.
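
The mining step described above (identifying fix commits in a repository's history and pairing the pre- and post-fix code) can be illustrated with a small sketch. This is not the MBFC implementation; the keyword heuristic, repository path, and output format are assumptions for illustration only.

    # Sketch: scan git history for fix-looking commits and record the files
    # they touch as candidate bug-fix pairs (keyword heuristic is assumed).
    import subprocess
    import json

    FIX_KEYWORDS = ("fix", "bug", "repair", "patch")  # assumed heuristic

    def fix_commits(repo):
        """Yield (sha, subject) for commits whose message looks like a bug fix."""
        log = subprocess.run(
            ["git", "-C", repo, "log", "--pretty=format:%H\x1f%s"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in log.splitlines():
            sha, subject = line.split("\x1f", 1)
            if any(k in subject.lower() for k in FIX_KEYWORDS):
                yield sha, subject

    def changed_files(repo, sha):
        """Files touched by a commit (parent = buggy version, commit = fixed version)."""
        out = subprocess.run(
            ["git", "-C", repo, "diff-tree", "--no-commit-id", "--name-only", "-r", sha],
            capture_output=True, text=True, check=True,
        ).stdout
        return [f for f in out.splitlines() if f]

    if __name__ == "__main__":
        repo = "./some-project"  # hypothetical local clone
        pairs = [{"commit": sha, "message": msg, "files": changed_files(repo, sha)}
                 for sha, msg in fix_commits(repo)]
        print(json.dumps(pairs[:3], indent=2))

A real pipeline would additionally diff the touched functions, localize the change, and attach fault-type labels, which is where the fine-grained meta-information mentioned above would come from.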
IEEE Transactions on Information Forensics and Security
Journal Year: 2024, Volume and Issue: 19, P. 4374 - 4389
Published: Jan. 1, 2024
The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications for a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in assertion generation for security, where primarily natural language prompts, such as those one would see as comments in code files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts and create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.
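
The prompting setup the abstract describes, comment-style natural-language prompts that an LLM completes into SystemVerilog assertions, can be sketched roughly as follows. The `complete` function is a placeholder for whatever LLM completion API is used, and the design snippet, comment, and golden assertion are illustrative, not taken from the paper's benchmark.

    # Sketch: build a comment-style prompt and compare the LLM's completion
    # against an (illustrative) golden reference assertion.
    def build_prompt(design_snippet: str, comment: str) -> str:
        """Ask for a SystemVerilog assertion from a natural-language comment."""
        return (
            "// Module under verification\n"
            f"{design_snippet}\n"
            f"// {comment}\n"
            "// Write a SystemVerilog assertion capturing the comment above.\n"
            "assert property ("
        )

    def complete(prompt: str) -> str:  # hypothetical LLM completion call
        raise NotImplementedError("plug in an LLM completion API here")

    if __name__ == "__main__":
        design = "module lock(input clk, input unlock, input [1:0] state);"
        comment = "state must never leave LOCKED unless unlock is asserted"
        prompt = build_prompt(design, comment)
        # One plausible golden reference the generated text would be scored against:
        golden = "@(posedge clk) (state == LOCKED && !unlock) |=> (state == LOCKED)"
        print(prompt)
        print("golden:", golden)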
ACM Transactions on Software Engineering and Methodology
Journal Year: 2025, Volume and Issue: unknown
Published: Feb. 20, 2025
Bug fixing holds significant importance in software development and maintenance. Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving bugs. However, a noticeable gap in existing approaches lies in their oversight of the collaborative facets intrinsic to bug resolution, treating the process as a single-stage endeavor. Moreover, most approaches solely take the buggy code snippet as input to LLMs during the patch generation stage. To mitigate the aforementioned limitations, we introduce a novel stage-wise framework named PATCH. Specifically, we first augment the buggy code with corresponding dependence context and intent information to better guide LLMs in generating correct candidate patches. Additionally, by taking inspiration from bug management practices, we decompose the bug-fixing task into four distinct stages: bug reporting, bug diagnosis, patch generation, and patch verification. These stages are performed interactively by LLMs, aiming to simulate the collaborative behavior of programmers during bug resolution. By harnessing these collective contributions, PATCH effectively enhances the bug-fixing capability of LLMs. We implement PATCH by employing the powerful dialogue-based LLM ChatGPT. Our evaluation on the widely used benchmark BFP demonstrates that PATCH achieves better performance than state-of-the-art approaches.
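
A rough sketch of the stage-wise, dialogue-based loop the abstract outlines is given below. It is not the PATCH implementation; `chat` stands in for a dialogue-based LLM API, and the stage prompts are assumed wording.

    # Sketch: four interactive stages (report, diagnose, generate, verify)
    # carried out over one growing dialogue history.
    from typing import Dict, List

    def chat(messages: List[Dict[str, str]]) -> str:  # hypothetical LLM call
        raise NotImplementedError("plug in a dialogue-based LLM here")

    def repair(buggy_code: str, context: str, intent: str) -> str:
        history = [{"role": "system", "content": "You collaborate to fix bugs."}]

        def stage(instruction: str) -> str:
            history.append({"role": "user", "content": instruction})
            reply = chat(history)
            history.append({"role": "assistant", "content": reply})
            return reply

        # 1. Bug reporting: describe the symptom from code + dependence context.
        report = stage(f"Report the bug in:\n{buggy_code}\nDependence context:\n{context}")
        # 2. Bug diagnosis: explain the root cause, guided by the intent info.
        diagnosis = stage(f"Diagnose the root cause of: {report}\nIntended behavior: {intent}")
        # 3. Patch generation: produce a candidate fix from the diagnosis.
        patch = stage(f"Given the diagnosis ({diagnosis}), generate the fixed code.")
        # 4. Patch verification: ask the model to re-check the candidate.
        verdict = stage(f"Does this patch resolve the report and preserve the intent?\n{patch}")
        return patch if "yes" in verdict.lower() else buggy_code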
Recently, an emerging trend in automatic program repair is to apply deep neural networks to generate fixed code from buggy code, called NPR (Neural Program Repair). However, existing NPR systems are trained and evaluated under very different settings (e.g., different training data, inconsistent evaluation metrics, wide-ranging candidate numbers), which makes it hard to draw fair conclusions when comparing them. Motivated by this, we first build a standard benchmark dataset and an extensive framework tool to mitigate threats to fair comparison. The benchmark consists of a training set, a validation set, and a test set with 144,641, 13,739, and 13,706 bug-fix pairs in Java, respectively. The framework supports selecting specific training, validation, and test datasets, automatically conducting the pipeline of training and evaluating NPR models, and easily integrating new NPR models by implementing well-defined interfaces. Then, based on the benchmark and tool, we conduct a comprehensive empirical comparison of six SOTA NPR systems w.r.t. repairability, fix inclination, and generalizability. The experimental results reveal deeper characteristics of the compared systems, subvert some comparative conclusions, and further verify the necessity of unifying experimental setups when exploring the progress of NPR systems. Meanwhile, we also find common features of NPR systems (e.g., they are good at dealing with code-delete bugs). Finally, we identify promising research directions derived from our findings.
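
The "well-defined interfaces" for integrating new NPR models could look roughly like the sketch below; class and method names are illustrative assumptions, not the framework's actual API.

    # Sketch: a plug-in interface for NPR models plus a unified evaluation loop
    # that fixes the candidate budget for every compared system.
    from abc import ABC, abstractmethod
    from typing import List, Tuple

    class NPRModel(ABC):
        @abstractmethod
        def train(self, pairs: List[Tuple[str, str]]) -> None:
            """Train on (buggy_code, fixed_code) pairs."""

        @abstractmethod
        def repair(self, buggy_code: str, n_candidates: int) -> List[str]:
            """Return up to n_candidates candidate patches, best first."""

    def evaluate(model: NPRModel,
                 test_pairs: List[Tuple[str, str]],
                 n_candidates: int = 100) -> float:
        """Exact-match repair rate under a fixed candidate budget."""
        fixed = 0
        for buggy, reference in test_pairs:
            candidates = model.repair(buggy, n_candidates)
            if any(c.strip() == reference.strip() for c in candidates):
                fixed += 1
        return fixed / len(test_pairs)

Pinning down the training data, candidate budget, and matching criterion in one place like this is what makes the cross-system comparison described above fair.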
Repairing software bugs with automated solutions is a long-standing goal of researchers. Some of the latest automated program repair (APR) tools leverage natural language processing (NLP) techniques to fix bugs. But natural languages (NL) and programming languages (PL) have significant differences, which means such tools may not handle PL tasks well. Moreover, due to the difference between the vulnerability repair task and the bug repair task, the performance of these models on bug repair is not yet known. To address these issues, we attempt to use large-scale pre-trained code models (CodeBERT and GraphCodeBERT) for bug repair based on the characteristics of code, and explore how they compare with real-world state-of-the-art data-driven approaches to repair. The results show that using pre-trained code models can better capture code features and accomplish multi-line repair. Specifically, our solution achieves advanced results (single-line accuracy 95.47%, multi-line accuracy 90.06%). These results outperform prior approaches and demonstrate that adding rich data-dependency information can help solve more complex code problems. Besides, we also discuss previous work and our approach, pointing out some shortcomings to address in the future.
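
As a sketch of the pre-trained-model side of this setup, the snippet below loads CodeBERT with the Hugging Face transformers library and encodes a toy buggy method; the paper's full fine-tuning and patch-generation pipeline is not reproduced, and the buggy snippet is illustrative.

    # Sketch: encode a buggy method with CodeBERT; a repair model would feed
    # these contextual embeddings into a decoder that emits the fixed sequence.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    encoder = AutoModel.from_pretrained("microsoft/codebert-base")

    buggy = "public int div(int a, int b) { return a + b; }"  # toy buggy method

    inputs = tokenizer(buggy, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape: (1, seq_len, 768)

    print(hidden.shape)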
ACM Transactions on Software Engineering and Methodology
Journal Year: 2024, Volume and Issue: unknown
Published: Aug. 19, 2024
Recent years have seen a rise of neural program repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing NPR systems can facilitate new improvements in this area and provide practical instructions for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: ① published systems are trained with varying data, and ② they are roughly evaluated through the number of totally fixed bugs. Questions such as "what types of bugs are repairable by NPR systems" cannot be answered yet. Consequently, researchers cannot make targeted improvements and users have no idea of the real state of affairs of NPR systems. In this paper, we perform a systematic evaluation of nine state-of-the-art NPR systems. To enable a fair and detailed comparison, we (1) build a benchmark and framework that supports training and validating NPR systems under a unified setting, and (2) evaluate the retrained systems with fine-grained performance analysis, especially on effectiveness and efficiency. We believe our tool and results could offer practitioners a view of the real state of affairs of NPR systems and implications for further facilitating NPR.
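
Answering "what types of bugs are repairable" amounts to breaking fix results down by bug type rather than reporting one total; a minimal sketch of such a breakdown, with made-up type labels and records, is shown below.

    # Sketch: per-bug-type fix rate instead of a single "totally fixed" count.
    from collections import defaultdict

    def per_type_fix_rate(records):
        """records: iterable of (bug_type, was_fixed) per benchmark bug."""
        totals, fixed = defaultdict(int), defaultdict(int)
        for bug_type, was_fixed in records:
            totals[bug_type] += 1
            fixed[bug_type] += int(was_fixed)
        return {t: fixed[t] / totals[t] for t in totals}

    results = [("null-check", True), ("off-by-one", False),
               ("null-check", True), ("code-delete", True), ("off-by-one", True)]
    print(per_type_fix_rate(results))  # {'null-check': 1.0, 'off-by-one': 0.5, 'code-delete': 1.0}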
Journal of Software Evolution and Process
Journal Year: 2023, Volume and Issue: 36(4)
Published: May 23, 2023
Abstract
Due to the high cost of repairing defective programs, much research focuses on automatic program repair (APR). In recent years, a new trend in APR is to apply neural networks to mine the relations between defective programs and their corresponding patches automatically, which is known as neural program repair (NPR). The NPR community, however, ignores some important properties that could impact the applicability of NPR systems, such as robustness. For semantic‐identical buggy programs, NPR systems may produce totally different patches. In this paper, we propose an evaluation tool named RobustNPR, the first NPR robustness evaluation tool. RobustNPR employs several mutators to generate semantic‐identical mutants of buggy programs. For an original buggy program and its mutant, it checks two aspects of NPR: (a) Can NPR fix the mutant when it can fix the original program? (b) Can NPR fix the original program when it can fix the mutant? Then, we evaluate four SOTA NPR models and analyze the results. From the results, we find that even for the best‐performing model, 20.16% of the fix success is unreliable, which indicates that the robustness of NPR is not perfect. In addition, we find that robustness is correlated with model settings and other factors.
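
The robustness check can be sketched as follows: apply a semantics-preserving mutation to a buggy program and compare whether the model fixes both versions. The identifier-rename mutator and the `can_fix` oracle below are illustrative stand-ins, not RobustNPR's actual mutators.

    # Sketch: one semantics-preserving mutator plus the two checks (a) and (b).
    import re
    from typing import Callable, Dict

    def rename_identifier(code: str, old: str, new: str) -> str:
        """Rename an identifier; whole-word match keeps semantics intact."""
        return re.sub(rf"\b{re.escape(old)}\b", new, code)

    def robustness_check(buggy: str, mutant: str,
                         can_fix: Callable[[str], bool]) -> Dict[str, bool]:
        fixed_original, fixed_mutant = can_fix(buggy), can_fix(mutant)
        return {
            # (a) if the original is fixable, the mutant should be too
            "original_implies_mutant": (not fixed_original) or fixed_mutant,
            # (b) if the mutant is fixable, the original should be too
            "mutant_implies_original": (not fixed_mutant) or fixed_original,
        }

    buggy = "int sum(int a, int b) { return a - b; }"
    mutant = rename_identifier(buggy, "a", "lhs")
    # `can_fix` would wrap an NPR model plus a patch-correctness check.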