Software Quality Journal, Journal Year: 2025, Volume and Issue: 33(1)
Published: Jan. 25, 2025
Language: English
Published: May 1, 2023
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained using billions of text/code tokens, can potentially avoid this issue. Very recently, researchers have leveraged LLMs for APR without relying on any bug-fixing datasets. Meanwhile, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B in size. We designed 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patch function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare them in the number of bugs fixed, generation speed, and compilation rate. We also compare them against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all of our studied APR tools. Among the studied LLMs, a scaling effect exists where larger models tend to achieve better performance. Also, we show for the first time that suffix code after the buggy line (adopted in infilling-style APR) helps in not only generating more fixes but also producing patches with higher compilation rates. Besides patch generation, the LLMs consider correct patches to be more natural than other ones, and can even be leveraged for effective patch ranking and patch correctness checking. Lastly, LLM-based APR can be further boosted via: 1) increasing the sample size and 2) incorporating fix template information.
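As a concrete illustration of the three repair settings described above, the following minimal sketch builds the corresponding prompts for a toy buggy function. It is an assumed, simplified rendering: `generate` is a hypothetical placeholder for any LLM completion endpoint, and real infilling models use model-specific sentinel tokens rather than the `<INFILL>` marker used here.

```python
# Minimal sketch of the three repair settings from the study above.
# `generate` is a hypothetical stand-in for any LLM completion endpoint;
# real infilling models use model-specific sentinel tokens, not <INFILL>.

BUGGY_FUNCTION = '''\
def middle(a, b, c):
    if a < b < c or c < b < a:
        return b
    if b < a < c or c < a < b:
        return b   # bug: should return a
    return c
'''

PREFIX, BUGGY_LINE, SUFFIX = BUGGY_FUNCTION.partition(
    "        return b   # bug: should return a\n")

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion endpoint."""
    raise NotImplementedError

# Setting 1: regenerate the entire patched function from a hint.
full_function_prompt = "# Fix the bug in the following function.\n" + BUGGY_FUNCTION

# Setting 2: infilling -- the model fills a chunk between prefix and suffix.
infill_prompt = PREFIX + "<INFILL>" + SUFFIX

# Setting 3: single-line repair -- the model continues the prefix with
# exactly one replacement line.
single_line_prompt = PREFIX

for p in (full_function_prompt, infill_prompt, single_line_prompt):
    print(p, end="\n---\n")
```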
Language: English
Citations: 148

Published: May 1, 2023
Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs' fixing capabilities and to fine-tune CLMs for the APR task. Firstly, this work is the first to evaluate ten CLMs on four APR benchmarks, which shows that, surprisingly, the best CLM, as is, fixes 72% more bugs than state-of-the-art deep-learning (DL)-based APR techniques. Secondly, one of the four benchmarks was created by us in this paper to avoid data leaking and ensure a fair evaluation. Thirdly, it is the first work to fine-tune CLMs with APR training data, which shows that fine-tuning brings a 31%-1,267% improvement and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourthly, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of buggy lines to fix bugs, yet fine-tuned CLMs could potentially over-rely on buggy lines. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs, and also raises awareness of fair and comprehensive evaluations of CLMs, calling for more transparent reporting of the open-source repositories used in pre-training to address the data-leaking problem.
Language: English
Citations: 82

Published: May 1, 2023
Large language models trained on massive code corpora can generalize to new tasks without the need for task-specific fine-tuning. In few-shot learning, these models take as input a prompt, composed of natural language instructions, a few instances of task demonstration, and a query, and generate an output. However, the creation of an effective prompt for code-related tasks in few-shot learning has received little attention. We present a technique for prompt creation that automatically retrieves code demonstrations similar to the developer task, based on embedding or frequency analysis. We apply our approach, Cedar, to two different programming languages, statically and dynamically typed, and two different tasks, namely, test assertion generation and program repair. For each task, we compare Cedar with state-of-the-art task-specific and fine-tuned models. The empirical results show that, with only a few relevant demonstrations, Cedar is effective in terms of accuracy, achieving 76% and 52% exact matches for assertion generation and program repair, respectively. For assertion generation, Cedar outperforms existing task-specific models by 333% and fine-tuned models by 11%; for program repair, Cedar yields 189% better accuracy than task-specific models and is competitive with recent fine-tuned models. These findings have practical implications for practitioners, as Cedar could potentially be applied to multilingual and multitask settings without language-specific training, with minimal examples and effort.
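The retrieval step described above can be illustrated with frequency analysis, one of the two similarity options the abstract mentions. The sketch below is a simplified stand-in, not Cedar's actual implementation: it ranks a hypothetical demonstration pool by cosine similarity over token counts and assembles a few-shot prompt.

```python
# Sketch of retrieval-based demonstration selection via frequency analysis,
# a simplified stand-in for Cedar's retrieval step.
import math
from collections import Counter

def tokenize(code: str) -> Counter:
    # Crude lexical tokenization; a real system would use a code-aware lexer.
    return Counter(code.replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_demonstrations(query: str, pool: list[tuple[str, str]], k: int = 2):
    """Return the k (buggy, fixed) pairs most similar to the query."""
    q = tokenize(query)
    return sorted(pool, key=lambda d: cosine(q, tokenize(d[0])), reverse=True)[:k]

# Hypothetical demonstration pool of (buggy, fixed) pairs.
pool = [
    ("if (x = 0) return;", "if (x == 0) return;"),
    ("for (i = 0; i <= n; i++) sum += a[i];", "for (i = 0; i < n; i++) sum += a[i];"),
    ("while (p != null) p = p.next", "while (p != null) { p = p.next; }"),
]
query = "if (count = 0) reset();"

prompt = "".join(f"### Buggy:\n{b}\n### Fixed:\n{f}\n\n"
                 for b, f in top_k_demonstrations(query, pool))
prompt += f"### Buggy:\n{query}\n### Fixed:\n"
print(prompt)
```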
Language: English
Citations: 69

Entropy, Journal Year: 2023, Volume and Issue: 25(6), P. 888 - 888
Published: June 1, 2023
This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include GitHub Copilot powered by OpenAI’s Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their use in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques and software naturalness into these applications, with a discussion on extending such capabilities to Apple’s Xcode for mobile software development. The paper also discusses empowering developers with advanced coding assistance and streamlining the development process.
Language: English
Citations: 51

ACM Transactions on Software Engineering and Methodology, Journal Year: 2023, Volume and Issue: 33(2), P. 1 - 69
Published: Nov. 6, 2023
Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed to leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., the source language) are translated into fixed code snippets (i.e., the target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this article, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss widely adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our article can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at the repository: https://github.com/iSEngLab/AwesomeLearningAPR .
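The NMT framing summarized above amounts to training a sequence-to-sequence model on pairs of buggy and fixed code. The sketch below shows only how such a pair might be serialized for training; the `<BUG_START>`/`<BUG_END>` markers and the helper are illustrative assumptions, not the format of any specific APR system.

```python
# Sketch of the NMT framing of APR: buggy code as the "source language",
# fixed code as the "target language". The markers below are illustrative
# assumptions, not the serialization of any particular APR system.

def make_example(context_before: str, buggy_hunk: str,
                 context_after: str, fixed_hunk: str) -> dict:
    """Serialize one bug-fix pair for seq2seq training."""
    source = (context_before
              + " <BUG_START> " + buggy_hunk + " <BUG_END> "
              + context_after)
    return {"source": source, "target": fixed_hunk}

example = make_example(
    context_before="int mid(int a, int b) {",
    buggy_hunk="return (a + b) / 2;",           # may overflow
    context_after="}",
    fixed_hunk="return a + (b - a) / 2;",
)
print(example["source"])
print(example["target"])
```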
Language: English
Citations: 38

Published: July 12, 2023
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) to automatically fix software bugs.
Language: English
Citations: 31

2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Journal Year: 2023, Volume and Issue: unknown, P. 1162 - 1174
Published: Sept. 11, 2023
The advent of large language models (LLMs) has opened up new opportunities for automated program repair (APR). In particular, some recent studies have explored how to leverage large language models of code (LLMCs) for repair tasks and show promising results. However, most of them adopt the zero/few-shot learning paradigm for APR, which directly uses LLMCs to generate the possibly correct code given its surrounding context. Though effective, the repair capabilities of LLMCs based on the fine-tuning paradigm have yet to be extensively explored. Also, it remains unknown whether LLMCs have the potential to repair more complicated bugs (e.g., multi-hunk bugs). To fill the gap, in this work, we conduct a comprehensive study on the repair capability of LLMCs in the fine-tuning paradigm. We select 5 popular LLMCs with representative pre-training architectures, including CodeBERT, GraphCodeBERT, PLBART, CodeT5, and UniXcoder. We consider 3 typical program repair scenarios (i.e., bugs, vulnerabilities, and errors) involving 3 programming languages (i.e., Java, ...
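As a rough sketch of the fine-tuning paradigm this study investigates, the following uses CodeT5 (one of the five studied models) through Hugging Face Transformers on a single illustrative bug-fix pair; the pair, the learning rate, and the single-step loop are assumptions, and a real setup would batch over a full bug-fixing dataset.

```python
# Minimal sketch of fine-tuning a code LM (here CodeT5, one of the five
# studied architectures) on one illustrative bug-fix pair.
# Requires: pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed hyperparameter

buggy = "public int max(int a, int b) { if (a > b) return b; return b; }"
fixed = "public int max(int a, int b) { if (a > b) return a; return b; }"

inputs = tok(buggy, return_tensors="pt")
labels = tok(fixed, return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy
loss.backward()
optim.step()
optim.zero_grad()
print(f"training loss: {loss.item():.4f}")
```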
Language: English
Citations: 29

Published: Nov. 30, 2023
Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed the surge of deep learning (DL) based approaches that automate the repair process in a data-driven manner. However, their performance is often limited by a fixed set of parameters used to model the highly complex search space of APR.
Language: English
Citations: 29

Published: May 1, 2023
Automated Program Repair (APR) improves software reliability by generating patches for a buggy program automatically. Recent APR techniques leverage deep learning (DL) to build models that learn to generate patches from existing patches and code corpora. While promising, DL-based APR techniques suffer from the abundant syntactically or semantically incorrect patches in the patch space. These patches often disobey the syntactic and semantic domain knowledge of source code and thus cannot be correct patches that fix a bug. We propose a DL-based APR approach, KNOD, which incorporates domain knowledge to guide patch generation in a direct and comprehensive way. KNOD has two major novelties: (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to their inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases. We evaluate KNOD on three widely-used benchmarks. KNOD fixes 72 bugs on Defects4J v1.2, 25 bugs on QuixBugs, and 50 additional bugs on the Defects4J v2.0 benchmarks, outperforming all existing APR tools.
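KNOD injects syntactic and semantic rules into the decoder itself; as a much weaker but concrete illustration of the same idea, the sketch below only post-filters candidate patches that violate the language's syntax, using Python's `ast` module as an assumed stand-in checker.

```python
# Simplified illustration of domain knowledge in patch generation: reject
# candidates that violate the syntax of the language. KNOD goes further and
# enforces such rules *during* tree decoding; this sketch only post-filters.
import ast

def syntactically_valid(patch: str) -> bool:
    """Accept a candidate patch only if it parses to a well-formed AST."""
    try:
        ast.parse(patch)
        return True
    except SyntaxError:
        return False

# Hypothetical candidate patches for a buggy function.
candidates = [
    "def is_even(n):\n    return n % 2 == 0",  # valid and correct
    "def is_even(n):\n    return n % 2 = 0",   # invalid: '=' in expression
    "def is_even(n)\n    return n % 2 == 0",   # invalid: missing ':'
]

valid = [c for c in candidates if syntactically_valid(c)]
print(f"{len(valid)} of {len(candidates)} candidates pass the syntax check")
```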
Language: English
Citations: 26

Published: Nov. 30, 2023
During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling human writing of programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.
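The LLM/completion-engine interaction described above can be caricatured as a decoding loop in which an engine vetoes infeasible tokens (only the pruning half of Repilot; proactive completion is omitted). The toy "engine" below merely enforces balanced parentheses, and the `model_proposals` stub stands in for a real LLM; both are assumptions for illustration, not Repilot's implementation.

```python
# Toy sketch of completion-engine-guided decoding: at each step, tokens
# proposed by the model are pruned if the "engine" deems them infeasible.
# Here the engine only enforces balanced parentheses -- a stand-in for a
# real language-aware completion engine.
import random

def engine_allows(prefix: str, token: str) -> bool:
    """Reject tokens that close more parentheses than were opened."""
    depth = 0
    for ch in prefix + token:
        depth += ch == "("
        depth -= ch == ")"
        if depth < 0:          # closed more than opened: infeasible
            return False
    return True

def model_proposals(prefix: str) -> list[str]:
    """Hypothetical LM stub: proposes tokens in random preference order."""
    return random.sample(["(", ")", "x", "+", ";"], k=5)

random.seed(0)
prefix = "f(x"
for _ in range(4):
    feasible = [t for t in model_proposals(prefix) if engine_allows(prefix, t)]
    prefix += feasible[0]      # take the highest-ranked feasible token
print(prefix)
```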
Language: English
Citations: 24