Let’s Talk With Developers, Not About Developers: A Review of Automatic Program Repair Research
Emily Winter, Vesna Nowack, David Bowes

et al.

IEEE Transactions on Software Engineering, Journal Year: 2022, Volume and Issue: 49(1), P. 419 - 436

Published: Feb. 16, 2022

Automatic program repair (APR) offers significant potential for automating some coding tasks. Using APR could reduce the high costs historically associated with fixing code faults and deliver benefits to software engineering. Adopting APR could also have profound implications for developers' daily activities, transforming their work practices. To realise these benefits, it is vital that we consider how developers feel about the impact APR may have on their work. Developing tools without considering developers is likely to undermine their successful deployment. In this paper, we critically review how developers are considered in APR research by analysing how human factors are treated in 260 studies from Monperrus's Living Review on APR. Over half of the studies were motivated by a problem faced by developers (e.g., the difficulty of fixing faults). Despite these human-oriented motivations, fewer than 7% included a study with developers. We looked in detail at these studies and found that their quality was mixed (for example, one study was based on input from only one developer). Our results suggest that developers are often talked about in APR studies, but are rarely talked with. A more comprehensive and reliable understanding of APR in relation to developers is needed. Without this understanding, it will be difficult to develop APR techniques that integrate effectively into developers' workflows. We recommend a future research agenda to advance this work.

Language: English

A syntax-guided edit decoder for neural program repair
Qihao Zhu, Zeyu Sun, Yuan-an Xiao

et al.

Published: Aug. 18, 2021

Automated Program Repair (APR) helps improve the efficiency of software development and maintenance. Recent APR techniques use deep learning, in particular the encoder-decoder architecture, to generate patches. Though existing DL-based approaches have proposed different encoder architectures, the decoder remains the standard one, which generates a sequence of tokens one by one to replace the faulty statement. This decoder has multiple limitations: 1) allowing syntactically incorrect programs, 2) inefficiently representing small edits, and 3) not being able to generate project-specific identifiers.
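
To make the contrast concrete, here is a minimal, hypothetical sketch (not the paper's actual decoder, and using Python's ast module as a stand-in for a Java parser): a token-by-token patch can stop mid-expression and fail to parse, while a small structured edit that is checked against the grammar stays syntactically valid.

```python
import ast

buggy_line = "total = price * quantity"        # hypothetical faulty statement
token_patch = "total = price * quantity +"     # token-by-token decoding can stop mid-expression
edit_patch = {"old": "quantity", "new": "quantity - discount"}   # a small structured edit

def is_syntactically_valid(stmt: str) -> bool:
    """Reject candidate statements that the parser cannot accept."""
    try:
        ast.parse(stmt)
        return True
    except SyntaxError:
        return False

def apply_edit(stmt: str, edit: dict) -> str:
    """Apply a small edit instead of regenerating the whole statement token by token."""
    return stmt.replace(edit["old"], edit["new"])

print(is_syntactically_valid(token_patch))              # False: not a valid program
candidate = apply_edit(buggy_line, edit_patch)
print(candidate, is_syntactically_valid(candidate))     # the edited statement still parses
```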

Language: English

Citations

124

Neural program repair with execution-based backpropagation
He Ye, Matías Martínez, Martin Monperrus

et al.

Proceedings of the 44th International Conference on Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 1506 - 1518

Published: May 21, 2022

Neural machine translation (NMT) architectures have achieved promising results for automatic program repair. Yet, they have the limitation of generating low-quality patches (e.g., patches that do not compile). This is because existing works only optimize a purely syntactic loss function based on characters and tokens, without incorporating program-specific information during neural network weight optimization. In this paper, we propose a novel program repair model called RewardRepair. The core novelty of RewardRepair is to improve NMT-based program repair with compilation and test execution information, rewarding the network to produce patches that compile and that do not overfit. We conduct several experiments to evaluate RewardRepair, showing that it is feasible and effective to use compilation and test execution information in the underlying model. RewardRepair correctly repairs 207 bugs over four benchmarks. We report repair success for 121 bugs that are fixed for the first time in the literature. Also, RewardRepair produces up to 45.3% compilable patches, an improvement over the 39% achieved by the state-of-the-art.
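
The following is a minimal sketch of the general idea of execution-based training signals, not the actual RewardRepair loss; all function names and weightings are illustrative. A purely syntactic loss is blended with a penalty derived from whether a candidate patch compiles and how many tests it passes.

```python
def syntactic_loss(predicted_tokens, reference_tokens):
    """Token-level disagreement, standing in for a cross-entropy term."""
    mismatches = sum(p != r for p, r in zip(predicted_tokens, reference_tokens))
    mismatches += abs(len(predicted_tokens) - len(reference_tokens))
    return mismatches / max(len(reference_tokens), 1)

def execution_reward(compiles: bool, tests_passed: int, tests_total: int) -> float:
    """Reward in [0, 1]: nothing for non-compilable patches, scaled by test results."""
    if not compiles:
        return 0.0
    return 0.5 + 0.5 * (tests_passed / max(tests_total, 1))

def combined_loss(predicted, reference, compiles, tests_passed, tests_total, alpha=0.5):
    """Blend the syntactic loss with a penalty derived from the execution reward."""
    reward = execution_reward(compiles, tests_passed, tests_total)
    return alpha * syntactic_loss(predicted, reference) + (1 - alpha) * (1.0 - reward)

# A compilable patch that passes all tests is penalised less than a
# syntactically closer patch that does not compile.
print(combined_loss(["return", "a", "+", "b"], ["return", "a", "-", "b"], True, 10, 10))
print(combined_loss(["return", "a", "-", "c"], ["return", "a", "-", "b"], False, 0, 10))
```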

Language: English

Citations

93

Less training, more repairing please: revisiting automated program repair via zero-shot learning
Chunqiu Steven Xia, Lingming Zhang

Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 959 - 971

Published: Nov. 7, 2022

Due to the promising future of Automated Program Repair (APR), researchers have proposed various APR techniques, including heuristic-based, template-based, and constraint-based techniques. Among such classic APR techniques, template-based techniques have been widely recognized as the state of the art. However, template-based techniques require predefined templates to perform repair, and their effectiveness is thus limited. To this end, researchers have leveraged recent advances in Deep Learning to further improve APR. Such learning-based techniques typically view APR as a Neural Machine Translation problem, using buggy/fixed code snippets as the source/target languages for translation. In this way, they heavily rely on large numbers of high-quality bug-fixing commits, which can be extremely costly or challenging to construct and may limit the edit variety and context representation of the resulting models.
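
As an illustration of why predefined templates limit effectiveness, here is a minimal, hypothetical sketch of a single "null-check insertion" template (not taken from any particular template-based tool): the template fires only on statements that match its shape, so bugs outside the catalogue cannot be repaired.

```python
import re

NULL_CHECK_TEMPLATE = "if ({var} != null) {{ {stmt} }}"   # hypothetical Java-style template

def apply_null_check_template(statement: str) -> str | None:
    """Wrap the first dereferenced receiver in a null check, if one exists."""
    match = re.match(r"\s*(\w+)\.", statement)
    if match is None:
        return None                      # template does not apply: no candidate patch
    return NULL_CHECK_TEMPLATE.format(var=match.group(1), stmt=statement.strip())

print(apply_null_check_template("user.getName();"))
# -> if (user != null) { user.getName(); }
print(apply_null_check_template("return 1 / count;"))
# -> None: a divide-by-zero bug is outside this template's reach
```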

Language: English

Citations

92

Large Language Models for Test-Free Fault Localization
Aidan Z. H. Yang, Claire Le Goues, Ruben Martins

et al.

Published: Feb. 6, 2024

Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line-level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMAO, a language-model-based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on larger models, with bug localization performance scaling consistently with LLM size. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and the Top-5 results by 14.4%-35.6%. LLMAO is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level.
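
A minimal sketch of the general idea, assuming frozen per-line representations from a code LLM and illustrative layer sizes (this is not the actual LLMAO architecture): a small bidirectional layer mixes left and right context across line embeddings, and a sigmoid head emits a per-line suspiciousness score, so no test coverage is required.

```python
import torch
import torch.nn as nn

class LineFaultScorer(nn.Module):
    def __init__(self, hidden_dim: int = 256, adapter_dim: int = 64):
        super().__init__()
        # Bidirectional layer over the sequence of line embeddings.
        self.bi_rnn = nn.GRU(hidden_dim, adapter_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * adapter_dim, 1)   # one suspiciousness logit per line

    def forward(self, line_embeddings: torch.Tensor) -> torch.Tensor:
        # line_embeddings: (batch, num_lines, hidden_dim), e.g. pooled LLM states per line.
        mixed, _ = self.bi_rnn(line_embeddings)
        return torch.sigmoid(self.head(mixed)).squeeze(-1)   # (batch, num_lines)

# Example with random stand-in embeddings for a 40-line file.
scorer = LineFaultScorer()
fake_embeddings = torch.randn(1, 40, 256)
scores = scorer(fake_embeddings)
top5 = torch.topk(scores[0], k=5).indices      # candidate buggy lines, no tests needed
print(top5.tolist())
```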

Language: English

Citations

22

On the efficiency of test suite based program repair
Kui Liu, Shangwen Wang, Anil Koyuncu

et al.

Published: June 27, 2020

Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation of program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers have pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation, with regard to the practicality of program repair. In this paper, we fill this gap in the literature by providing an extensive review of the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated with (1) the strategy for traversing the search space efficiently in order to select sensical repair attempts, (2) the strategy for minimizing the effort of identifying a plausible patch, and (3) how well the generation of the correct patch is prioritized. To that end, we perform a large-scale empirical study of the efficiency, in terms of quantity of generated patch candidates, of 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same configurations to limit biases.
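
A minimal sketch of the efficiency measurement described above, with illustrative names and a toy oracle: candidates are enumerated in order and validated against the test suite, and efficiency is reported as the number of candidates tried before the first plausible patch (one that passes all tests).

```python
from typing import Callable, Iterable

def candidates_until_plausible(candidates: Iterable[str],
                               passes_all_tests: Callable[[str], bool]) -> tuple[int, str | None]:
    """Return (number of candidates tried, first plausible patch or None)."""
    tried = 0
    for patch in candidates:
        tried += 1
        if passes_all_tests(patch):       # test execution dominates the repair cost
            return tried, patch
    return tried, None

# Toy search space: only the third candidate makes the (fake) test suite pass.
search_space = ["patch_a", "patch_b", "patch_c", "patch_d"]
oracle = lambda patch: patch == "patch_c"
tried, plausible = candidates_until_plausible(search_space, oracle)
print(f"{tried} candidates generated before a plausible patch: {plausible}")
```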

Language: English

Citations

100

Evaluating representation learning of code changes for predicting patch correctness in program repair
Haoye Tian, Kui Liu, Abdoul Kader Kaboré

et al.

Published: Dec. 21, 2020

A large body of the literature on automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. While the state of the art explores research directions that require dynamic information or that rely on manually-crafted heuristics, we study the benefit of learning code representations in order to learn deep features that may encode the properties of patch correctness. Our empirical work mainly investigates different representation learning approaches for code changes to derive embeddings that are amenable to similarity computations. We report findings based on embeddings produced by pre-trained and re-trained neural networks. Experimental results demonstrate the potential of embeddings to empower learning algorithms in reasoning about patch correctness: a machine learning predictor with BERT transformer-based embeddings associated with logistic regression yielded an AUC value of about 0.8 for the prediction of patch correctness on a deduplicated dataset of 1,000 labeled patches. Our investigations show that the learned representations can lead to reasonable performance when compared against the state-of-the-art PATCH-SIM, which relies on dynamic information. These representations may further be complementary to features that were carefully (manually) engineered in the literature.
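
A minimal sketch of this classification setup, using a bag-of-words vectorizer as a stand-in for a BERT-style code embedder and toy data throughout: embeddings of the buggy and patched code are combined into a feature vector and fed to a logistic regression that predicts patch correctness.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labelled patches: (buggy snippet, patched snippet, is_correct).
patches = [
    ("return a + b", "return a - b", 1),
    ("return a + b", "return 0", 0),
    ("if x > 0: f()", "if x >= 0: f()", 1),
    ("if x > 0: f()", "pass", 0),
]

vectorizer = CountVectorizer(token_pattern=r"\S+")
vectorizer.fit([b for b, _, _ in patches] + [p for _, p, _ in patches])

def features(buggy: str, patched: str) -> np.ndarray:
    """Concatenate both embeddings with their difference, as a similarity-aware feature."""
    eb = vectorizer.transform([buggy]).toarray()[0]
    ep = vectorizer.transform([patched]).toarray()[0]
    return np.concatenate([eb, ep, eb - ep])

X = np.array([features(b, p) for b, p, _ in patches])
y = np.array([label for _, _, label in patches])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(features("return a + b", "return b - a").reshape(1, -1)))
```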

Language: English

Citations

72

Context-Aware Code Change Embedding for Better Patch Correctness Assessment
Bo Lin, Shangwen Wang, Ming Wen

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2022, Volume and Issue: 31(3), P. 1 - 29

Published: May 18, 2022

Despite their capability of successfully fixing more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests may actually be incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that need to execute tests) are time-consuming, while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding techniques have been proposed recently, which assess patch correctness via embedding token sequences extracted from the changed code of a generated patch. However, these techniques rarely consider the context information and program structures of a generated patch, which are crucial for patch correctness assessment as revealed by existing studies. In this study, we explore the idea of context-aware code change embedding that considers program structures for patch correctness assessment. Specifically, given a patch, we not only focus on the changed code but also take the correlated unchanged part into consideration, through which the context information can be extracted and leveraged. We then utilize the AST path technique for representation, where the structure information from the AST node can be captured. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., it relatively outperforms them by ≈6%, 3%, and 16%, respectively, under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones, including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information leveraged by Cache contributes significantly to its outstanding performance.
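
A minimal sketch of the structure-aware representation idea, not the actual Cache pipeline; Python's ast module stands in for a Java parser and the features are deliberately coarse. The changed statement is parsed together with a line of unchanged context, and AST node-type paths are collected as structural features.

```python
import ast

def ast_paths(source: str) -> list[str]:
    """Collect root-to-leaf paths of AST node types as coarse structural features."""
    tree = ast.parse(source)
    paths = []

    def walk(node, prefix):
        path = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append("/".join(path))
        for child in children:
            walk(child, path)

    walk(tree, [])
    return paths

# Changed line plus one line of unchanged context, so the structure around the edit is kept.
patched_with_context = "if count != 0:\n    ratio = total / count"
for p in ast_paths(patched_with_context)[:5]:
    print(p)
```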

Language: English

Citations

49

A Survey of Learning-based Automated Program Repair
Quanjun Zhang, Chunrong Fang, Yuxiang Ma

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2023, Volume and Issue: 33(2), P. 1 - 69

Published: Nov. 6, 2023

Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed to leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., the source language) are translated into fixed code snippets (i.e., the target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this article, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss widely adopted datasets and evaluation metrics and outline existing empirical studies. We also discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our article can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at the repository: https://github.com/iSEngLab/AwesomeLearningAPR .
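
A minimal skeleton of the general workflow described above, with every stage reduced to an illustrative stub: localize suspicious lines, generate candidate patches with a learned model, rank them, validate them against the test suite, and assess the correctness of plausible patches.

```python
def fault_localization(program: str) -> list[int]:
    return [3]                                   # stub: suspicious line numbers

def patch_generation(program: str, line: int) -> list[str]:
    return ["candidate_patch_1", "candidate_patch_2"]   # stub: model-generated candidates

def patch_ranking(candidates: list[str]) -> list[str]:
    return sorted(candidates)                    # stub: e.g. rank by model likelihood

def patch_validation(patch: str) -> bool:
    return patch == "candidate_patch_1"          # stub: run the test suite

def correctness_assessment(patch: str) -> bool:
    return True                                  # stub: static/dynamic overfitting check

def repair(program: str) -> str | None:
    for line in fault_localization(program):
        for patch in patch_ranking(patch_generation(program, line)):
            if patch_validation(patch) and correctness_assessment(patch):
                return patch                     # plausible and judged correct
    return None

print(repair("buggy_program_source"))
```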

Language: English

Citations

38

Automated patch correctness assessment
Shangwen Wang, Ming Wen, Bo Lin

et al.

Published: Dec. 21, 2020

Test-based automated program repair (APR) has attracted huge attention from both industry and academia. Despite the significant progress made in recent studies, the overfitting problem (i.e., the generated patch is plausible but overfitting) is still a major and long-standing challenge. Therefore, plenty of techniques have been proposed to assess the correctness of patches, either during the patch generation phase or during the evaluation of APR techniques. However, the effectiveness of existing techniques has not been systematically compared, and little is known about their advantages and disadvantages. To fill this gap, we performed a large-scale empirical study in this paper. Specifically, we investigated existing automated patch correctness assessment techniques, including both static and dynamic ones, based on 902 patches automatically generated by 21 APR tools from 4 different categories. Our study revealed the following findings: (1) static code features with respect to syntax and semantics are generally effective in differentiating overfitting patches from correct ones; (2) dynamic techniques can achieve high precision while static heuristics are more effective towards recall; (3) patches generated for certain projects and bug types are harder to assess than others; and (4) existing techniques are highly complementary to each other. For instance, a single technique can only detect at most 53.5% of the overfitting patches, while 93.3% of them can be detected by at least one technique when oracle information is available. Based on our findings, we designed an integration strategy that first integrates static code features via learning and then combines the result with other techniques via a majority voting strategy. Our experiments show that this strategy can enhance the performance of patch correctness assessment significantly.
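
A minimal sketch of the reported integration strategy, with all techniques reduced to illustrative stubs: a learned classifier over static code features is combined with other assessment techniques through majority voting.

```python
def learned_static_classifier(patch: dict) -> bool:
    # Stub for a classifier trained on syntactic/semantic features of the patch.
    return patch.get("static_score", 0.0) > 0.5

def dynamic_technique(patch: dict) -> bool:
    return patch.get("behaviour_similar", False)   # stub, e.g. a PATCH-SIM-style check

def heuristic_technique(patch: dict) -> bool:
    return not patch.get("deletes_code", False)    # stub: anti-pattern heuristic

def majority_vote(patch: dict) -> bool:
    votes = [learned_static_classifier(patch),
             dynamic_technique(patch),
             heuristic_technique(patch)]
    return sum(votes) >= 2                         # label the patch correct if most agree

candidate = {"static_score": 0.8, "behaviour_similar": False, "deletes_code": False}
print(majority_vote(candidate))                    # True: two of three techniques agree
```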

Language: English

Citations

69

Automated Classification of Overfitting Patches With Statically Extracted Code Features
He Ye, Jian Gu, Matías Martínez

et al.

IEEE Transactions on Software Engineering, Journal Year: 2021, Volume and Issue: 48(8), P. 2920 - 2938

Published: April 9, 2021

Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, i.e., patches that fail to correctly repair the defect beyond making the tests pass. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level, for the single programming language Java. ODS then uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned classification model can finally be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from the Defects4J, Bugs.jar and Bears benchmarks. The empirical evaluation shows that ODS is able to correctly classify 71.9 percent of the patches from 26 projects, which improves on the state-of-the-art. ODS is applicable in practice and can be employed as a post-processing procedure to classify the patches generated by different APR systems.
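
A minimal sketch of this kind of setup, with illustrative features and toy data rather than the paper's actual feature set: simple static features are extracted by comparing the buggy and patched code, and a supervised probabilistic model trained on labelled patches classifies new, unseen ones.

```python
import difflib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def static_features(buggy: str, patched: str) -> list[float]:
    """Coarse static features derived from a textual diff of the two programs."""
    diff = list(difflib.unified_diff(buggy.splitlines(), patched.splitlines(), lineterm=""))
    added = sum(1 for l in diff if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in diff if l.startswith("-") and not l.startswith("---"))
    touches_condition = float(any("if" in l for l in diff))
    return [added, removed, touches_condition]

# Toy labelled training patches: 1 = correct, 0 = overfitting.
train = [
    ("if a > b: return a", "if a >= b: return a", 1),
    ("if a > b: return a", "return 0", 0),
    ("x = y / z", "x = y / z if z else 0", 1),
    ("x = y / z", "x = 0", 0),
]
X = np.array([static_features(b, p) for b, p, _ in train])
y = np.array([label for _, _, label in train])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
unseen = static_features("return items[0]", "return items[0] if items else None")
print(model.predict_proba(np.array([unseen])))    # probability of being a correct patch
```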

Language: English

Citations

54