Patch correctness assessment in automated program repair based on the impact of patches on production and test code DOI Open Access
Ali Ghanbari, Andrian Marcus

Published: July 15, 2022

Test-based generate-and-validate automated program repair (APR) systems often generate many patches that pass the test suite without fixing the bug. The generated patches must be manually inspected by developers, so previous research has proposed various techniques for automatic correctness assessment of APR-generated patches. Among them, dynamic patch correctness assessment techniques rely on the assumption that, when running the originally passing test cases, a correct patch will not alter program behavior in a significant way, e.g., by removing code implementing the functionality of the program. In this paper, we propose and evaluate a novel patch correctness assessment technique, named Shibboleth, for test-based APR systems. Unlike existing works, the impact of patches is captured along three complementary facets, allowing more effective assessment. Specifically, we measure the impact of patches on both production code (via syntactic and semantic similarity) and on test code (via coverage of passing tests) to separate the patches that result in similar programs and that do not delete desired program elements. Shibboleth assesses patches via both ranking and classification. We evaluated the technique on 1,871 patches generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking and classification techniques. Specifically, in our data set, in 43% (66%) of the cases, Shibboleth ranks the correct patch in the top-1 (top-2) positions, and in classification mode it achieves an accuracy and F1-score of 0.887 and 0.852, respectively.
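The multi-facet idea in the abstract above can be sketched as a ranking over per-patch facet scores. This is a hypothetical illustration, not the paper's actual scoring: the facet values, the equal-weight averaging, and the patch names are all invented for the example.

```python
# Hypothetical sketch of multi-facet patch ranking: each candidate patch is
# scored along three facets (syntactic similarity, semantic similarity, and
# retained test coverage), and patches are ranked by a combined score.
# The facet values and the equal weighting are illustrative only.
from dataclasses import dataclass


@dataclass
class PatchFacets:
    name: str
    syntactic_sim: float   # token-level similarity to the original production code
    semantic_sim: float    # behavioral similarity on originally passing tests
    coverage_ratio: float  # fraction of originally covered code still exercised


def score(p: PatchFacets) -> float:
    # Intuition from the abstract: a correct patch stays close to the original
    # program and does not remove covered (i.e., desired) functionality.
    return (p.syntactic_sim + p.semantic_sim + p.coverage_ratio) / 3


def rank(patches: list[PatchFacets]) -> list[PatchFacets]:
    return sorted(patches, key=score, reverse=True)


candidates = [
    PatchFacets("deletes-functionality", 0.95, 0.40, 0.55),
    PatchFacets("plausible-but-overfit", 0.70, 0.75, 0.90),
    PatchFacets("correct-fix",           0.92, 0.93, 0.98),
]
for p in rank(candidates):
    print(f"{p.name}: {score(p):.3f}")
```

In this toy setup the functionality-deleting patch scores well syntactically but is penalized on the semantic and coverage facets, which is the separation the three facets are meant to provide.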

Language: English

Evaluating Automatic Program Repair Capabilities to Repair API Misuses DOI
Maria Kechagia, Sergey Mechtaev, Federica Sarro

et al.

IEEE Transactions on Software Engineering, Journal Year: 2021, Volume and Issue: 48(7), P. 2658 - 2679

Published: March 18, 2021

API misuses are well-known causes of software crashes and security vulnerabilities. However, their detection and repair are challenging, given that the correct usages of (third-party) APIs might be obscure to the developers of client programs. This paper presents the first empirical study to assess the ability of existing automated bug repair tools to repair API misuses, which are a class of bugs previously unexplored. Our study examines and compares 14 Java test-suite-based repair tools (11 proposed before 2018, and three afterwards) on a manually curated benchmark (APIRepBench) consisting of 101 API misuses. We develop an extensible execution framework (APIARTy) to automatically execute multiple repair tools. Our results show that the tools are able to generate patches for 28 percent of the API misuses considered. While the 11 less recent tools are generally fast (the median time of repair attempts is 3.87 minutes and the mean is 30.79 minutes), the most recent ones are less efficient (i.e., 98 percent slower) than their predecessors. The API misuses for which the tools generate patches mostly belong to the categories of missing null check, missing value, missing exception, and missing call. Most of the patches generated by all tools are plausible (65 percent), but only a few of these are semantically equivalent to the human-written patches (25 percent). Our findings suggest that the design of future APR tools should support the localisation of complex bugs, include different handling of timeout issues, and ease configuration on large projects. Both APIRepBench and APIARTy have been made publicly available for other researchers to evaluate the capabilities of APR tools in detecting and fixing API misuses.

Language: English

Citations

27

Show Me Why It’s Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison DOI Open Access
Ruixin Wang, Zhongkai Zhao, Le Fang

et al.

Proceedings of the ACM on Programming Languages, Journal Year: 2025, Volume and Issue: 9(OOPSLA1), P. 1831 - 1857

Published: April 9, 2025

Automated Program Repair (APR) holds the promise of alleviating the burden of debugging and fixing software bugs. Despite this, developers still need to manually inspect each patch to confirm its correctness, which is tedious and time-consuming. This challenge is exacerbated in the presence of plausible patches, which accidentally pass the test cases but may not correctly fix the bug. To address this challenge, we propose an interactive approach called iFix to facilitate patch understanding and comparison based on their runtime difference. iFix performs static analysis to identify runtime variables related to the buggy statement and captures their values during execution for each patch. These values are then aligned across different patch candidates, allowing users to compare and contrast their runtime behavior. To evaluate iFix, we conducted a within-subjects user study with 28 participants. Compared with manual inspection and a state-of-the-art patch filtering technique, iFix reduced participants' task completion time by 36% and 33% while also improving their confidence by 50% and 20%, respectively. Besides, quantitative experiments demonstrate that iFix improves the ranking of correct patches by at least 39% compared with other methods and is generalizable to different APR tools.
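The alignment step described above (capture variable values per patch, align them by variable, surface divergences) can be illustrated with a minimal sketch. Everything here is invented for illustration: the variable names, the recorded traces, and the helper names `align_traces` and `divergences` are not from the paper.

```python
# Illustrative sketch of runtime comparison across patch candidates: record
# the values of variables related to the buggy statement under each candidate
# patch, align them by variable name, and report where candidates diverge.
def align_traces(traces: dict[str, dict[str, object]]) -> dict[str, dict[str, object]]:
    """Regroup per-patch variable snapshots into per-variable views."""
    aligned: dict[str, dict[str, object]] = {}
    for patch, snapshot in traces.items():
        for var, value in snapshot.items():
            aligned.setdefault(var, {})[patch] = value
    return aligned


def divergences(aligned: dict[str, dict[str, object]]) -> dict[str, dict[str, object]]:
    """Keep only variables whose values differ across patch candidates."""
    return {var: vals for var, vals in aligned.items()
            if len(set(map(repr, vals.values()))) > 1}


traces = {
    "patch_1": {"index": 3, "total": 10},
    "patch_2": {"index": 3, "total": 12},
}
print(divergences(align_traces(traces)))  # only 'total' differs
```

Showing only the diverging variables is what makes such a comparison digestible: agreeing values carry no signal for deciding between candidates.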

Language: English

Citations

0

Neural Program Repair : Systems, Challenges and Solutions DOI
Wenkang Zhong, Chuanyi Li, Jidong Ge

et al.

Published: June 11, 2022

Automated Program Repair (APR) aims to automatically fix bugs in source code. Recently, with advances in the Deep Learning (DL) field, there is a rise of Neural Program Repair (NPR) studies, which formulate APR as a translation task from buggy code to correct code and adopt neural networks based on the encoder-decoder architecture. Compared with other APR techniques, NPR approaches have a great advantage in applicability because they do not need any specification (i.e., a test suite). Although NPR has been a hot research direction, there is no overview of this field yet. In order to help interested readers understand the architectures, challenges, and corresponding solutions of existing NPR systems, we conduct a literature review of the latest studies in this paper. We begin by introducing the background knowledge of the field. Next, to be easily understandable, we decompose the NPR procedure into a series of modules and explicate the various design choices in each module. Furthermore, we identify several challenges and discuss the effect of existing solutions. Finally, we conclude and provide some promising directions for future research.

Language: English

Citations

15

Automated patch assessment for program repair at scale DOI Creative Commons
He Ye, Matías Martínez, Martin Monperrus

et al.

Empirical Software Engineering, Journal Year: 2021, Volume and Issue: 26(2)

Published: Feb. 23, 2021

In this paper, we do automatic correctness assessment for patches generated by program repair systems. We consider the human-written patch as the ground truth oracle and randomly generate tests based on it, a technique proposed by Shamshiri et al. that we call Random testing with Ground Truth (RGT) in this paper. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems, and we evaluate automated patch assessment on this dataset. The results of this study are novel and significant: First, we improve the state-of-the-art performance of RGT by 190% by improving the ground truth oracle; Second, we show that RGT is reliable enough to help scientists perform overfitting analysis when they evaluate repair systems; Third, the external validity of the resulting knowledge is the largest ever.
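The RGT idea above reduces to differential testing against a ground-truth oracle: generate random inputs, compute expected outputs with the human-written patch, and flag an APR patch that disagrees. The sketch below is a toy stand-in, assuming single-integer inputs and invented `human_patch`/`apr_patch` functions; real RGT generates tests against Java programs.

```python
# Minimal sketch of RGT-style assessment: the human-written patch serves as
# the ground-truth oracle, and an APR-generated patch is flagged as likely
# overfitting if it disagrees with the oracle on any randomly generated input.
import random


def human_patch(x: int) -> int:
    """Ground-truth fix (toy example)."""
    return abs(x)


def apr_patch(x: int) -> int:
    """Plausible but overfitting fix: wrong on negative inputs."""
    return x if x >= 0 else 0


def rgt_assess(candidate, oracle, trials: int = 1000, seed: int = 0) -> bool:
    """Return True if the candidate matches the oracle on all random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-100, 100)
        if candidate(x) != oracle(x):
            return False  # behavioral difference found: likely overfitting
    return True


print(rgt_assess(apr_patch, human_patch))    # expected to be flagged (False)
print(rgt_assess(human_patch, human_patch))  # trivially consistent (True)
```

Note the asymmetry in guarantees: a disagreement is a proof of incorrectness, while agreement on all sampled inputs only raises confidence, which is why the quality of the generated tests (and of the oracle) drives RGT's performance.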

Language: English

Citations

19
