Software Quality Journal, Journal Year: 2025, Volume and Issue: 33(1)
Published: Jan. 25, 2025
Language: English
Published: May 1, 2023
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained using billions of text/code tokens, can potentially avoid this issue. Very recently, researchers have leveraged LLMs for APR without relying on any bug-fixing datasets. Meanwhile, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B in size. We designed 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patch function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare them in the number of bugs fixed, generation speed, and compilation rate. We also compare them against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art LLMs can already substantially outperform all of our studied APR tools. Among the studied LLMs, a scaling effect exists where larger models tend to achieve better performance. Also, we show for the first time that suffix code after the buggy line (adopted in infilling-style APR) helps in not only generating more fixes but also producing patches with higher compilation rates. Besides patch generation, the LLMs consider correct patches to be more natural than other ones, and can even be leveraged for effective patch ranking and patch correctness checking. Lastly, LLM-based APR can be further boosted via: 1) increasing the sample size and 2) incorporating fix template information.
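As a concrete illustration of the three repair settings described above, the following minimal sketch builds the corresponding prompts for a toy buggy function. It is an assumed, simplified rendering: `generate` is a hypothetical placeholder for any LLM completion endpoint, and real infilling models use model-specific sentinel tokens rather than the `<INFILL>` marker used here.

```python
# Minimal sketch of the three repair settings from the study above.
# `generate` is a hypothetical stand-in for any LLM completion endpoint;
# real infilling models use model-specific sentinel tokens, not <INFILL>.

BUGGY_FUNCTION = '''\
def middle(a, b, c):
    if a < b < c or c < b < a:
        return b
    if b < a < c or c < a < b:
        return b   # bug: should return a
    return c
'''

PREFIX, BUGGY_LINE, SUFFIX = BUGGY_FUNCTION.partition(
    "        return b   # bug: should return a\n")

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion endpoint."""
    raise NotImplementedError

# Setting 1: regenerate the entire patched function from a hint.
full_function_prompt = "# Fix the bug in the following function.\n" + BUGGY_FUNCTION

# Setting 2: infilling -- the model fills a chunk between prefix and suffix.
infill_prompt = PREFIX + "<INFILL>" + SUFFIX

# Setting 3: single-line repair -- the model continues the prefix with
# exactly one replacement line.
single_line_prompt = PREFIX

for p in (full_function_prompt, infill_prompt, single_line_prompt):
    print(p, end="\n---\n")
```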
Language: English
Citations: 148

Published: May 1, 2023
Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs' fixing capabilities and to fine-tune CLMs for the APR task. Firstly, this work is the first to evaluate ten CLMs on four APR benchmarks, which shows that, surprisingly, the best CLM, as is, fixes 72% more bugs than state-of-the-art deep-learning (DL)-based APR techniques. Secondly, one of the four benchmarks was created by us in this paper to avoid data leaking and ensure a fair evaluation. Thirdly, it is the first work to fine-tune CLMs with APR training data, which shows that fine-tuning brings a 31%-1,267% improvement and enables them to fix 46%-164% more bugs than existing DL-based APR techniques. Fourthly, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of buggy lines to fix bugs, yet fine-tuned CLMs could potentially over-rely on buggy lines. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs. This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs, and also raises awareness of fair and comprehensive evaluations of CLMs, calling for more transparent reporting of the open-source repositories used in pre-training to address the data-leaking problem.
Language: English
Citations: 82

Published: May 1, 2023
Large language models trained on massive code corpora can generalize to new tasks without the need for task-specific fine-tuning. In few-shot learning, these models take as input a prompt, composed of natural language instructions, a few instances of task demonstration, and a query, and generate an output. However, the creation of an effective prompt for code-related tasks in few-shot learning has received little attention. We present a technique for prompt creation that automatically retrieves code demonstrations similar to the developer task, based on embedding or frequency analysis. We apply our approach, Cedar, to two different programming languages, statically and dynamically typed, and two different tasks, namely, test assertion generation and program repair. For each task, we compare Cedar with state-of-the-art task-specific and fine-tuned models. The empirical results show that, with only a few relevant demonstrations, Cedar is effective in terms of accuracy, achieving 76% and 52% exact matches for assertion generation and program repair, respectively. For assertion generation, Cedar outperforms existing task-specific models by 333% and fine-tuned models by 11%; for program repair, Cedar yields 189% better accuracy than task-specific models and is competitive with recent fine-tuned models. These findings have practical implications for practitioners, as Cedar could potentially be applied to multilingual and multitask settings without language-specific training, with minimal examples and effort.
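The retrieval step described above can be illustrated with frequency analysis, one of the two similarity options the abstract mentions. The sketch below is a simplified stand-in, not Cedar's actual implementation: it ranks a hypothetical demonstration pool by cosine similarity over token counts and assembles a few-shot prompt.

```python
# Sketch of retrieval-based demonstration selection via frequency analysis,
# a simplified stand-in for Cedar's retrieval step.
import math
from collections import Counter

def tokenize(code: str) -> Counter:
    # Crude lexical tokenization; a real system would use a code-aware lexer.
    return Counter(code.replace("(", " ").replace(")", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_demonstrations(query: str, pool: list[tuple[str, str]], k: int = 2):
    """Return the k (buggy, fixed) pairs most similar to the query."""
    q = tokenize(query)
    return sorted(pool, key=lambda d: cosine(q, tokenize(d[0])), reverse=True)[:k]

# Hypothetical demonstration pool of (buggy, fixed) pairs.
pool = [
    ("if (x = 0) return;", "if (x == 0) return;"),
    ("for (i = 0; i <= n; i++) sum += a[i];", "for (i = 0; i < n; i++) sum += a[i];"),
    ("while (p != null) p = p.next", "while (p != null) { p = p.next; }"),
]
query = "if (count = 0) reset();"

prompt = "".join(f"### Buggy:\n{b}\n### Fixed:\n{f}\n\n"
                 for b, f in top_k_demonstrations(query, pool))
prompt += f"### Buggy:\n{query}\n### Fixed:\n"
print(prompt)
```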
Language: English
Citations: 69

Entropy, Journal Year: 2023, Volume and Issue: 25(6), P. 888 - 888
Published: June 1, 2023
This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include GitHub Copilot powered by OpenAI’s Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their use in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques and software naturalness into these applications, with a discussion on extending such capabilities to Apple’s Xcode for mobile software development. The paper also discusses empowering developers with advanced coding assistance and streamlining the development process.
Language: English
Citations: 51

ACM Transactions on Software Engineering and Methodology, Journal Year: 2023, Volume and Issue: 33(2), P. 1 - 69
Published: Nov. 6, 2023
Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing number of APR techniques have been proposed to leverage neural networks to learn bug-fixing patterns from massive open-source code repositories. Such learning-based techniques usually treat APR as a neural machine translation (NMT) task, where buggy code snippets (i.e., the source language) are translated into fixed code snippets (i.e., the target language) automatically. Benefiting from the powerful capability of DL to learn hidden relationships from previous bug-fixing datasets, learning-based APR techniques have achieved remarkable performance. In this article, we provide a systematic survey to summarize the current state-of-the-art research in the learning-based APR community. We illustrate the general workflow of learning-based APR techniques and detail the crucial components, including the fault localization, patch generation, patch ranking, patch validation, and patch correctness phases. We then discuss widely adopted datasets and evaluation metrics and outline existing empirical studies. We discuss several critical aspects of learning-based APR techniques, such as repair domains, industrial deployment, and the open science issue. We highlight practical guidelines on applying DL techniques for future APR studies, such as exploring explainable patch generation and utilizing code features. Overall, our article can help researchers gain a comprehensive understanding of the achievements of existing learning-based APR techniques and promote the practical application of these techniques. Our artifacts are publicly available at the repository: https://github.com/iSEngLab/AwesomeLearningAPR .
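The NMT framing summarized above amounts to training a sequence-to-sequence model on pairs of buggy and fixed code. The sketch below shows only how such a pair might be serialized for training; the `<BUG_START>`/`<BUG_END>` markers and the helper are illustrative assumptions, not the format of any specific APR system.

```python
# Sketch of the NMT framing of APR: buggy code as the "source language",
# fixed code as the "target language". The markers below are illustrative
# assumptions, not the serialization of any particular APR system.

def make_example(context_before: str, buggy_hunk: str,
                 context_after: str, fixed_hunk: str) -> dict:
    """Serialize one bug-fix pair for seq2seq training."""
    source = (context_before
              + " <BUG_START> " + buggy_hunk + " <BUG_END> "
              + context_after)
    return {"source": source, "target": fixed_hunk}

example = make_example(
    context_before="int mid(int a, int b) {",
    buggy_hunk="return (a + b) / 2;",           # may overflow
    context_after="}",
    fixed_hunk="return a + (b - a) / 2;",
)
print(example["source"])
print(example["target"])
```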
Language: English
Citations: 38

Published: July 12, 2023
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) to automatically fix software bugs.
Language: English
Citations: 31

2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), Journal Year: 2023, Volume and Issue: unknown, P. 1162 - 1174
Published: Sept. 11, 2023
The advent of large language models (LLMs) has opened up new opportunities for automated program repair (APR). In particular, some recent studies have explored how to leverage large language models of code (LLMCs) for repair tasks and show promising results. However, most of them adopt the zero/few-shot learning paradigm for APR, which directly uses LLMCs to generate the possibly correct code given its surrounding context. Though effective, the repair capabilities of LLMCs based on the fine-tuning paradigm have yet to be extensively explored. Also, it remains unknown whether LLMCs have the potential to repair more complicated bugs (e.g., multi-hunk bugs). To fill the gap, in this work, we conduct a comprehensive study on the repair capability of LLMCs in the fine-tuning paradigm. We select 5 popular LLMCs with representative pre-training architectures, including CodeBERT, GraphCodeBERT, PLBART, CodeT5, and UniXcoder. We consider 3 typical program repair scenarios (i.e., bugs, vulnerabilities, and errors) involving 3 programming languages (i.e., Java, ...
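As a rough sketch of the fine-tuning paradigm this study investigates, the following uses CodeT5 (one of the five studied models) through Hugging Face Transformers on a single illustrative bug-fix pair; the pair, the learning rate, and the single-step loop are assumptions, and a real setup would batch over a full bug-fixing dataset.

```python
# Minimal sketch of fine-tuning a code LM (here CodeT5, one of the five
# studied architectures) on one illustrative bug-fix pair.
# Requires: pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)  # assumed hyperparameter

buggy = "public int max(int a, int b) { if (a > b) return b; return b; }"
fixed = "public int max(int a, int b) { if (a > b) return a; return b; }"

inputs = tok(buggy, return_tensors="pt")
labels = tok(fixed, return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy
loss.backward()
optim.step()
optim.zero_grad()
print(f"training loss: {loss.item():.4f}")
```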
Language: English
Citations: 29

Published: Nov. 30, 2023
Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed the surge of deep learning (DL) based approaches that automate the repair process in a data-driven manner. However, their performance is often limited by a fixed set of parameters used to model the highly complex search space of APR.
Language: English
Citations: 29

Published: May 1, 2023
Automated Program Repair (APR) improves software reliability by generating patches for a buggy program automatically. Recent APR techniques leverage deep learning (DL) to build models that learn to generate patches from existing patches and code corpora. While promising, DL-based APR techniques suffer from the abundant syntactically or semantically incorrect patches in the patch space. These patches often disobey the syntactic and semantic domain knowledge of source code and thus cannot be correct patches that fix a bug. We propose a DL-based APR approach, KNOD, which incorporates domain knowledge to guide patch generation in a direct and comprehensive way. KNOD has two major novelties: (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to their inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases. We evaluate KNOD on three widely-used benchmarks. KNOD fixes 72 bugs on Defects4J v1.2, 25 bugs on QuixBugs, and 50 additional bugs on the Defects4J v2.0 benchmarks, outperforming all existing APR tools.
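KNOD injects syntactic and semantic rules into the decoder itself; as a much weaker but concrete illustration of the same idea, the sketch below only post-filters candidate patches that violate the language's syntax, using Python's `ast` module as an assumed stand-in checker.

```python
# Simplified illustration of domain knowledge in patch generation: reject
# candidates that violate the syntax of the language. KNOD goes further and
# enforces such rules *during* tree decoding; this sketch only post-filters.
import ast

def syntactically_valid(patch: str) -> bool:
    """Accept a candidate patch only if it parses to a well-formed AST."""
    try:
        ast.parse(patch)
        return True
    except SyntaxError:
        return False

# Hypothetical candidate patches for a buggy function.
candidates = [
    "def is_even(n):\n    return n % 2 == 0",  # valid and correct
    "def is_even(n):\n    return n % 2 = 0",   # invalid: '=' in expression
    "def is_even(n)\n    return n % 2 == 0",   # invalid: missing ':'
]

valid = [c for c in candidates if syntactically_valid(c)]
print(f"{len(valid)} of {len(candidates)} candidates pass the syntax check")
```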
Language: English
Citations: 26

Published: Nov. 30, 2023
During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling human writing of programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.
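The LLM/completion-engine interaction described above can be caricatured as a decoding loop in which an engine vetoes infeasible tokens (only the pruning half of Repilot; proactive completion is omitted). The toy "engine" below merely enforces balanced parentheses, and the `model_proposals` stub stands in for a real LLM; both are assumptions for illustration, not Repilot's implementation.

```python
# Toy sketch of completion-engine-guided decoding: at each step, tokens
# proposed by the model are pruned if the "engine" deems them infeasible.
# Here the engine only enforces balanced parentheses -- a stand-in for a
# real language-aware completion engine.
import random

def engine_allows(prefix: str, token: str) -> bool:
    """Reject tokens that close more parentheses than were opened."""
    depth = 0
    for ch in prefix + token:
        depth += ch == "("
        depth -= ch == ")"
        if depth < 0:          # closed more than opened: infeasible
            return False
    return True

def model_proposals(prefix: str) -> list[str]:
    """Hypothetical LM stub: proposes tokens in random preference order."""
    return random.sample(["(", ")", "x", "+", ";"], k=5)

random.seed(0)
prefix = "f(x"
for _ in range(4):
    feasible = [t for t in model_proposals(prefix) if engine_allows(prefix, t)]
    prefix += feasible[0]      # take the highest-ranked feasible token
print(prefix)
```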
Language: English
Citations: 24