An Automated and Flexible Multilingual Bug-Fix Dataset Construction System
Wenkang Zhong, Chuanyi Li, Yunfei Zhang

et al.

2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Journal Year: 2023, Volume and Issue: unknown, P. 1881 - 1886

Published: Sept. 11, 2023

Developing effective data-driven automated bug-fixing approaches relies heavily on large bug-fix datasets. However, the granularity of current repository-mined datasets is usually at the function level, without meta-information such as fault type. To address the open challenge of precisely mining code snippets with bugs, together with their fixes, locations, and fault types, from source repositories, this paper proposes a flexible, extensible, multilingual dataset construction system, the Multilingual Bug-Fix Constructor (MBFC). Furthermore, we release a large-scale, fine-grained Multilingual Bug-Fix dataset (M-BF) automatically built using the proposed system; its initial version includes a total of 921,825 bug-fix pairs from 442,164 different open-source software projects, covering commits from January 2020 up to a September cutoff. We expect that our system can benefit the development of innovative and practical program repair methods, thereby improving the efficiency of debugging and code review processes.
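To make the mining step concrete, here is a minimal Python sketch of how a system like MBFC could harvest (buggy, fixed) file pairs from fix-like commits. The keyword heuristic, the GitPython dependency, and the Java-only filter are illustrative assumptions, not MBFC's actual implementation.

```python
from git import Repo  # GitPython; an assumed dependency, not MBFC's actual stack

FIX_KEYWORDS = ("fix", "bug", "patch", "repair")  # illustrative heuristic

def mine_bug_fix_pairs(repo_path):
    """Yield (buggy_source, fixed_source, file_path) for fix-like commits."""
    repo = Repo(repo_path)
    for commit in repo.iter_commits():
        if not any(kw in commit.message.lower() for kw in FIX_KEYWORDS):
            continue
        for parent in commit.parents:
            # Diff the fix commit against its parent to get before/after blobs.
            for diff in parent.diff(commit):
                if diff.a_blob and diff.b_blob and diff.a_path.endswith(".java"):
                    buggy = diff.a_blob.data_stream.read().decode("utf-8", "replace")
                    fixed = diff.b_blob.data_stream.read().decode("utf-8", "replace")
                    yield buggy, fixed, diff.a_path

if __name__ == "__main__":
    for buggy, fixed, path in mine_bug_fix_pairs("."):
        print(f"candidate bug-fix pair in {path}: {len(buggy)} -> {len(fixed)} bytes")
```

A real pipeline would additionally localize the change to a statement or expression and attach the fault type, which is the fine-grained meta-information the abstract highlights.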

Language: English

(Security) Assertions by Large Language Models
Rahul Kande, Hammond Pearce, Benjamin Tan

et al.

IEEE Transactions on Information Forensics and Security, Journal Year: 2024, Volume and Issue: 19, P. 4374 - 4389

Published: Jan. 1, 2024

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications for a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.
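To picture the prompting setup the abstract describes, here is a minimal Python sketch: the design intent is written as a SystemVerilog comment, and the model is asked to complete the assertion. The `complete()` helper is a hypothetical stand-in for an actual LLM client (mocked here with a canned reply), and the lock-register design is invented for illustration.

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM completion call; mocked with a canned continuation."""
    return " lock |-> !wr_en);"

# Natural-language intent embedded as a SystemVerilog comment, in the style
# the paper's prompts use; signals (lock, wr_en) are invented for this example.
PROMPT = """\
module lock_reg(input logic clk, input logic lock, input logic wr_en);
// Assert that the register is never written while the lock bit is set.
assert property (@(posedge clk)"""

suggestion = PROMPT.rsplit("\n", 1)[-1] + complete(PROMPT)
# A plausible golden reference the generated assertion would be checked against:
GOLDEN = "assert property (@(posedge clk) lock |-> !wr_en);"
print(suggestion)
print("matches golden reference:", suggestion == GOLDEN)
```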

Language: English

Citations: 12

LLM-assisted Generation of Hardware Assertions
Rahul Kande, Hammond Pearce, Benjamin Tan

et al.

arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications for a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.

Language: English

Citations: 17

Adversarial patch generation for automated program repair
Abdulaziz Alhefdhi, Hoa Khanh Dam, Thanh Le-Cong

et al.

Software Quality Journal, Journal Year: 2025, Volume and Issue: 33(1)

Published: Jan. 25, 2025

Language: English

Citations: 0

PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing

Yuwei Zhang, Zhi Jin, Ying Xing

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 20, 2025

Bug fixing holds significant importance in software development and maintenance. Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving software bugs. However, a noticeable gap in existing approaches lies in their oversight of the collaborative facets intrinsic to bug resolution, treating the process as a single-stage endeavor. Moreover, most approaches solely take the buggy code snippet as input to the LLMs during the patch generation stage. To mitigate the aforementioned limitations, we introduce a novel stage-wise framework named PATCH. Specifically, we first augment the buggy code snippet with its corresponding dependence context and intent information to better guide LLMs in generating correct candidate patches. Additionally, taking inspiration from bug management practices, we decompose the bug-fixing task into four distinct stages: bug reporting, bug diagnosis, patch generation, and patch verification. These stages are performed interactively by LLMs, aiming to simulate the collaborative behavior of programmers during bug resolution. By harnessing these collective contributions, PATCH effectively enhances the bug-fixing capability of LLMs. We implement PATCH by employing the powerful dialogue-based LLM ChatGPT. Our evaluation on the widely used benchmark BFP demonstrates that PATCH achieves better performance than state-of-the-art approaches.
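The four-stage decomposition can be pictured as a chain of chat turns. The sketch below is a guess at the shape of such a pipeline rather than PATCH's actual prompts or code: `chat()` mocks a dialogue LLM, and the stage prompts are paraphrased from the abstract.

```python
def chat(history, message):
    """Hypothetical dialogue-LLM call; mocked to echo the request."""
    history = history + [("user", message)]
    reply = f"[model response to: {message[:40]}...]"
    return history + [("assistant", reply)], reply

def fix_bug(buggy_code, context, intent):
    history = []
    # Stage 1: bug reporting -- describe the symptom from the enriched input
    # (buggy snippet plus dependence context and intent, as PATCH's input is).
    history, report = chat(history,
        f"Write a bug report for:\n{buggy_code}\nContext:\n{context}\nIntent:\n{intent}")
    # Stage 2: bug diagnosis -- localize and explain the defect.
    history, diagnosis = chat(history, f"Diagnose the root cause given:\n{report}")
    # Stage 3: patch generation -- propose a candidate patch.
    history, patch = chat(history, f"Generate a candidate patch for:\n{diagnosis}")
    # Stage 4: patch verification -- ask the model to review its own patch.
    history, verdict = chat(history, f"Verify this patch against the intent:\n{patch}")
    return patch, verdict

patch, verdict = fix_bug("return a - b;", "int add(int a, int b)", "add two numbers")
print(patch, verdict, sep="\n")
```

Keeping all four turns in one conversation history is what lets later stages condition on earlier ones, which is the collaborative behavior the framework simulates.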

Language: English

Citations: 0

StandUp4NPR: Standardizing SetUp for Empirically Comparing Neural Program Repair Systems
Wenkang Zhong, Hongliang Ge, Hongfei Ai

et al.

Published: Oct. 10, 2022

Recently, an emerging trend in automatic program repair is to apply deep neural networks to generate fixed code from buggy code, known as NPR (Neural Program Repair). However, existing NPR systems are trained and evaluated under very different settings (e.g., different training data, inconsistent evaluation datasets, and wide-ranging candidate numbers), which makes it hard to draw fair-enough conclusions when comparing them. Motivated by this, we first build a standard benchmark dataset and an extensive framework tool to mitigate threats to the comparison. The dataset consists of a training set, a validation set, and a test set with 144,641, 13,739, and 13,706 bug-fix pairs in Java, respectively. The framework supports selecting specific training, validation, and test datasets, automatically conducting the pipeline of training and evaluating NPR models, and easily integrating new models by implementing well-defined interfaces. Then, based on the benchmark and tool, we conduct a comprehensive empirical comparison of six SOTA NPR systems w.r.t. repairability, inclination, and generalizability. The experimental results reveal deeper characteristics of the compared systems, subvert some earlier comparative conclusions, and further verify the necessity of unifying experimental setups when exploring the progress of NPR systems. Meanwhile, we identify common features of NPR systems (e.g., they are good at dealing with code-delete bugs). Finally, we identify promising research directions derived from our findings.
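The "well-defined interfaces" point can be illustrated with a minimal Python sketch: an abstract base class that a benchmark harness could train and evaluate uniformly. The method names and the exact-match metric below are invented for illustration and are not StandUp4NPR's actual API.

```python
from abc import ABC, abstractmethod

class NPRModel(ABC):
    """Illustrative plug-in interface a benchmark harness might require."""

    @abstractmethod
    def train(self, train_pairs, valid_pairs):
        """Fit on (buggy, fixed) pairs; valid_pairs drives model selection."""

    @abstractmethod
    def repair(self, buggy_code, n_candidates):
        """Return up to n_candidates ranked patch strings for buggy_code."""

def evaluate(model: NPRModel, test_pairs, n_candidates=100):
    """Fraction of bugs where a top-n candidate exactly matches the fix."""
    fixed = 0
    for buggy, golden in test_pairs:
        if golden in model.repair(buggy, n_candidates):
            fixed += 1
    return fixed / max(len(test_pairs), 1)
```

Fixing `n_candidates` in one place is exactly the kind of setting the paper argues must be unified before systems can be compared fairly.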

Language: English

Citations: 14

Repairing Security Vulnerabilities Using Pre-trained Programming Language Models
Kai Huang, Su Yang, Hongyu Sun

et al.

Published: June 1, 2022

Repairing software bugs with automated solutions is a long-standing goal of researchers. Some of the latest automated program repair (APR) tools leverage natural language processing (NLP) techniques to repair bugs. But natural languages (NL) and programming languages (PL) have significant differences, which means NLP models may not handle PL tasks well. Moreover, due to the differences between the vulnerability repair task and the bug repair task, the performance of these tools on vulnerabilities is not yet known. To address these issues, we attempt to use large-scale pre-trained PL models (CodeBERT and GraphCodeBERT) for vulnerability repair based on the characteristics of vulnerabilities, and we explore the real-world performance of state-of-the-art data-driven approaches on this task. The results show that using pre-trained models can better capture and process PL features and accomplish multi-line vulnerability repair. Specifically, our solution achieves advanced results (single-line repair accuracy of 95.47% and multi-line repair accuracy of 90.06%). These results outperform the state of the art and demonstrate that adding rich data-dependent features can help solve more complex code repair problems. Besides, we also discuss previous work and our approach, pointing out some shortcomings and directions for future work.
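Since CodeBERT and GraphCodeBERT are encoder-only checkpoints, one common way to use them for patch generation is to warm-start an encoder-decoder from the same checkpoint and fine-tune it on (vulnerable, fixed) pairs. The sketch below shows that wiring with Hugging Face transformers; it is a generic recipe under that assumption, not the paper's exact training code, and the buffer-overflow pair is invented for illustration.

```python
from transformers import EncoderDecoderModel, RobertaTokenizerFast

# Warm-start both encoder and decoder from the CodeBERT checkpoint.
tok = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "microsoft/codebert-base")
model.config.decoder_start_token_id = tok.cls_token_id
model.config.pad_token_id = tok.pad_token_id

# One (vulnerable, fixed) pair; real fine-tuning would batch thousands of these.
src = tok("char buf[8]; strcpy(buf, input);", return_tensors="pt")
tgt = tok("char buf[8]; strncpy(buf, input, sizeof(buf) - 1);", return_tensors="pt")
loss = model(input_ids=src.input_ids,
             attention_mask=src.attention_mask,
             labels=tgt.input_ids).loss  # cross-entropy over the fixed code
loss.backward()  # an optimizer step would follow in a full training loop
```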

Language: English

Citations: 7

When debugging encounters artificial intelligence: state of the art and open challenges
Yi Song, Xiaoyuan Xie, Baowen Xu

et al.

Science China Information Sciences, Journal Year: 2024, Volume and Issue: 67(4)

Published: Feb. 21, 2024

Language: English

Citations: 1

T5APR: Empowering automated program repair across languages through checkpoint ensemble
Reza Gharibi, Mohammad Hadi Sadreddini, Seyed Mostafa Fakhrahmad

et al.

Journal of Systems and Software, Journal Year: 2024, Volume and Issue: 214, P. 112083 - 112083

Published: April 28, 2024

Language: English

Citations: 1

Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java
Wenkang Zhong, Chuanyi Li, Kui Liu

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 19, 2024

Recent years have seen a rise in neural program repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. A comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical guidance for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: ① published systems are trained with varying data, and ② systems are evaluated only roughly, through the total number of fixed bugs. Questions such as "what types of bugs are repairable by current NPR systems" cannot be answered yet. Consequently, researchers cannot make targeted improvements, and users have no clear picture of what current NPR systems can actually do. In this paper, we perform a systematic evaluation of nine state-of-the-art NPR systems. To enable a fair and detailed comparison, we (1) build a benchmark framework that supports training and validating systems under unified settings and (2) evaluate the retrained systems with fine-grained performance analysis, especially regarding effectiveness and efficiency. We believe our tool and results offer practitioners a realistic view of NPR systems and implications for further facilitating NPR research.
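A fine-grained evaluation along these lines boils down to slicing fix rates by bug type rather than reporting one total. Below is a minimal Python sketch of that idea; the category labels follow no particular taxonomy and are invented for illustration.

```python
from collections import defaultdict

def fix_rates_by_category(results):
    """results: iterable of (bug_category, was_fixed) pairs.

    Returns per-category fix rates instead of a single total, so that
    questions like 'what types of bugs are repairable' become answerable.
    """
    fixed = defaultdict(int)
    total = defaultdict(int)
    for category, was_fixed in results:
        total[category] += 1
        fixed[category] += int(was_fixed)
    return {c: fixed[c] / total[c] for c in total}

# Invented example data for demonstration only.
demo = [("null-check", True), ("null-check", False),
        ("off-by-one", True), ("api-misuse", False)]
print(fix_rates_by_category(demo))
# {'null-check': 0.5, 'off-by-one': 1.0, 'api-misuse': 0.0}
```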

Language: English

Citations: 1

RobustNPR: Evaluating the robustness of neural program repair models
Hongliang Ge, Wenkang Zhong, Chuanyi Li

et al.

Journal of Software: Evolution and Process, Journal Year: 2023, Volume and Issue: 36(4)

Published: May 23, 2023

Due to the high cost of repairing defective programs, much research focuses on automatic program repair (APR). In recent years, a new trend in APR is to apply neural networks to automatically mine relations between buggy programs and their corresponding patches, known as NPR (neural program repair). The community, however, ignores some important properties that could impact the applicability of NPR systems, such as robustness: for semantically identical buggy programs, NPR systems may produce totally different patches. In this paper, we propose an evaluation tool named RobustNPR, the first NPR robustness evaluation tool. RobustNPR employs several mutators to generate semantically identical mutants of buggy programs. For an original buggy program and its mutant, it checks two aspects of NPR: (a) can NPR fix the mutant when it can fix the original program? and (b) can NPR fix the original program when it can fix the mutant? We then evaluate four SOTA NPR models and analyze the results. We find that even for the best-performing model, 20.16% of repair successes are unreliable, which indicates that NPR robustness is not perfect. In addition, we find that robustness is correlated with model settings and other factors.
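The mutator idea is easy to make concrete: rewrite a buggy program in a way that preserves its semantics (here, renaming one identifier) and check whether the repair outcome survives. The renaming mutator and the `repairs()` oracle below are illustrative stand-ins for RobustNPR's actual components.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    """Semantics-preserving mutator: rename one identifier (word-boundary match)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def robustness_check(repairs, buggy: str, mutant: str) -> str:
    """Compare repair outcomes on a program and its semantically identical mutant.

    repairs(code) -> bool is a stand-in oracle for 'the NPR model fixed it'.
    """
    orig_ok, mut_ok = repairs(buggy), repairs(mutant)
    if orig_ok and not mut_ok:
        return "unreliable: fix lost under mutation"
    if mut_ok and not orig_ok:
        return "unreliable: fix only appears under mutation"
    return "consistent"

buggy = "int sum = a - b;  // should be a + b"
mutant = rename_variable(buggy, "sum", "total")
# A deliberately brittle oracle, to show how an unreliable success is flagged.
print(robustness_check(lambda code: "sum" in code, buggy, mutant))
```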

Language: English

Citations: 2