An Automated and Flexible Multilingual Bug-Fix Dataset Construction System
Wenkang Zhong, Chuanyi Li, Yunfei Zhang

et al.

2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Journal Year: 2023, Volume and Issue: unknown, P. 1881 - 1886

Published: Sept. 11, 2023

Developing effective data-driven automated bug-fixing approaches relies heavily on large bug-fix datasets. However, the granularity of current repository-mined datasets is usually at the function level, without meta-information such as fault type. To address the open challenge of precisely mining code snippets with bugs, together with their fixes, locations, and fault types, from source repositories, this paper proposes a flexible, extensible, multilingual dataset construction system, the Multilingual Bug-Fix Constructor (MBFC). Furthermore, we release a large-scale, fine-grained Multilingual Bug-Fix dataset (M-BF) automatically built using the proposed system; its initial version includes a total of 921,825 bug-fix pairs from 442,164 different open-source software projects, covering commits from January 2020 up to a September cutoff. We expect that our system can benefit the development of innovative and practical program repair methods, thereby improving the efficiency of debugging and code review processes.
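To make the mining step concrete, here is a minimal Python sketch of how a system like MBFC could harvest (buggy, fixed) file pairs from fix-like commits. The keyword heuristic, the GitPython dependency, and the Java-only filter are illustrative assumptions, not MBFC's actual implementation.

```python
from git import Repo  # GitPython; an assumed dependency, not MBFC's actual stack

FIX_KEYWORDS = ("fix", "bug", "patch", "repair")  # illustrative heuristic

def mine_bug_fix_pairs(repo_path):
    """Yield (buggy_source, fixed_source, file_path) for fix-like commits."""
    repo = Repo(repo_path)
    for commit in repo.iter_commits():
        if not any(kw in commit.message.lower() for kw in FIX_KEYWORDS):
            continue
        for parent in commit.parents:
            # Diff the fix commit against its parent to get before/after blobs.
            for diff in parent.diff(commit):
                if diff.a_blob and diff.b_blob and diff.a_path.endswith(".java"):
                    buggy = diff.a_blob.data_stream.read().decode("utf-8", "replace")
                    fixed = diff.b_blob.data_stream.read().decode("utf-8", "replace")
                    yield buggy, fixed, diff.a_path

if __name__ == "__main__":
    for buggy, fixed, path in mine_bug_fix_pairs("."):
        print(f"candidate bug-fix pair in {path}: {len(buggy)} -> {len(fixed)} bytes")
```

A real pipeline would additionally localize the change to a statement or expression and attach the fault type, which is the fine-grained meta-information the abstract highlights.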

Language: English

(Security) Assertions by Large Language Models
Rahul Kande, Hammond Pearce, Benjamin Tan

et al.

IEEE Transactions on Information Forensics and Security, Journal Year: 2024, Volume and Issue: 19, P. 4374 - 4389

Published: Jan. 1, 2024

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications for a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.
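To picture the prompting setup the abstract describes, here is a minimal Python sketch: the design intent is written as a SystemVerilog comment, and the model is asked to complete the assertion. The `complete()` helper is a hypothetical stand-in for an actual LLM client (mocked here with a canned reply), and the lock-register design is invented for illustration.

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM completion call; mocked with a canned continuation."""
    return " lock |-> !wr_en);"

# Natural-language intent embedded as a SystemVerilog comment, in the style
# the paper's prompts use; signals (lock, wr_en) are invented for this example.
PROMPT = """\
module lock_reg(input logic clk, input logic lock, input logic wr_en);
// Assert that the register is never written while the lock bit is set.
assert property (@(posedge clk)"""

suggestion = PROMPT.rsplit("\n", 1)[-1] + complete(PROMPT)
# A plausible golden reference the generated assertion would be checked against:
GOLDEN = "assert property (@(posedge clk) lock |-> !wr_en);"
print(suggestion)
print("matches golden reference:", suggestion == GOLDEN)
```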

Language: English

Citations: 12

LLM-assisted Generation of Hardware Assertions
Rahul Kande, Hammond Pearce, Benjamin Tan

et al.

arXiv (Cornell University), Journal Year: 2023, Volume and Issue: unknown

Published: Jan. 1, 2023

The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications for a system, there is a need for techniques to support security verification activities. Assertion-based verification is a popular technique that involves capturing design intent in a set of assertions that can be used in formal verification or testing-based checking. However, writing security-centric assertions is a challenging task. In this work, we investigate the use of emerging large language models (LLMs) for code generation in hardware assertion generation for security, where primarily natural language prompts, such as those one would see as code comments in assertion files, are used to produce SystemVerilog assertions. We focus our attention on a popular LLM and characterize its ability to write assertions out of the box, given varying levels of detail in the prompt. We design an evaluation framework that generates a variety of prompts, and we create a benchmark suite comprising real-world hardware designs and corresponding golden reference assertions that we want to generate with the LLM.

Language: English

Citations: 17

Adversarial patch generation for automated program repair
Abdulaziz Alhefdhi, Hoa Khanh Dam, Thanh Le-Cong

et al.

Software Quality Journal, Journal Year: 2025, Volume and Issue: 33(1)

Published: Jan. 25, 2025

Language: English

Citations: 0

PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing

Yuwei Zhang, Zhi Jin, Ying Xing

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Feb. 20, 2025

Bug fixing holds significant importance in software development and maintenance. Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving software bugs. However, a noticeable gap in existing approaches lies in their oversight of the collaborative facets intrinsic to bug resolution, treating the process as a single-stage endeavor. Moreover, most approaches solely take the buggy code snippet as input to the LLMs during the patch generation stage. To mitigate the aforementioned limitations, we introduce a novel stage-wise framework named PATCH. Specifically, we first augment the buggy code snippet with its corresponding dependence context and intent information to better guide LLMs in generating correct candidate patches. Additionally, taking inspiration from bug management practices, we decompose the bug-fixing task into four distinct stages: bug reporting, bug diagnosis, patch generation, and patch verification. These stages are performed interactively by LLMs, aiming to simulate the collaborative behavior of programmers during bug resolution. By harnessing these collective contributions, PATCH effectively enhances the bug-fixing capability of LLMs. We implement PATCH by employing the powerful dialogue-based LLM ChatGPT. Our evaluation on the widely used benchmark BFP demonstrates that PATCH achieves better performance than state-of-the-art approaches.
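The four-stage decomposition can be pictured as a chain of chat turns. The sketch below is a guess at the shape of such a pipeline rather than PATCH's actual prompts or code: `chat()` mocks a dialogue LLM, and the stage prompts are paraphrased from the abstract.

```python
def chat(history, message):
    """Hypothetical dialogue-LLM call; mocked to echo the request."""
    history = history + [("user", message)]
    reply = f"[model response to: {message[:40]}...]"
    return history + [("assistant", reply)], reply

def fix_bug(buggy_code, context, intent):
    history = []
    # Stage 1: bug reporting -- describe the symptom from the enriched input
    # (buggy snippet plus dependence context and intent, as PATCH's input is).
    history, report = chat(history,
        f"Write a bug report for:\n{buggy_code}\nContext:\n{context}\nIntent:\n{intent}")
    # Stage 2: bug diagnosis -- localize and explain the defect.
    history, diagnosis = chat(history, f"Diagnose the root cause given:\n{report}")
    # Stage 3: patch generation -- propose a candidate patch.
    history, patch = chat(history, f"Generate a candidate patch for:\n{diagnosis}")
    # Stage 4: patch verification -- ask the model to review its own patch.
    history, verdict = chat(history, f"Verify this patch against the intent:\n{patch}")
    return patch, verdict

patch, verdict = fix_bug("return a - b;", "int add(int a, int b)", "add two numbers")
print(patch, verdict, sep="\n")
```

Keeping all four turns in one conversation history is what lets later stages condition on earlier ones, which is the collaborative behavior the framework simulates.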

Language: English

Citations: 0

StandUp4NPR: Standardizing SetUp for Empirically Comparing Neural Program Repair Systems
Wenkang Zhong, Hongliang Ge, Hongfei Ai

et al.

Published: Oct. 10, 2022

Recently, an emerging trend in automatic program repair is to apply deep neural networks to generate fixed code from buggy code, known as NPR (Neural Program Repair). However, existing NPR systems are trained and evaluated under very different settings (e.g., different training data, inconsistent evaluation datasets, and wide-ranging candidate numbers), which makes it hard to draw fair-enough conclusions when comparing them. Motivated by this, we first build a standard benchmark dataset and an extensive framework tool to mitigate threats to the comparison. The dataset consists of a training set, a validation set, and a test set with 144,641, 13,739, and 13,706 bug-fix pairs in Java, respectively. The framework supports selecting specific training, validation, and test datasets, automatically conducting the pipeline of training and evaluating NPR models, and easily integrating new models by implementing well-defined interfaces. Then, based on the benchmark and tool, we conduct a comprehensive empirical comparison of six SOTA NPR systems w.r.t. repairability, inclination, and generalizability. The experimental results reveal deeper characteristics of the compared systems, subvert some earlier comparative conclusions, and further verify the necessity of unifying experimental setups when exploring the progress of NPR systems. Meanwhile, we identify common features of NPR systems (e.g., they are good at dealing with code-delete bugs). Finally, we identify promising research directions derived from our findings.
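The "well-defined interfaces" point can be illustrated with a minimal Python sketch: an abstract base class that a benchmark harness could train and evaluate uniformly. The method names and the exact-match metric below are invented for illustration and are not StandUp4NPR's actual API.

```python
from abc import ABC, abstractmethod

class NPRModel(ABC):
    """Illustrative plug-in interface a benchmark harness might require."""

    @abstractmethod
    def train(self, train_pairs, valid_pairs):
        """Fit on (buggy, fixed) pairs; valid_pairs drives model selection."""

    @abstractmethod
    def repair(self, buggy_code, n_candidates):
        """Return up to n_candidates ranked patch strings for buggy_code."""

def evaluate(model: NPRModel, test_pairs, n_candidates=100):
    """Fraction of bugs where a top-n candidate exactly matches the fix."""
    fixed = 0
    for buggy, golden in test_pairs:
        if golden in model.repair(buggy, n_candidates):
            fixed += 1
    return fixed / max(len(test_pairs), 1)
```

Fixing `n_candidates` in one place is exactly the kind of setting the paper argues must be unified before systems can be compared fairly.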

Language: English

Citations: 14

Repairing Security Vulnerabilities Using Pre-trained Programming Language Models
Kai Huang, Su Yang, Hongyu Sun

et al.

Published: June 1, 2022

Repairing software bugs with automated solutions is a long-standing goal of researchers. Some of the latest automated program repair (APR) tools leverage natural language processing (NLP) techniques to repair bugs. But natural languages (NL) and programming languages (PL) have significant differences, which means NLP models may not handle PL tasks well. Moreover, due to the differences between the vulnerability repair task and the bug repair task, the performance of these tools on vulnerabilities is not yet known. To address these issues, we attempt to use large-scale pre-trained PL models (CodeBERT and GraphCodeBERT) for vulnerability repair based on the characteristics of vulnerabilities, and we explore the real-world performance of state-of-the-art data-driven approaches on this task. The results show that using pre-trained models can better capture and process PL features and accomplish multi-line vulnerability repair. Specifically, our solution achieves advanced results (single-line repair accuracy of 95.47% and multi-line repair accuracy of 90.06%). These results outperform the state of the art and demonstrate that adding rich data-dependent features can help solve more complex code repair problems. Besides, we also discuss previous work and our approach, pointing out some shortcomings and directions for future work.
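Since CodeBERT and GraphCodeBERT are encoder-only checkpoints, one common way to use them for patch generation is to warm-start an encoder-decoder from the same checkpoint and fine-tune it on (vulnerable, fixed) pairs. The sketch below shows that wiring with Hugging Face transformers; it is a generic recipe under that assumption, not the paper's exact training code, and the buffer-overflow pair is invented for illustration.

```python
from transformers import EncoderDecoderModel, RobertaTokenizerFast

# Warm-start both encoder and decoder from the CodeBERT checkpoint.
tok = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "microsoft/codebert-base")
model.config.decoder_start_token_id = tok.cls_token_id
model.config.pad_token_id = tok.pad_token_id

# One (vulnerable, fixed) pair; real fine-tuning would batch thousands of these.
src = tok("char buf[8]; strcpy(buf, input);", return_tensors="pt")
tgt = tok("char buf[8]; strncpy(buf, input, sizeof(buf) - 1);", return_tensors="pt")
loss = model(input_ids=src.input_ids,
             attention_mask=src.attention_mask,
             labels=tgt.input_ids).loss  # cross-entropy over the fixed code
loss.backward()  # an optimizer step would follow in a full training loop
```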

Language: English

Citations: 7

When debugging encounters artificial intelligence: state of the art and open challenges
Yi Song, Xiaoyuan Xie, Baowen Xu

et al.

Science China Information Sciences, Journal Year: 2024, Volume and Issue: 67(4)

Published: Feb. 21, 2024

Language: English

Citations: 1

T5APR: Empowering automated program repair across languages through checkpoint ensemble
Reza Gharibi, Mohammad Hadi Sadreddini, Seyed Mostafa Fakhrahmad

et al.

Journal of Systems and Software, Journal Year: 2024, Volume and Issue: 214, P. 112083 - 112083

Published: April 28, 2024

Language: English

Citations: 1

Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java
Wenkang Zhong, Chuanyi Li, Kui Liu

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2024, Volume and Issue: unknown

Published: Aug. 19, 2024

Recent years have seen a rise in neural program repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. A comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical guidance for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: ① published systems are trained with varying data, and ② systems are evaluated only roughly, through the total number of fixed bugs. Questions such as "what types of bugs are repairable by current NPR systems" cannot be answered yet. Consequently, researchers cannot make targeted improvements, and users have no clear picture of what current NPR systems can actually do. In this paper, we perform a systematic evaluation of nine state-of-the-art NPR systems. To enable a fair and detailed comparison, we (1) build a benchmark framework that supports training and validating systems under unified settings and (2) evaluate the retrained systems with fine-grained performance analysis, especially regarding effectiveness and efficiency. We believe our tool and results offer practitioners a realistic view of NPR systems and implications for further facilitating NPR research.
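A fine-grained evaluation along these lines boils down to slicing fix rates by bug type rather than reporting one total. Below is a minimal Python sketch of that idea; the category labels follow no particular taxonomy and are invented for illustration.

```python
from collections import defaultdict

def fix_rates_by_category(results):
    """results: iterable of (bug_category, was_fixed) pairs.

    Returns per-category fix rates instead of a single total, so that
    questions like 'what types of bugs are repairable' become answerable.
    """
    fixed = defaultdict(int)
    total = defaultdict(int)
    for category, was_fixed in results:
        total[category] += 1
        fixed[category] += int(was_fixed)
    return {c: fixed[c] / total[c] for c in total}

# Invented example data for demonstration only.
demo = [("null-check", True), ("null-check", False),
        ("off-by-one", True), ("api-misuse", False)]
print(fix_rates_by_category(demo))
# {'null-check': 0.5, 'off-by-one': 1.0, 'api-misuse': 0.0}
```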

Language: English

Citations: 1

RobustNPR: Evaluating the robustness of neural program repair models
Hongliang Ge, Wenkang Zhong, Chuanyi Li

et al.

Journal of Software: Evolution and Process, Journal Year: 2023, Volume and Issue: 36(4)

Published: May 23, 2023

Due to the high cost of repairing defective programs, much research focuses on automatic program repair (APR). In recent years, a new trend in APR is to apply neural networks to automatically mine relations between buggy programs and their corresponding patches, known as NPR (neural program repair). The community, however, ignores some important properties that could impact the applicability of NPR systems, such as robustness: for semantically identical buggy programs, NPR systems may produce totally different patches. In this paper, we propose an evaluation tool named RobustNPR, the first NPR robustness evaluation tool. RobustNPR employs several mutators to generate semantically identical mutants of buggy programs. For an original buggy program and its mutant, it checks two aspects of NPR: (a) can NPR fix the mutant when it can fix the original program? and (b) can NPR fix the original program when it can fix the mutant? We then evaluate four SOTA NPR models and analyze the results. We find that even for the best-performing model, 20.16% of repair successes are unreliable, which indicates that NPR robustness is not perfect. In addition, we find that robustness is correlated with model settings and other factors.
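The mutator idea is easy to make concrete: rewrite a buggy program in a way that preserves its semantics (here, renaming one identifier) and check whether the repair outcome survives. The renaming mutator and the `repairs()` oracle below are illustrative stand-ins for RobustNPR's actual components.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    """Semantics-preserving mutator: rename one identifier (word-boundary match)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def robustness_check(repairs, buggy: str, mutant: str) -> str:
    """Compare repair outcomes on a program and its semantically identical mutant.

    repairs(code) -> bool is a stand-in oracle for 'the NPR model fixed it'.
    """
    orig_ok, mut_ok = repairs(buggy), repairs(mutant)
    if orig_ok and not mut_ok:
        return "unreliable: fix lost under mutation"
    if mut_ok and not orig_ok:
        return "unreliable: fix only appears under mutation"
    return "consistent"

buggy = "int sum = a - b;  // should be a + b"
mutant = rename_variable(buggy, "sum", "total")
# A deliberately brittle oracle, to show how an unreliable success is flagged.
print(robustness_check(lambda code: "sum" in code, buggy, mutant))
```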

Language: English

Citations: 2