Spotting Code Mutation for Predictive Mutation Testing DOI
Yifan Zhao, Yizhou Chen, Zeyu Sun

et al.

Published: Oct. 18, 2024

Language: English

Automatic Commit Message Generation: A Critical Review and Directions for Future Work DOI Creative Commons
Yuxia Zhang, Zhiqing Qiu, Klaas-Jan Stol

et al.

IEEE Transactions on Software Engineering, Journal Year: 2024, Volume and Issue: 50(4), P. 816 - 835

Published: Feb. 12, 2024

Commit messages are critical for code comprehension and software maintenance. Writing a high-quality message requires skill and effort. To help developers reduce their effort on this task, several approaches have been proposed to automatically generate commit messages. Despite the promising performance reported, we identified three significant and prevalent threats in these automated approaches: 1) the datasets used to train and evaluate them contain a considerable amount of 'noise'; 2) current approaches only consider commits of limited diff size; and 3) they can only generate the subject of a message, not the body. The first limitation may let models 'learn' inappropriate patterns in the training stage and may also lead to inflated results in evaluation. The other two considerably weaken the practical usability of the approaches. Further, with the rapid emergence of large language models (LLMs) that show superior performance in many software engineering tasks, it is worth asking: can LLMs address the challenges of long diffs and whole-message generation? This article reports an empirical study assessing the impact of these threats on state-of-the-art automatic commit message generators. We collected data from the Top 1,000 most-starred Java projects on GitHub and systematically removed noisy, bot-submitted, and meaningless messages; we then compared four representative approaches before and after noise removal, on messages for diffs of different lengths. We also conducted a qualitative survey to investigate developers' perspectives on generating only the subjects of messages. Finally, we assessed two LLMs, namely UniXcoder and ChatGPT; while they demonstrate great value, more work is needed to mature the state-of-the-art, and LLMs may be an avenue worth trying to overcome the identified limitations. Our analyses provide insights for future work to achieve better performance in practice.
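The noise-removal step the abstract describes can be sketched as a simple heuristic filter. The bot patterns and trivial-message list below are illustrative assumptions, not the authors' actual cleaning rules.

```python
import re

# Illustrative heuristics (assumed, not from the paper) for filtering
# bot-submitted and meaningless commit messages from a mined dataset.
BOT_PATTERN = re.compile(r"\b(dependabot|renovate|\[bot\])", re.IGNORECASE)
TRIVIAL_MESSAGES = {"update", "fix", "wip", "misc changes"}

def is_noisy(author: str, message: str) -> bool:
    """Return True if a commit looks bot-submitted or meaningless."""
    if BOT_PATTERN.search(author) or BOT_PATTERN.search(message):
        return True
    return message.strip().lower() in TRIVIAL_MESSAGES

commits = [
    ("dependabot[bot]", "Bump junit from 4.12 to 4.13"),
    ("alice", "fix"),
    ("bob", "Refactor parser to handle nested generics"),
]
# Keep only commits whose author and message pass the filter.
clean = [(a, m) for a, m in commits if not is_noisy(a, m)]
```

A real pipeline would add more patterns (merge commits, revert commits, auto-generated release notes), but the shape of the filter stays the same.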

Language: English

Citations

6

You Don’t Have to Say Where to Edit! jLED – Joint Learning to Localize and Edit Source Code DOI Open Access
Weiguo Pian, Yinghua Li, Haoye Tian

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

Learning to edit code automatically is becoming more and more feasible. Thanks to recent advances in Neural Machine Translation (NMT), various case studies are being investigated in which patches are produced and assessed either automatically (using test suites) or by developers themselves. An appealing setting remains the one in which the developer must provide a natural language description of the requirement for the change. A proof of concept in the literature showed that it is indeed feasible to translate such requirements into code changes. A recent advancement, MODIT [8], has shown promising results in code editing by leveraging natural language, code context, and location information as input. However, it struggles when location information is unavailable. While several approaches [29, 81] have demonstrated the ability to edit source code without explicitly specifying the edit location, they still tend to generate edits with less accuracy at the line level. In this work, we address the challenge of generating precise edits without location information, a scenario we consider crucial for the practical adoption of NMT in development. To that end, we develop a novel joint training approach for both edit localization and edit generation. Building a benchmark based on over 70k commits (patches and messages), we demonstrate that our jLED (joint Localize and EDit) approach is effective. An ablation study further demonstrates the importance of our design choices in training.
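Joint training of the kind the abstract describes is commonly implemented as a weighted sum of the two task losses. The sketch below is a generic formulation under that assumption; the weighting scheme is not taken from the paper.

```python
def joint_loss(loc_loss: float, edit_loss: float, alpha: float = 0.5) -> float:
    """Combine a localization loss and an edit-generation loss into one
    training signal. alpha balances the two objectives (assumed, generic)."""
    return alpha * loc_loss + (1 - alpha) * edit_loss

# With equal weighting, a localization loss of 2.0 and an edit loss of 4.0
# yield a combined loss of 3.0.
combined = joint_loss(2.0, 4.0)
```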

Language: English

Citations

0

CodeDoctor: multi-category code review comment generation DOI
Yingling Li, Yuhan Wu, Z.M. Wang

et al.

Automated Software Engineering, Journal Year: 2025, Volume and Issue: 32(1)

Published: Feb. 27, 2025

Language: English

Citations

0

DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation DOI Creative Commons
Junyi Lu, Xiaojia Li, Zihan Hua

et al.

Lecture notes in computer science, Journal Year: 2025, Volume and Issue: unknown, P. 43 - 64

Published: Jan. 1, 2025

Language: English

Citations

0

Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An Empirical Study DOI
Shuo Liu, Jacky Keung, Zhen Yang

et al.

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Journal Year: 2024, Volume and Issue: unknown, P. 465 - 476

Published: March 12, 2024

Language: English

Citations

3

Just-in-time software defect prediction via bi-modal change representation learning DOI
Yuze Jiang, Beijun Shen, Xiaodong Gu

et al.

Journal of Systems and Software, Journal Year: 2024, Volume and Issue: 219, P. 112253 - 112253

Published: Oct. 11, 2024

Language: English

Citations

1

Fusing Code Searchers DOI
Shangwen Wang, Mingyang Geng, Bo Lin

et al.

IEEE Transactions on Software Engineering, Journal Year: 2024, Volume and Issue: 50(7), P. 1852 - 1866

Published: May 20, 2024

Code search, which consists in retrieving relevant code snippets from a codebase based on a given query, provides developers with useful references during software development. Over the years, techniques adopting different mechanisms to compute the relevance score between a query and a snippet have been proposed to advance the state of the art in this domain, including those relying on information retrieval, supervised learning, and pre-training. Despite that, the usefulness of existing techniques is still compromised, since no single technique can effectively handle all the diversified queries met in practice. To tackle this challenge, we present Dancer, a data fusion code searcher. Our intuition (also the basic hypothesis of this study) is that different techniques may complement each other because of the intrinsic differences in their working mechanisms. We validated this hypothesis via an exploratory study. Based on it, we propose to fuse the results generated by different code search techniques so that the advantage of each standalone technique can be fully leveraged. Specifically, we treat each code search technique as a retrieval system and leverage well-known data fusion approaches to aggregate the results of multiple systems. We evaluate six code search techniques on two large-scale datasets and exploit eight classic data fusion approaches to incorporate their results. The experiments show that the best fusion approach is able to outperform the standalone techniques by 35%-550% and 65%-825% in terms of MRR (mean reciprocal rank) on the two datasets, respectively.
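The fusion idea can be illustrated with reciprocal rank fusion (RRF), one classic method for aggregating ranked lists from multiple retrieval systems. This is a generic sketch of RRF with hypothetical snippet IDs, not the paper's implementation or its chosen fusion method.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all
    systems that returned it; higher total score ranks first. k=60 is the
    conventional smoothing constant from the RRF literature."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical code searchers ranking snippet IDs for one query.
searcher_a = ["s3", "s1", "s2"]
searcher_b = ["s1", "s3", "s4"]
searcher_c = ["s1", "s2", "s3"]
fused = reciprocal_rank_fusion([searcher_a, searcher_b, searcher_c])
```

Snippet "s1" wins because two of the three searchers rank it first; items returned by only one system (like "s4") sink to the bottom, which is the complementarity effect the paper exploits.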

Language: English

Citations

1

Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction DOI
Manar Abu Talib, Ali Bou Nassif, Mohammad Azzeh

et al.

Neural Computing and Applications, Journal Year: 2024, Volume and Issue: 36(27), P. 16911 - 16940

Published: June 3, 2024

Language: English

Citations

1

Divide-and-Conquer: Automating Code Revisions via Localization-and-Revision DOI
Shangwen Wang, Bo Lin, Liqian Chen

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 24, 2024

Despite its effectiveness in ensuring software quality, code review remains a labor-intensive and time-consuming task. In order to alleviate this burden on developers, researchers have proposed to automate code review activities, particularly focusing on automating code revisions. This can benefit both code authors, who are relieved from the manual task of revision, and reviewers, who are spared from addressing minor flaws through comments. While current code revision approaches have shown promising results, they typically operate within a single phase, in which the code requiring revision is treated as input to a deep learning model and the revised code is directly generated via a sequence-to-sequence transformation. Consequently, these approaches tackle the challenges of localization (i.e., where to revise) and revision (i.e., how to revise) simultaneously. Attempting to handle this entire complex process with a single model goes against the "Divide-and-Conquer" principle, which encourages breaking complex problems down into smaller sub-problems and solving them individually. In fact, we observed that existing approaches often yield inaccurate results in both phases. In this paper, we present a two-phase approach that aims to overcome the aforementioned limitations by adhering to the "Divide-and-Conquer" principle. Our approach comprises two key components: a localizer, responsible for identifying the specific parts of the code that require revisions, and a reviser, tasked with generating the revised code based on the localization result. Extensive experiments conducted on widely-used datasets demonstrate the substantial superiority of our approach over existing ones. For instance, when revising code based on reviewers' comments, our approach achieves a success rate of 20% against the ground-truth comparison, while the pre-trained CodeT5 achieves less than 16% on the same test set, which contains 16K+ cases.
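The two-phase structure can be sketched with stub components. The keyword-matching localizer and comment-annotating reviser below are placeholders standing in for the paper's learned models; only the pipeline shape (localize first, then revise only the flagged lines) follows the abstract.

```python
def localize(lines, comment):
    """Stub localizer: flag lines sharing a non-trivial word with the review
    comment. The actual approach uses a trained model for this phase."""
    keywords = {w.lower() for w in comment.split() if len(w) > 3}
    return [i for i, line in enumerate(lines)
            if keywords & {w.lower().strip("();,") for w in line.split()}]

def revise(lines, locations, comment):
    """Stub reviser: annotate the flagged lines; the actual approach
    generates a rewritten version of each flagged region."""
    revised = list(lines)
    for i in locations:
        revised[i] = revised[i] + "  // TODO: " + comment
    return revised

code = ["int total = 0;", "for (Item it : items) total += it.price;"]
comment = "rename total to sum"
locs = localize(code, comment)       # phase 1: where to revise
new_code = revise(code, locs, comment)  # phase 2: how to revise
```

Splitting the task this way lets each component be trained and evaluated on its own sub-problem, which is the motivation behind the "Divide-and-Conquer" framing.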

Language: English

Citations

1

A code change‐oriented approach to just‐in‐time defect prediction with multiple input semantic fusion DOI Open Access
Teng Huang, Huiqun Yu, Guisheng Fan

et al.

Expert Systems, Journal Year: 2024, Volume and Issue: 41(12)

Published: Aug. 27, 2024

Recent research found that fine-tuning pre-trained models is superior to training from scratch in just-in-time (JIT) defect prediction. However, existing approaches using pre-trained models have their limitations. First, the input length is constrained by the pre-trained models. Second, the inputs are change-agnostic. To address these limitations, we propose JIT-Block, a JIT defect prediction method that combines multiple input semantics and uses the changed block as the fundamental unit. We restructure the JIT-Defects4J dataset used in previous research, and then conduct a comprehensive comparison against six state-of-the-art baseline models using eleven performance metrics, including both effort-aware and effort-agnostic measures. The results demonstrate that on the JIT defect prediction task, our approach outperforms all baselines, showing improvements ranging from 1.5% to 800% on some metrics and from 0.3% to 57% on others. For code line localization, our approach improves on three out of five metrics, with gains ranging from 11% to 140%.
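Using the changed block as the fundamental unit can be illustrated by grouping consecutive changed lines of a unified diff into blocks. This is a minimal sketch of that decomposition, not the paper's preprocessing code; the real method additionally pairs each block with multiple input semantics.

```python
def changed_blocks(diff_lines):
    """Group consecutive added/removed lines of a unified diff into blocks,
    each block being one candidate prediction unit."""
    blocks, current = [], []
    for line in diff_lines:
        # '+'/'-' mark changed lines; '+++'/'---' are file headers, not changes.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            current.append(line)
        elif current:           # a context line ends the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

diff = [
    "--- a/F.java",
    "+++ b/F.java",
    " context",
    "-int x = 0;",
    "+int x = 1;",
    " context",
    "+log(x);",
]
blocks = changed_blocks(diff)  # two blocks: the x-change and the log call
```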

Language: English

Citations

0