A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction
Francesco Lomio, Sergio Moreschini, Valentina Lenarduzzi

et al.

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(7)

Published: Oct. 1, 2022

Abstract. Background: Developers spend more time fixing bugs and refactoring the code to increase maintainability than developing new features. Researchers have investigated the impact of code quality on fault-proneness, focusing on code smells and code metrics. Objective: We aim at advancing fault-inducing commit prediction using different variables, such as SonarQube rules, product metrics, and process metrics, and adopting different techniques. Method: We designed and conducted an empirical study among 29 Java projects analyzed with SonarQube and the SZZ algorithm to identify fault-inducing and fault-fixing commits, computing different product and process metrics. Moreover, we investigated fault-proneness using different Machine Learning and Deep Learning models. Results: We analyzed 58,125 commits containing 33,865 faults and infected by 174 SonarQube rules violated 1.8M times, on which 48 software product and process metrics were calculated. The results clearly identified a set of features that provided a highly accurate fault prediction (more than 95% AUC). Regarding the performance of the classifiers, Deep Learning provided a higher accuracy compared with Machine Learning models. Conclusion: Future works might investigate whether other static analysis tools, such as FindBugs or Checkstyle, can provide similar results. Moreover, researchers might consider the adoption of time series analysis and anomaly detection techniques.

Language: English

Sampling Projects in GitHub for MSR Studies
Ozren Dabić, Emad Aghajani, Gabriele Bavota

et al.

Published: May 1, 2021

Almost every Mining Software Repositories (MSR) study requires, as a first step, the selection of the subject software repositories. These repositories are usually collected from hosting services like GitHub using specific selection criteria dictated by the study goal. For example, a study related to licensing might be interested in selecting projects explicitly declaring a license. Once the selection criteria have been defined, utilities such as the GitHub APIs can be used to "query" the hosting service. However, researchers have to deal with usage limitations imposed by these APIs and a lack of required information. For example, the GitHub search APIs allow 30 requests per minute and, when searching repositories, only provide limited information (e.g., the number of commits in a repository is not included). To support researchers in sampling projects from GitHub, we present GHS (GitHub Search), a dataset containing 25 characteristics (e.g., number of commits, license, etc.) of 735,669 repositories written in 10 programming languages. The set of characteristics has been derived by looking for criteria frequently used for project selection in MSR studies, and the dataset is continuously updated to (i) always provide fresh data about the existing projects, and (ii) increase the number of indexed projects. The GHS dataset can be queried through a web application we built that allows setting many combinations of the selection criteria needed for a study and downloading the information of the matching repositories: https://seart-ghs.si.usi.ch.

Language: English

Citations

73
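
The request limit quoted in the abstract above (30 search requests per minute) is the kind of constraint a mining script has to respect on the client side. The following is a minimal Python sketch of a sliding-window throttle, not code from GHS or from GitHub. The class name `SlidingWindowThrottle` and its parameters are hypothetical, and the default of 30 calls per 60 seconds simply mirrors the figure in the abstract; the limits currently enforced by GitHub should be checked in its documentation. The actual HTTP request is deliberately left out.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_calls` calls in any window of `period` seconds.

    Hypothetical helper for illustration only; the defaults mirror the
    "30 requests per minute" figure quoted in the abstract.
    """

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._stamps = deque()   # times of the calls still inside the window

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have already left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
```

A script would call `throttle.wait()` immediately before each search request. Keeping the clock and sleep functions injectable makes the waiting logic testable without real delays.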

The Secret Life of Software Vulnerabilities: A Large-Scale Empirical Study
Emanuele Iannone, Roberta Guadagni, Filomena Ferrucci

et al.

IEEE Transactions on Software Engineering, Journal Year: 2022, Volume and Issue: 49(1), P. 44 - 63

Published: Jan. 6, 2022

Software vulnerabilities are weaknesses in source code that can be potentially exploited to cause loss or harm. While researchers have been devising a number of methods to deal with vulnerabilities, there is still a noticeable lack of knowledge on their software engineering life cycle, for example how vulnerabilities are introduced and removed by developers. This information can be exploited to design more effective methods for vulnerability prevention and detection, as well as to understand the granularity at which these methods should aim. To investigate the life cycle of known software vulnerabilities, we focus on how, when, and under which circumstances the contributions to the introduction of vulnerabilities in software projects are made, as well as how long, and how, they are removed. We consider 3,663 vulnerabilities with public patches from the National Vulnerability Database (pertaining to 1,096 open-source software projects on GitHub) and define an eight-step process involving both automated parts (e.g., using a procedure based on the SZZ algorithm to find the vulnerability-contributing commits) and manual analyses (e.g., how vulnerabilities were fixed). The investigated vulnerabilities can be classified in 144 categories, take on average at least 4 contributing commits before being introduced, and half of them remain unfixed for more than one year. Most of the contributions are done by developers with high workload, often when doing maintenance activities, and removed mostly with the addition of new source code aiming at implementing further checks on inputs. We conclude by distilling practical implications on how vulnerability detectors should work to assist developers in timely identifying these issues.

Language: English

Citations

48

Detecting code smells using industry-relevant data
Lech Madeyski, Tomasz Lewowski

Information and Software Technology, Journal Year: 2022, Volume and Issue: 155, P. 107112 - 107112

Published: Nov. 21, 2022

Language: English

Citations

26

A Systematic Literature Review on the Code Smells Datasets and Validation Mechanisms
Morteza Zakeri‐Nasrabadi, Saeed Parsa, Ehsan Esmaili

et al.

ACM Computing Surveys, Journal Year: 2023, Volume and Issue: 55(13s), P. 1 - 48

Published: May 13, 2023

The accuracy reported for code smell-detecting tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as the size, severity level, project types, number of each type of smell, number of smells, and the ratio of smelly to non-smelly samples in the dataset. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any datasets. We conclude that existing datasets suffer from imbalanced samples, lack of supporting severity level, and restriction to the Java language.

Language: English

Citations

14

A Novel Four-Way Approach Designed With Ensemble Feature Selection for Code Smell Detection
Inderpreet Kaur, Arvinder Kaur

IEEE Access, Journal Year: 2021, Volume and Issue: 9, P. 8695 - 8707

Published: Jan. 1, 2021

Purpose: Code smells are residuals of technical debt induced by the developers. They hinder evolution, adaptability and maintenance of the software. Meanwhile, they are very beneficial in indicating the loopholes of problems and bugs in the software. Machine learning has been extensively used to predict Code Smells in research. The current study aims to optimise the prediction using Ensemble Learning and Feature Selection techniques on three open-source Java data sets. Design and Results: The work compares four varied approaches to detect code smells using four performance measures: Accuracy (P1), G-mean1 (P2), G-mean2 (P3), and F-measure (P4). The study found that the values of the performance measures did not degrade; instead, they either remained the same or increased with feature selection and Ensemble Learning. Random Forest turns out to be the best classifier, while Correlation-based Feature selection (BFS) is the best amongst the Feature Selection techniques. The aggregators, i.e. ET5C2 (BFS intersection Relief with Random Forest), ET6C2 (BFS union Relief with Random Forest), ET5C1 (BFS intersection Relief with Bagging), and Majority Voting give the best results from all the aggregation combinations studied. Conclusion: Though the results are good, the study needs a lot of validation on a variety of data sets before it can be standardised. The results also pose a challenge concerning the diversity and reliability of the data sets and hence need exhaustive studies.

Language: English

Citations

29
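
The abstract above mentions aggregators that combine the feature subsets chosen by different selectors through intersection, union, and majority voting. As a rough illustration only, and not the authors' implementation, these three set-level operations could be sketched in Python as follows. The function names and the metric names in the usage note are made up for the example, and the paper's own aggregator labels (ET5C2 and so on) are not reproduced here.

```python
from collections import Counter


def intersect(*subsets):
    """Keep only the features that every selector chose."""
    return set.intersection(*map(set, subsets))


def union(*subsets):
    """Keep every feature that at least one selector chose."""
    return set.union(*map(set, subsets))


def majority_vote(*subsets):
    """Keep the features chosen by more than half of the selectors."""
    # Convert each subset to a set first so a selector votes once per feature.
    counts = Counter(feature for subset in subsets for feature in set(subset))
    return {feature for feature, votes in counts.items()
            if votes > len(subsets) / 2}
```

For instance, with three hypothetical selector outputs `{"loc", "wmc", "cbo"}`, `{"wmc", "cbo", "lcom"}` and `{"cbo", "dit"}`, majority voting keeps `wmc` and `cbo`, while the intersection of all three keeps only `cbo`. Each function expects at least one subset.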

Enhanced Bug Priority Prediction via Priority-Sensitive Long Short-Term Memory–Attention Mechanism
Guang Yang, Jinfeng Ji, Jaehee Kim

et al.

Applied Sciences, Journal Year: 2025, Volume and Issue: 15(2), P. 633 - 633

Published: Jan. 10, 2025

The rapid expansion of software applications has led to an increase in the frequency of bugs, which are typically reported through user-submitted bug reports. Developers prioritize these bug reports based on severity and project schedules. However, the manual process of assigning bug priorities is time-consuming and prone to inconsistencies. To address these limitations, this study presents a Priority-Sensitive LSTM–Attention mechanism for automating bug priority prediction. The proposed approach extracts features such as product and component details from bug repositories and preprocesses the data to ensure consistency. Priority-based feature selection is applied to align the input data with the task of bug prioritization. These features are processed through a Long Short-Term Memory (LSTM) network to capture sequential dependencies, and the outputs are further refined using an Attention mechanism to focus on the most relevant information for prediction. The effectiveness of the proposed model was evaluated using datasets from the Eclipse and Mozilla open-source projects. Compared to baseline models such as Naïve Bayes, Random Forest, Decision Tree, SVM, CNN, LSTM, and CNN-LSTM, the proposed model achieved superior performance. It recorded an accuracy of 93.00% for Eclipse and 84.11% for Mozilla, representing improvements of 31.11% and 40.39%, respectively, over the baseline models. Statistical verification confirmed that these performance gains were significant. This study distinguishes itself by integrating priority-based feature selection with a hybrid LSTM–Attention architecture, which enhances prediction accuracy and robustness compared to existing methods. The results demonstrate the potential of the proposed model to streamline bug prioritization, improve project management efficiency, and assist developers in resolving high-priority issues.

Language: English

Citations

0

Software Metric Based Impact Analysis of Code Smells ‐ A Large Scale Empirical Study
Md. Masudur Rahman, Abdus Satter, Md. Mahbubul Alam Joarder

et al.

Software Practice and Experience, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 13, 2025

ABSTRACT. Context: Code smells are indicators of poor design and implementation choices that negatively affect software quality and maintainability. Moreover, it is difficult and time‐consuming to work with a long list of code smells, as not all of those smells have equal impact on the software system. So, understanding the impact of individual code smells is significant while performing refactorings on a priority basis. Objective: Despite research efforts aimed at detecting and refactoring these code smells, their impact on software metrics such as size, complexity, coupling, etc. still remains unclear. Methodology: To mitigate this gap, we present an empirical investigation of the impact of code smells based on 25 software metrics such as cyclomatic complexity. To the best of our knowledge, this is the largest study about the impact of code smells with respect to the number of metrics. Particularly for this study, we identify 13 code smells in 35 open‐source software systems, and analyze (1) the relationship between code smells and software metrics, (2) which code smells are highly impactful, and (3) which code smells occur frequently in the software systems. Results: The results show varying degrees of correlation‐based impact of specific code smells on software metrics, with some code smells showing strong correlations with multiple metrics. Three categories of impact have been identified, namely High, Moderate and Low, where Long Method, Anti Singleton, Complex Class, Large Class and Long Parameter List have high impact, but [...] frequencies except Anti Singleton; Refused Parent Bequest, Spaghetti Code and Blob have moderate impact; and the rest have low impact. We also observe that the perceptions of the impact of code smells vary from developer to developer, as they in most cases refactor code smells based on intuition. Conclusion: Our findings will help developers [...] them to be objective‐based instead of intuition‐based, [...] and improve software quality. For example, a code smell having a high impact on the coupling between objects metric can [...] objective. Furthermore, the findings will not only assist developers in prioritizing refactoring activities but also provide researchers with valuable insights to innovate tools that prioritize code smells. These tools can target [...] and thus enhance overall software quality and maintainability.

Language: English

Citations

0

A Semisupervised Learning Approach for Code Smell Detection
Ishita Kheria, Dhruv Gada, Ruhina Karani

et al.

SN Computer Science, Journal Year: 2025, Volume and Issue: 6(2)

Published: Feb. 6, 2025

Language: English

Citations

0

The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study
Zexian Zhang, Lin Zhu, Shuang Yin

et al.

Automated Software Engineering, Journal Year: 2025, Volume and Issue: 32(2)

Published: May 16, 2025

Language: English

Citations

0

Understanding Code Smell Detection via Code Review: A Study of the OpenStack Community
Xiaofeng Han, Amjed Tahir, Peng Liang

et al.

Published: May 1, 2021

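Code review plays an important role in software quality control. A typical review process would involve a careful check of a piece of code in an attempt to find defects and other quality issues/violations. One type of issue that may impact the quality of the software is code smells, i.e., bad programming practices that may lead to defects or maintenance issues. Yet, little is known about the extent to which code smells are identified during code reviews. To investigate the concept behind code smells identified in code reviews and what actions reviewers suggest and developers take in response to the identified smells, we conducted an empirical study of code smells in code reviews using the two most active OpenStack projects (Nova and Neutron). We manually checked 19,146 review comments obtained by keywords search and random selection, and got 1,190 smell-related reviews to study the causes of code smells and the actions taken against the identified smells. Our analysis found that 1) code smells were not commonly identified in code reviews, 2) smells were usually caused by violation of coding conventions, 3) reviewers usually provided constructive feedback, including fixing (refactoring) recommendations to help developers remove smells, and 4) developers generally followed those recommendations and actioned the changes. Our results suggest that 1) developers should closely follow coding conventions in their projects to avoid introducing code smells, and 2) review-based detection of code smells is perceived to be a trustworthy approach by developers, mainly because reviews are context-sensitive (as reviewers are more aware of the context of the code given that they are part of the project's development team).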

Language: English

Citations

22