Smart Innovation, Systems and Technologies, Year: 2025, Issue: unknown, pp. 573 - 585
Published: Jan. 1, 2025
Language: English
Applied Sciences, Year: 2022, Issue: 12(20), pp. 10321 - 10321
Published: Oct. 13, 2022
Code smells are the result of not following software engineering principles during development, especially in the design and coding phases. They lead to low maintainability. Code smell detection can help in evaluating software quality and maintainability, and many machine learning algorithms are being used to detect smells. In this study, we applied five ensemble machine learning algorithms and two deep learning algorithms. Four datasets were analyzed: the Data class, God class, Feature-envy, and Long-method datasets. In previous works, stacking applied to these datasets gave acceptable results, but there is scope for improvement. A class balancing technique (SMOTE) was used to handle the class imbalance problem, and the Chi-square feature extraction technique was used to select the more relevant features of each dataset. All algorithms obtained the highest accuracy, 100%, on one of the datasets with different selected sets of metrics; the poorest accuracy, 91.45%, was achieved by the Max voting method on the Feature-envy dataset with twelve selected metrics.
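The abstract names SMOTE for class balancing. As a rough illustration of what that step does, here is a minimal pure-Python sketch of SMOTE-style oversampling: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the use of Euclidean distance are assumptions for illustration; this is not the authors' code, and practical work would normally use a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical SMOTE-style sketch).

    minority: list of numeric feature tuples from the minority class (at least 2).
    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded so the output is reproducible
    k = min(k, len(minority) - 1)  # only len-1 other points can be neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` by Euclidean distance, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    smelly = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(smelly, 3, k=2))
```

To balance a dataset, one would generate `len(majority) - len(minority)` synthetic samples and add them, with the minority label, to the training split only, so that synthetic points never leak into the evaluation data.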
Language: English
Cited by: 42
Journal of Computer Science and Technology, Year: 2020, Issue: 35(6), pp. 1428 - 1445
Published: Nov. 1, 2020
Language: English
Cited by: 52
IEEE Access, Year: 2021, Issue: 9, pp. 162869 - 162883
Published: Jan. 1, 2021
Code smell detection helps improve the understandability and maintainability of software while reducing the chances of system failure. In this study, six machine learning algorithms have been applied to predict code smells. For this purpose, four code smell datasets (God-class, Data-class, Feature-envy, and Long-method), generated from 74 open-source systems, are considered. To evaluate the performance of the algorithms on these datasets, the 10-fold cross-validation technique is applied, which partitions the original dataset into a training set to train the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are used to enhance prediction accuracy by choosing the top metrics in each dataset for all six methods, and the results obtained with the two techniques are compared. To further improve the accuracy of the algorithms, grid search-based parameter optimization is applied. 100% accuracy was obtained for the Long-method dataset using the Logistic Regression algorithm with all features, while the worst accuracy, 95.20%, was obtained by Naive Bayes using the chi-square technique.
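The 10-fold cross-validation described in this abstract can be sketched in plain Python. This is an illustrative sketch, not the authors' setup: the helper names `k_fold_indices` and `cross_val_accuracy` and the baseline `majority_fit` are hypothetical, and the folds here are contiguous (no shuffling or stratification), whereas real pipelines such as scikit-learn's `KFold` or `StratifiedKFold` usually shuffle or stratify first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are split into k contiguous folds whose sizes differ by at most one;
    each fold serves once as the test set while the other k-1 folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(xs, ys, fit, k=10):
    """Mean test accuracy over k folds; fit(train_x, train_y) must return a predict function."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_fit(train_x, train_y):
    """Hypothetical baseline learner: always predicts the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label
```

Any learner that follows the `fit(train_x, train_y) -> predict` convention assumed here (for example, a wrapper around a Logistic Regression model) can be dropped in place of `majority_fit`; feature selection and grid search would have to run inside each training fold to avoid leaking test information.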
Language: English
Cited by: 52
Published: May 1, 2021
When developers make changes to their code, they typically run regression tests to detect whether their recent changes (re)introduce any bugs. However, many tests are flaky: their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests; hence, it is important to know which tests are flaky. The traditional approach to identifying flaky tests is to rerun them multiple times: a test observed both passing and failing on the same code is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some previously identified flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test and then predicts which tests are likely to be flaky based on similar behavioral features. FlakeFlagger correctly labeled at least as many tests as flaky as a state-of-the-art flaky test classifier, but reported far fewer false positives. This lower false positive rate translates directly into saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. Evaluated on our dataset of 23 projects, FlakeFlagger outperformed the prior approach (by F1 score) on 16 projects and tied on 4 projects. Our results indicate that this approach can be effective for identifying likely flaky tests before running time-consuming flaky test detectors.
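The rerun-based definition of flakiness in this abstract (a test observed both passing and failing on the same code is definitely flaky) can be expressed as a small check. This sketch illustrates only that traditional baseline, not FlakeFlagger's feature-based predictor; the function name `flaky_by_rerun` and the input format of (test name, code version, passed) records are assumptions.

```python
from collections import defaultdict


def flaky_by_rerun(runs):
    """Return tests observed both passing and failing on the same code version.

    `runs` is an iterable of (test_name, code_version, passed) tuples, one per execution.
    """
    outcomes = defaultdict(set)  # (test, version) -> set of observed pass/fail outcomes
    for test, version, passed in runs:
        outcomes[(test, version)].add(bool(passed))
    # A test is flagged only if one version produced both outcomes.
    return sorted({test for (test, _), seen in outcomes.items() if seen == {True, False}})
```

A test that fails on one code version and passes on another is not flagged, since the failure may be a genuine regression. Conversely, a test that never shows both outcomes is not thereby proven stable; as the abstract notes, even 10,000 reruns missed some previously identified flaky tests.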
Language: English
Cited by: 47
Knowledge-Based Systems, Year: 2022, Issue: 255, pp. 109737 - 109737
Published: Aug. 22, 2022
Language: English
Cited by: 36
Information and Software Technology, Year: 2021, Issue: 144, pp. 106783 - 106783
Published: Nov. 25, 2021
Code smells are symptoms of wrong design decisions or coding shortcuts that may increase the defect rate and decrease maintainability. Research on code smells is accelerating, focusing on code smell detection using code metrics as predictors. Recent research shows that even among software developers, agreement on what constitutes a code smell is low, yet several publications claim high performance for detection algorithms. This seems counterintuitive, considering that such algorithms should be trained on data labeled by developers. This paper aims to investigate possible reasons for the inconsistencies between studies in the performance of applied machine learning compared with developers. It focuses on the reproducibility of existing studies. A systematic literature review was performed on conference and journal articles published between 1999 and 2020 to assess the state of reproducibility of those papers. A quasi-gold standard procedure was used to validate the search. Modeling process descriptions, reproduction scripts, data sets, and the techniques used for their creation were analyzed. We obtained data from 46 publications. Of these, 22 contained a detailed description of the modeling process, 17 included some reproduction data (data set, results, or scripts), and 15 included data sets. In most publications, the analyzed projects were hand-picked by the researchers. Most studies do not include a reproduction package in the form of an online package, although this has started to change recently: 8% of publications from before 2018 included a full package, compared with 22% in the years 2018–2019. Those that do include a package usually host it on a research group website or a personal one; dedicated archives are still rarely used for packages. We recommend that researchers publish complete packages in well-established archives instead of on their own websites.
Language: English
Cited by: 36
ACM Computing Surveys, Year: 2023, Issue: 55(13s), pp. 1 - 48
Published: May 13, 2023
The accuracy reported for code smell-detecting tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells depends highly on relevant properties such as its size, severity levels, project types, the number of samples of each smell type, the total number of smells, and the ratio of smelly to non-smelly samples. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells from Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, a lack of support for severity levels, and restriction to the Java language.
Language: English
Cited by: 15
Empirical Software Engineering, Year: 2023, Issue: 28(3)
Published: May 1, 2023
Language: English
Cited by: 14
Journal of Systems and Software, Year: 2023, Issue: 209, pp. 111934 - 111934
Published: Dec. 19, 2023
Advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks, together with the corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 494 studies. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning for source code analysis is consistently increasing. We synthesize the commonly used steps and the overall workflow for each task and summarize the machine learning techniques employed. We identify a comprehensive list of available datasets and tools usable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Language: English
Cited by: 14
IEEE Access, Year: 2024, Issue: 12, pp. 53664 - 53676
Published: Jan. 1, 2024
(1) Background: Code smell detection is the most popular and reliable method for detecting potential errors in code. In real-world circumstances, a single piece of source code may have multiple code smells. Multi-label code smell detection is a research area in which only a limited number of studies are available, and there is a need for a standardized classifier that reliably identifies the various multi-label code smells belonging to the method-level category. The primary goal of this study is to develop a rule-based approach to this problem. (2) Methods: In this paper, the Binary Relevance, Label Powerset, and Classifier Chain methods are used with tree-based single-label machine learning algorithms, including some ensemble algorithms. The chi-square feature selection technique is applied to select relevant features. The proposed model is trained using 10-fold cross-validation, Random Search cross-validation is used for parameter tuning, and different performance measures are used to evaluate the model. (3) Results: The proposed model achieves its best Jaccard accuracy, 99.54%, with Decision Tree, which outperforms the alternative approaches in multi-label classification. Single-label classifiers produced better results after the correlation factor was considered. (4) Conclusion: This study will help researchers and programmers by providing a systematic approach to detecting code smells in software projects, saving time and effort during code reviews by identifying multiple problems simultaneously. After detecting code smells, developers can create more organized, easier-to-understand, and trustworthy programs.
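Binary Relevance, the simplest of the three problem transformations named in this abstract, trains one independent binary classifier per smell label; Jaccard accuracy then averages, over samples, the size of the intersection of the true and predicted label sets divided by the size of their union. A minimal sketch follows. The 1-nearest-neighbour base learner is a hypothetical stand-in for the tree-based learners used in the paper, the function names and the labels "LM"/"FE" are illustrative, and scoring two empty label sets as 1.0 is a convention chosen here rather than one taken from the source.

```python
import math


def nearest_neighbour_fit(xs, ys):
    """Hypothetical base learner: 1-nearest-neighbour binary classifier."""
    def predict(x):
        best = min(range(len(xs)), key=lambda i: math.dist(xs[i], x))
        return ys[best]
    return predict


def binary_relevance_fit(xs, label_sets, labels, fit_binary=nearest_neighbour_fit):
    """Binary Relevance: train one independent binary classifier per label."""
    models = {
        label: fit_binary(xs, [label in s for s in label_sets]) for label in labels
    }

    def predict(x):
        # The predicted label set is every label whose own classifier says "present".
        return {label for label, model in models.items() if model(x)}
    return predict


def jaccard_accuracy(true_sets, pred_sets):
    """Mean per-sample |intersection| / |union| of label sets (1.0 when both are empty)."""
    scores = []
    for y, y_hat in zip(true_sets, pred_sets):
        union = y | y_hat
        scores.append(len(y & y_hat) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

Because each label gets its own classifier, Binary Relevance ignores correlations between smells; Label Powerset and Classifier Chain, also named in the abstract, are the usual ways to take such correlations into account.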
Language: English
Cited by: 6