Lecture notes in computer science, Year: 2025, No. unknown, pp. 339-350
Published: Jan. 1, 2025
Language: English
Computers and Electronics in Agriculture, Year: 2022, No. 204, pp. 107512-107512
Published: Nov. 26, 2022
Language: English
Cited by: 43
Information Fusion, Year: 2023, No. 105, pp. 102217-102217
Published: Dec. 30, 2023
Language: English
Cited by: 36
Field Crops Research, Year: 2023, No. 292, pp. 108821-108821
Published: Jan. 23, 2023
Language: English
Cited by: 32
Advanced Engineering Informatics, Year: 2023, No. 57, pp. 102055-102055
Published: June 24, 2023
Language: English
Cited by: 25
Journal of Applied Biomedicine, Year: 2022, No. 42(2), pp. 575-595
Published: April 1, 2022
Language: English
Cited by: 35
Journal of Intelligent Information Systems, Year: 2023, No. 60(3), pp. 673-707
Published: May 16, 2023
Abstract Software defect prediction (SDP) plays a vital role in enhancing the quality of software projects and reducing maintenance-based risks through its ability to detect defective components. SDP refers to using historical data to construct a relationship between software metrics and defects via diverse methodologies. Several models, such as machine learning (ML) and deep learning (DL) models, have been developed and adopted to recognize software module defects, and many methodologies and frameworks have been presented. Class imbalance is one of the most challenging problems these models face in binary classification. When the distribution of classes is imbalanced, accuracy may be high, but the models cannot recognize instances of the minority class, leading to weak classifications. So far, little research in previous studies has addressed the problem of class imbalance in SDP. In this study, a data sampling method is introduced to improve the performance of ML and DL models. The proposed approach is based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) combined with the synthetic minority oversampling technique plus Tomek link (SMOTE Tomek) to predict software defects. To establish its efficiency, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), area under the ROC curve (AUC), area under the precision-recall curve (AUCPR), and mean square error (MSE). The results showed that the proposed models predict defects more effectively on the balanced datasets than on the original datasets, with an improvement of up to 19% for the CNN model and 24% for the GRU model in terms of AUC. We compared our proposed approach with existing approaches using several standard performance measures. The comparison demonstrated that the proposed approach significantly outperforms the state-of-the-art approaches on the datasets.
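The abstract's point that accuracy can stay high while the minority class is missed can be checked with a small worked example. The sketch below is illustrative only: the toy labels are hypothetical (not PROMISE data), label 1 is assumed to mean "defective", and the helper names are invented for this example. It computes accuracy, minority-class recall, and the Matthews correlation coefficient for a degenerate classifier that predicts every module as clean.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN), treating label 1 (defective) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def recall(y_true, y_pred):
    """Share of defective modules that were actually flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical imbalanced data: 90 clean modules (0), 10 defective modules (1).
y_true = [0] * 90 + [1] * 10
y_all_clean = [0] * 100  # degenerate classifier: never predicts a defect

print(accuracy(y_true, y_all_clean))  # 0.9
print(recall(y_true, y_all_clean))    # 0.0
print(mcc(y_true, y_all_clean))       # 0.0
```

An accuracy of 90% hides the fact that no defective module is found; recall and MCC expose it. Returning 0.0 when MCC is undefined is a convention assumed here.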
Language: English
Cited by: 19
The Science of The Total Environment, Year: 2024, No. 948, pp. 174584-174584
Published: July 6, 2024
Language: English
Cited by: 8
Cluster Computing, Year: 2023, No. 27(3), pp. 3615-3638
Published: Oct. 28, 2023
Abstract Software defects are a critical issue in software development that can lead to system failures and cause significant financial losses. Predicting software defects is a vital aspect of ensuring software quality and can significantly help in both saving time and reducing the overall cost of software testing. During the software defect prediction (SDP) process, automated tools attempt to predict defects in source code based on software metrics. Several SDP models have been proposed to identify and prevent defects before they occur. In recent years, recurrent neural network (RNN) techniques have gained attention for their ability to handle sequential data and learn complex patterns. Still, these techniques are not always suitable for predicting software defects due to the problem of imbalanced data. To deal with this problem, this study aims to combine bidirectional long short-term memory (Bi-LSTM) with oversampling techniques. To establish the effectiveness and efficiency of the proposed model, experiments were conducted on benchmark datasets obtained from the PROMISE repository. The experimental results were compared and evaluated in terms of accuracy, precision, recall, F-measure, Matthews correlation coefficient (MCC), area under the ROC curve (AUC), area under the precision-recall curve (AUCPR), and mean square error (MSE). The average accuracy of the proposed model on the original and balanced datasets (using random oversampling and SMOTE) was 88%, 94%, and 92%, respectively. The results showed that Bi-LSTM improves the average accuracy by 6% and 4% on the balanced datasets. The reported F-measure figures were 51%, 43%, and 41%. The results demonstrated that combining Bi-LSTM with oversampling techniques positively affects defect prediction performance on datasets with imbalanced class distributions.
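The abstract compares random oversampling and SMOTE as ways to balance the training data. As a rough illustration of the difference (not the authors' implementation), the sketch below assumes numeric feature vectors for the minority (defective) class: random oversampling duplicates existing minority samples, while SMOTE creates new points on the line segment between a minority sample and one of its k nearest minority neighbours. The function names, toy points, and choice of k are hypothetical.

```python
import random

def random_oversample(minority, n_new, rng):
    """Random oversampling: copy randomly chosen minority samples."""
    return [list(rng.choice(minority)) for _ in range(n_new)]

def smote(minority, n_new, k, rng):
    """Minimal SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, sorted nearest first.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors (e.g. two code metrics).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
rng = random.Random(42)
copies = random_oversample(minority, 3, rng)
new_points = smote(minority, 3, k=2, rng=rng)
print(copies)
print(new_points)
```

In practice, the imbalanced-learn library provides `RandomOverSampler`, `SMOTE`, and `SMOTETomek`; this sketch only shows the core idea and omits details such as feature scaling and resampling the training split only.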
Language: English
Cited by: 17
Software Practice and Experience, Year: 2023, No. 53(10), pp. 1902-1927
Published: June 26, 2023
Summary Machine learning-based code smell detection (CSD) has been demonstrated to be a valuable approach for improving software quality and enabling developers to identify problematic patterns in code. However, previous research has shown that the datasets commonly used to train these models are heavily imbalanced. While some recent studies have explored the use of imbalanced learning techniques for CSD, they evaluated only a limited number of techniques, and thus their conclusions about the most effective methods may be biased and inconclusive. To thoroughly evaluate the effect of imbalanced learning techniques on machine learning-based CSD, we examine 31 imbalanced learning techniques with seven classifiers to build CSD models on four data sets. We employ evaluation metrics to assess detection performance, together with the Wilcoxon signed-rank test and Cliff's δ. The results show that (1) not all imbalanced learning techniques significantly improve detection performance, but deep forest outperforms the other techniques; (2) SMOTE (Synthetic Minority Over-sampling TEchnique) is not the most effective technique for resampling code smell data sets; and (3) the best-performing top-3 imbalanced learning techniques have little time cost for detection. Therefore, we provide practical guidelines. First, researchers and practitioners should select appropriate imbalanced learning techniques (e.g., deep forest) to ameliorate the class imbalance problem. In contrast, the blind application of imbalanced learning techniques could be harmful. Then, techniques better than SMOTE should be selected to preprocess
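The study above reports effect sizes with Cliff's δ alongside the Wilcoxon signed-rank test. Cliff's δ compares every value in one sample with every value in the other: δ = (#(x > y) - #(x < y)) / (m·n), which ranges from -1 to 1. The sketch below computes it for two hypothetical lists of scores; the magnitude thresholds (0.147, 0.33, 0.474) are a convention widely used in empirical software engineering studies, assumed here rather than taken from the paper.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(delta):
    """Label |delta| using commonly cited thresholds (an assumed convention)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical per-run scores of two techniques (not results from the paper).
technique_a = [0.8, 0.7, 0.9]
technique_b = [0.6, 0.7, 0.5]
delta = cliffs_delta(technique_a, technique_b)
print(round(delta, 3), magnitude(delta))  # 0.889 large
```

Ties (0.7 vs 0.7 here) count toward neither side. For the accompanying significance test on paired samples, SciPy offers `scipy.stats.wilcoxon`.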
Language: English
Cited by: 14
Journal of Systems and Software, Year: 2023, No. 209, pp. 111934-111934
Published: Dec. 19, 2023
The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and the corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 494 studies. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize the machine learning techniques employed. We identify a comprehensive list of available datasets and tools usable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Language: English
Cited by: 14