Code smell severity classification at class and method level with a single manually labeled imbalanced dataset
Fábio do Rosario Santos, Júlio César Duarte, Ricardo Choren Noya

et al.

Published: Sept. 30, 2024

Detecting code smells through machine learning (ML) poses challenges due to its unbalanced nature and potential interpretation bias. While previous studies focused on severity tended to categorize the severity of specific smell types, this research aims to detect and classify code smell severity in a single dataset containing instances of four distinct types: God-class, Data-Class, Feature-Envy, and Long-Method. This study also explores the impact of applying data scaling, feature selection techniques, and ensemble methods to enhance ML models for the purpose above. The evaluation of the two combined techniques reveals that, using standardization methods, Chi-square outperforms the results of other combinations, achieving 81.04% and 81.41% accuracy with the XGBoost and CatBoost models. Additionally, the algorithm attains the highest accuracy, 80.67%, even without preprocessing. Compared with the state of the art, the results obtained by the proposed approach in detecting code smells (85%) are promising and suggest improvements to the effectiveness and reliability of approaches and techniques in real-world scenarios.

Language: English

A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique (Creative Commons)
Rajwant Singh Rao, Seema Dewangan, Alok Mishra

et al.

Scientific Reports, Journal Year: 2023, Volume and Issue: 13(1)

Published: Sept. 27, 2023

Detecting code smells may be highly helpful for reducing maintenance costs and raising source code quality. Code smells help developers or researchers understand several types of design flaws. Code smells with high severity can cause significant problems for the software and challenge the system's maintainability. It is essential to assess the severity of the code smells detected in software, as it prioritizes refactoring efforts. The class imbalance problem further increases the difficulty of code smell severity detection. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are selected to detect code smell severity. In this work, an effort is made to address the issue of class imbalance, for which the Synthetic Minority Oversampling Technique (SMOTE) class balancing technique is applied. Each dataset's relevant features are chosen using a feature selection technique based on principal component analysis. The severity of code smells is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. This study obtained a 0.99 severity accuracy score with the Random forest and Decision tree approach for the Long method code smell. The model's performance is compared based on its accuracy and three other performance measurements (Precision, Recall, and F-measure) to estimate severity classification models. The impact on performance is presented with and without applying SMOTE. The results are promising and beneficial for paving the way for further studies in this area.

Language: English

Citations: 20

An Evaluation of Multi-Label Classification Approaches for Method-Level Code Smells Detection (Creative Commons)
Pravin Singh Yadav, Rajwant Singh Rao, Alok Mishra

et al.

IEEE Access, Journal Year: 2024, Volume and Issue: 12, P. 53664 - 53676

Published: Jan. 1, 2024

(1) Background: Code smell is the most popular and reliable method for detecting potential errors in code. In real-world circumstances, a single source code may have multiple code smells. Multi-label code smell detection is a subject of research. However, limited studies are available on it, and there is a need for a standardized classifier for reliably identifying various multi-label code smells that belong to the method-level category. The primary goal of this study is to develop a rule-based approach for this task. (2) Methods: Binary Relevance, Label Powerset, and Classifier Chain methods are utilized with tree-based single-label algorithms, including some ensemble algorithms, in this paper. The chi-square feature selection technique is applied to select relevant features. The proposed model is trained using 10-fold cross-validation, with Random Search cross-validation for parameter tuning, and different performance measures are used to evaluate the model. (3) Results: The proposed model achieves 99.54% as its best Jaccard accuracy, with the Decision Tree. The Decision Tree-based approach outperforms alternative approaches to multi-label classification. Single-label classifiers produced better results after considering the correlation factor. (4) Conclusion: This study will facilitate scientists and programmers by providing a systematic method for detecting code smells in software projects, saving time and effort during code reviews by detecting multiple problems simultaneously. After detecting code smells, programmers can create more organized, easier-to-understand, and trustworthy programs.

Language: English

Citations: 6

Ensemble methods with feature selection and data balancing for improved code smells classification performance (Creative Commons)
Pravin Singh Yadav, Rajwant Singh Rao, Alok Mishra

et al.

Engineering Applications of Artificial Intelligence, Journal Year: 2024, Volume and Issue: 139, P. 109527 - 109527

Published: Oct. 28, 2024

Language: English

Citations: 4

Alleviating class imbalance in Feature Envy prediction: An oversampling technique based on code entity attributes
Jiamin Guo, Yangyang Zhao, Tao Zheng

et al.

Information and Software Technology, Journal Year: 2025, Volume and Issue: 180, P. 107673 - 107673

Published: Jan. 15, 2025

Language: English

Citations: 0

Adaptive Ensemble Learning Model-Based Binary White Shark Optimizer for Software Defect Classification (Creative Commons)
Jameel Saraireh, Mary Agoyi, Sofian Kassaymeh

et al.

International Journal of Computational Intelligence Systems, Journal Year: 2025, Volume and Issue: 18(1)

Published: Jan. 23, 2025

Language: English

Citations: 0

DeepCSS: severity classification for code smell based on deep learning
Yang Zhang, Chunhui Zhang, Kun Zheng

et al.

Empirical Software Engineering, Journal Year: 2025, Volume and Issue: 30(3)

Published: March 25, 2025

Language: English

Citations: 0

Data preprocessing for machine learning based code smell detection: A systematic literature review
Fábio do Rosario Santos, Ricardo Choren Noya

Information and Software Technology, Journal Year: 2025, Volume and Issue: unknown, P. 107752 - 107752

Published: April 1, 2025

Language: English

Citations: 0

The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study
Zexian Zhang, Lin Zhu, Shuang Yin

et al.

Automated Software Engineering, Journal Year: 2025, Volume and Issue: 32(2)

Published: May 16, 2025

Language: English

Citations: 0

CBReT: A Cluster-Based Resampling Technique for dealing with imbalanced data in code smell prediction
Praveen Singh Thakur, Mahipal Jadeja, Satyendra Singh Chouhan

et al.

Knowledge-Based Systems, Journal Year: 2024, Volume and Issue: 286, P. 111390 - 111390

Published: Jan. 21, 2024

Language: English

Citations: 3

Revisiting Code Smell Severity Prioritization using learning to rank techniques
Lei Liu, Guancheng Lin, Lin Zhu

et al.

Expert Systems with Applications, Journal Year: 2024, Volume and Issue: 249, P. 123483 - 123483

Published: Feb. 14, 2024

Language: English

Citations: 2