Simulating the Effect of Test Flakiness on Fault Localization Effectiveness DOI
Béla Vancsics, Tamás Gergely, Árpád Beszédes

et al.

Published: Feb. 1, 2020

Test flakiness (the non-deterministic behavior of test cases) is an increasingly serious concern in industrial practice. However, relatively few research results are available that systematically address the analysis and mitigation of this phenomenon. The dominant approach to handling flaky tests is still to detect them and remove them from automated executions. However, some reports showed that the amount of flaky tests is in many cases so high that we should rather start working on approaches that operate in the presence of flaky tests. In this work, we investigate how flakiness affects the effectiveness of Spectrum Based Fault Localization (SBFL), a popular class of software Fault Localization (FL) methods, which heavily relies on test case execution outcomes. We performed a simulation based experiment to find out what the relationship is between the level of flakiness and fault localization effectiveness. Our results could help users of FL methods understand the implications of flakiness in this area and design novel algorithms that take flakiness into account.
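Although the listing carries only the abstract, the mechanism SBFL relies on is compact enough to sketch. The snippet below is a minimal illustration, not the authors' implementation: it scores code elements with the Ochiai formula (one common SBFL metric; the paper's exact formulas may differ) and models flakiness by flipping each test outcome with a fixed probability, so comparing the two rankings shows how flipped outcomes can displace the faulty element.

```python
import math
import random

def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    if total_failed == 0 or ef + ep == 0:
        return 0.0
    return ef / math.sqrt(total_failed * (ef + ep))

def rank_elements(coverage, outcomes):
    """coverage[t]: set of elements executed by test t; outcomes[t]: 'PASS' or 'FAIL'."""
    elements = set().union(*coverage.values())
    total_failed = sum(1 for o in outcomes.values() if o == 'FAIL')
    scores = {}
    for e in elements:
        ef = sum(1 for t, cov in coverage.items() if e in cov and outcomes[t] == 'FAIL')
        ep = sum(1 for t, cov in coverage.items() if e in cov and outcomes[t] == 'PASS')
        scores[e] = ochiai(ef, ep, total_failed)
    return sorted(scores.items(), key=lambda kv: -kv[1])

def inject_flakiness(outcomes, flip_prob, rng=random):
    """Simulate flakiness: flip each test outcome with probability flip_prob."""
    flip = {'PASS': 'FAIL', 'FAIL': 'PASS'}
    return {t: flip[o] if rng.random() < flip_prob else o for t, o in outcomes.items()}

coverage = {'t1': {'a', 'b'}, 't2': {'b', 'c'}, 't3': {'c'}}
outcomes = {'t1': 'FAIL', 't2': 'PASS', 't3': 'PASS'}   # element 'a' is the likely fault
print(rank_elements(coverage, outcomes))                # 'a' ranks first
print(rank_elements(coverage, inject_flakiness(outcomes, flip_prob=0.3)))
```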

Language: English

Machine learning techniques for code smell detection: A systematic literature review and meta-analysis DOI
Muhammad Ilyas Azeem, Fabio Palomba, Lin Shi

et al.

Information and Software Technology, Journal Year: 2019, Volume and Issue: 108, P. 115 - 138

Published: Jan. 4, 2019

Language: English

Citations

246

Comparing Heuristic and Machine Learning Approaches for Metric-Based Code Smell Detection DOI
Fabiano Pecorelli, Fabio Palomba, Dario Di Nucci

et al.

Published: May 1, 2019

Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on code maintainability and comprehensibility has been widely shown in the past, and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics, namely they compute a set of metrics and combine them to create detection rules; while they have reasonable accuracy, a recent trend is represented by the use of machine learning, where metrics are used as predictors of the smelliness of code artefacts. Despite the advances in the field, there is still a noticeable lack of knowledge on whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need for further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than the machine learning baseline, its precision is still too low to make it usable in practice.
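To make the comparison concrete, here is a minimal sketch contrasting the two detector families, assuming a hypothetical God Class rule with made-up thresholds and training data; DECOR's actual rule cards and the paper's ML configurations differ.

```python
from sklearn.ensemble import RandomForestClassifier

def heuristic_god_class(m):
    """DECOR-style heuristic: flag a class when size, complexity, and cohesion
    metrics all cross fixed cutoffs (the thresholds here are hypothetical)."""
    return m['loc'] > 500 and m['wmc'] > 47 and m['lcom'] > 0.8

# The ML alternative learns the decision boundary from labeled examples
# instead of hard-coding thresholds. Features: [loc, wmc, lcom].
X_train = [[620, 55, 0.90], [120, 8, 0.20], [800, 60, 0.85], [90, 5, 0.10]]
y_train = [1, 0, 1, 0]   # 1 = smelly (God Class), 0 = clean
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

candidate = {'loc': 540, 'wmc': 50, 'lcom': 0.82}
print('heuristic:', heuristic_god_class(candidate))
print('ml:', bool(clf.predict([[candidate['loc'], candidate['wmc'], candidate['lcom']]])[0]))
```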

Language: English

Citations

99

Not all bugs are the same: Understanding, characterizing, and classifying bug types DOI
Gemma Catolino, Fabio Palomba, Andy Zaidman

et al.

Journal of Systems and Software, Journal Year: 2019, Volume and Issue: 152, P. 165 - 181

Published: March 7, 2019

Language: English

Citations

97

A large empirical assessment of the role of data balancing in machine-learning-based code smell detection DOI
Fabiano Pecorelli, Dario Di Nucci, Coen De Roover

et al.

Journal of Systems and Software, Journal Year: 2020, Volume and Issue: 169, P. 110693 - 110693

Published: June 8, 2020

Language: English

Citations

89

A Survey of Flaky Tests DOI Open Access
Owain Parry, Gregory M. Kapfhammer, Michael Hilton

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2021, Volume and Issue: 31(1), P. 1 - 74

Published: Oct. 26, 2021

Tests that fail inconsistently, without changes to the code under test, are described as flaky. Flaky tests do not give a clear indication of the presence of software bugs and thus limit the reliability of the test suites that contain them. A recent survey of developers found that 59% claimed to deal with flaky tests on a monthly, weekly, or daily basis. As well as being detrimental to developers, flaky tests have also been shown to limit the applicability of useful techniques in software testing research. In general, one can think of flaky tests as a threat to the validity of any methodology that assumes a test's outcome only depends on the source code it covers. In this article, we systematically survey the body of literature relevant to flaky test research, amounting to 76 papers. We split our analysis into four parts: addressing the causes of flaky tests, their costs and consequences, detection strategies, and approaches for mitigation and repair. Our findings and their implications have consequences for how the software-testing community deals with test flakiness and are pertinent to practitioners, as well as of interest to those wanting to familiarize themselves with the research area.
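For readers new to the area, a constructed example of the phenomenon (not drawn from the surveyed papers) is shown below: the assertion races against a background computation, so the test can pass or fail with no change to the code under test.

```python
import random
import threading
import time

result = {}

def compute_async():
    time.sleep(random.uniform(0.05, 0.15))   # variable latency, e.g. I/O or scheduling
    result['value'] = 42

def test_async_result_is_ready():
    threading.Thread(target=compute_async).start()
    time.sleep(0.1)                          # fixed wait: sometimes too short
    assert result.get('value') == 42         # flaky: fails when the sleep loses the race
```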

Language: English

Citations

67

Assessing and restoring reproducibility of Jupyter notebooks DOI Open Access
Jiawei Wang, Tzu-yang Kuo, Li Li

et al.

Published: Dec. 21, 2020

Jupyter notebooks---documents that contain live code, equations, visualizations, and narrative text---are now among the most popular means to compute, present, discuss, and disseminate scientific findings. In principle, notebooks should easily allow readers to reproduce and extend the computations and their findings; but in practice, this is not the case. The individual code cells of a notebook can be executed in any order, with identifier usages preceding their definitions and results depending on computations from cells that appear later in the document. In a sample of 936 published notebooks that would be executable in principle, we found that 73% of them would not be reproducible with straightforward approaches, requiring humans to infer (and often guess) the order in which the authors created the cells.
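The failure mode is easy to show in miniature. The hypothetical two-cell notebook below (not from the paper's corpus) was saved in an order that differs from the order the author executed it in, visible in the bracketed execution counts, so a clean top-to-bottom re-run fails.

```python
# In [2]:  -- listed first, but executed second by the author
print(total)   # NameError on a top-to-bottom re-run: 'total' is not yet defined

# In [1]:  -- listed second, but executed first by the author
total = sum(range(10))
```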

Language: English

Citations

64

How bugs are born: a model to identify how bugs are introduced in software components DOI Creative Commons
Gema Rodríguez-Pérez, Gregório Robles, Alexander Serebrenik

et al.

Empirical Software Engineering, Journal Year: 2020, Volume and Issue: 25(2), P. 1294 - 1340

Published: Feb. 4, 2020

When identifying the origin of software bugs, many studies assume that "a bug was introduced by the lines of code that were modified to fix it". However, this assumption does not always hold, and at least in some cases these lines are not responsible for introducing the bug; for example, when the bug was caused by a change in an external API. The lack of empirical evidence makes it impossible to assess how important these cases are and, therefore, to which extent the assumption is valid. To advance in this direction and better understand how bugs "are born", we propose a model for defining criteria to identify the first snapshot of an evolving software system that exhibits a bug. This model, based on the perfect test idea, decides whether a bug is observed after a change to the software. Furthermore, we studied the model's criteria by carefully analyzing how 116 bugs were introduced in two different open source projects. The manual analysis helped us classify the root cause of those bugs and create manually curated datasets with bug-introducing changes and with bugs that were not introduced by any change in the source code. Finally, we used these datasets to evaluate the performance of four existing SZZ-based algorithms for detecting bug-introducing changes. We found that they are not very accurate, especially when multiple commits are found; the F-Score varies from 0.44 to 0.77, while the percentage of true positives does not exceed 63%. Our results show that the prevalent assumption, "a bug was introduced by the lines of code that were modified to fix it", is just one case of how bugs are introduced in a software system. Finding what introduced a bug is not trivial: bugs can be introduced by the developers and be in the code, or be created irrespective of it. Thus, further research towards understanding the origin of bugs in software projects could help to improve the design of integration tests and other procedures to make software development more robust.
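The SZZ-based algorithms evaluated in the paper share one core step: trace each line modified by a bug-fixing commit back, via git blame on the fix's parent, to the commit that last touched it, and treat those commits as bug-introducing candidates. The sketch below shows only that step and is an illustrative reconstruction, not any of the four evaluated implementations; real variants add filtering of cosmetic and non-semantic changes.

```python
import subprocess

def blame_candidates(repo, fix_commit, path, changed_lines):
    """Return the commits that last modified the given lines of `path` as they
    existed just before `fix_commit` (i.e., in its first parent)."""
    candidates = set()
    for line_no in changed_lines:
        out = subprocess.run(
            ['git', '-C', repo, 'blame', '--porcelain',
             '-L', f'{line_no},{line_no}', f'{fix_commit}^', '--', path],
            capture_output=True, text=True, check=True).stdout
        candidates.add(out.split()[0])   # first token of porcelain output is the SHA
    return candidates

# Hypothetical usage:
# blame_candidates('/path/to/repo', 'abc1234', 'src/Foo.java', [10, 42])
```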

Language: English

Citations

55

Understanding flaky tests: the developer’s perspective DOI
Moritz Eck, Fabio Palomba, Marco Castelluccio

et al.

Published: Aug. 9, 2019

Flaky tests are software tests that exhibit a seemingly random outcome (pass or fail) when run against the same, identical code. Previous work has examined fixes to flaky tests and proposed automated solutions to locate as well as fix flaky tests; we complement it by examining the perceptions of developers about the nature, relevance, and challenges of this phenomenon. We asked 21 professional developers to classify 200 flaky tests they had previously fixed, in terms of the nature of the flakiness, its origin, and the fixing effort. We complement this analysis with information about the fixing strategy. Subsequently, we conducted an online survey with 121 developers with a median industrial programming experience of five years. Our research shows that: the flakiness is due to several different causes, four of which have never been reported before, despite being among the most costly to fix; flakiness is perceived as significant by the vast majority of developers, regardless of their team's size and project's domain, and it can have effects on resource allocation, scheduling, and the perceived reliability of the test suite; and developers report that the challenges they face regard mostly the reproduction of the flaky behavior and the identification of the cause of the flakiness. Data and materials: [https://doi.org/10.5281/zenodo.3265785].

Language: English

Citations

52

On the role of data balancing for machine learning-based code smell detection DOI
Fabiano Pecorelli, Dario Di Nucci, Coen De Roover

et al.

Published: Aug. 8, 2019

Code smells can compromise software quality in the long term by inducing technical debt. For this reason, many approaches aimed at identifying these design flaws have been proposed in the last decade. Most of them are based on heuristics in which a set of metrics (e.g., code metrics, process metrics) is used to detect smelly code components. However, these techniques suffer from subjective interpretation, low agreement between detectors, and threshold dependability. To overcome these limitations, previous work applied Machine Learning techniques that learn from previous datasets without needing any threshold definition. However, more recent work has shown that Machine Learning is not always suitable for code smell detection due to the highly unbalanced nature of the problem. In this study we investigate several approaches able to mitigate data unbalancing issues to understand their impact on ML-based approaches for code smell detection. Our findings highlight a number of limitations and open issues with respect to the usage of data balancing in ML-based code smell detection.
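As an example of one balancing treatment of the kind investigated here, the sketch below oversamples the minority (smelly) class with SMOTE from imbalanced-learn on a synthetic stand-in dataset; it is not the authors' pipeline, and the study compares several such techniques.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a code smell dataset: roughly 5% smelly instances.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
print('before:', Counter(y))     # heavily skewed toward the clean class

# SMOTE synthesizes new minority-class samples so both classes end up equal in size.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print('after :', Counter(y_res))

clf = DecisionTreeClassifier(random_state=0).fit(X_res, y_res)
```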

Language: English

Citations

50

Scented since the beginning: On the diffuseness of test smells in automatically generated test code DOI
Giovanni Grano, Fabio Palomba, Dario Di Nucci

et al.

Journal of Systems and Software, Journal Year: 2019, Volume and Issue: 156, P. 312 - 327

Published: July 9, 2019

Language: English

Citations

46