Improved Flaky Test Detection with Black-Box Approach and Test Smells

David Carmo, Luísa Gonçalves, Ana Flávia Nepomuceno Dias

et al.

2022 IEEE Symposium on Computers and Communications (ISCC), Journal Year: 2023, Volume and Issue: unknown, P. 245 - 251

Published: July 9, 2023

Flaky tests can pose a challenge for software development, as they produce inconsistent results even when there are no changes to the code or test. This leads to unreliable test results and makes it difficult to diagnose and troubleshoot any issues. In this study, we aim to identify flaky test cases in software development using a black-box approach and test smells, which are indicators of test code quality problems that can cause issues during development. Our proposed model, Fast-Flaky, achieved the best cross-validation results. In per-project validation, the results showed an overall increase in accuracy but a decrease in other metrics. However, there were some projects where the results improved with pre-processing techniques. These results provide practitioners with a method for identifying flaky tests and may inspire further research on the effectiveness of different techniques and the use of additional test smells.

Language: English
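The Fast-Flaky abstract does not say which features the model consumes. As a minimal sketch, assuming a black-box, vocabulary-based representation (identifiers in the test code split into words) combined with a few test-smell counts, feature extraction for a Java test could look like the following. The smell names, the regular expressions, and the helper functions are hypothetical illustrations, not the paper's detector; real test-smell tools parse the code rather than matching patterns.

```python
import re
from collections import Counter

# Hypothetical smell patterns for Java test code. Real test-smell detectors
# parse the code; regular expressions only approximate a few smells here.
SMELL_PATTERNS = {
    "sleepy_test": re.compile(r"\bThread\.sleep\s*\("),
    "conditional_test_logic": re.compile(r"\b(?:if|for|while|switch)\s*\("),
    "system_time": re.compile(r"\bSystem\.(?:currentTimeMillis|nanoTime)\s*\("),
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase, acronyms and digits: "testCacheExpiry" -> test, Cache, Expiry
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(token: str) -> list:
    """Split camelCase / snake_case identifiers into lowercase words."""
    return [w.lower() for w in WORD.findall(token)]


def extract_features(test_source: str) -> Counter:
    """Black-box features: identifier vocabulary plus test-smell counts."""
    features = Counter()
    for token in IDENTIFIER.findall(test_source):
        for word in split_identifier(token):
            features["word:" + word] += 1
    for name, pattern in SMELL_PATTERNS.items():
        count = len(pattern.findall(test_source))
        if count:
            features["smell:" + name] = count
    return features


# Usage on a small, made-up JUnit test.
source = """
@Test
public void testCacheExpiry() throws Exception {
    cache.put("k", "v");
    Thread.sleep(100);
    assertNull(cache.get("k"));
}
"""
features = extract_features(source)
print(features["smell:sleepy_test"])  # 1
print(features["word:cache"])         # 3
```

A sparse counter like this can be vectorized and fed to any off-the-shelf classifier; no code is executed, which is what makes the representation black-box.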

FlakyCat: Predicting Flaky Tests Categories using Few-Shot Learning
Amal Akli, Guillaume Haben, Sarra Habchi

et al.

Published: May 1, 2023

Flaky tests are tests that yield different outcomes when run on the same version of a program. This non-deterministic behaviour plagues continuous integration with false signals, wasting developers' time and reducing their trust in test suites. Studies have highlighted the importance of keeping test suites flakiness-free. Recently, the research community has been pushing towards the detection of flaky tests by suggesting many static and dynamic approaches. While promising, those approaches mainly focus on classifying tests as flaky or not and, even with the high performances reported, it remains challenging to understand the cause of flakiness. This part is crucial for researchers and developers who aim to fix it. To help with the comprehension of a given flaky test, we propose FlakyCat, the first approach to classify flaky tests based on their root cause category. FlakyCat relies on CodeBERT for code representation and leverages Siamese networks to train a multi-class classifier. We evaluate FlakyCat on a set of 451 flaky tests collected from open-source Java projects. Our evaluation shows that FlakyCat categorises flaky tests accurately, with an F1 score of 73%. Furthermore, we investigate the performance of our approach for each category, revealing that Async waits, Unordered collections and Time-related flaky tests are accurately classified, while Concurrency-related flaky tests are more challenging to predict. Finally, to facilitate the comprehension of FlakyCat's predictions, we present a new technique for CodeBERT-based model interpretability that highlights the code statements influencing the categorization.

Language: English

Citations: 7
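The FlakyCat abstract describes learning code embeddings (CodeBERT plus a Siamese network) so that a classifier can work from few labelled examples per category. The sketch below shows only the final step in a simplified form: a nearest-prototype classifier over precomputed embeddings, where each category's prototype is the mean of its few labelled vectors. This is a swapped-in, prototypical-network-style stand-in, not FlakyCat's trained model, and the 3-dimensional vectors are made-up toy values in place of real encoder outputs.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the few labelled embeddings of each category."""
    protos = {}
    for category, vectors in support.items():
        dim = len(vectors[0])
        protos[category] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(embedding: Vector, protos: Dict[str, Vector]) -> str:
    """Assign the category whose prototype is most similar."""
    return max(protos, key=lambda c: cosine(embedding, protos[c]))


# Toy 3-dimensional "embeddings"; real ones would come from a trained encoder.
support = {
    "Async wait": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Unordered collections": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
    "Time": [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]],
}
protos = prototypes(support)
print(classify([0.85, 0.15, 0.05], protos))  # Async wait
print(classify([0.05, 0.2, 0.95], protos))   # Time
```

The point of training a Siamese network is to make this kind of similarity comparison meaningful: tests with the same root cause should land close together in embedding space even when only a handful of labelled examples exist.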

Quantizing Large-Language Models for Predicting Flaky Tests
Shanto Rahman, Abdelrahman Baz, Saša Misailović

et al.

Published: May 27, 2024

Language: English

Citations: 2

Test flakiness’ causes, detection, impact and responses: A multivocal review (Creative Commons)
Amjed Tahir, Shawn Rasheed, Jens Dietrich

et al.

Journal of Systems and Software, Journal Year: 2023, Volume and Issue: 206, P. 111837 - 111837

Published: Sept. 7, 2023

Flaky tests (tests with non-deterministic outcomes) pose a major challenge for software testing. They are known to cause significant issues, such as reducing the effectiveness and efficiency of testing and delaying software releases. In recent years, there has been an increased interest in flaky tests, with research focusing on different aspects of flakiness, such as identifying causes, detection methods and mitigation strategies. Test flakiness has also become a key discussion point for practitioners (in blog posts, technical magazines, etc.) as the impact of flaky tests is felt across the industry. This paper presents a multivocal review that investigates how flaky tests, as a topic, have been addressed in both research and practice. Out of 560 articles we reviewed, we identified and analysed a total of 200 articles focused on flaky tests (composed of 109 academic and 91 grey literature articles/posts) and structured the body of relevant research and knowledge using four dimensions: causes, detection, impact and responses. For each of those dimensions, we provide a categorization and classify existing research, discussions, methods and tools. With this, we provide a comprehensive and current snapshot of existing thinking on test flakiness, covering both academic views and industrial practices, and identify limitations and opportunities for future research.

Language: English

Citations: 5

Do Automatic Test Generation Tools Generate Flaky Tests? (Creative Commons)
Martin Gruber, Muhammad Firhard Roslan, Owain Parry

et al.

Published: Feb. 6, 2024

Non-deterministic test behavior, or flakiness, is common and dreaded among developers. Researchers have studied the issue and proposed approaches to mitigate it. However, the vast majority of previous work has only considered developer-written tests. The prevalence and nature of flaky tests produced by test generation tools remain largely unknown. We ask whether such tools also produce flaky tests and how these differ from developer-written ones. Furthermore, we evaluate mechanisms that suppress flaky test generation. We sample 6,356 projects written in Java or Python. For each project, we generate tests using EvoSuite (Java) and Pynguin (Python), and execute each test 200 times, looking for inconsistent outcomes. Our results show that flakiness is at least as common in generated tests as in developer-written tests. Nevertheless, the existing flakiness suppression mechanisms implemented in EvoSuite are effective in alleviating this issue (71.7% fewer flaky tests). Compared to developer-written flaky tests, the causes of generated flaky tests are distributed differently: their non-deterministic behavior is more frequently caused by randomness, rather than by networking and concurrency. Using flakiness suppression, the remaining flaky tests differ significantly from any flakiness previously reported, with most attributable to runtime optimizations and EvoSuite-internal resource thresholds. These insights, with the accompanying dataset, can help maintainers improve test generation tools, give recommendations to developers using these tools, and serve as a foundation for future research.

Language: English

Citations: 1
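Gruber et al. detect flakiness by executing each generated test 200 times and looking for inconsistent outcomes on unchanged code. A minimal sketch of that rerun oracle in Python follows; `rerun`, `is_flaky` and the two example checks are hypothetical names, and a real harness would run each execution in a fresh process (for example through pytest) rather than calling functions in a loop. The shuffle-based check mimics the randomness-driven flakiness the paper found to be common in generated tests.

```python
import random
from collections import Counter
from typing import Callable


def rerun(test: Callable[[], None], runs: int = 200) -> Counter:
    """Execute a test repeatedly and tally its outcomes."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1
        except Exception:
            outcomes["error"] += 1
    return outcomes


def is_flaky(outcomes: Counter) -> bool:
    """Flaky if the same, unchanged test produced more than one outcome."""
    return len(outcomes) > 1


# The RNG state advances between runs, so each run sees a different shuffle,
# much like a generated test that captures a random value as its oracle.
rng = random.Random(42)


def shuffled_order_check():
    items = [1, 2, 3]
    rng.shuffle(items)
    assert items == [1, 2, 3]


def sorted_order_check():
    assert sorted([3, 1, 2]) == [1, 2, 3]


print(rerun(shuffled_order_check))            # mix of "pass" and "fail"
print(is_flaky(rerun(sorted_order_check)))    # False
```

Counting outcomes rather than stopping at the first mismatch also yields a failure rate, which helps when deciding whether a suppression mechanism actually reduced flakiness.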

Can ChatGPT Repair Non-Order-Dependent Flaky Tests?
Yang Chen, Reyhaneh Jabbarvand

Published: April 14, 2024

Language: English

Citations: 1

iPFlakies: A Framework for Detecting and Fixing Python Order-Dependent Flaky Tests
Ruixin Wang, Yang Chen, Wing Lam

et al.

2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Journal Year: 2022, Volume and Issue: unknown, P. 120 - 124

Published: May 1, 2022

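Developers typically run tests after code changes. Flaky tests, which are tests that can nondeterministically pass and fail when run on the same version of code, can mislead developers about their recent changes. Much of the prior work on flaky tests is focused on Java projects. One prominent category of flaky tests is order-dependent (OD) tests, which pass or fail depending on the order in which tests are run. For example, our prior work proposed using other tests in the test suite to fix (or correctly set up) the state needed by OD tests to pass. Unlike Java projects, projects written in other programming languages have received less attention. To help with this problem, another piece of prior work recently studied flaky tests in Python projects and detected many OD tests. Unfortunately, that work did not identify other tests in the test suites that can be used to fix the OD tests. To fill this gap, we propose iPFlakies, a framework for automatically detecting and fixing Python OD tests. Using iPFlakies, we extend the prior work's dataset to include (1) tests that can be used to reproduce and fix the OD tests and (2) patches for the OD tests. Our work finds that reproducing the passing and failing results of flaky tests can be difficult and that iPFlakies is effective at detecting and fixing Python OD tests. To aid future research, we make our iPFlakies framework, dataset improvements, and experimental infrastructure publicly available.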

Language: English

Citations: 5
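To make the order-dependent (OD) idea in the iPFlakies abstract concrete, here is a toy sketch of the three roles involved: a polluter that leaks shared state, a victim that fails only when run after the polluter, and a cleaner (another test in the suite) whose code restores the state the victim needs. The function names and the tiny runner are hypothetical and are not the iPFlakies API; the real framework works on pytest suites and generates patches from cleaner code. The examples deliberately avoid the `test_` prefix so a test runner will not collect them.

```python
from typing import Callable, Dict, List

# Module-level state shared by all checks in this toy suite.
CONFIG: Dict[str, str] = {}


def polluter():
    CONFIG["mode"] = "debug"  # leaks state: never restored
    assert CONFIG["mode"] == "debug"


def victim():
    assert "mode" not in CONFIG  # assumes a pristine configuration


def cleaner():
    CONFIG.pop("mode", None)  # resets the state before checking defaults
    assert CONFIG.get("mode", "release") == "release"


def run_in_order(tests: List[Callable[[], None]]) -> Dict[str, str]:
    """Run tests in the given order, starting from fresh state."""
    CONFIG.clear()
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = "pass"
        except AssertionError:
            results[test.__name__] = "fail"
    return results


def is_order_dependent(victim_test, polluter_test) -> bool:
    """The victim passes alone but fails right after the polluter."""
    alone = run_in_order([victim_test])[victim_test.__name__]
    after = run_in_order([polluter_test, victim_test])[victim_test.__name__]
    return alone == "pass" and after == "fail"


def find_cleaners(victim_test, polluter_test, candidates) -> List[str]:
    """Candidate tests that, run in between, make the victim pass again."""
    return [
        c.__name__
        for c in candidates
        if run_in_order([polluter_test, c, victim_test])[victim_test.__name__] == "pass"
    ]


print(is_order_dependent(victim, polluter))                # True
print(find_cleaners(victim, polluter, [polluter, cleaner]))  # ['cleaner']
```

Once a cleaner is known, a patch can be as simple as running the cleaner's state-resetting statements in the victim's setup, which is the intuition behind using other tests in the suite to fix OD tests.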

Test Code Flakiness in Mobile Apps: The Developer's Perspective
Valeria Pontillo, Fabio Palomba, Filomena Ferrucci

et al.

Published: Jan. 1, 2023

Context: Test flakiness arises when test cases have a non-deterministic, intermittent behavior that leads them to either pass or fail when run against the same code. While researchers have been contributing to the detection, classification, and removal of flaky tests with several empirical studies and automated techniques, little is known about how the problem of test flakiness arises in mobile applications.
Objective: We point out a lack of knowledge on: (1) the prominence and harmfulness of the problem; (2) the most frequent root causes inducing flakiness; and (3) the strategies applied by practitioners to deal with it in practice. An improved understanding of these matters may lead the software engineering research community to assess the need for tailoring existing instruments to the mobile context or for brand-new approaches that focus on the peculiarities identified.
Method: We address this gap by means of an empirical study into the developer's perception of test flakiness. We first perform a systematic grey literature review to elicit how developers discuss and deal with the problem of test flakiness in the wild. Then, we complement the systematic review through a survey study that involves 130 mobile developers and that aims at analyzing their experience on the matter.
Result: The results of the grey literature review indicate that developers are often concerned with flakiness connected to user interface elements. In addition, our survey study reveals that flaky tests are perceived as critical by mobile developers, who pointed out production code- and source code design-related root causes of flakiness as major concerns, in addition to the long-term effects of recurrent flaky tests. Furthermore, our study lets the diagnosing and fixing processes currently adopted by developers and their limitations emerge.
Conclusion: We conclude by distilling lessons learned, implications, and future research directions.

Language: English

Citations: 1

Test Code Flakiness in Mobile Apps: The Developer’s Perspective (Creative Commons)
Valeria Pontillo, Fabio Palomba, Filomena Ferrucci

et al.

Information and Software Technology, Journal Year: 2024, Volume and Issue: 168, P. 107394 - 107394

Published: Jan. 6, 2024

Test flakiness arises when test cases have a non-deterministic, intermittent behavior that leads them to either pass or fail when run against the same code. While researchers have been contributing to the detection, classification, and removal of flaky tests with several empirical studies and automated techniques, little is known about how the problem of test flakiness arises in mobile applications. We point out a lack of knowledge on: (1) the prominence and harmfulness of the problem; (2) the most frequent root causes inducing flakiness; and (3) the strategies applied by practitioners to deal with it in practice. An improved understanding of these matters may lead the software engineering research community to assess the need for tailoring existing instruments to the mobile context or for brand-new approaches that focus on the peculiarities identified. We address this gap by means of an empirical study into the developer's perception of test flakiness. We first perform a systematic grey literature review to elicit how developers discuss and deal with the problem of test flakiness in the wild. Then, we complement the systematic review through a survey study that involves 130 mobile developers and that aims at analyzing their experience on the matter. The results of the grey literature review indicate that developers are often concerned with flakiness connected to user interface elements. In addition, our survey study reveals that flaky tests are perceived as critical by mobile developers, who pointed out production code- and source code design-related root causes of flakiness as major concerns, in addition to the long-term effects of recurrent flaky tests. Furthermore, our study lets the diagnosing and fixing processes currently adopted by developers and their limitations emerge. We conclude by distilling lessons learned, implications, and future research directions.

Language: English

Citations: 0

Neurosymbolic Repair of Test Flakiness
Yang Chen, Reyhaneh Jabbarvand

Published: Sept. 11, 2024

Language: English

Citations: 0

Non-Flaky and Nearly-Optimal Time-based Treatment of Asynchronous Wait Web Tests
Yu Pei, Jeongju Sohn, Sarra Habchi

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2024, Volume and Issue: unknown

Published: Sept. 13, 2024

Asynchronous waits are a common root cause of flaky tests and a major time-influential factor in web application testing. We build a dataset of 49 reproducible asynchronous wait flaky tests and their fixes from 26 open-source projects to study their characteristics in web testing. Our study reveals that developers adjusted wait times to address asynchronous wait flakiness in about 63% of cases (31 out of 49), even when the underlying causes lie elsewhere. From this, we introduce TRaf, an automated time-based repair for asynchronous wait flakiness in web applications. TRaf determines appropriate wait times for asynchronous calls in web applications by analyzing code similarity and past change history. Its key insight is that efficient wait times can be inferred from the current or past codebase, since developers tend to repeat similar mistakes. Our analysis shows that TRaf can statically suggest a shorter wait time to alleviate async wait flakiness immediately upon detection, reducing test execution time by 11.1% compared to the timeout values initially chosen by developers. With optional dynamic tuning, TRaf can reduce the wait time by 16.8% in its initial refinement compared to developer-written patches, and by 6.2% compared to post-refinements of these original patches. Overall, we sent 16 pull requests from our dataset, each fixing one flaky test, to the developers. So far, three have been accepted.

Language: English

Citations: 0
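The TRaf abstract concerns asynchronous wait flakiness: a test waits a fixed time for an asynchronous operation, and passes or fails depending on whether the operation happens to finish within that window. TRaf keeps time-based waits and infers better wait values from code similarity and change history; the sketch below instead illustrates the underlying problem with a common alternative, a condition-polling wait with a timeout. `wait_until` is a hypothetical helper, not part of TRaf or of any browser-automation library, and the simulated page load is an assumption for illustration.

```python
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float,
               poll_interval: float = 0.01) -> bool:
    """Poll `condition` until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_interval, remaining))


# Simulated asynchronous page load that finishes after about 50 ms.
page_loaded = threading.Event()
threading.Timer(0.05, page_loaded.set).start()

# A fixed wait such as time.sleep(0.01) followed by a check would fail here
# whenever the load takes longer than 10 ms; polling waits only as long as needed.
print(wait_until(page_loaded.is_set, timeout=2.0))
```

The trade-off mirrors the one TRaf targets: a fixed wait that is too short is flaky, one that is too long wastes execution time, and the timeout in a polling wait still has to be chosen with the same care.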