Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation DOI
Giovanni Grano, Christoph Laaber, Annibale Panichella

et al.

IEEE Transactions on Software Engineering, Journal Year: 2019, Volume and Issue: 47(11), P. 2332 - 2347

Published: Oct. 11, 2019

Automated test case generation is an effective technique to yield high-coverage test suites. While the majority of research effort has been devoted to satisfying coverage criteria, a recent trend has emerged towards optimizing other, non-coverage aspects. In this regard, runtime and memory usage are two essential dimensions: less expensive tests reduce the resource demands of the generation process and of later regression testing phases. This study shows that performance-aware test case generation requires solving two main challenges: providing a good approximation of execution costs with minimal overhead, and avoiding detrimental effects on both final coverage and fault detection effectiveness. To tackle these challenges, we conceived a set of performance proxies -- inspired by previous work -- that provide a reasonable estimation of a test's execution costs (i.e., runtime and memory usage). On top of these proxies, we propose an adaptive strategy, called aDynaMOSA, which leverages them by extending DynaMOSA, a state-of-the-art evolutionary algorithm for unit testing. Our empirical study, involving 110 non-trivial Java classes, reveals that our approach generates test suites with statistically significant improvements in runtime (-25%) and heap memory consumption (-15%) compared to DynaMOSA. Additionally, aDynaMOSA achieves comparable results to DynaMOSA over seven different coverage criteria and similar fault detection effectiveness. Our investigation also highlights that the performance proxies alone (i.e., without adaptiveness) are not sufficient to generate more performant test cases without compromising overall coverage.
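The interplay between coverage goals and performance proxies described above can be illustrated with a small sketch. This is not the authors' implementation: the `TestCase` structure, the proxy fields, and the weights are hypothetical, and the snippet only shows how proxy costs might act as a secondary objective once coverage is equal.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    covered_branches: set     # branches reached by this test
    runtime_ms: float         # hypothetical runtime proxy value
    heap_kb: float            # hypothetical memory proxy value

def proxy_cost(t: TestCase, w_runtime: float = 0.5, w_heap: float = 0.5) -> float:
    """Aggregate hypothetical performance proxies into a single cost (lower is better)."""
    return w_runtime * t.runtime_ms + w_heap * t.heap_kb

def pick_cheaper(candidates: list[TestCase]) -> TestCase:
    """Prefer maximal coverage first; use the proxy cost only as a tie-breaker."""
    best_cov = max(len(t.covered_branches) for t in candidates)
    top = [t for t in candidates if len(t.covered_branches) == best_cov]
    return min(top, key=proxy_cost)
```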

Language: English

Scented since the beginning: On the diffuseness of test smells in automatically generated test code DOI
Giovanni Grano, Fabio Palomba, Dario Di Nucci

et al.

Journal of Systems and Software, Journal Year: 2019, Volume and Issue: 156, P. 312 - 327

Published: July 9, 2019

Language: English

Citations

46

RETRACTED ARTICLE: The smell of fear: on the relation between test smells and flaky tests DOI
Fabio Palomba, Andy Zaidman

Empirical Software Engineering, Journal Year: 2019, Volume and Issue: 24(5), P. 2907 - 2946

Published: Feb. 28, 2019

Language: English

Citations

34

How Do Automatically Generated Unit Tests Influence Software Maintenance? DOI
Sina Shamshiri, José Miguel Rojas, Juan Pablo Galeotti

et al.

Published: April 1, 2018

Generating unit tests automatically saves time over writing them manually and can lead to higher code coverage. However, generated tests are usually not based on realistic scenarios and are therefore generally considered to be less readable. This places a question mark over their practical value: every time a test fails, a developer has to decide whether this failure revealed a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that generated tests are harder to read outweigh the time savings gained by automated generation, and render them more of a hindrance than a help for software maintenance? In order to answer this question, we performed an empirical study in which participants were presented with manually written and automatically generated failing tests and asked to identify and fix the cause of the failure. Our experiment and two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with generated tests, we found that developers are equally effective with generated and manually written tests. This has implications for how automated test generation is best used in practice, and it indicates the need for further research into the readability of generated tests.

Language: English

Citations

31

Branch coverage prediction in automated testing DOI
Giovanni Grano, Timofey V. Titov, Sebastiano Panichella

et al.

Journal of Software Evolution and Process, Journal Year: 2019, Volume and Issue: 31(9)

Published: March 8, 2019

Abstract Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed, and moreover, new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the CI pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes has to be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help developers take informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the branch coverage achieved by test-data generation tools. We use four different categories of features to assess the prediction on a large data set involving more than 3'000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact the prediction accuracy. Moreover, we extend our investigation to different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 in nested cross-validation over the different budgets for EVOSUITE and RANDOOP, respectively. Finally, the discussion of the results demonstrates the relevance of coupling-related features.
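Framed as a regression task, the prediction setup described above can be sketched with a standard learning pipeline. The metric columns and the tiny synthetic data set below are placeholders, not the paper's actual feature set or results; the sketch only shows how per-class source-code metrics could be mapped to an achievable-coverage estimate evaluated with MAE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical per-class metrics: [LOC, cyclomatic complexity, #branches, coupling]
X = np.array([[120, 14, 30, 5],
              [450, 48, 110, 21],
              [60, 6, 12, 2],
              [300, 25, 70, 9]])
# Branch coverage achieved by a test generation tool on each class (0.0 - 1.0)
y = np.array([0.85, 0.40, 0.95, 0.60])

model = RandomForestRegressor(n_estimators=100, random_state=0)
# scikit-learn reports MAE negated, so flip the sign for readability
mae = -cross_val_score(model, X, y, cv=2, scoring="neg_mean_absolute_error").mean()
print(f"estimated MAE: {mae:.2f}")
```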

Language: English

Citations

28

How Students Unit Test: Perceptions, Practices, and Pitfalls DOI
Gina R. Bai, Justin Smith, Kathryn T. Stolee

et al.

Published: June 18, 2021

Unit testing is reported as one of the skills that graduating students lack, yet it is an essential skill for professional software developers. Understanding the challenges students face during unit testing can help inform testing practices and software engineering education. To that end, we conduct an exploratory study to reveal students' perceptions of unit testing and the challenges they encounter when practicing it. We surveyed 54 students from two universities and gave them two testing tasks, involving black-box test design and white-box test implementation. For the tasks we used projects from prior work on studying test-first development among students. We quantitatively analyzed the survey responses and test code properties, and qualitatively identified the mistakes and test smells in the students' test code. We further report on our experience running this study with students.

Language: English

Citations

21

Methods2Test DOI Open Access
Michele Tufano, Shao Kun Deng, Neel Sundaresan

et al.

Published: May 23, 2022

Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages and prevent regressions. Machine learning has emerged as a viable approach to help developers generate automated unit tests. However, generating reliable test cases that are semantically correct and capable of catching bugs or unintended behavior via machine learning requires large, metadata-rich datasets. In this paper we present Methods2Test: a supervised dataset of test cases mapped to their corresponding methods under test (i.e., focal methods). The dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a reliable mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers' best practices in software testing, which identify the likely focal method for a given test case. To facilitate further analysis, we store rich metadata for each method-test pair in JSON-formatted files. Additionally, we extract a textual corpus from the dataset at different context levels, which we provide in both raw and tokenized forms, in order to enable researchers to train and evaluate machine learning models for Automated Test Generation. The dataset is publicly available at: https://github.com/microsoft/methods2test
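Since each method-test pair is stored as JSON metadata, iterating over the corpus can be sketched as below. The directory layout and the field names used here are illustrative assumptions, not the published schema; the repository linked above documents the actual format.

```python
import json
from pathlib import Path

def load_pairs(root: str):
    """Yield (focal_method_source, test_case_source) pairs from JSON metadata files.

    The 'focal_method'/'test_case' keys and the directory layout are assumed
    for illustration only; consult the Methods2Test repository for the real schema.
    """
    for path in Path(root).rglob("*.json"):
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        yield record["focal_method"]["body"], record["test_case"]["body"]

# Example usage (hypothetical local checkout path):
# for focal, test in load_pairs("methods2test/dataset"):
#     print(len(focal), len(test))
```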

Language: English

Citations

15

Developer-centric test amplification DOI Creative Commons
Carolin Brandt, Andy Zaidman

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(4)

Published: May 2, 2022

Abstract Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with their test suite by amplifying existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important to design a developer-centric test amplification approach, i.e., one that provides amplified tests that developers are willing to take over into their test suite. We conduct 16 semi-structured interviews with software developers, supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot, making its amplified tests easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified tests in their familiar environment. From the interviews, we gather 52 observations that we summarize into 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.

Language: English

Citations

15

Finding Safety Violations of AI-Enabled Control Systems through the Lens of Synthesized Proxy Programs DOI Open Access
Jieke Shi, Zhou Yang, Junda He

et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 27, 2025

Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing these systems is falsification, which aims to find an input signal that causes the system to violate a formal safety specification using optimization algorithms. However, applying falsification to AI-enabled control systems poses two significant challenges: (1) it requires the system to execute numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple safety requirements are typically defined as a conjunctive specification, which is difficult for existing approaches to cover comprehensively. This paper introduces Synthify, a falsification framework tailored to AI-enabled control systems, i.e., systems equipped with AI controllers. Our framework performs falsification in a two-phase process. At the start, Synthify synthesizes a program that implements one or a few linear controllers to serve as a proxy for the AI controller. This proxy mimics the AI controller's functionality but is computationally more efficient. Then, Synthify employs an ε-greedy strategy to sample a promising sub-specification from the conjunctive specification. It then uses a Simulated Annealing-based falsification algorithm to find violations of the sampled sub-specification in the control system. To evaluate Synthify, we compare it to PSY-TaLiRo, a state-of-the-art and industrial-strength falsification tool, on 8 publicly available control systems. On average, Synthify achieves an 83.5% higher success rate compared to PSY-TaLiRo with the same budget of falsification trials. Additionally, our method is 12.8× faster at finding a single safety violation than the baseline. The violations found by Synthify are also more diverse than those found by the baseline, covering 137.7% more sub-specifications.
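The two sampling ingredients mentioned in the abstract, ε-greedy sub-specification selection and simulated annealing over candidate inputs, can be sketched generically as below. The robustness function, signal encoding, and all parameters are placeholders rather than Synthify's actual implementation; by convention a negative robustness value is read as a safety violation.

```python
import math
import random

def pick_subspec(scores: list[float], epsilon: float = 0.1) -> int:
    """ε-greedy choice of a sub-specification index: mostly exploit the most
    promising one (lowest robustness observed so far), occasionally explore."""
    if random.random() < epsilon:
        return random.randrange(len(scores))
    return min(range(len(scores)), key=lambda i: scores[i])

def simulated_annealing(robustness, dim: int, steps: int = 1000, temp: float = 1.0):
    """Minimise a robustness function over a fixed-length input signal;
    robustness < 0 is treated as a falsifying (violating) input."""
    x = [random.uniform(-1, 1) for _ in range(dim)]
    best, best_val = x, robustness(x)
    for k in range(steps):
        t = temp * (1 - k / steps) + 1e-9
        cand = [xi + random.gauss(0, 0.1) for xi in x]       # local perturbation
        delta = robustness(cand) - robustness(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = cand                                          # accept move
        if robustness(x) < best_val:
            best, best_val = x, robustness(x)
        if best_val < 0:                                      # violation found
            break
    return best, best_val
```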

Language: English

Citations

0

REACCEPT: Automated Co-evolution of Production and Test Code Based on Dynamic Validation and Large Language Models DOI
Jianlei Chi, X. Wang, Yuhan Huang

et al.

Proceedings of the ACM on software engineering., Journal Year: 2025, Volume and Issue: 2(ISSTA), P. 1234 - 1256

Published: June 22, 2025

Synchronizing production and test code, known as PT co-evolution, is critical for software quality. Given the significant manual effort involved, researchers have tried to automate PT co-evolution using predefined heuristics and machine learning models. However, existing solutions are still incomplete. Most approaches only detect and flag obsolete test cases, leaving developers to manually update them. Meanwhile, they may suffer from low accuracy, especially when applied to real-world projects. In this paper, we propose ReAccept, a novel approach leveraging large language models (LLMs), retrieval-augmented generation (RAG), and dynamic validation to fully automate PT co-evolution with high accuracy. ReAccept employs experience-guided prompt generation to produce prompt templates for obsolete test identification and the subsequent update processes. After updating a test case, ReAccept performs dynamic validation by checking syntax, verifying semantics, and assessing coverage. If validation fails, ReAccept leverages the error messages to iteratively refine the patch. To evaluate ReAccept's effectiveness, we conducted extensive experiments on a dataset of 537 Java projects and compared its performance with several state-of-the-art methods. The evaluation results show that ReAccept achieved an accuracy of 60.16% on correctly identified obsolete test cases, surpassing the state-of-the-art technique CEPROT by 90%. These findings demonstrate that ReAccept can effectively maintain test cases, improve overall software quality, and significantly reduce maintenance effort.
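The iterative update-and-validate loop described above can be sketched generically. The `llm_update`, `compiles`, and `tests_pass` callables are hypothetical placeholders for an LLM call and the syntax/semantic checks; nothing here reflects ReAccept's actual prompts, tooling, or interfaces.

```python
from typing import Callable, Optional

def co_evolve(
    test_code: str,
    prod_diff: str,
    llm_update: Callable[[str, str, str], str],     # hypothetical LLM call: (test, diff, feedback) -> new test
    compiles: Callable[[str], tuple[bool, str]],    # hypothetical syntax check: src -> (ok, error message)
    tests_pass: Callable[[str], tuple[bool, str]],  # hypothetical semantic check: src -> (ok, failure message)
    max_rounds: int = 3,
) -> Optional[str]:
    """Repeatedly ask an LLM to update an obsolete test, validating each attempt
    and feeding validation errors back into the next prompt."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_update(test_code, prod_diff, feedback)
        ok, err = compiles(candidate)
        if not ok:
            feedback = f"compilation error: {err}"  # refine with syntax feedback
            continue
        ok, err = tests_pass(candidate)
        if ok:
            return candidate                        # validated update
        feedback = f"test failure: {err}"           # refine with semantic feedback
    return None                                     # give up after max_rounds
```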

Language: English

Citations

0

What Do We Know About Readability of Test Code? - A Systematic Mapping Study DOI
Dietmar Winkler, Pirmin Urbanke, Rudolf Ramler

et al.

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Journal Year: 2022, Volume and Issue: 10, P. 1167 - 1174

Published: March 1, 2022

The readability of software code is a key success criterion for understanding and maintaining software systems and their tests. In industry practice, a limited number of guidelines aim at improving and assessing the readability of (test) code. Although several studies focus on investigating the readability of source code, we observed little research work that focuses on the readability of test code. In this paper we systematically investigate the characteristics, factors, and assessment criteria that have an impact on the readability of test code. We build on a Systematic Mapping Study (SMS) to identify work on readability, legibility, and understandability that supports improving test code and maintenance tasks. The result set includes 16 publications for further analysis. The majority of publications investigate automatically generated test code (88%), often evaluated with surveys (44%). Existing approaches look at isolated aspects; a combination of the different aspects within a framework can help to better assess and justify test code readability for system maintenance.

Language: English

Citations

14