On the Effectiveness of Manual and Automatic Unit Test Generation: Ten Years Later
Domenico Serra, Giovanni Grano, Fabio Palomba, et al.

Published: May 1, 2019

Good unit tests play a paramount role when it comes to fostering and evaluating software quality. However, writing effective tests is an extremely costly and time-consuming practice. To reduce this burden for developers, researchers have devised ingenious techniques to automatically generate test suites for existing code bases. Nevertheless, how generated test cases fare against manually written ones remains an open research question. In 2008, Bacchelli et al. conducted an initial case study comparing automatically generated and manually written test suites. Since, in the last ten years, we have witnessed a huge amount of work on novel approaches and tools for test generation, in this paper we revise their study using current tools, as well as complement their method by evaluating these tools' ability to find regressions. Preprint [https://doi.org/10.5281/zenodo.2595232], dataset [https://doi.org/10.6084/m9.figshare.7628642].

Language: English

Scented since the beginning: On the diffuseness of test smells in automatically generated test code
Giovanni Grano, Fabio Palomba, Dario Di Nucci, et al.

Journal of Systems and Software, Journal Year: 2019, Volume and Issue: 156, P. 312 - 327

Published: July 9, 2019

Language: English

Citations: 46

RETRACTED ARTICLE: The smell of fear: on the relation between test smells and flaky tests
Fabio Palomba, Andy Zaidman

Empirical Software Engineering, Journal Year: 2019, Volume and Issue: 24(5), P. 2907 - 2946

Published: Feb. 28, 2019

Language: English

Citations: 34

Finding Safety Violations of AI-Enabled Control Systems through the Lens of Synthesized Proxy Programs
Jieke Shi, Zhou Yang, Junda He, et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 27, 2025

Given the increasing adoption of modern AI-enabled control systems, ensuring their safety and reliability has become a critical task in software testing. One prevalent approach to testing such systems is falsification, which aims to find an input signal that causes the system to violate a formal specification, using optimization algorithms. However, applying falsification poses two significant challenges: (1) it requires executing numerous candidate test inputs, which can be time-consuming, particularly for systems with AI models that have many parameters, and (2) multiple safety requirements are typically defined as a conjunctive specification, which is difficult for existing approaches to cover comprehensively. This paper introduces Synthify, a falsification framework tailored to AI-enabled control systems, i.e., systems equipped with AI controllers. Our framework performs falsification in a two-phase process. At the start, Synthify synthesizes a program that implements one or a few linear controllers to serve as a proxy for the AI controller. This proxy mimics the AI controller's functionality but is computationally more efficient. Then, Synthify employs an ε-greedy strategy to sample a promising sub-specification from the conjunctive specification. It then uses a Simulated Annealing-based falsification algorithm to find violations of the sampled sub-specification for the control system. To evaluate Synthify, we compare it to PSY-TaLiRo, a state-of-the-art and industrial-strength falsification tool, on 8 publicly available control systems. On average, Synthify achieves an 83.5% higher success rate than PSY-TaLiRo with the same budget of falsification trials. Additionally, our method is 12.8× faster in finding a single violation than the baseline. The violations found by Synthify are also more diverse than those found by the baseline, covering 137.7% more sub-specifications.
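The search strategy described in the abstract — ε-greedy sampling of one sub-specification from a conjunction, then a simulated-annealing search for an input that drives its robustness negative — can be sketched in a few lines. Everything below is invented for illustration (the toy `system`, the two sub-specifications, and all search parameters); Synthify's actual proxy synthesis and falsification pipeline is far more involved.

```python
import math
import random

# Toy stand-in for a control system: maps a scalar input u to an output y.
# The systems in the paper are closed-loop simulations with AI controllers.
def system(u):
    return math.sin(u) + 0.1 * u

# A conjunctive specification split into sub-specifications. Each returns a
# robustness value: negative means the sub-spec (hence the conjunction) is violated.
SUB_SPECS = [
    lambda y: y + 0.5,   # require y >= -0.5
    lambda y: 1.5 - y,   # require y <= 1.5
]

def falsify(epsilon=0.2, budget=500, seed=0):
    rng = random.Random(seed)
    stats = [[0.0, 0] for _ in SUB_SPECS]  # running (sum, count) of robustness
    u = rng.uniform(-10.0, 10.0)
    temp = 5.0
    best = (math.inf, None, None)  # (robustness, sub-spec index, witness input)
    for _ in range(budget):
        # epsilon-greedy: usually target the sub-spec with the lowest mean
        # robustness observed so far; otherwise explore a random one.
        if rng.random() < epsilon or any(c == 0 for _, c in stats):
            idx = rng.randrange(len(SUB_SPECS))
        else:
            idx = min(range(len(SUB_SPECS)),
                      key=lambda i: stats[i][0] / stats[i][1])
        # One simulated-annealing step on the input signal.
        cand = u + rng.gauss(0.0, temp)
        rob_u = SUB_SPECS[idx](system(u))
        rob_c = SUB_SPECS[idx](system(cand))
        witness = cand if rob_c < rob_u else u
        rob = min(rob_u, rob_c)
        if rob_c < rob_u or rng.random() < math.exp(-(rob_c - rob_u) / temp):
            u = cand  # accept the move (always downhill, sometimes uphill)
        stats[idx][0] += rob
        stats[idx][1] += 1
        if rob < best[0]:
            best = (rob, idx, witness)
        if best[0] < 0:
            break  # violation found
        temp = max(temp * 0.99, 1e-3)  # cool down
    return best
```

The bias toward the lowest-robustness sub-spec is what makes the conjunction easier to cover: effort concentrates on whichever requirement currently looks closest to being violated.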

Language: English

Citations: 0

How Do Automatically Generated Unit Tests Influence Software Maintenance?
Sina Shamshiri, José Miguel Rojas, Juan Pablo Galeotti, et al.

Published: April 1, 2018

Generating unit tests automatically saves time over writing them manually and can lead to higher code coverage. However, generated tests are usually not based on realistic scenarios and are therefore generally considered to be less readable. This places a question mark over their practical value: every time a test fails, a developer has to decide whether this failure revealed a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that generated tests are harder to read outweigh the time-savings gained by automated generation, and render them more of a hindrance than a help for software maintenance? In order to answer this question, we performed an empirical study in which participants were presented with automatically generated and manually written failing tests and were asked to identify and fix the cause of each failure. Our experiment and two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with generated tests, we found that developers are equally effective with manually written and generated tests. This has implications for how automated test generation is best used in practice, and it indicates a need for further research into improving generated tests.

Language: English

Citations: 31

Branch coverage prediction in automated testing
Giovanni Grano, Timofey V. Titov, Sebastiano Panichella, et al.

Journal of Software Evolution and Process, Journal Year: 2019, Volume and Issue: 31(9)

Published: March 8, 2019

Abstract: Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed and, moreover, new test cases should be generated for new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the CI pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes has to be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help developers take informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We use four different categories of features and assess the prediction on a large data set involving more than 3,000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact the prediction accuracy. Moreover, we extend our investigation to different search budgets. Our evaluation shows that the best model achieves an average of 0.15 and 0.21 MAE in nested cross-validation over different budgets, respectively, for EVOSUITE and RANDOOP. Finally, our discussion of the results demonstrates the relevance of coupling-related features.
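The evaluation idea above — predict a coverage value from code metrics and score the predictor by mean absolute error (MAE) under cross-validation — can be illustrated with a deliberately minimal sketch. The single-feature linear model and the toy data are assumptions made here for illustration; the paper compares several machine learning algorithms over four categories of source-code features.

```python
# Hypothetical toy data: (a single code metric, branch coverage achieved by a
# generation tool). Constructed to be exactly linear so the example is checkable.
DATA = [(x, 0.9 - 0.05 * x) for x in range(1, 13)]

def fit_linear(points):
    """Least-squares fit of y = a + b*x over (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points) or 1.0
    b = num / den
    return my - b * mx, b

def cv_mae(points, k=3):
    """Mean absolute error of the linear model under k-fold cross-validation."""
    folds = [points[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        a, b = fit_linear(train)
        for x, y in folds[i]:
            pred = min(1.0, max(0.0, a + b * x))  # coverage lives in [0, 1]
            errors.append(abs(pred - y))
    return sum(errors) / len(errors)
```

Because coverage is a proportion, an MAE of 0.15 (as reported for EVOSUITE) means predictions are off by about 15 percentage points on average.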

Language: English

Citations: 28

How Students Unit Test: Perceptions, Practices, and Pitfalls
Gina R. Bai, Justin Smith, Kathryn T. Stolee, et al.

Published: June 18, 2021

Unit testing is reported as one of the skills that graduating students lack, yet it is an essential skill for professional software developers. Understanding the challenges students face during unit testing can help inform practices for software testing education. To that end, we conduct an exploratory study to reveal students' perceptions of unit testing and the challenges they encounter when practicing it. We surveyed 54 students from two universities and gave them two testing tasks, involving black-box test design and white-box test implementation. For the tasks, we used projects from prior work on studying test-first development among students. We quantitatively analyzed the survey responses and test code properties, and qualitatively identified the mistakes and test smells in the students' test code. We further report on our experience of running this study with students.

Language: English

Citations: 21

Methods2Test
Michele Tufano, Shao Kun Deng, Neel Sundaresan, et al.

Published: May 23, 2022

Unit testing is an essential part of the software development process, which helps to identify issues with source code in early stages and prevent regressions. Machine learning has emerged as a viable approach to help developers generate automated unit tests. However, generating reliable test cases that are semantically correct and capable of catching bugs or unintended behavior via machine learning requires large, metadata-rich datasets. In this paper we present Methods2Test: a supervised dataset of test cases mapped to their corresponding methods under test (i.e., focal methods). The dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a reliable mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers' best practices in software testing, to identify the likely focal method for a given test case. To facilitate further analysis, we store rich metadata for each method-test pair in JSON-formatted files. Additionally, we extract textual corpora from the dataset at different context levels and provide both raw and tokenized forms, in order to enable researchers to train and evaluate machine learning models for Automated Test Generation. Methods2Test is publicly available at: https://github.com/microsoft/methods2test
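A common ingredient of such test-to-focal-method mapping heuristics is matching names by JUnit conventions (a test class `FooTest` likely exercises class `Foo`, a test method `testBar` likely exercises `bar`). The sketch below is a hypothetical simplification of that idea; the dataset's actual heuristic set, documented in the linked repository, is richer.

```python
import re

def focal_class(test_class: str):
    """Map a test class name to a likely focal class: StackTest / TestStack -> Stack."""
    for pattern in (r"(.+)Tests?$", r"^Test(.+)$"):
        m = re.match(pattern, test_class)
        if m:
            return m.group(1)
    return None  # no conventional test-name pattern matched

def focal_method(test_method: str):
    """Map a test method name to a likely focal method: testPushPop -> pushPop."""
    m = re.match(r"test_?(.+)$", test_method)
    if m:
        name = m.group(1)
        return name[0].lower() + name[1:]
    return None
```

Name matching alone is ambiguous (overloads, helper classes), which is why building a reliable mapping at scale was the paper's main challenge.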

Language: English

Citations: 15

Developer-centric test amplification
Carolin Brandt, Andy Zaidman

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(4)

Published: May 2, 2022

Abstract: Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important for designing a developer-centric test amplification approach, one that provides test cases developers are willing to take over into their maintained test suite. We conduct 16 semi-structured interviews with developers, supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot, generating amplified tests that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases in their familiar environment. From our interviews, we gather 52 observations, which we summarize into 23 result categories, and give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.

Language: English

Citations: 15

What Do We Know About Readability of Test Code? - A Systematic Mapping Study
Dietmar Winkler, Pirmin Urbanke, Rudolf Ramler, et al.

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Journal Year: 2022, Volume and Issue: 10, P. 1167 - 1174

Published: March 1, 2022

The readability of software code is a key success criterion for understanding and maintaining software systems and tests. In industry practice, a limited number of guidelines aim at improving and assessing the readability of (test) code. Although several studies focus on investigating the readability of source code, we observed little research work that focuses on the readability of test code. In this paper we systematically investigate the characteristics, factors, and assessment criteria that have an impact on the readability of test code. We build on a Systematic Mapping Study (SMS) to identify approaches to test code readability, legibility, and understandability that support and improve test maintenance tasks. The result set includes 16 publications for further analysis. The majority of publications cover investigations of automatically generated test code (88%), often evaluated with surveys (44%). Existing approaches look at isolated aspects; a combination of different aspects within a framework can help to better assess and justify test code readability for system maintenance.

Language: English

Citations: 14

How the Experience of Development Teams Relates to Assertion Density of Test Classes
Gemma Catolino, Fabio Palomba, Andy Zaidman, et al.

Published: Sept. 1, 2019

The impact of developers' experience on several development practices has been widely investigated in the past. One of the most promising research fields is software testing, as many researchers have found significant correlations between experience and testing effectiveness. In this paper, we aim at further studying this relation, by focusing on how development teams' experience is associated with assertion density, i.e., the number of assertions per test class KLOC, which has previously been shown to be an effective way to decrease fault density. We perform a mixed-methods empirical study. First, we devise a statistical model relating teams' experience and other control factors to the assertion density of test classes belonging to 12 software projects. This enables us to investigate whether experience comes out as a statistically significant factor to explain assertion density. Second, we contrast these findings with a survey study conducted with 57 developers, who were asked their opinions on how a developer's experience relates to the assertions they add to test code. Our results suggest the existence of a relationship: on the one hand, a team's experience is significant for the systems we have investigated; on the other hand, developers confirm the importance of team composition for the production of assertions.
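Assertion density as defined above (assertions per thousand lines of test code) is straightforward to compute. The sketch below counts common JUnit assertion calls with a regular expression over non-blank lines; both the pattern and the line-counting convention are simplifying assumptions made here, not the paper's exact measurement procedure.

```python
import re

# Matches common JUnit assertion calls, e.g. assertEquals(...), assertTrue(...).
ASSERT_CALL = re.compile(
    r"\bassert(?:Equals|True|False|Null|NotNull|Same|That|Throws)?\s*\("
)

def assertion_density(test_source: str) -> float:
    """Assertions per KLOC (thousand non-blank lines) of a test class."""
    lines = [line for line in test_source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    n_asserts = sum(len(ASSERT_CALL.findall(line)) for line in lines)
    return n_asserts / (len(lines) / 1000.0)
```

For example, a test class with 2 assertions over 7 non-blank lines has a density of about 286 assertions per KLOC; real test suites are far larger, so typical densities are much lower.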

Language: English

Citations: 20