Testing with Fewer Resources: An Adaptive Approach to Performance-Aware Test Case Generation
Giovanni Grano, Christoph Laaber, Annibale Panichella

et al.

IEEE Transactions on Software Engineering, Journal Year: 2019, Volume and Issue: 47(11), P. 2332 - 2347

Published: Oct. 11, 2019

Automated test case generation is an effective technique to yield high-coverage test suites. While the majority of research effort has been devoted to satisfying coverage criteria, a recent trend has emerged towards optimizing other non-coverage aspects. In this regard, runtime and memory usage are two essential dimensions: less expensive tests reduce the resource demands for the generation process and for later regression testing phases. This study shows that performance-aware test case generation requires solving two main challenges: providing a good approximation of resource usage with minimal overhead, and avoiding detrimental effects on both final coverage and fault detection effectiveness. To tackle these challenges, we conceived a set of performance proxies, inspired by previous work on performance testing, that provide a reasonable estimation of test execution costs (i.e., runtime and memory usage). Thus, we propose an adaptive strategy, called aDynaMOSA, which leverages these proxies by extending DynaMOSA, a state-of-the-art evolutionary algorithm in unit testing. Our empirical study involving 110 non-trivial Java classes reveals that our adaptive approach generates test suites with statistically significant improvements in runtime (-25%) and heap memory consumption (-15%) compared to DynaMOSA. Additionally, aDynaMOSA has comparable results to DynaMOSA over seven different coverage criteria and similar fault detection effectiveness. Our empirical investigation also highlights that the usage of performance proxies alone (i.e., without adaptiveness) is not sufficient to generate more performant test cases without compromising the overall coverage.

Language: English

Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets
Annibale Panichella, Fitsum Meshesha Kifetew, Paolo Tonella

et al.

IEEE Transactions on Software Engineering, Journal Year: 2017, Volume and Issue: 44(2), P. 122 - 158

Published: Feb. 7, 2017

Test case generation is intrinsically a multi-objective problem, since the goal is covering multiple test targets (e.g., branches). Existing search-based approaches either consider one target at a time or aggregate all targets into a single fitness function (whole-suite approach). Multi- and many-objective optimisation algorithms (MOAs) have never been applied to this problem, because existing algorithms do not scale to the number of coverage objectives that are typically found in real-world software. In addition, the final goal for MOAs is to find alternative trade-off solutions in the objective space, while in test generation the interesting solutions are only those test cases covering one or more uncovered targets. In this paper, we present the Dynamic Many-Objective Sorting Algorithm (DynaMOSA), a novel many-objective solver specifically designed to address the test case generation problem in the context of coverage testing. DynaMOSA extends our previous many-objective technique (MOSA) with dynamic selection of the coverage targets based on the control dependency hierarchy. Such an extension makes the approach more effective and efficient in the case of a limited search budget. We carried out an empirical study on 346 Java classes using three coverage criteria (i.e., statement, branch, and strong mutation coverage) to assess the performance of DynaMOSA with respect to the whole-suite approach (WS), its archive-based variant (WSA), and MOSA. The results show that DynaMOSA outperforms WSA in 28 percent of the classes for branch coverage (+8 percent more coverage on average) and in 27 percent of the classes for mutation coverage (+11 percent more killed mutants on average). It outperforms WS in 51 percent of the classes for statement coverage, leading to +11 percent more coverage on average. Moreover, DynaMOSA outperforms its predecessor MOSA in 19 percent of the classes, with +8 percent more code coverage on average.

Language: English

Citations

283

ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation
Yutian Tang, Zhijie Liu, Zhichao Zhou

et al.

IEEE Transactions on Software Engineering, Journal Year: 2024, Volume and Issue: 50(6), P. 1340 - 1359

Published: March 29, 2024

Recent advancements in large language models (LLMs) have demonstrated exceptional success in a wide range of general domain tasks, such as question answering and following instructions. Moreover, LLMs have shown potential in various software engineering applications. In this study, we present a systematic comparison of test suites generated by the ChatGPT LLM and the state-of-the-art SBST tool EvoSuite. Our comparison is based on several critical factors, including correctness, readability, code coverage, and bug detection capability. By highlighting the strengths and weaknesses of LLMs (specifically ChatGPT) in generating unit test cases compared to EvoSuite, this work provides valuable insights into the performance of LLMs in solving software engineering problems. Overall, our findings underscore the potential of LLMs in software engineering and pave the way for further research in this area.

Language: English

Citations

22

An Industrial Evaluation of Unit Test Generation: Finding Real Faults in a Financial Application

M. Almasi, Hadi Hemmati, Gordon Fraser

et al.

Published: May 1, 2017

Automated unit test generation has been extensively studied in the literature in recent years. Previous studies on open source systems have shown that test generation tools are quite effective at detecting faults, but how effective and applicable are they in an industrial application? In this paper, we investigate this question using a life insurance and pension products calculator engine owned by SEB Life & Pension Holding AB Riga Branch. To study fault-finding effectiveness, we extracted 25 real faults from the version history of the software project, and applied two up-to-date unit test generation tools for Java, EVOSUITE and RANDOOP, which implement search-based and feedback-directed random test generation, respectively. Automatically generated test suites detected up to 56.40% (EVOSUITE) and 38.00% (RANDOOP) of these faults. The analysis of our results demonstrates challenges that need to be addressed in order to improve fault detection in test generation tools. In particular, classification of the undetected faults shows that 97.62% of them depend on either "specific primitive values" (50.00%) or the construction of "complex state configuration of objects" (47.62%). To study applicability, we surveyed the developers of the application under test on their experience and opinions about the test generation tools and the generated test cases. This leads to insights on requirements for academic prototypes for successful technology transfer from research to practice, such as the need to integrate with popular build tools and to improve the readability of the generated tests.

Language: English

Citations

136

An empirical evaluation of evolutionary algorithms for unit test suite generation (Creative Commons)
José Campos, Yan Ge, Nasser Albunian

et al.

Information and Software Technology, Journal Year: 2018, Volume and Issue: 104, P. 207 - 235

Published: Aug. 22, 2018

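Evolutionary algorithms have been shown to be effective at generating unit test suites optimised for code coverage. While many specific aspects of these algorithms have been evaluated in detail (e.g., test length and different kinds of techniques aimed at improving performance, like seeding), the influence of the choice of evolutionary algorithm has to date seen less attention in the literature. Since it is theoretically impossible to design an algorithm that is the best on all possible problems, a common approach in software engineering problems is to first try the most common algorithm, a genetic algorithm, and only afterwards try to refine it or compare it with other algorithms to see if any of them is more suited to the addressed problem. The objective of this paper is to perform this analysis, in order to shed light on the influence of the search algorithm applied for unit test generation. We empirically evaluate thirteen different evolutionary algorithms and two random approaches on a selection of non-trivial open source classes. All algorithms are implemented in the EvoSuite test generation tool, which includes recent optimisations such as the use of an archive during the search and optimisation for multiple coverage criteria. Our study shows that the use of a test archive makes evolutionary algorithms clearly better than random testing, and it confirms that the DynaMOSA many-objective search algorithm is the most effective algorithm for unit test generation. Our results show that the choice of algorithm can have a substantial influence on the performance of whole test suite optimisation. Although we can make a recommendation on which algorithm to use in practice, no algorithm is clearly superior in all cases, suggesting future work on improved search algorithms for unit test generation.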

Language: English

Citations

79

A large scale empirical comparison of state-of-the-art search-based test case generators
Annibale Panichella, Fitsum Meshesha Kifetew, Paolo Tonella

et al.

Information and Software Technology, Journal Year: 2018, Volume and Issue: 104, P. 236 - 256

Published: Aug. 21, 2018

Language: English

Citations

56

Generating unit tests with descriptive names or: would you name your children thing1 and thing2? (Open Access)
Ermira Daka, José Miguel Rojas, Gordon Fraser

et al.

Published: July 10, 2017

The name of a unit test helps developers to understand the purpose and scenario of the test, and test names support developers when navigating amongst sets of unit tests. When unit tests are generated automatically, however, they tend to be given non-descriptive names such as "test0", which provide none of the benefits a descriptive name can give a test. The underlying challenge is that automatically generated tests typically do not represent real scenarios and have no clear purpose other than covering code, which makes naming them difficult. In this paper, we present an automated approach which generates descriptive names for automatically generated unit tests by summarizing API-level coverage goals. The names are optimized to be short, to have a clear relation to the covered code under test, and to allow developers to uniquely distinguish tests in a test suite. An empirical evaluation with 47 participants shows that developers agree with the synthesized names, and that the synthesized names are equally descriptive as manually written names. Study participants were even more accurate and faster at matching code and tests with synthesized names compared to manually derived names.

Language: English

Citations

51

Search-Based Crash Reproduction and Its Impact on Debugging
Mozhan Soltani, Annibale Panichella, Arie van Deursen

et al.

IEEE Transactions on Software Engineering, Journal Year: 2018, Volume and Issue: 46(12), P. 1294 - 1317

Published: Oct. 24, 2018

Software systems fail. These failures are often reported to issue tracking systems, where they are prioritized and assigned to responsible developers to be investigated. When developers debug software, they need to reproduce the reported failure in order to verify whether their fix actually prevents the failure from happening again. Since manually reproducing each failure could be a complex task, several automated techniques have been proposed to tackle this problem. Despite showing advancements in this area, the proposed techniques showed various types of limitations. In this paper, we present EvoCrash, a new approach to automated crash reproduction based on a novel evolutionary algorithm, called Guided Genetic Algorithm (GGA). We report on our empirical study on using EvoCrash to reproduce 54 real-world crashes, as well as the results of a controlled experiment, involving human participants, to assess the impact of EvoCrash tests in debugging. Based on our results, EvoCrash outperforms state-of-the-art techniques in crash reproduction and uncovers failures that are undetected by classical coverage-based unit test generation tools. In addition, we observed that using EvoCrash helps developers provide fixes more often and take less time when debugging, compared to developers debugging and fixing code without using EvoCrash tests.

Language: English

Citations

45

Test smells 20 years later: detectability, validity, and reliability (Creative Commons)
Annibale Panichella, Sebastiano Panichella, Gordon Fraser

et al.

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(7)

Published: Sept. 20, 2022

Test smells aim to capture design issues in test code that reduce its maintainability. These have been extensively studied and generally found quite prevalent in both human-written and automatically generated test cases. However, most evidence of prevalence is based on specific static detection rules. Although those are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive warnings raised by detection tools as overly strict and non-representative of the maintainability and quality of test suites. This leads us to re-assess test smell detection tools’ accuracy and to investigate the prevalence and detectability of test smells more broadly. Specifically, we construct a hand-annotated dataset spanning hundreds of test suites, both written by developers and generated by two test generation tools (EvoSuite and JTExpert), and performed a multi-stage, cross-validated manual analysis to identify the presence of six types of test smells in these. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools: one widely used in prior work and one recently introduced with the express goal to match developer perceptions of test smells. Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous in developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests actually often scored better but, in reality, suffered from a host of problems not well-captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool’s detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice, and (ii) more accurate detection strategies, evaluated in industrial contexts.

Language: English

Citations

22

Designing PairBuddy—A Conversational Agent for Pair Programming
Peter Robe, Sandeep Kaur Kuttal

ACM Transactions on Computer-Human Interaction, Journal Year: 2022, Volume and Issue: 29(4), P. 1 - 44

Published: May 5, 2022

From automated customer support to virtual assistants, conversational agents have transformed everyday interactions, yet despite phenomenal progress, no agent exists for programming tasks. To understand the design space of such an agent, we prototyped PairBuddy, an interactive pair programming partner, based on research from conversational agents, software engineering, education, human-robot interaction, psychology, and artificial intelligence. We iterated PairBuddy’s design using a series of Wizard-of-Oz studies. Our pilot study of six programmers showed promising results and provided insights toward PairBuddy’s interface design. In our second study of 14 programmers, PairBuddy was positively praised across all skill levels. PairBuddy’s active application of soft skills (adaptability, motivation, and social presence) as a navigator increased participants’ confidence and trust, while its technical skills (code contributions, just-in-time feedback, and creativity support) as a driver helped participants realize their own solutions. PairBuddy takes the first step towards an Alexa-like programming partner.

Language: English

Citations

21

Learning how to search: generating effective test cases through adaptive fitness function selection (Creative Commons)
Hussein Almulla, Gregory Gay

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(2)

Published: Jan. 11, 2022

Search-based test generation is guided by feedback from one or more fitness functions: scoring functions that judge solution optimality. Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals (such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage) do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current population of test suites. To test this hypothesis, we implemented two reinforcement learning algorithms in the EvoSuite unit test generation framework, and used these algorithms to dynamically set the fitness functions used during generation for the three goals identified above. We evaluated our framework, EvoSuiteFIT, on a set of Java case examples. EvoSuiteFIT techniques attain significant improvements for two of the three goals, and show limited improvements on the third when the number of generations of evolution is fixed. Additionally, EvoSuiteFIT detects faults missed by the other techniques. The ability to adjust fitness functions allows strategic choices that efficiently produce more effective test suites, and examining these choices offers insight into how to attain our testing goals. We find that adaptive fitness function selection is a powerful technique to apply when an effective fitness function does not already exist for achieving a testing goal.

Language: English

Citations

20