Developer-centric test amplification
Carolin Brandt, Andy Zaidman

Empirical Software Engineering, Journal Year: 2022, Volume and Issue: 27(4)

Published: May 2, 2022

Abstract: Automatically generating test cases for software has been an active research topic for many years. While current tools can generate powerful regression or crash-reproducing test cases, these are often kept separately from the maintained test suite. In this paper, we leverage the developer's familiarity with test cases amplified from existing, manually written developer tests. Starting from issues reported by developers in previous studies, we investigate which aspects are important to design a developer-centric test amplification approach, i.e., one that provides test cases developers are willing to take over into their test suite. We conduct 16 semi-structured interviews supported by our prototypical designs of a developer-centric test amplification approach and a corresponding test exploration tool. We extend the test amplification tool DSpot to generate amplified test cases that are easier to understand. Our IntelliJ plugin TestCube empowers developers to explore amplified test cases in their familiar environment. From our interviews, we gather 52 observations which we summarize into 23 result categories, and we give two key recommendations on how future tool designers can make their tools better suited for developer-centric test amplification.
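
The core idea of test amplification is easy to picture. Below is a minimal Python sketch in the spirit of DSpot-like input amplification, assuming toy names (`add`, `amplify`, `generate_amplified_tests`) that do not appear in the paper: duplicate an existing test, perturb its literal inputs, and turn the observed behavior into new regression assertions.

```python
# Toy sketch of input amplification: perturb the literals of an existing
# test and record current behavior as regression assertions.

def add(a, b):
    """Unit under test (stand-in)."""
    return a + b

def original_test():
    """A manually written developer test we want to amplify."""
    assert add(2, 3) == 5

def amplify(literal_inputs):
    """Derive new input pairs by perturbing the original literals."""
    variants = []
    for a, b in literal_inputs:
        variants.extend([(a + 1, b), (a, b - 1), (0, b), (a, 0)])
    return variants

def generate_amplified_tests():
    """Run the unit under test on amplified inputs and turn the observed
    outputs into assertions (oracle = current behavior)."""
    tests = []
    for a, b in amplify([(2, 3)]):
        observed = add(a, b)
        tests.append(f"assert add({a}, {b}) == {observed}")
    return tests

if __name__ == "__main__":
    for line in generate_amplified_tests():
        print(line)
```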

Language: English

CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models
Caroline Lemieux, Jeevana Priya Inala, Shuvendu K. Lahiri

et al.

Published: May 1, 2023

Search-based software testing (SBST) generates high-coverage test cases for programs under test with a combination of test case generation and mutation. SBST's performance relies on there being a reasonable probability of generating test cases that exercise the core logic of the program under test. Given such test cases, SBST can then explore the space around them to exercise various parts of the program. This paper explores whether Large Language Models (LLMs) of code, such as OpenAI's Codex, can be used to help SBST's exploration. Our proposed algorithm, CodaMosa, conducts SBST until its coverage improvements stall, then asks Codex to provide example test cases for under-covered functions. These examples help SBST redirect its search to more useful areas of the search space. On an evaluation over 486 benchmarks, CodaMosa achieves statistically significantly higher coverage on many more benchmarks (173 and 279, respectively) than it reduces coverage on (10 and 4), compared to SBST and LLM-only baselines.
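
The stall-then-ask control flow is the heart of the approach. Below is a toy Python sketch of that loop; the SBST step, coverage metric, and LLM query are random or constant stand-ins chosen only to make the control flow runnable, not the paper's implementation, and all constants are assumed values.

```python
import random

# Toy sketch of a CodaMosa-style loop: run search-based generation until
# coverage improvements stall, then seed the suite with LLM-provided tests.

STALL_LIMIT = 5   # iterations without improvement before asking the LLM (assumed)
BUDGET = 50       # total search iterations (assumed)

def run_sbst_iteration(suite):
    """Stand-in for one generation of search-based test mutation."""
    return suite + [f"mutated_test_{len(suite)}"] if random.random() < 0.3 else suite

def coverage_of(suite):
    """Stand-in coverage metric with diminishing returns."""
    return min(100.0, 10.0 * len(suite) ** 0.5)

def ask_llm_for_tests(function_name):
    """Stand-in for a Codex-style query asking for example tests of an
    under-covered function; a real system would deserialize completions."""
    return [f"llm_seeded_test_for_{function_name}"]

def codamosa_loop():
    suite, best, stalled = ["seed_test"], 0.0, 0
    for _ in range(BUDGET):
        suite = run_sbst_iteration(suite)
        cov = coverage_of(suite)
        if cov > best:
            best, stalled = cov, 0
        else:
            stalled += 1
        if stalled >= STALL_LIMIT:          # coverage plateau reached
            suite += ask_llm_for_tests("undercovered_fn")
            stalled = 0
    return suite, best

if __name__ == "__main__":
    final_suite, final_cov = codamosa_loop()
    print(len(final_suite), "tests; approx coverage", round(final_cov, 1))
```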

Language: English

Citations: 110

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation
Max Schäfer, Sarah Nadi, Aryaz Eghbali

et al.

IEEE Transactions on Software Engineering, Journal Year: 2023, Volume and Issue: 50(1), P. 85 - 105

Published: Nov. 28, 2023

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including their suggested use for automated generation of unit tests, but while requiring additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without additional training or manual effort. Concretely, we consider an approach where the LLM is provided with prompts that include the signature and implementation of a function under test, along with usage examples extracted from documentation. Furthermore, if a generated test fails, our approach attempts to generate a new test that fixes the problem by re-prompting the model with the failing test and its error message. We implement our approach in TestPilot, an adaptive LLM-based test generation tool for JavaScript that automatically generates unit tests for the methods in a given project's API. We evaluate TestPilot using OpenAI's gpt3.5-turbo on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage. Experiments excluding parts of the information included in the prompts show that all components contribute towards the generation of effective test suites. We also find that 92.8% of TestPilot's generated tests have ≤ 50% similarity with existing tests (as measured by normalized edit distance), with none of them being exact copies. Finally, we run TestPilot with two additional LLMs, OpenAI's older code-cushman-002 and StarCoder, whose training process is publicly documented. Overall, we observed similar results with the former (68.2% median statement coverage) and somewhat worse results with the latter (54.0%), suggesting that the effectiveness of the approach is influenced by the size and training set of the LLM, but does not fundamentally depend on the specific model.
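
The adaptive prompting loop can be sketched generically. The code below is a hedged Python illustration, not TestPilot's actual (JavaScript-targeting) implementation; `complete` and `run_test` are hypothetical stand-ins for an LLM completion call and a test runner.

```python
# Sketch of adaptive, documentation-informed test generation: prompt with
# the function's signature, body, and doc examples; on failure, re-prompt
# with the failing test and its error message.

def build_prompt(signature, implementation, doc_examples):
    """Assemble the initial prompt from signature, body, and usage examples."""
    return (
        f"// Function under test:\n{signature}\n{implementation}\n"
        f"// Usage examples from docs:\n{doc_examples}\n"
        "// Write a unit test for this function:\n"
    )

def generate_test(signature, implementation, doc_examples, complete, run_test,
                  max_repair_rounds=2):
    prompt = build_prompt(signature, implementation, doc_examples)
    test = complete(prompt)
    for _ in range(max_repair_rounds):
        passed, error = run_test(test)
        if passed:
            return test
        # Re-prompt with the failing test plus its error message and ask
        # the model for a fixed version.
        prompt = f"{test}\n// The test above fails with:\n// {error}\n// Fixed test:\n"
        test = complete(prompt)
    return test  # best effort after the repair budget is spent

if __name__ == "__main__":
    fake_llm = lambda prompt: "test('adds', () => expect(add(1, 2)).toBe(3));"
    always_passes = lambda test: (True, "")
    print(generate_test("function add(a, b)", "return a + b;",
                        "add(1, 2) // 3", fake_llm, always_passes))
```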

Language: English

Citations: 69

On learning meaningful assert statements for unit test cases

Cody Watson, Michele Tufano, Kevin Moran

et al.

Published: June 27, 2020

Software testing is an essential part of the software lifecycle and requires a substantial amount of time and effort. It has been estimated that software developers spend close to 50% of their time on testing the code they write. For these reasons, a long-standing goal within the research community is to (partially) automate software testing. While several techniques and tools have been proposed to automatically generate test methods, recent work has criticized the quality and usefulness of the assert statements they generate. Therefore, we employ a Neural Machine Translation (NMT) based approach called ATLAS (AuTomatic Learning of Assert Statements) to generate meaningful assert statements for test methods. Given a test method and a focal method (i.e., the main method under test), ATLAS can predict a meaningful assert statement to assess the correctness of the focal method. We applied ATLAS to thousands of test methods from GitHub projects, and it was able to predict the exact assert statement manually written by developers in 31% of the cases when only considering the top-1 predicted assert. When considering the top-5 predicted assert statements, ATLAS is able to predict exact matches in 50% of the cases. These promising results hint at the potential of our approach as (i) a complement to automatic test case generation techniques, and (ii) a code completion support for developers, who can benefit from the recommended assert statements while writing test code.
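
As a rough illustration of ATLAS-style inference, the sketch below conditions a generic seq2seq transformer on the concatenated test method and focal method and beam-decodes top-k candidate asserts. The checkpoint name is hypothetical; no public ATLAS weights or exact input encoding are assumed.

```python
# Sketch of NMT-style assert prediction with a generic seq2seq model.
# The model name below is a placeholder, not a real released checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "your-org/assert-generation-seq2seq"   # hypothetical fine-tuned model

def predict_asserts(test_method: str, focal_method: str, k: int = 5):
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)
    # ATLAS conditions on the test method plus its focal method
    # (the main method under test); the separator here is an assumption.
    source = f"{test_method} </s> {focal_method}"
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, num_beams=k, num_return_sequences=k,
                             max_new_tokens=64)
    # Return the top-k decoded candidate assert statements.
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```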

Language: English

Citations: 94

How many of all bugs do we find? a study of static bug detectors
Andrew Habib, Michael Pradel

Published: Aug. 20, 2018

Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static bug detectors find? This paper addresses this question by studying the results of applying three static bug detectors to an extended version of the Defects4J dataset that consists of 15 Java projects with 594 known bugs. To decide which of these bugs the tools detect, we use a novel methodology that combines an automatic analysis of warnings and bugs with a manual validation of each candidate of a detected bug. The results of the study show that: (i) the detectors find a non-negligible amount of bugs, (ii) different tools are mostly complementary to each other, and (iii) current detectors miss the large majority of the studied bugs. A detailed analysis of the missed bugs shows that some could have been found by variants of the existing detectors, while others are domain-specific problems that do not match any existing bug pattern. These findings help potential users of such tools to assess their utility, motivate and outline directions for future work on static bug detection, and provide a basis for comparisons of static bug detection with other bug finding techniques, such as automated testing.
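
The warning-to-bug matching step can be made concrete. The sketch below is an assumed simplification of the paper's methodology: a warning becomes a *candidate* detected bug when it lands on or near a line changed by the bug's fix, with confirmation left to manual validation. Data structures and the tolerance value are illustrative.

```python
# Sketch of candidate matching between detector warnings and known bugs:
# a warning is (file, line); a known bug is the set of lines its fix changed.

LINE_TOLERANCE = 2  # how close a warning must be to a fixed line (assumed)

def candidate_matches(warnings, bug_fix_lines):
    """warnings: list of (file, line);
    bug_fix_lines: dict bug_id -> set of (file, line) changed by the fix."""
    candidates = {}
    for bug_id, fixed in bug_fix_lines.items():
        for wfile, wline in warnings:
            if any(wfile == ffile and abs(wline - fline) <= LINE_TOLERANCE
                   for ffile, fline in fixed):
                candidates.setdefault(bug_id, []).append((wfile, wline))
    return candidates  # to be confirmed or rejected by manual validation

if __name__ == "__main__":
    warnings = [("Foo.java", 41), ("Bar.java", 10)]
    bugs = {"Lang-7": {("Foo.java", 42)}, "Math-3": {("Baz.java", 99)}}
    print(candidate_matches(warnings, bugs))   # only Lang-7 matches
```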

Language: English

Citations: 91

Generating accurate assert statements for unit test cases using pretrained transformers
Michele Tufano, Dawn Drain, A. Svyatkovskiy

et al.

Published: May 17, 2022

Unit testing represents the foundational basis of the software testing pyramid, beneath integration and end-to-end testing. Software testing researchers have proposed a variety of automated techniques to assist developers in this time-consuming task.

Language: English

Citations: 53

Automated assertion generation via information retrieval and its integration with deep learning
Hao Yu, Yiling Lou, Ke Sun

et al.

Proceedings of the 44th International Conference on Software Engineering, Journal Year: 2022, Volume and Issue: unknown, P. 163 - 174

Published: May 21, 2022

Unit testing can be used to validate the correctness of the basic units of a software system under test. To reduce the manual effort of conducting unit testing, the research community has contributed tools that automatically generate unit test cases, including test inputs and test oracles (e.g., assertions). Recently, ATLAS, a deep learning (DL) based approach, was proposed to generate assertions for a unit test based on other already written unit tests. Despite being promising, the effectiveness of ATLAS is still limited. To improve the effectiveness, in this work, we make the first attempt to leverage Information Retrieval (IR) in assertion generation and propose an IR-based approach, including the techniques of IR-based assertion retrieval and retrieved-assertion adaptation. In addition, we propose an integration approach to combine our IR-based approach with a DL-based approach (e.g., ATLAS) to further improve the effectiveness. Our experimental results show that our IR-based approach outperforms the state-of-the-art DL-based approach, and integrating the two can achieve even higher accuracy. Our results convey the important message that information retrieval is competitive and worthwhile to pursue for software engineering tasks such as assertion generation, and should be seriously considered by the community, given that in recent years DL-based solutions have been over-popularly adopted for such tasks.
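
The retrieval side of the approach is easy to sketch. Below is a minimal Python illustration, assuming Jaccard token overlap as the similarity measure; the paper's retrieved-assertion adaptation step is omitted and only the retrieval of the nearest existing test's assertion is shown.

```python
# Sketch of IR-based assertion retrieval: find the most similar already
# written test by token overlap and reuse its assertion as a candidate.

def tokens(code: str) -> set:
    """Crude tokenizer: split on whitespace after stripping punctuation."""
    return set(code.replace("(", " ").replace(")", " ").replace(".", " ").split())

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_assertion(query_test: str, corpus: list) -> str:
    """corpus: list of (test_body, assertion) pairs from existing tests."""
    best = max(corpus, key=lambda pair: jaccard(tokens(query_test), tokens(pair[0])))
    return best[1]

if __name__ == "__main__":
    corpus = [
        ("result = add(2, 3)", "assertEquals(5, result)"),
        ("name = user.getName()", "assertNotNull(name)"),
    ]
    query = "result = add(10, 20)"
    print(retrieve_assertion(query, corpus))  # reuses the assertEquals pattern
```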

Language: English

Citations: 31

Mutation Testing of Quantum Programs: A Case Study With Qiskit
Daniel Fortunato, José Campos, Rui Abreu

et al.

IEEE Transactions on Quantum Engineering, Journal Year: 2022, Volume and Issue: 3, P. 1 - 17

Published: Jan. 1, 2022

As quantum computing is still in its infancy, there is an inherent lack of knowledge and technology to test a quantum program properly. In the classical realm, mutation testing has been successfully used to evaluate how well a program's test suite detects seeded faults (i.e., mutants). In this paper, building on the definition of syntactically equivalent quantum operations, we propose a novel set of mutation operators to generate mutants based on qubit measurements and quantum gates. To ease the adoption of quantum mutation testing, we further propose QMutPy, an extension of the well-known and fully automated open-source mutation tool MutPy. To evaluate QMutPy's performance, we conducted a case study on 24 real quantum programs written in IBM's Qiskit library. Furthermore, we show how better test coverage and improvements to test assertions can increase the test suites' mutation score and quality. QMutPy has proven to be an effective quantum mutation tool, providing insight into the current state of quantum tests and how to improve them.
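
A gate-replacement mutation operator can be sketched in a few lines. The code below swaps syntactically equivalent single-qubit gate calls in Qiskit source text; the operator set and the text-based rewriting are illustrative simplifications, not QMutPy's actual catalog or machinery.

```python
import re

# Sketch of gate-replacement mutation on Qiskit source: single-qubit gate
# calls with the same call shape (circuit.<g>(qubit)) are swapped for one
# another to seed faults. The gate set here is an assumed example.

SWAPPABLE_GATES = ["x", "y", "z", "h"]

def generate_mutants(source: str):
    """Yield one mutant per (gate occurrence, replacement gate) pair."""
    pattern = re.compile(r"\.({})\(".format("|".join(SWAPPABLE_GATES)))
    for match in pattern.finditer(source):
        original = match.group(1)
        for replacement in SWAPPABLE_GATES:
            if replacement != original:
                yield (source[:match.start()] +
                       f".{replacement}(" +
                       source[match.end():])

if __name__ == "__main__":
    program = (
        "from qiskit import QuantumCircuit\n"
        "qc = QuantumCircuit(1)\n"
        "qc.h(0)\n"
        "qc.x(0)\n"
    )
    for i, mutant in enumerate(generate_mutants(program)):
        print(f"--- mutant {i} ---\n{mutant}")
```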

Language: English

Citations: 30

An automated approach to estimating code coverage measures via execution logs
Boyuan Chen, Song Jian, Peng Xu

et al.

Published: Aug. 20, 2018

Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve existing test suites. Based on our industrial and open source studies, state-of-the-art code coverage tools are only used during unit and integration testing due to issues like engineering challenges, performance overhead, and incomplete results. To resolve these issues, in this paper we propose an automated approach, called LogCoCo, for estimating code coverage measures using readily available execution logs. Using program analysis techniques, LogCoCo matches execution logs with their corresponding code paths and estimates three different coverage criteria: method coverage, statement coverage, and branch coverage. Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (>96% in seven out of nine experiments) under a variety of testing activities (unit testing, integration testing, and benchmarking); (2) the results of LogCoCo can be used to evaluate and improve existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and using it on a daily basis.
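
The core estimation idea can be sketched as follows, under simplifying assumptions: program analysis has already mapped each log template to the method that emits it, and a method counts as covered when one of its templates appears in the logs (LogCoCo's path-based inference of must-execute statements is omitted). All names and templates below are illustrative.

```python
# Sketch of log-based method-coverage estimation: mark a method covered
# whenever one of its log templates appears in the execution logs.

def estimate_method_coverage(log_lines, template_to_method):
    """template_to_method: dict mapping a log template substring to the
    method containing that log statement (derived via program analysis)."""
    covered = set()
    for line in log_lines:
        for template, method in template_to_method.items():
            if template in line:
                covered.add(method)
    total = set(template_to_method.values())
    return covered, len(covered) / len(total) if total else 0.0

if __name__ == "__main__":
    mapping = {
        "Opened region": "HRegion.open",
        "Flushing memstore": "HRegion.flush",
        "Compaction complete": "HStore.compact",
    }
    logs = [
        "2018-08-20 INFO Opened region r1",
        "2018-08-20 INFO Flushing memstore m1",
    ]
    covered, ratio = estimate_method_coverage(logs, mapping)
    print(sorted(covered), f"method coverage ~ {ratio:.0%}")
```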

Language: English

Citations: 56

Alleviating patch overfitting with automatic test generation: a study of feasibility and effectiveness for the Nopol repair system
Zhongxing Yu, Matías Martínez, Benjamin Danglot

et al.

Empirical Software Engineering, Journal Year: 2018, Volume and Issue: 24(1), P. 33 - 67

Published: May 11, 2018

Language: English

Citations: 42

Automatic test improvement with DSpot: a study with ten mature open-source projects
Benjamin Danglot, Oscar Luis Vera-Pérez, Benoît Baudry

et al.

Empirical Software Engineering, Journal Year: 2019, Volume and Issue: 24(4), P. 2603 - 2635

Published: April 24, 2019

Language: English

Citations: 36