B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests DOI
Mouxiang Chen, Zhongxin Liu, Tao He, et al.

Published: Oct. 18, 2024

Language: English

Towards an understanding of large language models in software engineering tasks DOI
Zibin Zheng, Kaiwen Ning, Qingyuan Zhong, et al.

Empirical Software Engineering, Journal Year: 2024, Volume and Issue: 30(2)

Published: Dec. 26, 2024

Language: English

Citations: 10

Investigating large language models capabilities for automatic code repair in Python DOI

Safwan Omari, Kshitiz Basnet, Mohammad Wardat, et al.

Cluster Computing, Journal Year: 2024, Volume and Issue: 27(8), P. 10717 - 10731

Published: May 9, 2024

Language: English

Citations: 4

A Systematic Approach for Assessing Large Language Models’ Test Case Generation Capability DOI Creative Commons
Hung–Fu Chang, Mohammad Shokrolah Shirazi

Software, Journal Year: 2025, Volume and Issue: 4(1), P. 5 - 5

Published: March 10, 2025

Software testing ensures the quality and reliability of software products, but manual test case creation is labor-intensive. With the rise of Large Language Models (LLMs), there is growing interest in unit test generation with LLMs. However, effective assessment of LLM-generated test cases is limited by the lack of standardized benchmarks that comprehensively cover diverse programming scenarios. To address the challenge of evaluating an LLM’s test generation ability without a dedicated evaluation dataset, we propose the Generated Benchmark from Control-Flow Structure and Variable Usage Composition (GBCV) approach, which systematically generates programs for evaluating LLMs’ test generation capabilities. By leveraging basic control-flow structures and variable usage, GBCV provides a flexible framework to create a spectrum of programs ranging from simple to complex. Because GPT-4o and GPT-3.5-Turbo are publicly accessible models that represent regular users’ real-world use cases, we assess LLM test generation performance on them. Our findings indicate that GPT-4o performs better on composite program structures, while all models effectively detect boundary values in simple conditions but face challenges with arithmetic computations. This study highlights the strengths and limitations of LLMs in test generation, provides a benchmark framework, and suggests directions for future improvement.
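The idea of composing basic control-flow structures and variable usage into benchmark programs can be illustrated with a small sketch. The Python snippet below is a hypothetical illustration only: the template set, the `generate_program` function, and the `depth` parameter are assumptions for exposition, not the authors' GBCV implementation. It randomly composes sequential, branching, and looping blocks into functions of varying complexity that an LLM could then be asked to generate unit tests for.

```python
import random

# Hypothetical building blocks: each template wraps simple integer variable
# usage in one basic control-flow structure (sequence, branch, loop).
TEMPLATES = [
    "    {var} = {var} + {const}",                                  # sequence
    "    if {var} > {const}:\n        {var} = {var} - {const}",     # branch
    "    for _ in range({const}):\n        {var} = {var} * 2",      # loop
]

def generate_program(name: str, depth: int, seed: int = 0) -> str:
    """Compose `depth` randomly chosen structures into one function body."""
    rng = random.Random(seed)
    lines = ["def {}(x):".format(name)]
    for _ in range(depth):
        template = rng.choice(TEMPLATES)
        lines.append(template.format(var="x", const=rng.randint(1, 10)))
    lines.append("    return x")
    return "\n".join(lines)

if __name__ == "__main__":
    # A "simple" and a more "composite" benchmark program for an LLM
    # to write unit tests against.
    print(generate_program("prog_simple", depth=1, seed=1))
    print()
    print(generate_program("prog_composite", depth=4, seed=2))
```

Scaling `depth` and the mix of templates is one plausible way to obtain the simple-to-complex spectrum the abstract describes.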

Language: English

Citations: 0

Harnessing the Power of AI in Qualitative Research: Exploring, Using and Redesigning ChatGPT DOI Creative Commons
H. Zhang, Chuhao Wu, Jingyi Xie, et al.

Computers in Human Behavior Artificial Humans, Journal Year: 2025, Volume and Issue: unknown, P. 100144 - 100144

Published: March 1, 2025

Language: English

Citations: 0

Test Oracle Automation in the Era of LLMs DOI Open Access
Facundo Molina, Alessandra Gorla, Marcelo d’Amorim, et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: Jan. 27, 2025

The effectiveness of a test suite in detecting faults highly depends on the quality of its oracles. Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks. This paper aims to present a roadmap for future research on the use of LLMs for test oracle automation. We discuss the progress made in the field of test oracle automation before the introduction of LLMs, identifying the main limitations and weaknesses of existing techniques. Additionally, we review recent studies that apply LLMs to this task, highlighting the challenges that arise from their use, e.g., how to assess the usefulness of the generated oracles. We conclude with a discussion about directions and opportunities for LLM-based test oracle automation.

Language: English

Citations: 0

A 2030 Roadmap for Software Engineering DOI
Mauro Pezzè, Silvia Abrahão, Birgit Penzenstadler, et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2025, Volume and Issue: unknown

Published: April 25, 2025

The landscape of software engineering has dramatically changed in recent years. The impressive advances in artificial intelligence are just the latest and most disruptive innovation that remarkably affects research and practice. This special issue shares a roadmap to guide the community in this confused era. The roadmap is the outcome of a two-day intensive discussion at the 2030 Software Engineering workshop. It spotlights and discusses seven main landmarks of the new landscape, among them human aspects, security, verification and validation, and sustainable and quantum software engineering. This editorial summarizes the core topics discussed in the 37 papers that comprise the sections of this issue and guides interested readers through it. As a living body of work, the roadmap will be refined through follow-up workshops and updated in a series of forthcoming ACM TOSEM issues.

Language: English

Citations: 0

RAG-Driven multiple assertions generation with large language models DOI
Zhuang Liu, Hailong Wang, Tongtong Xu, et al.

Empirical Software Engineering, Journal Year: 2025, Volume and Issue: 30(3)

Published: April 26, 2025

Language: English

Citations: 0

DAnTE: A Taxonomy for the Automation Degree of Software Engineering Tasks DOI
Jorge Melegati, Eduardo Guerra

Published: Jan. 1, 2024

Software engineering researchers and practitioners have pursued ways to reduce the amount of time and effort required to develop code and to increase productivity since the emergence of the discipline. Generative language models are another step in this journey, but they will probably not be the last one. In this chapter, we propose DAnTE, a Degree of Automation Taxonomy for software Engineering, describing several levels of automation based on the idiosyncrasies of the field. Based on the taxonomy, we evaluate tools used in past and present software engineering practice. Then, we give particular attention to AI-based tools, including generative language models, discussing how they are located within the proposed taxonomy and reasoning about the limitations they currently have. Based on this analysis, we discuss novel tools that could emerge in the middle and long term.

Language: English

Citations: 1

MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing DOI
Congying Xu, Songqiang Chen, Jiarong Wu, et al.

Published: Oct. 18, 2024

Language: English

Citations: 1

Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning DOI Open Access
Zhihao Lin, Wei Ma, Tao Lin, et al.

ACM Transactions on Software Engineering and Methodology, Journal Year: 2024, Volume and Issue: unknown

Published: Dec. 18, 2024

Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. AI models have demonstrated value not only in generating code but also in defect detection, enhancing security measures, and improving overall software quality; they are emerging as crucial tools for both developing and maintaining software. As with traditional SE tools, open-source collaboration is key to realising excellent products. With AI models, however, data is essential: the effectiveness of these AI-based tools hinges on access to high-quality data, yet such data often holds commercial or sensitive value, making it less accessible to open-source projects. This reality presents a significant barrier to the enhancement of AI-based SE tools within the open-source community. Researchers therefore need solutions that enable open-source AI-based SE tools to tap into data held by different organizations. Addressing this challenge, our position paper investigates one solution to facilitate access to diverse organizational data while ensuring privacy sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open-source AI-based SE tools while safeguarding data privacy and security. Additionally, we present guidelines for developers on tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the influence of data characteristics on federated learning, our research examines the effect of code data heterogeneity on FL performance. We consider 6 scenarios of code data distributions and include 4 models and the most common FL algorithms. Our experimental findings highlight the potential of collaborative, federated learning for open-source AI-based SE tools, and we discuss issues to be addressed in the co-construction process as future directions.
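To make the federated learning setting concrete, the sketch below shows the basic aggregation step such a framework relies on. It is a minimal, hypothetical illustration assuming a FedAvg-style scheme; the helper names (`local_update`, `fed_avg`) and the toy dictionary model are assumptions for exposition, not the paper's implementation.

```python
from typing import Dict, List

# Toy model representation: parameter name -> value.
Model = Dict[str, float]

def local_update(global_model: Model, local_gradient: Model, lr: float = 0.1) -> Model:
    """Each organization trains on its private code data and shares only
    updated parameters, never the raw data."""
    return {k: v - lr * local_gradient.get(k, 0.0) for k, v in global_model.items()}

def fed_avg(client_models: List[Model]) -> Model:
    """The coordinator averages client parameters into a new global model."""
    keys = client_models[0].keys()
    return {k: sum(m[k] for m in client_models) / len(client_models) for k in keys}

if __name__ == "__main__":
    global_model = {"w": 1.0, "b": 0.0}
    # Two organizations with heterogeneous (non-IID) code data produce
    # different local gradients from their private datasets.
    clients = [
        local_update(global_model, {"w": 0.5, "b": 0.1}),
        local_update(global_model, {"w": -0.2, "b": 0.3}),
    ]
    print(fed_avg(clients))  # new global model, built without sharing raw data
```

The data heterogeneity the paper studies corresponds to how differently those per-organization updates are distributed before averaging.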

Language: English

Citations: 1