Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 145 - 166
Published: Dec. 29, 2024
Language: English
Published: Feb. 18, 2025
The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in competitions like the IMO and have made significant progress. However, these studies intertwine multiple skills simultaneously (problem-solving, reasoning, and writing formal specifications), making it hard to precisely identify the LLMs' strengths and weaknesses in each task. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five mainstream formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) in six tasks by distilling gpt-4o, and evaluated ten open-source LLMs on them, including the recently popular DeepSeek-R1. We found that LLMs are good at writing proof segments when given either the code or a detailed description of the proof steps. Also, fine-tuning brought about a nearly threefold improvement at most. Interestingly, we observed that fine-tuning with formal data also enhances mathematics and coding capabilities. The fine-tuned models are released to facilitate subsequent studies at https://huggingface.co/fm-universe.
Language: English
Citations: 0
Published: April 4, 2025
AI for software engineering has made remarkable progress recently, becoming a notable success within generative AI. Despite this, there are still many challenges that need to be addressed before automated software engineering reaches its full potential. It should be possible to reach high levels of automation where humans can focus on the critical decisions of what to build and how to balance difficult tradeoffs, while most routine development effort is automated away. Reaching this level of automation will require substantial research efforts across academia and industry. In this paper, we aim to discuss progress towards this goal in a threefold manner. First, we provide a structured taxonomy of concrete tasks in AI for software engineering, emphasizing the many tasks beyond code generation and completion. Second, we outline several key bottlenecks that limit current approaches. Finally, we provide an opinionated list of promising research directions toward making progress on these bottlenecks, hoping to inspire future research in this rapidly maturing field.
Language: English
Citations: 0
Proceedings of the ACM on Programming Languages, Journal Year: 2025, Volume and Issue: 9(OOPSLA1), P. 759 - 785
Published: April 9, 2025
Program synthesis aims to produce code that adheres to user-provided specifications. In this work, we focus on synthesizing sequences of calls to formally specified APIs to generate objects that satisfy certain properties. This problem is particularly relevant in automated test generation, where a test generation engine may need an object with specific properties to trigger a given execution path. Constructing instances of complex data structures may require dozens of method calls, but reasoning about consecutive calls is computationally expensive, and existing work typically limits the number of calls in the solution. In this paper, we focus on synthesizing such long sequences of method calls in the Dafny programming language. To that end, we introduce Metamorph, a synthesis tool that uses counterexamples returned by the Dafny verifier to reason about the effects of one method call at a time, limiting the complexity of solver queries. We also aim to limit the overall number of SMT queries by comparing counterexamples using two distance metrics we develop for guiding the synthesis process. In particular, we introduce a novel piecewise distance metric, which puts a provably correct lower bound on the number of method calls in the solution and allows us to frame synthesis as a weighted A* search. When computing the piecewise distance, we view object states as conjunctions of atomic constraints, identify the constraints that each method call can satisfy, and combine this information using integer programming. We evaluate Metamorph's ability to generate large objects on six benchmarks defining key data structures: linked lists, queues, arrays, binary trees, and graphs. Metamorph successfully constructs programs that require up to 57 method calls per instance and compares favorably to an alternative baseline approach. Additionally, we integrate Metamorph with DTest, Dafny's test generation toolkit, and show that Metamorph can synthesize test inputs for parts of the AWS Cryptographic Material Providers Library that DTest alone is not able to cover. Finally, we use Metamorph to generate executable bytecode for a simple virtual machine, demonstrating that the techniques described here are more broadly applicable in the context of specification-guided synthesis.
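To make the search framing above concrete, the following Python sketch runs a weighted A* search over sequences of method calls. It is a toy under stated assumptions, not Metamorph's implementation: an object state is modeled as a set of atomic constraints, the queue methods `enqueue`, `set_head_7`, and `clear` are hypothetical, and instead of the integer-programming piecewise distance it uses a simpler lower bound, the number of unmet goal constraints divided by the most goal constraints any single call can establish (`max_gain`), rounded up.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, FrozenSet, List, Optional

State = FrozenSet[str]  # toy model: an object state as a set of atomic constraints


def lower_bound(state: State, goal: State, max_gain: int) -> int:
    """Admissible estimate of remaining calls: each call establishes at most
    `max_gain` goal constraints, so at least ceil(missing / max_gain) are needed."""
    missing = len(goal - state)
    return math.ceil(missing / max_gain)


def weighted_astar(start: State, goal: State,
                   calls: Dict[str, Callable[[State], State]],
                   max_gain: int, weight: float = 1.5,
                   max_depth: int = 60) -> Optional[List[str]]:
    """Best-first search over call sequences with priority g + weight * h."""
    tie = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(weight * lower_bound(start, goal, max_gain), next(tie), 0, start, [])]
    best_g: Dict[State, int] = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale entry: a shorter route to this state was found later
        if goal <= state:
            return path
        if g >= max_depth:
            continue
        for name, effect in calls.items():
            nxt = effect(state)
            if g + 1 >= best_g.get(nxt, math.inf):
                continue  # no improvement over a known route to nxt
            best_g[nxt] = g + 1
            f = g + 1 + weight * lower_bound(nxt, goal, max_gain)
            heapq.heappush(frontier, (f, next(tie), g + 1, nxt, path + [name]))
    return None


# Hypothetical method calls on a toy queue object.
def enqueue(s: State) -> State:
    size = sum(1 for c in s if c.startswith("len>="))
    return s | {f"len>={size + 1}"}


def set_head_7(s: State) -> State:
    return s | {"head==7"}


def clear(s: State) -> State:
    return frozenset(c for c in s if not c.startswith("len>="))


QUEUE_CALLS = {"enqueue": enqueue, "set_head_7": set_head_7, "clear": clear}
GOAL: State = frozenset({"len>=1", "len>=2", "len>=3", "head==7"})

if __name__ == "__main__":
    print(weighted_astar(frozenset(), GOAL, QUEUE_CALLS, max_gain=1))
```

With `weight=1.0` this is plain A*, and because the bound never overestimates, the returned sequence is as short as possible; a larger weight usually expands fewer states at the cost of that guarantee.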
Language: English
Citations: 0
Proceedings of the ACM on Programming Languages, Journal Year: 2025, Volume and Issue: 9(OOPSLA1), P. 1519 - 1545
Published: April 9, 2025
Program verifiers such as Dafny automate proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires hints in the form of assertions, creating a burden for the proof engineer. In this paper, we propose a tool that alleviates this burden by automatically generating assertions using large language models (LLMs). To improve the success rate of LLMs in this task, we design two domain-specific prompting techniques. First, we help the LLM determine the location of the missing assertion by analyzing the verifier's error message and inserting an assertion placeholder at that location. Second, we provide the LLM with example assertions from the same codebase, which we select based on a new similarity metric. We evaluate our techniques on a benchmark dataset of complex lemmas extracted from three real-world codebases. Our evaluation shows that the tool is able to generate over 56.6% of the required assertions given only a few attempts, making LLMs an affordable tool for unblocking program verifiers without human intervention.
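The two prompting techniques can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's tool: it assumes error messages of the form `file.dfy(LINE,COL): Error: ...`, places the placeholder immediately before the reported line, and stands in for the paper's similarity metric with a plain Jaccard similarity over identifier tokens. `insert_placeholder`, `pick_example`, and the `<PLACEHOLDER>` marker are hypothetical names.

```python
import re
from typing import Optional, Sequence, Set

# Assumed error-message shape: "path.dfy(LINE,COL): Error: ..."
ERROR_LOCATION = re.compile(r"\((\d+),(\d+)\): Error")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
PLACEHOLDER = "assert <PLACEHOLDER>;"  # hypothetical marker the LLM is asked to fill


def insert_placeholder(source: str, error_message: str) -> str:
    """Insert an assertion placeholder just before the line the verifier reports."""
    match = ERROR_LOCATION.search(error_message)
    if match is None:
        return source  # no location found: leave the program unchanged
    lines = source.splitlines()
    index = min(max(int(match.group(1)) - 1, 0), len(lines))  # clamp to valid range
    anchor = lines[index] if index < len(lines) else ""
    indent = anchor[: len(anchor) - len(anchor.lstrip())]  # reuse the line's indentation
    lines.insert(index, indent + PLACEHOLDER)
    return "\n".join(lines)


def identifiers(text: str) -> Set[str]:
    return set(IDENTIFIER.findall(text))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_example(target_lemma: str, candidates: Sequence[str]) -> Optional[str]:
    """Choose the candidate from the same codebase most similar to the target."""
    target = identifiers(target_lemma)
    return max(candidates, key=lambda c: jaccard(target, identifiers(c)), default=None)
```

On an error reported at line 4, the placeholder lands on a new line 4 with the same indentation, and the original line 4 moves down by one.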
Language: English
Citations: 0
Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 280 - 301
Published: Jan. 1, 2024
Pre-trained Large Language Models (LLMs) are beginning to dominate the discourse around automatic code generation with natural language specifications. In contrast, the best-performing synthesizers in the domain of formal synthesis with precise logical specifications are still based on enumerative algorithms. In this paper, we evaluate the abilities of LLMs to solve formal synthesis benchmarks by carefully crafting a library of prompts for the domain. When one-shot synthesis fails, we propose a novel enumerative synthesis algorithm, which integrates calls to an LLM into a weighted probabilistic search. This allows the synthesizer to provide the LLM with information about the progress of the enumerator, and the LLM to provide the enumerator with syntactic guidance in an iterative loop. We evaluate our techniques on benchmarks from the Syntax-Guided Synthesis (SyGuS) competition. We find that GPT-3.5 as a stand-alone tool for formal synthesis is easily outperformed by state-of-the-art formal synthesis algorithms, but our approach integrating the LLM into an enumerative synthesis algorithm shows significant performance gains over both the LLM and the enumerative synthesizer alone, and over the winning SyGuS competition tool.
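A compact Python sketch of weighting an enumerator with LLM output, under loud assumptions: the grammar is a toy integer-expression grammar, the LLM is replaced by a fixed list of hypothetical suggestions (`LLM_HINTS`), rule probabilities are add-one-smoothed counts of rule tokens in those suggestions, and the search is a uniform-cost enumeration by accumulated negative log-probability. The iterative loop in which the enumerator reports progress back to the LLM is omitted, and `rule_costs` and `synthesize` are illustrative names, not the authors' API.

```python
import heapq
import itertools
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Program = Tuple[str, ...]  # token sequence; "E" marks an unexpanded hole

# Toy grammar with a single nonterminal E; rules are keyed by a distinguishing token.
RULES: Dict[str, Program] = {
    "x": ("x",),
    "1": ("1",),
    "2": ("2",),
    "+": ("(", "E", "+", "E", ")"),
    "*": ("(", "E", "*", "E", ")"),
}

# Stand-in for LLM output (hypothetical; a real system would query a model).
LLM_HINTS = ["x * x + 1", "x * x"]


def rule_costs(hints: Sequence[str]) -> Dict[str, float]:
    """Turn rule-token counts in the hints into -log probabilities (add-one smoothing)."""
    counts = Counter(tok for hint in hints for tok in hint.split())
    weights = {name: 1 + counts[name] for name in RULES}
    total = sum(weights.values())
    return {name: -math.log(w / total) for name, w in weights.items()}


def evaluate(program: Program, x: int) -> int:
    # Toy evaluator: eval is acceptable here only because tokens come from RULES.
    return eval(" ".join(program), {"__builtins__": {}}, {"x": x})


def synthesize(examples: List[Tuple[int, int]], hints: Sequence[str],
               max_pops: int = 200_000) -> Optional[Program]:
    """Expand the leftmost hole in order of accumulated cost; return the first
    complete program that matches every input/output example."""
    costs = rule_costs(hints)
    tie = itertools.count()  # tie-breaker so heapq never compares programs
    frontier = [(0.0, next(tie), ("E",))]
    for _ in range(max_pops):
        if not frontier:
            return None
        cost, _, program = heapq.heappop(frontier)
        if "E" not in program:
            if all(evaluate(program, x) == y for x, y in examples):
                return program
            continue
        hole = program.index("E")
        for name, rhs in RULES.items():
            child = program[:hole] + rhs + program[hole + 1:]
            heapq.heappush(frontier, (cost + costs[name], next(tie), child))
    return None


if __name__ == "__main__":
    spec = [(0, 1), (1, 2), (2, 5), (3, 10)]
    print(" ".join(synthesize(spec, LLM_HINTS)))
```

Here the hints make `x` and `*` cheaper than `2`, so candidates built from them are enumerated first; with an empty hint list every rule costs the same and the search degenerates to plain size-ordered enumeration.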
Language: English
Citations: 2
Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 242 - 257
Published: Oct. 25, 2024
Language: English
Citations: 1
Lecture notes in computer science, Journal Year: 2024, Volume and Issue: unknown, P. 145 - 166
Published: Dec. 29, 2024
Language: English
Citations: 1