Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Created by
  • Haebom

Author

Saketh Ram Kasibatla, Arpan Agarwal, Yuriy Brun, Sorin Lerner, Talia Ringer, Emily First

Outline

This paper proposes Cobblestone, a novel proof synthesis tool that leverages a large-scale language model (LLM) to improve the efficiency of formal verification using proof assistants like Coq. Cobblestone generates potential proofs using the LLM, which are then used to divide the problem into smaller parts. It automatically identifies successfully proven parts and iterates over the remaining parts to ultimately produce a correct proof. Despite the uncertainty of the LLM, Cobblestone guarantees the correctness of the generated proofs. Experimental results show that Cobblestone outperforms existing state-of-the-art tools, proving many theorems that other LLM-based tools fail to prove. Its efficiency is demonstrated by an average execution cost of $1.25 and a runtime of 14.7 minutes. Furthermore, it can utilize external input from users or other tools to leverage the proof structure or related lemmas, proving up to 58% of theorems. This study demonstrates the potential of leveraging partial progress and external input to improve the efficiency of formal verification automation.

Takeaways, Limitations

Takeaways:
We show that proof synthesis using divide-and-conquer methods utilizing LLM can significantly improve the efficiency of formal verification automation.
It overcomes the limitations of existing LLM-based tools and presents new possibilities for proving more theorems.
We highlight the importance of strategies for effectively leveraging partial progress and external inputs in formal verification automation.
Provides a cost-effective method for automating formal verification.
Limitations:
It depends on the performance of LLM, and the limitations of LLM may affect the performance of Cobblestone.
Performance may still be limited for certain types of cleanups.
The scope of benchmark datasets may be limited, and performance evaluation on a wider variety of problems is needed.
👍