Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Program Semantic Inequivalence Game with Large Language Models

Created by
  • Haebom

Authors

Antonio Valerio Miceli-Barone, Vaishak Belle, Ali Payani

Outline

This paper presents a novel method for improving the complex code reasoning ability of large language models (LLMs). While LLMs perform well on routine coding tasks, they can fail on complex tasks that require non-trivial reasoning about program semantics. To address this, the study synthetically generates code-reasoning training data via the Semantic Inequivalence Game (SInQ): a generator agent produces semantically distinct variants of programs drawn from a dataset of real-world programming tasks, and an evaluator agent identifies input examples on which the original program and the generated variant behave differently. The two agents train each other semi-adversarially, and the authors show that, assuming unbounded computational resources, this setup can in theory improve without limit through self-play.

The method is validated on a range of code generation and understanding benchmarks, including multilingual vulnerability detection and the Python built-in identifier swap benchmark. Although trained solely on Python code, the method improves vulnerability detection in C/C++ and achieves substantial gains on the built-in identifier swap benchmark, on which existing LLMs struggle. The code needed to reproduce the experiments and the generated synthetic data have been released publicly so that other researchers can fine-tune LLMs.
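To make the game concrete, here is a minimal illustrative sketch of the generator/evaluator roles, using hypothetical toy programs (this is not the authors' implementation): the generator proposes a variant that is semantically inequivalent to the original, and the evaluator wins by finding an input on which the two programs' outputs differ.

```python
# Illustrative sketch of the core check in a SInQ-style game
# (hypothetical toy programs; not the paper's actual implementation).

def original(xs):
    """Count the strictly positive elements of xs."""
    return len([x for x in xs if x > 0])

def variant(xs):
    """Generator-proposed variant with subtly different semantics:
    it also counts zeros."""
    return len([x for x in xs if x >= 0])

def find_distinguishing_input(prog_a, prog_b, candidates):
    """Evaluator role: search candidate inputs for one on which the
    two programs disagree; return None if no witness is found."""
    for xs in candidates:
        if prog_a(xs) != prog_b(xs):
            return xs
    return None

# The evaluator "wins" by exhibiting a witness of semantic inequivalence.
witness = find_distinguishing_input(original, variant, [[1, 2], [-3], [0, 5]])
print(witness)  # -> [0, 5]
```

In the actual self-play loop, both roles are played by LLMs and the resulting (program, variant, distinguishing input) triples become synthetic training data for code reasoning.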

Takeaways, Limitations

Takeaways:
Synthetic data generation based on the Semantic Inequivalence Game (SInQ) can improve LLMs' complex code reasoning ability.
Performance gains transfer across languages and to diverse types of code reasoning problems, even with limited training data.
The public release of the generated synthetic data and code supports further LLM research.
Self-play offers a path to continuous performance improvement.
Limitations:
The theoretical guarantee assumes infinite computational resources; its applicability to real-world settings needs examination.
Further research is needed on the quality and diversity of the generated synthetic data.
Whether improvements on specific benchmarks generalize to other types of code reasoning problems requires further validation.
Generalization to complex and diverse real-world code reasoning problems remains to be evaluated.