Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Tool-integrated Reinforcement Learning for Repo Deep Search

Created by
  • Haebom

Author

Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie

Outline

This paper addresses software issue localization: identifying the code locations that must be modified to resolve a reported software problem. The semantic gap between natural-language issue descriptions and the faulty code demands complex, multi-step reasoning over code dependencies. Existing LLM-based agents attempt to bridge this gap by integrating repository search tools, which frames the problem as a challenging task the authors call "Repo Deep Search": the LLM must effectively invoke multiple repository search tools across a multi-step reasoning and exploration process. To address this challenge, the paper presents ToolTrain, a two-stage tool-integrated training framework that combines rejection-sampled supervised fine-tuning with tool-integrated reinforcement learning. Experiments show that models trained with ToolTrain achieve state-of-the-art performance, with the 32B model surpassing Claude-3.7 on function-level localization. Moreover, improved localization carries over to better end-to-end issue resolution, indicating that training models specifically for issue localization is a viable and effective strategy for improving automated software development.
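The two-stage pipeline can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the "model" is a stand-in that produces a ranked list of candidate code locations, the rejection filter and the rank-based reward shape are assumptions chosen for clarity, and both the fine-tuning and the policy-update steps are stubbed out.

```python
import random

def sample_trajectory(model, issue, rng):
    # In the real system, a trajectory is a multi-step sequence of repository
    # search tool calls ending in ranked candidate locations; here we simply
    # sample a random ranking of the issue's candidates as a stand-in.
    candidates = list(issue["candidates"])
    rng.shuffle(candidates)
    return candidates

def stage1_rejection_sampled_sft(model, issues, k=8, rng=None):
    """Stage 1 (sketch): sample k trajectories per issue and keep only those
    whose top prediction hits the ground-truth location; the kept trajectories
    would then be used as supervised fine-tuning data (fine-tuning omitted)."""
    rng = rng or random.Random(0)
    accepted = []
    for issue in issues:
        for _ in range(k):
            traj = sample_trajectory(model, issue, rng)
            if traj[0] == issue["gold"]:  # rejection-sampling filter
                accepted.append((issue, traj))
    return accepted

def localization_reward(traj, gold):
    # Assumed rank-based reward: higher when the gold location ranks earlier.
    rank = traj.index(gold) if gold in traj else len(traj)
    return 1.0 / (1 + rank)

def stage2_tool_integrated_rl(model, issues, steps=4, rng=None):
    """Stage 2 (sketch): roll out tool-using trajectories and score them with
    the localization reward; a real implementation would reinforce the policy
    toward high-reward trajectories (update step omitted)."""
    rng = rng or random.Random(1)
    rewards = []
    for _ in range(steps):
        for issue in issues:
            traj = sample_trajectory(model, issue, rng)
            rewards.append(localization_reward(traj, issue["gold"]))
    return sum(rewards) / len(rewards)
```

The key design point this sketch mirrors is the ordering: rejection-sampled SFT first gives the model competent tool-use trajectories to imitate, and RL then optimizes tool use directly against a localization-quality reward.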

Takeaways, Limitations

Takeaways:
The ToolTrain framework significantly improves software issue localization by strengthening LLMs' ability to leverage repository search tools.
The 32B model outperforms Claude-3.7, underscoring the potential of LLM-based issue localization.
Improved localization performance carries over to improved end-to-end issue resolution, confirming the value of training specifically for localization.
The work presents a new strategy for improving automated software development.
Limitations:
Further research is needed on the generalization performance of the ToolTrain framework and its applicability to various software projects.
Results are focused on a specific size of LLM (32B), and there is a lack of performance evaluation for LLMs of other sizes.
There is a possibility of performance bias depending on the characteristics of the experimental dataset.
Further evaluation is needed for robustness against complex codebases or multiple programming languages.