Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models

Created by
  • Haebom

Authors

Jingjing Liu, Zeming Liu, Zihao Cheng, Mengliang He, Xiaoming Shi, Yuhang Guo, Xiangrong Zhu, Yuanfang Guo, Yunhong Wang, Haifeng Wang

Outline

This paper focuses on the code debugging capabilities of large language models (LLMs), particularly automatic program repair. We highlight the limitations of existing code debugging datasets, which primarily target function-level code repair and fail to reflect realistic repository-level scenarios. To address this, we present RepoDebug, a multi-task, multi-language repository-level code debugging dataset that spans a wide range of tasks, languages, and error types: eight programming languages, 22 error types, and three debugging tasks. Experiments on ten LLMs show that even the best-performing model, Claude 3.5 Sonnet, fails to perform well in repository-level debugging.
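As a rough illustration (not from the paper), the sketch below shows one way a repository-level debugging instance and a simple evaluation loop might be structured in Python. The DebugInstance fields, the task names, and the query_llm helper are hypothetical assumptions for illustration only, not the actual RepoDebug schema or evaluation protocol.

# A minimal, hypothetical sketch of evaluating an LLM on a repository-level
# debugging instance. The instance fields, task names, and the query_llm
# callable are illustrative assumptions, not the RepoDebug format.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DebugInstance:
    repo_files: Dict[str, str]   # path -> source text for the whole repository
    task: str                    # e.g. "locate", "identify", or "repair" (assumed names)
    error_type: str              # one of the dataset's error categories
    reference: str               # gold answer for the given task


def build_prompt(inst: DebugInstance) -> str:
    """Concatenate the repository as context and state the debugging task."""
    context = "\n\n".join(f"### {path}\n{src}" for path, src in inst.repo_files.items())
    return (f"{context}\n\n"
            f"Task ({inst.task}): find and address the bug in this repository.")


def evaluate(instances: List[DebugInstance], query_llm: Callable[[str], str]) -> float:
    """Return the fraction of instances whose reference answer appears in the
    model's response; query_llm maps a prompt string to a model reply."""
    correct = 0
    for inst in instances:
        response = query_llm(build_prompt(inst))
        correct += int(inst.reference.strip() in response)
    return correct / max(len(instances), 1)

In practice, a real repository may exceed a model's context window, so an actual evaluation harness would also need a strategy for selecting or truncating repository files before prompting.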

Takeaways, Limitations

Takeaways:
RepoDebug provides a realistic repository-level code debugging dataset, offering a benchmark for evaluating LLMs' code debugging performance.
Its coverage of multiple programming languages and error types helps assess the generalizability of LLMs.
It clearly shows the current state and limitations of LLMs' repository-level code debugging capabilities.
Limitations:
The RepoDebug dataset may not yet cover all types of repository-level errors and programming languages.
The set of LLMs evaluated may be limited.
The dataset may not fully capture the complexity of real-world repository-level debugging.