Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Created by
  • Haebom

Authors

Huacan Wang, Ziyi Ni, Shuo Zhang, Shuo Lu, Sen Hu, Ziyang He, Chen Hu, Jiaye Lin, Yifu Guo, Ronghao Chen, Xin Li, Daxin Jiang, Yuntao Du, Pin Lyu

Outline

This paper addresses the ultimate goal of code agents: autonomously solving complex tasks. While large language models (LLMs) have made significant progress in code generation, real-world tasks often require complete code repositories rather than simple scripts, and building such repositories from scratch remains a major challenge. GitHub hosts a vast collection of open-source repositories that developers frequently reuse as modular components for complex tasks, but existing frameworks such as OpenHands and SWE-Agent still struggle to make effective use of this resource. The paper proposes RepoMaster, an autonomous agent framework designed to explore and reuse GitHub repositories to solve complex tasks. For efficient understanding, RepoMaster constructs function call graphs, module dependency graphs, and hierarchical code trees to identify the essential components of a repository, and provides only these components to the LLM rather than the entire repository. During autonomous execution, it uses exploration tools to progressively explore related components and prunes information to optimize context usage. Evaluated on the adjusted MLE-Bench, RepoMaster achieves a 110% relative improvement in valid submissions over the strongest baseline, OpenHands. On the newly released GitTaskBench, it raises the task pass rate from 40.7% to 62.9% while reducing token usage by 95%.
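To make the idea of function call graphs and module dependency graphs more concrete, here is a minimal sketch using Python's standard `ast` module. It is not RepoMaster's implementation: the function names (`build_call_graph`, `module_dependencies`, `rank_by_callers`), the `EXAMPLE` source, and the "rank by number of callers" heuristic are all assumptions made for illustration. The sketch only handles top-level functions in a single file and only direct calls by name.

```python
from __future__ import annotations

import ast
from collections import defaultdict

# Hypothetical toy module used only to exercise the sketch below.
EXAMPLE = '''
import os
from json import loads

def load(path):
    return loads(open(path).read())

def clean(data):
    return [d for d in data if d]

def run(path):
    return clean(load(path))

def main():
    run(os.getcwd())
    clean([])
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the locally defined functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for child in ast.walk(node):
            # Only direct calls such as `foo(...)` are tracked; method calls
            # and calls to names defined elsewhere are ignored.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in defined:
                    graph[node.name].add(child.func.id)
    return graph


def module_dependencies(source: str) -> set[str]:
    """Collect the names of modules imported anywhere in the source."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return deps


def rank_by_callers(graph: dict[str, set[str]]) -> list[str]:
    """Order functions by how many other functions call them (assumed heuristic)."""
    in_degree: dict[str, int] = defaultdict(int)
    for caller, callees in graph.items():
        for callee in callees:
            if callee != caller:
                in_degree[callee] += 1
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(graph, key=lambda name: (-in_degree[name], name))


if __name__ == "__main__":
    call_graph = build_call_graph(EXAMPLE)
    print(call_graph)
    print(module_dependencies(EXAMPLE))
    print(rank_by_callers(call_graph))
```

In this toy setting, an agent could pass only the top-ranked functions to the LLM instead of the whole file, which is the spirit of supplying key components rather than the entire repository. A real system would need to handle classes, methods, cross-file imports, and dynamic calls.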

Takeaways, Limitations

Takeaways:
Shows that complex coding tasks can be solved by efficiently reusing GitHub repositories.
Presents an effective strategy for coping with the limited context window of LLMs, based on function call graphs, module dependency graphs, and hierarchical code trees.
Demonstrates experimentally that RepoMaster outperforms existing frameworks, with significant gains on MLE-Bench and GitTaskBench.
Supports follow-up research and reproducibility by open-sourcing RepoMaster.
Limitations:
Because GitTaskBench is a newly proposed benchmark, comparative analysis with other existing benchmarks is lacking.
RepoMaster's performance improvements may be biased toward certain types of tasks.
Further research is needed to determine generalizability to more complex and diverse tasks in the real world.
Further research is needed on how RepoMaster scales with the complexity and size of the repository.