Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Created by
  • Haebom

Authors

Jane Luo, Xin Zhang, Steven Liu, Jie Wu, Yiming Huang, Yangyu Huang, Chengyu Yin, Ying Xin, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Qi Chen, Scarlett Li, Mao Yang

Creating a complete code repository using the Repository Planning Graph (RPG).

Outline

Large language models (LLMs) are adept at generating individual functions or single code files, but they struggle to generate complete code repositories from scratch. This paper addresses repository generation, a key challenge for building coherent software systems from high-level specifications and for realizing the full potential of automated code generation. The task requires planning at two levels: deciding which features and modules to build (proposal-level planning) and defining how to implement them (implementation-level planning). The authors argue that existing approaches plan in natural language, whose ambiguity and lack of structure lead to unclear specifications, inconsistent components, and design degradation.

To overcome these limitations, they propose the Repository Planning Graph (RPG), a structured representation that encodes capabilities, file structures, data flows, and functions in a single unified graph. By replacing free-form natural language with an explicit blueprint, RPG enables consistent long-horizon planning for repository generation. Building on RPG, they develop the ZeroRepo framework, which performs graph-driven generation in three stages: proposal-level planning, implementation-level refinement, and graph-guided code generation with test validation.

ZeroRepo was evaluated on the RepoCraft benchmark, which consists of six real-world projects and 1,052 tasks. ZeroRepo generated 36,000 lines of code and 445,000 code tokens, on average 3.9x more than the strongest baseline (Claude Code) and 68x more than other baselines. It also achieved 81.5% coverage and 69.7% test accuracy, exceeding Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis showed that RPG models complex dependencies, supports more sophisticated planning through near-linear scaling, and improves the agent's understanding of the repository, which accelerates localization.
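To make the idea of a single planning graph more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy graph whose nodes are capabilities, files, or functions, with containment links between levels and data-flow edges between functions. The class `PlanningGraph`, its methods, and the example node names are all hypothetical. Ordering functions by their data-flow dependencies (Kahn's topological sort) is likewise an assumption about how such a graph could guide generation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class PlanningGraph:
    """Toy planning graph (hypothetical): capability -> file -> function."""
    kinds: dict = field(default_factory=dict)  # node name -> kind
    contains: dict = field(default_factory=lambda: defaultdict(list))  # parent -> children
    flows: dict = field(default_factory=lambda: defaultdict(list))  # producer -> consumers

    def add_node(self, name, kind, parent=None):
        # Register a node and, optionally, link it under a parent node.
        self.kinds[name] = kind
        if parent is not None:
            self.contains[parent].append(name)

    def add_flow(self, producer, consumer):
        # Data-flow edges are only allowed between function nodes here.
        for name in (producer, consumer):
            if self.kinds.get(name) != "function":
                raise ValueError(f"{name!r} is not a function node")
        self.flows[producer].append(consumer)

    def generation_order(self):
        # Kahn's algorithm: a function is emitted only after its producers.
        funcs = [n for n, k in self.kinds.items() if k == "function"]
        indegree = {n: 0 for n in funcs}
        for consumers in self.flows.values():
            for consumer in consumers:
                indegree[consumer] += 1
        queue = deque(n for n in funcs if indegree[n] == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in self.flows.get(node, []):
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        if len(order) != len(funcs):
            raise ValueError("data-flow edges contain a cycle")
        return order


def build_example():
    # Hypothetical two-capability repository plan.
    g = PlanningGraph()
    g.add_node("data_loading", "capability")
    g.add_node("training", "capability")
    g.add_node("io/loader.py", "file", parent="data_loading")
    g.add_node("train/fit.py", "file", parent="training")
    g.add_node("load_csv", "function", parent="io/loader.py")
    g.add_node("normalize", "function", parent="io/loader.py")
    g.add_node("fit_model", "function", parent="train/fit.py")
    g.add_flow("load_csv", "normalize")
    g.add_flow("normalize", "fit_model")
    return g
```

Calling `build_example().generation_order()` yields the functions in dependency order, so each one could be generated and tested after the functions it consumes data from. A real system would presumably carry far richer node metadata (signatures, interfaces, tests) than this sketch does.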

Takeaways, Limitations

Takeaways:
The RPG-based ZeroRepo framework successfully generates complete code repositories.
The structured RPG representation improves consistency in code generation.
Results on the RepoCraft benchmark show a significant performance improvement over existing methods.
RPG shows potential for modeling complex dependencies, supporting sophisticated planning, and improving the agent's understanding of the repository.
Limitations:
The paper does not directly state specific limitations.
Generalizability to benchmarks other than RepoCraft requires further research.
There is no explicit mention of the framework's complexity or computational cost.