Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

TENET: Leveraging Tests Beyond Validation for Code Generation

Created by
  • Haebom

Authors

Yiran Hu, Nan Jiang, Shanchao Liang, Yi Wu, Lin Tan

Outline

This paper introduces TENET, an LLM agent that generates functions in complex real-world code repositories under a test-driven development (TDD) setting, where tests are available before the code is written. TENET has three components: (1) a test selection step that picks a concise test suite covering as wide a variety of usage scenarios as possible, (2) efficient retrieval of relevant code, supported by interactive debugging, and (3) a reflection-based refinement workflow that iterates through failure analysis, context augmentation, and code improvement. TENET achieves 69.08% Pass@1 on RepoCod and 81.77% Pass@1 on RepoEval, outperforming the best agent-based baselines. The authors also describe this as the first study of TDD code generation with repository-level context, and they examine how various aspects of the test suite affect the LLM agent's performance in the TDD setting.

Takeaways, Limitations

Presents an effective approach to code generation with LLMs in a TDD setting.
Achieves higher performance than existing agent-based baselines on the RepoCod and RepoEval benchmarks.
Provides concrete methods for test suite selection, code retrieval, and feedback-based refinement.
Improves the accuracy of code generation by leveraging repository-level context.
Opens a new direction in test-driven code generation, an area where research has been scarce.
Demonstrates practical relevance by focusing on generating functions for real code repositories.
Results are reported only on specific benchmarks, so they may be difficult to generalize.
Because the approach depends on an LLM, performance may vary with the underlying model.
Further research is needed on its application to complex real-world code repositories.
Further verification of the scalability and applicability of the proposed methodology to other domains is needed.
Optimization is needed for efficient selection of test suites.
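To make the test selection problem concrete, one simple way to choose a small test suite that covers many usage scenarios is greedy maximum coverage. The sketch below is an illustration under stated assumptions, not the selection mechanism used in the paper: the function `select_tests`, the scenario labels, and the `budget` parameter are all hypothetical.

```python
from typing import Dict, List, Set


def select_tests(scenarios_by_test: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` tests, each adding the most uncovered scenarios."""
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = dict(scenarios_by_test)
    while remaining and len(selected) < budget:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            break  # no remaining test adds a new scenario
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Hypothetical mapping from test name to the usage scenarios it exercises.
example = {
    "test_empty": {"empty_input"},
    "test_small": {"int_input", "positive"},
    "test_mixed": {"int_input", "positive", "negative"},
    "test_float": {"float_input", "positive"},
}
print(select_tests(example, budget=3))
```

Here `test_small` is never chosen because `test_mixed` already covers both of its scenarios. Greedy selection is cheap but not guaranteed to be optimal, which is one reason the cost and quality of test selection remain an open optimization question.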