Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ReCode: Updating Code API Knowledge with Reinforcement Learning

Created by
  • Haebom

Author

Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Outline

This paper proposes the ReCode framework to address a limitation of large language models (LLMs) in code generation: their inability to adapt to frequent updates of external library APIs. ReCode mimics how human programmers adapt to API changes, training LLMs to perform version migration on roughly 2,000 training examples and using a modified string similarity measure as the reward for reinforcement learning. Experimental results show that ReCode significantly improves LLM code generation performance, especially on the unseen CodeUpdateArena task, while degrading general code generation ability less than supervised fine-tuning does. Applying ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO) yields consistent performance improvements, and Qwen2.5-Coder-7B outperforms a 32B-parameter code instruction-tuned model as well as a reasoning model with the same architecture. The source code is available on GitHub.
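The summary above mentions that ReCode uses a modified string similarity measure as the reinforcement learning reward. The paper's exact formula is not given here, so the following is only a minimal sketch of the idea, assuming a whitespace-normalized `difflib` ratio as a stand-in for the actual reward; the pandas API calls are illustrative examples of a version migration, not data from the paper.

```python
# Hypothetical sketch of a string-similarity reward for RL-based API
# migration training. The real ReCode reward is a "modified string
# similarity" measure; difflib's ratio is used here as an assumption.
from difflib import SequenceMatcher


def similarity_reward(generated: str, reference: str) -> float:
    """Return a reward in [0, 1] comparing generated code against the
    reference migrated code. Leading/trailing whitespace is stripped
    per line so pure formatting differences are not penalized."""
    def normalize(code: str) -> str:
        return "\n".join(line.strip() for line in code.splitlines())

    return SequenceMatcher(None, normalize(generated),
                           normalize(reference)).ratio()


# Illustrative migration: a deprecated call vs. its updated replacement.
old_call = "df.append(row, ignore_index=True)"
new_call = "pd.concat([df, row.to_frame().T], ignore_index=True)"

print(similarity_reward(new_call, new_call))  # exact match scores 1.0
print(similarity_reward(old_call, new_call))  # partial match scores below 1.0
```

In an RL loop such as GRPO, a graded reward like this gives the policy a smoother learning signal than a binary pass/fail check, since partially correct migrations still receive partial credit.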

Takeaways, Limitations

Takeaways:
Presents an effective framework (ReCode) for adapting LLMs to API updates
Improves LLM code generation performance through a reinforcement learning-based approach
Minimizes degradation of general code generation ability compared to supervised fine-tuning
Shows consistent performance improvements across a variety of LLMs and reinforcement learning algorithms
A relatively small model (Qwen2.5-Coder-7B) outperforms much larger models
Limitations:
Performance gains are demonstrated primarily on a single benchmark (CodeUpdateArena); further research is needed on how well they generalize beyond it.
Whether roughly 2,000 training examples is sufficient needs review; how performance changes with a larger dataset should be analyzed.
Further experiments are needed to assess generalizability across different APIs and programming languages.