
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AI-Assisted Fixes to Code Review Comments at Scale

Created by
  • Haebom

Authors

Chandra Maddila, Negar Ghorbani, James Saindon, Parth Thakkar, Vijayaraghavan Murali, Rui Abreu, Jingyue Shen, Brian Zhou, Nachiappan Nagappan, Peter C. Rigby

Outline

Meta processes tens of thousands of code review comments every week. This paper describes the development and results of Metamate for Code Review (MetaMateCR), a system that provides AI-assisted fixes for code reviewer comments at scale. The authors fine-tuned a Llama model on 64,000 data points and deployed it to production once the offline results were satisfactory. In the offline comparison, the resulting LargeLSFT model generates an accurate patch in 68% of cases, 9 percentage points (pp) higher than GPT-4o, and uses more modern Hack functions. In production, safety tests measured the impact of AI patch suggestions on review time, and the resulting increase in review time was addressed through UX improvements. Once deployed, the LargeLSFT model achieved an ActionableToApplied rate of 19.7%, 9.2 pp higher than GPT-4o.
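The two comparisons with GPT-4o are stated as percentage-point gaps, which are easy to misread as relative improvements. The short sketch below works through the arithmetic under the assumption that each gap is a plain difference between the two models' rates. The implied GPT-4o rates and the relative gains are derived here from the reported figures; they are not stated in the summary, and the function names are illustrative only.

```python
def implied_baseline(model_rate_pct: float, gap_pp: float) -> float:
    """Baseline rate (in %) implied by a model rate and a percentage-point gap."""
    return round(model_rate_pct - gap_pp, 1)


def relative_gain_pct(model_rate_pct: float, gap_pp: float) -> float:
    """Relative improvement over the implied baseline, in percent."""
    baseline = model_rate_pct - gap_pp
    return round(100.0 * gap_pp / baseline, 1)


# Figures as reported in the summary; everything computed from them is derived.
offline = {"model_rate_pct": 68.0, "gap_pp": 9.0}     # accurate-patch rate
production = {"model_rate_pct": 19.7, "gap_pp": 9.2}  # ActionableToApplied rate

for name, figures in (("offline", offline), ("production", production)):
    print(
        f"{name}: implied GPT-4o rate = {implied_baseline(**figures)}%, "
        f"relative gain = {relative_gain_pct(**figures)}%"
    )
```

Under this reading, the offline gap is modest in relative terms, while the production gap is large relative to its low baseline, even though the two percentage-point gaps look similar.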

Takeaways, Limitations

Takeaways:
Demonstrates the feasibility of effectively building and operating an AI-based automatic patch generation system in a large-scale code review environment.
Presents ways to increase the practical applicability of AI systems through safety testing and UX improvements.
Achieves better performance than GPT-4o with a fine-tuned Llama model.
Presents a case study of a successful large-scale deployment of an AI-assisted system.
Limitations:
Since the results are based on internal Meta data, generalizability to other environments may be limited.
Safety testing revealed problems with the initial UX design, namely longer review times, which suggests that similar systems need careful UX validation before rollout.
The ActionableToApplied rate is below 20% (19.7%), so the system does not yet produce an applied patch for most code review comments.