
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Can LLMs Generate User Stories and Assess Their Quality?

Created by
  • Haebom

Authors

Giovanni Quattrocchi, Liliana Pasquale, Paola Spoletini, Luciano Baresi

Outline

This paper highlights the challenges of requirements analysis and the need for automation, and explores a method for automatically generating and evaluating user stories (USs) within an agile framework using large language models (LLMs). Using ten state-of-the-art LLMs, USs are generated automatically from customer interviews and compared with those written by domain experts and students. The paper also investigates the ability of LLMs to automatically assess the semantic quality of USs. The experiments show that LLMs produce USs similar to human-written ones in scope and style, but with less variety and creativity. LLM-generated USs are qualitatively comparable to human-generated ones, yet satisfy acceptance criteria at a lower rate, regardless of model size. Finally, LLMs show the potential to reliably evaluate the semantic quality of USs when given explicit evaluation criteria, thereby reducing human effort in large-scale evaluations.
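As a rough illustration of the generation setup described above, the sketch below prompts an LLM to write user stories from an interview transcript. It assumes the OpenAI Python SDK; the model name, prompt wording, and the canonical story template are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of LLM-based user-story generation.
# Assumes the OpenAI Python SDK; prompt and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GENERATION_PROMPT = """\
You are a requirements analyst working in an agile team.
Read the customer interview transcript below and write user stories
in the canonical form: "As a <role>, I want <goal>, so that <benefit>".
Return one user story per line.

Transcript:
{transcript}
"""

def generate_user_stories(transcript: str, model: str = "gpt-4o") -> list[str]:
    """Ask the model for user stories and return them as a list of lines."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": GENERATION_PROMPT.format(transcript=transcript)}],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]
```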

Takeaways, Limitations

Takeaways:
LLMs can be used to automatically generate user stories in an agile environment.
LLMs can automatically assess the semantic quality of user stories.
Automated assessment can reduce human effort in large-scale evaluations.
Limitations:
User stories generated by LLMs show less variety and creativity than those written by humans.
LLM-generated user stories meet acceptance criteria at a lower rate than human-written ones.
Reliable automatic evaluation requires explicit evaluation criteria (see the sketch below).
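Since the paper finds that evaluation is reliable only with explicit criteria, here is a minimal sketch of such an LLM-as-judge setup. The three criteria, the 1-5 scale, and the JSON output format are illustrative assumptions, not the paper's actual rubric.

```python
# Minimal sketch of criterion-based user-story evaluation (LLM-as-judge).
# The criteria and the 1-5 scale are illustrative assumptions,
# not the rubric used in the paper.
import json
from openai import OpenAI

client = OpenAI()

EVALUATION_PROMPT = """\
Rate the following user story on each criterion from 1 (poor) to 5 (excellent):
- well_formed: follows the "As a ..., I want ..., so that ..." template
- unambiguous: has exactly one reasonable interpretation
- complete: expresses a full, self-contained requirement

Reply with a JSON object mapping each criterion name to an integer score.

User story:
{story}
"""

def evaluate_user_story(story: str, model: str = "gpt-4o") -> dict[str, int]:
    """Return per-criterion scores for a single user story."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": EVALUATION_PROMPT.format(story=story)}],
        response_format={"type": "json_object"},  # request strict JSON output
    )
    return json.loads(response.choices[0].message.content)
```

Making the rubric explicit in the prompt, rather than asking for a single holistic score, is what lets the scores be compared across stories and models.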