This paper highlights the challenges of requirements analysis and the need for automation, and explores a method for automatically generating and evaluating user stories (USs) within an agile framework using large language models (LLMs). Using ten state-of-the-art LLMs, we automatically generate USs in a manner that mimics customer interviews and compare them with USs written by domain experts and students. We also investigate the ability of LLMs to automatically evaluate the semantic quality of USs. Our experiments show that LLMs generate USs that are similar to human-written USs in scope and style but exhibit less variety and creativity. Although qualitatively comparable to human-written USs, LLM-generated USs meet acceptance criteria at lower rates, regardless of model size. Finally, LLMs show the potential to reliably evaluate the semantic quality of USs when given explicit evaluation criteria, thereby reducing human effort in large-scale evaluations.