This paper studies the classification of open-ended survey responses using large language models (LLMs). Unlike previous work, which has focused mainly on English-language data and simple topics, we compare a range of state-of-the-art LLMs and prompting approaches on German-language open-ended responses about reasons for survey participation. Benchmarking against human expert coding, we find substantial performance differences across LLMs; in particular, only fine-tuned LLMs achieve satisfactory predictive performance. We further find that the effectiveness of prompting approaches varies across LLMs and that, without fine-tuning, LLMs classify the categories of participation reasons unevenly, which can distort the estimated category distribution. We conclude by discussing the conditions and constraints for the efficient and accurate use of LLMs in survey research and by outlining implications for practitioners' data processing and analysis.