This paper investigates the reliability of large language models (LLMs) as surrogates for human subjects in social science surveys and their vulnerability to response bias. Using the World Values Survey (WVS) questionnaire, we conducted over 167,000 mock interviews with nine different LLMs, applying 11 perturbations to the question format and response option structure. We find that LLMs are not only sensitive to these perturbations but also exhibit a consistent recency bias of varying strength across all models, disproportionately favoring the last response option presented. Although larger models are generally more robust, all models remain sensitive to semantic changes such as rephrasing, as well as to combined perturbations. When perturbations are applied in combination, LLM responses only partially match the survey response biases observed in humans. These findings highlight the importance of prompt design and robustness testing when using LLMs to generate synthetic survey data.