This paper presents Jigsaw-Puzzles, a new benchmark for evaluating the spatial reasoning capabilities of vision-language models (VLMs). Jigsaw-Puzzles consists of 1,100 real-world images with high spatial complexity and comprises five tasks assessing spatial perception, structure understanding, and reasoning. Across 24 state-of-the-art VLMs evaluated on the benchmark, even the top-performing model, Gemini-2.5-Pro, achieves only 77.14% overall accuracy and, notably, only 30% accuracy on the sequence generation task, far below the over-90% accuracy of human participants. These results highlight the need for continued research to improve the spatial reasoning capabilities of VLMs.