This paper investigates whether advanced vision-language models (VLMs) can recognize fragmented or overlapping characters the way humans do. We build two psychophysically inspired benchmarks using Chinese ideographic characters and English alphabetic words. By combining and overlapping characters, we create stimuli that remain readable to humans but are "visible but unreadable" to the models. Experiments show that VLMs perform well on clean text, but once these transformations are applied, their performance degrades sharply and they produce irrelevant or inconsistent outputs. This points to a structural limitation: the models rely heavily on general visual invariance but not sufficiently on the configural priors needed for robust reading. We release the stimulus generation code, prompts, and evaluation protocol to support transparent replication and further research.
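As a rough illustration of the kind of stimulus described above, the following minimal sketch superimposes two characters on one canvas using Pillow. It is not the authors' released generation code; the font path, canvas size, and offset values are illustrative assumptions.

```python
# Minimal sketch of an "overlapping characters" stimulus (illustrative only;
# not the paper's released code). Font path and layout values are placeholders.
from PIL import Image, ImageDraw, ImageFont, ImageChops


def render_glyph(char, font, size=(256, 256), xy=(48, 48)):
    """Render a single character in black on a white grayscale canvas."""
    img = Image.new("L", size, color=255)
    ImageDraw.Draw(img).text(xy, char, font=font, fill=0)
    return img


def overlap_stimulus(char_a, char_b, font_path, shift=40):
    """Superimpose two characters with a horizontal offset.

    Taking the pixel-wise minimum (ImageChops.darker) keeps the dark
    strokes of both glyphs, so the two characters partially overlap
    while each remains recoverable to a human reader.
    """
    font = ImageFont.truetype(font_path, 160)
    a = render_glyph(char_a, font, xy=(48, 48))
    b = render_glyph(char_b, font, xy=(48 + shift, 48))
    return ImageChops.darker(a, b)


if __name__ == "__main__":
    # Example: overlap two Chinese characters (font path is a placeholder).
    stim = overlap_stimulus("明", "暗", "NotoSansCJK-Regular.ttc")
    stim.save("overlap_stimulus.png")
```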