This paper reveals a phenomenon in which large vision-language models (LVLMs) mistakenly perceive text inputs that lack visual evidence as being part of an image, leading to erroneous responses. By investigating the ability of LVLMs to determine whether textual concepts are grounded in an image, we discover visual absence (VA) neurons, a specific subset of feed-forward network (FFN) neurons that signal visual absence through a distinctive activation pattern. Leveraging this pattern, we develop a detection module that classifies whether input tokens are visually grounded. Based on this prediction, we propose a method that refines the output by reinterpreting the question prompt or by replacing tokens detected as visually absent during generation. Extensive experiments demonstrate that the proposed method effectively mitigates the model's tendency to make false assumptions about visual presence and generalizes across a variety of LVLMs.
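To make the detect-then-refine pipeline concrete, the sketch below shows one way such a module could be wired up. It is a minimal illustration under assumptions introduced here, not the paper's implementation: the function names, the precomputed VA-neuron index set `va_neuron_idx`, the mean-activation thresholding rule with its `threshold` value, and the `[not in image]` placeholder are all hypothetical, and the actual detection module and prompt-reinterpretation strategy may differ.

```python
# Illustrative sketch only: flag text tokens whose (assumed) VA-neuron
# activations indicate visual absence, then neutralize them in the prompt.
import torch


def detect_visually_absent(ffn_activations: torch.Tensor,
                           va_neuron_idx: torch.Tensor,
                           threshold: float = 0.5) -> torch.Tensor:
    """Return a boolean mask over tokens predicted to lack visual evidence.

    ffn_activations: (num_tokens, hidden_dim) FFN activations for text tokens.
    va_neuron_idx:   indices of VA neurons, assumed identified beforehand.
    """
    # Average the activations of the VA-neuron subset per token (an assumed
    # aggregation rule; a learned classifier could be used instead).
    va_scores = ffn_activations[:, va_neuron_idx].mean(dim=-1)  # (num_tokens,)
    return va_scores > threshold


def rewrite_prompt(tokens: list[str], absent_mask: torch.Tensor) -> list[str]:
    # Replace tokens judged visually absent with a neutral placeholder so the
    # model does not assume they appear in the image.
    return [tok if not absent else "[not in image]"
            for tok, absent in zip(tokens, absent_mask.tolist())]
```

The same per-token prediction could equally drive the second strategy mentioned above, replacing visually absent tokens as they are produced during generation rather than rewriting the question prompt up front.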