Large vision-language models (LVLMs) demonstrate strong performance on multimodal tasks, but they tend to rely on text patterns learned during pre-training, a phenomenon known as linguistic bias. This paper systematically analyzes linguistic bias through chain-of-embedding analysis, which traces the layer-by-layer representational dynamics within LVLMs. We find that each model exhibits a Visual Integration Point (VIP), a critical layer at which visual information meaningfully reshapes hidden representations and influences decoding. Building on this observation, we propose a Total Visual Integration (TVI) estimator that aggregates representational distances beyond the VIP to quantify how strongly the visual query influences response generation. We demonstrate that VIPs consistently emerge across 54 model-dataset combinations spanning nine LVLMs and six benchmarks, and that TVI reliably predicts the strength of linguistic bias. Together, these results provide a practical tool for diagnosing and understanding linguistic bias in LVLMs.
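The abstract leaves the precise definitions of the VIP and the TVI estimator to the paper body. The sketch below is an illustration only, under stated assumptions: per-layer distances are taken as L2 distances between hidden states of the same query decoded with and without the visual input, the VIP is heuristically picked as the layer with the largest jump in that distance, and TVI is the sum of distances from the VIP onward. All function names and the detection heuristic are hypothetical, not the paper's actual procedure.

```python
import numpy as np

def layer_distances(h_with_image, h_text_only):
    """Per-layer L2 distance between hidden states of the same query
    decoded with and without the visual input (both: [num_layers, dim])."""
    return np.linalg.norm(h_with_image - h_text_only, axis=-1)

def visual_integration_point(dists):
    """Hypothetical VIP rule: the layer at which the distance to the
    text-only trajectory jumps the most (largest first difference)."""
    return int(np.argmax(np.diff(dists)) + 1)

def total_visual_integration(dists, vip):
    """Illustrative TVI: aggregate (here, sum) the representational
    distances from the VIP layer onward."""
    return float(dists[vip:].sum())

# Toy usage with random hidden states standing in for real LVLM activations;
# the visual input is simulated to matter only in the second half of the layers.
rng = np.random.default_rng(0)
num_layers, dim = 32, 4096
h_img = rng.normal(size=(num_layers, dim))
h_txt = h_img + np.concatenate([np.zeros((16, dim)),
                                rng.normal(scale=0.5, size=(16, dim))])
d = layer_distances(h_img, h_txt)
vip = visual_integration_point(d)
print(f"VIP layer: {vip}, TVI: {total_visual_integration(d, vip):.2f}")
```

In this toy setup the heuristic recovers the simulated integration layer (16), and TVI grows with the magnitude of the post-VIP divergence, which mirrors the abstract's claim that larger aggregated distances indicate stronger visual influence on the response.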