Object hallucinations in large vision-language models pose a significant challenge to their safe deployment in real-world applications. Recent studies have proposed object-level hallucination scores to estimate the likelihood of object hallucinations; however, these methods typically adopt either a global or a local perspective in isolation, which can limit detection reliability. In this paper, we introduce GLSim, a novel, training-free object hallucination detection framework that leverages complementary global and local embedding similarity cues between the image and text modalities, enabling more accurate and reliable hallucination detection across diverse scenarios. Through comprehensive benchmarking of existing object hallucination detection methods, we demonstrate that GLSim achieves significantly better detection performance than competitive baselines.
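To make the global-local idea concrete, the sketch below illustrates one possible way to combine a global image-text similarity with a local (patch-level) cue into a single detection score. The embedding extraction, the max-over-patches local cue, and the weighting `alpha` are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def glsim_style_score(global_img_emb: np.ndarray,
                      patch_embs: np.ndarray,
                      object_text_emb: np.ndarray,
                      alpha: float = 0.5) -> float:
    """Combine a global image-text similarity with the strongest patch-level
    similarity for a generated object. Higher scores suggest the object is
    grounded in the image; lower scores flag a likely hallucination.
    The equal weighting via `alpha` is an illustrative choice.
    """
    global_sim = cosine(global_img_emb, object_text_emb)
    local_sim = max(cosine(p, object_text_emb) for p in patch_embs)
    return alpha * global_sim + (1.0 - alpha) * local_sim


# Toy usage with random vectors standing in for real VLM embeddings.
rng = np.random.default_rng(0)
score = glsim_style_score(rng.normal(size=512),      # global image embedding
                          rng.normal(size=(196, 512)),  # per-patch embeddings
                          rng.normal(size=512))      # object text embedding
print(f"combined global-local score: {score:.3f}")
```

In practice, such a score would be thresholded (e.g., on a validation split) to decide whether a generated object should be flagged as hallucinated.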