This paper proposes MESH, a novel benchmark for systematically evaluating hallucinations in large video models (LVMs). Moving beyond the manual classification schemes of existing benchmarks, MESH adopts a question-answering framework that covers basic objects, detailed features, and subject-action pairs, mirroring the human process of video understanding. To identify hallucinations in LVMs effectively, MESH pairs binary and multiple-choice question formats with both target and trap instances. Experimental results show that while LVMs excel at recognizing basic objects and features, their hallucination rates rise significantly when they must process detailed information or track multiple actions in long videos involving multiple subjects.