This paper addresses the problem of hallucination in large language models (LLMs). Hallucination refers to the phenomenon in which an LLM produces confident-sounding responses that are nonetheless incorrect or nonsensical. We formulate hallucination detection as a hypothesis testing problem and show its close connection to out-of-distribution detection in machine learning models. We propose a novel method inspired by multiple testing and present extensive experimental results demonstrating its robustness relative to state-of-the-art methods.
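To make the multiple-testing framing concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: assume each generated claim is assigned a p-value under a null hypothesis of "this claim is faithful", and a standard step-up procedure (here, Benjamini-Hochberg) flags claims to reject at a chosen false-discovery rate. The p-value construction and the specific procedure used by the paper are assumptions here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Given p-values (one per tested claim), return a boolean list where
    True marks claims rejected at false-discovery rate `alpha`
    (i.e., flagged as likely hallucinations in this sketch).
    """
    m = len(p_values)
    # Sort indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= (rank / m) * alpha:
            k = rank
    # Reject the k smallest p-values (step-up rule).
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            rejected[i] = True
    return rejected
```

Note the step-up character of the rule: in `benjamini_hochberg([0.04, 0.04], alpha=0.05)` both claims are rejected, even though neither p-value passes the smallest per-rank threshold on its own.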