This paper presents the first study of the hallucination phenomenon exhibited by large language models (LLMs) in code generation. We define 'code hallucination' to cover defects that LLMs introduce into generated code, such as syntactic and logical errors, security vulnerabilities, and memory leaks, and provide a comprehensive taxonomy of hallucination types. We present the CodeMirage benchmark dataset, which consists of 1,137 hallucinated code snippets generated by GPT-3.5 for Python programming problems. We experiment with code hallucination detection methodologies using models such as CodeLLaMA, GPT-3.5, and GPT-4, and show that GPT-4 performs best on the HumanEval dataset and achieves results comparable to the fine-tuned CodeBERT baseline on the MBPP dataset. Finally, we discuss several strategies for mitigating code hallucinations.
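As a concrete illustration of the detection setup described above, the following is a minimal sketch of prompting an LLM to flag a hallucinated snippet. It assumes the `openai>=1.0` Python client and an `OPENAI_API_KEY` in the environment; the problem text, candidate code, and prompt wording are hypothetical and do not reproduce the paper's exact protocol.

```python
# Minimal sketch: ask an LLM whether a candidate solution is hallucinated.
# Assumes the openai>=1.0 client; prompt and labels are illustrative only.
from openai import OpenAI

client = OpenAI()

problem = "Return the sum of all even numbers in a list."

# Hypothetical hallucinated solution: the filter keeps odd numbers,
# contradicting the problem statement (a logical-error hallucination).
candidate_code = '''
def sum_even(nums):
    return sum(n for n in nums if n % 2 == 1)
'''

prompt = (
    "You are reviewing code generated by a language model.\n"
    f"Problem: {problem}\n"
    f"Candidate solution:\n{candidate_code}\n"
    "Does the solution contain a hallucination (e.g., a syntactic or logical "
    "error, security vulnerability, or memory leak)? Answer 'yes' or 'no' "
    "and briefly name the issue."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```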