This paper presents an in-depth analysis of hallucinations in vision-language (VL) models, with a focus on image captioning. We propose HalCECE, a hallucination detection framework that transforms hallucinated captions into non-hallucinated ones through minimal semantic modifications grounded in hierarchical knowledge, leveraging existing conceptual counterfactual explanation techniques. HalCECE is highly interpretable, as it returns semantically meaningful edits rather than bare numerical scores, and its hierarchical decomposition of hallucinated concepts enables thorough hallucination analysis. It is also one of the first works to investigate role hallucinations by considering the interconnections between visual concepts. In conclusion, HalCECE offers an explainable approach to VL hallucination detection, facilitating reliable evaluation of current and future VL systems.