This paper proposes a novel methodology that leverages human cognitive abilities to address the challenges of multi-face deepfake video detection. While existing methods perform well at single-face detection, they struggle in multi-face scenarios because they fail to exploit contextual cues. Drawing on studies of human perception, we identify key cues that humans use to detect deepfakes (scene-motion consistency, inter-face appearance compatibility, mutual gaze alignment, and face-body consistency) and build on them to develop HICOM, a multi-face deepfake detection framework. HICOM outperforms existing methods on benchmark datasets and, in particular, generalizes well to unseen datasets. In addition, we enhance interpretability by using a large language model (LLM) to generate human-understandable explanations of the detection results.