Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection

Created by
  • Haebom

Authors

Juan Hu, Shaojing Fan, Terence Sim

Outline

This paper proposes a human-inspired approach to detecting deepfakes in videos that contain multiple faces. Existing methods perform well on single-face detection but struggle in multi-face scenes because they fail to consider contextual cues. Drawing on human studies, the authors identify four key cues that people use to detect deepfakes: scene-motion consistency, inter-face appearance compatibility, mutual gaze alignment, and face-body consistency. They build HICOM, a multi-face deepfake detection framework, on these cues. In experiments on benchmark datasets, HICOM achieves higher accuracy than existing methods and generalizes particularly well to new datasets. The framework also uses an LLM to provide human-understandable explanations of its detection results, which improves interpretability.
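
To make the idea of combining the four cues concrete, here is a minimal Python sketch. It is not the authors' method: this summary does not say how HICOM computes or fuses its cues, so the `CueScores` type, the `fuse` function, the equal weighting, and the 0.5 threshold are all hypothetical choices made for illustration. Each score is assumed to lie in [0, 1], with higher values meaning the video looks more consistent on that cue.

```python
from dataclasses import dataclass, fields


@dataclass
class CueScores:
    """Hypothetical per-video consistency scores in [0, 1] (higher = more consistent)."""
    scene_motion: float           # scene-motion consistency
    inter_face_appearance: float  # inter-face appearance compatibility
    mutual_gaze: float            # mutual gaze alignment
    face_body: float              # face-body consistency


def fuse(scores: CueScores, threshold: float = 0.5):
    """Average the four cue scores and list the cues that fall below the threshold.

    Equal weighting and the default threshold are illustrative assumptions,
    not values taken from the paper.
    Returns (is_fake, mean_score, weak_cues).
    """
    values = {f.name: getattr(scores, f.name) for f in fields(scores)}
    mean_score = sum(values.values()) / len(values)
    # Cues scoring below the threshold are reported as the reason for suspicion.
    weak_cues = sorted(name for name, value in values.items() if value < threshold)
    return mean_score < threshold, mean_score, weak_cues
```

Calling `fuse(CueScores(0.9, 0.2, 0.3, 0.4))` would flag the video and report which cues scored low. That list of low-scoring cues is the kind of structured evidence an explanation step could turn into readable text, although how HICOM actually prompts its LLM is not described here.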

Takeaways, Limitations

Takeaways:
Human cognitive processes can be leveraged to improve multi-face deepfake detection performance.
The HICOM framework achieves higher accuracy and better generalization than existing methods.
LLM-based explainability improves the reliability of the detection results.
The paper presents a novel approach to defending against multi-face deepfakes.
Limitations:
Further research is needed on the generalizability of the human cognitive cues presented in this paper.
Further evaluation of HICOM's performance on different types of deepfake videos is needed.
The accuracy and reliability of the LLM-generated explanations need to be verified.
Additional performance evaluation in real social environments is needed.