Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs

Created by
  • Haebom

Authors

Xingyu Fu, Siyi Liu, Yinuo Xu, Pan Lu, Guangqiuse Hu, Tianbo Yang, Taran Anantasagar, Christopher Shen, Yikai Mao, Yuanzhe Liu, Keyush Shah, Chung Un Lee, Yejin Choi, James Zou, Dan Roth, Chris Callison-Burch

Outline

This paper investigates whether humans can identify AI-generated (fake) videos and articulate why they look fake. Despite rapid advances in video generation models, research on how well humans detect deepfake traces in generated videos remains limited. To address this gap, the authors present DeeptraceReward, a fine-grained spatiotemporal benchmark of human-perceived fake traces, comprising 4.3K detailed annotations over 3.3K high-quality generated videos. Each annotation provides a natural language description of the perceived trace, a bounding box around the affected region, and precise start and end timestamps. The authors consolidate these annotations into nine major categories of deepfake traces that lead humans to identify a video as AI-generated, and train a multimodal large language model (LLM) as a reward model that mimics human judgment and localization. Their 7B reward model outperforms GPT-5 by an average of 34.7% at identifying, grounding, and explaining fake cues. They also observe a consistent difficulty gradient: binary fake-vs-real classification is substantially easier than fine-grained trace detection, and within the latter, performance degrades from natural language explanation to spatial grounding to temporal labeling. By highlighting deepfake traces that humans actually perceive, DeeptraceReward provides a rigorous testbed and training signal for socially aware, trustworthy video generation.
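To make the annotation format concrete, here is a minimal Python sketch of what one DeeptraceReward-style record could look like, assuming only the fields named above (natural language description, bounding box, start/end timestamps, trace category). The field names, the bounding-box convention, and the example category are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass


@dataclass
class FakeTraceAnnotation:
    """One human-perceived fake-trace annotation (hypothetical schema).

    The summary specifies only that each annotation carries a natural
    language description, a bounding box, and start/end timestamps,
    grouped under nine trace categories; everything else here is assumed.
    """
    video_id: str                     # generated video being annotated
    description: str                  # natural language account of the perceived trace
    bbox: tuple[int, int, int, int]   # (x, y, width, height) of the affected region
    start_sec: float                  # when the trace first becomes visible
    end_sec: float                    # when the trace disappears
    category: str                     # one of the nine deepfake-trace categories


# Example usage (all values invented for illustration):
ann = FakeTraceAnnotation(
    video_id="vid_0001",
    description="The subject's fingers merge together while waving.",
    bbox=(120, 80, 64, 48),
    start_sec=1.2,
    end_sec=2.0,
    category="implausible body motion",  # hypothetical category name
)
```

A record of this shape would support all three evaluation granularities the summary mentions: the description for natural language explanation, the bounding box for spatial grounding, and the timestamps for temporal labeling.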

Takeaways, Limitations

Takeaways:
Introduces DeeptraceReward, a fine-grained benchmark of human-perceived deepfake traces in AI-generated videos.
Presents a novel, human-perception-based approach to AI-generated video detection.
Demonstrates that a multimodal LLM can be trained as a reward model that mimics human judgment and localization.
Provides a detailed analysis of the relative difficulty of detection subtasks (explanation, spatial grounding, temporal labeling).
Contributes a testbed and training signal toward socially aware, trustworthy video generation.
Limitations:
Possible bias in the data used for model training and evaluation.
Potential errors arising from the subjectivity of human annotations.
Uncertain generalization to new types of video generation and deepfake techniques.
Accuracy constrained by the inherent difficulty of temporal labeling.
Comparisons are limited to the 7B reward model and GPT-5; broader baselines are absent.