Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection

Created by
  • Haebom

Authors

Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, Jongwon Choi

Outline

This paper proposes a deepfake video detection technique that exploits pixel-wise temporal inconsistencies, which existing spatial frequency-based detectors often overlook. Existing methods represent temporal information merely by stacking per-frame spatial frequency spectra, so they fail to detect temporal artifacts at the pixel level. The proposed method instead performs a 1D Fourier transform along the time axis for each pixel, which yields features that are highly sensitive to temporal inconsistencies, especially in regions where unnatural movements are likely to occur. To locate regions containing temporal artifacts precisely, the authors introduce an attention proposal module trained end to end. A joint transformer module then integrates the pixel-wise temporal frequency features with spatio-temporal context features, which expands the range of detectable forgery artifacts. The authors report robust performance across diverse and challenging detection scenarios.
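The core operation described above, a 1D Fourier transform along the time axis for each pixel, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `pixelwise_temporal_spectrum`, the grayscale `(T, H, W)` input layout, and the per-pixel mean subtraction are assumptions made for illustration, and the attention proposal and joint transformer modules are not modeled at all.

```python
import numpy as np


def pixelwise_temporal_spectrum(video: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of each pixel's intensity over time.

    video: array of shape (T, H, W), grayscale frames (assumed layout).
    Returns an array of shape (T // 2 + 1, H, W).
    """
    if video.ndim != 3:
        raise ValueError("expected a (T, H, W) array")
    # Assumption: subtract each pixel's temporal mean so the constant (DC)
    # component does not dominate the spectrum.
    centered = video - video.mean(axis=0, keepdims=True)
    # 1D real FFT along the time axis, computed independently per pixel.
    return np.abs(np.fft.rfft(centered, axis=0))


# Synthetic clip: static everywhere except one pixel that flickers
# at 4 cycles per clip.
T, H, W = 16, 4, 4
t = np.arange(T)
clip = np.full((T, H, W), 0.5)
clip[:, 1, 2] += 0.1 * np.cos(2 * np.pi * 4 * t / T)

mag = pixelwise_temporal_spectrum(clip)

# Total temporal-frequency energy per pixel, and the dominant bin
# at the flickering pixel.
energy = mag.sum(axis=0)
peak_pixel = np.unravel_index(energy.argmax(), energy.shape)
peak_bin = int(mag[:, 1, 2].argmax())
print(peak_pixel, peak_bin)
```

In this toy clip the static pixels carry no temporal-frequency energy, while the flickering pixel at row 1, column 2 concentrates its energy in frequency bin 4. A per-frame spatial spectrum would spread such a single-pixel flicker across many spatial frequencies, which is the motivation the summary gives for analyzing the time axis pixel by pixel.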

Takeaways, Limitations

Takeaways:
A new deepfake video detection technique that addresses the limitations of spatial frequency-based methods by analyzing temporal inconsistencies pixel by pixel
Extracts features sensitive to temporal inconsistencies with a per-pixel 1D Fourier transform, and locates artifact regions with an attention proposal module
Expands the range of detectable artifacts by combining spatio-temporal context with pixel-wise temporal frequency features in a joint transformer module
Robust detection performance across diverse deepfake videos
Limitations:
Further validation is needed of how well the method generalizes and how it performs against a wider range of deepfake generation techniques.
Robustness to noise and compression under real-world conditions still needs to be evaluated.
The attention proposal module and joint transformer module add complexity and may increase computational cost.
Performance may be biased toward or against certain types of deepfakes.