Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Beyond Wide-Angle Images: Structure-to-Detail Video Portrait Correction via Unsupervised Spatiotemporal Adaptation

Created by
  • Haebom

Authors

Wenbo Nie, Lang Nie, Chunyu Lin, Jingwen Chen, Ke Xing, Jiyuan Wang, Kang Liao

Outline

To address facial distortion caused by wide-angle cameras, this paper proposes ImagePC, a structure-to-detail portrait correction model that combines the long-range modeling of Transformers with the multi-step denoising of diffusion models. Because labeled video data is difficult to obtain, the authors then repurpose ImagePC for unlabeled wide-angle videos as VideoPC, using spatiotemporal diffusion adaptation with spatial consistency and temporal smoothness constraints. VideoPC maintains high-quality spatial facial correction while mitigating temporal blur across frames in blind scenarios. The model is trained and evaluated on a video portrait dataset covering diverse people, lighting conditions, and backgrounds, and experiments show that it outperforms existing methods both qualitatively and quantitatively. The code and dataset will be made public in the future.
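This summary does not give the exact form of the temporal smoothness constraint, so the following is only an illustrative sketch of the general idea and not the authors' formulation. It assumes a hypothetical scalar correction value per frame (for example, the displacement of one facial point) and penalizes frame-to-frame jitter with a second-order difference. The function names `temporal_smoothness_penalty` and `smooth_trajectory` are made up for this example.

```python
def temporal_smoothness_penalty(trajectory):
    """Mean squared second-order difference of a per-frame value sequence.

    Assumption: `trajectory` holds one hypothetical scalar per video frame.
    A sequence that changes at a constant rate scores 0; jitter scores higher.
    """
    if len(trajectory) < 3:
        return 0.0
    accel = [
        trajectory[t - 1] - 2 * trajectory[t] + trajectory[t + 1]
        for t in range(1, len(trajectory) - 1)
    ]
    return sum(a * a for a in accel) / len(accel)


def smooth_trajectory(trajectory, radius=1):
    """Moving average over neighboring frames, with the window clipped at the ends."""
    n = len(trajectory)
    smoothed = []
    for t in range(n):
        window = trajectory[max(0, t - radius):min(n, t + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A jittery per-frame sequence and its smoothed counterpart.
shaky = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
steady = smooth_trajectory(shaky)
print(temporal_smoothness_penalty(shaky), temporal_smoothness_penalty(steady))
```

In a training setting, a penalty of this kind would typically be added to the spatial correction objective so that per-frame results stay accurate while varying smoothly over time. How the paper weights or combines the two constraints is not stated here.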

Takeaways, Limitations

Takeaways:
Offers an effective solution to facial distortion caused by wide-angle cameras.
Proposes a novel structure-to-detail portrait correction model that integrates Transformers and diffusion models.
Proposes VideoPC, an effective correction technique for unlabeled videos.
Builds a new video portrait dataset covering a variety of conditions, with public release planned.
Demonstrates superior quantitative and qualitative performance compared to existing methods.
Limitations:
Lack of analysis of the computational cost and complexity of the proposed model.
Lack of generalization performance evaluation for different types of distortion.
Lack of performance evaluation in real application environments.
Further analysis is needed on the effectiveness and limitations of VideoPC's temporal smoothness constraints.