Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Conditional Video Generation for High-Efficiency Video Compression

Created by
  • Haebom

Author

Fangqiu Yi, Jingyu Xu, Jiawei Shao, Chi Zhang, Xuelong Li

Outline

This paper proposes a perceptually optimized video compression framework that leverages the conditional diffusion model, which excels at reconstructing video content that matches human visual perception. We reframe video compression as a conditional generative task, where a generative model synthesizes video from sparse but information-rich signals. We introduce three main modules: multi-particle conditioning, which captures both static scene structure and dynamic spatiotemporal cues; a compressed representation designed for efficient transmission without sacrificing semantic richness; and multi-conditional training using modality dropout and role-aware embeddings, which prevent overreliance on a single modality and enhance robustness. Extensive experiments demonstrate that the proposed method significantly outperforms conventional and neural codecs on perceptual quality metrics such as the Fréchet Video Distance (FVD) and LPIPS, especially at high compression ratios.

Takeaways, Limitations

Takeaways:
A novel video compression framework using the conditional diffusion model is presented.
Achieving superior perceptual quality at high compression ratios compared to conventional and neural codecs.
Improved efficiency and robustness through multi-particle conditioning, compressed representation, and multi-condition training.
Limitations:
Lack of analysis on the computational complexity and real-time processing potential of the proposed method.
Insufficient evaluation of generalization performance across various video types and content.
Lack of quantitative analysis of the relationship between bitrate and perceived quality of compressed video.
👍