Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Conditional Video Generation for High-Efficiency Video Compression

Created by
  • Haebom

Authors

Fangqiu Yi, Jingyu Xu, Jiawei Shao, Chi Zhang, Xuelong Li

Outline

This paper proposes a perceptually optimized video compression framework built on conditional diffusion models, which excel at reconstructing video content in a way that matches human visual perception. The authors reframe video compression as a conditional generation task in which a generative model synthesizes video from sparse but information-rich signals. The framework introduces three main modules: multi-granular conditioning, which captures both static scene structure and dynamic spatiotemporal cues; compact representations designed for efficient transmission without sacrificing semantic richness; and multi-condition training with modality dropout and role-aware embeddings, which prevents overreliance on any single modality and improves robustness. Extensive experiments show that the proposed method significantly outperforms both conventional and neural codecs on perceptual quality metrics such as Fréchet Video Distance (FVD) and LPIPS, especially at high compression ratios.
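To make the idea of modality dropout more concrete, here is a minimal sketch of how conditioning signals might be randomly withheld during training so that a model cannot lean on any single one. This is an illustration of the general technique only, not the authors' implementation: the function name `apply_modality_dropout`, the `drop_prob` parameter, the keep-at-least-one rule, and the modality names in the example (`keyframe`, `motion`, `text`) are all assumptions made for this sketch.

```python
import random


def apply_modality_dropout(conditions, drop_prob=0.3, rng=None):
    """Randomly drop conditioning modalities, always keeping at least one.

    conditions: dict mapping a (hypothetical) modality name to its signal.
    Returns a new dict in which dropped modalities are replaced by None.
    """
    if rng is None:
        rng = random.Random()
    names = list(conditions)
    # Each modality is kept independently with probability 1 - drop_prob.
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:
        # Assumption of this sketch: never drop every modality at once.
        kept = [rng.choice(names)]
    return {name: (conditions[name] if name in kept else None) for name in names}


# Hypothetical conditioning signals for one training sample.
example_conditions = {"keyframe": [0.1, 0.2], "motion": [0.3], "text": [0.5]}
example_rng = random.Random(0)
for _ in range(3):
    print(apply_modality_dropout(example_conditions, drop_prob=0.5, rng=example_rng))
```

In a real training loop, the `None` entries would typically be replaced by a learned or zero "missing" embedding before being passed to the model; how the paper handles this is not stated in the summary above.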

Takeaways, Limitations

Takeaways:
Presents a novel video compression framework based on conditional diffusion models.
Achieves superior perceptual quality at high compression ratios compared to conventional and neural codecs (based on FVD and LPIPS).
Achieves efficient and robust compression through multi-granular conditioning, compact representations, and multi-condition training.
Limitations:
No analysis of the computational complexity or memory requirements of the proposed method.
No evaluation of generalization across different video types and content.
No detailed explanation of practical implementation or deployment.