
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Conditional Video Generation for High-Efficiency Video Compression

Created by
  • Haebom

Author

Fangqiu Yi, Jingyu Xu, Jiawei Shao, Chi Zhang, Xuelong Li

Outline

In this paper, we propose a perceptually optimized video compression framework built on conditional diffusion models, which excel at reconstructing video content aligned with human visual perception. Video compression is reframed as a conditional generative task in which a generative model synthesizes video from sparse but information-rich signals. The framework introduces three main modules: multi-grain conditioning, which captures both static scene structure and dynamic spatiotemporal cues; a compressed representation designed for efficient transmission without sacrificing semantic richness; and multi-condition training with modality dropout and role-aware embeddings, which prevents over-reliance on any single modality and improves robustness. Extensive experiments show that the proposed method significantly outperforms both conventional and neural codecs on perceptual quality metrics such as Fréchet Video Distance (FVD) and LPIPS, especially at high compression ratios.
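The modality-dropout idea in the training module can be illustrated with a minimal sketch. This is not the authors' implementation; the modality names (`keyframe`, `motion`, `semantics`), role IDs, and function names below are hypothetical, chosen only to show the mechanic of randomly hiding conditioning signals while tagging each surviving one with a role identifier:

```python
import random

def sample_modality_mask(modalities, drop_prob=0.3, rng=None):
    """Modality dropout: randomly hide conditioning signals during training
    so the generator cannot over-rely on any single modality.
    Always keeps at least one modality active."""
    rng = rng or random.Random()
    mask = {m: rng.random() >= drop_prob for m in modalities}
    if not any(mask.values()):
        # Guarantee at least one conditioning signal survives.
        mask[rng.choice(list(modalities))] = True
    return mask

# Hypothetical role IDs: a role-aware embedding lets the model distinguish,
# e.g., a static keyframe condition from a dynamic motion condition.
ROLE_IDS = {"keyframe": 0, "motion": 1, "semantics": 2}

def build_condition_tokens(features, mask):
    """Pair each surviving modality's features with its role ID,
    yielding the conditioning set fed to the generative model."""
    return [(ROLE_IDS[m], feats) for m, feats in features.items() if mask[m]]
```

In an actual diffusion training loop, a mask like this would be resampled per batch, and the role IDs would index learned embedding vectors added to each modality's condition tokens.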

Takeaways, Limitations

Takeaways:
A novel video compression framework built on conditional diffusion models.
Improved perceptual quality (lower FVD and LPIPS) compared to conventional and neural codecs, especially at high compression ratios.
Technical contributions: multi-grain conditioning, a transmission-efficient compressed representation, and multi-condition training with modality dropout and role-aware embeddings.
Limitations:
The paper lacks detailed figures for specific compression ratios and quality levels.
No analysis of the method's computational complexity or real-time processing feasibility.
No evaluation of generalization across different video types and content.