
Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

Created by
  • Haebom

Author

Jing Gu, Xian Liu, Yu Zeng, Ashwin Nagarajan, Fangrui Zhu, Daniel Hong, Yue Fan, Qianqi Yan, Kaiwen Zhou, Ming-Yu Liu, Xin Eric Wang

Outline

In this paper, we present PhyWorldBench, a comprehensive benchmark for evaluating video generation models on their adherence to the laws of physics. PhyWorldBench covers a wide range of physical phenomena, from basic principles such as object motion and energy conservation to more complex scenarios involving rigid-body interactions and human or animal motion. It also introduces an "Anti-Physics" category, whose prompts intentionally violate real-world physics, to assess whether models can follow such instructions while remaining logically consistent. In addition to large-scale human evaluation, we present a simple yet effective zero-shot method for evaluating physical realism that leverages current multimodal large language models (MLLMs). We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models, and conduct a detailed comparative analysis that identifies the significant challenges models face in complying with real-world physics. Extensive testing on 1,050 curated prompts, spanning basic, complex, and anti-physics scenarios, rigorously examines performance across a wide range of physical phenomena and prompt types, and yields targeted guidelines for writing prompts that improve fidelity to physical principles.
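The paper does not reproduce its exact evaluation pipeline here, but the zero-shot MLLM idea is simple enough to sketch: sample a handful of frames from a generated video and ask a multimodal model a constrained yes/no question about whether the depicted dynamics obey physics. The sketch below is illustrative only; the choice of gpt-4o as the judge, OpenCV for frame sampling, and the wording of the judging prompt are all assumptions, not the authors' actual setup.

```python
# Minimal sketch of a zero-shot physical-realism check with an MLLM.
# Assumptions (not from the paper): gpt-4o as the judge model, OpenCV
# for frame sampling, and an illustrative yes/no judging prompt.
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_frames(video_path: str, num_frames: int = 8) -> list[str]:
    """Uniformly sample frames and return them as base64-encoded JPEGs."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
    cap.release()
    return frames

def judge_physical_realism(video_path: str, prompt: str) -> bool:
    """Ask the MLLM whether the sampled frames obey real-world physics."""
    images = [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        for b64 in sample_frames(video_path)
    ]
    question = (
        f'These frames come from a video generated for the prompt: "{prompt}". '
        "Do the depicted dynamics obey real-world physics (gravity, momentum, "
        "object permanence)? Answer strictly YES or NO."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": [{"type": "text", "text": question}, *images]}],
    )
    return "YES" in response.choices[0].message.content.upper()
```

A binary verdict like this is cheap to run at benchmark scale; for an Anti-Physics prompt the same check would be inverted, asking whether the video faithfully realizes the intended violation rather than whether it obeys physics.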

Takeaways, Limitations

Takeaways: PhyWorldBench provides a standardized benchmark for evaluating the physical realism of video generation models. The zero-shot MLLM-based method enables efficient, scalable evaluation. Analysis across diverse physical phenomena and prompt types yields concrete directions for model improvement.
Limitations: Despite its breadth, the benchmark cannot perfectly capture every real-world physical phenomenon. The accuracy of the MLLM-based zero-shot evaluation requires further study. The evaluation covers a limited number and variety of models.