Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Explain Before You Answer: A Survey on Compositional Visual Reasoning

Created by
  • Haebom

Authors

Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, Ranjay Krishna, Jiajun Wu, Hamid Rezatofighi

Outline

This paper surveys the field of compositional visual reasoning (CVR), synthesizing more than 260 papers published from 2023 to 2025. CVR aims to enable machines to decompose visual scenes, as humans do, and to perform multi-step logical reasoning over intermediate concepts. The authors set out the advantages of compositional approaches (cognitive alignment, semantic fidelity, robustness, interpretability, and data efficiency) and trace a five-stage paradigm shift: from prompt-enhanced, language-centric pipelines, through tool-enhanced LLMs and tool-enhanced VLMs, to chain-of-thought reasoning and unified agentic VLMs. They catalog more than 60 benchmarks and their metrics, and highlight key insights, open challenges (e.g., limitations of LLM-based reasoning, hallucination, a bias toward deductive reasoning, scalable supervision, tool integration, and benchmark limitations), and future directions (e.g., world model integration, human-AI collaborative reasoning, and richer evaluation protocols).
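
To make the idea of decomposing a scene and reasoning over intermediate concepts concrete, the following is a minimal toy sketch, not a method from the survey. The scene is hand-written rather than produced by a vision model, and every name in it (SceneObject, SCENE, filter_by, left_of, count_left_of) is a hypothetical helper introduced only for illustration. It answers "How many blue objects are left of the red cylinder?" in three explicit steps and keeps a trace of each intermediate result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    color: str
    x: float  # horizontal position; a smaller value means further left


# Assumption: a hand-written scene standing in for a perception module's output.
SCENE = [
    SceneObject("cube", "red", 1.0),
    SceneObject("sphere", "blue", 2.0),
    SceneObject("cube", "blue", 3.0),
    SceneObject("cylinder", "red", 4.0),
]


def filter_by(objects, **attrs):
    """Keep the objects whose attributes match every given key/value pair."""
    return [o for o in objects if all(getattr(o, k) == v for k, v in attrs.items())]


def left_of(objects, anchor):
    """Keep the objects positioned strictly to the left of the anchor."""
    return [o for o in objects if o.x < anchor.x]


def count_left_of(scene, target_attrs, anchor_attrs):
    """Answer 'how many <target> are left of <anchor>?' step by step.

    Returns the count together with a trace of intermediate results, so the
    reasoning can be inspected rather than only the final answer.
    """
    trace = []

    anchors = filter_by(scene, **anchor_attrs)
    trace.append(("locate anchor", anchors))
    if len(anchors) != 1:
        raise ValueError("anchor description must match exactly one object")

    candidates = left_of(scene, anchors[0])
    trace.append(("objects left of anchor", candidates))

    matches = filter_by(candidates, **target_attrs)
    trace.append(("candidates matching target", matches))

    return len(matches), trace


count, trace = count_left_of(
    SCENE,
    target_attrs={"color": "blue"},
    anchor_attrs={"name": "cylinder", "color": "red"},
)
for step, result in trace:
    print(step, "->", [f"{o.color} {o.name}" for o in result])
print("answer:", count)
```

Because each step's output is recorded, a wrong answer can be traced to the step that produced it, which is the kind of interpretability the summary attributes to compositional approaches.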

Takeaways, Limitations

Takeaways:
Provides a systematic, comprehensive review of the field of compositional visual reasoning.
Frames research trends as a five-stage paradigm shift.
Catalogs a wide range of benchmarks and metrics.
Clarifies the advantages and limitations of compositional approaches.
Suggests directions for future research.
Limitations:
Limitations of LLM-based reasoning
Hallucination problems
Bias toward deductive reasoning
Absence of scalable supervision
Difficulties in tool integration
Limitations of benchmarks