Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Enhancing Supervised Composed Image Retrieval via Reasoning-Augmented Representation Engineering

Created by
  • Haebom

Authors

Jun Li, Kai Li, Shaoguo Liu, Tingting Gao

Outline

This paper proposes the Pyramid Matching Model with Training-Free Refinement (PMTFR) framework to address the Composed Image Retrieval (CIR) problem. Existing two-step approaches require training an additional ranking model, whereas PMTFR reduces training costs by leveraging the Chain-of-Thought (CoT) technique. The Pyramid Patcher module enhances the model's understanding of visual information at various resolutions, and representations extracted from CoT data are injected into large vision-language models (LVLMs) to refine retrieval results without further training. Experimental results show that PMTFR outperforms existing state-of-the-art methods on supervised CIR tasks. The code will be made public.
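The idea of injecting representations extracted from CoT data can be illustrated with a generic representation-engineering sketch. This is not the authors' implementation: it assumes the injected representation is a mean difference between hidden states obtained with and without CoT prompts, added to a hidden state with a scaling factor. The names `steering_direction`, `inject`, and `alpha` are hypothetical, and the toy vectors stand in for real LVLM activations.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(with_cot: List[Vector], without_cot: List[Vector]) -> Vector:
    """Assumed extraction rule: mean hidden state with CoT minus mean without CoT."""
    mu_cot = mean_vector(with_cot)
    mu_plain = mean_vector(without_cot)
    return [a - b for a, b in zip(mu_cot, mu_plain)]


def inject(hidden: Vector, direction: Vector, alpha: float = 1.0) -> Vector:
    """Add the scaled direction to a hidden state; no parameters are trained."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values, two dimensions for readability).
with_cot = [[1.0, 2.0], [3.0, 4.0]]
without_cot = [[0.0, 1.0], [2.0, 1.0]]

direction = steering_direction(with_cot, without_cot)
refined = inject([0.5, 0.5], direction, alpha=0.5)
print(direction, refined)
```

In a real system the vectors would come from a chosen layer of the LVLM, and the choice of layer and scaling factor would need to be tuned; the paper summary does not specify either.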

Takeaways, Limitations

Takeaways:
Achieves state-of-the-art performance on the supervised CIR task using a training-free refinement paradigm.
Reduces training costs by leveraging CoT techniques.
Improves understanding of visual information at various resolutions through the Pyramid Patcher module.
Improves retrieval performance through effective representation engineering.
The code will be made publicly available.
Limitations:
Further research may be needed to explore the generalization performance of the methodology presented in this paper.
Additional experiments on various CIR benchmarks and datasets may be required.
A more detailed analysis of the design and parameters of the Pyramid Patcher module may be required.
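
To make concrete what kinds of parameters such an analysis might cover, here is a minimal sketch of multi-resolution patching. It is an illustration only, not the Pyramid Patcher described in the paper: the base resolution, patch size, number of levels, and the doubling rule between levels are all assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels


def patch_grid(height: int, width: int, patch: int) -> List[Box]:
    """Non-overlapping square patches that fit fully inside the image."""
    boxes = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            boxes.append((top, left, top + patch, left + patch))
    return boxes


def pyramid_patches(base_size: int, patch: int, levels: int) -> Dict[int, List[Box]]:
    """Assumed pyramid: the image side doubles at each level, patch size stays fixed."""
    pyramid = {}
    for level in range(levels):
        size = base_size * (2 ** level)
        pyramid[size] = patch_grid(size, size, patch)
    return pyramid


# Hypothetical settings chosen only to keep the numbers small.
pyramid = pyramid_patches(base_size=224, patch=112, levels=3)
counts = {size: len(boxes) for size, boxes in pyramid.items()}
print(counts)
```

Under these assumptions the patch count quadruples with every added level, which is one reason the number of levels and the patch size would be natural subjects for an ablation.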