This page curates AI-related papers published worldwide. All summaries are generated with Google Gemini, and the page is run on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
This paper applies Chain-of-Thought (CoT) reasoning to autoregressive image generation in order to improve its performance. We focus on three techniques: scaling test-time computation for verification, aligning model preferences via Direct Preference Optimization (DPO), and combining the two so that they complement each other. In particular, we propose the Potential Assessment Reward Model (PARM) and PARM++, reward models specialized for autoregressive image generation. PARM evaluates each generation step through a potential assessment approach and combines the strengths of existing reward models, while PARM++ additionally introduces a self-correction mechanism that corrects unsatisfactory generated images. Applying the proposed methods to the Show-o model, we achieve a 24% performance improvement on the GenEval benchmark, outperforming Stable Diffusion 3 by 15%.
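As a rough illustration of two building blocks named above, the sketch below shows the standard DPO loss for a single preference pair and a generic best-of-N selection step, a common form of test-time verification. It is not the paper's implementation: the function names, the `beta=0.1` default, and the toy scoring function (a stand-in for a reward model such as PARM) are assumptions made for the example.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; the summary does not state one
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each sample under the policy
    being trained and under the frozen reference model.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) computed as softplus(-margin) for stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the candidate with the highest score (test-time verification).

    `score` is a hypothetical stand-in for a reward model.
    """
    return max(candidates, key=score)


if __name__ == "__main__":
    # The policy prefers the chosen sample more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
    # Toy scorer: string length stands in for a learned reward.
    print(best_of_n(["a", "bbb", "cc"], score=len))
```

In a combined setup, a reward model could both rank candidates to build preference pairs for DPO training and select among samples at inference time; how the paper wires these together is not detailed in this summary.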
Takeaways, Limitations
• Takeaways:
◦ We successfully apply CoT reasoning to autoregressive image generation and demonstrate improved performance.
◦ We improve image generation quality by proposing new reward models, PARM and PARM++.
◦ We present an effective way to combine CoT reasoning strategies with test-time computation scaling and DPO.
◦ The approach achieves SOTA performance on the GenEval benchmark.
• Limitations:
◦ Further study is needed on the generality of the proposed method and its applicability to other image generation models.
◦ The computational cost and complexity of PARM and PARM++ still need to be analyzed.
◦ Further validation is needed to determine whether the improvements on this specific benchmark generalize to other benchmarks.