Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner

Created by
  • Haebom

Authors

Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma

Outline

This paper presents Chart-R1, a model that applies an R1-style method, based on reinforcement learning fine-tuning, to complex reasoning in the chart domain. Whereas existing R1-style methods focus on mathematical reasoning and code intelligence, Chart-R1 extends the approach to more general multimodal data, specifically charts. To support this, the authors propose a programmatic data synthesis technique that generates high-quality, step-by-step chart reasoning data covering both single charts and charts with multiple sub-charts. They also develop a two-stage training strategy: Chart-COT, which uses chain-of-thought (CoT) supervision, and Chart-RFT, which uses numerically sensitive reinforcement fine-tuning. Chart-COT decomposes complex reasoning tasks into fine-grained sub-tasks, while Chart-RFT emphasizes numerical sensitivity in the chart domain by applying a relatively soft reward to numerical responses. Experimental results show that Chart-R1 outperforms existing chart-domain methods and is comparable to large-scale models such as GPT-4o and Claude-3.5.
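The idea of a softer reward for numerical answers can be illustrated with a small sketch. The summary does not give the paper's actual reward formula, so everything below is an assumption made for illustration: the function names `parse_number` and `soft_numeric_reward`, the linear decay, and the 5% relative-error tolerance are all hypothetical and should not be read as the authors' implementation.

```python
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Parse a response as a number, ignoring commas and a trailing percent sign.

    Returns None when the whole response is not numeric.
    """
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def soft_numeric_reward(prediction: str, target: str, tolerance: float = 0.05) -> float:
    """Hypothetical soft reward (an assumption, not the paper's formula).

    Numeric answers earn 1.0 for an exact match, decaying linearly to 0.0
    as the relative error approaches `tolerance`. Non-numeric answers fall
    back to a strict, case-insensitive string match.
    """
    pred = parse_number(prediction)
    gold = parse_number(target)
    if pred is None or gold is None:
        # Non-numeric case: all-or-nothing reward.
        return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0
    if gold == 0.0:
        # Relative error is undefined for a zero target; require an exact match.
        return 1.0 if pred == 0.0 else 0.0
    relative_error = abs(pred - gold) / abs(gold)
    return max(0.0, 1.0 - relative_error / tolerance)
```

Under these assumed settings, a prediction that is 2% away from the target earns partial credit, while one that is 10% away earns none. A strict exact-match reward would score both as zero, which is the kind of harshness a soft numerical reward is meant to avoid when values are read off a chart.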

Takeaways, Limitations

Takeaways:
Successfully applies a reinforcement-learning-based R1-style method to complex reasoning over multimodal data, particularly chart data.
Addresses the shortage of chart reasoning data with a new programmatic data synthesis technique.
Presents an effective two-stage training strategy that combines chain-of-thought supervision (Chart-COT) with numerically sensitive reinforcement fine-tuning (Chart-RFT).
Outperforms existing chart-domain methods and performs comparably to large-scale models such as GPT-4o and Claude-3.5.
Limitations:
Further research is needed on the generalization performance and limitations of the proposed data synthesis technique.
There is a potential bias towards certain types of chart data. Performance evaluation is needed for various types of chart data.
Further research is needed on the design and optimization of the reward function used.
A metric-by-metric analysis of the performance differences against large-scale models is needed.