
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Created by
  • Haebom

Authors

Wan-Cyuan Fan, Yen-Chun Chen, Mengchen Liu, Lu Yuan, Leonid Sigal

Outline

Building on recent work tailoring multimodal large language models (MLLMs) to domain-specific tasks, particularly scientific chart understanding, this paper addresses the gap between natural image-caption pre-training data and digital chart image-QA data, especially the model's ability to extract the underlying numerical values from charts. The paper presents three key findings. First, incorporating raw data values into alignment pre-training significantly improves the model's understanding of chart data. Second, randomly replacing images with their textual data representations during end-to-end fine-tuning transfers the language model's reasoning skills to chart interpretation. Third, requiring the model to first extract the underlying chart data and then answer the question during fine-tuning further improves accuracy. Based on these findings, the authors present CHOPINLLM, a customized MLLM that effectively interprets various types of charts (including unannotated ones) while maintaining strong reasoning capabilities, and construct a new benchmark that evaluates MLLMs' chart comprehension across a range of chart types and comprehension levels. Experiments show that CHOPINLLM performs strongly on both annotated and unannotated charts.
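The three findings are data-construction choices rather than architecture changes. Below is a minimal sketch of how such training examples might be assembled, assuming a generic image-text training pipeline; the function names, field names, and replacement probability are hypothetical illustrations, not the paper's actual implementation.

```python
import random

# Hypothetical example builders illustrating the three strategies above.
# All names and the probability p_text are illustrative assumptions.

def alignment_example(chart_image, raw_data_json):
    """Strategy 1: pair each chart with its underlying raw data values
    during alignment pre-training, not only a natural-language caption."""
    return {
        "image": chart_image,
        "target": f"The chart encodes the following data: {raw_data_json}",
    }

def finetune_example(chart_image, raw_data_json, question, answer, p_text=0.3):
    """Strategies 2 and 3, combined for end-to-end fine-tuning."""
    # Strategy 2: with some probability, drop the image and present the
    # data as text instead, so language-only reasoning transfers to charts.
    if random.random() < p_text:
        image, prompt = None, f"Data: {raw_data_json}\nQuestion: {question}"
    else:
        image, prompt = chart_image, f"Question: {question}"
    # Strategy 3: supervise the model to extract the chart's data first and
    # then answer, grounding the answer in the extracted values.
    target = f"Extracted data: {raw_data_json}\nAnswer: {answer}"
    return {"image": image, "prompt": prompt, "target": target}
```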

Takeaways, Limitations

Takeaways:
Presents effective pre-training and fine-tuning strategies for improving MLLM chart understanding: integrating raw data values into alignment pre-training, randomly substituting textual data representations for images during fine-tuning, and extracting the underlying data before answering questions.
Develops CHOPINLLM, a model that effectively understands various chart types, with and without annotations.
Introduces a new benchmark for evaluating MLLMs' chart understanding across chart types and comprehension levels.
Limitations:
Further research is needed to establish the generalizability of the proposed methodology.
Evaluation on more diverse and complex chart types is needed.
Additional analysis of CHOPINLLM's performance limits and directions for improvement is needed.