Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions

Created by
  • Haebom

Authors

Donglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen, Yichen Xu, Wenxuan Wang, Qin Jin

Outline

This paper presents a multimodal chart-editing paradigm that combines natural language with visual indicators. To resolve the ambiguity of existing natural-language-only chart editing, the proposed approach expresses user intent through natural language together with visual indicators that explicitly highlight the elements to be edited. To support this paradigm, the authors introduce ChartM$^3$, a novel multimodal chart-editing benchmark with multi-level complexity and multi-faceted evaluation. ChartM$^3$ comprises 1,000 samples spanning four levels of editing difficulty, each composed of three elements: a chart, its code, and multimodal indicators. The benchmark provides metrics that assess both visual appearance and code correctness, enabling comprehensive evaluation of chart-editing models. Using ChartM$^3$, the paper demonstrates the limitations of current multimodal large language models (MLLMs), particularly their inability to interpret and apply visual indicators. To address these limitations, the authors construct ChartM$^3$-Train, a large-scale training dataset of 24,000 multimodal chart-editing samples. Fine-tuning MLLMs on this dataset significantly improves performance, demonstrating the importance of multimodal supervised learning. The dataset, code, and evaluation tools are available on GitHub.
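
To make the benchmark's structure concrete, below is a minimal Python sketch of what one sample and its two-sided (appearance + code) evaluation could look like. This is an illustration only: the class name `ChartM3Sample`, its fields, and the functions `code_score` and `appearance_score` are assumptions for exposition, not the released dataset's schema or the paper's actual metrics.

```python
# Hypothetical sketch of a ChartM^3-style sample and its dual evaluation.
# Names and metrics here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass
import difflib


@dataclass
class ChartM3Sample:
    """One benchmark item: a chart, the code that drew it, and a
    multimodal instruction (natural-language text + visual indicator)."""
    chart_image: str       # path to the rendered source chart
    source_code: str       # plotting code that produced the chart
    instruction: str       # natural-language description of the edit
    indicator_image: str   # overlay that visually marks the target elements
    reference_code: str    # ground-truth edited plotting code


def code_score(predicted_code: str, reference_code: str) -> float:
    """Toy stand-in for the code-correctness metric: textual similarity
    between the predicted and reference edits (the real metric is richer)."""
    return difflib.SequenceMatcher(None, predicted_code, reference_code).ratio()


def appearance_score(predicted_image: bytes, reference_image: bytes) -> float:
    """Toy stand-in for the visual-appearance metric: exact pixel match here;
    a real metric would compare rendered chart elements more robustly."""
    return 1.0 if predicted_image == reference_image else 0.0
```

Under this framing, a model receives the chart image, the indicator overlay, the instruction, and the source code, and must return edited code; the benchmark then scores both the code itself and the chart that the code renders to.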

Takeaways, Limitations

Takeaways:
Introduces a new paradigm for chart editing that combines natural-language and visual input.
Provides ChartM$^3$, a new benchmark covering multi-level complexity and multi-faceted evaluation.
Reveals the limitations of existing MLLMs in interpreting and applying visual indicators.
Shows that fine-tuning on the large-scale multimodal training dataset ChartM$^3$-Train significantly improves MLLM performance.
Emphasizes the importance of multimodal supervised learning for developing chart-editing systems.
Limitations:
The ChartM$^3$ benchmark's 1,000 samples may be relatively few.
Further research is needed on generalization across different chart types and editing tasks.
Further work is needed to overcome the limitations of current MLLMs (e.g., developing more sophisticated visual-understanding models).