Building on recent studies that adapt multimodal large language models (MLLMs) to domain-specific tasks, particularly scientific chart understanding, this paper addresses the gap between natural image-caption pretraining data and digital chart image-QA data, with a focus on the model's ability to extract the underlying numeric values of a chart. To this end, we present three key findings. First, incorporating raw data values into alignment pretraining markedly improves the model's comprehension of chart data. Second, randomly replacing images with their textual data representations during end-to-end fine-tuning transfers the language model's reasoning capability to chart interpretation. Third, requiring the model to first extract the underlying chart data and then answer the question during fine-tuning further improves accuracy. Building on these findings, we introduce CHOPINLLM, an MLLM customized for chart understanding that effectively interprets various types of charts, including unannotated ones, while maintaining strong reasoning capability. We also construct a new benchmark to evaluate MLLMs' understanding of different chart types across varying comprehension levels. Experimental results show that CHOPINLLM performs strongly on both annotated and unannotated charts.
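To make the second finding concrete, below is a minimal sketch of an image-to-text substitution step during fine-tuning, assuming each training sample carries the chart image, its raw-data text, and a QA pair; the field names, the helper function, and the substitution probability are illustrative assumptions, not the paper's actual implementation.

```python
import random

P_TEXT = 0.3  # assumed probability of swapping the chart image for its raw-data text

def build_training_example(sample, p_text=P_TEXT):
    """With probability p_text, replace the chart image with its textual data
    representation so the model learns to answer from the same underlying values
    whether they arrive as pixels or as text."""
    question, answer = sample["question"], sample["answer"]
    if random.random() < p_text:
        # Text-only variant: raw data stands in for the chart image.
        prompt = f"Data: {sample['raw_data_text']}\nQuestion: {question}"
        image = None
    else:
        # Image variant: the model must read the values from the chart itself.
        prompt = f"<image>\nQuestion: {question}"
        image = sample["chart_image"]
    return {"image": image, "prompt": prompt, "target": answer}
```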