This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
This paper presents MLLM-CTBench, a benchmark for continual instruction tuning of multimodal large language models (MLLMs). MLLM-CTBench comprises seven carefully selected tasks spanning six diverse domains. It provides a multidimensional evaluation metric that combines final-answer accuracy with Chain-of-Thought (CoT) reasoning quality, a comprehensive evaluation of continual learning algorithms (eight algorithms across four major categories), and a comparison of reinforcement fine-tuning (RFT) and supervised fine-tuning (SFT) in terms of how well performance is retained across successive tasks. Experimental results show that the reasoning process of MLLMs is more robust to forgetting during continual training than their final outputs, and that stronger base models are more resistant to forgetting. Properly regularized RFT proves more robust than SFT at retaining performance across tasks, underscoring the importance of KL-divergence regularization.
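To illustrate the kind of KL-divergence regularization the summary credits with reducing forgetting, below is a minimal sketch of a policy-gradient loss with a KL penalty toward a frozen reference model. This is not the paper's exact RFT objective; the function name, the `beta` coefficient, and the single-sample KL estimator are all illustrative assumptions.

```python
import torch

def kl_regularized_rft_loss(logprobs, ref_logprobs, advantages, beta=0.05):
    """Sketch of a KL-regularized policy-gradient loss (illustrative, not the paper's method).

    logprobs:      log-probs of sampled tokens under the current policy
    ref_logprobs:  log-probs of the same tokens under the frozen reference model
    advantages:    per-token (or per-sequence) advantage estimates
    beta:          strength of the KL regularization (assumed hyperparameter)
    """
    # REINFORCE-style term: increase log-probs of tokens with positive advantage.
    pg_loss = -(advantages.detach() * logprobs).mean()

    # Single-sample KL estimate between current and reference policies:
    # log pi(a|s) - log pi_ref(a|s), averaged over tokens. Penalizing this
    # keeps the fine-tuned policy close to the pre-fine-tuning model,
    # which is the mechanism tied to reduced forgetting in the summary.
    kl = (logprobs - ref_logprobs).mean()

    return pg_loss + beta * kl
```

The KL term anchors the updated policy to the reference model, so gains on the current task come at a smaller cost to behavior learned on earlier tasks.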
Takeaways, Limitations
• Takeaways:
◦ Provides MLLM-CTBench, a systematic benchmark for continual instruction tuning of MLLMs.