This page collects artificial intelligence papers published around the world. Its summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.
To improve the effectiveness of Multimodal Chain-of-Thought (MCoT) prompting, we aim to overcome the limitations of randomly or manually selected in-context examples. We point out that such selection leads to unstable performance because it ignores both the model's specific knowledge distribution and the inherent complexity of the task. We therefore propose a new framework inspired by the principle of "personalized training with balanced difficulty." The framework recasts prompt selection as a prompt curriculum design problem: constructing a set of prompt examples aligned with the model's current ability. We develop a difficulty-balanced sampling strategy that integrates two signals: prediction discrepancy (from active learning), which captures how difficult the model perceives a sample to be, and intrinsic sample complexity, which measures the inherent difficulty of each question-image pair. Experiments with multiple MLLMs across five benchmarks show consistent performance gains and a reduction in the performance variance caused by random sampling.
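The summary does not give the selection algorithm, so the following is only a minimal Python sketch of how a difficulty-balanced sampler combining the two signals might look, not the authors' implementation. Several assumptions are made: prediction discrepancy is approximated by normalized vote entropy over several sampled answers from the MLLM; intrinsic complexity is a precomputed score in [0, 1] whose computation is not described here; the two signals are blended linearly with a hypothetical weight `alpha`; and "balance" is realized by sorting candidates into equal-size difficulty bins and drawing from them round-robin. The names `Candidate`, `prediction_discrepancy`, `combined_difficulty`, and `difficulty_balanced_sample` are illustrative only.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    example_id: str
    # Answers the MLLM gave to this question-image pair over several
    # stochastic runs (assumed to be collected beforehand).
    sampled_answers: list[str]
    # Hypothetical precomputed intrinsic complexity score in [0, 1].
    intrinsic_complexity: float


def prediction_discrepancy(answers: list[str]) -> float:
    """Normalized vote entropy: 0 if all sampled answers agree, 1 if all differ."""
    n = len(answers)
    if n <= 1:
        return 0.0
    counts = Counter(answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)  # log(n) is the entropy when all n answers differ


def combined_difficulty(cand: Candidate, alpha: float = 0.5) -> float:
    """Assumed linear blend of model-perceived and intrinsic difficulty."""
    perceived = prediction_discrepancy(cand.sampled_answers)
    return alpha * perceived + (1 - alpha) * cand.intrinsic_complexity


def difficulty_balanced_sample(
    cands: list[Candidate], k: int, num_bins: int = 3, alpha: float = 0.5, seed: int = 0
) -> list[Candidate]:
    """Pick up to k examples spread evenly over easy-to-hard difficulty bins."""
    rng = random.Random(seed)
    ranked = sorted(cands, key=lambda c: combined_difficulty(c, alpha))
    n = len(ranked)
    # Contiguous, roughly equal-size bins ordered from easiest to hardest.
    bins = [ranked[i * n // num_bins:(i + 1) * n // num_bins] for i in range(num_bins)]
    for b in bins:
        rng.shuffle(b)  # random choice within a single difficulty level
    selected: list[Candidate] = []
    # Round-robin over bins so every difficulty level is represented.
    while len(selected) < k and any(bins):
        for b in bins:
            if b and len(selected) < k:
                selected.append(b.pop())
    return selected
```

The round-robin draw is one simple way to keep easy, medium, and hard examples all present in the prompt rather than letting random sampling cluster at one difficulty; `alpha` shifts the emphasis between the model-perceived and the intrinsic signal.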
Takeaways, Limitations
• Takeaways:
  ◦ Proposes a prompt curriculum design method that improves MCoT prompting performance by accounting for both the difficulty the model perceives and the inherent difficulty of each problem.
  ◦ Points to a new direction in prompt engineering, consistently improving performance across various MLLMs and overcoming the limitations of random sampling.
  ◦ The difficulty-balanced sampling strategy, which combines active learning with intrinsic sample complexity, can be applied to a variety of MLLMs.
• Limitations:
  ◦ The specific algorithm implementation and its computational complexity are not described in detail.
  ◦ Further research is needed to determine whether the framework can be applied to other domains.
  ◦ There is no in-depth analysis of optimization that takes model-specific characteristics into account.