Daily Arxiv

This page collects and organizes artificial intelligence papers published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining

Created by
  • Haebom

Authors

Chenxi Liu, Tianyi Xiong, Yanshuo Chen, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang

Outline

This paper proposes Modality-Balancing Preference Optimization (MBPO), a preference-learning framework that addresses the modality imbalance problem in large multimodal models (LMMs). MBPO builds a more effective offline preference dataset by mining hard negatives through adversarial perturbation, and generates online responses on closed-ended tasks scored with verifiable rewards. Group Relative Policy Optimization (GRPO) is then used to train the model on the hybrid offline-online data. Experimental results show that MBPO improves LMM performance on vision-language tasks and effectively reduces hallucinations.
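
The paper's exact mining procedure is not reproduced here, but the idea of adversarial negative mining can be illustrated with a minimal PGD-style sketch: perturb the input image so that an incorrect (rejected) response becomes more likely under the model, yielding a harder negative for the preference dataset. The `model.log_prob` interface and all argument names below are hypothetical stand-ins, not the paper's API.

```python
# Hypothetical sketch of adversarial hard-negative mining, not the paper's code.
# Assumes an LMM exposing a (hypothetical) log_prob(image, prompt, response)
# method that returns the summed token log-likelihood of `response`.
import torch

def mine_hard_negative(model, image, prompt_ids, rejected_ids,
                       epsilon=8 / 255, alpha=2 / 255, steps=3):
    """PGD-style perturbation that raises the likelihood of an incorrect
    (rejected) response, turning it into a harder preference negative."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Log-likelihood of the rejected response given the perturbed image.
        logp = model.log_prob(image + delta, prompt_ids, rejected_ids)
        logp.backward()
        with torch.no_grad():
            # Ascend the gradient so the wrong answer becomes more plausible,
            # then project back into the epsilon-ball around the clean image.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (image + delta).detach()
```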

Takeaways, Limitations

Takeaways:
  • Contributes to mitigating the modality imbalance problem in LMMs.
  • Strengthens offline preference datasets by mining hard negatives through adversarial perturbation (see the sketch in the Outline above).
  • Improves model adaptability by generating online data and training with GRPO; a minimal sketch of GRPO's group-relative advantage follows this list.
  • Demonstrates improved performance and reduced hallucination on vision-language tasks.
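
For context on the GRPO step, here is a minimal sketch of its group-relative advantage, assuming binary verifiable rewards from a closed-ended task (e.g., 1 if the answer matches, 0 otherwise). This shows the standard GRPO formulation rather than MBPO's full training loop.

```python
# Minimal sketch of GRPO's group-relative advantage (standard formulation,
# not MBPO-specific). Rewards are assumed to come from a verifiable check
# on a closed-ended task, e.g. exact-match answer correctness.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (G,), one scalar per response sampled for the same
    prompt. Each advantage is the reward standardized within the group,
    so no learned value model (critic) is required."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled answers to one question, scored 1 if correct.
adv = grpo_advantages(torch.tensor([1.0, 0.0, 0.0, 1.0]))
```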
Limitations:
  • Further research is needed on mitigating internal biases of the LLM backbone.
  • Generalization has not yet been evaluated across all types of LMM tasks.
  • The scalability and computational efficiency of MBPO require further study.