Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-Commerce Models

Created by
  • Haebom

Authors

Xinyi Ling, Hanwen Du, Zhihui Zhu, Xia Ning

Outline

This paper addresses the issue that the diverse image data available on e-commerce platforms may not always improve product understanding. To examine this systematically, the authors introduce EcomMMMU, a large-scale multimodal, multi-task understanding dataset with 406,190 samples and 8,989,510 images. EcomMMMU comprises eight essential tasks and a Visual Selection Subset (VSS), and is used to benchmark how well multimodal large language models (MLLMs) make use of multiple images. The analysis on EcomMMMU shows that product images do not always improve performance and, in some cases, degrade it. Based on this finding, the authors propose SUMEI, a data-driven method that predicts the usefulness of images and then uses them selectively for downstream tasks. Experimental results demonstrate the effectiveness and robustness of SUMEI. Data and code are available at https://anonymous.4open.science/r/submission25.
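
The idea of predicting image usefulness before passing images to a downstream model can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not describe how SUMEI scores or selects images, so the threshold rule, the `ProductSample` structure, and the names `select_useful_images`, `build_model_input`, and `utility_fn` are all hypothetical and only show the general shape of utility-gated image selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProductSample:
    """A product with its text and image identifiers (hypothetical structure)."""
    text: str
    images: List[str]


# Hypothetical signature: (product text, image id) -> predicted usefulness in [0, 1].
UtilityFn = Callable[[str, str], float]


def select_useful_images(
    sample: ProductSample, utility_fn: UtilityFn, threshold: float = 0.5
) -> List[str]:
    """Keep images whose predicted usefulness is at least `threshold`, in order."""
    return [img for img in sample.images if utility_fn(sample.text, img) >= threshold]


def build_model_input(
    sample: ProductSample, utility_fn: UtilityFn, threshold: float = 0.5
) -> Dict[str, object]:
    """Assemble a downstream model input; the image list may be empty (text-only)."""
    return {
        "text": sample.text,
        "images": select_useful_images(sample, utility_fn, threshold),
    }


if __name__ == "__main__":
    # Toy scores stand in for a learned usefulness predictor (an assumption).
    toy_scores = {"front.jpg": 0.9, "size_chart.jpg": 0.2}
    sample = ProductSample("Red cotton t-shirt", list(toy_scores))
    print(build_model_input(sample, lambda text, img: toy_scores[img]))
```

In this sketch a fixed threshold decides which images are kept; a real system would presumably learn the predictor from data and might fall back to text-only input when no image is judged useful.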

Takeaways, Limitations

Takeaways:
The paper offers a new perspective on using multimodal data in e-commerce platforms: image data is not always beneficial.
The large-scale multimodal dataset EcomMMMU is used to evaluate how well MLLMs utilize multiple images and to suggest potential improvements.
SUMEI, an efficient multi-image utilization method based on image usefulness prediction, is proposed.
The results suggest that MLLMs may struggle to make effective use of rich visual content in e-commerce tasks.
Limitations:
Possible bias toward specific e-commerce platforms in the EcomMMMU dataset.
Further research is needed to determine the generalizability of the SUMEI method.
Performance verification is needed for other e-commerce-related tasks beyond the eight presented.