This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-Commerce Models
Created by
Haebom
Authors
Xinyi Ling, Hanwen Du, Zhihui Zhu, Xia Ning
Outline
This paper examines the assumption that the abundant product images on e-commerce platforms always improve product understanding. To test this systematically, the authors introduce EcomMMMU, a large-scale multimodal multi-task understanding dataset with 406,190 samples and 8,989,510 images. EcomMMMU comprises eight essential tasks and a Visual Selection Subset (VSS), and is designed to benchmark how well multimodal large language models (MLLMs) make use of multiple images. The analysis on EcomMMMU shows that product images do not always improve performance and, in some cases, degrade it. Based on this finding, the authors propose SUMEI, a data-driven method that predicts the usefulness of images and then strategically uses them in downstream tasks. Experimental results demonstrate the effectiveness and robustness of SUMEI. Data and code are available at https://anonymous.4open.science/r/submission25 .
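The idea of predicting image usefulness before passing images to a model can be illustrated with a minimal sketch. This is not the authors' SUMEI implementation: the summary does not specify how usefulness is predicted or how the selected images are consumed, so `ProductSample`, `select_useful_images`, `build_model_input`, the `threshold` value, and the toy scoring function are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProductSample:
    """Hypothetical container for one product: its text and candidate images."""
    text: str
    image_ids: List[str]


# Assumed interface: a usefulness predictor maps (product text, image id) to a score.
UtilityFn = Callable[[str, str], float]


def select_useful_images(
    sample: ProductSample,
    utility_fn: UtilityFn,
    threshold: float = 0.5,
    max_images: Optional[int] = None,
) -> List[str]:
    """Keep images whose predicted usefulness reaches the threshold, best first."""
    scored = [(utility_fn(sample.text, img), img) for img in sample.image_ids]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    if max_images is not None:
        kept = kept[:max_images]
    return [img for _, img in kept]


def build_model_input(
    sample: ProductSample, utility_fn: UtilityFn, threshold: float = 0.5
) -> Dict[str, object]:
    """Assemble a downstream input; falls back to text only if no image qualifies."""
    images = select_useful_images(sample, utility_fn, threshold)
    return {"text": sample.text, "images": images}


# Toy stand-in for a learned predictor: fixed, made-up scores per image.
TOY_SCORES = {"main.jpg": 0.9, "lifestyle.jpg": 0.2, "size_chart.jpg": 0.7}


def toy_utility(text: str, image_id: str) -> float:
    return TOY_SCORES.get(image_id, 0.0)


sample = ProductSample(
    text="Cotton t-shirt, size M",
    image_ids=["main.jpg", "lifestyle.jpg", "size_chart.jpg"],
)
print(build_model_input(sample, toy_utility))
```

In this sketch, a product whose images all score below the threshold is passed on with an empty image list, which reflects the finding that images can degrade performance in some cases. A real system would replace `toy_utility` with a predictor trained on data.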
Takeaways, Limitations
• Takeaways:
◦ Offers a new perspective on using multimodal data on e-commerce platforms: image data is not always beneficial.
◦ Introduces EcomMMMU, a large-scale multimodal dataset for evaluating how well MLLMs use multiple images and for identifying potential improvements.
◦ Proposes SUMEI, an efficient multi-image utilization method based on predicting image usefulness.
◦ Suggests that MLLMs may struggle to make effective use of rich visual content in e-commerce tasks.
• Limitations:
◦ The EcomMMMU dataset may be biased toward specific e-commerce platforms.
◦ Further research is needed to determine how well SUMEI generalizes.
◦ Performance still needs to be verified on e-commerce tasks beyond the eight presented.