Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach

Created by
  • Haebom

Authors

Lotte Gross, Rebecca Walter, Nicole Zoppi, Adrien Justus, Alessandro Gambetti, Qiwei Han, Maximilian Kaiser

Outline

This study develops and deploys a multimodal hierarchical classification framework to address industrial challenges in e-commerce product categorization, such as platform heterogeneity and the structural limitations of existing taxonomies. Using a dataset of 271,700 products collected from 40 international fashion e-commerce platforms, the authors integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). They explore early, late, and attention-based fusion strategies within a hierarchical architecture and apply dynamic masking to enforce taxonomic consistency. CLIP embeddings combined through an MLP-based late-fusion strategy achieved the highest hierarchical F1 score (98.59%), outperforming unimodal baselines.

To address shallow or inconsistent categories, the authors introduce a self-supervised "product recategorization" pipeline built on SimCLR, UMAP, and cascaded clustering. This pipeline discovers new fine-grained categories (e.g., subtypes of "shoes") with a cluster purity above 86%.

Cross-platform experiments reveal a deployment trade-off: complex late-fusion methods maximize accuracy when trained on diverse data, whereas simple early-fusion methods generalize more effectively to unseen platforms. Finally, the authors demonstrate industrial scalability by deploying the framework on EURWEB's commercial transaction information platform through a two-stage inference pipeline that combines a lightweight RoBERTa stage with a GPU-accelerated multimodal stage.
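The summary does not explain how dynamic masking works. One common way to enforce taxonomic consistency, assumed here purely for illustration, is to predict the parent category first and then suppress every child logit that does not belong under that parent. The two-level taxonomy, the function names, and the predict-parent-then-mask order below are hypothetical; this is a minimal sketch, not the authors' implementation.

```python
import math

# Hypothetical two-level taxonomy (illustrative only, not the paper's label set):
# each parent category maps to the child categories allowed beneath it.
TAXONOMY = {
    "shoes": ["sneakers", "boots", "sandals"],
    "tops": ["t-shirts", "blouses"],
}
PARENTS = list(TAXONOMY)
CHILDREN = [child for parent in PARENTS for child in TAXONOMY[parent]]


def softmax(logits):
    """Numerically stable softmax; -inf logits receive probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_child_prediction(parent_logits, child_logits):
    """Pick the most likely parent, then restrict child scores to its children.

    Child logits outside the predicted parent are set to -inf before the
    softmax, so the returned (parent, child) pair always respects the taxonomy.
    """
    parent_probs = softmax(parent_logits)
    parent = PARENTS[parent_probs.index(max(parent_probs))]
    allowed = set(TAXONOMY[parent])
    masked = [
        logit if child in allowed else float("-inf")
        for child, logit in zip(CHILDREN, child_logits)
    ]
    child_probs = softmax(masked)
    child = CHILDREN[child_probs.index(max(child_probs))]
    return parent, child, child_probs
```

For example, with parent logits `[2.0, 0.5]` the sketch picks "shoes"; even though "t-shirts" has the largest raw child logit in `[0.1, 0.2, 0.3, 5.0, 1.0]`, it is masked out and "sandals" is returned instead.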

Takeaways, Limitations

Takeaways:
Improves e-commerce product classification accuracy by fusing multimodal (text and image) information, reaching a hierarchical F1 score of 98.59%.
Overcomes the limitations of existing taxonomies and discovers finer-grained categories through a self-supervised product recategorization pipeline.
Suggests a model selection strategy for real-world deployment by characterizing the trade-off between cross-platform generalization and accuracy.
Presents a successful case of building and deploying a real-world system at industrial scale.
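The 98.59% figure is a hierarchical F1 score. The summary does not give its exact definition; one widely used variant, assumed here, augments each true and predicted label with all of its ancestors and computes micro-averaged precision and recall over those label sets, so a mistake between sibling categories still earns partial credit. The toy parent links and function names below are hypothetical.

```python
# Hypothetical parent links for a toy two-level taxonomy (illustrative only).
PARENT = {
    "sneakers": "shoes",
    "boots": "shoes",
    "t-shirts": "tops",
    "shoes": None,
    "tops": None,
}


def with_ancestors(label):
    """Return the label together with all of its ancestors."""
    labels = set()
    while label is not None:
        labels.add(label)
        label = PARENT[label]
    return labels


def hierarchical_f1(y_true, y_pred):
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    overlap = pred_total = true_total = 0
    for true_label, pred_label in zip(y_true, y_pred):
        true_set, pred_set = with_ancestors(true_label), with_ancestors(pred_label)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    if h_precision + h_recall == 0:
        return 0.0
    return 2 * h_precision * h_recall / (h_precision + h_recall)
```

With true labels `["sneakers", "t-shirts"]` and predictions `["boots", "t-shirts"]`, flat accuracy is 0.5, but the hierarchical F1 is 0.75 because the wrong prediction still shares the parent "shoes".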
Limitations:
The experiments are limited to a single domain (fashion); further research is needed to assess generalizability to other domains.
Performance relies on a large dataset and may degrade in data-scarce settings.
The complexity of late-fusion methods increases computational cost; further work on lightweight models and optimization is needed.
The clustering performance of the self-supervised recategorization pipeline needs further analysis.
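The cluster purity above 86% reported for the recategorization pipeline is a standard external clustering metric: each cluster is credited with its most frequent reference label, and purity is the fraction of items that carry their cluster's majority label. The summary does not say which reference labels the authors used, so the function name and sample labels below are illustrative assumptions.

```python
from collections import Counter


def cluster_purity(cluster_ids, reference_labels):
    """Fraction of items whose reference label matches their cluster's majority label."""
    if not cluster_ids or len(cluster_ids) != len(reference_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    # Group reference labels by assigned cluster.
    members = {}
    for cluster, label in zip(cluster_ids, reference_labels):
        members.setdefault(cluster, []).append(label)
    # Each cluster contributes the count of its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(cluster_ids)
```

Purity never decreases as clusters are split further and reaches 1.0 when every item is its own cluster, so it is usually read alongside the number of clusters; this is one reason a closer analysis of the clustering results is worthwhile.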