Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users

Created by
  • Haebom

Authors

Dakai Zhai, Jiong Gao, Boya Du, Junwei Xu, Qijie Shen, Jialin Zhu, Yuning Jiang

Outline

To address the difficulty of predicting conversion rates (CVR) for low-activity users in large-scale e-commerce recommender systems, this paper proposes ChoirRec, a framework that uses the semantic capabilities of large language models (LLMs) to form user groups and improve CVR prediction for those users. ChoirRec consists of three modules: a semantic group generation module that filters out noisy signals, a group-aware hierarchical representation module that augments sparse user embeddings, and a group-aware multi-granularity module that uses a dual-channel architecture and an adaptive fusion mechanism to learn and apply group knowledge effectively. In experiments on the Taobao platform, ChoirRec achieved a 1.16% increase in offline GAUC and a 7.24% increase in order volume in online A/B tests, which supports its practical applicability.
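For readers unfamiliar with the offline metric, GAUC (group AUC) is commonly computed as a per-user AUC averaged with weights proportional to each user's number of samples. The summary does not say which weighting the paper uses, so the following is a minimal sketch under that common convention; the function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None when one class is missing,
    because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(user_ids, labels, scores):
    """Sample-count-weighted average of per-user AUC (assumed weighting)."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)

    weighted_sum, total_weight = 0.0, 0
    for ys, ss in by_user.values():
        user_auc = auc(ys, ss)
        if user_auc is None:
            continue  # users with only clicks or only conversions are skipped
        weighted_sum += len(ys) * user_auc
        total_weight += len(ys)
    return weighted_sum / total_weight if total_weight else float("nan")


# Toy data: user "c" has a single positive sample and is skipped.
users = ["a", "a", "a", "b", "b", "c"]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.3, 0.6, 0.5]
result = gauc(users, labels, scores)
```

In the toy data, user "a" is ranked perfectly (AUC 1.0 over 3 samples) and user "b" is ranked wrongly (AUC 0.0 over 2 samples), so the weighted average is 0.6. Because the metric is computed per user, it measures ranking quality within each user's own candidates, which is why it is often preferred over global AUC for recommender evaluation.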

Takeaways, Limitations

Takeaways:
Uses large language models to form semantic user groups and improve CVR prediction performance.
Offers a novel approach to CVR prediction for low-activity users.
Demonstrates effectiveness and practical value on a large-scale e-commerce platform.
Improves knowledge transfer efficiency through a dual-channel architecture and an adaptive fusion mechanism.
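The summary does not describe how the dual-channel architecture and adaptive fusion mechanism are built. As a rough illustration of the general idea only, the sketch below blends an individual-channel vector with a group-channel vector using a scalar gate that gives more weight to the group for less active users. The function `adaptive_fuse`, the `activity` input, and the fixed gate parameters `w` and `b` are all hypothetical; in a trained model the gate would be learned, and the paper's actual design may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(user_vec, group_vec, activity, w=-1.0, b=2.0):
    """Blend individual and group representations with a scalar gate.

    gate = sigmoid(w * activity + b). With w < 0, lower activity yields a
    larger gate, so the fused vector leans more on the group channel.
    w and b are fixed here for illustration (hypothetical values).
    """
    if len(user_vec) != len(group_vec):
        raise ValueError("both channels must have the same dimension")
    gate = sigmoid(w * activity + b)
    fused = [gate * g + (1.0 - gate) * u for u, g in zip(user_vec, group_vec)]
    return fused, gate


user_vec = [1.0, 0.0]   # individual channel (sparse for low-activity users)
group_vec = [0.0, 1.0]  # group channel (shared semantic group knowledge)

_, gate_low = adaptive_fuse(user_vec, group_vec, activity=0.0)
_, gate_high = adaptive_fuse(user_vec, group_vec, activity=5.0)
fused_mid, gate_mid = adaptive_fuse(user_vec, group_vec, activity=2.0)
```

With these illustrative parameters, a user with no recorded activity relies mostly on the group channel, a highly active user relies mostly on the individual channel, and an activity of 2.0 gives an even blend.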
Limitations:
Lack of detail on which specific LLM is used and how the groups are generated.
Lack of comparative analysis with other recommendation systems.
Insufficient evaluation of the model's scalability and of its generalization to other datasets.
Lack of specific information on computational complexity and resource consumption.