Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system

Created by
  • Haebom

Authors

Jialiang Shi, Yaguang Dou, Tian Qi

Outline

This paper proposes SPARC (Soft Probabilistic Adaptive Retrieval Model via Codebooks), a retrieval framework that addresses three key challenges of multi-interest modeling in practical recommender systems (RS): 1. interests that are derived from predefined external knowledge and therefore stay fixed as user behavior changes, 2. over-exploitative inference strategies that mainly match users' existing interests, and 3. a lack of novel interest discovery. SPARC uses a Residual Quantized Variational Autoencoder (RQ-VAE) to construct a discrete interest space, which is trained jointly with a large-scale recommender model so that the mined interests are behavior-based, evolve dynamically, and reflect user feedback. In addition, a probabilistic interest module predicts a probability distribution over the entire dynamic discrete interest space. This enables an efficient "soft search" strategy during online inference, shifting the paradigm from passive matching to active exploration and facilitating interest discovery. In online A/B testing on an industry platform with tens of millions of daily active users, SPARC yielded a +0.9% increase in user watch time, a +0.4% increase in page views (PV), and a +22.7% increase in PV500 (new content reaching 500 PVs within 24 hours). Offline evaluations on the Amazon Product dataset also showed consistent improvements in metrics such as Recall@K and NDCG@K.
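The two mechanisms described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the model's architecture, so the toy codebooks, the function names (`residual_quantize`, `soft_search`), and the rule that splits a retrieval budget in proportion to predicted interest probability are all assumptions made for illustration. The first function shows the residual quantization step used inside an RQ-VAE (each level encodes what the previous level left over); the second shows one plausible reading of "soft search", where retrieval is spread over several probable interest codes instead of only the single best match.

```python
import math

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def residual_quantize(vec, codebooks):
    """Quantize vec level by level; each level encodes the previous residual."""
    codes, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_search(interest_logits, items_by_code, top_m, budget):
    """Hypothetical soft search: split a retrieval budget across the top_m
    most probable interest codes, in proportion to their probability."""
    probs = softmax(interest_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_m]
    mass = sum(probs[i] for i in top)
    results = []
    for code in top:
        quota = round(budget * probs[code] / mass)
        results.extend(items_by_code.get(code, [])[:quota])
    return results

# Toy data (assumed): two codebook levels over 2-D embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # level 1: coarse interests
    [[0.0, 0.25], [0.25, 0.0]],  # level 2: refinements of the residual
]
codes, residual = residual_quantize([1.25, 1.0], codebooks)

# Toy candidate pools keyed by interest code (assumed).
items_by_code = {0: ["a", "b", "c", "d"], 1: ["e"], 2: ["f", "g"]}
retrieved = soft_search([2.0, 0.0, 1.0], items_by_code, top_m=2, budget=4)
```

In this toy run the most probable interest receives most of the budget, while a second, less probable interest still contributes a candidate. That is the exploration effect the summary attributes to soft search, as opposed to retrieving only from the top-matching interest.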

Takeaways, Limitations

Takeaways:
Shows that RQ-VAE can be used to build a dynamic, discrete interest space, with behavior-based interest mining that tracks users' current preferences.
Enables active interest exploration and discovery through a "soft search" strategy driven by the probabilistic interest module.
Validates practical effectiveness through A/B testing on a large-scale industry platform, with significant improvements in key metrics such as user watch time, page views, and new content reach (PV500).
Further supports the performance gains through offline evaluation on the Amazon Product dataset.
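For readers unfamiliar with the offline metrics named above, the following is a minimal sketch of Recall@K and NDCG@K under binary relevance. The summary does not state the exact definitions used in the paper (for example, some works normalize recall by min(K, number of relevant items)), so the formulas and function names here are common conventions assumed for illustration.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (assumed data): two relevant items, found at positions 1 and 3.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Recall@K only counts whether relevant items were retrieved, whereas NDCG@K also rewards placing them nearer the top of the list.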
Limitations:
No discussion of the complexity and computational cost of constructing the interest space with RQ-VAE.
Further research is needed to determine whether the results generalize beyond the specific industry platform tested.
The Amazon Product dataset and its preprocessing steps are not described in clear detail.
Further analysis is needed of how well the model adapts to long-term changes in user behavior.