Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Subjective Behaviors and Preferences in LLM: Language of Browsing

Created by
  • Haebom

Authors

Sai Sundaresan, Harshita Chopra, Atanu R. Sinha, Koustava Goswami, Nagasai Saketh Naidu, Raghav Karan, N Anushka

Outline

This paper asks how well large language models (LLMs) can capture users' subjective and heterogeneous website or app usage behavior. Treating each user's sequential page-access log as that user's own "browsing language", the authors pose three questions: can small LMs represent this "browsing language" better than large LMs; can an LM with a single parameter set adequately capture the heterogeneous behaviors of diverse users; and can a single LM with high average performance perform consistently at the user level? To address these questions, they propose Heterogeneity-aware Training of Language Models (HeTLM), a cluster-wise LM training method suited to subjective behavior. Experiments show that small LMs trained with a page-wise tokenizer outperform large pre-trained or fine-tuned LMs; that HeTLM, with heterogeneous cluster-wise parameter sets, outperforms a single LM of the same size; and that it achieves better user-level alignment by raising average performance and reducing variance during generation.
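To make the cluster-wise training idea concrete, here is a minimal Python sketch under stated assumptions: toy page-access logs, k-means over bag-of-pages vectors as the clustering step, and a per-cluster bigram model standing in for the small per-cluster language models. The paper's actual clustering criterion, tokenizer details, and model architecture may differ, and all names and data below are hypothetical.

```python
# Minimal sketch of cluster-wise "browsing language" modeling, assuming
# page-level tokens and k-means clustering; the paper's actual clustering
# criterion, model architecture, and hyperparameters may differ.
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

# Toy page-access logs: one sequence of page IDs per user (hypothetical data).
user_logs = {
    "u1": ["home", "search", "product", "cart", "checkout"],
    "u2": ["home", "blog", "blog", "about"],
    "u3": ["home", "search", "product", "product", "cart"],
    "u4": ["home", "about", "blog", "contact"],
}

# Page-wise tokenizer: each distinct page is one token in the vocabulary.
vocab = sorted({page for seq in user_logs.values() for page in seq})
page_to_id = {page: i for i, page in enumerate(vocab)}

# Represent each user by a normalized bag-of-pages vector for clustering.
def user_vector(seq):
    v = np.zeros(len(vocab))
    for page in seq:
        v[page_to_id[page]] += 1
    return v / v.sum()

users = list(user_logs)
X = np.stack([user_vector(user_logs[u]) for u in users])
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Train one small "LM" per cluster; a bigram model over page tokens stands in
# for the per-cluster language model trained in HeTLM.
def fit_bigram(sequences):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }

cluster_models = {
    c: fit_bigram([user_logs[u] for u, cu in zip(users, clusters) if cu == c])
    for c in set(clusters)
}

# At inference time, a user is routed to their cluster's model.
for u, c in zip(users, clusters):
    print(u, "-> cluster", c, "next-page dist after 'home':",
          cluster_models[c].get("home", {}))
```

The design point this illustrates is routing each user to a model trained only on behaviorally similar users, rather than fitting one parameter set to everyone.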

Takeaways, Limitations

Takeaways:
  • Small LMs trained with a page-level tokenizer can model users' subjective web browsing behavior better than large pre-trained or fine-tuned LMs.
  • Cluster-wise training methods such as HeTLM can improve language-model performance by accounting for user heterogeneity.
  • HeTLM achieves higher average performance and lower performance variance than a single LM, improving consistency at the user level.
Limitations:
  • HeTLM's performance gains may be limited to specific datasets and user-behavior patterns.
  • Further research is needed to establish generalizability across different types of user behavior and website/app domains.
  • Further analysis of HeTLM's computational cost and scalability is needed.