Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Created by
  • Haebom

Authors

Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman

Outline

This paper presents a novel method for interpreting how large language models (LLMs) trade off conflicting values, such as honesty and consideration for the listener's feelings. Using a cognitive model from cognitive science, it evaluates the extent to which LLMs exhibit human-like value trade-offs. The analysis spans two settings: reasoning effort in state-of-the-art black-box models and training dynamics during reinforcement-learning post-training in open-source models, revealing patterns in how informational utility and social utility are weighted. The results show that reasoning models weight informational utility more heavily than social utility, a trend also observed in open-source models with strong mathematical reasoning ability. An analysis of LLM training dynamics further reveals large shifts in utility values early in training and a persistent influence of the base model and the choice of pre-training data. The method can be applied across different stages of LLM development, helping to form hypotheses about high-level behaviors, improve training regimes for reasoning models, and better control value trade-offs during model training.
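To make the trade-off concrete, below is a minimal sketch of how a weighted combination of informational and social utility, of the kind used in cognitive models of polite speech, could be scored and a trade-off weight inferred from a model's observed utterance choices. This is not the authors' implementation: the toy utterance set, the utility definitions, the weight parameter `phi`, and the grid-search fitting routine are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: a speaker chooses an utterance by trading off
# informational utility (truthfulness) against social utility
# (protecting the listener's feelings), combined with a weight phi.
# Utterances, utility values, and the fitting routine are illustrative
# assumptions, not the paper's actual cognitive model or data.

utterances = ["terrible", "okay", "amazing"]
true_state = 0.2  # how good the poem actually was, on a 0-1 scale

# Informational utility: higher when the utterance matches the true state.
semantics = {"terrible": 0.1, "okay": 0.5, "amazing": 0.9}
u_inf = np.array([-(semantics[u] - true_state) ** 2 for u in utterances])

# Social utility: higher for kinder utterances, regardless of truth.
u_soc = np.array([semantics[u] for u in utterances])


def speaker_probs(phi: float, alpha: float = 5.0) -> np.ndarray:
    """Softmax speaker over utterances given trade-off weight phi."""
    utility = phi * u_inf + (1.0 - phi) * u_soc
    logits = alpha * utility
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def fit_phi(observed_counts: np.ndarray) -> float:
    """Grid-search the phi that maximizes the likelihood of observed choices."""
    grid = np.linspace(0.0, 1.0, 101)
    log_liks = [observed_counts @ np.log(speaker_probs(p)) for p in grid]
    return float(grid[int(np.argmax(log_liks))])


# Example: an LLM asked to comment on a mediocre poem 100 times.
counts = np.array([10, 60, 30])  # counts of "terrible", "okay", "amazing"
print("inferred phi (weight on honesty):", fit_phi(counts))
```

Under this kind of setup, a higher inferred weight on informational utility corresponds to a model that favors honesty over sparing the listener's feelings, which is the sort of pattern the paper reports for reasoning-focused models.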

Takeaways, Limitations

Takeaways:
Presents a novel method for quantitatively analyzing the trade-off between informational and social utility in LLMs.
Provides insight into the relationship between reasoning ability and social behavior in LLMs.
Reveals how utility values shift during training and the persistent influence of the base model and pre-training data.
Offers a basis for understanding high-level LLM behavior, improving training regimes for reasoning models, and strengthening control over value trade-offs.
Limitations:
Reliance on a specific cognitive model limits the generalizability of the analysis results.
The analysis covers a limited range of LLM types and sizes.
Defining and measuring value trade-offs involves subjective judgment.
Model behavior is not evaluated in real-world social situations.