Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Dream to Chat: Model-based Reinforcement Learning on Dialogues with User Belief Modeling

Created by
  • Haebom

Authors

Yue Zhao, Xiaoyu Wang, Dan Wang, Zhonglin Jiang, Qingqing Gu, Teng Chen, Ningyuan Xi, Jinxian Qu, Yong Chen, Luo Ji

Outline

This paper explores the application of world models to natural language processing tasks, focusing specifically on conversational systems. We build a conversational world model that predicts the user's emotion, sentiment, intention, and future utterances. We define a POMDP (partially observable Markov decision process) and argue that emotion, sentiment, and intention can be modeled as the user belief and solved by maximizing the information bottleneck. Using this user belief modeling, we apply a model-based reinforcement learning framework to the conversational system and propose a framework called DreamCUB. Experiments demonstrate that the pre-trained conversational world model achieves state-of-the-art performance in emotion classification and sentiment identification, and that joint training of the policy, critic, and conversational world model further improves conversational quality. Further analysis shows that the approach maintains a reasonable exploration-exploitation balance and transfers well to out-of-domain scenarios, such as empathetic conversations.
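To make the idea of a user belief in a POMDP more concrete, the following is a minimal sketch of a textbook discrete Bayes belief update. It is not the paper's learned world model or the DreamCUB training procedure. The function name `update_belief`, the two hidden user states, the observation labels, and all probabilities are hypothetical and chosen purely for illustration.

```python
def update_belief(belief, likelihood, observation):
    """One Bayes-filter step: posterior(s) is proportional to P(observation | s) * prior(s)."""
    unnormalized = {
        state: prior * likelihood[state].get(observation, 0.0)
        for state, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every state: keep the prior unchanged.
        return dict(belief)
    return {state: weight / total for state, weight in unnormalized.items()}


# Hypothetical hidden user states and observation model (illustrative numbers only).
prior = {"positive": 0.5, "negative": 0.5}
likelihood = {
    "positive": {"thanks": 0.8, "complaint": 0.2},
    "negative": {"thanks": 0.3, "complaint": 0.7},
}

# After observing a complaint, probability mass shifts toward the "negative" state.
posterior = update_belief(prior, likelihood, "complaint")
```

In this toy setting the agent never observes the user's sentiment directly. It only maintains a probability distribution over it and revises that distribution after each utterance, which is the role the user belief plays in a POMDP formulation.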

Takeaways, Limitations

Takeaways:
Leveraging world models in conversation systems to improve user understanding and conversation quality.
Proposing a novel POMDP-based user belief modeling approach for predicting emotion, sentiment, and intention.
Demonstrating the effectiveness of model-based reinforcement learning using the DreamCUB framework.
Achieving SOTA in emotion classification and sentiment identification.
Maintaining a reasonable exploration-exploitation balance and strong out-of-domain transfer performance.
Limitations:
The abstract does not state any limitations, and they cannot be judged from the abstract alone.