Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Created by
  • Haebom

Author

Ziyuan Zhang, Darcy Wang, Ningyuan Chen, Rodrigo Mansur, Vahid Sarhangian

Outline

To study the exploration-exploitation (E&E) strategies of large language models (LLMs), we use a classic multi-armed bandit (MAB) experimental paradigm from the cognitive science and psychiatry literature. We compare the E&E strategies of LLMs, humans, and MAB algorithms, and investigate how eliciting explicit thought traces, through prompting strategies and thinking models, affects LLMs' decision-making. The results show that eliciting thought leads to human-like behavioral changes in LLMs, including human-like levels of exploration in simple environments. In more complex, unstable environments, however, LLMs fail to match human adaptability in effective directed exploration.
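The paradigm referenced here is the standard bandit loop: on each trial the agent picks one of K arms, observes a stochastic reward, and must trade off exploiting the best-known arm against exploring uncertain ones. Below is a minimal sketch contrasting the two exploration styles discussed in this summary, random (epsilon-greedy) and directed (UCB1), on a stationary Bernoulli bandit. The arm probabilities, horizon, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

```python
import math
import random

class BernoulliBandit:
    """Stationary K-armed bandit with fixed Bernoulli reward probabilities."""
    def __init__(self, probs):
        self.probs = probs

    def pull(self, arm):
        return 1.0 if random.random() < self.probs[arm] else 0.0

def run_epsilon_greedy(bandit, horizon=200, eps=0.1):
    """Random exploration: with probability eps, pick a uniformly random arm."""
    k = len(bandit.probs)
    counts, values, total = [0] * k, [0.0] * k, 0.0
    for _ in range(horizon):
        if random.random() < eps:
            arm = random.randrange(k)            # undirected (random) exploration
        else:
            arm = max(range(k), key=lambda a: values[a])
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update
        total += r
    return total

def run_ucb1(bandit, horizon=200):
    """Directed exploration: add an uncertainty bonus to each arm's estimate."""
    k = len(bandit.probs)
    counts, values, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                           # play each arm once first
        else:
            arm = max(range(k), key=lambda a: values[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))  # uncertainty bonus
        r = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
        total += r
    return total

if __name__ == "__main__":
    random.seed(0)
    bandit = BernoulliBandit([0.2, 0.5, 0.8])     # illustrative 3-armed task
    print("eps-greedy reward:", run_epsilon_greedy(bandit))
    print("UCB1 reward:     ", run_ucb1(bandit))
```

Epsilon-greedy spreads its exploration uniformly regardless of what it knows, while UCB1 steers exploration toward arms whose value estimates are still uncertain; this is the random-versus-directed distinction the paper uses when comparing LLMs, humans, and MAB algorithms.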

Takeaways, Limitations

  • LLMs show potential as human-behavior simulators and automated decision-making tools, but limitations remain.
  • When thinking is elicited, LLMs exhibit human-like behavior, showing a mix of random and directed exploration.
  • In simple environments, LLMs reach human-like levels of exploration, but they struggle to adapt in more complex, unstable environments (a sketch of such an environment follows this list).
  • LLMs' capacity for effective directed exploration needs improvement.
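The "unstable" environments mentioned above correspond to the restless bandits common in the cognitive science literature, where each arm's payoff drifts over time so that past evidence goes stale and continued re-exploration is required. A minimal sketch of such an environment, with illustrative drift and noise parameters (assumptions, not the paper's configuration):

```python
import random

class RestlessBandit:
    """Non-stationary bandit: each arm's mean payoff follows a bounded
    Gaussian random walk, so the best arm changes over time.
    Parameters are illustrative, not the paper's configuration."""

    def __init__(self, k=4, drift_sd=0.05, noise_sd=0.1, seed=None):
        self.rng = random.Random(seed)
        self.means = [self.rng.uniform(0.0, 1.0) for _ in range(k)]
        self.drift_sd = drift_sd
        self.noise_sd = noise_sd

    def pull(self, arm):
        reward = self.means[arm] + self.rng.gauss(0.0, self.noise_sd)
        # Every arm drifts each step: value estimates decay in accuracy,
        # so an agent must keep exploring instead of locking onto one arm.
        self.means = [min(1.0, max(0.0, m + self.rng.gauss(0.0, self.drift_sd)))
                      for m in self.means]
        return reward
```

Because the means drift, strategies that assume stationarity accumulate stale estimates; sustained, adaptive directed exploration is exactly what the summary reports humans doing better than LLMs in these settings.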