Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PokéAI: A Goal-Generating, Battle-Optimizing Multi-agent System for Pokemon Red

Created by
  • Haebom

Authors

Zihao Liu, Xinhang Sui, Yueran Song, Siwen Wang

Outline

PokéAI is a text-based multi-agent large language model (LLM) framework designed to play and progress through Pokémon Red autonomously. It consists of three specialized agents: a Planner, an Executor, and a Critique agent, each with its own memory bank, role, and skill set. The Planner acts as the central brain and generates tasks that advance the game, and the Executor carries out those tasks in the game environment. When a task is finished, the Critique agent evaluates whether its goal has been achieved; once verification is complete, control returns to the Planner, forming a closed-loop decision-making system. The Executor includes a dedicated battle module, which achieved an average win rate of 80.8% across 50 battles against wild Pokémon, 6% lower than skilled human players. The models' battle performance is also strongly correlated with their LLM Arena scores on language-related tasks, suggesting a meaningful link between language ability and strategic reasoning. Analysis of gameplay logs shows that each LLM exhibits its own play style, suggesting that individual models develop distinct strategic behaviors.
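The closed loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Agent` class, the `run_loop` function, the stub lambdas standing in for LLM calls, and the example task names are all hypothetical. The summary also does not say what happens when verification fails, so this sketch simply assumes the verdict is passed back to the Planner as context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """One role in the loop, with its own private memory bank (hypothetical)."""
    name: str
    act: Callable[[str], str]  # stand-in for an LLM call
    memory: List[str] = field(default_factory=list)

    def __call__(self, message: str) -> str:
        reply = self.act(message)
        # Each agent records only its own exchanges.
        self.memory.append(f"{message} -> {reply}")
        return reply


def run_loop(planner: Agent, executor: Agent, critic: Agent, steps: int = 3) -> List[str]:
    """Run plan -> execute -> critique for a fixed number of rounds.

    Returns the tasks the critic verified as achieved.
    """
    completed = []
    feedback = "start"
    for _ in range(steps):
        task = planner(feedback)                   # Planner proposes the next task
        outcome = executor(task)                   # Executor acts in the game
        verdict = critic(f"{task} | {outcome}")    # Critique agent checks the goal
        if verdict == "achieved":
            completed.append(task)
        # Assumption: the verdict is handed back to the Planner either way.
        feedback = f"{task}: {verdict}"
    return completed


def make_stub_agents():
    """Deterministic stand-ins for the three agents (illustrative tasks only)."""
    tasks = iter(["leave house", "get starter", "win first battle"])
    planner = Agent("planner", lambda msg: next(tasks))
    executor = Agent("executor", lambda task: f"done: {task}")
    critic = Agent("critic", lambda report: "achieved" if "done:" in report else "failed")
    return planner, executor, critic


if __name__ == "__main__":
    p, e, c = make_stub_agents()
    print(run_loop(p, e, c, steps=3))
```

Keeping a separate `memory` list per agent mirrors the summary's point that each agent has its own memory bank; in a real system the lambdas would be replaced by model calls and game-emulator actions.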

Takeaways, Limitations

Takeaways:
Demonstrates that a text-based multi-agent LLM system can play a game autonomously.
Reports a correlation between LLMs' language ability and their strategic reasoning, with battle performance tracking LLM Arena scores.
Shows that each LLM exhibits a distinct play style, which points to a range of possible strategies.
Suggests that high-performing automated gameplay agents are feasible (80.8% average win rate in battles against wild Pokémon).
Limitations:
The research is at an early stage, and the system has not yet played through the entire game.
Further research is needed to determine how well the approach generalizes across game environments and situations.
Further analysis is needed to establish whether the relationship between LLM Arena scores and battle performance is causal.
Applicability and scalability to more complex game environments still need to be validated.
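
For readers unfamiliar with what a correlation between LLM Arena scores and battle win rates measures, here is a small sketch of a correlation coefficient. The summary does not state which measure the paper uses, so Pearson's r is an assumption here, and the `pearson_r` helper and the toy inputs in the usage lines are illustrative only, not data from the paper. A high r says the two quantities move together; it does not establish that one causes the other.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy inputs: a perfectly linear relationship and a perfectly inverse one.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
    print(pearson_r([1, 2, 3], [3, 2, 1]))
```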