TextQuests: How Good are LLMs at Text-Based Video Games?
Created by
Haebom
Author
Long Phan, Mantas Mazeika, Andy Zou, Dan Hendrycks
Outline
This paper introduces TextQuests, a benchmark for evaluating AI agents in complex, interactive environments that mirror real-world problems. Whereas existing benchmarks emphasize tool use or structured task completion, TextQuests is built on Infocom interactive fiction games and targets long-horizon, self-directed reasoning. By prohibiting external tools, it isolates an agent's intrinsic abilities: long-context reasoning, trial-and-error learning, and sustained problem solving across games that would take a human player over 30 hours to finish. We publish the benchmark at https://textquests.ai.
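As a rough illustration of what such an evaluation involves, here is a minimal sketch of the observe-act loop an LLM agent would run in a text-game benchmark like this. The `game` interface (`reset`/`step`) and the `query_llm` callable are hypothetical stand-ins, not the actual TextQuests harness.

```python
def run_episode(game, query_llm, max_steps=500):
    """Drive an LLM agent through one interactive-fiction game.

    The agent conditions on the full transcript so far (stressing
    long-context reasoning) and must choose each text command on its
    own, with no external tools -- mirroring the benchmark's focus on
    self-directed, trial-and-error play.
    """
    transcript = [game.reset()]  # opening scene text
    score, done = 0, False
    for _ in range(max_steps):
        # The agent sees the entire history of observations and commands.
        command = query_llm("\n".join(transcript))
        observation, score, done = game.step(command)
        transcript.append(f"> {command}\n{observation}")
        if done:
            break
    return score, transcript
```

The loop highlights the core difficulty the summary describes: as the transcript grows over hundreds of steps, the agent must keep earlier clues and failed attempts in context to make progress.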