Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

Created by
  • Haebom

Author

Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin

Outline

In this paper, we systematically study the effectiveness of pre-trained visual encoders for imitation learning of decision making in modern video games (Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons). Instead of relying on game-specific engine integrations or large datasets, we present an imitation learning approach that trains agents to play using only images. Experimental results show that end-to-end training can be effective with low-resolution images and short demonstrations, while leveraging pre-trained encoders such as DINOv2 significantly improves performance on some games. Furthermore, we suggest that pre-trained encoders can make decision-making research in video games more accessible by substantially reducing training cost.
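The core recipe the summary describes is behavioral cloning on top of a frozen pre-trained encoder: game frames are mapped to fixed features, and only a small policy head is trained to predict the expert's actions. A minimal sketch of that setup, where a fixed random projection stands in for a frozen encoder such as DINOv2 and the demonstration data is synthetic (all shapes, names, and hyperparameters here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "visual encoder": a fixed random projection stands in for a
# pre-trained model such as DINOv2, whose weights are never updated.
PIXEL_DIM, FEAT_DIM, N_ACTIONS = 64, 16, 4
W_enc = rng.normal(size=(PIXEL_DIM, FEAT_DIM))  # frozen

def encode(frames):
    """Map (flattened) game frames to fixed feature vectors."""
    return np.tanh(frames @ W_enc)

# Synthetic demonstrations: frames plus the "expert" actions, generated
# here from a hidden linear rule so the policy head has something to fit.
frames = rng.normal(size=(256, PIXEL_DIM))
W_true = rng.normal(size=(FEAT_DIM, N_ACTIONS))
actions = np.argmax(encode(frames) @ W_true, axis=1)

# Trainable policy head on top of the frozen features.
W_pol = np.zeros((FEAT_DIM, N_ACTIONS))

def loss_and_grad(W):
    """Cross-entropy of expert actions under the softmax policy."""
    feats = encode(frames)
    logits = feats @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.arange(len(actions))
    nll = -np.log(probs[idx, actions]).mean()
    probs[idx, actions] -= 1.0                   # d(nll)/d(logits)
    grad = feats.T @ probs / len(actions)
    return nll, grad

# Behavioral cloning: minimize the negative log-likelihood of the
# expert's actions; only the policy head is updated.
initial_loss, _ = loss_and_grad(W_pol)
for _ in range(200):
    _, grad = loss_and_grad(W_pol)
    W_pol -= 0.5 * grad
final_loss, _ = loss_and_grad(W_pol)
```

In practice the frozen projection would be replaced by real encoder features, and the linear head by whatever policy network the training budget allows; the data-efficiency argument is that only the small head needs gradient updates.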

Takeaways, Limitations

Takeaways:
We show that imitation learning using pre-trained visual encoders (e.g., DINOv2) is effective for training decision-making agents in modern video games.
End-to-end training is possible with low-resolution images and short demonstrations, and performance can be further improved by using pre-trained encoders.
Leveraging pre-trained encoders can make decision-making research in modern video games more accessible by reducing training costs.
Limitations:
The games studied were limited to Minecraft, Counter-Strike: Global Offensive, and Minecraft Dungeons; generalization to other game genres and complexity levels requires further research.
There is a dependency on a specific pre-trained encoder (DINOv2), and further comparative analysis of the performance of other encoders is needed.
There is a lack of in-depth analysis of how the quality and quantity of demonstration data used in imitation learning affect the results.