HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

Created by
  • Haebom

Authors

Petr Anokhin, Roman Khalikov, Stefan Rebrikov, Viktor Volkov, Artyom Sorokin, Vincent Bissonnette

💡 Overview

This paper proposes HeroBench, a new benchmark for evaluating long-horizon planning and structured reasoning in complex RPG-style virtual worlds. HeroBench tests the long-horizon planning ability of LLMs by requiring a single end-to-end plan that spans hundreds to thousands of actions under realistic constraints. An evaluation of 25 state-of-the-art LLMs showed substantial performance gaps of a kind rarely seen in existing reasoning benchmarks, and no current model was found to solve the hardest tasks reliably.

🔑 Implications and Limitations

• Provides a new benchmark for evaluating the long-horizon, hierarchical planning ability of LLMs in a realistic virtual-world environment.
• Clearly exposes performance gaps between LLMs, and difficulties with long-horizon planning, that existing benchmarks did not reveal.
• No model reliably solves the hardest tasks, which points to the need for continued research on improving the autonomous long-horizon planning ability of LLMs.