
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems

Created by
  • Haebom

Authors

Qifan Zhang, Jianhao Ruan, Aochuan Chen, Kang Zeng, Nuo Chen, Jing Tang, Jia Li

💡 Overview

This study proposes GrAlgoBench, a new benchmark that assesses the limitations of large reasoning models (LRMs) through their ability to solve graph algorithm problems. GrAlgoBench addresses the shortcomings of existing math, code, and commonsense reasoning benchmarks by offering long-context evaluation, controllable difficulty, and programmatic verifiability. The experiments reveal two main weaknesses: LRM accuracy drops sharply as context length grows, and excessive self-verification makes the reasoning process inefficiently long, a phenomenon referred to as "overthinking".
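To make "controllable difficulty" and "programmatic verifiability" concrete, here is a minimal sketch of how a graph problem instance could be generated, rendered as a prompt, and checked against a reference algorithm. It is not GrAlgoBench's actual code: the task (unweighted shortest path), the prompt wording, and the helper names `make_graph`, `shortest_path_length`, `to_prompt`, and `verify` are all assumptions chosen for illustration.

```python
import random
from collections import deque


def make_graph(n_nodes, n_edges, seed=0):
    """Build a random undirected graph as an adjacency dict.

    n_nodes is the (hypothetical) difficulty knob: more nodes means a
    longer prompt and a longer reasoning chain.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n_nodes)}
    max_edges = n_nodes * (n_nodes - 1) // 2
    target = min(n_edges, max_edges)
    count = 0
    while count < target:
        u, v = rng.sample(range(n_nodes), 2)  # two distinct nodes
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            count += 1
    return adj


def shortest_path_length(adj, src, dst):
    """Reference answer via breadth-first search; -1 if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return -1


def to_prompt(adj, src, dst):
    """Render the instance as text (the wording is an assumption)."""
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    edge_text = ", ".join(f"{u}-{v}" for u, v in edges)
    return (
        f"Undirected graph with edges: {edge_text}. "
        f"What is the length of the shortest path from {src} to {dst}? "
        "Answer -1 if no path exists."
    )


def verify(adj, src, dst, model_answer):
    """Check a model's integer answer against the reference algorithm."""
    return model_answer == shortest_path_length(adj, src, dst)
```

Because the reference answer is computed by an exact algorithm, any model output can be scored automatically, and increasing `n_nodes` scales both the context length and the amount of reasoning required.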

🔑 Implications and Limitations

  • LRMs' weak long-context understanding shows up as a loss of accuracy, which falls below 50% as graph size grows.
  • LRMs suffer from "overthinking" and inefficient self-verification, lengthening the reasoning process unnecessarily without improving accuracy.
  • GrAlgoBench provides a rigorous, multidimensional testbed for evaluating LRM reasoning, but further research on LRM architectures and training methodologies is needed to address the weaknesses it identifies.