
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving

Created by
  • Haebom

Authors

Yujie Hou, Mei Wang, Yaoyao Zhong, Ting Zhang, Xuetao Ma, Hua Huang

💡 Overview

To overcome the limitations of existing approaches to evaluating LLMs' mathematical problem-solving ability, the paper proposes SMART, a new evaluation framework grounded in Polya's problem-solving theory. SMART decomposes mathematical problem solving into four cognitive dimensions: semantic understanding, mathematical reasoning, arithmetic computation, and reflection and refinement. It measures an LLM's cognitive process through concrete tasks tailored to each dimension. Applied to 22 state-of-the-art LLMs, the framework reveals substantial discrepancies in ability across dimensions, and the authors introduce a new metric, the All-Pass Score, which better reflects genuine problem-solving ability.
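This summary does not say how the All-Pass Score is computed. As a rough illustration only, the sketch below assumes it is the fraction of problems on which a model passes the tasks for all four dimensions, and contrasts it with per-dimension accuracy. The function names, dimension identifiers, and toy results are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Dimension names follow the summary; the identifiers themselves are illustrative.
DIMENSIONS = (
    "semantic_understanding",
    "mathematical_reasoning",
    "arithmetic_computation",
    "reflection_and_refinement",
)


def per_dimension_accuracy(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of problems passed, computed separately for each dimension."""
    if not results:
        raise ValueError("results must contain at least one problem")
    return {d: sum(r[d] for r in results) / len(results) for d in DIMENSIONS}


def all_pass_score(results: List[Dict[str, bool]]) -> float:
    """ASSUMED definition: fraction of problems passed on every dimension."""
    if not results:
        raise ValueError("results must contain at least one problem")
    passed_all = sum(all(r[d] for d in DIMENSIONS) for r in results)
    return passed_all / len(results)


def make_result(su: bool, mr: bool, ac: bool, rr: bool) -> Dict[str, bool]:
    """Build one problem's pass/fail record in DIMENSIONS order."""
    return dict(zip(DIMENSIONS, (su, mr, ac, rr)))


# Toy data: four problems with made-up pass/fail outcomes.
toy_results = [
    make_result(True, True, True, True),
    make_result(True, True, False, True),   # fails arithmetic only
    make_result(True, False, True, True),   # fails reasoning only
    make_result(True, True, True, True),
]

accuracy = per_dimension_accuracy(toy_results)
score = all_pass_score(toy_results)
```

In this toy data every per-dimension accuracy is at least 0.75, yet only half of the problems are passed on all four dimensions. Under the assumed definition, that gap is the kind of cross-dimension discrepancy an all-pass style metric is meant to expose.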

🔑 Implications and Limitations

• Presents a new methodology for evaluating LLMs' mathematical problem solving as a multi-dimensional cognitive process rather than as surface-level pattern recognition.
• Shows that current LLMs perform well at particular stages of mathematical reasoning but have latent weaknesses across the full problem-solving process.
• The proposed All-Pass Score can contribute to measuring LLMs' practical mathematical problem-solving ability more accurately.
• Further research is needed on how well the SMART framework and the All-Pass Score generalize and whether they extend to a wider range of mathematical domains.