
LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models

Created by
  • Haebom

Authors

Brian Rabern, Philipp Mondorf, Barbara Plank

💡 Overview

This study proposes LogicSkills, a new benchmark for determining whether large language models (LLMs) have genuinely mastered formal reasoning. LogicSkills separately evaluates three core logical skills: formal symbolization, countermodel construction, and validity assessment. All questions are derived from the two-variable fragment of first-order logic. In the experiments, LLMs performed well on validity assessment but markedly worse on symbolization and countermodel construction, which suggests a reliance on surface-level patterns.
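The countermodel construction skill can be illustrated with a small sketch: a countermodel is a finite structure in which every premise is true and the conclusion is false. The argument used here (from "everyone is related to someone" to "someone has everyone related to them"), the relation name `R`, and the helper names are hypothetical and are not taken from the benchmark itself.

```python
from itertools import product


def is_countermodel(domain, premises, conclusion):
    """Return True if every premise holds and the conclusion fails."""
    return all(p(domain) for p in premises) and not conclusion(domain)


def make_argument(R):
    """Build a hypothetical argument over one binary relation R."""
    # Premise: for all x there is some y with R(x, y).
    premise = lambda D: all(any((x, y) in R for y in D) for x in D)
    # Conclusion: there is some y such that R(x, y) holds for all x.
    conclusion = lambda D: any(all((x, y) in R for x in D) for y in D)
    return [premise], conclusion


def find_countermodel(max_size=3):
    """Brute-force search over finite structures, smallest domains first."""
    for n in range(1, max_size + 1):
        domain = list(range(n))
        pairs = list(product(domain, repeat=2))
        # Try every possible extension of R on this domain.
        for bits in product([False, True], repeat=len(pairs)):
            R = {pair for pair, keep in zip(pairs, bits) if keep}
            premises, conclusion = make_argument(R)
            if is_countermodel(domain, premises, conclusion):
                return domain, R
    return None


if __name__ == "__main__":
    print(find_countermodel())
```

Finding a countermodel shows that the argument is invalid. The converse does not hold for this sketch: a bounded search that returns `None` only shows that no countermodel exists up to `max_size`.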

🔑 Implications and Limitations

• LLMs are comparatively weak at specific formal reasoning skills (symbolization and countermodel construction), which suggests that they may rely on pattern matching rather than on genuine rule-based reasoning.
• The LogicSkills benchmark offers a new standard for evaluating the logical reasoning of LLMs in a more fine-grained and objective way.
• The benchmark is currently limited to one fragment of first-order logic (two variables, no identity), so it would need to be extended to evaluate more complex or more varied forms of formal reasoning.
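To make the restriction concrete: a formula in the two-variable fragment without identity uses only the variables x and y (which may be re-quantified) and contains no equality symbol. The predicates P, Q, and R below are hypothetical and chosen only for illustration.

```latex
% Inside the fragment: only x and y occur; x is reused by the inner quantifier.
\forall x \, \bigl( P(x) \rightarrow \exists y \, ( R(x,y) \land \exists x \, ( R(y,x) \land Q(x) ) ) \bigr)

% Outside the fragment: transitivity, as usually stated, needs three variables.
\forall x \, \forall y \, \forall z \, \bigl( ( R(x,y) \land R(y,z) ) \rightarrow R(x,z) \bigr)

% Outside the fragment: "at most one P" uses identity.
\forall x \, \forall y \, \bigl( ( P(x) \land P(y) ) \rightarrow x = y \bigr)
```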