InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation

Posted by
  • Haebom

Authors

Qiaosheng Chen, Yang Liu, Lei Li, Kai Chen, Qipeng Guo, Gong Cheng, Fei Yuan

💡 Overview

This study proposes InteractScience, a new benchmark for evaluating how well LLMs generate interactive scientific demonstration code, a capability that matters in science education and research. InteractScience combines programmatic functional testing with visually grounded qualitative testing to assess whether an LLM can accurately understand scientific knowledge and produce interactive front-end code that responds to user interactions. An evaluation of 30 state-of-the-art LLMs shows that there is still room for improvement in integrating domain knowledge with interactive coding ability.
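As a rough illustration of how two such signals could be reported side by side, here is a minimal sketch. It is not the benchmark's actual scoring code: the `DemoResult` record, its field names, and the two aggregation functions are hypothetical, and it assumes each visual check yields a score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class DemoResult:
    """Hypothetical record of the test outcomes for one generated demonstration."""
    functional_passed: int       # programmatic functional tests that passed
    functional_total: int        # programmatic functional tests that were run
    visual_scores: list[float]   # assumed per-screenshot visual scores in [0, 1]


def functional_pass_rate(results: list[DemoResult]) -> float:
    """Fraction of all functional tests that passed, pooled across demonstrations."""
    passed = sum(r.functional_passed for r in results)
    total = sum(r.functional_total for r in results)
    return passed / total if total else 0.0


def mean_visual_score(results: list[DemoResult]) -> float:
    """Average of all visual scores, pooled across demonstrations."""
    scores = [s for r in results for s in r.visual_scores]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative numbers only; they are not results from the paper.
example = [
    DemoResult(functional_passed=3, functional_total=4, visual_scores=[0.5, 1.0]),
    DemoResult(functional_passed=1, functional_total=4, visual_scores=[0.75]),
]
print(functional_pass_rate(example), mean_visual_score(example))
```

Keeping the two numbers separate, rather than folding them into one score, makes it easier to see whether a model fails on behavior, on appearance, or on both.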

🔑 Implications and Limitations

• Provides the first benchmark for systematically evaluating LLMs' ability to generate interactive demonstration code that explains scientific concepts.
• Measures practical interaction capability through an evaluation framework that combines programmatic tests with visual tests.
• Shows empirically that current LLMs struggle to integrate scientific knowledge with interactive coding ability, and suggests directions for future research.
• The benchmark's scientific domains may be limited, and its evaluation of complex, unstructured user interactions may be insufficient.