
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

Written by
  • Haebom

Authors

Yizhe Chi, Deyao Hong, Dapeng Jiang, Tianwei Luo, Kaisen Yang, Boshi Zhang, Zhe Cao, Xiaoyan Fan, Bingxiang He, Han Hao, Weiyang Jin, Dianqiao Lei, Qingle Liu, Houde Qian, Bowen Wang, Situ Wang, Youjie Zheng, Yifan Zhou, Calvin Xiao, Eren Cai, Qinhuai Na

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ ๊ธฐ์กด LLM ์—์ด์ „ํŠธ ๋ฒค์น˜๋งˆํฌ๊ฐ€ ์ฃผ๋กœ ์„ฑ๊ณต/์‹คํŒจ ๊ธฐ๋ฐ˜์˜ ์ด์ง„ ๋ถ„๋ฅ˜ ์ž‘์—…์— ์ง‘์ค‘ํ•˜์—ฌ ์‹ค์ œ ์—”์ง€๋‹ˆ์–ด๋ง ๋ถ„์•ผ์˜ ๋ฐ˜๋ณต์ ์ธ ์ตœ์ ํ™” ๊ฐ€์น˜๋ฅผ ๊ฐ„๊ณผํ•˜๋Š” ์ ์„ ์ง€์ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ๋ณธ ์—ฐ๊ตฌ๋Š” ์‚ฐ์—… ๋“ฑ๊ธ‰ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ์™€ ๊ฒ€์ฆ๊ธฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋ฉฐ ์—ฐ์†์ ์ธ ๋ณด์ƒ ์‹ ํ˜ธ์™€ ์ œ์•ฝ๋œ ์˜ˆ์‚ฐ ํ•˜์˜ ์—„๊ฒฉํ•œ ์‹คํ˜„ ๊ฐ€๋Šฅ์„ฑ ์ œ์•ฝ์„ ํฌํ•จํ•˜๋Š” 47๊ฐœ์˜ ์—”์ง€๋‹ˆ์–ด๋ง ์ž‘์—…์œผ๋กœ ๊ตฌ์„ฑ๋œ "Frontier-Eng"๋ผ๋Š” ์ƒˆ๋กœ์šด ๋ฒค์น˜๋งˆํฌ๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. Frontier-Eng๋Š” ์—์ด์ „ํŠธ๊ฐ€ ํ›„๋ณด ์‚ฐ์ถœ๋ฌผ์„ ์ƒ์„ฑํ•˜๊ณ , ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ๊ฒ€์ฆ์ž ํ”ผ๋“œ๋ฐฑ์„ ๋ฐ›์•„, ์ œํ•œ๋œ ์ƒํ˜ธ์ž‘์šฉ ์˜ˆ์‚ฐ ๋‚ด์—์„œ ์ˆ˜์ •ํ•˜๋Š” ์ƒ์„ฑ์  ์ตœ์ ํ™”(propose-execute-evaluate loop)๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

🔑 Takeaways and Limitations

• Frontier-Eng proposes a new standard for evaluating AI agents, one that reflects the complexity and iterative optimization process of real-world engineering tasks.
• Even current state-of-the-art language models struggle considerably on Frontier-Eng. Claude 4.6 Opus performed best, but there is still room for improvement.
• The analysis shows that agent performance gains follow a power-law decay in both the number of iterations and the magnitude of improvement, which suggests that under a limited budget, depth (more sequential refinement iterations) matters more than width (more parallel attempts).
• Future work includes extending the benchmark to more diverse engineering domains and developing methods that evaluate, in greater depth, how well agents combine domain knowledge with executable feedback to solve complex, open-ended engineering problems.
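A power-law decay means the gain at iteration t behaves roughly like Δ(t) ≈ a·t^(-b) with b > 0, so log Δ = log a - b·log t is a straight line on log-log axes. The sketch below estimates a and b by ordinary least squares in log space. The summary does not say how the authors fit their curves; this is a generic technique, and `fit_power_law` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(ts: Sequence[float], deltas: Sequence[float]) -> Tuple[float, float]:
    """Fit delta ~ a * t**(-b) by least squares on (log t, log delta).

    Iterations with zero or negative improvement are skipped because their log
    is undefined (a simplifying assumption). Returns (a, b); b > 0 means decay.
    """
    pts = [(math.log(t), math.log(d)) for t, d in zip(ts, deltas) if t > 0 and d > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct iteration indices")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx            # slope of the log-log line equals -b
    intercept = my - slope * mx  # intercept equals log a
    return math.exp(intercept), -slope


# Synthetic check: per-iteration gains generated from a = 2.0, b = 0.8.
ts = list(range(1, 11))
deltas = [2.0 * t ** -0.8 for t in ts]
a, b = fit_power_law(ts, deltas)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real runs, `deltas` would hold the per-iteration increase in the best-so-far score; a larger fitted b means returns diminish faster as the budget is spent.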