SecureVibeBench: Evaluating Secure Coding Capabilities of Code Agents with Realistic Vulnerability Scenarios

Created by
  • Haebom

Authors

Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi, Chengran Yang, Ting Zhang, Haoye Tian, Yikun Li, Zhenhao Li, Xin Zhou, Xing Hu, David Lo

💡 Overview

This study proposes SecureVibeBench, a new benchmark built on realistic vulnerability scenarios, to evaluate the security of code generated by LLM-based code agents. The benchmark contains 105 C/C++ secure coding tasks derived from real open-source projects. It provides a comprehensive evaluation that combines multi-file edits, real vulnerability contexts, and both functional and security tests. In an evaluation of currently popular code agents, even the best-performing agent produced correct and secure solutions for only 23.8% of the tasks, which shows that the secure coding capability of code agents is still lacking.
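To make the headline metric concrete, here is a minimal sketch of how a "correct and secure" rate could be computed when a solution must pass both the functional tests and the security tests. The `TaskResult` type, its field names, and the sample data are hypothetical and are not taken from the benchmark's own harness. The split of 25 passing tasks out of 105 is an assumption chosen only because it is the integer count consistent with 23.8%.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record for one benchmark task; not the benchmark's real schema.
    task_id: str
    functional_pass: bool  # assumed: all functional tests passed
    security_pass: bool    # assumed: the security test did not expose a vulnerability


def correct_and_secure_rate(results):
    """Return the fraction of tasks whose solution passes both checks."""
    if not results:
        return 0.0
    both = sum(1 for r in results if r.functional_pass and r.security_pass)
    return both / len(results)


# Illustrative data only: 105 tasks, 25 of which pass both checks.
results = (
    [TaskResult(f"t{i}", True, True) for i in range(25)]
    + [TaskResult(f"t{i}", True, False) for i in range(25, 60)]
    + [TaskResult(f"t{i}", False, False) for i in range(60, 105)]
)

rate = correct_and_secure_rate(results)
print(f"{rate:.1%}")  # prints 23.8%
```

Requiring both flags at once is what separates this metric from a plain functional pass rate: in the illustrative data, 60 solutions are functionally correct, yet only 25 also count as secure.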

🔑 Implications and Limitations

• Provides a new benchmark for objectively evaluating the practical secure coding capability of LLM-based code agents.
• Overcomes the limitations of existing benchmarks and mirrors the way human developers introduce vulnerabilities, which enables a fair comparison between code agents and humans.
• Shows that the secure coding capability of current code agents is markedly low, and that they struggle to produce solutions that are both correct and secure.
• The benchmark is limited to C/C++ and may not cover every type of vulnerability that arises in real-world software development. Future work should extend it to more languages and more complex vulnerability scenarios.