
When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

Posted by
  • Haebom

Authors

Shuoming Zhang, Jiacheng Zhao, Hanyuan Dong, Ruiyuan Xu, Zhicheng Li, Yangyu Zhang, Shuaijiang Li, Yuan Wen, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ LLM์ด ๊ตฌ์กฐํ™”๋œ ์ถœ๋ ฅ API๋ฅผ ํ†ตํ•ด ๋„๊ตฌ ํ”Œ๋žซํผ์œผ๋กœ ํ™œ์šฉ๋  ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ์ƒˆ๋กœ์šด ์œ ํ˜•์˜ ์ œ์–ด๋ฉด ๊ณต๊ฒฉ์ธ CDA(Constrained Decoding Attack)๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. CDA๋Š” ๋ฌธ๋ฒ• ๊ธฐ๋ฐ˜ ๋””์ฝ”๋”ฉ ๊ณผ์ •์„ ์•…์šฉํ•˜์—ฌ ์œ ํ•ดํ•œ ์˜๋„๋ฅผ ์ฃผ์ž…ํ•˜๋ฉฐ, ์ด๋Š” ๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ๋ฉด ์ทจ์•ฝ์ ๊ณผ๋Š” ๋‹ฌ๋ฆฌ ๋ชจ๋ธ ์ž์ฒด์˜ ์•ˆ์ „ ์ •๋ ฌ๋งŒ์œผ๋กœ๋Š” ๋ง‰๊ธฐ ์–ด๋ ต์Šต๋‹ˆ๋‹ค. EnumAttack๊ณผ DictAttack์„ ํ†ตํ•ด CDA๋ฅผ ๊ตฌํ˜„ํ•œ ๊ฒฐ๊ณผ, ์ตœ์‹  LLM์—์„œ ๋งค์šฐ ๋†’์€ ๊ณต๊ฒฉ ์„ฑ๊ณต๋ฅ ์„ ๋ณด์ด๋ฉฐ ์ƒˆ๋กœ์šด ๋ฐฉ์–ด ์ „๋žต์˜ ํ•„์š”์„ฑ์„ ์‹œ์‚ฌํ•ฉ๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• Using an LLM's structured output API exposes a new vulnerability to control-plane attacks, and existing safeguards alone do not provide an effective defense.
• Attacks such as DictAttack, which split the payload between the prompt and the grammar, exploit the LLM's "semantic gap" and expose the limits of existing defense techniques.
• The study stresses the need for an integrated defense strategy that covers both the data plane and the control plane, and it sets out an important direction for future LLM security research.
• Given DictAttack's high attack success rate, even current state-of-the-art defenses need substantial improvement, and further research is needed on more sophisticated, harder-to-detect attack methods.