
Universal Adversarial Attacks against Closed-Source MLLMs via Target-View Routed Meta Optimization

Created by
  • Haebom

Authors

Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Xueyi Ke, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang

💡 Overview

This paper targets commercial, closed-source multimodal large language models (MLLMs) and proposes a universal targeted attack (UTTAA) in which a single adversarial perturbation steers many different inputs toward one specified target. Existing attacks are generated separately for each input, which limits their reusability. The proposed MCRMO-Attack uses multi-crop images and an attention mechanism to reduce training instability, and a token routing technique to strengthen the association between image features and text. With this approach, it achieves substantially higher attack success rates than prior methods on commercial MLLMs such as GPT-4o and Gemini-2.0.
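
The core idea of a universal perturbation, one shared `delta` optimized over many inputs rather than one per input, can be illustrated with a toy sketch. This is not the paper's MCRMO-Attack: the linear "encoder" `W`, the sample data, the budget `eps`, and the step size `alpha` are all hypothetical stand-ins, and multi-crop views, attention, and token routing are not modeled. It only shows a signed-gradient update on a shared perturbation under an L-infinity budget.

```python
# Toy sketch: one shared perturbation `delta` is optimized over several inputs
# so that each perturbed input's embedding moves toward a target embedding.
# The encoder, data, eps, and alpha are hypothetical stand-ins, not the paper's setup.

W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]  # stand-in linear "encoder": embedding = W @ x


def encode(x):
    """Apply the toy linear encoder to a 4-dimensional input."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def loss_and_grad(inputs, delta, target_emb):
    """Mean squared embedding distance to the target, and its gradient w.r.t. delta."""
    n = len(inputs)
    loss = 0.0
    grad = [0.0] * len(delta)
    for x in inputs:
        emb = encode([xi + di for xi, di in zip(x, delta)])
        residual = [e - t for e, t in zip(emb, target_emb)]
        loss += sum(r * r for r in residual) / n
        # Gradient of ||W(x + delta) - t||^2 with respect to delta is 2 * W^T residual.
        for j in range(len(delta)):
            grad[j] += 2.0 * sum(W[i][j] * residual[i] for i in range(len(W))) / n
    return loss, grad


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def universal_perturbation(inputs, target_emb, eps=0.1, alpha=0.02, steps=10):
    """Signed-gradient descent on a single delta shared by all inputs."""
    delta = [0.0] * len(inputs[0])
    for _ in range(steps):
        _, grad = loss_and_grad(inputs, delta, target_emb)
        # Step against the gradient, then clip back into the L-infinity budget.
        delta = [max(-eps, min(eps, d - alpha * sign(g)))
                 for d, g in zip(delta, grad)]
    return delta


inputs = [
    [0.2, 0.4, 0.1, 0.3],
    [0.5, 0.1, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
]
target_emb = encode([0.9, 0.8, 0.9, 0.7])

loss_before, _ = loss_and_grad(inputs, [0.0] * 4, target_emb)
delta = universal_perturbation(inputs, target_emb)
loss_after, _ = loss_and_grad(inputs, delta, target_emb)
print("loss before:", loss_before, "loss after:", loss_after)
```

The same `delta` is added to every input, which is what separates a universal attack from the per-input attacks the summary contrasts it with. In a realistic setting the encoder would be a surrogate vision model and the perturbed image would also be clipped to the valid pixel range.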

🔑 Implications and Limitations

• **Feasibility of universal attacks**: A single adversarial perturbation that applies independently of the input opens a new direction in MLLM security research.
• **Security weaknesses in commercial MLLMs**: The results suggest that closed-source commercial MLLMs can also be vulnerable to universal attacks, which underlines the need for defense strategies.
• **Higher attack success rate**: MCRMO-Attack shows a considerable improvement in attack success rate over existing universal attack methods, a meaningful advance for MLLM security research.
• **Limitations**: The method may be effective only against certain types of MLLMs, and generating the attack may require substantial computational resources. Further research is needed on attack effectiveness in real-world settings and on defenses.