
Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks

Created by
  • Haebom

Authors

Yu Yan, Sheng Sun, Shengjia Cheng, Teli Liu, Mingfeng Li, Min Liu

💡 Overview

This paper examines the risk that Vision-Language Models (VLMs) with multimodal reasoning capabilities can be induced to carry out harmful tasks, and proposes CrossTALK, a new attack methodology designed to overcome the limitations of existing black-box attacks. CrossTALK bypasses the safety alignment patterns of VLMs through knowledge expansion, cross-modal clue entanglement, and scenario nesting, and thereby elicits harmful outputs. Experimental results show that the proposed method achieves a state-of-the-art attack success rate.

🔑 Implications and Limitations

• The multimodal reasoning capabilities of VLMs can be exploited for complex harmful tasks, so effective defenses and red-teaming strategies are needed.
• CrossTALK presents a new approach that bypasses VLM safeguards through cross-modal attacks that are more complex and more scalable than existing attack methods.
• The effectiveness of the proposed attack methodology is demonstrated empirically, but continued research on the steadily evolving safety alignment mechanisms of VLMs, along with the development of defense techniques, remains necessary.