
Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Created by
  • Haebom

Authors

Kun Wang, Zherui Li, Zhenhong Zhou, Yitong Zhang, Yan Mi, Kun Yang, Yiming Zhang, Junhao Dong, Zhongxiang Sun, Qiankun Li, Yang Liu

💡 Overview

This study systematically analyzes the cross-modal safety problems that arise in omni-modal large language models (OLLMs), which integrate multiple modalities. Using a modality-semantics decoupling principle and the AdvBench-Omni dataset, it exposes OLLM vulnerabilities and identifies a mid-layer dissolution phenomenon and a pure refusal direction. Building on these findings, the proposed OmniSteer method uses a lightweight adapter to modulate intervention strength, substantially raising refusal rates on harmful inputs while preserving overall model performance.
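The summary does not spell out how the refusal direction is extracted or how the adapter sets the intervention strength. As a rough illustration only, the sketch below uses difference-in-means, a common way to estimate a refusal direction from hidden states, and replaces the trained adapter with a hypothetical hand-written rule (`adaptive_alpha`) that scales the steering strength by how far a hidden state's projection falls short of a target. All function names, the `target` and `max_alpha` parameters, and the toy activations are assumptions, not OmniSteer's actual design.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def refusal_direction(harmful, harmless) -> Vector:
    """Difference-in-means estimate: mean(harmful) - mean(harmless), unit-normalized."""
    mu_harm = mean_vector(harmful)
    mu_safe = mean_vector(harmless)
    return unit([a - b for a, b in zip(mu_harm, mu_safe)])


def adaptive_alpha(hidden, direction, target: float, max_alpha: float) -> float:
    """HYPOTHETICAL stand-in for a learned adapter: the strength needed to lift
    the projection onto `direction` up to `target`, clipped to [0, max_alpha]."""
    deficit = target - dot(hidden, direction)
    return min(max(deficit, 0.0), max_alpha)


def steer(hidden, direction, alpha: float) -> Vector:
    """Activation addition: h' = h + alpha * d."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy activations (assumption), not real model hidden states.
    harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
    harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    d = refusal_direction(harmful, harmless)  # [1.0, 0.0, 0.0]
    h = [0.5, 2.0, -1.0]
    alpha = adaptive_alpha(h, d, target=3.0, max_alpha=5.0)  # 2.5
    print(d, alpha, steer(h, d, alpha))
```

Because the direction is unit-normalized, adding `alpha * direction` raises the projection by exactly `alpha`, so inputs that already sit far along the refusal direction receive no extra push. This mirrors, in a very simplified form, the idea of modulating intervention strength per input rather than applying one fixed steering vector to every input.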

🔑 Implications and Limitations

• OLLMs introduce new safety risks through cross-modal interaction, and these risks call for systematic understanding.
• The discovery of mid-layer dissolution and the pure refusal direction offers important insight into how OLLM safety mechanisms work.
• OmniSteer is a practical solution that strengthens OLLM safety while preserving multimodal capabilities.
• Further research is needed on the coverage of the AdvBench-Omni dataset and on new cross-modal attack techniques.