Weight Space Detection of Backdoors in LoRA Adapters

Created by
  • Haebom

Authors

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit, Kevin Zhu, Ruizhe Li, Javier Ferrando, Maheep Chaudhary

💡 Overview

This study examines how vulnerable LoRA adapters, which are used to fine-tune large language models (LLMs) efficiently, are to backdoor attacks. The proposed method detects backdoors by analyzing the adapter's weight matrices directly, without running the model, which makes the approach independent of any data. It extracts simple statistics of the singular values, such as their concentration, entropy, and distribution shape, and identifies adapters that deviate from the normal pattern, achieving high detection accuracy with a low false-positive rate.
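The singular-value statistics mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `spectral_features`, the argument names `lora_A` and `lora_B`, and the exact definitions chosen for "concentration" (energy share of the top singular value), "entropy" (Shannon entropy of the normalized energy), and "distribution shape" (kurtosis) are all assumptions made for this example.

```python
import numpy as np

def spectral_features(lora_A: np.ndarray, lora_B: np.ndarray) -> dict:
    """Summarize the singular values of one LoRA update B @ A.

    Assumed shapes (hypothetical naming): lora_A is (r, d_in) and
    lora_B is (d_out, r), so the update has rank at most r.
    """
    eps = 1e-12
    delta_w = lora_B @ lora_A                       # (d_out, d_in) weight update
    s = np.linalg.svd(delta_w, compute_uv=False)    # singular values, descending
    s = s[: lora_A.shape[0]]                        # only the first r can be non-zero

    energy = s ** 2
    p = energy / (energy.sum() + eps)               # normalized spectral energy

    centered = s - s.mean()
    second_moment = np.mean(centered ** 2)
    fourth_moment = np.mean(centered ** 4)

    return {
        # Assumed "concentration": share of energy in the top singular value.
        "top1_energy": float(p[0]),
        # Assumed "entropy": Shannon entropy of the energy distribution.
        "entropy": float(-(p * np.log(p + eps)).sum()),
        # Assumed "distribution shape": kurtosis of the singular values.
        "kurtosis": float(fourth_moment / (second_moment ** 2 + eps)),
    }
```

Because the features depend only on the adapter weights, no prompts, triggers, or forward passes are needed to compute them, which is what makes this kind of check data-independent.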

🔑 Implications and Limitations

• Presents a new data-agnostic methodology that can effectively detect backdoor risk in LoRA adapters.
• Makes large-scale adapter screening far more practical, because the weight-space analysis requires no model execution.
• Validates the detection performance of the proposed methodology across several Llama-3.2-3B-based datasets, demonstrating high accuracy.
• Works without prior knowledge of the training data distribution or of any specific backdoor trigger, but its ability to detect potentially new types of backdoor attacks requires further research.
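The screening idea in the points above, flagging adapters whose weight statistics deviate from a normal pattern, can be sketched as follows. The summary does not say how deviation is measured, so the per-feature z-score, the helper names `fit_reference`, `anomaly_score`, and `flag_adapters`, and the threshold of 3.0 are all assumptions for illustration only.

```python
import numpy as np

def fit_reference(benign_features: np.ndarray):
    """Estimate the 'normal pattern' from feature vectors of known-benign adapters.

    benign_features has shape (n_adapters, n_features).
    """
    mean = benign_features.mean(axis=0)
    std = benign_features.std(axis=0) + 1e-8   # avoid division by zero
    return mean, std

def anomaly_score(features: np.ndarray, mean: np.ndarray, std: np.ndarray) -> float:
    """Largest absolute z-score across features (assumed deviation measure)."""
    z = (features - mean) / std
    return float(np.abs(z).max())

def flag_adapters(candidates: np.ndarray, mean: np.ndarray, std: np.ndarray,
                  threshold: float = 3.0) -> list:
    """Return one boolean per candidate adapter; True means 'inspect further'.

    The threshold is a hypothetical default, not a value from the paper.
    """
    return [anomaly_score(f, mean, std) > threshold for f in candidates]
```

A screening run would fit the reference once on adapters believed to be clean and then score each incoming adapter from its weights alone. As the last point notes, a backdoor that leaves these statistics close to the benign range would not be caught by such a check.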