
Selective Fine-Tuning for Targeted and Robust Concept Unlearning

Created by: Haebom

Authors

Mansi, Avinash Kori, Francesca Toni, Soteris Demetriou

💡 Overview

This paper addresses concept unlearning, the problem of preventing text-guided diffusion models from generating harmful content. To overcome the limitations of existing approaches that remove concepts one at a time, it proposes efficient selective fine-tuning in place of computationally expensive full fine-tuning. To this end, the authors develop TRUST (Targeted Robust Selective Fine-Tuning), a method that uses Hessian-based regularization to dynamically estimate the neurons associated with a target concept and remove that concept.
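To make the idea of selective fine-tuning with dynamically re-estimated targets concrete, here is a minimal toy sketch in plain Python. It is not the TRUST algorithm: this summary does not state the paper's selection rule or how the Hessian-based regularizer enters the objective. As a stand-in, the sketch scores each parameter by the curvature-scaled quantity g² / |h| (squared gradient over the Hessian diagonal), recomputes the top-k mask at every step, and updates only the selected parameters. All names (`forget_loss`, `select_top_k`, `selective_fine_tune`) and the quadratic toy loss are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Vector = List[float]


def forget_loss(theta: Vector) -> float:
    """Hypothetical 'forget' objective: a quadratic in which only the first
    two parameters matter strongly (a stand-in for concept neurons)."""
    coeffs = [4.0, 2.0, 0.05, 0.01, 0.0]
    return 0.5 * sum(c * t * t for c, t in zip(coeffs, theta))


def grad_and_hess_diag(
    loss: Callable[[Vector], float], theta: Vector, h: float = 1e-3
) -> Tuple[Vector, Vector]:
    """Central finite-difference gradient and Hessian diagonal at theta."""
    base = loss(theta)
    grad, hess = [], []
    for i in range(len(theta)):
        up, down = theta.copy(), theta.copy()
        up[i] += h
        down[i] -= h
        f_up, f_down = loss(up), loss(down)
        grad.append((f_up - f_down) / (2 * h))
        hess.append((f_up - 2 * base + f_down) / (h * h))
    return grad, hess


def select_top_k(grad: Vector, hess: Vector, k: int, eps: float = 1e-8) -> List[int]:
    """Indices of the k parameters with the largest score g^2 / (|h| + eps).
    Assumption: this score is illustrative, not the paper's criterion."""
    scores = [g * g / (abs(c) + eps) for g, c in zip(grad, hess)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def selective_fine_tune(
    loss: Callable[[Vector], float],
    theta: Vector,
    k: int,
    lr: float = 0.1,
    steps: int = 4,
) -> Tuple[Vector, Set[int]]:
    """Gradient descent that updates only the k highest-scoring parameters,
    re-estimating the selection at every step."""
    theta = list(theta)
    touched: Set[int] = set()
    for _ in range(steps):
        grad, hess = grad_and_hess_diag(loss, theta)
        active = select_top_k(grad, hess, k)  # mask is recomputed each step
        for i in active:
            theta[i] -= lr * grad[i]
        touched.update(active)
    return theta, touched


start = [1.0] * 5
tuned, touched = selective_fine_tune(forget_loss, start, k=2)
print(sorted(touched))  # → [0, 1]
print(forget_loss(start), "->", forget_loss(tuned))
```

In this toy setting only the two high-impact parameters are modified during the first few steps, while the remaining three stay at their initial values. Because the mask is recomputed at each step, running for more steps would eventually shift the selection to other coordinates once the first two have largely converged.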

🔑 Implications and Limitations

• Removal of concept combinations and conditional concepts: TRUST is shown to effectively remove not only single concepts but also composite and conditional concepts.
• Efficiency and robustness: TRUST runs markedly faster than existing state-of-the-art methods while remaining robust to adversarial prompts and keeping degradation of generation quality to a minimum.
• Importance of dynamic neuron estimation: Estimating the target concept neurons dynamically, rather than with a fixed scheme, yields better-optimized removal performance.
• Generalization and interpretability: Further research is needed to validate TRUST across a wider range of harmful-content scenarios and to improve the interpretability of the neuron estimation procedure.