VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment

Created by
  • Haebom

Authors

Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai

💡 Overview

Aligning large language models (LLMs) with nuanced human values is difficult, and existing RLHF approaches are limited in how well they handle fine-grained attributes. This paper proposes a new framework, VISA, to address the "alignment tax" that arises during value alignment. VISA combines three modules: precise value detection, semantic-to-value translation, and core value rewriting. Together they aim to preserve the model's original knowledge while mitigating the bias absorption, hallucination, and information loss that occur during fine-tuning.

🔑 Implications and Limitations

• Presents a new methodology for precisely controlling LLMs so that they effectively learn and express fine-grained human values.
• Outperforms existing fine-tuning and prompting-based methods, demonstrating that an LLM's value alignment and factual consistency can be improved at the same time.
• Effectively mitigates the "alignment tax": the proposed method improves value alignment while preserving the base LLM's general capabilities and factual consistency.
• Further research is needed on the GRPO training process, the design of the composite reward function, and the accuracy and efficiency of the various value detectors.
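
To make the last point concrete, here is a minimal sketch of how a composite reward could feed GRPO-style training. It is not the paper's implementation. The summary does not give the reward terms or their weights, so `composite_reward`, its two inputs, and the 0.5/0.5 weights are hypothetical. The only part taken from GRPO itself is the group-relative advantage: each sampled response's reward is standardized against the other responses to the same prompt (population standard deviation is used here for simplicity).

```python
from statistics import mean, pstdev


def composite_reward(value_score, consistency_score, w_value=0.5, w_consistency=0.5):
    """Hypothetical composite reward: a weighted sum of a value-alignment score
    and a factual-consistency score, both assumed to lie in [0, 1]."""
    return w_value * value_score + w_consistency * consistency_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group.
    eps avoids division by zero when all rewards in the group are equal."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored as
# (value_score, consistency_score). The numbers are made up for illustration.
group = [(0.9, 0.8), (0.4, 0.9), (0.7, 0.3), (0.2, 0.2)]
rewards = [composite_reward(v, c) for v, c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.2f}")
```

Responses that score above the group mean receive a positive advantage and are reinforced, and those below it are penalized. In this sketch, how the two scores are weighted decides which responses count as "better", which is one reason the reward design is flagged above as needing further study.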