Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

Posted by
  • Haebom

Authors

Valeria Ruscio, Eli-Shaoul Khedouri, Keiran Thompson

💡 Overview

This paper examines an asymmetry in Transformer weight space: pretraining and preference alignment act on the weights in different ways. The authors develop a new probe that relates weight changes to the residual-stream activation subspace and to the prediction subspace. Using it, they find that alignment updates concentrate mainly on the "read path" ($W_Q, W_K$), whereas the "write path" ($W_O, W_2$) stays relatively isotropic with respect to the prediction subspace. They attribute this to the outer-product structure of weight updates and to differences in the gradients acting on each path, and they explain that during pretraining the cross-entropy loss plays an important role in shaping the predictive geometry of the write path.
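To make the outer-product argument concrete, here is a minimal sketch on synthetic data. It is not the paper's probe: the function name `subspace_energy_fraction`, the dimensions, and the random "activation subspace" are all assumptions chosen for illustration. It relies only on the standard fact that for a linear map y = W x the gradient is a sum of outer products g_i x_i^T, so the input side of the update is confined to the span of the inputs, while the output side follows whatever directions the backpropagated gradients take.

```python
import numpy as np

def subspace_energy_fraction(delta_w, basis, side):
    """Fraction of ||delta_w||_F^2 captured by the subspace spanned by `basis`.

    basis: (d, k) matrix with orthonormal columns (hypothetical subspace).
    side="input"  projects the rows of delta_w (what the matrix reads).
    side="output" projects the columns of delta_w (what the matrix writes).
    """
    if side == "input":
        projected = delta_w @ basis       # (d_out, k)
    elif side == "output":
        projected = basis.T @ delta_w     # (k, d_in)
    else:
        raise ValueError("side must be 'input' or 'output'")
    return float(np.sum(projected ** 2) / np.sum(delta_w ** 2))

rng = np.random.default_rng(0)
d, k, n = 64, 8, 256  # model width, subspace dimension, number of samples

# Orthonormal basis for a k-dimensional stand-in "activation subspace".
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Synthetic inputs confined to that subspace; output gradients left unstructured.
x = basis @ rng.standard_normal((k, n))   # (d, n)
g = rng.standard_normal((d, n))           # (d, n)

# For y = W x, dL/dW = sum_i g_i x_i^T, i.e. a sum of outer products.
grad_w = g @ x.T                          # (d, d)

read_frac = subspace_energy_fraction(grad_w, basis, side="input")
write_frac = subspace_energy_fraction(grad_w, basis, side="output")
isotropic_baseline = k / d  # expected fraction for a direction-agnostic update

print(f"input-side fraction:  {read_frac:.3f}")
print(f"output-side fraction: {write_frac:.3f}")
print(f"isotropic baseline:   {isotropic_baseline:.3f}")
```

In this toy setup the input-side fraction is 1 up to floating-point error, because every row of the update lies in the span of the inputs. The output-side fraction lands near the isotropic baseline k/d because the synthetic gradients have no preferred direction. Applying the same measurement to real weight differences between an aligned and a base checkpoint would be one possible use, but that is an assumption about how such a probe could be run, not a description of the authors' procedure.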

🔑 Implications and Limitations

• Pretraining shapes the model's basic representational capacity and, in particular, affects the structure of the write path.
• Preference alignment fine-tunes how the model makes decisions, mainly through the read path.
• Understanding this weight-space asymmetry provides an important foundation for interpreting model behavior and for developing more efficient tuning methods.
• The study focuses on specific Transformer layers; further work is needed on how universal the asymmetry is across the whole model and whether it carries over to other architectures.