Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging

Created by
  • Haebom

Authors

Weihong Lin, Lin Sun, Qilong Shi, Aomufei Yuan, Yuxuan Tian, Zhengyang Wang, Guangxiang Zhao, Xiangzheng Zhang, Tong Yang

💡 Overview

To address the interference problem in existing weight-space model merging methods, this study proposes Sparse Complementary Fusion with reverse KL (SCF-RKL), a new model merging framework that controls functional interference through sparse, distribution-aware updates. Rather than assuming linearity in parameter space, SCF-RKL uses reverse KL divergence to measure the functional divergence between models and selectively fuses complementary parameters, integrating new capabilities while preserving stable representations. Extensive experiments across model scales and architectures, covering both reasoning and instruction-tuned models, show that SCF-RKL outperforms existing model merging methods while maintaining generalization and generation stability.
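As a rough illustration of the divergence measure mentioned above, the sketch below computes forward and reverse KL between the next-token distributions of two models on a single input. It is not taken from the paper: the function names, the toy logits, and the choice of which model plays the reference role `p` are all assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(a, b, eps=1e-12):
    """KL(a || b) = sum_i a_i * (log a_i - log b_i), averaged over leading axes."""
    a = np.clip(a, eps, 1.0)
    b = np.clip(b, eps, 1.0)
    return float(np.sum(a * (np.log(a) - np.log(b)), axis=-1).mean())

# Toy logits standing in for two models' outputs on the same prompt (hypothetical values).
p = softmax(np.array([2.0, 0.5, -1.0]))   # assumed reference model
q = softmax(np.array([0.1, 0.1, 3.0]))    # assumed second model

forward_kl = kl_divergence(p, q)  # KL(p || q)
reverse_kl = kl_divergence(q, p)  # KL(q || p): expectation taken under q

print(f"forward KL = {forward_kl:.3f}, reverse KL = {reverse_kl:.3f}")
```

The two values differ because KL divergence is asymmetric: reverse KL weights the log-ratio by `q`, so it is dominated by the tokens that the second model itself considers likely.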

🔑 Implications and Limitations

• Presents a new model merging paradigm that makes no assumption of linearity in parameter space: it directly measures functional differences between models and sparsely fuses complementary parameters.
• Achieves better performance and stability than existing methods across models of different scales and types and across a wide range of benchmarks, advancing research on model merging.
• Because the proposed method measures functional divergence with reverse KL divergence, it may increase computational cost, and its effectiveness on specific model types or tasks needs further study.
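The idea of a sparse update in the first point can be pictured with the toy sketch below, which copies values from a donor tensor into a base tensor only at the highest-scoring positions. This is a generic masked merge written for illustration, not the SCF-RKL procedure: `sparse_fuse`, `keep_fraction`, and the magnitude-based score used in the example are hypothetical stand-ins, and the paper's actual selection criterion is not described in this summary.

```python
import numpy as np

def sparse_fuse(base, donor, scores, keep_fraction=0.1):
    """Return (merged, mask): donor values are used only at the top-scoring positions.

    `scores` is any per-parameter importance estimate supplied by the caller
    (hypothetical here); all other positions keep the base value.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(round(keep_fraction * base.size)))
    top = np.argpartition(scores.ravel(), -k)[-k:]   # indices of the k largest scores
    mask = np.zeros(base.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(base.shape)
    merged = np.where(mask, donor, base)
    return merged, mask

# Toy example: the score is |donor - base|, used purely as a placeholder criterion.
base = np.zeros(10)
donor = np.arange(10, dtype=float)
merged, mask = sparse_fuse(base, donor, np.abs(donor - base), keep_fraction=0.2)
print(merged)
```

Only the selected fraction of parameters changes, so the remaining positions of the base model are left exactly as they were.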