MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning

Created by
  • Haebom

Authors

Xinhan Zheng, Huyu Wu, Xueting Wang, Duo Su, Haiyun Jiang

💡 Overview

This paper addresses the problem that multimodal large language models (MLLMs) are biased toward text and fail to reason effectively over visual information. It hypothesizes that this bias does not come from external factors such as data imbalance or instruction tuning, but from the model's internal visual key vectors being out-of-distribution relative to the text key space. By analyzing key vectors in LLaVA and Qwen2.5-VL, the authors show that visual and text keys occupy clearly distinct subspaces, which indicates that text bias arises from a misalignment in the attention key space inside the model.
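The key-space comparison described above can be illustrated on synthetic data. The following is a minimal sketch, not the paper's analysis: the two statistics (`centroid_separation`, `nearest_centroid_accuracy`), the dimensions, and the Gaussian toy vectors are all assumptions chosen for illustration. In a real study the arrays would hold key vectors extracted from a model's attention layers.

```python
import numpy as np

def centroid_separation(visual_keys, text_keys):
    """Distance between modality centroids, scaled per dimension by the pooled std.
    Hypothetical statistic for illustration; not taken from the paper."""
    mu_v = visual_keys.mean(axis=0)
    mu_t = text_keys.mean(axis=0)
    pooled_std = np.sqrt(0.5 * (visual_keys.var(axis=0) + text_keys.var(axis=0)) + 1e-8)
    return float(np.linalg.norm((mu_v - mu_t) / pooled_std))

def nearest_centroid_accuracy(visual_keys, text_keys):
    """Fraction of keys that lie closer to their own modality's centroid."""
    mu_v = visual_keys.mean(axis=0)
    mu_t = text_keys.mean(axis=0)
    keys = np.vstack([visual_keys, text_keys])
    labels = np.array([0] * len(visual_keys) + [1] * len(text_keys))
    dist_v = np.linalg.norm(keys - mu_v, axis=1)
    dist_t = np.linalg.norm(keys - mu_t, axis=1)
    predicted = (dist_t < dist_v).astype(int)  # 1 means "closer to the text centroid"
    return float((predicted == labels).mean())

# Synthetic stand-ins for key vectors (assumption: Gaussian toy data).
rng = np.random.default_rng(0)
dim = 64
text_keys = rng.normal(0.0, 1.0, size=(500, dim))
visual_keys = rng.normal(1.5, 1.0, size=(500, dim))   # shifted distribution
control_keys = rng.normal(0.0, 1.0, size=(500, dim))  # same distribution as text

shifted_gap = centroid_separation(visual_keys, text_keys)
control_gap = centroid_separation(control_keys, text_keys)
shifted_acc = nearest_centroid_accuracy(visual_keys, text_keys)

print(f"gap (shifted vs text): {shifted_gap:.2f}")
print(f"gap (control vs text): {control_gap:.2f}")
print(f"nearest-centroid accuracy (shifted vs text): {shifted_acc:.3f}")
```

A large gap and near-perfect nearest-centroid accuracy for the shifted group, compared with a small gap for the control group, is the kind of signal that would suggest two sets of keys occupy distinct regions of the key space.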

🔑 Implications and Limitations

• Offers a new perspective: text bias in MLLMs stems not only from external factors but also from an internal misalignment of the attention key space.
• Identifies the distributional gap between visual key vectors and text key vectors as a core mechanism that limits how well MLLMs use visual information.
• The proposed MaLoRA (Gated Modality LoRA) technique is expected to be an effective approach for improving the multimodal capability of MLLMs. (The abstract gives little detail on the MaLoRA method itself; this is inferred from the title and background.)
• The study reports analysis results for only two models, LLaVA and Qwen2.5-VL, so generalization to other MLLM architectures may require further verification.
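Since the summary notes that the abstract does not describe MaLoRA in detail, the following is only one possible reading of the name "Gated Modality LoRA", not the authors' method. It sketches a frozen key projection with a low-rank update that is applied only to visual tokens. The hard binary gate, the function name `gated_keys`, and all shapes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_key, rank = 32, 16, 4

# Frozen key projection of the base model (random stand-in).
W_k = rng.normal(size=(d_model, d_key))
# Low-rank adapter: small random down-projection, zero-initialized up-projection,
# following the usual LoRA initialization so the adapter starts as a no-op.
A = rng.normal(scale=0.01, size=(d_model, rank))
B = np.zeros((rank, d_key))

def gated_keys(hidden, is_visual, W_k, A, B, scale=1.0):
    """Key vectors with a low-rank update applied only where the gate is on.
    Assumption: the gate is a hard 0/1 mask over visual tokens."""
    base = hidden @ W_k
    delta = (hidden @ A) @ B * scale
    gate = is_visual.astype(hidden.dtype)[:, None]
    return base + gate * delta

# Six toy tokens: the first three are visual, the last three are text.
hidden = rng.normal(size=(6, d_model))
is_visual = np.array([True, True, True, False, False, False])

keys_init = gated_keys(hidden, is_visual, W_k, A, B)

# Pretend training has moved B away from zero.
B_tuned = rng.normal(scale=0.1, size=(rank, d_key))
keys_tuned = gated_keys(hidden, is_visual, W_k, A, B_tuned)

base_keys = hidden @ W_k
print("text keys unchanged:", np.allclose(keys_tuned[~is_visual], base_keys[~is_visual]))
print("visual keys shifted:", not np.allclose(keys_tuned[is_visual], base_keys[is_visual]))
```

Under these assumptions, the adapter can move visual keys toward the text key distribution while leaving text keys exactly as the base model produces them.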