
DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression

Created by
  • Haebom

Authors

Numan Saeed, Asif Hanif, Fadillah Adamsyah Maani, Hussain Alasmawi, Mohammad Yaqub

💡 Overview

๋ณธ ๋…ผ๋ฌธ์€ ์ž„์ƒ ํ™˜๊ฒฝ์—์„œ ์˜จ๋””๋ฐ”์ด์Šค ๋ฐฐํฌ๋ฅผ ์œ„ํ•œ ๋น„์ „-์–ธ์–ด ๋ชจ๋ธ์˜ ๊ทน์‹ฌํ•œ ์••์ถ• ์‹œ ๋ฐœ์ƒํ•˜๋Š” ์ง€์‹ ์ฆ๋ฅ˜(KD) ์„ฑ๋Šฅ ์ €ํ•˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ œ์•ˆ๋œ DARK(Diagonal-Anchored Repulsive Knowledge Distillation)๋Š” ํ•™์Šต ๋ชฉํ‘œ๋ฅผ ๋Œ€๊ฐํ•ญ(์ผ์น˜ํ•˜๋Š” ์ด๋ฏธ์ง€-ํ…์ŠคํŠธ ์Œ)๊ณผ ๋น„๋Œ€๊ฐํ•ญ(๋น„ํ‘œ์  ์œ ์‚ฌ์„ฑ)์œผ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ๋Œ€์กฐ์  KD ํ”„๋ ˆ์ž„์›Œํฌ์ž…๋‹ˆ๋‹ค. DARK๋Š” ๋Œ€๊ฐํ•ญ์œผ๋กœ ์ผ์น˜ ์Œ ์ •๋ ฌ์„ ์œ ์ง€ํ•˜๊ณ , ๋น„๋Œ€๊ฐํ•ญ ๊ฐ€์ค‘์น˜๋ฅผ ์ ์ง„์ ์œผ๋กœ ์กฐ์ •ํ•˜์—ฌ ๋ชจ๋ฐฉ์—์„œ ๋น„ํ‘œ์  ์œ ์‚ฌ์„ฑ ๊ตฌ์กฐ๋ฅผ '๋ฐ˜๋ฐœ'ํ•˜๋„๋ก ์œ ๋„ํ•จ์œผ๋กœ์จ ๊ทน์‹ฌํ•œ ์••์ถ• ํ™˜๊ฒฝ์—์„œ ํšจ์œจ์ ์ธ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค.

🔑 Takeaways and Limitations

• Under extreme model compression, steering the student to repel certain parts of the teacher's similarity structure (e.g., non-target similarities), rather than imitating all of it, can be an effective way to distill knowledge.
• Through its contrastive formulation, DARK preserves the consistency of matched image-text pairs while discarding teacher similarity structure that is unnecessary or confusing for the student, which improves the student model's performance.
• Using DARK, the authors compress FetalCLIP into MobileFetalCLIP, whose vision encoder is 26 times smaller, and the compressed model outperforms the teacher in zero-shot performance, demonstrating its potential for use in clinical settings.
• The study focuses mainly on extreme compression of vision-language models; further validation may be needed to establish how well DARK works for other model types or at other compression levels.
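The first takeaway, repelling part of the teacher's structure rather than imitating all of it, pairs with the overview's point that the off-diagonal weight is adjusted gradually. One way to picture this is a weight whose sign flips during training. The sketch assumes a simple linear schedule; the function name `offdiag_weight` and the values `beta_start=1.0` and `beta_end=-0.5` are hypothetical, since this summary does not give DARK's actual schedule.

```python
# Hypothetical linear schedule for the off-diagonal weight (beta).
# The start/end values and linear shape are illustrative assumptions,
# not taken from the DARK paper.


def offdiag_weight(step, total_steps, beta_start=1.0, beta_end=-0.5):
    """Interpolate beta from beta_start to beta_end over training.

    A positive beta imitates the teacher's non-target similarities;
    a negative beta repels them. Steps past total_steps are clamped.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)


if __name__ == "__main__":
    for step in (0, 250, 500, 750, 1000):
        beta = offdiag_weight(step, 1000)
        mode = "imitate" if beta > 0 else "repel" if beta < 0 else "neutral"
        print(step, round(beta, 3), mode)
```

A linear ramp is only one choice; a cosine or step schedule would express the same imitate-then-repel idea.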