EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models

Created by
  • Haebom

Authors

Jincheng Xie, Xingchen Xiao, Heyan Huang, Zhongyi Huang, Yu Zheng, Runheng Liu

💡 Overview

This paper proposes EmergentBridge, a framework that improves zero-shot transfer between unpaired modality pairs (e.g., audio-depth) in unified multimodal embedding models trained only on partially paired data such as image-text. Connecting a new modality directly to the proxy embeddings of existing modalities causes gradient interference. To address this, EmergentBridge learns noisy bridge anchors and performs proxy alignment only in the subspace orthogonal to the anchor-alignment direction. This strengthens connectivity between unpaired modalities while preserving the existing anchor-alignment structure, substantially improving zero-shot classification and retrieval performance.
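The orthogonal-subspace idea can be illustrated with a small gradient-projection sketch. This is not the authors' implementation: the summary does not say whether the projection acts on gradients or on embeddings, so the example assumes it acts on gradients. The names `project_orthogonal`, `g_anchor`, and `g_proxy` are hypothetical. The proxy-alignment update has its component along the anchor-alignment direction removed, so it cannot push against the anchor objective.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_orthogonal(g_proxy, g_anchor, eps=1e-12):
    """Return g_proxy with its component along g_anchor removed.

    Assumption: both arguments are flattened gradients of the same
    parameters, one from the proxy-alignment loss and one from the
    anchor-alignment loss.
    """
    denom = dot(g_anchor, g_anchor)
    if denom < eps:
        # Anchor gradient is (near) zero, so there is nothing to protect.
        return list(g_proxy)
    scale = dot(g_proxy, g_anchor) / denom
    return [p - scale * a for p, a in zip(g_proxy, g_anchor)]


# Toy vectors, chosen only for illustration.
g_anchor = [1.0, 0.0, 0.0]
g_proxy = [0.5, 2.0, -1.0]

g_proxy_orth = project_orthogonal(g_proxy, g_anchor)

# The combined update keeps the anchor gradient untouched and adds only
# the part of the proxy gradient that is orthogonal to it.
combined = [a + p for a, p in zip(g_anchor, g_proxy_orth)]
```

In this toy case the projected proxy gradient has zero inner product with the anchor gradient, so to first order the proxy step does not change the anchor-alignment loss.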

🔑 Implications and Limitations

• Strong zero-shot cross-modal transfer: Even in multimodal embedding models with limited paired data, zero-shot transfer between modality pairs not seen together during training can be improved substantially.
• Overcoming practical data constraints: By addressing the sparse paired-data problem that arises in real-world settings, a wide range of multimodal applications can be built with less data.
• Resolving gradient interference: The method manages the training instability that arises when a new modality is integrated, preventing an overall performance drop in the unified model.
• Possible increase in model complexity and training time: The proposed bridge learning mechanism may increase model complexity or lengthen training time compared with existing approaches.