LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Created by
  • Haebom

Authors

Boyuan Sun, Jiaxing Zhao, Xiang Chen, Xihan Wei, Qibin Hou

💡 Overview

This paper proposes LLaVA-Octopus, a new video multimodal large language model that dynamically adjusts the weights of its visual feature extractors according to the user's instruction. Because different visual feature extractors are strong on different tasks, the model fuses them in an instruction-driven way to maximize performance. In experiments, LLaVA-Octopus performed strongly across a range of benchmarks, including video question answering and long video understanding.
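
The summary above does not spell out the fusion rule, so the following is only a minimal sketch of what instruction-driven weighting could look like. It assumes the instruction has already been encoded as a vector, that a small router maps it to one softmax weight per projector, and that the projector outputs are combined as a weighted sum. The names `fusion_weights`, `fuse_projector_outputs`, and `router_rows` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fusion_weights(instruction_embedding, router_rows):
    """Return one weight per projector, conditioned on the instruction.

    router_rows is a hypothetical learned matrix with one row per projector;
    each logit is the dot product of a row with the instruction embedding.
    """
    logits = [
        sum(w * x for w, x in zip(row, instruction_embedding))
        for row in router_rows
    ]
    return softmax(logits)


def fuse_projector_outputs(projector_outputs, weights):
    """Weighted sum of equally sized feature vectors, one per projector."""
    dim = len(projector_outputs[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, projector_outputs))
        for i in range(dim)
    ]


# Toy example (assumed values): three projectors, a 2-d instruction
# embedding, and 4-d visual features.
instruction = [1.0, 0.0]
router = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
weights = fusion_weights(instruction, router)
fused = fuse_projector_outputs(features, weights)
```

In this toy setup the first projector receives the largest weight because its router row aligns with the instruction vector, while an all-zero instruction vector would yield uniform weights, that is, a plain average of the projectors.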

🔑 Implications and Limitations

• Adaptively adjusting how the visual feature extractors are fused, based on the user's instruction, can improve video understanding performance.
• The complementary strengths of different visual feature extractors can be exploited effectively.
• Further analysis is needed on whether the dynamic weighting mechanism is optimal for all types of video tasks, and on its computational complexity.