
Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement

Created by
  • Haebom

Authors

Zhihang Yi, Jian Zhao, Jiancheng Lv, Tao Wang

💡 Overview

This paper is a comprehensive survey of the role of multimodal large language models (MLLMs) in chart understanding, a distinctive information fusion task. It systematically analyzes the fundamental challenges of fusing visual and textual data, the related tasks and datasets, and the evolution of methods from classical deep learning to the latest MLLMs. It also critically examines the limitations of current models and proposes future research directions for enhancing their cognitive capabilities.

🔑 Implications and Limitations

• MLLMs are transforming information fusion in chart understanding; the survey systematically organizes prior work and outlines directions for further development.
• A new taxonomy of canonical and non-canonical benchmarks illustrates how chart understanding research can be extended and provides guidelines for designing future studies.
• Current MLLMs show limitations in perception and reasoning; overcoming them calls for new approaches such as advanced alignment techniques and reinforcement learning.