Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

Created by
  • Haebom

Authors

Yiran Zhao, Lu Zhou, Xiaogang Xu, Zhe Liu, Jiafei Wu, Liming Fang

💡 Overview

This paper addresses fairness in UMLLMs (Unified Multimodal Large Language Models), where many fairness metrics exist but conflicting philosophical assumptions make it difficult to build a unified paradigm. To that end, it proposes IRIS, the first benchmark that evaluates the fairness of a UMLLM's understanding and generation tasks simultaneously. The IRIS benchmark consolidates more than 60 fine-grained metrics into three dimensions, "ideal fairness", "real-world fidelity", and "bias inertia and steerability", to help diagnose and optimize the fairness capabilities of UMLLMs.

🔑 Implications and Limitations

• Provides a new framework for comprehensively evaluating the systematic biases that arise in the understanding and generation tasks of UMLLMs.
• Introduces the concept of a "fairness space" in which diverse fairness metrics can be integrated and interpreted, contributing to a resolution of the "Tower of Babel" problem in fairness research.
• Provides tools for discovering and diagnosing new fairness phenomena in UMLLMs, such as the "generation gap", "individual-level inconsistency", and "counter-stereotype reward".
• UMLLMs are still at an early stage of development, so continued research and improvement are needed to resolve their fairness problems, and further study is required on the benchmark's scalability and its application to diverse real-world scenarios.