
OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation

Created by
  • Haebom

Authors

Seungjae Moon, Seunghyun Oh, Youngmin Ro

💡 Overview

๊ธฐ์กด์˜ ํ›ˆ๋ จ ์—†๋Š” ๊ฐœ๋ฐฉํ˜• ์–ดํœ˜ ์˜๋ฏธ ๋ถ„ํ• (TF-OVSS)์€ ์ž…๋ ฅ ํ•ด์ƒ๋„ ์ œํ•œ์œผ๋กœ ์ธํ•ด ์Šฌ๋ผ์ด๋”ฉ ์œˆ๋„์šฐ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ์ „์—ญ์ ์ธ ๋งฅ๋ฝ์„ ๋†“์น˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” OV-Stitcher๋ผ๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•˜๋ฉฐ, ์ด๋Š” ๋ถ„ํ• ๋œ ์ด๋ฏธ์ง€ ํŠน์ง•์„ ๋งˆ์ง€๋ง‰ ์ธ์ฝ”๋” ๋ธ”๋ก ๋‚ด์—์„œ ์ง์ ‘ ์žฌ๊ตฌ์„ฑํ•˜์—ฌ ์ „์—ญ์ ์ธ ์ฃผ์˜(global attention)๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ผ๊ด€๋œ ๋งฅ๋ฝ ์ง‘๊ณ„์™€ ๊ณต๊ฐ„์ ์œผ๋กœ ์ผ๊ด€๋œ ๋ถ„ํ•  ์ง€๋„๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ์ด์ „ ๋ฐฉ๋ฒ• ๋Œ€๋น„ mIoU๋ฅผ ํ–ฅ์ƒ์‹œ์ผฐ์Šต๋‹ˆ๋‹ค.

🔑 Implications and Limitations

• Effectively addresses the lack of global context, a long-standing weakness of existing TF-OVSS methods.
• Reconstructs the attention mechanism across the fragmented features without any training, producing semantically consistent segmentation results.
• Demonstrates performance gains over existing training-free baselines on evaluation benchmarks, and presents a scalable and effective solution.
• Because the work focuses on the training-free setting, it does not explore whether fine-tuning could achieve higher performance.
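For readers unfamiliar with the mIoU metric used to report the gains above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for label maps. The function name `mean_iou`, the `ignore_index=255` convention, and averaging only over classes that actually appear are assumptions for illustration; the summary does not specify the paper's exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and target are equally sized 2D lists of integer class ids.
    Assumes every pred value lies in [0, num_classes); target pixels equal
    to ignore_index are skipped (a common convention, assumed here).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:
                continue
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)  # class IoUs are 1/2 and 2/3
```

Here one of four pixels is mislabeled, so class 0 scores 1/2, class 1 scores 2/3, and the mean is 7/12.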