LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

Created by
  • Haebom
Authors

Md Abtahi Majeed Chowdhury, Md Rifat Ur Rahman, Akil Ahmad Taki

💡 Overview

This paper proposes LOOPE, a learnable patch-order optimization method, to address the patch-ordering problem in positional embeddings (PE), the component that supplies spatial information to a Vision Transformer (ViT). LOOPE learns the optimal spatial representation for a given set of frequencies, and in doing so establishes how much patch order matters to the effectiveness of positional embeddings. In experiments, LOOPE significantly improved classification accuracy across a range of ViT architectures, and a new benchmarking framework showed that the effect of PE can be evaluated with greater sensitivity.
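To make concrete why patch order matters for a fixed set of frequencies, here is a minimal NumPy sketch. It is not the LOOPE method: nothing is learned here, and the two orderings (row-major and a "snake" scan) and all function names are assumptions chosen for illustration. It only shows that the same sinusoidal frequencies produce different embeddings for the same patch grid once the 2D-to-1D ordering changes.

```python
import numpy as np


def sinusoidal_pe(positions, dim, base=10000.0):
    """Classic 1D sinusoidal embedding for an array of integer positions."""
    positions = np.asarray(positions, dtype=np.float64)
    # One frequency per sin/cos pair (dim must be even).
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    angles = positions[:, None] * freqs[None, :]
    pe = np.empty((positions.shape[0], dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def row_major_order(h, w):
    """Position index of each grid cell when scanning row by row."""
    return np.arange(h * w).reshape(h, w)


def snake_order(h, w):
    """Like row-major, but every second row is scanned right to left."""
    order = np.arange(h * w).reshape(h, w)
    order[1::2] = order[1::2, ::-1].copy()
    return order


def pe_for_order(order, dim):
    """Embedding for each patch, laid out as an (h, w, dim) grid."""
    h, w = order.shape
    return sinusoidal_pe(order.ravel(), dim).reshape(h, w, dim)


if __name__ == "__main__":
    h, w, dim = 2, 3, 16
    pe_row = pe_for_order(row_major_order(h, w), dim)
    pe_snake = pe_for_order(snake_order(h, w), dim)
    # Distance between the vertically adjacent patches (0, 2) and (1, 2).
    print(np.linalg.norm(pe_row[0, 2] - pe_row[1, 2]))
    print(np.linalg.norm(pe_snake[0, 2] - pe_snake[1, 2]))
```

In this toy setup the two vertically adjacent patches at the right edge sit three positions apart under row-major order but only one apart under the snake scan, so their embeddings end up closer under the snake scan. A learnable ordering, as the paper proposes, would search over such mappings instead of fixing one by hand.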

🔑 Implications and Limitations

  • LOOPE shows quantitatively how the choice of patch order in a ViT affects positional-embedding performance, and opens up the possibility of optimizing that order in a learnable way.
  • The proposed "Three Cell Experiment" can serve as a new benchmarking tool that diagnoses the effect of positional embeddings far more sensitively and accurately than existing performance measures.
  • Further research is needed on whether LOOPE increases model complexity or computational cost, and on how well it generalizes across different downstream tasks.