Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers

Posted by
  • Haebom

Authors

Frederik Schafer, Luis Mandl, Lars Kalber, Tim Ricken

💡 Overview

This paper proposes an architecture-agnostic Lipschitz-constant Bayesian header to address label noise, a major bottleneck for the generalization of supervised learning models, with a focus on semantically similar misclassification errors. The header is attached to a feature extractor such as a Vision Transformer, yielding LipB-ViT. It applies spectral normalization to the mean and log-variance of the variational weights, which calibrates predictive uncertainty and limits noise amplification. In detecting semantically proximal mislabeled samples, the method outperformed existing k-NN-based approaches by more than 7%.
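To make the mechanism above more concrete, here is a minimal NumPy sketch of a spectrally normalized Bayesian linear head with Monte Carlo prediction. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian variational posterior, the rescaling rule in `clip_spectral_norm`, the bound of 1.0, the predictive-entropy output, and all function and variable names are hypothetical choices made for this example.

```python
import numpy as np

def clip_spectral_norm(w, bound=1.0):
    """Rescale w so its largest singular value is at most `bound` (assumed rule)."""
    sigma = np.linalg.norm(w, ord=2)  # exact spectral norm of a 2-D array
    return w / max(1.0, sigma / bound)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bayesian_head_predict(features, w_mu, w_logvar, n_samples=32, bound=1.0, seed=0):
    """Average class probabilities over weight samples drawn from N(mu, exp(logvar))."""
    rng = np.random.default_rng(seed)
    # Assumption: both the mean and the log-variance matrices are normalized.
    mu = clip_spectral_norm(w_mu, bound)
    logvar = clip_spectral_norm(w_logvar, bound)
    std = np.exp(0.5 * logvar)
    probs = np.zeros((features.shape[0], w_mu.shape[1]))
    for _ in range(n_samples):
        w = mu + std * rng.standard_normal(mu.shape)  # reparameterized sample
        probs += softmax(features @ w)
    probs /= n_samples
    # Predictive entropy as a simple per-sample uncertainty score.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return probs, entropy

# Toy inputs standing in for frozen feature-extractor outputs (4 samples, 8 dims, 3 classes).
rng = np.random.default_rng(1)
features = rng.standard_normal((4, 8))
w_mu = 3.0 * rng.standard_normal((8, 3))
w_logvar = rng.standard_normal((8, 3))
probs, entropy = bayesian_head_predict(features, w_mu, w_logvar)
```

Because only the head is resampled, the features can be computed once and reused across all `n_samples` draws; the cost of the head itself still grows linearly with the number of samples.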

🔑 Implications and Limitations

• Label-noise detection that is robust to semantic errors: The proposed LipB-ViT and a new evaluation metric effectively detect label noise caused by semantically similar misclassifications, and help quantify dataset quality and the amount of label noise.
• Architecture-agnostic applicability: The header can be integrated into existing pretrained feature extractors in a plug-and-play manner, and can be applied across domains with consistent hyperparameter settings.
• Increased computational cost: Monte Carlo sampling raises the computational cost, which can be seen as a limitation and points to the need for future optimization work.
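As a rough illustration of how Monte Carlo averaged predictions could be used to surface suspect labels, the sketch below flags samples whose given label receives low averaged probability and disagrees with the predicted class. This is a generic heuristic chosen for the example, not the paper's evaluation metric or detection procedure; `flag_suspect_labels`, the suspicion score, and the 0.7 threshold are all hypothetical.

```python
import numpy as np

def flag_suspect_labels(mean_probs, labels, threshold=0.7):
    """Flag samples whose given label looks unlikely under the averaged prediction."""
    mean_probs = np.asarray(mean_probs, dtype=float)
    labels = np.asarray(labels)
    # Probability the model assigns to the label that was actually given.
    p_given = mean_probs[np.arange(len(labels)), labels]
    predicted = mean_probs.argmax(axis=1)
    suspicion = 1.0 - p_given
    # Flag only when the model both disagrees and is fairly sure about it.
    flagged = (predicted != labels) & (suspicion > threshold)
    return flagged, suspicion

# Hand-made averaged probabilities for three samples and three classes.
mean_probs = [
    [0.90, 0.05, 0.05],  # label 0 agrees with the prediction
    [0.10, 0.80, 0.10],  # label 2 is strongly contradicted
    [0.40, 0.35, 0.25],  # label 1 is close to the top class, so it is not flagged
]
labels = [0, 2, 1]
flagged, suspicion = flag_suspect_labels(mean_probs, labels)
```

The third sample shows why a threshold matters for semantically close classes: the given label is only narrowly outranked, so a purely argmax-based rule would flag it while this sketch leaves it alone.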