
It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Created by
  • Haebom

Author

Hannah Pinson

💡 Overview

This paper analyzes, at the level of individual neurons, how a neural network's theoretical capacity is reduced during training to an effective capacity that fits the task, and in particular how gradient descent achieves this. To that end, it presents three dynamical principles: mutual alignment, unlocking, and racing. It uses these principles to explain why capacity can be reduced effectively after training, either by merging neurons or by pruning low-norm weights. It further explains why certain favorable initial conditions lead some neurons to acquire a higher weight norm, which is the basis of the lottery ticket hypothesis.
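The two post-training reductions mentioned above can be illustrated with a minimal sketch. It assumes a single-hidden-layer ReLU network without biases, and it treats two neurons as mergeable only when their input weight vectors point in exactly the same direction. The function names (`merge_aligned`, `prune_low_norm`), the toy weights, and the pruning threshold are all hypothetical and are not taken from the paper. The merge is exact because ReLU is positively homogeneous: relu(c·z) = c·relu(z) for c > 0.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def forward(neurons, x):
    """Output of a one-hidden-layer ReLU network without biases.

    Each neuron is a pair (w, a): input weight vector w and output weight a.
    """
    return sum(a * relu(sum(wi * xi for wi, xi in zip(w, x))) for w, a in neurons)


def neuron_norm(neuron):
    """Euclidean norm of all weights (input and output) attached to one neuron."""
    w, a = neuron
    return math.sqrt(sum(wi * wi for wi in w) + a * a)


def merge_aligned(n1, n2):
    """Merge n2 into n1, ASSUMING w2 = c * w1 for some c > 0.

    a1 * relu(w1.x) + a2 * relu(c * w1.x) = (a1 + c * a2) * relu(w1.x),
    so the merged neuron computes exactly the same function.
    """
    (w1, a1), (w2, a2) = n1, n2
    c = math.sqrt(sum(v * v for v in w2) / sum(v * v for v in w1))
    return (w1, a1 + c * a2)


def prune_low_norm(neurons, threshold):
    """Drop neurons whose total weight norm is below a (hypothetical) threshold."""
    return [n for n in neurons if neuron_norm(n) >= threshold]


# Toy "trained" network (made-up weights): two aligned neurons,
# one neuron that stayed near zero, and one distinct neuron.
w = [1.0, -2.0]
network = [
    (w, 0.5),
    ([3.0 * v for v in w], 0.25),  # same direction as w, scaled by c = 3
    ([1e-4, 2e-4], 1e-4),          # low-norm neuron
    ([-1.0, 0.5], -0.7),
]

merged = [merge_aligned(network[0], network[1])] + network[2:]
pruned = prune_low_norm(merged, threshold=1e-2)

random.seed(0)
inputs = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(100)]
merge_error = max(abs(forward(network, x) - forward(merged, x)) for x in inputs)
prune_error = max(abs(forward(merged, x) - forward(pruned, x)) for x in inputs)

print(len(network), "->", len(merged), "->", len(pruned), "neurons")
print("max output change from merging:", merge_error)
print("max output change from pruning:", prune_error)
```

In this toy setting the merge changes the output only by floating-point rounding, while pruning changes it by an amount bounded by the pruned neuron's weights. In a real trained network, alignment is only approximate, so merging would also introduce a small error.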

🔑 Implications and Limitations

• Identifies, at the level of individual neurons, the mechanism by which gradient descent uses the network's learning dynamics to adjust theoretical capacity to the actual task.
• Shows theoretically that three dynamical principles (mutual alignment, unlocking, and racing) contribute to reducing network capacity through neuron merging and pruning.
• Explains part of the lottery ticket hypothesis by giving theoretical support for the mechanism through which neurons with favorable initial conditions grow in weight norm.
• The study covers only single-hidden-layer ReLU networks, so generalizing to more complex architectures or other activation functions requires further research.