Objective Decoupling in Social Reinforcement Learning: Recovering Ground Truth from Sycophantic Majorities

Created by
  • Haebom
Authors

Majid Ghasemi, Mark Crowley

💡 Overview

This paper argues that existing AI alignment strategies that rely on human feedback can drift permanently away from the latent true objective because of "Objective Decoupling," a problem that arises in social settings. To address it, the authors propose a new method, Epistemic Source Alignment (ESA), which judges the reliability of the feedback provider rather than the feedback signal itself. They prove theoretically that ESA guarantees convergence to the true objective even when a majority of evaluators are biased or colluding, and they support the result with experiments.
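To make the contrast between trusting the feedback signal and trusting the feedback source concrete, here is a minimal toy sketch. It is not the paper's algorithm: the probe questions with known answers (loosely standing in for the "safety axioms" the summary mentions), the smoothing rule, the log-odds weighting, and every function name are assumptions introduced only for illustration.

```python
import math

def majority_vote(labels):
    """Return 1 if strictly more than half of the binary labels are 1, else 0."""
    return 1 if sum(labels) * 2 > len(labels) else 0

def estimate_reliability(probe_answers, probe_truth):
    """Estimate a source's accuracy on probes with known answers.

    Add-one smoothing keeps the estimate strictly inside (0, 1),
    so the log-odds weight below is always finite.
    """
    correct = sum(a == t for a, t in zip(probe_answers, probe_truth))
    return (correct + 1) / (len(probe_truth) + 2)

def source_weighted_vote(labels, reliabilities):
    """Combine binary labels, weighting each source by its reliability log-odds.

    A source estimated to be worse than chance gets a negative weight,
    so its vote counts as evidence for the opposite label.
    """
    score = 0.0
    for label, r in zip(labels, reliabilities):
        weight = math.log(r / (1 - r))
        score += weight if label == 1 else -weight
    return 1 if score > 0 else 0

# Hypothetical setup: 5 evaluators, 3 of whom are systematically biased.
probe_truth = [1, 0, 1, 1]
probe_answers = [
    [1, 0, 1, 1],  # reliable evaluator
    [1, 0, 1, 1],  # reliable evaluator
    [0, 1, 0, 0],  # biased evaluator
    [0, 1, 0, 0],  # biased evaluator
    [0, 1, 0, 1],  # mostly biased evaluator
]
reliabilities = [estimate_reliability(a, probe_truth) for a in probe_answers]

# A new item whose true label is 1; the biased majority reports 0.
labels = [1, 1, 0, 0, 0]

print("majority vote:       ", majority_vote(labels))
print("source-weighted vote:", source_weighted_vote(labels, reliabilities))
```

In this toy setup the plain majority follows the three biased evaluators, while the source-weighted rule sides with the two evaluators who answered the probes correctly. The example only illustrates the general idea of scoring sources instead of counting votes; the guarantees described in the summary belong to the authors' method, not to this sketch.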

🔑 Implications and Limitations

• The conventional "static" assumption about the reliability of human feedback in AI alignment is fragile in social contexts and can give rise to a new failure mode, "Objective Decoupling."
• By evaluating the feedback providers themselves instead of relying on majority opinion, the proposed ESA method can ensure that the AI learns the true objective even when biased or malicious evaluators form the majority.
• The work presents Objective Decoupling as a fundamental challenge for AI alignment and offers a new theoretical framework and empirical evidence for addressing it.
• Further research is needed on the practical applicability and scalability of the proposed ESA method, and on how to define and use the "safety axioms" effectively.