MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Created by
  • Haebom
Authors

Simon Rosen, Siddarth Singh, Ebenezer Gelo, Helen Sarah Robertson, Ibrahim Suder, Victoria Williams, Benjamin Rosman, Geraud Nangue Tasse, Steven James

💡 Overview

Evaluating the moral alignment of agents that must resolve conflicts among hierarchically ordered human moral norms is an important challenge at the intersection of AI safety, philosophy, and cognitive science. This paper proposes Morality Chains, a new formalism that represents moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical dilemma problems.
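To make the idea of "ordered constraints" concrete, the following is a minimal sketch that assumes a lexicographic reading: a higher-priority norm always outweighs any number of violations of lower-priority norms. This summary does not state the paper's actual semantics, so the lexicographic rule, the `Norm` class, the `violation_profile` and `prefer` helpers, and the event labels are all hypothetical and are not the MoralityGym API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical representation: a trajectory is a sequence of event labels.
Trajectory = Sequence[str]


@dataclass(frozen=True)
class Norm:
    """A single norm with a function that counts its violations (illustrative only)."""
    name: str
    violations: Callable[[Trajectory], int]


def violation_profile(chain: Sequence[Norm], trajectory: Trajectory) -> tuple:
    """Violation counts ordered from highest-priority norm to lowest."""
    return tuple(norm.violations(trajectory) for norm in chain)


def prefer(chain: Sequence[Norm], a: Trajectory, b: Trajectory) -> Trajectory:
    """Return the trajectory with the lexicographically smaller violation profile.

    Python compares tuples element by element, so the first (highest-priority)
    norm on which the two trajectories differ decides the outcome.
    """
    return a if violation_profile(chain, a) <= violation_profile(chain, b) else b


# Assumed ordering for the example: avoiding harm outranks avoiding lies.
chain = [
    Norm("do_not_harm", lambda t: list(t).count("harm")),
    Norm("do_not_lie", lambda t: list(t).count("lie")),
]

plan_a = ["lie", "lie", "help"]   # two low-priority violations
plan_b = ["harm", "help"]         # one high-priority violation

best = prefer(chain, plan_a, plan_b)
```

Under this assumed ordering, `plan_a` is preferred even though it has more violations in total, because it has none of the highest-priority norm. A weighted sum of penalties would not guarantee that outcome, which is one reason an explicit ordering can be useful.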

🔑 Implications and Limitations

  • MoralityGym provides a foundation for integrating insights from psychology and philosophy into the evaluation of norm-sensitive reasoning.
  • An initial evaluation of safe reinforcement learning (Safe RL) methods highlighted the need for more principled approaches to ethical decision-making.
  • Current methods showed fundamental limitations in producing AI agents that behave reliably, transparently, and ethically in complex real-world situations.